GNU bug report logs - #51954
29.0.50; puny-encode doesn't normalize

Package: emacs

Reported by: Lars Ingebrigtsen <larsi <at> gnus.org>

Date: Thu, 18 Nov 2021 17:08:01 UTC

Severity: normal

Found in version 29.0.50

Fixed in version 29.1

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#51954; Package emacs. (Thu, 18 Nov 2021 17:08:02 GMT)

Acknowledgement sent to Lars Ingebrigtsen <larsi <at> gnus.org>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Thu, 18 Nov 2021 17:08:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: bug-gnu-emacs <at> gnu.org
Subject: 29.0.50; puny-encode doesn't normalize
Date: Thu, 18 Nov 2021 18:06:47 +0100

I'm reading

https://www.unicode.org/reports/tr36/

which says that IDNA should normalise the strings before encoding (and
lowercase, too?)  This seems to agree:

https://en.wikipedia.org/wiki/Punycode

But:

(puny-encode-string "Bä.com")
=> "xn--Ba.com-xyd"

(puny-encode-string (ucs-normalize-NFKC-string "Bä.com"))
=> "xn--B.com-gra"

So I think puny-encode-string should do that first, if I'm reading TR36
right.
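
The comparison above can be reproduced with a minimal sketch along the following lines. It builds the decomposed and precomposed spellings explicitly, so the result does not depend on how the string was typed or pasted. The helper `my-puny-encode-normalized` and the two `my-` variables are hypothetical names used only for illustration and are not part of puny.el; the lowercasing step is an assumption based on the TR36 reading above.

```elisp
(require 'puny)
(require 'ucs-normalize)

(defun my-puny-encode-normalized (string)
  "Return STRING punycode-encoded after NFC normalization and downcasing.
Hypothetical helper for illustration only; not part of puny.el."
  (puny-encode-string (downcase (ucs-normalize-NFC-string string))))

;; "ba" followed by U+0308 COMBINING DIAERESIS (decomposed spelling).
(defvar my-decomposed-label (string ?b ?a #x308))

;; "b" followed by U+00E4 LATIN SMALL LETTER A WITH DIAERESIS (precomposed).
(defvar my-precomposed-label (string ?b #xe4))

;; The two spellings differ as raw strings (three characters versus two)...
(string= my-decomposed-label my-precomposed-label)

;; ...but should encode identically once both are normalized first.
(string= (my-puny-encode-normalized my-decomposed-label)
         (my-puny-encode-normalized my-precomposed-label))
```

Normalizing inside a wrapper like this keeps the sketch usable on an Emacs where `puny-encode-string` does not yet normalize its argument.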


In GNU Emacs 29.0.50 (build 17, x86_64-pc-linux-gnu, GTK+ Version 3.24.30, cairo version 1.16.0)
 of 2021-11-18 built on xo
Repository revision: 7a1e5ac8b29b731e89cc9d5b498e31bd90840b9b
Repository branch: master
Windowing system distributor 'The X.Org Foundation', version 11.0.12011000
System Description: Debian GNU/Linux bookworm/sid

Configured features:
ACL CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GPM GSETTINGS HARFBUZZ JPEG
JSON LCMS2 LIBOTF LIBSELINUX LIBSYSTEMD LIBXML2 M17N_FLT MODULES NOTIFY
INOTIFY PDUMPER PNG RSVG SECCOMP SOUND THREADS TIFF TOOLKIT_SCROLL_BARS
X11 XDBE XIM XPM GTK3 ZLIB

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#51954; Package emacs. (Thu, 18 Nov 2021 18:41:01 GMT)

Message #8 received at 51954 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 51954 <at> debbugs.gnu.org
Subject: Re: bug#51954: 29.0.50; puny-encode doesn't normalize
Date: Thu, 18 Nov 2021 20:40:46 +0200

> From: Lars Ingebrigtsen <larsi <at> gnus.org>
> Date: Thu, 18 Nov 2021 18:06:47 +0100
> 
> I'm reading
> 
> https://www.unicode.org/reports/tr36/
> 
> which says that IDNA should normalise the strings before encoding (and
> lowercase, too?)

Yes.  See also http://www.unicode.org/reports/tr46/.

> (puny-encode-string (ucs-normalize-NFKC-string "Bä.com"))
> => "xn--B.com-gra"

NFKC or NFC?

> So I think puny-encode-string should do that first, if I'm reading TR36
> right.

Agreed.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#51954; Package emacs. (Fri, 19 Nov 2021 06:46:02 GMT)

Message #11 received at 51954 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 51954 <at> debbugs.gnu.org
Subject: Re: bug#51954: 29.0.50; puny-encode doesn't normalize
Date: Fri, 19 Nov 2021 07:45:36 +0100

Eli Zaretskii <eliz <at> gnu.org> writes:

>> (puny-encode-string (ucs-normalize-NFKC-string "Bä.com"))
>> => "xn--B.com-gra"
>
> NFKC or NFC?

NFC.  I've now expanded on the doc strings of these functions, removed
the ;;;###autoloads since they're not actually used, and added two new
string-glyph-* functions (pointing to the NFC functions) for greater
discoverability.
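
The NFC/NFKC distinction can be checked with a short sketch such as the one below. It uses a compatibility character, U+FB01 LATIN SMALL LIGATURE FI, chosen here purely as an illustration; the variable name `my-fi-ligature` is hypothetical. `ucs-normalize` is loaded explicitly, so the sketch does not rely on autoloads.

```elisp
(require 'ucs-normalize)

;; U+FB01 LATIN SMALL LIGATURE FI has a compatibility decomposition
;; to "fi" but no canonical decomposition.
(defvar my-fi-ligature (string #xfb01))

;; NFC applies canonical mappings only, so the ligature stays a
;; single character.
(length (ucs-normalize-NFC-string my-fi-ligature))

;; NFKC also applies compatibility mappings, so the ligature is
;; replaced by the two ASCII letters "fi".
(ucs-normalize-NFKC-string my-fi-ligature)
```

NFKC can therefore change which characters a name contains, not just how they are composed, which is why the choice between the two forms matters here.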

>> So I think puny-encode-string should do that first, if I'm reading TR36
>> right.
>
> Agreed.

Now done.




bug marked as fixed in version 29.1, send any further explanations to 51954 <at> debbugs.gnu.org and Lars Ingebrigtsen <larsi <at> gnus.org> Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Fri, 19 Nov 2021 06:47:01 GMT)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#51954; Package emacs. (Fri, 19 Nov 2021 07:45:02 GMT)

Message #16 received at 51954 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 51954 <at> debbugs.gnu.org
Subject: Re: bug#51954: 29.0.50; puny-encode doesn't normalize
Date: Fri, 19 Nov 2021 09:44:33 +0200

> From: Lars Ingebrigtsen <larsi <at> gnus.org>
> Cc: 51954 <at> debbugs.gnu.org
> Date: Fri, 19 Nov 2021 07:45:36 +0100
> 
> NFC.  I've now expanded on the doc strings of these functions, removed
> the ;;;###autoloads since they're not actually used

Isn't ucs-normalize used for accessing files on macOS?  Their
file-coding-system uses normalization.

In any case, I wouldn't remove the autoloads: they are harmless, but
removing them could cause breakage.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#51954; Package emacs. (Fri, 19 Nov 2021 07:51:01 GMT)

Message #19 received at 51954 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 51954 <at> debbugs.gnu.org
Subject: Re: bug#51954: 29.0.50; puny-encode doesn't normalize
Date: Fri, 19 Nov 2021 08:50:31 +0100

Eli Zaretskii <eliz <at> gnu.org> writes:

> Isn't ucs-normalize used for accessing files on macOS?  Their
> file-coding-system uses normalization.

I grepped through the code base but couldn't find any usage of those
functions.  (But on macOS we preload ucs-normalize.)

> In any case, I wouldn't remove the autoloads: they are harmless, but
> removing them could cause breakage.

I found it confusing to have all these unused functions autoloaded, but
if there's actually any usage out there, I hope people will complain,
and we can put them back in.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 17 Dec 2021 12:24:13 GMT)

This bug report was last modified 2 years and 92 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.