GNU bug report logs - #51954
29.0.50; puny-encode doesn't normalize
Reported by: Lars Ingebrigtsen <larsi <at> gnus.org>
Date: Thu, 18 Nov 2021 17:08:01 UTC
Severity: normal
Found in version 29.0.50
Fixed in version 29.1
Done: Lars Ingebrigtsen <larsi <at> gnus.org>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 51954 in the body.
You can then email your comments to 51954 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-gnu-emacs <at> gnu.org: bug#51954; Package emacs. (Thu, 18 Nov 2021 17:08:02 GMT)
Acknowledgement sent to Lars Ingebrigtsen <larsi <at> gnus.org>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Thu, 18 Nov 2021 17:08:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
I'm reading
https://www.unicode.org/reports/tr36/
which says that IDNA should normalise the strings before encoding (and
lowercase, too?) This seems to agree:
https://en.wikipedia.org/wiki/Punycode
But:
(puny-encode-string "Bä.com")
=> "xn--Ba.com-xyd"
(puny-encode-string (ucs-normalize-NFKC-string "Bä.com"))
=> "xn--B.com-gra"
So I think puny-encode-string should do that first, if I'm reading TR36
right.
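A minimal sketch of the proposed order of operations (normalize to NFC, lowercase, then Punycode-encode), assuming `puny.el` and `ucs-normalize.el` are loadable. The helper `my-puny-encode-normalized` is hypothetical and not part of Emacs. It works on a single label, since the report's own example shows `puny-encode-string` encoding the dot in "Bä.com" along with everything else.

```elisp
;; Sketch only: `my-puny-encode-normalized' is a hypothetical helper,
;; not a function provided by puny.el.
(require 'puny)           ; provides `puny-encode-string'
(require 'ucs-normalize)  ; provides `ucs-normalize-NFC-string'

(defun my-puny-encode-normalized (label)
  "Return LABEL Punycode-encoded after NFC normalization and lowercasing.
LABEL is assumed to be a single domain label without dots."
  (puny-encode-string (downcase (ucs-normalize-NFC-string label))))

;; "u" followed by U+0308 COMBINING DIAERESIS (decomposed) and U+00FC
;; (precomposed) normalize to the same NFC string, so both calls
;; return the same encoding.
(my-puny-encode-normalized "Bu\u0308cher")  ; => "xn--bcher-kva"
(my-puny-encode-normalized "B\u00fccher")   ; => "xn--bcher-kva"
```

Without the normalization step, the decomposed input would keep the bare "u" in the ASCII part and encode only the combining mark, which is the mismatch the report describes.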
In GNU Emacs 29.0.50 (build 17, x86_64-pc-linux-gnu, GTK+ Version 3.24.30, cairo version 1.16.0)
of 2021-11-18 built on xo
Repository revision: 7a1e5ac8b29b731e89cc9d5b498e31bd90840b9b
Repository branch: master
Windowing system distributor 'The X.Org Foundation', version 11.0.12011000
System Description: Debian GNU/Linux bookworm/sid
Configured features:
ACL CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GPM GSETTINGS HARFBUZZ JPEG
JSON LCMS2 LIBOTF LIBSELINUX LIBSYSTEMD LIBXML2 M17N_FLT MODULES NOTIFY
INOTIFY PDUMPER PNG RSVG SECCOMP SOUND THREADS TIFF TOOLKIT_SCROLL_BARS
X11 XDBE XIM XPM GTK3 ZLIB
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#51954; Package emacs. (Thu, 18 Nov 2021 18:41:01 GMT)
Message #8 received at 51954 <at> debbugs.gnu.org (full text, mbox):
> From: Lars Ingebrigtsen <larsi <at> gnus.org>
> Date: Thu, 18 Nov 2021 18:06:47 +0100
>
> I'm reading
>
> https://www.unicode.org/reports/tr36/
>
> which says that IDNA should normalise the strings before encoding (and
> lowercase, too?)
Yes. See also http://www.unicode.org/reports/tr46/.
> (puny-encode-string (ucs-normalize-NFKC-string "Bä.com"))
> => "xn--B.com-gra"
NFKC or NFC?
> So I think puny-encode-string should do that first, if I'm reading TR36
> right.
Agreed.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#51954; Package emacs. (Fri, 19 Nov 2021 06:46:02 GMT)
Message #11 received at 51954 <at> debbugs.gnu.org (full text, mbox):
Eli Zaretskii <eliz <at> gnu.org> writes:
>> (puny-encode-string (ucs-normalize-NFKC-string "Bä.com"))
>> => "xn--B.com-gra"
>
> NFKC or NFC?
NFC. I've now expanded on the doc strings of these functions, removed
the ;;;###autoloads since they're not actually used, and added two new
string-glyph-* functions (pointing to the NFC functions) for greater
discoverability.
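On the "NFKC or NFC?" question: both forms compose canonically equivalent sequences such as a decomposed "ä", which is why either one fixes the "Bä.com" example, but NFKC additionally folds compatibility characters. A small check, using the ligature U+FB01 (LATIN SMALL LIGATURE FI) as an illustrative input chosen for this sketch:

```elisp
(require 'ucs-normalize)

;; U+FB01 has only a compatibility decomposition (to "fi"), so NFC
;; leaves it unchanged while NFKC replaces it with two ASCII letters.
(ucs-normalize-NFC-string "\ufb01")   ; => "\ufb01"
(ucs-normalize-NFKC-string "\ufb01")  ; => "fi"

;; A canonically decomposed "a" + U+0308 is composed to U+00E4 by both.
(ucs-normalize-NFC-string "a\u0308")  ; => "\u00e4"
(ucs-normalize-NFKC-string "a\u0308") ; => "\u00e4"
```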
>> So I think puny-encode-string should do that first, if I'm reading TR36
>> right.
>
> Agreed.
Now done.
bug marked as fixed in version 29.1, send any further explanations to 51954 <at> debbugs.gnu.org and Lars Ingebrigtsen <larsi <at> gnus.org>
Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Fri, 19 Nov 2021 06:47:01 GMT)
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#51954; Package emacs. (Fri, 19 Nov 2021 07:45:02 GMT)
Message #16 received at 51954 <at> debbugs.gnu.org (full text, mbox):
> From: Lars Ingebrigtsen <larsi <at> gnus.org>
> Cc: 51954 <at> debbugs.gnu.org
> Date: Fri, 19 Nov 2021 07:45:36 +0100
>
> NFC. I've now expanded on the doc strings of these functions, removed
> the ;;;###autoloads since they're not actually used
Isn't ucs-normalize used for accessing files on macOS? Their
file-coding-system uses normalization.
In any case, I wouldn't remove the autoloads: they are harmless, but
removing them could cause breakage.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#51954; Package emacs. (Fri, 19 Nov 2021 07:51:01 GMT)
Message #19 received at 51954 <at> debbugs.gnu.org (full text, mbox):
Eli Zaretskii <eliz <at> gnu.org> writes:
> Isn't ucs-normalize used for accessing files on macOS? Their
> file-coding-system uses normalization.
I grepped through the code base but couldn't find any usage of those
functions. (But on macOS we preload ucs-normalize.)
> In any case, I wouldn't remove the autoloads: they are harmless, but
> removing them could cause breakage.
I found it confusing to have all these unused functions autoloaded, but
if there's actually any usage out there, I hope people will complain,
and we can put them back in.
bug archived.
Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 17 Dec 2021 12:24:13 GMT)
This bug report was last modified 2 years and 92 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.