GNU bug report logs - #52263
Stale comment in xsd-regexp.el about Emacs not supporting Unicode

Package: emacs;

Reported by: Stefan Kangas <stefan <at> marxist.se>

Date: Fri, 3 Dec 2021 18:38:01 UTC

Severity: minor

To reply to this bug, email your comments to 52263 AT debbugs.gnu.org.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#52263; Package emacs. (Fri, 03 Dec 2021 18:38:02 GMT)

Acknowledgement sent to Stefan Kangas <stefan <at> marxist.se>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Fri, 03 Dec 2021 18:38:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Stefan Kangas <stefan <at> marxist.se>
To: bug-gnu-emacs <at> gnu.org
Subject: Stale comment in xsd-regexp.el about Emacs not supporting Unicode
Date: Fri, 3 Dec 2021 10:37:10 -0800
Severity: minor

I believe this comment in lisp/nxml/xsd-regexp.el can be removed as
Emacs supports Unicode now:

    ;; The semantics of XSD regexps are defined in terms of Unicode.
    ;; Non-Unicode characters are not allowed in regular expressions and
    ;; will not match against the generated regular expressions.  A
    ;; Unicode character means a character in one of the Mule charsets
    ;; ascii, latin-iso8859-1, mule-unicode-0100-24ff,
    ;; mule-unicode-2500-33ff, mule-unicode-e000-ffff, eight-bit-control
    ;; or a character translatable to such a character (i.e a character
    ;; for which `encode-char' will return non-nil).
    ;;
    ;; Unfortunately, this means that this package is currently useless
    ;; for CJK characters, since there's no mule-unicode charset for the
    ;; CJK ranges of Unicode.  We should devise a workaround for this
    ;; until the fabled Unicode version of Emacs makes an appearance.

Is that correct?
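For context, the comment predates Emacs 23, which switched the internal character set to a superset of Unicode. A minimal sketch, assuming Emacs 23 or later, that asks `encode-char` for the `ucs` code point of a CJK character, the case the comment calls useless. The helper name `my-xsd-unicode-code-point` is hypothetical and is not part of xsd-regexp.el.

```elisp
;; Minimal sketch, assuming Emacs 23 or later.  The helper name is
;; hypothetical and not part of xsd-regexp.el.
(defun my-xsd-unicode-code-point (char)
  "Return the Unicode code point of CHAR, or nil if it has none."
  (encode-char char 'ucs))

;; A CJK character, which the old comment says cannot be handled:
(my-xsd-unicode-code-point ?中)   ; => 20013, i.e. #x4E2D
;; A Latin-1 character, one of the cases the old comment did cover:
(my-xsd-unicode-code-point ?é)    ; => 233, i.e. #xE9
```

A non-nil result for the CJK character is exactly the "translatable" condition the comment describes, so the workaround it asks for is no longer needed on that account.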




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#52263; Package emacs. (Fri, 03 Dec 2021 19:28:01 GMT)

Message #8 received at 52263 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stefan Kangas <stefan <at> marxist.se>
Cc: 52263 <at> debbugs.gnu.org
Subject: Re: bug#52263: Stale comment in xsd-regexp.el about Emacs not
 supporting Unicode
Date: Fri, 03 Dec 2021 21:27:11 +0200

> From: Stefan Kangas <stefan <at> marxist.se>
> Date: Fri, 3 Dec 2021 10:37:10 -0800
> 
> I believe this comment in lisp/nxml/xsd-regexp.el can be removed as
> Emacs supports Unicode now:
> 
>     ;; The semantics of XSD regexps are defined in terms of Unicode.
>     ;; Non-Unicode characters are not allowed in regular expressions and
>     ;; will not match against the generated regular expressions.  A
>     ;; Unicode character means a character in one of the Mule charsets
>     ;; ascii, latin-iso8859-1, mule-unicode-0100-24ff,
>     ;; mule-unicode-2500-33ff, mule-unicode-e000-ffff, eight-bit-control
>     ;; or a character translatable to such a character (i.e a character
>     ;; for which `encode-char' will return non-nil).
>     ;;
>     ;; Unfortunately, this means that this package is currently useless
>     ;; for CJK characters, since there's no mule-unicode charset for the
>     ;; CJK ranges of Unicode.  We should devise a workaround for this
>     ;; until the fabled Unicode version of Emacs makes an appearance.
> 
> Is that correct?

Probably.  The mule-Unicode-* stuff is definitely obsolete.  The only
thing that bothers me is what happens with eight-bit characters in the
XSD regexps -- are they allowed?  Emacs in general does allow them.
If xsd-regexp.el doesn't, that should be stated there.
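Eli's question turns on how Emacs represents raw bytes. A minimal sketch, assuming Emacs 23 or later: raw bytes #x80..#xFF become `eight-bit` characters at #x3FFF80..#x3FFFFF, which `characterp` accepts but which have no Unicode code point, so a regexp whose semantics are defined in Unicode terms has no way to name them. The helper `my-xsd-raw-byte-info` is hypothetical; this does not show what xsd-regexp.el itself does with such characters.

```elisp
;; Minimal sketch, assuming Emacs 23 or later.  The helper name is
;; hypothetical and not part of xsd-regexp.el.
(defun my-xsd-raw-byte-info (byte)
  "Describe how Emacs represents the raw BYTE (#x80..#xFF) as a character."
  (let ((char (unibyte-char-to-multibyte byte)))
    (list char                       ; the internal character code
          (characterp char)          ; is it a valid Emacs character?
          (char-charset char)        ; which charset it belongs to
          (encode-char char 'ucs)))) ; its Unicode code point, if any

(my-xsd-raw-byte-info #x80)
;; => (4194176 t eight-bit nil), where 4194176 is #x3FFF80
```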




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#52263; Package emacs. (Sat, 04 Dec 2021 13:08:01 GMT)

Message #11 received at 52263 <at> debbugs.gnu.org:

From: Stefan Kangas <stefan <at> marxist.se>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 52263 <at> debbugs.gnu.org
Subject: Re: bug#52263: Stale comment in xsd-regexp.el about Emacs not
 supporting Unicode
Date: Sat, 4 Dec 2021 14:07:46 +0100

Eli Zaretskii <eliz <at> gnu.org> writes:

>> I believe this comment in lisp/nxml/xsd-regexp.el can be removed as
>> Emacs supports Unicode now:
>>
>>     ;; The semantics of XSD regexps are defined in terms of Unicode.
>>     ;; Non-Unicode characters are not allowed in regular expressions and
>>     ;; will not match against the generated regular expressions.  A
>>     ;; Unicode character means a character in one of the Mule charsets
>>     ;; ascii, latin-iso8859-1, mule-unicode-0100-24ff,
>>     ;; mule-unicode-2500-33ff, mule-unicode-e000-ffff, eight-bit-control
>>     ;; or a character translatable to such a character (i.e a character
>>     ;; for which `encode-char' will return non-nil).
>>     ;;
>>     ;; Unfortunately, this means that this package is currently useless
>>     ;; for CJK characters, since there's no mule-unicode charset for the
>>     ;; CJK ranges of Unicode.  We should devise a workaround for this
>>     ;; until the fabled Unicode version of Emacs makes an appearance.
>>
>> Is that correct?
>
> Probably.  The mule-Unicode-* stuff is definitely obsolete.  The only
> thing that bothers me is what happens with eight-bit characters in the
> XSD regexps -- are they allowed?  Emacs in general does allow them.
> If xsd-regexp.el doesn't, that should be stated there.

Hmm, so probably more work is needed here than just removing the above
comment.  There is a lot of non-trivial mule and conversion stuff going
on in that library that might need a proper look by someone that knows
this stuff well.

Perhaps this bug should also be retitled accordingly.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#52263; Package emacs. (Sat, 04 Dec 2021 16:17:01 GMT)

Message #14 received at 52263 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stefan Kangas <stefan <at> marxist.se>
Cc: 52263 <at> debbugs.gnu.org
Subject: Re: bug#52263: Stale comment in xsd-regexp.el about Emacs not
 supporting Unicode
Date: Sat, 04 Dec 2021 18:16:38 +0200

> From: Stefan Kangas <stefan <at> marxist.se>
> Date: Sat, 4 Dec 2021 14:07:46 +0100
> Cc: 52263 <at> debbugs.gnu.org
> 
> >> Is that correct?
> >
> > Probably.  The mule-Unicode-* stuff is definitely obsolete.  The only
> > thing that bothers me is what happens with eight-bit characters in the
> > XSD regexps -- are they allowed?  Emacs in general does allow them.
> > If xsd-regexp.el doesn't, that should be stated there.
> 
> Hmm, so probably more work is needed here than just removing the above
> comment.  There is a lot of non-trivial mule and conversion stuff going
> on in that library that might need a proper look by someone that knows
> this stuff well.

Mainly that file needs simplification: we in effect have a single
range of characters, with the possible exception of the codepoints
between 128 and 160.  Also, decode-char is now a no-op when the 1st
arg is 'ucs'.

I can offer help in those parts where you don't feel you understand
the issue well enough to make the simplifications.
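Eli's point about `decode-char` can be checked directly. A minimal sketch, assuming `ucs` is an alias of the `unicode` charset as in current Emacs: for integer code points the call returns its argument unchanged. The predicate `my-xsd-decode-is-identity-p` is hypothetical, and the sample code points are arbitrary; #x85 from the 128..160 range is included only to show that `decode-char` itself treats it like any other code point.

```elisp
;; Minimal sketch, assuming `ucs' is an alias of the `unicode' charset,
;; as in current Emacs.  The predicate name is hypothetical.
(defun my-xsd-decode-is-identity-p (codes)
  "Return t if `decode-char' with the ucs charset maps each C in CODES to C."
  (let ((ok t))
    (dolist (c codes ok)
      (unless (eql (decode-char 'ucs c) c)
        (setq ok nil)))))

;; Sample code points: ASCII, a C1 control, box drawing, CJK,
;; private use, and the last Unicode code point.
(my-xsd-decode-is-identity-p '(#x41 #x85 #x2500 #x4E2D #xE000 #x10FFFF))
;; => t
```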




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#52263; Package emacs. (Sun, 05 Dec 2021 18:08:05 GMT)

Message #17 received at 52263 <at> debbugs.gnu.org:

From: Stefan Kangas <stefan <at> marxist.se>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 52263 <at> debbugs.gnu.org
Subject: Re: bug#52263: Stale comment in xsd-regexp.el about Emacs not
 supporting Unicode
Date: Sun, 5 Dec 2021 18:34:32 +0100
Eli Zaretskii <eliz <at> gnu.org> writes:

> Also, decode-char is now a no-op when the 1st arg is 'ucs'.

Interesting.  So the attached cleanup should be okay to install, then?
[decode-char.diff (text/x-diff, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#52263; Package emacs. (Sun, 05 Dec 2021 18:09:11 GMT)

Message #20 received at 52263 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stefan Kangas <stefan <at> marxist.se>
Cc: 52263 <at> debbugs.gnu.org
Subject: Re: bug#52263: Stale comment in xsd-regexp.el about Emacs not
 supporting Unicode
Date: Sun, 05 Dec 2021 19:48:29 +0200

> From: Stefan Kangas <stefan <at> marxist.se>
> Date: Sun, 5 Dec 2021 18:34:32 +0100
> Cc: 52263 <at> debbugs.gnu.org
> 
> > Also, decode-char is now a no-op when the 1st arg is 'ucs'.
> 
> Interesting.  So the attached cleanup should be okay to install, then?

Yes -- assuming that the 2nd argument is never a cons cell.  Which
AFAICT it never is in these places.
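The cons-cell caveat comes from the obsolescent form of `decode-char`'s second argument, a cons (HIGH . LOW) standing for HIGH * 65536 + LOW, as its docstring describes. A minimal sketch, assuming that form is still accepted: with a cons the result is an integer rather than the argument itself, so dropping the call is only safe where the argument is always an integer.

```elisp
;; Minimal sketch, assuming the obsolescent cons form of the CODE-POINT
;; argument is still accepted, as `decode-char's docstring describes.
;; An integer argument comes back unchanged:
(decode-char 'ucs 65)          ; => 65, the character ?A
;; A cons (HIGH . LOW) stands for HIGH * 65536 + LOW, so the result is
;; an integer, not the cons that was passed in:
(decode-char 'ucs '(0 . 65))   ; => 65
(decode-char 'ucs '(1 . 0))    ; => 65536, i.e. #x10000
```

So replacing `(decode-char 'ucs x)` with plain `x` preserves behavior exactly when `x` can never be a cons, which is the condition stated above.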




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#52263; Package emacs. (Sun, 05 Dec 2021 18:21:02 GMT)

Message #23 received at 52263 <at> debbugs.gnu.org:

From: Stefan Kangas <stefan <at> marxist.se>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 52263 <at> debbugs.gnu.org
Subject: Re: bug#52263: Stale comment in xsd-regexp.el about Emacs not
 supporting Unicode
Date: Sun, 5 Dec 2021 19:20:54 +0100

Eli Zaretskii <eliz <at> gnu.org> writes:

>> Interesting.  So the attached cleanup should be okay to install, then?
>
> Yes -- assuming that the 2nd argument is never a cons cell.  Which
> AFAICT it never is in these places.

I looked over all cases too, and my conclusion is also that the second
argument can never be a cons cell there.  So I installed the cleanup on
master.  Thanks.





GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.