GNU bug report logs - #16216
24.3.50; <control> entries in `ucs-names'

Package: emacs

Reported by: Drew Adams <drew.adams <at> oracle.com>

Date: Sun, 22 Dec 2013 02:10:01 UTC

Severity: normal

Found in version 24.3.50

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 16216 in the body.
You can then email your comments to 16216 AT debbugs.gnu.org in the normal way.

Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#16216; Package emacs. (Sun, 22 Dec 2013 02:10:02 GMT)

Acknowledgement sent to Drew Adams <drew.adams <at> oracle.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Sun, 22 Dec 2013 02:10:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Drew Adams <drew.adams <at> oracle.com>
To: bug-gnu-emacs <at> gnu.org
Subject: 24.3.50; <control> entries in `ucs-names'
Date: Sat, 21 Dec 2013 18:09:17 -0800 (PST)

The doc for `insert-char' and `ucs-names' is sketchy.  But it does at
least say that it is about inserting a character "using its UNICODE
name or its code point."

So what are all of those `<control>' character names about?  Many
characters are listed in `ucs-names' as having this same "character
name", `<control>':

 C-x 8 RET TAB C-g
 C-h v ucs-names
 C-s <control> C-s C-s...

And yet, AFAICT, there is no UNICODE character that has the name
`<control>', or even any name that has that as a substring.
http://www.unicode.org/charts/charindex.html

This seems like a bug.  But since the description of `ucs-names' is
so sketchy it's hard to assert that.  If this is not a bug, then:

1. In what way is `<control>' a "CHAR-NAME" for a character with any
   code point?  What does CHAR-NAME mean in this case?

2. What is the purpose of the multiple `<control>' CHAR-NAMEs?

3. Why are different CHAR-CODE values associated with the same
   CHAR-NAME, `<control>'?  What does that mean?

4. Try `C-x 8 RET <contr TAB RET'.  You get only one particular
   character "named" <control>, the one with code point decimal
   159.  That's the character named "APPLICATION PROGRAM COMMAND".
   Why that one?


In GNU Emacs 24.3.50.1 (i686-pc-mingw32)
 of 2013-12-16 on ODIEONE
Bzr revision: 115543 rudalics <at> gmx.at-20131216095844-lbjh5yerk6ff0tm7
Windowing system distributor `Microsoft Corp.', version 6.1.7601
Configured using:
 `configure --prefix=/c/Devel/emacs/binary --enable-checking=yes,glyphs
 'CFLAGS=-O0 -g3' LDFLAGS=-Lc:/Devel/emacs/lib
 CPPFLAGS=-Ic:/Devel/emacs/include'

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#16216; Package emacs. (Sun, 22 Dec 2013 03:57:02 GMT)

Message #8 received at 16216 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Drew Adams <drew.adams <at> oracle.com>
Cc: 16216 <at> debbugs.gnu.org
Subject: Re: bug#16216: 24.3.50; <control> entries in `ucs-names'
Date: Sun, 22 Dec 2013 05:55:56 +0200

> Date: Sat, 21 Dec 2013 18:09:17 -0800 (PST)
> From: Drew Adams <drew.adams <at> oracle.com>
> 
> 1. In what way is `<control>' a "CHAR-NAME" for a character with any
>    code point?  What does CHAR-NAME mean in this case?

Look at UnicodeData.txt, near the beginning of the file.
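
For readers without the file at hand, here is a minimal Python sketch of
what those first records look like.  The three sample lines are copied
from UnicodeData.txt; the helper `parse_record' and its dictionary keys
are hypothetical names chosen for illustration, and none of this is
Emacs's own code.  Field 1 holds the name (or the `<control>' label),
and field 10 holds the Unicode 1.0 name.

```python
# Three records copied from UnicodeData.txt (semicolon-separated fields).
SAMPLE = """\
0000;<control>;Cc;0;BN;;;;;N;NULL;;;;
009F;<control>;Cc;0;BN;;;;;N;APPLICATION PROGRAM COMMAND;;;;
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
"""


def parse_record(line):
    """Split one record and keep the fields relevant to this report.

    The function name and the dictionary keys are illustrative only.
    """
    fields = line.split(";")
    return {
        "code": int(fields[0], 16),   # field 0: code point, hexadecimal
        "name": fields[1],            # field 1: name, or the <control> label
        "category": fields[2],        # field 2: General_Category
        "old_name": fields[10],       # field 10: Unicode 1.0 name, may be empty
    }


records = [parse_record(line) for line in SAMPLE.splitlines()]
for r in records:
    print(f"U+{r['code']:04X} name={r['name']!r} old_name={r['old_name']!r}")
```

Both control characters carry the same string in the name field, which
is why `ucs-names' showed many identical `<control>' entries, while
"APPLICATION PROGRAM COMMAND" for code point 159 comes from field 10.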

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#16216; Package emacs. (Sun, 22 Dec 2013 05:09:02 GMT)

Message #11 received at 16216 <at> debbugs.gnu.org:

From: Drew Adams <drew.adams <at> oracle.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 16216 <at> debbugs.gnu.org
Subject: RE: bug#16216: 24.3.50; <control> entries in `ucs-names'
Date: Sat, 21 Dec 2013 21:08:35 -0800 (PST)

> Look at UnicodeData.txt, near the beginning of the file.

I see; thanks.  And I recall now that you pointed me to that
file once before.

Still, that does not really answer the questions I posed, AFAICT.
At least not for a user of `ucs-names' or the other functions
mentioned.

If `ucs-names' essentially corresponds to UnicodeData.txt, how
about citing that in its doc?  Better yet, perhaps cite this,
which seems to be the place that the fields of UnicodeData.txt
are described:
http://www.unicode.org/Public/5.1.0/ucd/UCD.html#UnicodeData.txt

Still, part of my question is about `insert-char' and
`read-char-by-name', which is really what most users will see.
(Those are admittedly not the same as `ucs-names'.  But they are
currently the only consumers of the latter.)

Should the `<control>' entries of `ucs-names' be included for
the completion provided by `read-char-by-name'?  You can only
choose one of them, anyway.  What is the use case for that -
the reason it is included as a possibility for `C-x 8 RET'?

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#16216; Package emacs. (Sun, 22 Dec 2013 05:11:01 GMT)

Message #14 received at 16216 <at> debbugs.gnu.org:

From: Drew Adams <drew.adams <at> oracle.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 16216 <at> debbugs.gnu.org
Subject: RE: bug#16216: 24.3.50; <control> entries in `ucs-names'
Date: Sat, 21 Dec 2013 21:10:50 -0800 (PST)

> http://www.unicode.org/Public/5.1.0/ucd/UCD.html#UnicodeData.txt

(That seems to have been replaced by this:
http://www.unicode.org/reports/tr44/#UnicodeData.txt)

Reply sent to Eli Zaretskii <eliz <at> gnu.org>:
You have taken responsibility. (Sun, 22 Dec 2013 18:11:01 GMT)

Notification sent to Drew Adams <drew.adams <at> oracle.com>:
bug acknowledged by developer. (Sun, 22 Dec 2013 18:11:02 GMT)

Message #19 received at 16216-done <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Drew Adams <drew.adams <at> oracle.com>
Cc: 16216-done <at> debbugs.gnu.org
Subject: Re: bug#16216: 24.3.50; <control> entries in `ucs-names'
Date: Sun, 22 Dec 2013 20:10:36 +0200

> Date: Sat, 21 Dec 2013 21:08:35 -0800 (PST)
> From: Drew Adams <drew.adams <at> oracle.com>
> Cc: 16216 <at> debbugs.gnu.org
> 
> > Look at UnicodeData.txt, near the beginning of the file.
> 
> I see; thanks.  And I recall now that you pointed me to that
> file once before.
> 
> Still, that does not really answer the questions I posed, AFAICT.
> At least not for a user of `ucs-names' or the other functions
> mentioned.

I looked deeper and decided that this was a bug.  The Unicode Standard
explicitly says that control characters have no 'name' property (see
Section 4.8 in the Standard), and that those "<control>" things are
just labels.  The 'name' property cannot have lower-case characters or
"<>" in it anyway.

So starting with trunk revision 115693, all control characters will
have nil as their 'name' property, and "C-x 8 RET < TAB" will say "No
match".  (Some of the control characters have 'old-name' property, so
they still can be called out by name.)
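
The same rule can be observed outside Emacs.  The following is a
minimal sketch using Python's standard `unicodedata' module, on the
assumption that it follows the Standard in giving control characters no
name; it illustrates the rule and is not the change made in the Emacs
trunk.  The helper names `control_code_points' and `name_or_none' are
hypothetical.

```python
import unicodedata


def control_code_points():
    """Collect every code point whose General_Category is Cc (control)."""
    return [cp for cp in range(0x110000)
            if unicodedata.category(chr(cp)) == "Cc"]


def name_or_none(cp):
    """Return the character's name, or None when it has none."""
    return unicodedata.name(chr(cp), None)


controls = control_code_points()
unnamed = [cp for cp in controls if name_or_none(cp) is None]

print(f"{len(unnamed)} of {len(controls)} control characters have no name")
print(name_or_none(0x9F))   # U+009F, code point decimal 159
print(name_or_none(0x41))   # an ordinary named character, for contrast
```

Passing None as the second argument to `unicodedata.name' avoids the
ValueError that the function raises for a character without a name.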

> If `ucs-names' essentially corresponds to UnicodeData.txt, how
> about citing that in its doc?

The exact file is an implementation detail (there's a corresponding
XML file, which could be used if we wanted); the ELisp manual
documents that the properties are derived from UCD, the Unicode
Character Database.

Thanks.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#16216; Package emacs. (Sun, 22 Dec 2013 18:15:03 GMT)

Message #22 received at 16216 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Drew Adams <drew.adams <at> oracle.com>
Cc: 16216 <at> debbugs.gnu.org
Subject: Re: bug#16216: 24.3.50; <control> entries in `ucs-names'
Date: Sun, 22 Dec 2013 20:13:32 +0200

> Date: Sat, 21 Dec 2013 21:10:50 -0800 (PST)
> From: Drew Adams <drew.adams <at> oracle.com>
> Cc: 16216 <at> debbugs.gnu.org
> 
> > http://www.unicode.org/Public/5.1.0/ucd/UCD.html#UnicodeData.txt
> 
> (That seems to have been replaced by this:
> http://www.unicode.org/reports/tr44/#UnicodeData.txt)

The best references are to the "latest" version:

  http://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 20 Jan 2014 12:24:04 GMT)

This bug report was last modified 11 years and 324 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.