GNU bug report logs -
#16216
24.3.50; <control> entries in `ucs-names'
Reported by: Drew Adams <drew.adams <at> oracle.com>
Date: Sun, 22 Dec 2013 02:10:01 UTC
Severity: normal
Found in version 24.3.50
Done: Eli Zaretskii <eliz <at> gnu.org>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 16216 in the body.
You can then email your comments to 16216 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-gnu-emacs <at> gnu.org: bug#16216; Package emacs. (Sun, 22 Dec 2013 02:10:02 GMT)
Acknowledgement sent to Drew Adams <drew.adams <at> oracle.com>: New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Sun, 22 Dec 2013 02:10:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
The doc for `insert-char' and `ucs-names' is sketchy. But it does at
least say that it is about inserting a character "using its UNICODE
name or its code point."
So what are all of those `<control>' character names about? Many
characters are listed in `ucs-names' as having this same "character
name", `<control>':
C-x 8 RET TAB C-g
C-h v ucs-names
C-s <control> C-s C-s...
And yet, AFAICT, there is no UNICODE character that has the name
`<control>', or even any name that has that as a substring.
http://www.unicode.org/charts/charindex.html
This seems like a bug. But since the description of `ucs-names' is
so sketchy it's hard to assert that. If this is not a bug, then:
1. In what way is `<control>' a "CHAR-NAME" for a character with any
code point? What does CHAR-NAME mean in this case?
2. What is the purpose of the multiple `<control>' CHAR-NAMEs?
3. Why are different CHAR-CODE values associated with the same
CHAR-NAME, `<control>'? What does that mean?
4. Try `C-x 8 RET <contr TAB RET'. You get only one particular
character "named" <control>, the one with code point decimal
159. That's the character named "APPLICATION PROGRAM COMMAND".
Why that one?
In GNU Emacs 24.3.50.1 (i686-pc-mingw32)
of 2013-12-16 on ODIEONE
Bzr revision: 115543 rudalics <at> gmx.at-20131216095844-lbjh5yerk6ff0tm7
Windowing system distributor `Microsoft Corp.', version 6.1.7601
Configured using:
`configure --prefix=/c/Devel/emacs/binary --enable-checking=yes,glyphs
'CFLAGS=-O0 -g3' LDFLAGS=-Lc:/Devel/emacs/lib
CPPFLAGS=-Ic:/Devel/emacs/include'
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#16216; Package emacs. (Sun, 22 Dec 2013 03:57:02 GMT)
Message #8 received at 16216 <at> debbugs.gnu.org (full text, mbox):
> Date: Sat, 21 Dec 2013 18:09:17 -0800 (PST)
> From: Drew Adams <drew.adams <at> oracle.com>
>
> 1. In what way is `<control>' a "CHAR-NAME" for a character with any
> code point? What does CHAR-NAME mean in this case?
Look at UnicodeData.txt, near the beginning of the file.
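To make the pointer above concrete, here is a minimal Python sketch (not Emacs code) that groups code points by the second field of UnicodeData.txt. The sample lines are a small hand-picked excerpt in the file's semicolon-separated format and should be checked against the real file; the helper name `group_by_name_field` is hypothetical.

```python
from collections import defaultdict

# A few lines in UnicodeData.txt format (assumed excerpt, not the whole file).
# Field 0 is the code point in hex, field 1 is the name or a <label>.
SAMPLE = """\
0000;<control>;Cc;0;BN;;;;;N;NULL;;;;
0001;<control>;Cc;0;BN;;;;;N;START OF HEADING;;;;
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
009F;<control>;Cc;0;BN;;;;;N;APPLICATION PROGRAM COMMAND;;;;
"""

def group_by_name_field(text):
    """Map each value of field 1 to the list of code points that carry it."""
    groups = defaultdict(list)
    for line in text.splitlines():
        fields = line.split(";")
        groups[fields[1]].append(int(fields[0], 16))
    return dict(groups)

groups = group_by_name_field(SAMPLE)
print(groups["<control>"])  # [0, 1, 159]
```

Several code points share the same `<control>` string in field 1, which is why a table keyed on that field ends up with many entries under one "name".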
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#16216; Package emacs. (Sun, 22 Dec 2013 05:09:02 GMT)
Message #11 received at 16216 <at> debbugs.gnu.org (full text, mbox):
> Look at UnicodeData.txt, near the beginning of the file.
I see; thanks. And I recall now that you pointed me to that
file once before.
Still, that does not really answer the questions I posed, AFAICT.
At least not for a user of `ucs-names' or the other functions
mentioned.
If `ucs-names' essentially corresponds to UnicodeData.txt, how
about citing that in its doc? Better yet, perhaps cite this,
which seems to be the place that the fields of UnicodeData.txt
are described:
http://www.unicode.org/Public/5.1.0/ucd/UCD.html#UnicodeData.txt
Still, part of my question is about `insert-char' and
`read-char-by-name', which is really what most users will see.
(Those are admittedly not the same as `ucs-names'. But they are
currently the only consumers of the latter.)
Should the `<control>' entries of `ucs-names' be included for
the completion provided by `read-char-by-name'? You can only
choose one of them, anyway. What is the use case for that -
the reason it is included as a possibility for `C-x 8 RET'?
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#16216; Package emacs. (Sun, 22 Dec 2013 05:11:01 GMT)
Message #14 received at 16216 <at> debbugs.gnu.org (full text, mbox):
> http://www.unicode.org/Public/5.1.0/ucd/UCD.html#UnicodeData.txt
(That seems to have been replaced by this:
http://www.unicode.org/reports/tr44/#UnicodeData.txt)
Reply sent to Eli Zaretskii <eliz <at> gnu.org>: You have taken responsibility. (Sun, 22 Dec 2013 18:11:01 GMT)
Notification sent to Drew Adams <drew.adams <at> oracle.com>: bug acknowledged by developer. (Sun, 22 Dec 2013 18:11:02 GMT)
Message #19 received at 16216-done <at> debbugs.gnu.org (full text, mbox):
> Date: Sat, 21 Dec 2013 21:08:35 -0800 (PST)
> From: Drew Adams <drew.adams <at> oracle.com>
> Cc: 16216 <at> debbugs.gnu.org
>
> > Look at UnicodeData.txt, near the beginning of the file.
>
> I see; thanks. And I recall now that you pointed me to that
> file once before.
>
> Still, that does not really answer the questions I posed, AFAICT.
> At least not for a user of `ucs-names' or the other functions
> mentioned.
I looked deeper and decided that this was a bug. The Unicode Standard
explicitly says that control characters have no 'name' property (see
Section 4.8 in the Standard), and that those "<control>" things are
just labels. The 'name' property cannot have lower-case characters or
"<>" in it anyway.
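As an illustration of the point about labels versus names, the following Python sketch uses the standard `unicodedata` module, which likewise reports no name for control characters. The character-set check is a deliberate simplification (uppercase letters, digits, space, hyphen) and not the Standard's full rule; the function names are hypothetical.

```python
import re
import unicodedata

# Simplified assumption: a 'name' uses only uppercase letters, digits,
# spaces, and hyphen-minus. The Standard's actual rules are stricter.
NAME_CHARS = re.compile(r"[A-Z0-9 -]+")

def looks_like_unicode_name(s):
    """Return True if s uses only characters allowed by the simplified rule."""
    return NAME_CHARS.fullmatch(s) is not None

def name_or_none(code_point):
    """Return the character's name, or None if it has no 'name' property."""
    return unicodedata.name(chr(code_point), None)

print(looks_like_unicode_name("<control>"))               # False
print(looks_like_unicode_name("LATIN CAPITAL LETTER A"))  # True
print(name_or_none(0x9F))                                 # None
```

Under this check, `<control>` fails on both counts mentioned above: it contains lower-case letters and angle brackets.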
So starting with trunk revision 115693, all control characters will
have nil as their 'name' property, and "C-x 8 RET < TAB" will say "No
match". (Some of the control characters have 'old-name' property, so
they still can be called out by name.)
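The behavior described above can be sketched in Python as follows. This is not the Emacs implementation, only an illustration under stated assumptions: the sample lines are a hand-picked excerpt in UnicodeData.txt format, field 10 is taken to be the old (Unicode 1) name, and `build_tables` and `complete` are hypothetical helpers.

```python
# Assumed excerpt in UnicodeData.txt format; verify against the real file.
SAMPLE = """\
0000;<control>;Cc;0;BN;;;;;N;NULL;;;;
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
009F;<control>;Cc;0;BN;;;;;N;APPLICATION PROGRAM COMMAND;;;;
"""

def build_tables(text):
    """Return (names, old_names), each mapping a string to a code point.

    Angle-bracket labels such as <control> are treated as "no name".
    """
    names, old_names = {}, {}
    for line in text.splitlines():
        fields = line.split(";")
        code_point = int(fields[0], 16)
        name, old_name = fields[1], fields[10]
        if not (name.startswith("<") and name.endswith(">")):
            names[name] = code_point
        if old_name:
            old_names[old_name] = code_point
    return names, old_names

def complete(prefix, names, old_names):
    """Return the sorted candidates (names and old names) starting with prefix."""
    candidates = set(names) | set(old_names)
    return sorted(c for c in candidates if c.startswith(prefix))

names, old_names = build_tables(SAMPLE)
print(complete("<", names, old_names))            # []
print(complete("APPLICATION", names, old_names))  # ['APPLICATION PROGRAM COMMAND']
print(hex(old_names["APPLICATION PROGRAM COMMAND"]))  # 0x9f
```

With labels excluded, completing on `<` finds nothing, while code point 159 stays reachable through its old name.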
> If `ucs-names' essentially corresponds to UnicodeData.txt, how
> about citing that in its doc?
The exact file is an implementation detail (there's a corresponding
XML file, which could be used if we wanted); the ELisp manual
documents that the properties are derived from UCD, the Unicode
Character Database.
Thanks.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#16216; Package emacs. (Sun, 22 Dec 2013 18:15:03 GMT)
Message #22 received at 16216 <at> debbugs.gnu.org (full text, mbox):
> Date: Sat, 21 Dec 2013 21:10:50 -0800 (PST)
> From: Drew Adams <drew.adams <at> oracle.com>
> Cc: 16216 <at> debbugs.gnu.org
>
> > http://www.unicode.org/Public/5.1.0/ucd/UCD.html#UnicodeData.txt
>
> (That seems to have been replaced by this:
> http://www.unicode.org/reports/tr44/#UnicodeData.txt)
The best references are to the "latest" version:
http://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 20 Jan 2014 12:24:04 GMT)
This bug report was last modified 11 years and 120 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.