GNU bug report logs - #20499
[PROPOSED PATCH] C-x 8 shorthands for curved quotes, Euro, etc.
Reported by: Paul Eggert <eggert <at> cs.ucla.edu>
Date: Mon, 4 May 2015 01:15:03 UTC
Severity: wishlist
Tags: patch
Merged with 16082
Done: Lars Ingebrigtsen <larsi <at> gnus.org>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 20499 in the body.
You can then email your comments to 20499 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Mon, 04 May 2015 01:15:04 GMT)
Acknowledgement sent to Paul Eggert <eggert <at> cs.ucla.edu>: New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Mon, 04 May 2015 01:15:04 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Although C-x 8 lets you insert arbitrary Unicode characters, it's
awkward to use this to insert commonly used symbols such as curved
quotes, the Euro symbol, etc. This patch adds simpler sequences for
ISO 8859-15 characters (which includes the Euro), plus characters that
are commonly found in English text and in basic math. For example,
assuming the Alt key works on your keyboard and iso-transl is loaded,
one can now type "A-[" instead of "A-RET LEFT SIN TAB RET" to get the
character "‘" (U+2018 LEFT SINGLE QUOTATION MARK).
* doc/emacs/mule.texi (Unibyte Mode), etc/NEWS: Latin-9 and a few
other printing characters now work too.
* lisp/international/iso-transl.el (iso-transl-char-map):
Also support ISO 8859-15 characters (e.g., "€"), plus the characters
"–—‘’“”†‡•′″←→↔−≈≠≤≥" which are commonly used in English text
or basic math.
This patch is a followup to Bug#20385; although it is a separate issue
and does not fix Bug#20385, it could make fixing Bug#20385 easier.
---
doc/emacs/mule.texi | 4 ++--
etc/NEWS | 2 ++
lisp/international/iso-transl.el | 33 ++++++++++++++++++++++++++++++++-
3 files changed, 36 insertions(+), 3 deletions(-)
diff --git a/doc/emacs/mule.texi b/doc/emacs/mule.texi
index de381df..03e70da 100644
--- a/doc/emacs/mule.texi
+++ b/doc/emacs/mule.texi
@@ -1660,8 +1660,8 @@ characters present directly on the keyboard or using @key{Compose} or
@cindex compose character
@cindex dead character
@item
-For Latin-1 only, you can use the key @kbd{C-x 8} as a ``compose
-character'' prefix for entry of non-@acronym{ASCII} Latin-1 printing
+You can use the key @kbd{C-x 8} as a ``compose character'' prefix for
+entry of non-@acronym{ASCII} Latin-1, Latin-9, and a few other printing
characters. @kbd{C-x 8} is good for insertion (in the minibuffer as
well as other buffers), for searching, and in any other context where
a key sequence is allowed.
diff --git a/etc/NEWS b/etc/NEWS
index 7497652..3313c56 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -213,6 +213,8 @@ successive char insertions.
** Unicode names entered via C-x 8 RET now use substring completion by default.
+** C-x 8 now has shorthands for Latin-9 and a few other commonly used chars.
+
** New minor mode global-eldoc-mode is enabled by default.
** Emacs now supports "bracketed paste mode" when running on a terminal
diff --git a/lisp/international/iso-transl.el b/lisp/international/iso-transl.el
index 73bcae0..ac91c1e 100644
--- a/lisp/international/iso-transl.el
+++ b/lisp/international/iso-transl.el
@@ -1,4 +1,4 @@
-;;; iso-transl.el --- keyboard input definitions for ISO 8859-1 -*- coding: utf-8 -*-
+;;; iso-transl.el --- keyboard input for ISO characters -*- coding: utf-8 -*-
;; Copyright (C) 1987, 1993-1999, 2001-2015 Free Software Foundation,
;; Inc.
@@ -36,6 +36,10 @@
;; to make all of the Alt keys autoload, and it is not clear
;; that the dead accent keys SHOULD autoload this package.
+;; This package supports all characters defined by ISO 8859-1 and ISO 8859-15,
+;; along with a few other ISO 10646 characters commonly used in English
+;; and computing text.
+
;;; Code:
;;; Provide some binding for startup:
@@ -192,6 +196,33 @@
("~o" . [?õ])
("~t" . [?þ])
("~~" . [?¬])
+ ("OE" . [?Œ])
+ ("Oe" . [?œ])
+ ("vS" . [?Š])
+ ("vs" . [?š])
+ ("\"Y" . [?Ÿ])
+ ("vZ" . [?Ž])
+ ("vz" . [?ž])
+ ("_n" . [?–])
+ ("_m" . [?—])
+ ("[" . [?‘])
+ ("]" . [?’])
+ ("{" . [?“])
+ ("}" . [?”])
+ ("1+" . [?†])
+ ("2+" . [?‡])
+ ("**" . [?•])
+ ("*'" . [?′])
+ ("*\"" . [?″])
+ ("*E" . [?€])
+ ("a<" . [?←])
+ ("a>" . [?→])
+ ("a=" . [?↔])
+ ("_-" . [?−])
+ ("~=" . [?≈])
+ ("/=" . [?≠])
+ ("_<" . [?≤])
+ ("_>" . [?≥])
("' " . "'")
("` " . "`")
("\" " . "\"")
--
2.1.0
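For readers who want similar shorthands without patching iso-transl.el, the following is a minimal sketch of defining a few of them from an init file. It assumes that `iso-transl-define-keys` accepts an alist in the same format as `iso-transl-char-map`; the selection of entries is only illustrative and mirrors part of the patch above.

```elisp
;; Sketch only: add a handful of C-x 8 shorthands from an init file.
;; The entries mirror a few lines of the patch; which ones to pick is
;; a personal choice, not something the patch prescribes.
(require 'iso-transl)

(iso-transl-define-keys
 '(("["  . [?‘])    ; C-x 8 [   inserts LEFT SINGLE QUOTATION MARK
   ("]"  . [?’])    ; C-x 8 ]   inserts RIGHT SINGLE QUOTATION MARK
   ("{"  . [?“])    ; C-x 8 {   inserts LEFT DOUBLE QUOTATION MARK
   ("}"  . [?”])    ; C-x 8 }   inserts RIGHT DOUBLE QUOTATION MARK
   ("*E" . [?€])))  ; C-x 8 * E inserts EURO SIGN
```

If an Emacs version already defines these sequences, evaluating the form simply rebinds them to the same characters.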
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Mon, 04 May 2015 14:23:02 GMT)
Message #8 received at 20499 <at> debbugs.gnu.org (full text, mbox):
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> Date: Sun, 3 May 2015 18:13:10 -0700
> Cc: Paul Eggert <eggert <at> cs.ucla.edu>
>
> Although C-x 8 lets you insert arbitrary Unicode characters, it's
> awkward to use this to insert commonly used symbols such as curved
> quotes, the Euro symbol, etc. This patch adds simpler sequences for
> ISO 8859-15 characters (which includes the Euro), plus characters that
> are commonly found in English text and in basic math. For example,
> assuming the Alt key works on your keyboard and iso-transl is loaded,
> one can now type "A-[" instead of "A-RET LEFT SIN TAB RET" to get the
> character "‘" (U+2018 LEFT SINGLE QUOTATION MARK).
> * doc/emacs/mule.texi (Unibyte Mode), etc/NEWS: Latin-9 and a few
> other printing characters now work too.
> * lisp/international/iso-transl.el (iso-transl-char-map):
> Also support ISO 8859-15 characters (e.g., "€"), plus the characters
> "–—‘’“”†‡•′″←→↔−≈≠≤≥" which are commonly used in English text
> or basic math.
Shouldn't we prefer input methods instead? We already have a plethora
of Latin-N-something input methods (including latin-9-prefix), so why
not add more characters there, instead of using iso-transl?
I think input methods generally get less in your way.
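As a rough illustration of the input-method alternative, here is a minimal Quail sketch. The method name "english-typographic" and every key choice in it are hypothetical; nothing like it is part of Emacs or of the proposed patch.

```elisp
;; Sketch only: a tiny Quail input method for English typography.
;; The name "english-typographic" and the key choices are assumptions.
(require 'quail)

(quail-define-package
 "english-typographic"   ; input method name (hypothetical)
 "English"               ; language environment it is listed under
 "‘’"                    ; mode-line indicator
 t                       ; show guidance while typing
 "Curved quotes, dashes and the Euro sign for English text.")

(quail-define-rules
 ("`"   ?‘)    ; left single quote
 ("'"   ?’)    ; right single quote / apostrophe
 ("``"  ?“)    ; left double quote
 ("''"  ?”)    ; right double quote
 ("--"  ?–)    ; en dash
 ("---" ?—)    ; em dash
 ("=E"  ?€))   ; Euro sign
```

After evaluating this, `C-u C-\ english-typographic RET` should select the method. One trade-off of this design is that the plain ASCII apostrophe and backquote are no longer directly typable while the method is active.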
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Mon, 04 May 2015 15:22:02 GMT)
Message #11 received at 20499 <at> debbugs.gnu.org (full text, mbox):
severity 20499 wishlist
merge 16082 20499
thanks
>>>>> Eli Zaretskii <eliz <at> gnu.org> writes:
>>>>> From: Paul Eggert Date: Sun, 3 May 2015 18:13:10 -0700
>> Although C-x 8 lets you insert arbitrary Unicode characters, it's
>> awkward to use this to insert commonly used symbols such as curved
>> quotes, the Euro symbol, etc. This patch adds simpler sequences for
>> ISO 8859-15 characters (which includes the Euro), plus characters
>> that are commonly found in English text and in basic math. For
>> example, assuming the Alt key works on your keyboard and iso-transl
>> is loaded, one can now type "A-[" instead of "A-RET LEFT SIN TAB
>> RET" to get the character "‘" (U+2018 LEFT SINGLE QUOTATION MARK).
First of all, isn’t this essentially the same suggestion as the
one of bug#16082? (FWIW, I’ve requested the reports to be
merged; feel free to unmerge if I’ve missed something.)
[…]
> Shouldn't we prefer input methods instead? We already have a
> plethora of Latin-N-something input methods (including
> latin-9-prefix), so why not add more characters there, instead of
> using iso-transl?
> I think input methods generally get less in your way.
I tend to agree with that, but is there currently an easy way to
switch between /two/ input methods? For one thing, I currently
use “no” input method for typing English /and/
russian-typewriter to type Russian.
With the proper Unicode quotes being available via some other
input method, how would I configure Emacs to switch between
/that/ input method and russian-typewriter?
The other side of the issue is that the dashes, arrows,
mathematical symbols, and the likes of them are cross-lingual,
and making them available via input methods will involve
duplication of many of the individual quail-define-rules entries
all around leim/quail/*.el. (If done the straightforward way;
AIUI, anyway.)
--
FSF associate member #7257 http://am-1.org/~ivan/ … 3013 B6A0 230E 334A
Severity set to 'wishlist' from 'normal'. Request was from Ivan Shmakov <ivan <at> siamics.net> to control <at> debbugs.gnu.org. (Mon, 04 May 2015 15:22:03 GMT)
Merged 16082 20499. Request was from Ivan Shmakov <ivan <at> siamics.net> to control <at> debbugs.gnu.org. (Mon, 04 May 2015 15:22:04 GMT)
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Mon, 04 May 2015 15:43:02 GMT)
Message #18 received at 20499 <at> debbugs.gnu.org (full text, mbox):
> From: Ivan Shmakov <ivan <at> siamics.net>
> Date: Mon, 04 May 2015 15:20:56 +0000
>
> > Shouldn't we prefer input methods instead? We already have a
> > plethora of Latin-N-something input methods (including
> > latin-9-prefix), so why not add more characters there, instead of
> > using iso-transl?
>
> > I think input methods generally get less in your way.
>
> I tend to agree with that, but is there currently an easy way to
> switch between /two/ input methods?
I simply use "C-u C-\". Granted, if every 2nd character you type is
U+2018, switching input methods is gonna hurt. But that's not what
happens normally, at least not to me, and you save those Alt-[
etc. for more useful tasks.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Mon, 04 May 2015 16:13:02 GMT)
Message #21 received at 20499 <at> debbugs.gnu.org (full text, mbox):
On 05/04/2015 07:22 AM, Eli Zaretskii wrote:
> Shouldn't we prefer input methods instead?
Typically yes, but for common characters it's better to have a standard
way to input them in any context. The exact set of such characters is
of course debatable (and you could easily talk me out of the
more-obscure characters proposed), but quotes, dashes, and the Euro are
pretty basic to ordinary English text.
Also, Emacs has no English input method, which means Emacs users
currently have trouble writing good English text outside the ASCII
character set. I suppose we could add such a method, but that would
require more user training than the proposed approach. Anyway, Emacs is
natively English and support for basic English text should be available
everywhere.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Mon, 04 May 2015 16:13:03 GMT)
Message #24 received at 20499 <at> debbugs.gnu.org (full text, mbox):
>>>>> Eli Zaretskii <eliz <at> gnu.org> writes:
>>>>> From: Ivan Shmakov Date: Mon, 04 May 2015 15:20:56 +0000
[…]
>> I tend to agree with that, but is there currently an easy way to
>> switch between /two/ input methods?
> I simply use "C-u C-\".
Given that I edit texts which may be deemed bilingual (Russian
prose interspersed with source code or command line examples)
not just occasionally, /and/ need C-s, C-r at that, – no,
I don’t think it’d work all that well for me.
> Granted, if every 2nd character you type is U+2018, switching input
> methods is gonna hurt.
It’s not that bad, but still; consider, e. g.:
«Ты пророк», вскричал я, «вещий! Птица ты иль дух зловещий,
Этим Небом, что над нами — Богом скрытым навсегда —
Заклинаю, умоляя, мне сказать, — в пределах Рая
Мне откроется ль святая, что средь ангелов всегда,
Та, которую Ленорой в небесах зовут всегда?»
Каркнул Ворон: «Никогда».
Nine such characters per 43 words.
> But that's not what happens normally, at least not to me, and you
> save those Alt-[ etc. for more useful tasks.
My ‘Alt’ is ‘Meta’ most of the time, so it’s rather C-x 8 [,
C-x 8 ], etc. for me, and reserving that for typography isn’t
really a big deal.
--
FSF associate member #7257 http://am-1.org/~ivan/ … 3013 B6A0 230E 334A
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Mon, 04 May 2015 16:16:03 GMT)
Message #27 received at 20499 <at> debbugs.gnu.org (full text, mbox):
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
How about also adding s, t, S, T with cedilla, dotless i, and I with dot.
Also c and C with a hacek.
C-x 8 C-h is a good way of seeing what all the options are.
It may be worth documenting.
It would be nice to have C-u C-x = show the specific C-x 8 sequence
for a character, if there is one.
By the way, it would be good to have a file that consists of all of
unicode in numeric order. That would provide an easy way to pick some
unicode character (whose code you don't remember) and copying it into
some text.
--
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org www.gnu.org
Skype: No way! See stallman.org/skype.html.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Mon, 04 May 2015 16:32:02 GMT)
Message #30 received at 20499 <at> debbugs.gnu.org (full text, mbox):
> From: Ivan Shmakov <ivan <at> siamics.net>
> Date: Mon, 04 May 2015 16:12:28 +0000
>
> >>>>> Eli Zaretskii <eliz <at> gnu.org> writes:
> >>>>> From: Ivan Shmakov Date: Mon, 04 May 2015 15:20:56 +0000
>
> […]
>
> >> I tend to agree with that, but is there currently an easy way to
> >> switch between /two/ input methods?
>
> > I simply use "C-u C-\".
>
> Given that I edit texts which may be deemed bilingual (Russian
> prose interspersed with source code or command line examples)
> not just occasionally, /and/ need C-s, C-r at that, – no,
> I don’t think it’d work all that well for me.
Don't you have a dual-language keyboard on your system that can switch
languages without Emacs being involved? Input methods are for
characters not directly supported by your keyboard; most systems have
at least 2, sometimes 3 different languages switchable by a hot key.
IOW, I wouldn't expect you to need an input method to type Cyrillic
characters.
> > Granted, if every 2nd character you type is U+2018, switching input
> > methods is gonna hurt.
>
> It’s not that bad, but still; consider, e. g.:
>
> «Ты пророк», вскричал я, «вещий! Птица ты иль дух зловещий,
> Этим Небом, что над нами — Богом скрытым навсегда —
> Заклинаю, умоляя, мне сказать, — в пределах Рая
> Мне откроется ль святая, что средь ангелов всегда,
> Та, которую Ленорой в небесах зовут всегда?»
> Каркнул Ворон: «Никогда».
>
> Nine such characters per 43 words.
Those aren't quotes Paul was talking about. Those are Cyrillic-style
quotes frequently used in Cyrillic languages, and I'd expect them to
be directly available from your keyboard.
Paul's use case is with the original of this poem.
> > But that's not what happens normally, at least not to me, and you
> > save those Alt-[ etc. for more useful tasks.
>
> My ‘Alt’ is ‘Meta’ most of the time, so it’s rather C-x 8 [,
> C-x 8 ], etc. for me, and reserving that for typography isn’t
> really a big deal.
That's exactly the issue: most keyboards will have Alt taken already,
and typing "C-x 8 [" is a PITA, IMO. By contrast, 'C-\ "' is easy.
But if there are people who'd like to go iso-transl way, who am I to
object?
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Mon, 04 May 2015 16:35:02 GMT)
Message #33 received at 20499 <at> debbugs.gnu.org (full text, mbox):
> Date: Mon, 04 May 2015 12:15:07 -0400
> From: Richard Stallman <rms <at> gnu.org>
> Cc: eggert <at> cs.ucla.edu, 20499 <at> debbugs.gnu.org
>
> By the way, it would be good to have a file that consists of all of
> unicode in numeric order.
Would admin/unidata/UnicodeData.txt do?
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Mon, 04 May 2015 16:49:02 GMT)
Message #36 received at 20499 <at> debbugs.gnu.org (full text, mbox):
>>>>> Eli Zaretskii <eliz <at> gnu.org> writes:
>>>>> Date: Mon, 04 May 2015 12:15:07 -0400 From: Richard Stallman
>> By the way, it would be good to have a file that consists of all of
>> unicode in numeric order. That would provide an easy way to pick
>> some unicode character (whose code you don't remember) and copying
>> it into some text.
> Would admin/unidata/UnicodeData.txt do?
I guess given the “copying” part, the request is more along the
lines of, say:
    (let ((i #x100))
      (while (< i #x180)
        (when (zerop (mod i #x20))
          (unless (eq ?\n (preceding-char))
            (insert ?\n))
          (insert (format "%06x" i) ?\s))
        (insert ?\s i)
        (setq i (+ 1 i))))
000100 Ā ā Ă ă Ą ą Ć ć Ĉ ĉ Ċ ċ Č č Ď ď Đ đ Ē ē Ĕ ĕ Ė ė Ę ę Ě ě Ĝ ĝ Ğ ğ
000120 Ġ ġ Ģ ģ Ĥ ĥ Ħ ħ Ĩ ĩ Ī ī Ĭ ĭ Į į İ ı Ĳ ĳ Ĵ ĵ Ķ ķ ĸ Ĺ ĺ Ļ ļ Ľ ľ Ŀ
000140 ŀ Ł ł Ń ń Ņ ņ Ň ň ŉ Ŋ ŋ Ō ō Ŏ ŏ Ő ő Œ œ Ŕ ŕ Ŗ ŗ Ř ř Ś ś Ŝ ŝ Ş ş
000160 Š š Ţ ţ Ť ť Ŧ ŧ Ũ ũ Ū ū Ŭ ŭ Ů ů Ű ű Ų ų Ŵ ŵ Ŷ ŷ Ÿ Ź ź Ż ż Ž ž ſ
I doubt we really need a file for that, though; rather, some
kind of a “Unicode browser” facility. (Not entirely unlike
list-colors-display, but with a dynamic list.)
--
FSF associate member #7257 http://am-1.org/~ivan/ … 3013 B6A0 230E 334A
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Mon, 04 May 2015 17:04:02 GMT)
Message #39 received at 20499 <at> debbugs.gnu.org (full text, mbox):
> From: Ivan Shmakov <ivan <at> siamics.net>
> Date: Mon, 04 May 2015 16:48:39 +0000
>
> >>>>> Eli Zaretskii <eliz <at> gnu.org> writes:
> >>>>> Date: Mon, 04 May 2015 12:15:07 -0400 From: Richard Stallman
>
> >> By the way, it would be good to have a file that consists of all of
> >> unicode in numeric order. That would provide an easy way to pick
> >> some unicode character (whose code you don't remember) and copying
> >> it into some text.
>
> > Would admin/unidata/UnicodeData.txt do?
>
> I guess given the “copying” part, the request is more along the
We distribute that file with Emacs, so "copying" is irrelevant, I
think.
>     (let ((i #x100))
>       (while (< i #x180)
>         (when (zerop (mod i #x20))
>           (unless (eq ?\n (preceding-char))
>             (insert ?\n))
>           (insert (format "%06x" i) ?\s))
>         (insert ?\s i)
>         (setq i (+ 1 i))))
> 000100 Ā ā Ă ă Ą ą Ć ć Ĉ ĉ Ċ ċ Č č Ď ď Đ đ Ē ē Ĕ ĕ Ė ė Ę ę Ě ě Ĝ ĝ Ğ ğ
> 000120 Ġ ġ Ģ ģ Ĥ ĥ Ħ ħ Ĩ ĩ Ī ī Ĭ ĭ Į į İ ı Ĳ ĳ Ĵ ĵ Ķ ķ ĸ Ĺ ĺ Ļ ļ Ľ ľ Ŀ
> 000140 ŀ Ł ł Ń ń Ņ ņ Ň ň ŉ Ŋ ŋ Ō ō Ŏ ŏ Ő ő Œ œ Ŕ ŕ Ŗ ŗ Ř ř Ś ś Ŝ ŝ Ş ş
> 000160 Š š Ţ ţ Ť ť Ŧ ŧ Ũ ũ Ū ū Ŭ ŭ Ů ů Ű ű Ų ų Ŵ ŵ Ŷ ŷ Ÿ Ź ź Ż ż Ž ž ſ
Did you try to make this longer than 4 lines in a well-covered part of
the BMP? Most of Unicode codepoints on most end-user machines will
display as glyphless boxes, and that's _after_ Emacs searches like
hell after each character system-wide. IOW, such a feature would be
an annoyance, IMO.
By contrast UnicodeData.txt is a pure-ASCII file, and includes
everything except the glyphs themselves.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Mon, 04 May 2015 17:41:02 GMT)
Message #42 received at 20499 <at> debbugs.gnu.org (full text, mbox):
>>>>> Eli Zaretskii <eliz <at> gnu.org> writes:
>>>>> From: Ivan Shmakov Date: Mon, 04 May 2015 16:48:39 +0000
>>>>> Eli Zaretskii <eliz <at> gnu.org> writes:
>>>>> Date: Mon, 04 May 2015 12:15:07 -0400 From: Richard Stallman
>>>> By the way, it would be good to have a file that consists of all
>>>> of unicode in numeric order. That would provide an easy way to
>>>> pick some unicode character (whose code you don't remember) and
>>>> copying it into some text.
>>> Would admin/unidata/UnicodeData.txt do?
>> I guess given the “copying” part, the request is more along the
>> lines of, say:
> We distribute that file with Emacs, so "copying" is irrelevant,
> I think.
You cannot /copy/ a random Unicode character from
UnicodeData.txt – precisely because there’re /no/ non-ASCII
characters in that file in the first place.
Arguably, you cannot pick one, either, if you only know how it
/looks/ – not how it’s named. (As in: named in English.)
Otherwise, I tend to keep a copy of [1] at hand, sure.
[1] http://unicode.org/Public/UNIDATA/NamesList.txt
[…]
> Did you try to make this longer than 4 lines in a well-covered part
> of the BMP? Most of Unicode codepoints on most end-user machines
> will display as glyphless boxes, and that's _after_ Emacs searches
> like hell after each character system-wide. IOW, such a feature
> would be an annoyance, IMO.
On a tty frame, it surely wouldn’t. But I’ve got your point.
One more reason to use a dynamic list, BTW. Even more so if
there’s a way to check whether the glyph is available (or,
rather, was available when Emacs last checked) from Lisp.
[…]
--
FSF associate member #7257 http://am-1.org/~ivan/ … 3013 B6A0 230E 334A
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Mon, 04 May 2015 18:13:02 GMT)
Message #45 received at 20499 <at> debbugs.gnu.org (full text, mbox):
>>>>> Eli Zaretskii <eliz <at> gnu.org> writes:
>>>>> From: Ivan Shmakov Date: Mon, 04 May 2015 16:12:28 +0000
>>>>> Eli Zaretskii <eliz <at> gnu.org> writes:
>>>>> From: Ivan Shmakov Date: Mon, 04 May 2015 15:20:56 +0000
[…]
>>>> I tend to agree with that, but is there currently an easy way to
>>>> switch between /two/ input methods?
>>> I simply use "C-u C-\".
>> Given that I edit texts which may be deemed bilingual (Russian prose
>> interspersed with source code or command line examples) not just
>> occasionally, /and/ need C-s, C-r at that, – no, I don’t think it’d
>> work all that well for me.
> Don't you have a dual-language keyboard on your system that can
> switch languages without Emacs being involved? Input methods are for
> characters not directly supported by your keyboard; most systems have
> at least 2, sometimes 3 different languages switchable by a hot key.
> IOW, I wouldn't expect you to need an input method to type Cyrillic
> characters.
With tty frames, it /does/ make sense to use an input method.
Besides, C-u C-\ tends to be easier to use than the system’s
facility when I need to use some layout not otherwise typical to
my work. (Although I /do/ use setxkbmap(1) when it becomes
really necessary.)
[…]
>> «Ты пророк», вскричал я, «вещий! Птица ты иль дух зловещий,
>> Этим Небом, что над нами — Богом скрытым навсегда —
>> Заклинаю, умоляя, мне сказать, — в пределах Рая
>> Мне откроется ль святая, что средь ангелов всегда,
>> Та, которую Ленорой в небесах зовут всегда?»
>> Каркнул Ворон: «Никогда».
>> Nine such characters per 43 words.
> Those aren't quotes Paul was talking about. Those are Cyrillic-style
> quotes frequently used in Cyrillic languages, and I'd expect them to
> be directly available from your keyboard.
> Paul's use case is with the original of this poem.
There’re no such quotation marks on the Cyrillic keyboard
layouts I’m aware of. It really is no different to the English
case — the only quotation mark you get “for free” is the good
old ‘"’. (And given that the Russian alphabet is 33 characters
– versus 26 for English – with the physical keyboard layout
being the same 104 keys, it’s actually a tad worse, with even
the comma typically bound to a shifted – Shift-. – key.)
These aren’t exactly “Cyrillic”, either, as both German and
French use exactly the same quotation marks.
Then, there’re the en and em dash characters, even though they
may not be (easily) discernible with a fixed-width font.
[…]
>> My ‘Alt’ is ‘Meta’ most of the time, so it’s rather C-x 8 [,
>> C-x 8 ], etc. for me, and reserving that for typography isn’t really
>> a big deal.
> That's exactly the issue: most keyboards will have Alt taken already,
> and typing "C-x 8 [" is a PITA, IMO.
FWIW, I have used C-x 8 <, > for years now.
> By contrast, 'C-\ "' is easy.
How do I define an input method so that ‘"’ is mapped to either
“ or ” depending on the context?
> But if there are people who'd like to go iso-transl way, who am I to
> object?
I’m unsure how much the current list should be expanded, but
I see no reason /not/ to support, say, C-x 8 1 / 8 for ⅛ when we
already support C-x 8 1 / 2, 4 for ½, ¼.
--
FSF associate member #7257 http://am-1.org/~ivan/ … 3013 B6A0 230E 334A
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Mon, 04 May 2015 18:30:06 GMT)
Message #48 received at 20499 <at> debbugs.gnu.org (full text, mbox):
> From: Ivan Shmakov <ivan <at> siamics.net>
> Date: Mon, 04 May 2015 18:12:27 +0000
>
> How do I define an input method so that ‘"’ is mapped to either
> “ or ” depending on the context?
See texinfo.el for some ideas.
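The general idea of a context-sensitive quote key can be sketched as a plain command rather than a full input method. This is not code from texinfo.el; the command name, the context rule and the key binding below are all assumptions made for illustration.

```elisp
;; Sketch only: choose the opening or closing curved quote from context.
;; `my-insert-curved-quote' is a hypothetical name; the rule used here
;; (opening quote at buffer start, after whitespace, or after an
;; opening bracket) is a simplification.
(defun my-insert-curved-quote ()
  "Insert “ after whitespace or an opening bracket, ” otherwise."
  (interactive)
  (insert (if (or (bobp)
                  (memq (char-before) '(?\s ?\t ?\n ?\( ?\[ ?\{)))
              ?“
            ?”)))

;; Example binding (an assumption; any free key would do):
(global-set-key (kbd "C-c \"") #'my-insert-curved-quote)
```

A real implementation would probably also want a way to insert a literal ASCII `"`, for instance when the command is invoked twice in a row.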
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Mon, 04 May 2015 18:41:02 GMT)
Message #51 received at 20499 <at> debbugs.gnu.org (full text, mbox):
On 05/04/2015 09:15 AM, Richard Stallman wrote:
> [[[ To any NSA and FBI agents reading my email: please consider ]]]
> [[[ whether defending the US Constitution against all enemies, ]]]
> [[[ foreign or domestic, requires you to follow Snowden's example. ]]]
>
> How about also adding s, t, S, T with cedilla, dotless i, and I with dot.
> Also c and C with a hacek.
Sure, I can look into that. Also the slashed L and l, perhaps, so that
we can spell names like Łukasiewicz. If we want to be more ambitious,
we could support the Latin letters in any ISO 8859 variant, which would
include the following additions (this includes all the letters you
mentioned):
ă Ă ą Ą ā Ā ḃ Ḃ ć Ć ĉ Ĉ č Č ċ Ċ ď Ď ḋ Ḋ đ Đ ě Ě ė Ė ę Ę ē Ē ḟ Ḟ ğ Ğ ĝ Ĝ
ġ Ġ ģ Ģ ĥ Ĥ ħ Ħ ĩ Ĩ į Į ī Ī ı İ ĵ Ĵ ķ Ķ ĺ Ĺ ľ Ľ ł Ł ļ Ļ ṁ Ṁ ń Ń ň Ň ņ Ņ
ŋ Ŋ ő Ő ō Ō ṗ Ṗ ĸ ŕ Ŕ ř Ř ŗ Ŗ ś Ś ŝ Ŝ ṡ Ṡ ş Ş ť Ť ṫ Ṫ ŧ Ŧ ţ Ţ ŭ Ŭ ů Ů ű
Ű ũ Ũ ų Ų ū Ū ẃ Ẃ ẁ Ẁ ŵ Ŵ ẅ Ẅ ỳ Ỳ ŷ Ŷ ź Ź ż Ż
It may be difficult to fit all these into the existing C-x 8 space, though.
> C-x 8 C-h is a good way of seeing what all the options are.
> It may be worth documenting.
It is documented in the manual now.
> It would be nice to have C-u C-x = show the specific C-x 8 sequence
> for a character, if there is one.
Yes, that'd be nice to add.
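One possible starting point for such a lookup is a reverse search of `iso-transl-char-map`. The sketch below is only that: `my-ctl-x-8-sequences` is a hypothetical helper, and it assumes each map entry is a (KEYS . VALUE) pair whose value is a one-element vector or a one-character string, as in the patch above.

```elisp
;; Sketch only: find the C-x 8 sequences that produce a given character.
;; `my-ctl-x-8-sequences' is a hypothetical helper, not part of Emacs.
(require 'iso-transl)

(defun my-ctl-x-8-sequences (char)
  "Return a list of C-x 8 key descriptions that insert CHAR.
Only `iso-transl-char-map' is consulted."
  (let (result)
    (dolist (entry iso-transl-char-map (nreverse result))
      (let ((keys (car entry))
            (value (cdr entry)))
        ;; Map values are either vectors like [?½] or strings like "'".
        (when (or (equal value (vector char))
                  (equal value (string char)))
          (push (concat "C-x 8 " (key-description keys)) result))))))
```

For instance, `(my-ctl-x-8-sequences ?½)` would be expected to include the existing C-x 8 1 / 2 shorthand mentioned elsewhere in this thread; wiring the result into the output of C-u C-x = is left open here.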
> it would be good to have a file that consists of all of
> unicode in numeric order. That would provide an easy way to pick some
> unicode character (whose code you don't remember) and copying it into
> some text.
Although Eli mentioned that we already have such a file, it isn't
installed. Perhaps we could install it in the etc directory (next to
AUTHORS, CONTRIBUTE, etc.) and then have 'C-h u' visit it.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Mon, 04 May 2015 22:01:02 GMT)
Message #54 received at 20499 <at> debbugs.gnu.org (full text, mbox):
> Arguably, you cannot pick one, either, if you only know how it
> /looks/ – not how it’s named. (As in: named in English.)
BTW, the completion in C-x 8 RET will not only show you the character
name but will also (try to) display the actual character as an
annotation in the *Completions* buffer.
Stefan
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Mon, 04 May 2015 22:01:03 GMT)
Message #57 received at 20499 <at> debbugs.gnu.org (full text, mbox):
> First of all, isn’t this essentially the same suggestion as the
> one of bug#16082? (FWIW, I’ve requested the reports to be
> merged; feel free to unmerge if I’ve missed something.)
Indeed. I'm not opposed to adding such things. I do wish C-x 8 was
changed to make use of the quail code somehow.
Also, I think it would be good to construct this table
semi-automatically, along the lines of what I've done for latin-ltx.el.
> I tend to agree with that, but is there currently an easy way to
> switch between /two/ input methods? For one thing, I currently
> use “no” input method for typing English /and/
> russian-typewriter to type Russian.
Indeed. IIUC it would be trivial to let C-\ cycle between
a user-selected set of default input methods. Patch welcome.
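A minimal sketch of such cycling, written as a separate command rather than as a change to C-\ itself, might look like the following. The variable and command names are hypothetical, the list contents are only an example, and `deactivate-input-method` is assumed to be available (Emacs 24.3 or later).

```elisp
;; Sketch only: cycle through a user-chosen list of input methods.
;; `my-input-method-cycle' and `my-cycle-input-method' are hypothetical
;; names, not existing Emacs features.
(defvar my-input-method-cycle '(nil "russian-typewriter" "latin-9-prefix")
  "Input methods to cycle through; nil means no input method.")

(defun my-cycle-input-method ()
  "Activate the entry after the current input method in the cycle."
  (interactive)
  (let* ((tail (cdr (member current-input-method my-input-method-cycle)))
         ;; Wrap around to the start when the current method is last
         ;; in the list or not in the list at all.
         (next (car (or tail my-input-method-cycle))))
    (if next
        (set-input-method next)
      (deactivate-input-method))
    (message "Input method: %s" (or next "none"))))
```

Binding this to a convenient key would give a one-keystroke switch between plain typing, Russian, and a Latin-9 prefix method.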
I also wish it were possible to activate several input methods
at the same time. I don't (know how to) use state-based methods, but
for input methods like French or TeX, it isn't that hard to come up with
ways to create new input methods by combining or shifting (e.g. add
a prefix key, or drop a prefix) existing ones.
> The other side of the issue is that the dashes, arrows,
> mathematical symbols, and the likes of them are cross-lingual,
> and making them available via input methods will involve
> duplication of many of the individual quail-define-rules entries
> all around leim/quail/*.el. (If done the straightforward way;
> AIUI, anyway.)
Indeed. Which is why I think it makes sense to try and develop ways to
create "partial input methods" and then combine them.
Stefan
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Tue, 05 May 2015 06:04:01 GMT)
Message #60 received at 20499 <at> debbugs.gnu.org (full text, mbox):
>> How about also adding s, t, S, T with cedilla, dotless i, and I with dot.
>> Also c and C with a hacek.
>
> Sure, I can look into that. Also the slashed L and l, perhaps, so that we can
> spell names like Łukasiewicz.
Attached is a revised patch that adds support for the abovementioned characters,
plus other Latin characters that might be encountered by people mentioning
foreign names. It makes room by rejiggering three of the less-commonly used
entries in the C-x 8 table.
[0001-C-x-8-shorthands-for-curved-quotes-Euro-etc.patch (text/x-patch, attachment)]
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Tue, 05 May 2015 14:40:03 GMT)
Message #63 received at 20499 <at> debbugs.gnu.org (full text, mbox):
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
> > Would admin/unidata/UnicodeData.txt do?
It doesn't do the job, because it doesn't contain the characters
themselves.
> 000100 Ā ā Ă ă Ą ą Ć ć Ĉ ĉ Ċ ċ Č č Ď ď Đ đ Ē ē Ĕ ĕ Ė ė Ę ę Ě ě Ĝ ĝ Ğ ğ
That's what I have in mind. Perhaps we should have a command that
generates it.
However, in addition to these lines of characters, it should have
other lines with the names of the scripts and the languages they
belong to, so you can search for those.
If you type RET on a character, it should visit
admin/unidata/UnicodeData.txt and move to the corresponding line.
Likewise, admin/unidata/UnicodeData.txt could have a special major
mode, so that typing RET on the line describing some character
switches to the all-of-unicode buffer and goes to the right character
in it.
--
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org www.gnu.org
Skype: No way! See stallman.org/skype.html.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Tue, 05 May 2015 14:50:03 GMT)
Message #66 received at 20499 <at> debbugs.gnu.org (full text, mbox):
On Tue, 05 May 2015 10:38:53 -0400 Richard Stallman <rms <at> gnu.org> wrote:
RS> If you type RET on a character, it should visit
RS> admin/unidata/UnicodeData.txt and move to the corresponding line.
Could something like eldoc be used instead to show the information and
all the shortcuts to that character without switching buffers?
Ted
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Tue, 05 May 2015 15:32:02 GMT)
Message #69 received at 20499 <at> debbugs.gnu.org (full text, mbox):
> Date: Tue, 05 May 2015 10:38:53 -0400
> From: Richard Stallman <rms <at> gnu.org>
> Cc: 20499 <at> debbugs.gnu.org
>
> > > Would admin/unidata/UnicodeData.txt do?
>
> It doesn't do the job, because it doesn't contain the characters
> themselves.
You mean, the glyphs? (It does show the codepoint, so you can easily
display the character via "C-x 8 RET".)
As for showing the glyphs, visiting a file with large number of
characters runs a high risk of being an annoyance due to the
corresponding fonts being unavailable on the system. E.g., "C-h H",
which only shows a small part of those, takes 4 sec on my system with
an optimized build, and about 6 in a non-optimized build.
So if we provide such a command, IMO we should prompt for a block of
codepoints, and display only that block.
> If you type RET on a character, it should visit
> admin/unidata/UnicodeData.txt and move to the corresponding line.
I'm not sure showing UnicodeData.txt in its raw form will be useful.
Most people won't know how to interpret the attributes encoded there,
about the only understandable parts are the codepoint and the name.
And we already show this in human-readable form in "C-u C-x =", so we
could simply reuse the same code here.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Tue, 05 May 2015 15:33:02 GMT)
Message #72 received at 20499 <at> debbugs.gnu.org (full text, mbox):
> Date: Tue, 05 May 2015 10:49:36 -0400
> Cc: Ivan Shmakov <ivan <at> siamics.net>, 20499 <at> debbugs.gnu.org
>
> Could something like eldoc be used instead to show the information and
> all the shortcuts to that character without switching buffers?
Sounds like a natural extension of "C-x =".
(And no, I don't think that showing that info without an explicit user
command is a good idea in this case. Eldoc has a very different use
case in mind.)
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Tue, 05 May 2015 16:06:01 GMT)
Message #75 received at 20499 <at> debbugs.gnu.org (full text, mbox):
>>>>> Eli Zaretskii <eliz <at> gnu.org> writes:
>>>>> Date: Tue, 05 May 2015 10:49:36 -0400
>> Could something like eldoc be used instead to show the information
>> and all the shortcuts to that character without switching
>> buffers?
> Sounds like a natural extension of "C-x =".
Agreed.
> (And no, I don't think that showing that info without an explicit
> user command is a good idea in this case. Eldoc has a very different
> use case in mind.)
I’m not fond of Eldoc, but I presume that after an explicit user
M-x unicode-data-mode command – it could be fine.
I’d also prefer for that same mode to support NamesList.txt.
--
FSF associate member #7257 http://am-1.org/~ivan/ … 3013 B6A0 230E 334A
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Tue, 05 May 2015 16:22:02 GMT)
Message #78 received at 20499 <at> debbugs.gnu.org (full text, mbox):
>>>>> Eli Zaretskii <eliz <at> gnu.org> writes:
>>>>> Date: Tue, 05 May 2015 10:38:53 -0400 From: Richard Stallman
[…]
> As for showing the glyphs, visiting a file with large number of
> characters runs a high risk of being an annoyance due to the
> corresponding fonts being unavailable on the system. E. g., "C-h H",
> which only shows a small part of those, takes 4 sec on my system with
> an optimized build, and about 6 in a non-optimized build.
> So if we provide such a command, IMO we should prompt for a block of
> codepoints, and display only that block.
No objection on my part, but I’d rather provide the “buttons” to
move to the previous and next blocks in that same buffer.
OTOH, what would it take to improve the display time in such a
case? Unless I be mistaken, other (as in: mainstream; think of,
say, Firefox) software generally /does/ handle that case
reasonably well.
>> If you type RET on a character, it should visit
>> admin/unidata/UnicodeData.txt and move to the corresponding line.
> I'm not sure showing UnicodeData.txt in its raw form will be useful.
> Most people won't know how to interpret the attributes encoded there,
> about the only understandable parts are the codepoint and the name.
What about NamesList.txt?
> And we already show this in human-readable form in "C-u C-x =", so we
> could simply reuse the same code here.
The problem with C-u C-x = is that it describes a single
character at a time, while it may be beneficial to see some
“related” (in either name or number) characters as well.
--
FSF associate member #7257 http://am-1.org/~ivan/ … 3013 B6A0 230E 334A
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Tue, 05 May 2015 16:43:02 GMT)
Message #81 received at 20499 <at> debbugs.gnu.org (full text, mbox):
> From: Ivan Shmakov <ivan <at> siamics.net>
> Date: Tue, 05 May 2015 16:20:50 +0000
>
> > So if we provide such a command, IMO we should prompt for a block of
> > codepoints, and display only that block.
>
> No objection on my part, but I’d rather provide the “buttons” to
> move to the previous and next blocks in that same buffer.
That could be okay, too, but it cannot be instead of going directly
to a block. Imagine going all the way to, say, the Aegean Numbers
block by clicking Next, Next, Next, ...
> OTOH, what would it take to improve the display time in such a
> case?
How can you improve it when fonts don't exist on the target machine?
> Unless I be mistaken, other (as in: mainstream; think of,
> say, Firefox) software generally /does/ handle that case
> reasonably well.
I don't know anything about that, except that Emacs uses the same
libraries for accessing fonts. Unfortunately, we don't have on board
an active enough maintainer who is knowledgeable about font handling
(both in general and in Emacs). Feel free to fill the niche.
> >> If you type RET on a character, it should visit
> >> admin/unidata/UnicodeData.txt and move to the corresponding line.
>
> > I'm not sure showing UnicodeData.txt in its raw form will be useful.
> > Most people won't know how to interpret the attributes encoded there,
> > about the only understandable parts are the codepoint and the name.
>
> What about NamesList.txt?
What do you mean? NamesList.txt contains different information, and
once again at least part of it will not be easily understood, or even
useful to most people, I think.
> > And we already show this in human-readable form in "C-u C-x =", so we
> > could simply reuse the same code here.
>
> The problem with C-u C-x = is that it describes a single
> character at a time, while it may be beneficial to see some
> “related” (in either name or number) characters as well.
Well, loops are available... But I very much doubt you'll be able to
display enough useful information in a single line that way.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Wed, 06 May 2015 13:10:04 GMT)
Message #84 received at 20499 <at> debbugs.gnu.org (full text, mbox):
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
> > I'm not sure showing UnicodeData.txt in its raw form will be useful.
> > Most people won't know how to interpret the attributes encoded there,
> > about the only understandable parts are the codepoint and the name.
Even if the user understands only those two, the feature is useful
nonetheless.
Some slightly different feature might be better. I am not addressing those
details.
> What about NamesList.txt?
I don't see a file named NamesList.txt there.
--
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org www.gnu.org
Skype: No way! See stallman.org/skype.html.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Wed, 06 May 2015 13:10:08 GMT)
Message #87 received at 20499 <at> debbugs.gnu.org (full text, mbox):
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
> > > > Would admin/unidata/UnicodeData.txt do?
> >
> > It doesn't do the job, because it doesn't contain the characters
> > themselves.
> You mean, the glyphs?
Yes, exactly.
> (It does show the codepoint, so you can easily
> display the character via "C-x 8 RET".)
You mean, one character at a time?
I want to be able to scan quickly through the buffer looking at
lots of characters to find the one I want. If I have to type
a command for _each character_, just to see it, that is useless
for the purpose.
C-x 8 RET is even worse than that, because it requires
_copying_ the name of the character. To actually see the character
point is on requires
M-f C-f C-SPC C-s ; C-b M-w C-a C-x 8 RET C-y SPC
I could make that a keyboard macro and repeat it many times
to get all these codes into the buffer. It would take a long time.
Furthermore, it would show only one character per line,
so few characters would appear on the screen at any time.
To look at them all would require lots of scrolling.
To do this job well requires output like that of the short Lisp
program someone sent, showing only characters and NOT the names,
with many characters per line.
The buffer should be divided into stanzas, each one labeled with the
name of its script or portion thereof.
> As for showing the glyphs, visiting a file with large number of
> characters runs a high risk of being an annoyance due to the
> corresponding fonts being unavailable on the system.
We could set up a way to test whether a code point can be
displayed, and skip scripts that can't be displayed.
> So if we provide such a command, IMO we should prompt for a block of
> codepoints, and display only that block.
It is inconvenient to expect users to know the codepoint values.
Suppose I want to see Greek letters -- I have no idea what codepoints
those are, and I should not need to know them in order to specify
"Greek letters".
To specify a script by name as an argument would be ok,
but not very convenient. Here's a simpler and more convenient interface:
The header line for each script could have a [hide] or [show] button
to select visibility of that script. Initially they could all be
hidden, and the user would expose those that she is interested in.
--
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org www.gnu.org
Skype: No way! See stallman.org/skype.html.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Wed, 06 May 2015 15:34:02 GMT)
Message #90 received at 20499 <at> debbugs.gnu.org (full text, mbox):
> Date: Wed, 06 May 2015 09:09:09 -0400
> From: Richard Stallman <rms <at> gnu.org>
> Cc: 20499 <at> debbugs.gnu.org
>
> > > I'm not sure showing UnicodeData.txt in its raw form will be useful.
> > > Most people won't know how to interpret the attributes encoded there,
> > > about the only understandable parts are the codepoint and the name.
>
> Even if the user understands only those two, the feature is useful
> nonetheless.
Then perhaps we should show only the parts that are easily
understandable.
> > What about NamesList.txt?
>
> I don't see a file named NamesList.txt there.
It's part of the Unicode Standard, you can find it here:
http://unicode.org/Public/UNIDATA/NamesList.txt
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Wed, 06 May 2015 16:28:02 GMT)
Message #93 received at 20499 <at> debbugs.gnu.org (full text, mbox):
> Date: Wed, 06 May 2015 09:09:26 -0400
> From: Richard Stallman <rms <at> gnu.org>
> CC: ivan <at> siamics.net, 20499 <at> debbugs.gnu.org
>
> > > > > Would admin/unidata/UnicodeData.txt do?
> > >
> > > It doesn't do the job, because it doesn't contain the characters
> > > themselves.
>
> > You mean, the glyphs?
>
> Yes, exactly.
>
> (It does show the codepoint, so you can easily
> > display the character via "C-x 8 RET".)
>
> You mean, one character at a time?
>
> I want to be able to scan quickly through the buffer looking at
> lots of characters to find the one I want. If I have to type
> a command for _each character_, just to see it, that is useless
> for the purpose.
Maybe I don't understand the use case you have in mind. I thought the
use case was that you already know the character's name, at least
approximately, and want to look up its code, to type it faster.
> C-x 8 RET is even worse than that, because it requires
> _copying_ the name of the character. To actually see the character
> point is on requires
> M-f C-f C-SPC C-s ; C-b M-w C-a C-x 8 RET C-y SPC
"C-x 8 RET" accepts the codepoint in hex, so if you are already
looking at the line that defines the character, all you need is to
type a 4-, sometimes 5-hex-digit number.
And if you want to type the name, "C-x 8 RET" provides completion, so
no need for such a complicated dance for copying the name.
> I could make that a keyboard macro and repeat it many times
> to get all these codes into the buffer. It would take a long time.
> Furthermore, it would show only one character per line,
> so few characters would appear on the screen at any time.
> To look at them all would require lots of scrolling.
I don't really see how looking for a character with your eyes could be
a convenient feature, except in very corner situations with a small
number of simple-looking characters. Even for Latin characters, there
are many similar shapes, like Ả and Ă or Ő and Ố, and they are spread
all over the Unicode range. How would you go about finding your
character, if all you have is some vague idea of its shape (which,
btw, could look quite different with different fonts)? Sounds like a
very inefficient way to me.
I think we must assume the user has some idea about the character:
either its approximate name, or at least the block or script to which
it belongs. Then we could display some reasonably manageable subset
of characters. We could further help by asking about the base
character (the above examples have either A or O as their base
character), because if the user knows that, with some scripts the
number of potential candidates will go down drastically. But even
when the base character is known, the number of candidates is not
negligible: e.g., there are 46 characters in the Unicode database that
are somehow related to A.
> The buffer should be divided into stanzas, each one labeled with the
> name of its script or portion thereof.
Not sure what you mean by "script" here. Emacs currently knows about
almost 100 scripts defined by Unicode, so even displaying a couple of
lines for each one will make a large buffer. Isn't it better to allow
the user to specify one, with completion?
> > As for showing the glyphs, visiting a file with large number of
> > characters runs a high risk of being an annoyance due to the
> > corresponding fonts being unavailable on the system.
>
> We could set up a way to test whether a code point can be
> displayed, and skip scripts that can't be displayed.
Alas, we don't know which cannot be displayed until we've tried and
failed.
> So if we provide such a command, IMO we should prompt for a block of
> codepoints, and display only that block.
>
> It is inconvenient to expect users to know the codepoint values.
Unicode blocks have names, so providing completion for them would do
the job, I think. The entire Unicode codespace is divided into about
200 blocks, so if the user knows, or can guess the one she needs, that
will probably limit the search for the character to some reasonable
quantity.
Moreover, some scripts share the same blocks, and vice versa. So
being able to specify just scripts or just blocks is not enough; we
need both.
I think we need all these methods, possibly more, because you may not
necessarily know or guess easily where to look. For example, there
are certain characters that appear as mathematical symbols in addition
to their "normal" places, so unless the user already knows in which
block to look, they will find the "base character" method very useful,
and without it could very well miss their character.
> Suppose I want to see Greek letters -- I have no idea what codepoints
> those are, and I should not need to know them in order to specify
> "Greek letters".
You'd only need to know "Greek", and all the Greek blocks will be
displayed. If you happen to know more, like "Greek Extended", it will
further limit the number of characters to view. And, of course, there
are complications: you might think it's a Greek character, but it
could really be a math symbol or a Cyrillic character instead.
> The header line for each script could have a [hide] or [show] button
> to select visibility of that script. Initially they could all be
> hidden, and the user would expose those that she is interested in.
A 100-button buffer is not very convenient, especially when you have
only an approximate idea about the script you are after (e.g., is that
funny shape part of "Miscellaneous Technical" block or "Geometric
Shapes"?)
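The idea discussed above (show only a manageable subset of characters, selected by script rather than by codepoint value) can be sketched in a few lines of Emacs Lisp. This is only an illustration, not code from the proposed patch: the names `my-chars-of-script` and `my-show-script` are hypothetical, and the sketch assumes that `char-script-table` and the `general-category` character property behave as they do in current Emacs.

```elisp
;; Hypothetical helper names; only `char-script-table' and
;; `get-char-code-property' are taken from Emacs itself.
(defun my-chars-of-script (script from to)
  "Return the assigned characters in FROM..TO that belong to SCRIPT.
SCRIPT is a symbol as found in `char-script-table', e.g. `greek'."
  (let ((acc nil)
        (c from))
    (while (<= c to)
      (when (and (eq (aref char-script-table c) script)
                 ;; Skip unassigned code points (general category Cn).
                 (not (eq (get-char-code-property c 'general-category) 'Cn)))
        (push c acc))
      (setq c (1+ c)))
    (nreverse acc)))

(defun my-show-script (script from to)
  "Display the characters of SCRIPT in FROM..TO, 32 per line."
  (with-current-buffer (get-buffer-create "*Script Characters*")
    (erase-buffer)
    ;; One labeled stanza per call: the script name, then the glyphs.
    (insert (format "%s\n" script))
    (let ((n 0))
      (dolist (c (my-chars-of-script script from to))
        (insert c " ")
        (setq n (1+ n))
        (when (zerop (% n 32))
          (insert "\n"))))
    (display-buffer (current-buffer))))
```

For example, `(my-show-script 'greek #x370 #x3ff)` would fill a buffer with one stanza for the Greek and Coptic block. A real command would presumably prompt for the script or block name with completion, and might additionally filter with something like `char-displayable-p`, at the cost of the font lookups mentioned above.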
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Wed, 06 May 2015 22:22:02 GMT)
Message #96 received at 20499 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
>>>>> Paul Eggert <eggert <at> cs.ucla.edu> writes:
>>> How about also adding s, t, S, T with cedilla, dotless i, and I
>>> with dot. Also c and C with a hacek.
>> Sure, I can look into that. Also the slashed L and l, perhaps, so
>> that we can spell names like Łukasiewicz.
> Attached is a revised patch that adds support for the abovementioned
> characters, plus other Latin characters that might be encountered by
> people mentioning foreign names. It makes room by rejiggering three
> of the less-commonly used entries in the C-x 8 table.
> --------------090904020002020306060104
> Content-Type: text/x-patch;
> name="0001-C-x-8-shorthands-for-curved-quotes-Euro-etc.patch"
This MIME part sure wants ‘; charset=UTF-8’. Otherwise, Gnus
does no decoding, and Emacs shows the contents with the likes of
\304\260.
> Content-Transfer-Encoding: 8bit
> Content-Disposition: attachment;
> filename="0001-C-x-8-shorthands-for-curved-quotes-Euro-etc.patch"
> From aafde36c45bd0341b07707409873fb93cbbb33f1 Mon Sep 17 00:00:00 2001
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> Date: Mon, 4 May 2015 22:41:20 -0700
> Subject: [PATCH] C-x 8 shorthands for curved quotes, Euro, etc.
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
I presume that /this/ was intended to be the MIME part /header/,
yet it ended up being in the part /body./
> + withdrawn still works character
> + C-x 8 . C-x 8 . SPC · U+00B7 MIDDLE DOT
> + C-x 8 = C-x 8 = SPC ¯ U+00AF SPACING MACRON
> + C-x 8 u C-x 8 m µ U+00B5 MICRO SIGN
I believe that both C-x 8 . and C-x 8 u are too convenient to be
dropped without more discussion. For one thing, · seems more
“common” a character than İ. Other than that, C-x 8 . . feels
easier to type than C-x 8 SPC.
> -;;; iso-transl.el --- keyboard input definitions for ISO 8859-1 -*- coding: utf-8 -*-
> +;;; iso-transl.el --- keyboard input for ISO characters -*- coding: utf-8 -*-
I guess we may safely state “ISO 10646” here.
> +;; This package supports all characters defined by ISO 8859-1,
> +;; along with many other Latin characters and a few other characters
> +;; commonly used in English and basic math.
… And may also mention it here.
> ("-" . [?])
> - ("*." . [?·])
The removal above doesn’t seem to be strictly necessary. The
same for the *= and *u ones.
> ("~~" . [?¬])
> + ("=A" . [?Ā])
> + ("=a" . [?ā])
> + ("uA" . [?Ă])
> + ("ua" . [?ă])
> + ("gA" . [?Ą])
… Also, did you consider generating this list automatically,
based on the codepoint properties already known to Emacs?
Something along the lines of the function MIMEd, which readily
produces a list of entries for the following 133 characters.
(Three spaces added for symmetry purposes.)
À Á Â Ã Ä È É Ê Ë Ì Í Î Ï Ñ Ò Ó Ô Õ Ö Ù Ú Û Ü Ý
à á â ã ä è é ê ë ì í î ï ñ ò ó ô õ ö ù ú û ü ý
ÿ Ā ā Ć ć Ĉ ĉ Č č Ď ď Ē ē Ě ě Ĝ ĝ Ĥ ĥ Ĩ ĩ Ī ī Ĵ ĵ Ĺ ĺ
Ľ ľ Ń ń Ň ň Ō ō Ŕ ŕ Ř ř Ś ś Ŝ ŝ Š š Ť ť Ũ ũ Ū ū Ŵ ŵ Ŷ ŷ
Ÿ Ź ź Ž ž Ǎ ǎ Ǐ ǐ Ǒ ǒ Ǔ ǔ Ǧ ǧ Ǩ ǩ ǰ Ǵ ǵ Ǹ ǹ Ș ș Ț ț
Ȟ ȟ Ȳ ȳ
--
FSF associate member #7257 http://am-1.org/~ivan/ … 3013 B6A0 230E 334A
[Message part 2 (text/emacs-lisp, inline)]
(defun code-decomposition-to-iso-transl-map (&optional from to)
  (unless from (setq from #xa8))
  (unless to (setq to #x2b0))
  (let ((acc nil)
        (i from))
    (while (< i to)
      (let* ((deco (get-char-code-property i 'decomposition))
             ;; FIXME: handle the (eq 'compat (car deco)) case here
             (str (pcase deco
                    (`(,c #x300) (string ?` c))
                    (`(,c #x301) (string ?' c))
                    (`(,c #x302) (string ?^ c))
                    (`(,c #x303) (string ?~ c))
                    (`(,c #x304) (string ?= c))
                    (`(,c #x308) (string 34 c))
                    (`(,c #x30c) (string ?v c))
                    (`(,c #x326) (string 59 c))
                    (`(,c #x326) (string ?, c)))))
        (when (and str (< (aref str 1) #x7f)) ; Is an ASCII character?
          (setq acc (cons (cons str (vector i)) acc))))
      (setq i (+ 1 i)))
    ;; .
    acc))
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Thu, 07 May 2015 04:06:02 GMT)
Message #99 received at 20499 <at> debbugs.gnu.org (full text, mbox):
> From: Ivan Shmakov <ivan <at> siamics.net>
> Date: Wed, 06 May 2015 22:20:54 +0000
>
> > -;;; iso-transl.el --- keyboard input definitions for ISO 8859-1 -*- coding: utf-8 -*-
> > +;;; iso-transl.el --- keyboard input for ISO characters -*- coding: utf-8 -*-
>
> I guess we may safely state “ISO 10646” here.
Actually, we should drop the "ISO" part completely. Characters don't
belong to any encoding, they are entities that exist independently of
any encoding.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Thu, 07 May 2015 07:15:02 GMT)
Message #102 received at 20499 <at> debbugs.gnu.org (full text, mbox):
>>>>> Eli Zaretskii <eliz <at> gnu.org> writes:
>>>>> From: Ivan Shmakov Date: Wed, 06 May 2015 22:20:54 +0000
>>> -;;; iso-transl.el --- keyboard input definitions for ISO 8859-1 -*- coding: utf-8 -*-
>>> +;;; iso-transl.el --- keyboard input for ISO characters -*- coding: utf-8 -*-
>> I guess we may safely state “ISO 10646” here.
> Actually, we should drop the "ISO" part completely. Characters don't
> belong to any encoding, they are entities that exist independently
> of any encoding.
ISO 10646 is also a /repertoire/ of characters; so unless
'iso-transl is going to get support for characters outside this
particular set, the above will still be justified. Albeit
mildly redundant, I guess.
--
FSF associate member #7257 np. Computer Eyes — Ayreon … 3013 B6A0 230E 334A
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Thu, 07 May 2015 07:54:02 GMT)
Message #105 received at 20499 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
> I believe that both C-x 8 . and C-x 8 u are too convenient to be
> dropped without more discussion. For one thing, · seems more
> “common” a character than İ.
In Turkish and Azerbaijani the reverse is true. And since RMS requested dotted
I and dotless i my assumption was that Turkish is of some importance. Dotted
sequences are the natural ways to type these characters as well as other dotted
letters ĊċĖėĠġĿŀŻż in the proposal (used variously in Lithuanian, Maltese, and
Polish), so there is a pretty strong case to usurp "C-x 8 .".
The case for usurping "C-x 8 u" is even stronger, since it's equivalent to the
equally-short "C-x 8 m", some easily-typed symbol is needed to denote breve, and
"u" looks more like breve than any other ASCII character does.
> Other than that, C-x 8 . . feels
> easier to type than C-x 8 SPC.
Good point, and I've done this in the attached patch.
> > -;;; iso-transl.el --- keyboard input definitions for ISO 8859-1 -*- coding: utf-8 -*-
> > +;;; iso-transl.el --- keyboard input for ISO characters -*- coding: utf-8 -*-
>
> I guess we may safely state “ISO 10646” here.
Thanks, done in the attached patch.
> > +;; This package supports all characters defined by ISO 8859-1,
> > +;; along with many other Latin characters and a few other characters
> > +;; commonly used in English and basic math.
>
> … And may also mention it here.
Thanks, also done.
> > ("-" . [?])
> > - ("*." . [?·])
>
> The removal above doesn’t seem to be strictly necessary. The
> same for the *= and *u ones.
Thanks, fixed in the attached patch.
> … Also, did you consider generating this list automatically,
> based on the codepoint properties already known to Emacs?
> Something along the lines of the function MIMEd, which readily
> produces a list of entries for the following 133 characters.
> (Three spaces added for symmetry purposes.)
>
> À Á Â Ã Ä È É Ê Ë Ì Í Î Ï Ñ Ò Ó Ô Õ Ö Ù Ú Û Ü Ý
> à á â ã ä è é ê ë ì í î ï ñ ò ó ô õ ö ù ú û ü ý
> ÿ Ā ā Ć ć Ĉ ĉ Č č Ď ď Ē ē Ě ě Ĝ ĝ Ĥ ĥ Ĩ ĩ Ī ī Ĵ ĵ Ĺ ĺ
> Ľ ľ Ń ń Ň ň Ō ō Ŕ ŕ Ř ř Ś ś Ŝ ŝ Š š Ť ť Ũ ũ Ū ū Ŵ ŵ Ŷ ŷ
> Ÿ Ź ź Ž ž Ǎ ǎ Ǐ ǐ Ǒ ǒ Ǔ ǔ Ǧ ǧ Ǩ ǩ ǰ Ǵ ǵ Ǹ ǹ Ș ș Ț ț
> Ȟ ȟ Ȳ ȳ
Sorry, I don't really follow the code that you attached. Although I suppose it
comes from a decomposition table, I don't know what the table was designed for,
and it's not clear to me how it's relevant. Anyway, most of those letters are
either in iso-transl.el now, or are in the previously proposed patch. Here are
the exceptional (i.e., missing even in the previously proposed patch) letters,
along with some comments about these exceptions:
> Ǎ ǎ Ǐ ǐ Ǒ ǒ Ǔ ǔ Ǹ ǹ
These are for toned Pinyin but this list is incomplete. If we wanted to cover
toned Pinyin, we'd also need Ǖ ǖ Ǘ ǘ Ǚ ǚ Ǜ ǜ. Coming up with two-character
abbreviations for all these might be tricky. Most Pinyin usage omits the tones.
> Ǧ ǧ Ǩ ǩ
These are Skolt Sami but this list is also incomplete; we'd also need Ʒ Ǥ ǥ Ǯ ǯ
ʒ at least.
> ǰ
What language uses this? I couldn't find one.
> Ǵ ǵ
Good catch. These are used for transliteration from Serbian and Macedonian. We
should also include Ḱ ḱ as they are also needed. Included in the attached patch.
> Ȟ ȟ
Used in Finnish Kalo, which is quite obscure.
> Ȳ ȳ
Used in Livonian, but for that we'd also need a whole bunch of other letters,
including Ǟ ǟ Ḑ ḑ Ȫ ȫ Ȭ ȭ Ȯ ȯ Ȱ and I've probably omitted some. Plus, modern
Livonian doesn't seem to be using Ȳ ȳ any more....
Anyway, part of what's going on here is that the proposed list doesn't cover
every Latin character in the ISO 10646 repertoire (that'd be a large set), but
instead is limited to what appear to be reasonably commonly used letters. Admittedly
this is not universal but one must cut things off somewhere, and it would be odd
to add only partial coverage for toned Pinyin, Livonian, etc.
> > --------------090904020002020306060104
> > Content-Type: text/x-patch;
> > name="0001-C-x-8-shorthands-for-curved-quotes-Euro-etc.patch"
>
> This MIME part sure wants ‘; charset=UTF-8’. Otherwise, Gnus
> does no decoding, and Emacs shows the contents with the likes of
> \304\260.
Hmm, it works for me. I use Thunderbird to read the top level message, and it
spins off an Emacs to display the attachment with no problem. The web-site
archive at <http://bugs.gnu.org/20499#60> also works for me with Firefox.
It's common for people to send the output of "git send-email" as attachments; if
this doesn't work with Gnus I suppose a Gnus user (i.e. not me :-) should file a
bug report. I looked around the net and found other Gnus users with similar
problems and some code that worked for them; please see
<http://bewatermyfriend.org/p/2011/00a/> and/or
<http://blog.printf.net/articles/tag/emacs/>. But this stuff appeared to be
several years old and this leads me to hope that maybe recent-enough Gnus
versions will do the right thing already.
[0001-C-x-8-shorthands-for-curved-quotes-Euro-etc.patch (text/x-patch, attachment)]
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Thu, 07 May 2015 10:01:01 GMT)
Message #108 received at 20499 <at> debbugs.gnu.org (full text, mbox):
>>>>> Paul Eggert <eggert <at> cs.ucla.edu> writes:
[…]
>> … Also, did you consider generating this list automatically, based
>> on the codepoint properties already known to Emacs? Something along
>> the lines of the function MIMEd, which readily produces a list of
>> entries for the following 133 characters. (Three spaces added for
>> symmetry purposes.)
>> À Á Â Ã Ä È É Ê Ë Ì Í Î Ï Ñ Ò Ó Ô Õ Ö Ù Ú Û Ü Ý
>> à á â ã ä è é ê ë ì í î ï ñ ò ó ô õ ö ù ú û ü ý
>> ÿ Ā ā Ć ć Ĉ ĉ Č č Ď ď Ē ē Ě ě Ĝ ĝ Ĥ ĥ Ĩ ĩ Ī ī Ĵ ĵ Ĺ ĺ
>> Ľ ľ Ń ń Ň ň Ō ō Ŕ ŕ Ř ř Ś ś Ŝ ŝ Š š Ť ť Ũ ũ Ū ū Ŵ ŵ Ŷ ŷ
>> Ÿ Ź ź Ž ž Ǎ ǎ Ǐ ǐ Ǒ ǒ Ǔ ǔ Ǧ ǧ Ǩ ǩ ǰ Ǵ ǵ Ǹ ǹ Ș ș Ț ț
>> Ȟ ȟ Ȳ ȳ
> Sorry, I don't really follow the code that you attached.
Which part, specifically?
It just iterates over the range given (or U+00A8 through U+02AF
by default) and maps “LATIN + COMBINING” decompositions to
'iso-transl entries. For example, it maps the (?g #x327)
decomposition (U+0327 being COMBINING CEDILLA) for U+0123 into
an (",g" . ģ) entry.
Or, rather, it /should/, for my code has an obvious typo:
(`(,c #x30c) (string ?v c))
(`(,c #x326) (string 59 c))
- (`(,c #x326) (string ?, c)))))
+ (`(,c #x327) (string ?, c)))))
Other possible additions (assuming we’ll agree on C-x 8 u,
C-x 8 .) are:
(`(,c #x304) (string ?= c))
+ (`(,c #x306) (string ?u c))
+ (`(,c #x307) (string ?. c))
(`(,c #x308) (string 34 c))
+ (`(,c #x30b) (string ?2 c))
(`(,c #x30c) (string ?v c))
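To make the mapping concrete, here is a minimal table-driven sketch of the same idea for a single character, with the U+0327 correction and the suggested breve, dot-above and double-acute additions folded in. It is an illustration only: the names `my-combining-to-prefix` and `my-decomposition-to-entry` are hypothetical, and the prefix characters are simply the ones discussed in this thread (still subject to agreement on C-x 8 u and C-x 8 .).

```elisp
;; Hypothetical names; the prefix assignments follow the thread and
;; are not necessarily what iso-transl.el ends up using.
(defvar my-combining-to-prefix
  '((#x300 . ?`)   ; COMBINING GRAVE ACCENT
    (#x301 . ?')   ; COMBINING ACUTE ACCENT
    (#x302 . ?^)   ; COMBINING CIRCUMFLEX ACCENT
    (#x303 . ?~)   ; COMBINING TILDE
    (#x304 . ?=)   ; COMBINING MACRON
    (#x306 . ?u)   ; COMBINING BREVE
    (#x307 . ?.)   ; COMBINING DOT ABOVE
    (#x308 . ?\")  ; COMBINING DIAERESIS
    (#x30b . ?2)   ; COMBINING DOUBLE ACUTE ACCENT
    (#x30c . ?v)   ; COMBINING CARON
    (#x326 . ?\;)  ; COMBINING COMMA BELOW
    (#x327 . ?,))  ; COMBINING CEDILLA
  "Alist mapping a combining character to a C-x 8 prefix character.")

(defun my-decomposition-to-entry (char)
  "Return a (STRING . [CHAR]) entry for CHAR, or nil if none applies.
Only canonical two-element decompositions whose base is ASCII count."
  (let* ((deco (get-char-code-property char 'decomposition))
         (prefix (and (= (length deco) 2)
                      ;; Compatibility decompositions start with a
                      ;; symbol such as `compat'; ignore those.
                      (integerp (car deco))
                      (cdr (assq (cadr deco) my-combining-to-prefix)))))
    (when (and prefix (< (car deco) #x80)) ; ASCII base letter?
      (cons (string prefix (car deco)) (vector char)))))
```

With these definitions, `(my-decomposition-to-entry #x123)` yields an entry pairing the string ",g" with a one-element vector holding U+0123, matching the example given above; a character such as U+0141 (Ł), which has no decomposition, yields nil and would still need a hand-written entry.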
> Although I suppose it comes from a decomposition table, I don't know
> what the table was designed for, and it's not clear to me how it's
> relevant.
I hope someone more knowledgeable could comment on this. Still,
this (ab)use of the data seems to work well in practice.
> Anyway, most of those letters are either in iso-transl.el now,
The point is to /remove/ them from 'iso-transl, as these entries
duplicate, in a way, a part of the decomposition table already
present in Emacs.
[…]
>> Ǎ ǎ Ǐ ǐ Ǒ ǒ Ǔ ǔ Ǹ ǹ
> These are for toned Pinyin but this list is incomplete. If we wanted
> to cover toned Pinyin, we'd also need Ǖ ǖ Ǘ ǘ Ǚ ǚ Ǜ ǜ. Coming up
> with two-character abbreviations for all these might be tricky.
But are we actually limited to two-character abbreviations only?
Why not allow for, say, C-x 8 " ' u?
[…]
>> ǰ
> What language uses this? I couldn't find one.
To quote NamesList.txt:
01F0 LATIN SMALL LETTER J WITH CARON
* IPA and many languages
>> Ǵ ǵ
> Good catch. These are used for transliteration from Serbian and
> Macedonian. We should also include Ḱ ḱ as they are also needed.
> Included in the attached patch.
The code I’ve suggested could be used to scan the U+1Exx range
just as well, thus resulting in the following set.
Ḑ ḑ Ḡ ḡ Ḧ ḧ Ḩ ḩ Ḱ ḱ Ḿ ḿ Ṕ ṕ Ṽ ṽ Ẁ ẁ Ẃ ẃ Ẅ ẅ Ẍ ẍ Ẑ ẑ ẗ Ẽ ẽ Ỳ ỳ Ỹ ỹ
[…]
> Anyway, part of what's going on here is that the proposed list
> doesn't cover every Latin character in the ISO 10646 repertoire
> (that'd be a large set), but instead is limited to what appear to be
> reasonably commonly used letters. Admittedly this is not universal but
> one must cut things off somewhere, and it would be odd to add only
> partial coverage for toned Pinyin, Livonian, etc.
When it comes to the LATIN … LETTER WITH … letters, my proposal
for such a cut off would be to satisfy /both/ of the following
criteria:
• only cover specific Unicode ranges; such as, for instance,
U+00A8 through U+02AF, U+1E00 … U+1EFF, perhaps 2C60 … 2C7F;
• only cover the letters which can be represented with a
sufficiently general C-x 8 ⟨diacritic⟩+ ⟨ASCII-latin⟩ pattern.
Other characters deemed common may be added to the list.
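As a rough illustration of the two criteria above, the following Python sketch scans the same code point ranges and keeps only characters whose full canonical decomposition is an ASCII Latin letter followed by one or more combining marks. It is not the Emacs Lisp code discussed in this thread; the names RANGES and candidates are hypothetical, and the result depends on the Unicode version bundled with the Python interpreter.

```python
import string
import unicodedata

# Ranges named in the message above (inclusive code point bounds).
RANGES = [(0x00A8, 0x02AF), (0x1E00, 0x1EFF), (0x2C60, 0x2C7F)]


def candidates(ranges=RANGES):
    """Yield (char, base, marks) for letters that fit the pattern
    <diacritic>+ <ASCII-latin>, judged by canonical decomposition."""
    for lo, hi in ranges:
        for cp in range(lo, hi + 1):
            ch = chr(cp)
            # NFD applies canonical decompositions recursively.
            nfd = unicodedata.normalize("NFD", ch)
            if len(nfd) < 2:
                continue  # no canonical decomposition
            base, marks = nfd[0], nfd[1:]
            if base in string.ascii_letters and all(
                unicodedata.combining(m) for m in marks
            ):
                yield ch, base, marks


if __name__ == "__main__":
    for ch, base, marks in candidates():
        mark_names = ", ".join(unicodedata.name(m) for m in marks)
        print(f"U+{ord(ch):04X} {ch} = {base} + {mark_names}")
```

Letters with two diacritics, such as the toned Pinyin ǘ, come out as a base letter plus two marks, which is the case a multi-key C-x 8 sequence would have to cover.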
>>> --------------090904020002020306060104
>>> Content-Type: text/x-patch;
>>> name="0001-C-x-8-shorthands-for-curved-quotes-Euro-etc.patch"
>> This MIME part sure wants ‘; charset=UTF-8’. Otherwise, Gnus does
>> no decoding, and Emacs shows the contents with the likes of
>> \304\260.
> Hmm, it works for me. I use Thunderbird to read the top level
> message, and it spins off an Emacs to display the attachment with no
> problem.
I can “spin off” cat(1) to read the offending MIME part, too:
Emacs will feed it raw-text, and interpret the result as UTF-8
(the default.)
It still does /not/ comply with the MIME specification.
Consider section 4.1.2 of RFC 2046:
RFC> […] The default character set, which must be assumed in the
RFC> absence of a charset parameter, is US-ASCII.
RFC 6657 updates this as follows:
RFC> Each subtype of the "text" media type that uses the "charset"
RFC> parameter can define its own default value for the "charset"
RFC> parameter, including the absence of any default.
However, given that ‘text/x-patch’ is not a /registered/ MIME
type, I believe the above does not apply.
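To make the point about the missing parameter concrete, here is a small Python sketch that parses a Content-Type header like the one quoted above and reports the declared charset. It uses only the standard email package; the helper name declared_charset is hypothetical, and the second header is an assumed corrected form, not something taken from the actual message.

```python
from email import message_from_string


def declared_charset(headers):
    """Return the charset parameter declared in HEADERS, or None."""
    part = message_from_string(headers + "\n\n")
    return part.get_content_charset()


# Header as quoted in this thread: no charset parameter at all.
bare = (
    "Content-Type: text/x-patch;\n"
    ' name="0001-C-x-8-shorthands-for-curved-quotes-Euro-etc.patch"'
)

# Assumed corrected form with an explicit charset parameter.
explicit = (
    "Content-Type: text/x-patch; charset=UTF-8;\n"
    ' name="0001-C-x-8-shorthands-for-curved-quotes-Euro-etc.patch"'
)

if __name__ == "__main__":
    print(declared_charset(bare))
    print(declared_charset(explicit))
```

With no parameter present, a reader has to fall back on some default, which is exactly where the RFC 2046 and RFC 6657 rules quoted above come in.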
> The web-site archive at <http://bugs.gnu.org/20499#60> also works for
> me with Firefox.
> It's common for people to send the output of "git send-email" as
> attachments;
If Thunderbird /knows/ the encoding (“character set”) of the
contents of the MIME part, it /should/ specify it in the MIME
part header. If the said contents is strictly 7-bit, it /could/
omit that (given that it’s more than likely to be US-ASCII.)
Otherwise, I guess Thunderbird should either ask the user for
the encoding /or/ send the part as application/octet-stream.
[…]
--
FSF associate member #7257 np. Satellite one — Purple Motion B6A0 230E 334A
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Thu, 07 May 2015 14:34:02 GMT)
Message #111 received at 20499 <at> debbugs.gnu.org (full text, mbox):
> From: Ivan Shmakov <ivan <at> siamics.net>
> Date: Thu, 07 May 2015 07:14:34 +0000
>
> > Actually, we should drop the "ISO" part completely. Characters don't
> > belong to any encoding, they are entities that exist independently
> > of any encoding.
>
> ISO 10646 is also a /repertoire/ of characters; so unless
> 'iso-transl is going to get support for characters outside this
> particular set, the above will still be justified. Albeit
> mildly redundant, I guess.
We are splitting hairs. But as long as we do, I see no reason to
promise or assume that iso-transl will always support only Unicode
codepoints; e.g., "C-x 8 RET" already supports more.
So I'd rather we dropped that reference entirely.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Thu, 07 May 2015 14:45:03 GMT)
Message #114 received at 20499 <at> debbugs.gnu.org (full text, mbox):
> From: Ivan Shmakov <ivan <at> siamics.net>
> Date: Thu, 07 May 2015 10:00:38 +0000
>
> > Although I suppose it comes from a decomposition table, I don't know
> > what the table was designed for, and it's not clear to me how it's
> > relevant.
>
> I hope someone more knowledgeable could comment on this.
I'm not sure I'm your man, or what needs to be commented on, but I
will try nonetheless ;-)
The 'decomposition property of a character (as every other property
accessed by get-char-code-property) comes directly from the Unicode
database. In this case, you will see that some characters in
UnicodeData.txt have this part non-empty:
1E99;LATIN SMALL LETTER Y WITH RING ABOVE;Ll;0;L;0079 030A;;;;N;;;;;
^^^^^^^^^
This gives the so-called "canonical decomposition" of the character;
in this case, we are told that U+1E99's decomposition is a sequence of
U+0079 (lower-case y) followed by U+030A (combining ring above).
Some characters have "compatibility decompositions" instead, like
this:
1E9A;LATIN SMALL LETTER A WITH RIGHT HALF RING;Ll;0;L;<compat> 0061 02BE;;;;N;;;;;
^^^^^^^^^^^^^^^^^^
which is useful for collation-driven sorting and for loose comparisons
a-la string-collate-lessp.
For more details about this, see http://unicode.org/reports/tr44/, the
Unicode Technical Report that describes the Unicode Character
Database.
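The same UnicodeData.txt field can be inspected outside Emacs. The sketch below uses Python's standard unicodedata module rather than get-char-code-property, so it only illustrates the data format described above; the helper name split_decomposition is hypothetical.

```python
import unicodedata


def split_decomposition(ch):
    """Return (tag, code_points) from the decomposition field of CH.

    TAG is None for a canonical decomposition and a string such as
    '<compat>' for a compatibility decomposition.  CODE_POINTS is a
    list of integers, empty if CH has no decomposition.
    """
    parts = unicodedata.decomposition(ch).split()
    tag = None
    if parts and parts[0].startswith("<"):
        tag = parts.pop(0)
    return tag, [int(p, 16) for p in parts]


if __name__ == "__main__":
    # U+1E99: canonical decomposition, y followed by combining ring above.
    print(split_decomposition("\u1e99"))
    # U+1E9A: compatibility decomposition, marked with <compat>.
    print(split_decomposition("\u1e9a"))
```

The two calls correspond to the two UnicodeData.txt lines shown above: the first yields no tag with the code points 0x79 and 0x30A, the second yields the <compat> tag with 0x61 and 0x2BE.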
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Thu, 07 May 2015 17:05:03 GMT)
Message #117 received at 20499 <at> debbugs.gnu.org (full text, mbox):
>> … Also, did you consider generating this list automatically,
>> based on the codepoint properties already known to Emacs?
[...]
> Sorry, I don't really follow the code that you attached. Although I suppose
> it comes from a decomposition table, I don't know what the table was
> designed for, and it's not clear to me how it's relevant. Anyway, most of
I'm not sure exactly what he wanted to say, but it sounds to me like
it's going in the same direction as my earlier request to replace the
hard-coded table by code that auto-generates the cases.
There is already similar code in latin-ltx.el (written by yours truly).
Stefan
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Thu, 07 May 2015 22:23:02 GMT)
Message #120 received at 20499 <at> debbugs.gnu.org (full text, mbox):
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
> Maybe I don't understand the use case you have in mind. I thought the
> use case was that you already know the character's name, at least
> approximately, and want to look up its code, to type it faster.
I know what the character looks like. It is NOT easy to guess
what the name would be. There are many possibilities.
> > C-x 8 RET is even worse than that, because it requires
> > _copying_ the name of the character. To actually see the character
> > point is on requires
> > M-f C-f C-SPC C-s ; C-b M-w C-a C-x 8 RET C-y SPC
> "C-x 8 RET" accepts the codepoint in hex, so if you are already
> looking at the line that defines the character, all you need is to
> type a 4-, sometimes 5-hex-digit number.
> And if you want to type the name, "C-x 8 RET" provides completion, so
> no need for such a complicated dance for copying the name.
Are you kidding? Just to see 32 characters' glyphs
I'd have to type 128 input characters.
The feature I want would show 32 glyphs on each line,
and many lines would fit on the screen at once.
--
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org www.gnu.org
Skype: No way! See stallman.org/skype.html.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Fri, 08 May 2015 05:49:02 GMT)
Message #123 received at 20499 <at> debbugs.gnu.org (full text, mbox):
> Date: Thu, 07 May 2015 18:22:25 -0400
> From: Richard Stallman <rms <at> gnu.org>
> CC: ivan <at> siamics.net, 20499 <at> debbugs.gnu.org
>
> > Maybe I don't understand the use case you have in mind. I thought the
> > use case was that you already know the character's name, at least
> > approximately, and want to look up its code, to type it faster.
>
> I know what the character looks like. It is NOT easy to guess
> what the name would be. There are many possibilities.
If that's the use case (I don't think you described it before), then
we indeed need a convenient facility to browse character glyphs. But
that facility should allow specifying additional information, such as
the script name, or block name, or the base character, otherwise you
are likely to give up due to the sheer number of characters to view.
> > > C-x 8 RET is even worse than that, because it requires
> > > _copying_ the name of the character. To actually see the character
> > > point is on requires
> > > M-f C-f C-SPC C-s ; C-b M-w C-a C-x 8 RET C-y SPC
>
> > "C-x 8 RET" accepts the codepoint in hex, so if you are already
> > looking at the line that defines the character, all you need is to
> > type a 4-, sometimes 5-hex-digit number.
>
> > And if you want to type the name, "C-x 8 RET" provides completion, so
> > no need for such a complicated dance for copying the name.
>
> Are you kidding? Just to see 32 characters' glyphs
> I'd have to type 128 input characters.
No, you need to type much less. A codepoint, if you know it, is at
most 5 characters, and for name completion, typing something like
C-x 8 RET greek <TAB> <TAB>
(all in all 10 characters) will have the completions buffer pop up.
Each completion candidate has the character glyph displayed right next
to it, so you could use that for finding the one you are looking for.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Fri, 08 May 2015 18:48:02 GMT)
Message #126 received at 20499 <at> debbugs.gnu.org (full text, mbox):
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
> If that's the use case (I don't think you described it before), then
> we indeed need a convenient facility to browse character glyphs. But
> that facility should allow specifying additional information, such as
> the script name, or block name, or the base character, otherwise you
> are likely to give up due to the sheer number of characters to view.
I agree that those additional features would make it better.
--
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org www.gnu.org
Skype: No way! See stallman.org/skype.html.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Fri, 08 May 2015 18:48:03 GMT)
Message #129 received at 20499 <at> debbugs.gnu.org (full text, mbox):
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
> > Are you kidding? Just to see 32 characters' glyphs
> > I'd have to type 128 input characters.
> No, you need to type much less. A codepoint, if you know it, is at
> most 5 characters,
I miscalculated. C-x 8 RET codepoint RET is 8 characters (or 9).
Thus, to see 32 characters' glyphs that way, I'd need to type
between 256 and 288 input characters.
> and for name completion, typing something like
> C-x 8 RET greek <TAB> <TAB>
That is a lot less input than the other method, and is sort of usable,
but inconvenient. I tried it in that very case.
It includes Coptic characters as well as Greek; I don't know why.
It also includes many punctuation characters, and letters with diacritics,
that are in a different part of Unicode, and are not normal Greek letters.
If I could see the glyphs of the area of Unicode which alpha is in, I could
easily see the character I want.
And when I want to enter some non-ASCII punctuator, if I could see
the glyphs of that part of Unicode, it would be easy.
I don't want to have to remember their official names.
--
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org www.gnu.org
Skype: No way! See stallman.org/skype.html.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Fri, 08 May 2015 20:19:02 GMT)
Message #132 received at 20499 <at> debbugs.gnu.org (full text, mbox):
> > We could set up a way to test whether a code point can be
> > displayed, and skip scripts that can't be displayed.
>
> Alas, we don't know which cannot be displayed until we've tried and
> failed.
Where is this try-and-fail done? Is it only in C code, or is
there some Lisp function (predicate) that you can call to tell
you whether a given char can be displayed in a given (e.g. the
current) font.
Even if such a predicate would need to try displaying, to find
out whether it is possible, this could be useful.
It would be good if we could, for example, optionally show only
chars that the current font can display.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Sat, 09 May 2015 00:05:02 GMT)
Message #135 received at 20499 <at> debbugs.gnu.org (full text, mbox):
The discussion has gone in a few directions beyond `C-x 8 shorthands'.
I understand that Richard would like a help buffer that groups
multiple glyphs together in blocks or in categories of various kinds.
I don't have that to offer, but maybe this would help in a different
way: library `apu.el' provides apropos help for Unicode chars.
Command `apropos-unicode' shows you the Unicode chars that match
an apropos pattern you specify: a regexp or a space-separated list
of words. The chars whose names match are shown in a help buffer,
along with the names and code points (decimal and hex).
You can keep several such buffers open, for use with different
subsets of chars you are interested in.
In the help buffer, you can use these keys to act on the char
described on the current line:
* `RET' or `mouse-2' - see info about it (`C-u C-x =' output).
* `i' - google for more information about it.
* `^' - insert it at point in the buffer where you invoked
`apropos-unicode'.
* `c' - define a command to insert it that has the same name.
E.g. `greek-small-letter-phi'. (You need library
`ucs-cmds.el' for this.)
* `k' - globally bind a key to insert it.
* `l' - locally bind a key to insert it.
* `M-w' - copy it to the `kill-ring'.
* `M-y' - copy it to the secondary selection.
The library is here: http://www.emacswiki.org/emacs/download/apu.el.
TODO maybe:
* Pop up a glyph enlargement (e.g., by mouseover or key).
* Be able to match code points too in the pattern.
* Be able to choose chars of a given syntax class or other group.
* Add a header line and use it to sort by different columns.
* Add an option of patterns to exclude from matches, to exclude
things like `TAG' and `VARIATION SELECTOR'.
* Be able to easily match a base char. You can do this OK now
using a regexp such as ` \(BASE-CHAR \|$\)', but maybe there
is a better way.
Is there a good way to exclude chars whose glyphs are essentially
(apparently) whitespace, e.g., `MUSICAL SYMBOL END TIE'?
Is there a way to exclude chars that cannot be shown in the current
font? (Asked previously.)
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Sat, 09 May 2015 07:45:04 GMT)
Message #138 received at 20499 <at> debbugs.gnu.org (full text, mbox):
> Date: Fri, 08 May 2015 14:46:58 -0400
> From: Richard Stallman <rms <at> gnu.org>
> CC: ivan <at> siamics.net, 20499 <at> debbugs.gnu.org
>
> > > Are you kidding? Just to see 32 characters' glyphs
> > > I'd have to type 128 input characters.
>
> > No, you need to type much less. A codepoint, if you know it, is at
> > most 5 characters,
>
> I miscalculated. C-x 8 RET codepoint RET is 8 characters (or 9).
> Thus, to see 32 characters' glyphs that way, I'd need to type
> between 256 and 288 input characters.
If you are not looking for a single specific character by its
codepoint, then typing the codepoint makes no sense.
> > and for name completion, typing something like
>
> > C-x 8 RET greek <TAB> <TAB>
>
> That is a lot less input than the other method, and is sort of usable,
> but inconvenient. I tried it in that very case.
>
> It includes Coptic characters as well as Greek; I don't know why.
I don't know either. If I type TAB after just "greek", then I see no
Coptic characters in completion candidates. What did you type before
asking for completion?
> It also includes many punctuation characters, and letters with
> diacritics, that are in a different part of Unicode, and are not
> normal Greek letters.
This is simple Emacs completion at work: it brings you every character
whose name begins with "GREEK".
In any case, when I complete on "greek", I see only punctuation and
diacriticals from the same block as alpha, so I don't think we show
irrelevant punctuation. We do show some ancient characters from other
Greek blocks than the one where alpha lives, but they are not
punctuation.
As for letters with diacriticals, how would Emacs know that you don't
need those? I think the use case where the user looks for characters
with diacriticals is much more plausible than when she looks for some
simple character like alpha. But if we think that looking for
characters "with diacriticals" or "without diacriticals" is an
important use case, we could provide that as well, based on the
'decomposition' property of the characters.
> If I could see the glyphs of the area of Unicode which alpha is in, I could
> easily see the character I want.
If you only want letters, you can give a more accurate spec to
completion: "C-x 8 RET greek*letter <TAB> <TAB>". (The asterisk is a
wildcard character.) That still produces quite a long list, but no
symbols, punctuation, or lone diacriticals.
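The effect of such a wildcard pattern on character names can be approximated outside Emacs. The Python sketch below matches a pattern against the current official names only, whereas the "C-x 8 RET" completion table also contains old names, so it is an approximation rather than a reimplementation; the function name names_matching is hypothetical.

```python
import fnmatch
import unicodedata


def names_matching(pattern):
    """Return (char, name) pairs whose Unicode name starts with PATTERN.

    PATTERN may contain '*' wildcards; a trailing '*' is implied, as
    with prefix completion.
    """
    glob = pattern.upper() + "*"
    hits = []
    for cp in range(0x110000):
        name = unicodedata.name(chr(cp), "")
        if name and fnmatch.fnmatchcase(name, glob):
            hits.append((chr(cp), name))
    return hits


if __name__ == "__main__":
    letters = names_matching("greek*letter")
    print(len(letters), "names match")
    for ch, name in letters[:10]:
        print(ch, name)
```

The count printed first gives a sense of how long the candidate list remains even after narrowing the pattern to letters.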
Alternatively, you'd need to know the Unicode block in which those
characters live, or find it by completing on block names. (This
block's name is "Greek and Coptic".)
> And when I want to enter some non-ASCII punctuator, if I could see
> the glyphs of that part of Unicode, it would be easy.
> I don't want to have to remember their official names.
Only a small number of (language- and script-agnostic) punctuation
characters have their own block. The language-specific punctuation is
in the same block as their main characters.
We could have a feature which would display punctuation characters,
either specific to a language/script or not. Such a feature would
need to use [:punct:] regexp (we'd need to extend [:punct:] to use
Unicode character properties). Similarly, using [:alpha:] would bring
only letters.
I hope you now agree that the use case of searching for a character
with only some vague idea about its appearance and/or name needs some
pretty sophisticated (and overlapping) capabilities for allowing the
user to specify what she knows, before showing the possible
candidates. I'm not really sure what would be a good UI for such
specifications; perhaps something using the widget library a-la
Customize, where you can check or uncheck certain options and specify
values for non-boolean fields.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Sat, 09 May 2015 08:00:07 GMT)
Message #141 received at 20499 <at> debbugs.gnu.org (full text, mbox):
> Date: Fri, 8 May 2015 13:18:40 -0700 (PDT)
> From: Drew Adams <drew.adams <at> oracle.com>
> Cc: 20499 <at> debbugs.gnu.org
>
> > > We could set up a way to test whether a code point can be
> > > displayed, and skip scripts that can't be displayed.
> >
> > Alas, we don't know which cannot be displayed until we've tried and
> > failed.
>
> Where is this try-and-fail done? Is it only in C code, or is
> there some Lisp function (predicate) that you can call to tell
> you whether a given char can be displayed in a given (e.g. the
> current) font.
These two are not alternatives, they can (and do) live together.
The search for a suitable font is mostly in C, but we do have a
capability to test from Lisp whether a given character can be
displayed: 'char-displayable-p'. If you are interested in a specific
font, you can use 'font-get-glyphs' for similar information.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Sat, 09 May 2015 08:23:03 GMT)
Message #144 received at 20499 <at> debbugs.gnu.org (full text, mbox):
> Date: Fri, 8 May 2015 17:03:53 -0700 (PDT)
> From: Drew Adams <drew.adams <at> oracle.com>
> Cc: 20499 <at> debbugs.gnu.org
>
> I understand that Richard would like a help buffer that groups
> multiple glyphs together in blocks or in categories of various kinds.
>
> I don't have that to offer, but maybe this would help in a different
> way: library `apu.el' provides apropos help for Unicode chars.
>
> Command `apropos-unicode' shows you the Unicode chars that match
> an apropos pattern you specify: a regexp or a space-separated list
> of words. The chars whose names match are shown in a help buffer,
> along with the names and code points (decimal and hex).
I hope I've succeeded in explaining in my previous messages that just
matching the name against a regexp is not enough: you will most of the
time get a lot of candidates. IOW, it's not focused enough, and the
reason is that the name of a character doesn't tell enough about the
character to be able to filter them only based on their names.
What we need is selection of candidates based on the character
attributes, and their language/script/block. This could, of course,
use the completion/apropos infrastructure, but the completion
predicates must be smarter, and we should have a suitable UI for the
user to specify her partial knowledge of the characters she is after.
If you or someone else wants to work on this, I can provide advice as
to how to use Unicode character properties for such filtering.
> * Add an option of patterns to exclude from matches, to exclude
> things like `TAG' and `VARIATION SELECTOR'.
The UI cannot be in these technical terms, because the user will most
probably fail to understand what that means for the search results.
E.g., it's quite probable that someone who wants an emoji character
_will_ want the VARIATION SELECTOR included, but how many users will
understand that excluding it will not allow them to specify emoji
style of certain characters?
> * Be able to easily match a base char. You can do this OK now
> using a regexp such as ` \(BASE-CHAR \|$\)', but maybe there
> is a better way.
I suggested the Custom-style interface using widgets.
> Is there a good way to exclude chars whose glyphs are essentially
> (apparently) whitespace, e.g., `MUSICAL SYMBOL END TIE'?
I'm not sure "mostly whitespace" is a good specification for those. I
suppose someone who wants musical symbols will want this one as well.
> Is there a way to exclude chars that cannot be shown in the current
> font? (Asked previously.)
Answered previously.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Sat, 09 May 2015 14:18:02 GMT)
Message #147 received at 20499 <at> debbugs.gnu.org (full text, mbox):
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
> > That is a lot less input than the other method, and is sort of usable,
> > but inconvenient. I tried it in that very case.
> >
> > It includes Coptic characters as well as Greek; I don't know why.
> I don't know either. If I type TAB after just "greek", then I see no
> Coptic characters in completion candidates. What did you type before
> asking for completion?
I typed C-x 8 RET greek TAB TAB.
All the NAMES that appear start with "Greek", but when I inserted
GREEK CAPITAL LETTER HORI and examined it with C-u C-x =,
it said
name: COPTIC CAPITAL LETTER HORI
old-name: GREEK CAPITAL LETTER HORI
I didn't notice the old-name field the previous time. I suppose that
explains why it was included in that completion table. Anyway that
completion list is over 440 lines long, and not very useful.
> > It also includes many punctuation characters, and letters with
> > diacritics, that are in a different part of Unicode, and are not
> > normal Greek letters.
> This is simple Emacs completion at work: it brings you every character
> whose name begins with "GREEK".
Do you think I don't know that?
_Why_ it does what it does is not the issue. The only pertinent point
is that it isn't a convenient way to do what I want to do.
> As for letters with diacriticals, how would Emacs know that you don't
> need those?
That question is spurious. Remember, I don't want to enter a
character name at all. I want to see all the glyphs.
Someone else suggested that C-x 8 RET might be a convenient alternate
method. I am explaining why it isn't.
If I had the feature I want, I would see the segment including the
usual Greek letters, and the far more numerous diacriticalized ones
would not be there (because they come later in Unicode).
> If you only want letters, you can give a more accurate spec to
> completion: "C-x 8 RET greek*letter <TAB> <TAB>". (The asterisk is a
> wildcard character.) That still produces quite a long list,
Indeed, it is still inconvenient.
> I hope you now agree that the use case of searching for a character
> with only some vague idea about its appearance and/or name needs some
> pretty sophisticated (and overlapping) capabilities for allowing the
> user to specify what she knows, before showing the possible
> candidates.
We seem to be totally miscommunicating. I DON'T WANT to search for
them by name. I never asked for that.
--
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org www.gnu.org
Skype: No way! See stallman.org/skype.html.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Sat, 09 May 2015 14:37:02 GMT)
Message #150 received at 20499 <at> debbugs.gnu.org (full text, mbox):
> Date: Sat, 09 May 2015 10:17:15 -0400
> From: Richard Stallman <rms <at> gnu.org>
> CC: ivan <at> siamics.net, 20499 <at> debbugs.gnu.org
>
> I typed C-x 8 RET greek TAB TAB.
>
> All the NAMES that appear start with "Greek", but when I inserted
> GREEK CAPITAL LETTER HORI and examined it with C-u C-x =,
> it said
>
> name: COPTIC CAPITAL LETTER HORI
> old-name: GREEK CAPITAL LETTER HORI
>
> I didn't notice the old-name field the previous time. I suppose that
> explains why it was included in that completion table.
Yes. Greek and Coptic characters share the same Unicode block.
> > > It also includes many punctuation characters, and letters with
> > > diacritics, that are in a different part of Unicode, and are not
> > > normal Greek letters.
>
> > This is simple Emacs completion at work: it brings you every character
> > whose name begins with "GREEK".
>
> Do you think I don't know that?
Do you think I don't know you know?
You asked me some questions that you should be sure I knew also, and
yet I didn't react like that.
I find your attitude in this thread unnecessarily offensive.
> > I hope you now agree that the use case of searching for a character
> > with only some vague idea about its appearance and/or name needs some
> > pretty sophisticated (and overlapping) capabilities for allowing the
> > user to specify what she knows, before showing the possible
> > candidates.
>
> We seem to be totally miscommunicating. I DON'T WANT to search for
> them by name. I never asked for that.
Where did I mention search by name? I didn't, because I really
don't think it's convenient enough. It's what we have now, but it is
not what I think should be the method of looking up an unknown
character.
But your idea of showing dozens or hundreds of characters isn't
workable, either.
Like I wrote elsewhere, we need a way for the user to specify what she
knows, and then show the characters that match the spec. The
specification could include one or more of the following:
. Script name
. Language name
. Unicode block name
. Character class (alphabetical, numerical, punctuation, etc.)
. Base character
. With/without diacriticals
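A few of the criteria listed above can be tried out against the Unicode Character Database with a short Python sketch. It covers only block (given as a code point range), character class (via General_Category), a name prefix, and with/without diacriticals; script and language are omitted because the standard unicodedata module does not expose them. The names GREEK_AND_COPTIC, has_diacritic and select are hypothetical.

```python
import unicodedata

# The "Greek and Coptic" block, as an inclusive code point range.
GREEK_AND_COPTIC = (0x0370, 0x03FF)


def has_diacritic(ch):
    """True if the canonical decomposition of CH contains a combining mark."""
    return any(unicodedata.combining(c) for c in unicodedata.normalize("NFD", ch))


def select(block, category_prefix="", name_prefix="", diacritics=None):
    """Return assigned characters in BLOCK that match the given spec.

    CATEGORY_PREFIX is matched against General_Category ('L' letters,
    'P' punctuation, 'N' numbers).  DIACRITICS is True, False, or None
    for "don't care".
    """
    lo, hi = block
    out = []
    for cp in range(lo, hi + 1):
        ch = chr(cp)
        name = unicodedata.name(ch, "")
        if not name:
            continue  # unassigned code point
        if not unicodedata.category(ch).startswith(category_prefix):
            continue
        if not name.startswith(name_prefix):
            continue
        if diacritics is not None and has_diacritic(ch) != diacritics:
            continue
        out.append(ch)
    return out


if __name__ == "__main__":
    print(" ".join(select(GREEK_AND_COPTIC, "L", "GREEK", diacritics=False)))
    print(" ".join(select(GREEK_AND_COPTIC, "P")))
```

Combining the filters narrows the block to a list short enough to scan by eye, which is the point both sides of the discussion seem to agree on.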
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Mon, 11 May 2015 00:53:02 GMT)
Message #153 received at 20499 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Stefan Monnier wrote:
> I'm not sure exactly what he wanted to say, but it sounds to me like
> it's going in the same direction as my earlier request to replace the
> hard-coded table by code that auto-generates the cases.
> There is already similar code in latin-ltx.el (written by yours truly).
OK, thanks, in that case this will need some thinking, since the code in
latin-ltx.el suffers from the same problems I mentioned in
<http://bugs.gnu.org/20499#105>: from a user's point of view the supported
characters are a haphazard list. E.g., it adds some chars for Pinyin tones but
not others. Partly the problem is that it adds "easy" Latin letters like ȳ even
though nobody uses them, but not "hard" ones like ǚ even though they're actually
used on occasion.
Fixing this will take some thinking, because we'll need to devise ways to type
the "hard" Latin letters. I suppose latin-ltx and iso-transl should use similar
approaches here.
In the meantime, though, there is a need to type non-Latin punctuation like
dashes and quotation marks. That part of the patch seems relatively independent
of the Latin-letter issue, so I installed the attached. I hope to look into the
Latin-letter issue later.
[0001-C-x-8-shorthands-for-curved-quotes-Euro-etc.patch (text/x-patch, attachment)]
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Mon, 11 May 2015 01:29:02 GMT)
Message #156 received at 20499 <at> debbugs.gnu.org (full text, mbox):
> your idea of showing dozens or hundreds of characters isn't
> workable, either.
It sounds workable to me, as I've used similar interfaces elsewhere, and they
work reasonably well. They're not as good as an input method if you're an
expert in the method, but they're much better than nothing when you're a
non-expert and don't have the time to learn an input method but just want to
enter a few unusual characters.
For example, if I visit the English Wikipedia page for Emacs:
http://en.wikipedia.org/wiki/Emacs
and push the "Edit" button, I'll get to this page:
http://en.wikipedia.org/w/index.php?title=Emacs&action=edit
which gives me a list of buttons for inserting any of "– — ° ′ ″ ≈ ≠ ≤ ≥ ± − × ÷
← → · §", which I can just push directly to insert the corresponding character.
Or I can push the "Latin" button and then insert any of:
A a Á á À à Â â Ä ä Ǎ ǎ Ă ă Ā ā Ã ã Å å Ą ą Æ æ Ǣ ǣ B b C c Ć ć Ċ ċ Ĉ ĉ Č č
Ç ç D d Ď ď Đ đ Ḍ ḍ Ð ð E e É é È è Ė ė Ê ê Ë ë Ě ě Ĕ ĕ Ē ē Ẽ ẽ Ę ę Ẹ ẹ Ɛ ɛ
Ǝ ǝ Ə ə F f G g Ġ ġ Ĝ ĝ Ğ ğ Ģ ģ H h Ĥ ĥ Ħ ħ Ḥ ḥ I i İ ı Í í Ì ì Î î Ï ï
Ǐ ǐ Ĭ ĭ Ī ī Ĩ ĩ Į į Ị ị J j Ĵ ĵ K k Ķ ķ L l Ĺ ĺ Ŀ ŀ Ľ ľ Ļ ļ Ł ł Ḷ ḷ Ḹ ḹ
M m Ṃ ṃ N n Ń ń Ň ň Ñ ñ Ņ ņ Ṇ ṇ Ŋ ŋ O o Ó ó Ò ò Ô ô Ö ö Ǒ ǒ Ŏ ŏ Ō ō Õ õ Ǫ
ǫ Ọ ọ Ő ő Ø ø Œ œ Ɔ ɔ P p Q q R r Ŕ ŕ Ř ř Ŗ ŗ Ṛ ṛ Ṝ ṝ S s Ś ś Ŝ ŝ Š š
Ş ş Ș ș Ṣ ṣ ß T t Ť ť Ţ ţ Ț ț Ṭ ṭ Þ þ U u Ú ú Ù ù Û û Ü ü Ǔ ǔ Ŭ ŭ Ū ū Ũ ũ Ů
ů Ų ų Ụ ụ Ű ű Ǘ ǘ Ǜ ǜ Ǚ ǚ Ǖ ǖ V v W w Ŵ ŵ X x Y y Ý ý Ŷ ŷ Ÿ ÿ Ỹ ỹ Ȳ ȳ
Z z Ź ź Ż ż Ž ž ß Ð ð Þ þ Ŋ ŋ Ə ə
This is all easy to do even if I don't remember the editing interface, and
unlike Emacs's C-x 8 it handles Pinyin tones, dotless i, etc., etc. This seems
to be the sort of thing that RMS is asking for, and I don't see why it wouldn't
work for Emacs.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Mon, 11 May 2015 01:56:03 GMT)
Message #159 received at 20499 <at> debbugs.gnu.org (full text, mbox):
> It just iterates over the range given (or U+00A8 through U+02AF
> by default) and maps “LATIN + COMBINING” decompositions to
> 'iso-transl entries.
Thanks for the explanation.
> But are we actually limited to two-character abbreviations only?
> Why not allow for, say, C-x 8 " ' u?
We can do that, but only if the combining prefixes are distinct from the letters
themselves. My previous proposal didn't do that, e.g., it used "u" for breve,
which would make things like "C-x 8 , u E" ambiguous (is that u with a cedilla
followed by plain E, or E with a cedilla and breve?). So I guess more thought
is needed.
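The ambiguity can be made concrete with a small sketch. It assumes a simplified model in which a composed character is typed as zero or more diacritic keys followed by one base letter; the key assignments (`,` for cedilla, `u` or `(` for breve) are hypothetical and are not the actual iso-transl bindings.

```python
import string

LETTERS = set(string.ascii_letters)

def completions(keys, marks):
    """Return every way a prefix of `keys` can be read as diacritic keys
    followed by one base letter.

    `marks` maps a key to a diacritic name. Each result is a tuple
    (base_letter, diacritics, leftover_keys).
    """
    results = []

    def walk(i, seen):
        if i == len(keys):
            return
        key = keys[i]
        if key in LETTERS:
            # Reading 1: this key is the base letter, so the character ends here.
            results.append((key, tuple(seen), keys[i + 1:]))
        if key in marks:
            # Reading 2: this key is one more diacritic; keep consuming keys.
            walk(i + 1, seen + [marks[key]])

    walk(0, [])
    return results

# "u" doubles as the breve prefix: two readings of ", u E".
print(completions(",uE", {",": "cedilla", "u": "breve"}))
# Breve moved to a non-letter key: each sequence has a single reading.
print(completions(",(E", {",": "cedilla", "(": "breve"}))
```

With prefixes kept disjoint from the letters, the first letter typed always ends the sequence, so longer chains of prefixes stay unambiguous.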
PS. about patches and attachments:
> However, given that ‘text/x-patch’ is not a /registered/ MIME
> type, I believe the above does not apply.
Once one starts using x-* types anything goes, is my impression.
> If Thunderbird /knows/ the encoding (“character set”) of the
> contents of the MIME part,
It doesn't, which is why Thunderbird doesn't say.
Regardless of what one's opinion of what the standard says or should say, it's
pretty clear that these sorts of attachments are often sent and generally work;
if they don't work with Gnus then that's probably a Gnus bug report worth
filing. The Gnus manual says one should report a bug with "M-x gnus-bug". I
tried that, but it complained "Gnus has been shut down", so I gave up. Since
you're a Gnus user, I hope you can take on the task of filing a bug report.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Mon, 11 May 2015 02:26:02 GMT)
Message #162 received at 20499 <at> debbugs.gnu.org (full text, mbox):
> Fixing this will take some thinking, because we'll need to devise ways to
> type the "hard" Latin letters.
Indeed.
> I suppose latin-ltx and iso-transl should use similar approaches here.
Of course, in my ideal world, iso-transl and latin-ltx should not just
use similar approaches, but C-x 8 should basically work like a kind of
"enable TeX input method just for this char, and pre-insert \".
Stefan
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Mon, 11 May 2015 14:56:02 GMT)
Message #165 received at 20499 <at> debbugs.gnu.org (full text, mbox):
> Date: Sun, 10 May 2015 18:28:17 -0700
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> CC: 20499 <at> debbugs.gnu.org, Richard Stallman <rms <at> gnu.org>
>
> your idea of showing dozens or hundreds of characters isn't
> workable, either.
>
> It sounds workable to me, as I've used similar interfaces elsewhere, and they work reasonably well. They're not as good as an input method if you're an expert in the method, but they're much better than nothing when you're a non-expert and don't have the time to learn an input method but just want to enter a few unusual characters.
At least the last part of this thread was about _finding_ the
character, if you have only partial information about it. My comment
above was about that use case, and that use case only. You seem to be
talking about a different use case: when the user already knows quite
well which character she wants.
> For example, if I visit the English Wikipedia page for Emacs:
>
> http://en.wikipedia.org/wiki/Emacs
>
> and push the "Edit" button, I'll get to this page:
>
> http://en.wikipedia.org/w/index.php?title=Emacs&action=edit
>
> which gives me a list of buttons for inserting any of "– — ° ′ ″ ≈ ≠ ≤ ≥ ± − × ÷ ← → · §", which I can just push directly to insert the corresponding character.
This is the case where you know a very small subset of characters from
which to choose. But even here, how do you know whether you need '–',
'—', or '−'? Or maybe you want '⸺' or even '⸻' instead (they are not
shown in the list offered by Wikipedia)? Likewise, there are many
more quote characters than the above offers.
In general, punctuation characters fill 2 full blocks of codepoints,
so finding the one you need takes more than selecting from the fewer
than 20 characters that someone decided are all you'll need.
> Or I can push the "Latin" button and then insert any of:
>
> A a Á á À à Â â Ä ä Ǎ ǎ Ă ă Ā ā Ã ã Å å Ą ą Æ æ Ǣ ǣ B b C c Ć ć Ċ ċ Ĉ ĉ Č č Ç ç D d Ď ď Đ đ Ḍ ḍ Ð ð E e É é È è Ė ė Ê ê Ë ë Ě ě Ĕ ĕ Ē ē Ẽ ẽ Ę ę Ẹ ẹ Ɛ ɛ Ǝ ǝ Ə ə F f G g Ġ ġ Ĝ ĝ Ğ ğ Ģ ģ H h Ĥ ĥ Ħ ħ Ḥ ḥ I i İ ı Í í Ì ì Î î Ï ï Ǐ ǐ Ĭ ĭ Ī ī Ĩ ĩ Į į Ị ị J j Ĵ ĵ K k Ķ ķ L l Ĺ ĺ Ŀ ŀ Ľ ľ Ļ ļ Ł ł Ḷ ḷ Ḹ ḹ M m Ṃ ṃ N n Ń ń Ň ň Ñ ñ Ņ ņ Ṇ ṇ Ŋ ŋ O o Ó ó Ò ò Ô ô Ö ö Ǒ ǒ Ŏ ŏ Ō ō Õ õ Ǫ ǫ Ọ ọ Ő ő Ø ø Œ œ Ɔ ɔ P p Q q R r Ŕ ŕ Ř ř Ŗ ŗ Ṛ ṛ Ṝ ṝ S s Ś ś Ŝ ŝ Š š Ş ş Ș ș Ṣ ṣ ß T t Ť ť Ţ ţ Ț ț Ṭ ṭ Þ þ U u Ú ú Ù ù Û û Ü ü Ǔ ǔ Ŭ ŭ Ū ū Ũ ũ Ů ů Ų ų Ụ ụ Ű ű Ǘ ǘ Ǜ ǜ Ǚ ǚ Ǖ ǖ V v W w Ŵ ŵ X x Y y Ý ý Ŷ ŷ Ÿ ÿ Ỹ ỹ Ȳ ȳ Z z Ź ź Ż ż Ž ž ß Ð ð Þ þ Ŋ ŋ Ə ə
Again, this is a different use case: you already need to know that your
character is one of the "Latin" characters. And they cheat: what you
see is a subset of the characters that someone decided are all you
need. (For example, "Math and logic" has ∫, ∬, and ∭, but
not ⨌; "Latin" lacks the entire Latin Extended-B, -C, -D, and Latin
Extended Additional blocks; etc.)
IOW, the above selection is highly filtered using some unspecified
rules, and therefore it at best emulates a use case where the user has
a pretty good knowledge about what she wants to find. And still, you
need to select out of about 300 characters.
How's that workable, except in very simple use cases?
> This is all easy to do even if I don't remember the editing interface, and unlike Emacs's C-x 8 it handles Pinyin tones, dotless i, etc., etc. This seems to be the sort of thing that RMS is asking for, and I don't see why it wouldn't work for Emacs.
It would work for Emacs. The question is, would it be convenient for
users?
We should be able to do better than the example you show, i.e. allow
the user to define what she knows about the character she is looking
for, and then present the characters matching that description. (I
presented earlier the provisional list of attributes I think will be
useful as part of such a description.) We definitely shouldn't assume
we know better than the user which characters she might or might not
want the way Wikipedia does. And we should allow the users to
leverage more accurate information, if they have it. For example, if
you know that the character you are looking for is some form of a
Latin 'a', then we could present only those (there are 36 of them in
the current UCD).
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Mon, 11 May 2015 15:53:02 GMT)
Message #168 received at 20499 <at> debbugs.gnu.org (full text, mbox):
> IOW, the above selection is highly filtered using some unspecified
> rules, and therefore it at best emulates a use case where the user has
> a pretty good knowledge about what she wants to find. And still, you
> need to select out of about 300 characters.
> How's that workable, except in very simple use cases?
It's workable in the following way:
- first time around, you'll have to scan all those chars, which will
take a little while.
- second time around you'll also have to scan them, but it will take
a bit less time.
- ...
- Nth time around, you'll either know more or less where the char is so
you don't need to scan all those chars any more, or you'll have
learned some other way to insert the char.
That's what I do every once in a while using the symbols.dvi document,
looking for how to enter some funny-looking math symbols in LaTeX.
I generally have no clue whatsoever how the symbol might be called when
I do such searches.
And I agree that further refinement (such as restricting the display to
those glyphs that have an "e" in them, which would include all the
weirdly accented forms of "e" and probably the upper case forms as
well) would be a nice addition.
E.g. it would be great to be able to say "it's a char that has a > in its
glyph" and then be presented with things like ≥, right angle brackets,
right arrows, ...
Stefan
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Mon, 11 May 2015 16:18:01 GMT)
Message #171 received at 20499 <at> debbugs.gnu.org (full text, mbox):
> From: Stefan Monnier <monnier <at> iro.umontreal.ca>
> Cc: Paul Eggert <eggert <at> cs.ucla.edu>, rms <at> gnu.org, 20499 <at> debbugs.gnu.org
> Date: Mon, 11 May 2015 11:52:40 -0400
>
> > IOW, the above selection is highly filtered using some unspecified
> > rules, and therefore it at best emulates a use case where the user has
> > a pretty good knowledge about what she wants to find. And still, you
> > need to select out of about 300 characters.
>
> > How's that workable, except in very simple use cases?
>
> It's workable in the following way:
> - first time around, you'll have to scan all those chars, which will
> take a little while.
> - second time around you'll also have to scan them, but it will take
> a bit less time.
> - ...
> - Nth time around, you'll either know more or less where the char is so
> you don't need to scan all those chars any more, or you'll have
> learned some other way to insert the char.
>
> That's what I do every once in a while using the symbols.dvi document,
> looking for how to enter some funny-looking math symbols in LaTeX.
I admire your patience. When I need to do this, I generally give up
in despair very quickly. And unless I need the same character over
and over again, my Nth time looks very similar to my first.
> And I agree that further refinement (such as restricting the display to
> those glyphs that have an "e" in them, which would include all the
> weirdly accented forms of "e" and probably the upper case forms as
> well) would be a nice addition.
I can try writing a back-end (that thing that takes a list of criteria
and returns a list of codepoints or ranges to display) for this, if
someone will then add a UI for the user to specify the constraints and
for display of the results.
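One possible shape for such a back-end, sketched in Python rather than Emacs Lisp. The criteria offered here (words in the character name, general category, base letter), the function names, and the 0x3000 scan limit are all assumptions chosen for illustration, not the attribute list proposed earlier in the thread.

```python
import unicodedata

def base_letter(ch):
    """Follow canonical decompositions down to the first base character."""
    while True:
        decomp = unicodedata.decomposition(ch)
        if not decomp or decomp.startswith("<"):
            return ch
        ch = chr(int(decomp.split()[0], 16))

def find_chars(name_words=(), category=None, base=None, limit=0x3000):
    """Return sorted codepoints below `limit` that satisfy every given criterion."""
    hits = []
    for cp in range(limit):
        ch = chr(cp)
        name = unicodedata.name(ch, "")
        if not name:
            continue  # unassigned or unnamed (e.g. control) codepoint
        words = name.split()
        if any(w not in words for w in name_words):
            continue
        if category and unicodedata.category(ch) != category:
            continue
        if base and base_letter(ch) != base:
            continue
        hits.append(cp)
    return hits

def to_ranges(cps):
    """Collapse sorted codepoints into inclusive (start, end) ranges."""
    ranges = []
    for cp in cps:
        if ranges and cp == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], cp)
        else:
            ranges.append((cp, cp))
    return ranges

# Lowercase Latin letters built on "a", as ranges a UI could display.
hits = find_chars(name_words=("LATIN",), category="Ll", base="a")
print([(hex(lo), hex(hi)) for lo, hi in to_ranges(hits)])
```

A criterion based only on decompositions misses characters such as dotless i (U+0131), which has none, so a "looks like" constraint would need additional data. The number of matches also depends on the Unicode version bundled with the interpreter and on the scan limit.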
> E.g. it would be great to be able to say "it's a char that has a > in its
> glyph" and then be presented with things like ≥, right angle brackets,
> right arrows, ...
Yep.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Mon, 11 May 2015 18:28:02 GMT)
Message #174 received at 20499 <at> debbugs.gnu.org (full text, mbox):
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
> Or I can push the "Latin" button and then insert any of:
Indeed, that is what I'd like.
--
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org www.gnu.org
Skype: No way! See stallman.org/skype.html.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Mon, 11 May 2015 18:49:02 GMT)
Message #177 received at 20499 <at> debbugs.gnu.org (full text, mbox):
On 05/11/2015 07:54 AM, Eli Zaretskii wrote:
> IOW, the above selection is highly filtered using some unspecified rules
Sure, and I expect that what Wikipedia has done is to see which characters
get used the most, then give a trivial UI for the dozen or so most commonly
used non-ASCII characters, a simple UI for the few hundred most commonly
used ones, and a more complex UI for the rest.
It's a reasonable design approach.
> For example, if you know that the character you are looking for is
> some form of a Latin 'a', then we could present only those (there are
> 36 of them in the current UCD).
That all sounds good, for users who know that there's a way to get that
list of "A"-like characters. It would be good also to cater to people
who are less expert, and who only know something simple like "type the
Alt-FOO key if you want to type weird characters". Perhaps a top-level
menu that gives a dozen or so of the most-common characters and also
says "type an "A" to get the "A"-like letters", and "press this button
to get Greek", etc.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Mon, 11 May 2015 19:12:01 GMT)
Message #180 received at 20499 <at> debbugs.gnu.org (full text, mbox):
> Date: Mon, 11 May 2015 11:48:36 -0700
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> CC: 20499 <at> debbugs.gnu.org, rms <at> gnu.org
>
> On 05/11/2015 07:54 AM, Eli Zaretskii wrote:
> > IOW, the above selection is highly filtered using some unspecified rules
>
> Sure, and I expect that what Wikipedia has done is to see which characters
> get used the most, then give a trivial UI for the dozen or so most commonly
> used non-ASCII characters, a simple UI for the few hundred most commonly
> used ones, and a more complex UI for the rest.
> It's a reasonable design approach.
But it's not Emacsy, not to my palate. Emacs never arbitrarily limits
the user without offering some ways to lift the limits.
> > For example, if you know that the character you are looking for is
> > some form of a Latin 'a', then we could present only those (there are
> > 36 of them in the current UCD).
>
> That all sounds good, for users who know that there's a way to get that
> list of "A"-like characters.
The way I envision it, the UI to specify the characters you are
looking for will have a widget named "Looks like ..." or "Base
character", and users who are looking for 'a' with some diacriticals
will type "a" there.
> Perhaps a top-level menu that gives a dozen or so of the most-common
> characters
I think "most-common characters" can only be reasonably offered once
the user has supplied a language or script. Most-common Latin characters
are different from most-common Cyrillic characters or Greek or Hebrew
or Math symbols.
> and also says "type an "A" to get the "A"-like letters", and "press
> this button to get Greek", etc.
I don't think a single button will do. At least it should be possible
to press both "Greek" and "with/without diacriticals", and possibly
also other constraints, like with/without punctuation.
IOW, we need to let users specify several constraints, and display
whatever matches them. If they only specify the script, like "Latin",
they will see the list similar to what you presented, perhaps in
several parts with a "more" button.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Tue, 12 May 2015 08:57:02 GMT)
Message #183 received at 20499 <at> debbugs.gnu.org (full text, mbox):
> At least the last part of this thread was about _finding_ the
> character, if you have only partial information about it. My comment
> above was about that use case, and that use case only. You seem to be
> talking about a different use case: when the user already knows quite
> well which character she wants.
This seems like a misunderstanding about the word "find".
In general I know what the character looks like.
I expect I would spot it immediately if I saw it.
For instance, it wouldn't be hard to recognize the dotless i
in a list of lowercase non-ASCII letters. Especially if it is
in some sort of order.
I'm afraid you've been looking for a solution to some problem
that I wasn't talking about.
--
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org www.gnu.org
Skype: No way! See stallman.org/skype.html.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Tue, 12 May 2015 16:15:02 GMT)
Message #186 received at 20499 <at> debbugs.gnu.org (full text, mbox):
> Date: Tue, 12 May 2015 04:56:20 -0400
> From: Richard Stallman <rms <at> gnu.org>
> CC: eggert <at> cs.ucla.edu, 20499 <at> debbugs.gnu.org
>
> > At least the last part of this thread was about _finding_ the
> > character, if you have only partial information about it. My comment
> > above was about that use case, and that use case only. You seem to be
> > talking about a different use case: when the user already knows quite
> > well which character she wants.
>
> This seems like a misunderstanding about the word "find".
I don't think so.
> In general I know what the character looks like.
> I expect I would spot it immediately if I saw it.
> For instance, it wouldn't be hard to recognize the dotless i
> in a list of lowercase non-ASCII letters.
I presume that when you say "non-ASCII" you really mean "non-ASCII
Latin", since the number of lowercase non-ASCII characters is rather
large (about 1400, if I'm not mistaken).
There are 581 characters in the Unicode database that are lowercase
non-ASCII Latin letters. While it's possible to go through this long
list looking for the one character you are after, it's hardly
convenient or efficient, IMO.
So I think IWBNI Emacs could help the user by showing fewer characters
than that. For example, if you know it's some form of i, IWBNI Emacs
allowed you to say that, and then presented only the characters which
match that description (there are only 29 of them).
> Especially if it is in some sort of order.
The order in which to present the characters is also not trivial. The
easiest one is the order of codepoints, but I presume it would be
better to group characters by their base character, i.e. all forms of
i together.
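The difference between the two orderings can be sketched as follows. This is an illustrative Python snippet, not a proposal for the Emacs implementation; `base_of` and `group_by_base` are hypothetical helper names, and the base character is approximated by the first character of the NFD normalization.

```python
from itertools import groupby
import unicodedata

def base_of(ch):
    """Approximate the base character as the first character of the NFD form."""
    return unicodedata.normalize("NFD", ch)[0]

def group_by_base(chars):
    """Group characters by base letter, keeping codepoint order inside each group."""
    ordered = sorted(chars, key=lambda c: (base_of(c), ord(c)))
    return {base: "".join(group) for base, group in groupby(ordered, key=base_of)}

# Accented forms of a and i: U+00EC, U+00E0, U+012B, U+00E1, U+00ED, U+0101.
sample = "\u00ec\u00e0\u012b\u00e1\u00ed\u0101"

by_codepoint = "".join(sorted(sample))   # the a and i forms end up interleaved
by_base = group_by_base(sample)          # all a forms together, then all i forms

print(by_codepoint)
print(by_base)
```

In plain codepoint order the macron forms (U+0101, U+012B) land after all the Latin-1 letters, while grouping by base keeps every form of a letter in one place.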
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Wed, 26 Jun 2019 15:13:02 GMT)
Message #189 received at 20499 <at> debbugs.gnu.org (full text, mbox):
This bug report thread is huge. As far as I can tell, shorthands for
Euro etc. were added (just look: C-x 8 * E => €; didn't know about that),
but I'm not sure whether there's anything remaining to be done here.
(The last message here is four years old.)
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#20499; Package emacs. (Thu, 13 Aug 2020 08:50:02 GMT)
Message #192 received at 20499 <at> debbugs.gnu.org (full text, mbox):
Lars Ingebrigtsen <larsi <at> gnus.org> writes:
> This bug report thread is huge. As far as I can tell, shorthands for
> Euro etc. were added (just look: C-x 8 * E => €; didn't know about that),
> but I'm not sure whether there's anything remaining to be done here.
> (The last message here is four years old.)
And this was a year ago, with no further comments, so I'm closing this
bug report. If there's anything further to do in this bug report,
please reopen.
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
bug closed, send any further explanations to 20499 <at> debbugs.gnu.org and Paul Eggert <eggert <at> cs.ucla.edu>. Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Thu, 13 Aug 2020 08:50:03 GMT)
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 10 Sep 2020 11:24:14 GMT)
This bug report was last modified 4 years and 86 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.