GNU bug report logs - #33796
27.0.50; Use utf-8 is all our Elisp files

Package: emacs

Reported by: Stefan Monnier <monnier <at> iro.umontreal.ca>

Date: Tue, 18 Dec 2018 18:48:01 UTC

Severity: normal

Found in version 27.0.50

Done: Stefan Monnier <monnier <at> IRO.UMontreal.CA>

Bug is archived. No further changes may be made.

Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#33796; Package emacs. (Tue, 18 Dec 2018 18:48:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Stefan Monnier <monnier <at> iro.umontreal.ca>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Tue, 18 Dec 2018 18:48:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: bug-gnu-emacs <at> gnu.org
Subject: 27.0.50; Use utf-8 is all our Elisp files
Date: Tue, 18 Dec 2018 13:46:45 -0500

[Message part 1 (text/plain, inline)]

Package: Emacs
Version: 27.0.50


Since Emacs-25, UTF-8 is the standard/default encoding for Elisp files.
The attached patch changes the few non-utf-8 Elisp files to use utf-8.

AFAICT, this patch is safe in the sense that the resulting .elc files
are identical (except for titdic-cnv.elc obviously, since I not only
changed the encoding but also the code, but I also checked that the
change of encoding itself does not affect the resulting .elc file).
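
One way such a byte-for-byte check could be scripted is sketched below.
It assumes two build trees are available, one compiled before the patch and
one after; the function name `differing_elc` and the tree layout are
hypothetical and not taken from the patch.

```python
import filecmp
from pathlib import Path
from typing import List


def differing_elc(before: Path, after: Path) -> List[str]:
    """Return relative paths of .elc files that differ between two trees.

    A file that exists in only one of the trees is reported as differing.
    """
    rel = {p.relative_to(before) for p in before.rglob("*.elc")}
    rel |= {p.relative_to(after) for p in after.rglob("*.elc")}
    changed = []
    for r in sorted(rel):
        a, b = before / r, after / r
        # shallow=False forces a comparison of the file contents
        # instead of trusting size and modification time.
        if not (a.is_file() and b.is_file() and filecmp.cmp(a, b, shallow=False)):
            changed.append(r.as_posix())
    return changed
```

An empty result would mean the encoding change left every compiled file
untouched; any path that is listed deserves a closer look.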

In this patch, I made titdic-cnv.el use utf-8-emacs instead of utf-8
since it includes chars that can't be encoded with utf-8.  I'm not sure
why the same does not apply to the files it generates, but in my tests all
the quail files it generates can use utf-8 (rather than utf-8-emacs)
without affecting the generated .elc files (although the non-utf-8
chars of titdic-cnv.el seem to be inserted into some of the generated files
according to my reading of the code).

Any comments on the patch, or objection to installing it?


        Stefan

[0001-Convert-remaining-non-utf-8-Elisp-files-to-utf-8.patch (application/octet-stream, inline)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33796; Package emacs. (Tue, 18 Dec 2018 19:23:02 GMT) Full text and rfc822 format available.

Message #8 received at 33796 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 33796 <at> debbugs.gnu.org
Subject: Re: bug#33796: 27.0.50; Use utf-8 is all our Elisp files
Date: Tue, 18 Dec 2018 21:22:02 +0200

> From: Stefan Monnier <monnier <at> iro.umontreal.ca>
> Date: Tue, 18 Dec 2018 13:46:45 -0500
> 
> Since Emacs-25, UTF-8 is the standard/default encoding for Elisp files.
> The attached patch changes the few non-utf-8 Elisp files to use utf-8.
> 
> AFAICT, this patch is safe in the sense that the resulting .elc files
> are identical (except for titdic-cnv.elc obviously, since I not only
> changed the encoding but also the code, but I also checked that the
> change of encoding itself does not affect the resulting .elc file).

The .elc files are identical, but visiting the .el files will (or
might) use different fonts, because the charset information is lost.
(You will see that I jumped through some hoops to do something similar
with etc/HELLO.)

So I don't think we should make this change without considering
whether the charset information is as important nowadays as it was
back then.  And I'm not really sure who to ask about this.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33796; Package emacs. (Tue, 18 Dec 2018 19:47:01 GMT) Full text and rfc822 format available.

Message #11 received at 33796 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: "K. Handa" <handa <at> gnu.org>, 33796 <at> debbugs.gnu.org
Subject: Re: bug#33796: 27.0.50; Use utf-8 is all our Elisp files
Date: Tue, 18 Dec 2018 14:46:17 -0500

> The .elc files are identical, but visiting the .el files will (or
> might) use different fonts, because the charset information is lost.
> (You will see that I jumped through some hoops to do something similar
> with etc/HELLO.)

That's indeed what I understand of the situation.  But I don't think
it's a good reason to keep supporting non-utf-8 encoding for ever
(many/most programming languages only support a single encoding,
typically ASCII or utf-8 nowadays).
Part of the purpose of this bug-report is to try and come up with a plan ;-)

Hence, there are some questions:
- Do those people who edit those files really care about the difference?
  After all, IIUC utf-8 is becoming standard even in the CJK world so
  maybe the change is not that terrible (or at least, users have gotten
  used to lowering their expectations in this respect).
- If the change is indeed problematic, can we adjust it by using
  a file-global language tag?
- If that's not sufficient, can we use a scheme like that
  of etc/HELLO but to keep the files directly usable as Elisp (so as to
  have our cake and eat it too)?

> So I don't think we should make this change without considering
> whether the charset information is as important nowadays as it was
> back then.

How 'bout installing the titdic-cnv.el part which changes the coding
system used for the generated quail files (being auto-generated their
rendering as source files shouldn't matter nearly as much since no one
should edit them)?

> And I'm not really sure who to ask about this.

I added Handa in the Cc, since I had forgotten to add him to the
X-Debbugs-Cc.


        Stefan

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33796; Package emacs. (Wed, 19 Dec 2018 17:56:01 GMT) Full text and rfc822 format available.

Message #14 received at 33796 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 33796 <at> debbugs.gnu.org
Subject: 27.0.50; Use utf-8 is all our Elisp files
Date: Wed, 19 Dec 2018 09:54:40 -0800

> I'm not really sure who to ask about this.

You can ask me (:-). Although I can't read east-Asian languages I do 
have significant experience with CJK text as my previous (15-year) job 
was in a company whose customers were almost all CJK and where CJK 
internationalization was essential and where I regularly dealt with 
weird encodings and displays. And this one is an easy call: for 
maintaining these particular files, UTF-8 is an improvement and this 
patch should go in.

To take just one example, titdic-cnv.el: people who are seriously 
maintaining it and who need to read the Chinese text will almost surely 
have their environment set up to display UTF-8 Chinese text well 
already. Furthermore, if you take a look at all the changes made to this 
file in the last decade, here are the statistics:

  edits contributor
     15 Author: Paul Eggert <eggert <at> cs.ucla.edu>
     10 Author: Glenn Morris <rgm <at> gnu.org>
      2 Author: Stefan Monnier <monnier <at> iro.umontreal.ca>
      2 Author: Juanma Barranquero <lekktu <at> gmail.com>
      1 Author: Phillip Lord <phillip.lord <at> russet.org.uk>
      1 Author: Kenichi Handa <handa <at> m17n.org>
      1 Author: Andreas Schwab <schwab <at> linux-m68k.org>

Only one edit was made by a CJK user, and handa's edit involved only 
ASCII characters. Switching this file to UTF-8 would not have made any 
of our maintenance any more difficult in the last decade.
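
A tally like the one above can be derived from `git log` output. The
following is a minimal sketch, assuming the default `git log` format in which
each commit carries an `Author:` line; the function name `tally_authors` is
hypothetical, and the exact command used for the table above is not stated.

```python
from collections import Counter
from typing import List, Tuple


def tally_authors(log_text: str) -> List[Tuple[str, int]]:
    """Count commits per author in plain `git log` output, most active first."""
    prefix = "Author: "
    counts = Counter(
        line[len(prefix):].strip()
        for line in log_text.splitlines()
        if line.startswith(prefix)
    )
    return counts.most_common()
```

The input could come from something like
`git log --since="10 years ago" -- FILE`, with FILE standing for the path of
the file of interest.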

Conversely, I commonly use tools like 'git grep' to look for issues in 
the code, and these tools mishandle non-UTF-8 files and I see mojibake 
on my screen because of this. So it will be a significant win for me 
(and I suspect others) when we switch these files to UTF-8.

To try to answer Stefan's questions:

> - Do those people who edit those files really care about the difference?

No, almost always: see above.

>   utf-8 is becoming standard even in the CJK world so
>   maybe the change is not that terrible (or at least, users have gotten
>   used to lowering their expectations in this respect).

Yes, that’s happened. I looked for recent reports about this, and it 
appears that the controversy is mostly over. For example, 
<https://gihyo.jp/lifestyle/serial/01/ganshiki-soushi/0069> (dated 2015) 
lamented the demise of Japanese Knoppix and said that Plamo Linux had 
problems with EUC-JP and suggested users switch to UTF-8. More recently 
<https://qiita.com/tenforward/items/5e353f290f0b401139cb> (dated this 
year) says that the choice of EUC-JP or UTF-8 is user-specific for Plamo 
Linux, and that applications like Firefox have problems with EUC-JP so 
discretion is advised if you choose EUC-JP. If even hardcore holdouts 
like Plamo are folding....

> - If the change is indeed problematic, can we adjust it by using
>   a file-global language tag?

I hope that’s not necessary, but it’d be OK if we have to do it.

> - If that's not sufficient, can we use a scheme like that
>   of etc/HELLO but to keep the files directly usable as Elisp (so as to
>   have our cake and eat it too)?

etc/HELLO is pretty much a disaster for me now, as I can’t use any tool 
other than Emacs to look at it, and even Emacs screws up if I do 
something like 'M-x grep RET hello etc/HELLO RET'. I’d rather not extend 
this disaster to other files.

PS. One minor suggestion for your patch: please also update the list of 
files in admin/notes/unicode to remove mention of the files in question.

PPS. How about also converting etc/tutorials/TUTORIAL.ja, 
lisp/leim/quail/hanja-jis.el, lisp/leim/quail/japanese.el, 
lisp/leim/quail/py-punct.el, and lisp/leim/quail/pypunct-b5.el?

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33796; Package emacs. (Wed, 19 Dec 2018 18:13:01 GMT) Full text and rfc822 format available.

Message #17 received at 33796 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: monnier <at> iro.umontreal.ca, 33796 <at> debbugs.gnu.org
Subject: Re: 27.0.50; Use utf-8 is all our Elisp files
Date: Wed, 19 Dec 2018 20:11:49 +0200

> Cc: 33796 <at> debbugs.gnu.org, Eli Zaretskii <eliz <at> gnu.org>
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> Date: Wed, 19 Dec 2018 09:54:40 -0800
> 
>  > I'm not really sure who to ask about this.
> 
> You can ask me (:-). Although I can't read east-Asian languages I do 
> have significant experience with CJK text as my previous (15-year) job 
> was in a company whose customers were almost all CJK and where CJK 
> internationalization was essential and where I regularly dealt with 
> weird encodings and displays. And this one is an easy call: for 
> maintaining these particular files, UTF-8 is an improvement and this 
> patch should go in.

Thanks.

I could predict your answers in advance.  I need to hear a second
opinion, from someone who does read these languages, because the issue
at hand is how the charset information affects the font(s) selected
for displaying the text, and how important are the differences in
those fonts to CJK users.

> etc/HELLO is pretty much a disaster for me now, as I can’t use any tool 
> other than Emacs to look at it

??? It's a UTF-8 file with markup.  Do you have the same problems with
HTML and XML files?

(I'm not saying that we should use the same technique for Lisp files,
of course.)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33796; Package emacs. (Wed, 19 Dec 2018 21:17:01 GMT) Full text and rfc822 format available.

Message #20 received at 33796 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> IRO.UMontreal.CA>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 33796 <at> debbugs.gnu.org
Subject: Re: 27.0.50; Use utf-8 is all our Elisp files
Date: Wed, 19 Dec 2018 16:16:45 -0500

> PPS. How about also converting etc/tutorials/TUTORIAL.ja,
> lisp/leim/quail/hanja-jis.el, lisp/leim/quail/japanese.el,
> lisp/leim/quail/py-punct.el, and lisp/leim/quail/pypunct-b5.el?

I don't see how we'll ever get rid of support for iso-2022 encoding, so
I'm not terribly concerned about converting files like TUTORIAL.ja.
If you think it's a good idea, of course, I'm very much in favor of such
a change, but I focused on .el files because I'm interested in
standardizing Elisp files to utf-8 and getting rid of
load-with-code-conversion (a distant target, admittedly, but at least
I can see a path that can get us there).

I missed the above 4 Elisp files because my regexp fu was too weak.
I'll update my patch, thanks,


        Stefan
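
A search for the remaining non-utf-8 files can combine several signals
rather than rely on one regexp. The sketch below is only an illustration
under stated assumptions: the names `non_utf8_reason` and `scan` are
hypothetical, the cookie regexp is deliberately loose, and any coding name
starting with "utf-8" (including utf-8-emacs) is accepted.

```python
import re
from pathlib import Path
from typing import Dict, Optional

# Loose match for a "coding: NAME" cookie; an assumption, not Emacs's own parser.
COOKIE = re.compile(rb"coding:\s*([A-Za-z0-9_.-]+)")


def non_utf8_reason(data: bytes) -> Optional[str]:
    """Return why DATA does not look like plain UTF-8, or None if it does."""
    # A cookie normally sits in the first two lines or in the trailing
    # "Local Variables" block, so only those regions are searched.
    head = b"\n".join(data.split(b"\n", 2)[:2])
    tail = data[-3000:]
    for region in (head, tail):
        m = COOKIE.search(region)
        if m:
            name = m.group(1).decode("ascii").lower()
            if not name.startswith("utf-8"):
                return "cookie names " + name
    # ISO-2022 text is 7-bit and therefore passes UTF-8 validation;
    # the ESC bytes of its escape sequences give it away.
    if b"\x1b" in data:
        return "contains ESC (possible ISO-2022 escape sequence)"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "not valid UTF-8"
    return None


def scan(root: Path) -> Dict[str, str]:
    """Map each suspicious .el file under ROOT to the reason it was flagged."""
    found = {}
    for path in sorted(root.rglob("*.el")):
        reason = non_utf8_reason(path.read_bytes())
        if reason is not None:
            found[path.relative_to(root).as_posix()] = reason
    return found
```

Checking for ESC bytes matters because a file in a 7-bit ISO-2022 encoding
consists entirely of ASCII bytes, so a UTF-8 validity test alone would not
flag it.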

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33796; Package emacs. (Wed, 19 Dec 2018 22:15:02 GMT) Full text and rfc822 format available.

Message #23 received at 33796 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: monnier <at> iro.umontreal.ca, 33796 <at> debbugs.gnu.org
Subject: Re: 27.0.50; Use utf-8 is all our Elisp files
Date: Wed, 19 Dec 2018 14:13:59 -0800

On 12/19/18 10:11 AM, Eli Zaretskii wrote:
> I need to hear a second opinion,

That would actually be a third opinion, as Stefan's opinion surely 
counts too and he has good reasons to prefer UTF-8 here. And to some 
extent opinions should be weighted for the kind of maintenance that is 
actually done with these files as opposed to the rare cases where the 
font's style might annoy a language-expert developer if the wrong 
language environment were used.

>> etc/HELLO is pretty much a disaster for me now, as I can’t use any tool
>> other than Emacs to look at it
>
> ??? It's a UTF-8 file with markup.  Do you have the same problems with
> HTML and XML files?

No, because when I visit those files I see the same thing in my Emacs 
editing buffer that I see after using common keystrokes like 'C-x v =' 
or standard tools like "git diff", and it's easy to use Emacs to edit 
these files in the usual way without becoming expert in html-mode etc. 
In contrast, with etc/HELLO standard tools and common keystrokes give me 
gibberish, and one must gain expertise in enriched-mode to make 
nontrivial changes.

A primary goal of Emacs is to have source code that the user can change 
easily, and using enriched-text mode in etc/HELLO works against this. It 
might be OK just for that one file (as a demonstration of enriched-text 
mode perhaps) but as things stand we shouldn't let these issues infect 
the rest of the Emacs sources.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33796; Package emacs. (Thu, 20 Dec 2018 16:07:04 GMT) Full text and rfc822 format available.

Message #26 received at 33796 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>, Kenichi Handa <handa <at> gnu.org>
Cc: monnier <at> iro.umontreal.ca, 33796 <at> debbugs.gnu.org
Subject: Re: 27.0.50; Use utf-8 is all our Elisp files
Date: Thu, 20 Dec 2018 18:06:32 +0200

> Cc: monnier <at> iro.umontreal.ca, 33796 <at> debbugs.gnu.org
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> Date: Wed, 19 Dec 2018 14:13:59 -0800
> 
> On 12/19/18 10:11 AM, Eli Zaretskii wrote:
>  > I need to hear a second opinion,
> 
> That would actually be a third opinion, as Stefan's opinion surely 
> counts too and he has good reasons to prefer UTF-8 here.

Technically, it's the fourth, because my opinion should also count,
right?

But this is beside the point, because we need the opinion of people
who might be actually affected by the proposed change, and none of us
qualify.  All 3 of us simply don't care, because we don't read these
scripts and don't distinguish the various fonts used to display the
same Unicode codepoints under different cultural conventions.  At some
point in the past that distinction was very important.  If nowadays it
no longer is, then I see no problems making the change.  Otherwise,
the change will lose information important to some of our users.

We need someone to advise us what is the actual state of the affairs.
I hope Handa-san will (please don't drop him from the CC list).  Or
maybe someone here can propose other experts or even just users with
relevant experience.

> And to some extent opinions should be weighted for the kind of
> maintenance that is actually done with these files as opposed to the
> rare cases where the font's style might annoy a language-expert
> developer if the wrong language environment were used.

This is also beside the point, because we have nothing to weigh this
against for now.  When we do, we will.

>  >> etc/HELLO is pretty much a disaster for me now, as I can’t use any tool
>  >> other than Emacs to look at it
>  >
>  > ??? It's a UTF-8 file with markup.  Do you have the same problems with
>  > HTML and XML files?
> 
> No, because when I visit those files I see the same thing in my Emacs 
> editing buffer that I see after using common keystrokes like 'C-x v =' 
> or standard tools like "git diff", and it's easy to use Emacs to edit 
> these files in the usual way without becoming expert in html-mode etc. 
> In contrast, with etc/HELLO standard tools and common keystrokes give me 
> gibberish, and one must gain expertise in enriched-mode to make 
> nontrivial changes.

This line of reasoning makes little sense to me:

 . Displaying HELLO doesn't show "gibberish", it shows UTF-8 encoded
   text with pure-ASCII markup.  If your terminal can display these
   characters, you should see legible marked-up text, whereas the
   ISO-2022 encoded file of yore would display as illegible escape
   sequences.  But since in your opinion the current situation is a
   "disaster", you seem to be saying that we should go back to
   ISO-2022?
 . By the above reasoning, if Emacs is enhanced to interpret HTML/XML
   and show typefaces instead of markup, you will see that as a
   regression and complain that raw HTML files are "gibberish"?
 . You have find-file-literally to show you HELLO exactly as any
   text-mode tool will see it, if you really need that.
 . No experience in Enriched mode is needed to edit HELLO, you just
   need to apply text properties (via facemenu.el commands or the
   menu-bar's Edit->Text Properties menu).  And these properties are
   optional.

> A primary goal of Emacs is to have source code that the user can change 
> easily, and using enriched-text mode in etc/HELLO works against this. It 
> might be OK just for that one file (as a demonstration of enriched-text 
> mode perhaps) but as things stand we shouldn't let these issues infect 
> the rest of the Emacs sources.

etc/HELLO is not a demonstration of Enriched mode, it is a
demonstration of facilities to edit and display many different scripts
and character sets in the same buffer.  We use Enriched mode there
because we have no other feature which allows us to save 'charset'
text property to a disk file.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33796; Package emacs. (Thu, 20 Dec 2018 21:50:01 GMT) Full text and rfc822 format available.

Message #29 received at 33796 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>, Kenichi Handa <handa <at> gnu.org>
Cc: monnier <at> iro.umontreal.ca, 33796 <at> debbugs.gnu.org
Subject: Re: 27.0.50; Use utf-8 is all our Elisp files
Date: Thu, 20 Dec 2018 13:49:44 -0800

On 12/20/18 8:06 AM, Eli Zaretskii wrote:

> my opinion should also count, right?

Of course, although my impression was that you weren't expressing an 
opinion and were soliciting opinions. If your opinion is that we should 
not make the change, then of course that counts.

> we need the opinion of people
> who might be actually affected by the proposed change,

I assume you mean that we need the opinion of people who would be 
affected _negatively_. Stefan and I would actually be affected 
_positively_ by the proposed change, for the reasons we stated.

> All 3 of us simply don't care,

No, actually I do care. Non-UTF-8 source files are a real annoyance for 
me, on a fairly regular basis. Stefan seems to care too, though I 
suspect he doesn't care as much as I do.

>  . Displaying HELLO doesn't show "gibberish", it shows UTF-8 encoded
>    text with pure-ASCII markup.

You're right. My apologies: when I wrote "gibberish" I was looking at 
the output of "git diff emacs-26..master etc/HELLO", which does indeed 
display gibberish but that's not the current encoding's fault.

> But since in your opinion the current situation is a
>    "disaster", you seem to be saying that we should go back to ISO-2022?

Not at all, but I do think we should cut down on the unnecessary markup 
in that file. The markup should be used only when it helps. Text like 
"<x-charset><param>mule-unicode-0100-24ff</param> </x-charset>" is not 
helping anybody; the file should just contain " " there. Most of the 
markup in that file is not necessary for proper display, and just gets 
in the way when using tools other than Emacs.

>  . By the above reasoning, if Emacs is enhanced to interpret HTML/XML
>    and show typefaces instead of markup, you will see that as a
>    regression and complain that raw HTML files are "gibberish"?

I hope Emacs doesn't do any such thing by default. I often use Emacs to 
edit .html and .xml files, and if it attempted to render these files by 
default I would be inconvenienced. Presumably there would be an option 
to keep the old behavior, and I'd use that option.

>  . You have find-file-literally to show you HELLO exactly as any
>    text-mode tool will see it

No, because find-file-literally shows hard-to-read stuff like this:

</x-charset><x-charset><param>greek-iso8859-7</param>Greek 
(\316\265\316\273\316\273\316\267\316\275\316\271\316\272\316\254) 
\316\223\316\265\316\271\316\254 \317\203\316\261\317\202

which differs from (and is even worse than) what an ordinary tool like 
git or cat shows:

</x-charset><x-charset><param>greek-iso8859-7</param>Greek (ελληνικά)   
Γειά σας

It would be better to remove this particular markup, so that git etc. 
would show this:

Greek (ελληνικά)    Γειά σας

which is what Emacs ordinarily shows.
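
The backslash sequences in the find-file-literally display are the octal
values of the file's UTF-8 bytes, so the two renderings carry the same
content. A minimal sketch that turns such an escaped string back into text
(the function name `decode_octal_escapes` is hypothetical, and the input is
assumed to be ASCII with three-digit octal escapes):

```python
import re


def decode_octal_escapes(text: str) -> str:
    """Replace \\NNN octal escapes with the bytes they denote, then decode as UTF-8."""
    raw = re.sub(
        rb"\\([0-3][0-7]{2})",
        lambda m: bytes([int(m.group(1), 8)]),  # e.g. \316 -> byte 0xCE
        text.encode("ascii"),
    )
    return raw.decode("utf-8")


print(decode_octal_escapes(
    r"Greek (\316\265\316\273\316\273\316\267\316\275\316\271\316\272\316\254)"
))
```

Each Greek letter here occupies two bytes in UTF-8, which is why the escapes
come in pairs such as `\316\265` for ε.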

>  . No experience in Enriched mode is needed to edit HELLO, you just
>    need to apply text properties (via facemenu.el commands or the
>    menu-bar's Edit->Text Properties menu).  And these properties are
>    optional.

Let's leave most of them out then, as they're not working well in 
etc/HELLO. I don't use that menu, but I took your hint and just now 
tried it, by selecting the abovementioned word "ελληνικά" and menuing to 
Edit > Text Properties > Describe Properties, but all it said was 'Text 
content at position 1530: There are text properties here: unknown 
("x-charset")'. This missed the point that the word's character set is 
greek-iso8859-7 which is a special hack that hints to Emacs (and nobody 
else, I guess? I couldn't find documentation for this stuff even in the 
Emacs manuals) that the text should be displayed with a Greek font 
instead of the same Greek font that Emacs would be using anyway. And I 
didn't see an easy way to see visually that this (unnecessary) 
<x-charset> hint is misplaced, since it should be placed so that it 
applies only to the Greek text and not to the surrounding English text 
in the same line.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33796; Package emacs. (Fri, 21 Dec 2018 07:30:02 GMT) Full text and rfc822 format available.

Message #32 received at 33796 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: handa <at> gnu.org, monnier <at> iro.umontreal.ca, 33796 <at> debbugs.gnu.org
Subject: Re: 27.0.50; Use utf-8 is all our Elisp files
Date: Fri, 21 Dec 2018 09:29:36 +0200

> Cc: monnier <at> iro.umontreal.ca, 33796 <at> debbugs.gnu.org
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> Date: Thu, 20 Dec 2018 13:49:44 -0800
> 
> On 12/20/18 8:06 AM, Eli Zaretskii wrote:
> 
>  > my opinion should also count, right?
> 
> Of course, although my impression was that you weren't expressing an 
> opinion and were soliciting opinions.

Same as Stefan, actually: he asked whether there were objections.

>  > we need the opinion of people
>  > who might be actually affected by the proposed change,
> 
> I assume you mean that we need the opinion of people who would be 
> affected _negatively_.

Not necessarily.  I would actually like to hear opinions from people
who read CJK scripts who think the distinction no longer matters, not
these days.

>  > All 3 of us simply don't care,
> 
> No, actually I do care. Non-UTF-8 source files are a real annoyance for 
> me

This is a misunderstanding: by "don't care" I meant we don't care
which font is used to display a particular Unicode codepoint in the
Han area.

> I do think we should cut down on the unnecessary markup 
> in that file.

Agreed.

> The markup should be used only when it helps. Text like 
> "<x-charset><param>mule-unicode-0100-24ff</param> </x-charset>" is not 
> helping anybody; the file should just contain " " there.

There are only 2 such occurrences, so this isn't a grave problem.  I
will take a look when I have time.

> Most of the markup in that file is not necessary for proper display,
> and just gets in the way when using tools other than Emacs.

Which markup is not necessary for display, in your opinion?  I'm
surprised to hear that "most of it" is unnecessary, but maybe I'm
missing something.

>  >  . By the above reasoning, if Emacs is enhanced to interpret HTML/XML
>  >    and show typefaces instead of markup, you will see that as a
>  >    regression and complain that raw HTML files are "gibberish"?
> 
> I hope Emacs doesn't do any such thing by default.

Really?  Quite a few Emacs users think that it should, and that the
fact it doesn't is one of the significant deficiencies in Emacs, as
compared to other popular editors.

> </x-charset><x-charset><param>greek-iso8859-7</param>Greek (ελληνικά)   
> Γειά σας
> 
> It would be better to remove this particular markup, so that git etc. 
> would show this:
> 
> Greek (ελληνικά)    Γειά σας
> 
> which is what Emacs ordinarily shows.

That markup is precisely what keeps the charset properties on the
corresponding greetings.  Removing it would be losing information that
HELLO is trying to preserve.

> I don't use that menu, but I took your hint and just now 
> tried it, by selecting the abovementioned word "ελληνικά" and menuing to 
> Edit > Text Properties > Describe Properties, but all it said was 'Text 
> content at position 1530: There are text properties here: unknown 
> ("x-charset")'. This missed the point that the word's character set is 
> greek-iso8859-7

I cannot reproduce this.  That menu item invokes the command
describe-text-properties, which pops up the *Help* buffer, and the
text there says:

  Text content at position 1530:

  There are text properties here:
    charset              greek-iso8859-7

I wonder why you don't see that.  Is it possible that you are looking
at a file/buffer that was modified from its original contents?

> which is a special hack that hints to Emacs (and nobody else, I
> guess? I couldn't find documentation for this stuff even in the 
> Emacs manuals) that the text should be displayed with a Greek font 
> instead of the same Greek font that Emacs would be using anyway.

The charset property allows us to have a fontset that directs Emacs to
use specific fonts for specific character ranges.  See set-fontset-font.
I do agree that these issues are notoriously under-documented.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33796; Package emacs. (Fri, 21 Dec 2018 13:47:02 GMT) Full text and rfc822 format available.

Message #35 received at 33796 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> IRO.UMontreal.CA>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: handa <at> gnu.org, Paul Eggert <eggert <at> cs.ucla.edu>, 33796 <at> debbugs.gnu.org
Subject: Re: 27.0.50; Use utf-8 is all our Elisp files
Date: Fri, 21 Dec 2018 08:46:11 -0500

> Not necessarily.  I would actually like to hear opinions from people
> who read CJK scripts who think the distinction no longer matters, not
> these days.

BTW, while looking closer, I'm inclined to think that maybe their
opinion doesn't matter that much: while the general issue of font choice
for CJK text in Elisp files might really affect some users, in the
specific case of the files affected by this patch I believe this likely
isn't the case, because while there are affected *chars*, there is no
affected *text*.  More specifically, AFAICT the affected chars are all
part of the code and they represent themselves rather than being used as
a carrier for a specific meaning in a text (because all this code is
about how to insert specific chars).

[ Snipped the rest about etc/HELLO.  ]


        Stefan "I asked Chong what he thought about it but he said that
                he's not using CJK enough to be a good source of opinion"

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33796; Package emacs. (Fri, 21 Dec 2018 13:56:02 GMT) Full text and rfc822 format available.

Message #38 received at 33796 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: eggert <at> cs.ucla.edu
Cc: monnier <at> iro.umontreal.ca, 33796 <at> debbugs.gnu.org
Subject: Re: bug#33796: 27.0.50; Use utf-8 is all our Elisp files
Date: Fri, 21 Dec 2018 15:55:11 +0200

> Date: Fri, 21 Dec 2018 09:29:36 +0200
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: monnier <at> iro.umontreal.ca, 33796 <at> debbugs.gnu.org
> 
> > I don't use that menu, but I took your hint and just now 
> > tried it, by selecting the abovementioned word "ελληνικά" and menuing to 
> > Edit > Text Properties > Describe Properties, but all it said was 'Text 
> > content at position 1530: There are text properties here: unknown 
> > ("x-charset")'. This missed the point that the word's character set is 
> > greek-iso8859-7
> 
> I cannot reproduce this.  That menu item invokes the command
> describe-text-properties, which pops up the *Help* buffer, and the
> text there says:
> 
>   Text content at position 1530:
> 
> 
>   There are text properties here:
>     charset              greek-iso8859-7
> 
> I wonder why you don't see that.

I think I know the answer to that: you use Emacs 26 or older to look
at the file.  Only Emacs 27 supports the x-charset property in
Enriched mode.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33796; Package emacs. (Fri, 21 Dec 2018 15:56:01 GMT) Full text and rfc822 format available.

Message #41 received at 33796 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stefan Monnier <monnier <at> IRO.UMontreal.CA>
Cc: handa <at> gnu.org, eggert <at> cs.ucla.edu, 33796 <at> debbugs.gnu.org
Subject: Re: 27.0.50; Use utf-8 is all our Elisp files
Date: Fri, 21 Dec 2018 17:54:48 +0200

> From: Stefan Monnier <monnier <at> IRO.UMontreal.CA>
> Cc: Paul Eggert <eggert <at> cs.ucla.edu>, handa <at> gnu.org, 33796 <at> debbugs.gnu.org
> Date: Fri, 21 Dec 2018 08:46:11 -0500
> 
> BTW, while looking closer, I'm inclined to think that maybe their
> opinion doesn't matter that much: while the general issue of font choice
> for CJK text in Elisp files might really affect some users, in the
> specific case of the files affected by this patch I believe this likely
> isn't the case, because while there are affected *chars*, there is no
> affected *text*.

Maybe.  But I wouldn't jump to conclusions: it could be that the
aversion is (or was) to how the glyphs look, regardless of whether
they are part of meaningful text.

Reply sent to Stefan Monnier <monnier <at> IRO.UMontreal.CA>:
You have taken responsibility. (Tue, 08 Jan 2019 02:21:02 GMT) Full text and rfc822 format available.

Notification sent to Stefan Monnier <monnier <at> iro.umontreal.ca>:
bug acknowledged by developer. (Tue, 08 Jan 2019 02:21:02 GMT) Full text and rfc822 format available.

Message #46 received at 33796-done <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> IRO.UMontreal.CA>
To: 33796-done <at> debbugs.gnu.org
Subject: Re: bug#33796: 27.0.50; Use utf-8 is all our Elisp files
Date: Mon, 07 Jan 2019 21:20:36 -0500

Installed,


        Stefan

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 05 Feb 2019 12:24:04 GMT) Full text and rfc822 format available.

This bug report was last modified 6 years and 165 days ago.
