GNU bug report logs - #47856
auto-fill-mode vs. oriental languages: no respect


Package: emacs

Reported by: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>

Date: Sun, 18 Apr 2021 01:13:02 UTC

Severity: minor

Tags: moreinfo

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 47856 in the body.
You can then email your comments to 47856 AT debbugs.gnu.org in the normal way.




Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#47856; Package emacs. (Sun, 18 Apr 2021 01:13:02 GMT)

Acknowledgement sent to 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Sun, 18 Apr 2021 01:13:03 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>
To: bug-gnu-emacs <at> gnu.org
Subject: auto-fill-mode vs. oriental languages: no respect
Date: Sun, 18 Apr 2021 08:14:35 +0800
auto-fill-mode is an interactive compiled Lisp function in ‘simple.el’. ...

   When Auto Fill mode is enabled, inserting a space at a column
                                               ^^^^^[1]
   beyond ‘current-fill-column’ automatically breaks the line at a
   previous space.
   ^^^^^^^^^^^^^^[2]

That is all fine and dandy. But it has no respect for oriental
languages.

What if it bro
ke a line like th
is?

That's how it treats oriental languages.

What if emacs "helpfully" turned

...if temperature > temp
   then stop_nuclear_reactor()

into

...if temp
   erature > temp
   then stop_nuclear_reactor()

Syntax error. Meltdown!

It's as if you wore braces for five years, and along came emacs and
in one second put ugly gaps back in your teeth.

Here is my line, pre-victimization:

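 <p>那麼請貴司, 走過去台電大樓坐下來合作, 透過台電內部精準座標, 把這些孤兒門牌, 盡量一一歸案。</p>
(Roughly: "So we ask your company to go over to the Taipower building, sit down and cooperate, and, using Taipower's precise internal coordinates, match up these orphaned house-number plates one by one as far as possible.")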

And here is the mangled result:

 <p>那麼請貴司, 走過去台電大樓坐下來合作, 透過台電內部精準座標, 把
   這些孤兒門牌, 盡量一一歸案。</p>

In [2] we were promised "at a previous *space*".

Well, it lied.

We put plenty of *spaces* in the line,
just to feed its hungry mouth.
But no. It had to go rip into
"把這些孤兒門牌," ("these orphaned house-number plates")
and put a goofy gap in:
"把 這些孤兒門牌,"
That's how the rendered HTML will look.

Might as well make it

"把 這 些 孤 兒 門 牌,"

that way readers will think you were angry.

Also if the space is accidentally inserted before presidents' names,
that will mean you support/honor/respect them.

Sure, in English, President Nixon looks better than PresidentNixon.
So you will just have to take my word that I know what I am talking about.

I.e., it is super dangerous to go inserting random spaces in oriental
languages where there was none to begin with.

If there was one to begin with, then make it two or three, be my guest.
But don't just go semi-randomly put ting gaps in gran dmas' te eth. Than k you.

If there really is no acceptable place to break a line, then just don't
break it. It's the user's problem in that case.

Maybe it can play fast and loose with .txt files,
but with HTML it should know better, given how silly the result will look.

"Well browsers will break your oriental lines arbitrarily anyway. Bug closed."

Yes, but they do that at the ends of the lines they render. The damage that
emacs does to the source file ends up as an ugly mid-word gap.
(Unless in the rare case where the browser also breaks the line at
emacs' gap, in which case the reader will not notice any problem.)

[1] P.S., typing RET at the end of a line will destroy the line too, not just a space.
What's worse, you probably won't notice what happened, as your eyes are
already on the next line.

Seen with emacs 27.1, using -q, with LC_CTYPE=zh_TW.UTF-8.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#47856; Package emacs. (Sun, 18 Apr 2021 06:47:01 GMT)

Message #8 received at 47856 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>
Cc: 47856 <at> debbugs.gnu.org
Subject: Re: bug#47856: auto-fill-mode vs. oriental languages: no respect
Date: Sun, 18 Apr 2021 09:46:24 +0300
> From: 積丹尼 Dan Jacobson
>  <jidanni <at> jidanni.org>
> Date: Sun, 18 Apr 2021 08:14:35 +0800
> 
> [...]
> 
> Here is my line, pre-victimization:
> 
>  <p>那麼請貴司, 走過去台電大樓坐下來合作, 透過台電內部精準座標, 把這些孤兒門牌, 盡量一一歸案。</p>
> 
> And here is the mangled result:
> 
>  <p>那麼請貴司, 走過去台電大樓坐下來合作, 透過台電內部精準座標, 把
>    這些孤兒門牌, 盡量一一歸案。</p>
> 
> In [2] we were promised "at a previous *space*".

Emacs by default employs the "kinsoku" rules for breaking lines in CJK
languages, when it fills text.  Isn't the place where it breaks the
line in this case according to Kinsoku rules? If you set
enable-kinsoku to nil, don't you get what you expected?  If so, this
seems to be a documentation issue.




Added tag(s) moreinfo. Request was from Stefan Kangas <stefan <at> marxist.se> to control <at> debbugs.gnu.org. (Sun, 18 Apr 2021 16:28:03 GMT)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#47856; Package emacs. (Tue, 20 Apr 2021 01:14:02 GMT)

Message #13 received at 47856 <at> debbugs.gnu.org:

From: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 47856 <at> debbugs.gnu.org
Subject: Re: bug#47856: auto-fill-mode vs. oriental languages: no respect
Date: Tue, 20 Apr 2021 07:28:27 +0800
>>>>> "EZ" == Eli Zaretskii <eliz <at> gnu.org> writes:

EZ> Emacs by default employs the "kinsoku" rules for breaking lines in CJK
EZ> languages, when it fills text.  Isn't the place where it breaks the
EZ> line in this case according to Kinsoku rules? If you set
EZ> enable-kinsoku to nil, don't you get what you expected?  If so, this
EZ> seems to be a documentation issue.

Try it and you will see that whatever value enable-kinsoku has does not affect
this, nor bug#47857. And that is a good thing too. If it did we would
really be in trouble.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#47856; Package emacs. (Tue, 20 Apr 2021 05:21:02 GMT)

Message #16 received at 47856 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>
Cc: 47856 <at> debbugs.gnu.org
Subject: Re: bug#47856: auto-fill-mode vs. oriental languages: no respect
Date: Tue, 20 Apr 2021 01:20:02 -0400
> From: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>
> Cc: 47856 <at> debbugs.gnu.org
> Date: Tue, 20 Apr 2021 07:28:27 +0800
> 
> [...]
> 
> Try it and you will see that whatever value enable-kinsoku has does not affect
> this, nor #47857. And that is a good thing too. If it did we would
> really be in trouble.

Why are you so unhelpful? Don't you want this issue investigated and
resolved?  I asked the questions above because I don't speak Chinese
and cannot read the text you quoted in your report.  Please help me
understand the issue by answering those questions, and please provide
any additional information that could be of relevance, so that we
could make some progress here.  TIA.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#47856; Package emacs. (Tue, 20 Apr 2021 11:26:02 GMT)

Message #19 received at 47856 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: jidanni <at> jidanni.org
Cc: 47856 <at> debbugs.gnu.org
Subject: Re: bug#47856: auto-fill-mode vs. oriental languages: no respect
Date: Tue, 20 Apr 2021 14:24:45 +0300
> From: Eli Zaretskii <eliz <at> gnu.org>
> Date: Tue, 20 Apr 2021 01:20:02 -0400
> Cc: 47856 <at> debbugs.gnu.org
> 
> Please help me understand the issue by answering those questions,
> and please provide any additional information that could be of
> relevance, so that we could make some progress here.  TIA.

Never mind, I've managed to figure this out on my own.  So, back to
the TIL department:

 . For CJK scripts, Emacs's filling commands are allowed to break a
   line at _any_ character, not just at whitespace.  This is not just
   Emacs's invention: the Unicode Line-Breaking Algorithm mandates the
   same, albeit via special properties it assigns to CJK characters.

 . If you load 'kinsoku', Emacs will additionally refrain from
   breaking lines between some CJK characters, where there are special
   rules which prohibit that.  But even under the kinsoku rules, a line
   can still be broken almost anywhere in CJK text.

 . Conclusion: this is the intended behavior, a feature.

So yeah, it's a documentation issue, to be fixed soon enough.
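Eli's first two points can be illustrated with a toy greedy filler. This is a Python sketch under loudly stated assumptions, not Emacs's fill.el and not the Unicode Line Breaking Algorithm (UAX #14): "CJK" is taken to mean only the CJK Unified Ideographs block, display width comes from `unicodedata.east_asian_width`, and `NO_LINE_START` is a hypothetical, tiny stand-in for the real kinsoku tables. The names `fill`, `find_break`, and `first_overflow` are illustrative only.

```python
import unicodedata
from typing import List, Optional


def is_cjk(ch: str) -> bool:
    """Assumption: 'CJK' means only the CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


# Hypothetical, tiny stand-in for a kinsoku table:
# characters that should not begin a line.
NO_LINE_START = set("，。、！？」』）,.")


def width(ch: str) -> int:
    """Display columns: 2 for East Asian Wide/Fullwidth, else 1."""
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def first_overflow(text: str, fill_column: int) -> Optional[int]:
    """Index of the first character whose right edge passes fill_column."""
    col = 0
    for i, ch in enumerate(text):
        col += width(ch)
        if col > fill_column:
            return i
    return None


def find_break(text: str, over: int, cjk_breaks: bool,
               kinsoku: bool) -> Optional[int]:
    """Scan backwards from `over` for a legal break; the line becomes text[:b]."""
    for b in range(over, 0, -1):
        prev, nxt = text[b - 1], text[b]
        # Break before a space, or (optionally) next to any ideograph.
        ok = nxt == " " or (cjk_breaks and (is_cjk(prev) or is_cjk(nxt)))
        if ok and kinsoku and nxt in NO_LINE_START:
            ok = False  # kinsoku-style rule: don't start a line with this char
        if ok:
            return b
    return None


def fill(text: str, fill_column: int, cjk_breaks: bool = True,
         kinsoku: bool = False) -> List[str]:
    """Greedy fill: emit lines no wider than fill_column where possible."""
    lines: List[str] = []
    while text:
        over = first_overflow(text, fill_column)
        if over is None:
            lines.append(text)  # the rest fits
            break
        b = find_break(text, over, cjk_breaks, kinsoku)
        if b is None:
            lines.append(text)  # no legal break: leave the rest unbroken
            break
        lines.append(text[:b].rstrip(" "))
        text = text[b:].lstrip(" ")
    return lines
```

With `cjk_breaks=False` the sketch behaves like the docstring the reporter quoted: it breaks only at a space and, as the reporter asks, leaves an unbreakable run alone (`fill("ab 把這些孤兒", 8, cjk_breaks=False)` gives `["ab", "把這些孤兒"]`). With the default it may break next to any ideograph, which is consistent with Eli's explanation of the break after 把; `kinsoku=True` then only moves a break earlier so that a line does not start with, say, "。".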

> "Well browsers will break your oriental lines arbitrarily anyway. Bug closed."
> 
> Yes, but they do that at the ends of lines they render.

If this is what you want, it's a different feature: you need to turn
on word-wrap (M-x visual-line-mode RET), not auto-fill.  In Emacs 28,
there will be an additional option, word-wrap-by-category, which will
obey kinsoku rules in visual-line-mode.




Reply sent to Eli Zaretskii <eliz <at> gnu.org>:
You have taken responsibility. (Tue, 20 Apr 2021 12:15:02 GMT)

Notification sent to 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>:
bug acknowledged by developer. (Tue, 20 Apr 2021 12:15:02 GMT)

Message #24 received at 47856-done <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: jidanni <at> jidanni.org
Cc: 47856-done <at> debbugs.gnu.org
Subject: Re: bug#47856: auto-fill-mode vs. oriental languages: no respect
Date: Tue, 20 Apr 2021 15:14:34 +0300
> Date: Tue, 20 Apr 2021 14:24:45 +0300
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: 47856 <at> debbugs.gnu.org
> 
> So yeah, it's a documentation issue, to be fixed soon enough.

Now done, and closing the bug.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 19 May 2021 11:24:07 GMT)




GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.