GNU bug report logs - #9286 fill-paragraph destroys URLs
Reported by: jidanni <at> jidanni.org
Date: Thu, 11 Aug 2011 21:30:02 UTC
Severity: minor
Tags: fixed, patch
Merged with 34463
Fixed in version 26.1
Done: Lars Ingebrigtsen <larsi <at> gnus.org>
Bug is archived. No further changes may be made.
Report forwarded to owner <at> debbugs.gnu.org, jidanni <at> jidanni.org, bug-gnu-emacs <at> gnu.org: bug#9286; Package emacs. (Thu, 11 Aug 2011 21:30:02 GMT)
Acknowledgement sent to jidanni <at> jidanni.org: New bug report received and forwarded. Copy sent to jidanni <at> jidanni.org, bug-gnu-emacs <at> gnu.org. (Thu, 11 Aug 2011 21:30:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org:
Gentlemen, watch as emacs' fill-paragraph hatefully victimizes this
http://goo.gl/rThbu URL below while leaving the others unscathed.
$ cat a.txt
CD 定義:
http://goo.gl/rThbu
國內代表性網站:
http://smcj.net/
http://ragii.com/ 另參:
http://www.facebook.com/tg.taiwan
http://www.flickr.com/groups/tg-taiwan/
Our membership target is a Taiwan audience at this time.
煩請通知社團管理員您真的是否確定要加入,以免 spam.
$ LC_CTYPE=zh_TW.UTF-8 emacs a.txt
M-q
CD 定義: http://goo.gl/rThbu國內代表性網站: http://smcj.net/
http://ragii.com/ 另參: http://www.facebook.com/tg.taiwan
http://www.flickr.com/groups/tg-taiwan/ Our membership target is a
Taiwan audience at this time. 煩請通知社團管理員您真的是否確定要加入,以
免 spam.
Emacs _just assumes_ it is OK to ram 'u' into '國' if it crosses a newline.
Allow us to hit M-q on
u 國 u 國 u 國 u 國 u 國 u 國
u 國 u 國 u 國 u 國 u 國 u 國
We come up with:
u 國 u 國 u 國 u 國 u 國 u 國u 國 u 國 u 國 u 國 u 國 u 國
No kidding, deep in one's essays Emacs is secretly destroying certain URLs as we speak.
My point is if emacs is brazen enough to squeeze out a newline, then it
should be brazen enough to squeeze out a space. But better yet don't be
brazen enough at all.
Further experiments
國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國
becomes:
國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國
國 國 國 國 國 國 國 國 國 國 國 國
but
國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國
國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國
becomes:
國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國
國 國 國 國 國 國 國 國 國 國 國 國國 國 國 國 國 國 國 國 國 國 國 國 國
國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國
OK, have it your way, but at least don't join Latin text such as u
directly onto Chinese text. P.S. Don't send me a fix just for me. I'm
reporting a bug, not asking for help.
Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org: bug#9286; Package emacs. (Sat, 20 Aug 2011 20:02:01 GMT)
Message #8 received at 9286 <at> debbugs.gnu.org:
If I am decoding the jidanni-speak correctly, his complaint is that doing M-q
on a buffer containing
asdf
國
turns the text into
asdf國
instead of what he wants:
asdf 國
This is because line joining omits the space if *either* character
adjacent to the newline has the ?| (line-breakable) category and
*either* has an entry in fill-nospace-between-words-table. To get the
behavior jidanni wants, we could change it so that *both* characters
must have both properties; see attached patch.
But I am not sure this is TRT in general. Handa-san, could you weigh in
with an opinion? Adding a space seems more or less correct to me, but I
am no expert.
*** lisp/textmodes/fill.el 2011-07-16 20:05:54 +0000
--- lisp/textmodes/fill.el 2011-08-20 19:52:41 +0000
***************
*** 482,491 ****
              (replace-match (get-text-property (match-beginning 0) 'fill-space))
            (let ((prev (char-before (match-beginning 0)))
                  (next (following-char)))
!             (if (and (or (aref (char-category-set next) ?|)
!                          (aref (char-category-set prev) ?|))
!                      (or (aref fill-nospace-between-words-table next)
!                          (aref fill-nospace-between-words-table prev)))
                  (delete-char -1))))))
  
      (goto-char from)
--- 482,491 ----
              (replace-match (get-text-property (match-beginning 0) 'fill-space))
            (let ((prev (char-before (match-beginning 0)))
                  (next (following-char)))
!             (if (and (aref (char-category-set next) ?|)
!                      (aref (char-category-set prev) ?|)
!                      (aref fill-nospace-between-words-table next)
!                      (aref fill-nospace-between-words-table prev))
                  (delete-char -1))))))
  
      (goto-char from)
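The difference between the two tests in the patch can be tried out on a pair of characters without filling anything. The following is a minimal sketch, not the code in fill.el: my-fill-drops-space-p is a hypothetical helper, and its RULE argument is invented for illustration. It uses only char-category-set and fill-nospace-between-words-table, as they appear in the patch.

```elisp
;; Sketch only: `my-fill-drops-space-p' is a hypothetical helper,
;; not part of Emacs.  It mirrors the two conditions in the patch.
(defun my-fill-drops-space-p (prev next rule)
  "Return non-nil if PREV and NEXT would be joined with no space.
PREV is the character before the newline, NEXT the one after it.
RULE is `either' (the test before the patch) or `both' (the
patched test)."
  (let ((prev-break (aref (char-category-set prev) ?|))
        (next-break (aref (char-category-set next) ?|))
        (prev-nospace (aref fill-nospace-between-words-table prev))
        (next-nospace (aref fill-nospace-between-words-table next)))
    (if (eq rule 'either)
        ;; Old test: one character with each property is enough.
        (and (or next-break prev-break)
             (or next-nospace prev-nospace))
      ;; Patched test: both characters need both properties.
      (and next-break prev-break next-nospace prev-nospace))))

;; Compare the rules on a Latin/Han pair and on a Han/Han pair.
(list (my-fill-drops-space-p ?f ?國 'either)
      (my-fill-drops-space-p ?f ?國 'both)
      (my-fill-drops-space-p ?國 ?國 'both))
```

If 國 has both properties and f has neither, as the report implies, the first and third calls return non-nil and the second returns nil: only the patched test keeps a space between asdf and 國, while two Han characters are still joined directly.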
Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org: bug#9286; Package emacs. (Sat, 20 Aug 2011 20:08:01 GMT)
Message #11 received at 9286 <at> debbugs.gnu.org:
CY> If I am decoding the jidanni-speak correctly
Yes, correct.
Merged 9286 34463. Request was from Glenn Morris <rgm <at> gnu.org> to control <at> debbugs.gnu.org. (Tue, 12 Feb 2019 22:14:02 GMT)
Added tag(s) patch. Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Wed, 09 Oct 2019 22:29:02 GMT)
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#9286; Package emacs. (Wed, 09 Oct 2019 22:31:02 GMT)
Message #18 received at 9286 <at> debbugs.gnu.org:
Chong Yidong <cyd <at> stupidchicken.com> writes:
> If I am decoding the jidanni-speak correctly, his complaint is doing M-q
> on a buffer containing
>
> asdf
> 國
>
> turns the text into
>
> asdf國
>
> instead of what he wants:
>
> asdf 國
>
> This is because line joining does not include a space if *either*
> character on each side of the newline has the ?| (line-breakable)
> category and an entry in fill-nospace-between-words-table. To get the
> behavior jidanni wants, we could change it so that *both* the characters
> must have this property; see attached patch.
>
> But I am not sure this is TRT in general. Handa-san, could you weigh in
> with an opinion? Adding a space seems more or less correct to me, but I
> am no expert.
This problem is still present in Emacs 27. This patch, from 2011, was
never applied. I think Chong's proposal sounds logical, but like him,
I'm (ahem) no expert.
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#9286; Package emacs. (Thu, 10 Oct 2019 07:44:02 GMT)
Message #21 received at 9286 <at> debbugs.gnu.org:
> From: Lars Ingebrigtsen <larsi <at> gnus.org>
> Date: Thu, 10 Oct 2019 00:30:54 +0200
> Cc: 34463 <at> debbugs.gnu.org, jidanni <at> jidanni.org, Kenichi Handa <handa <at> m17n.org>,
> 9286 <at> debbugs.gnu.org
>
> > This is because line joining does not include a space if *either*
> > character on each side of the newline has the ?| (line-breakable)
> > category and an entry in fill-nospace-between-words-table. To get the
> > behavior jidanni wants, we could change it so that *both* the characters
> > must have this property; see attached patch.
> >
> > But I am not sure this is TRT in general. Handa-san, could you weigh in
> > with an opinion? Adding a space seems more or less correct to me, but I
> > am no expert.
>
> This problem is still present in Emacs 27. This patch, from 2011, was
> never applied. I think Chong's proposal sounds logical, but like him,
> I'm (ahem) no expert.
Since Kenichi didn't respond, I think we should study what the Unicode
Line-breaking Algorithm has to say about that. Can you look there for
relevant guidance? We don't yet implement the complete algorithm, but
some of what they say could nevertheless be used to resolve this
issue.
Thanks.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#9286; Package emacs. (Fri, 11 Oct 2019 07:00:02 GMT)
Message #24 received at 9286 <at> debbugs.gnu.org:
Eli Zaretskii <eliz <at> gnu.org> writes:
> Since Kenichi didn't respond, I think we should study what the Unicode
> Line-breaking Algorithm has to say about that. Can you look there for
> relevant guidance? We don't yet implement the complete algorithm, but
> some of what they say could nevertheless be used to resolve this
> issue.
That would be this:
https://unicode.org/reports/tr14/
I have just skimmed it, but I can't see that it says anything helpful
about filling/folding lines.
If I read it correctly, then it's perfectly allowed to line-break
asdf國
into
asdf
國
But it doesn't say what software should do when filling
asdf
國
Presumably filling that into
asdf國
would be correct in many circumstances, but as Dan said, if it's really
http://google.com
國
then filling that into
http://google.com國
is most likely wrong. So if we want to be cautious, then applying
Chong's patch seems to be the right thing: Adding the space will lead
to things working more of the time, while the downside is that somebody
might prefer
asdf國
visually. I think.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#9286; Package emacs. (Sat, 23 Nov 2019 14:02:13 GMT)
Message #27 received at 9286 <at> debbugs.gnu.org:
Lars Ingebrigtsen <larsi <at> gnus.org> writes:
> That would be this:
>
> https://unicode.org/reports/tr14/
>
> I have just skimmed it, but I can't see that it says anything helpful
> about filling/folding lines.
Ah, this is all moot -- in Emacs 26, the
fill-separate-heterogeneous-words-with-space variable was introduced,
which gives the behaviour that Dan wants (and is similar to Chong's
patch, only guarded by that variable).
So I'm closing this bug report.
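For anyone who wants to see the effect of the variable without touching their own buffers, here is a minimal sketch that fills the URL example from this thread in a temporary buffer, once with the variable off and once with it on. It assumes Emacs 26.1 or later, where fill-separate-heterogeneous-words-with-space is defined; my-fill-string is a hypothetical helper, and the fill-column of 70 is an arbitrary choice.

```elisp
;; Sketch only: `my-fill-string' is a hypothetical helper, not part
;; of Emacs.  Needs Emacs 26.1 or later for the variable bound below.
(defun my-fill-string (text separate)
  "Fill TEXT in a temporary buffer and return the filled text.
SEPARATE is the value given to
`fill-separate-heterogeneous-words-with-space' while filling."
  (with-temp-buffer
    (insert text)
    (let ((fill-separate-heterogeneous-words-with-space separate)
          (fill-column 70))             ; arbitrary width for the demo
      (fill-region (point-min) (point-max)))
    (buffer-string)))

;; Same input, variable off and then on.
(list (my-fill-string "http://google.com\n國\n" nil)
      (my-fill-string "http://google.com\n國\n" t))
```

Going by the discussion above, the first result should run the URL straight into 國 and the second should keep a space between them. Setting the variable to t in an init file, or through M-x customize-variable, would make M-q behave the second way everywhere.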
Added tag(s) fixed. Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Sat, 23 Nov 2019 14:03:01 GMT)
bug marked as fixed in version 26.1, send any further explanations to 9286 <at> debbugs.gnu.org and jidanni <at> jidanni.org. Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Sat, 23 Nov 2019 14:03:04 GMT)
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 22 Dec 2019 12:24:05 GMT)