GNU bug report logs - #9286
fill-paragraph destroys URLs

Package: emacs;

Reported by: jidanni <at> jidanni.org

Date: Thu, 11 Aug 2011 21:30:02 UTC

Severity: minor

Tags: fixed, patch

Merged with 34463

Fixed in version 26.1

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 9286 in the body.
You can then email your comments to 9286 AT debbugs.gnu.org in the normal way.


Report forwarded to owner <at> debbugs.gnu.org, jidanni <at> jidanni.org, bug-gnu-emacs <at> gnu.org:
bug#9286; Package emacs. (Thu, 11 Aug 2011 21:30:02 GMT)

Acknowledgement sent to jidanni <at> jidanni.org:
New bug report received and forwarded. Copy sent to jidanni <at> jidanni.org, bug-gnu-emacs <at> gnu.org. (Thu, 11 Aug 2011 21:30:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: jidanni <at> jidanni.org
To: bug-gnu-emacs <at> gnu.org
Cc: handa <at> etl.go.jp, yamaoka <at> jpl.org
Subject: fill-paragraph destroys URLs
Date: Fri, 12 Aug 2011 05:25:41 +0800
Gentlemen, watch as Emacs's fill-paragraph hatefully victimizes the
http://goo.gl/rThbu URL below while leaving the others unscathed.

$ cat a.txt
CD 定義:
http://goo.gl/rThbu
國內代表性網站:
http://smcj.net/
http://ragii.com/ 另參:
http://www.facebook.com/tg.taiwan
http://www.flickr.com/groups/tg-taiwan/
Our membership target is a Taiwan audience at this time.
煩請通知社團管理員您真的是否確定要加入,以免 spam.

$ LC_CTYPE=zh_TW.UTF-8 emacs a.txt
M-q
CD 定義: http://goo.gl/rThbu國內代表性網站: http://smcj.net/
http://ragii.com/ 另參: http://www.facebook.com/tg.taiwan
http://www.flickr.com/groups/tg-taiwan/ Our membership target is a
Taiwan audience at this time. 煩請通知社團管理員您真的是否確定要加入,以
免 spam.

Emacs _just assumes_ it is OK to ram the final 'u' of the URL into '國' when a newline separates them.

Allow us to hit M-q on

u 國 u 國 u 國 u 國 u 國 u 國
u 國 u 國 u 國 u 國 u 國 u 國

We come up with:

u 國 u 國 u 國 u 國 u 國 u 國u 國 u 國 u 國 u 國 u 國 u 國

No kidding: deep in one's essays, Emacs is secretly destroying certain URLs as we speak.

My point is: if Emacs is brazen enough to squeeze out a newline, then it
should be brazen enough to squeeze out a space. But better yet, don't be
brazen at all.

Further experiments
國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國
becomes:
國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國
國 國 國 國 國 國 國 國 國 國 國 國
but
國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國
國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國
becomes:
國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國
國 國 國 國 國 國 國 國 國 國 國 國國 國 國 國 國 國 國 國 國 國 國 國 國
國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國

OK, have it your way, but at least don't join a character with the syntax
of u directly onto a Chinese character. P.S. Don't send me a fix just for
me. I'm reporting a bug, not asking for help.




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#9286; Package emacs. (Sat, 20 Aug 2011 20:02:01 GMT)

Message #8 received at 9286 <at> debbugs.gnu.org:

From: Chong Yidong <cyd <at> stupidchicken.com>
To: Kenichi Handa <handa <at> m17n.org>
Cc: 9286 <at> debbugs.gnu.org, jidanni <at> jidanni.org
Subject: Re: bug#9286: fill-paragraph destroys URLs
Date: Sat, 20 Aug 2011 15:58:52 -0400
If I am decoding the jidanni-speak correctly, his complaint is that doing
M-q on a buffer containing

asdf
國

turns the text into

asdf國

instead of what he wants:

asdf 國


This is because line joining does not include a space if *either*
character on each side of the newline has the ?| (line-breakable)
category and *either* has an entry in fill-nospace-between-words-table.
To get the behavior jidanni wants, we could change it so that *both*
characters must have both properties; see attached patch.

But I am not sure this is TRT in general.  Handa-san, could you weigh in
with an opinion?  Adding a space seems more or less correct to me, but I
am no expert.


*** lisp/textmodes/fill.el	2011-07-16 20:05:54 +0000
--- lisp/textmodes/fill.el	2011-08-20 19:52:41 +0000
***************
*** 482,491 ****
  	    (replace-match (get-text-property (match-beginning 0) 'fill-space))
  	  (let ((prev (char-before (match-beginning 0)))
  		(next (following-char)))
! 	    (if (and (or (aref (char-category-set next) ?|)
! 			 (aref (char-category-set prev) ?|))
! 		     (or (aref fill-nospace-between-words-table next)
! 			 (aref fill-nospace-between-words-table prev)))
  		(delete-char -1))))))
  
    (goto-char from)
--- 482,491 ----
  	    (replace-match (get-text-property (match-beginning 0) 'fill-space))
  	  (let ((prev (char-before (match-beginning 0)))
  		(next (following-char)))
! 	    (if (and (aref (char-category-set next) ?|)
! 		     (aref (char-category-set prev) ?|)
! 		     (aref fill-nospace-between-words-table next)
! 		     (aref fill-nospace-between-words-table prev))
  		(delete-char -1))))))
  
    (goto-char from)
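To make the difference between the two conditions in the hunk concrete, here is a minimal sketch that restates the test as two pure predicates. It is not code from fill.el: the `my-fill-` names, the (BREAKABLE . NOSPACE) cons representation, and the flag values assumed for ?f and ?國 are hypothetical and illustrative, not read from Emacs's actual category table or fill-nospace-between-words-table.

```elisp
;; -*- lexical-binding: t -*-
;; Hypothetical sketch, not code from fill.el.
;; Each character is reduced to a cons (BREAKABLE . NOSPACE):
;;   BREAKABLE stands in for (aref (char-category-set CHAR) ?|)
;;   NOSPACE   stands in for (aref fill-nospace-between-words-table CHAR)

(defun my-fill-drop-space-either-p (prev next)
  "Non-nil if the original rule drops the space between PREV and NEXT."
  ;; Mirrors the pre-patch test: each property may come from either side.
  (and (or (car next) (car prev))
       (or (cdr next) (cdr prev))))

(defun my-fill-drop-space-both-p (prev next)
  "Non-nil if the proposed rule drops the space between PREV and NEXT."
  ;; Mirrors the patched test: both sides must have both properties.
  (and (car next) (car prev)
       (cdr next) (cdr prev)))

;; Assumed flags: an ASCII letter such as ?f has neither property,
;; a Han character such as ?國 has both.
(defconst my-fill-latin-flags '(nil . nil))
(defconst my-fill-han-flags '(t . t))

;; "asdf" + "國": the original rule joins tightly, the patch keeps a space.
(my-fill-drop-space-either-p my-fill-latin-flags my-fill-han-flags) ; => t
(my-fill-drop-space-both-p my-fill-latin-flags my-fill-han-flags)   ; => nil
;; Han + Han still joins without a space under the patch.
(my-fill-drop-space-both-p my-fill-han-flags my-fill-han-flags)     ; => t
```

Under the "either" rule, one CJK neighbor is enough to swallow the space, which is how the trailing 'u' of http://goo.gl/rThbu ends up fused to 國.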





Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#9286; Package emacs. (Sat, 20 Aug 2011 20:08:01 GMT)

Message #11 received at 9286 <at> debbugs.gnu.org:

From: jidanni <at> jidanni.org
To: cyd <at> stupidchicken.com
Cc: 9286 <at> debbugs.gnu.org, handa <at> m17n.org
Subject: Re: bug#9286: fill-paragraph destroys URLs
Date: Sun, 21 Aug 2011 04:04:49 +0800
CY> If I am decoding the jidanni-speak correctly
Yes correct.




Merged 9286 34463. Request was from Glenn Morris <rgm <at> gnu.org> to control <at> debbugs.gnu.org. (Tue, 12 Feb 2019 22:14:02 GMT)

Added tag(s) patch. Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Wed, 09 Oct 2019 22:29:02 GMT)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#9286; Package emacs. (Wed, 09 Oct 2019 22:31:02 GMT)

Message #18 received at 9286 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Chong Yidong <cyd <at> stupidchicken.com>
Cc: 34463 <at> debbugs.gnu.org, 9286 <at> debbugs.gnu.org, jidanni <at> jidanni.org,
 Kenichi Handa <handa <at> m17n.org>
Subject: Re: bug#9286: fill-paragraph destroys URLs
Date: Thu, 10 Oct 2019 00:30:54 +0200
Chong Yidong <cyd <at> stupidchicken.com> writes:

> If I am decoding the jidanni-speak correctly, his complaint is doing M-q
> on a buffer containing
>
> asdf
> 國
>
> turns the text into
>
> asdf國
>
> instead of what he wants:
>
> asdf 國
>
> This is because line joining does not include a space if *either*
> character on each side of the newline has the ?| (line-breakable)
> category and an entry in fill-nospace-between-words-table.  To get the
> behavior jidanni wants, we could change it so that *both* the characters
> must have this property; see attached patch.
>
> But I am not sure this is TRT in general.  Handa-san, could you weigh in
> with an opinion?  Adding a space seems more or less correct to me, but I
> am no expert.

This problem is still present in Emacs 27.  This patch, from 2011, was
never applied.  I think Chong's proposal sounds logical, but like him,
I'm (ahem) no expert.

> *** lisp/textmodes/fill.el	2011-07-16 20:05:54 +0000
> --- lisp/textmodes/fill.el	2011-08-20 19:52:41 +0000
> ***************
> *** 482,491 ****
>   	    (replace-match (get-text-property (match-beginning 0) 'fill-space))
>   	  (let ((prev (char-before (match-beginning 0)))
>   		(next (following-char)))
> ! 	    (if (and (or (aref (char-category-set next) ?|)
> ! 			 (aref (char-category-set prev) ?|))
> ! 		     (or (aref fill-nospace-between-words-table next)
> ! 			 (aref fill-nospace-between-words-table prev)))
>   		(delete-char -1))))))
>
>     (goto-char from)
> --- 482,491 ----
>   	    (replace-match (get-text-property (match-beginning 0) 'fill-space))
>   	  (let ((prev (char-before (match-beginning 0)))
>   		(next (following-char)))
> ! 	    (if (and (aref (char-category-set next) ?|)
> ! 		     (aref (char-category-set prev) ?|)
> ! 		     (aref fill-nospace-between-words-table next)
> ! 		     (aref fill-nospace-between-words-table prev))
>   		(delete-char -1))))))
>
>     (goto-char from)

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#9286; Package emacs. (Thu, 10 Oct 2019 07:44:02 GMT)

Message #21 received at 9286 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 9286 <at> debbugs.gnu.org, 34463 <at> debbugs.gnu.org, cyd <at> stupidchicken.com,
 handa <at> m17n.org, jidanni <at> jidanni.org
Subject: Re: bug#9286: fill-paragraph destroys URLs
Date: Thu, 10 Oct 2019 10:43:07 +0300
> From: Lars Ingebrigtsen <larsi <at> gnus.org>
> Date: Thu, 10 Oct 2019 00:30:54 +0200
> Cc: 34463 <at> debbugs.gnu.org, jidanni <at> jidanni.org, Kenichi Handa <handa <at> m17n.org>,
>  9286 <at> debbugs.gnu.org
> 
> > This is because line joining does not include a space if *either*
> > character on each side of the newline has the ?| (line-breakable)
> > category and an entry in fill-nospace-between-words-table.  To get the
> > behavior jidanni wants, we could change it so that *both* the characters
> > must have this property; see attached patch.
> >
> > But I am not sure this is TRT in general.  Handa-san, could you weigh in
> > with an opinion?  Adding a space seems more or less correct to me, but I
> > am no expert.
> 
> This problem is still present in Emacs 27.  This patch, from 2011, was
> never applied.  I think Chong's proposal sounds logical, but like him,
> I'm (ahem) no expert.

Since Kenichi didn't respond, I think we should study what the Unicode
Line-breaking Algorithm has to say about that.  Can you look there for
relevant guidance?  We don't yet implement the complete algorithm, but
some of what they say could nevertheless be used to resolve this
issue.

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#9286; Package emacs. (Fri, 11 Oct 2019 07:00:02 GMT)

Message #24 received at 9286 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 9286 <at> debbugs.gnu.org, 34463 <at> debbugs.gnu.org, cyd <at> stupidchicken.com,
 handa <at> m17n.org, jidanni <at> jidanni.org
Subject: Re: bug#9286: fill-paragraph destroys URLs
Date: Fri, 11 Oct 2019 08:58:52 +0200
Eli Zaretskii <eliz <at> gnu.org> writes:

> Since Kenichi didn't respond, I think we should study what the Unicode
> Line-breaking Algorithm has to say about that.  Can you look there for
> relevant guidance?  We don't yet implement the complete algorithm, but
> some of what they say could nevertheless be used to resolve this
> issue.

That would be this:

https://unicode.org/reports/tr14/

I have just skimmed it, but I can't see that it says anything helpful
about filling/folding lines.

If I read it correctly, then it's perfectly allowed to line-break

asdf國

into

asdf
國

But it doesn't say what software should do when filling

asdf
國

Presumably filling that into

asdf國

would be correct in many circumstances, but as Dan (jidanni) said, if it's really

http://google.com
國

then filling that into 

http://google.com國

is most likely wrong.  So if we want to be cautious, then applying
Chong's patch seems to be the right thing:  Adding the space will lead
to things working more of the time, while the downside is that somebody
might prefer 

asdf國

visually.  I think.





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#9286; Package emacs. (Sat, 23 Nov 2019 14:02:13 GMT)

Message #27 received at 9286 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 9286 <at> debbugs.gnu.org, cyd <at> stupidchicken.com, handa <at> m17n.org,
 jidanni <at> jidanni.org, 34463 <at> debbugs.gnu.org
Subject: Re: bug#9286: fill-paragraph destroys URLs
Date: Sat, 23 Nov 2019 15:00:55 +0100
Lars Ingebrigtsen <larsi <at> gnus.org> writes:

> That would be this:
>
> https://unicode.org/reports/tr14/
>
> I have just skimmed it, but I can't see that it says anything helpful
> about filling/folding lines.

Ah, this is all moot -- in Emacs 26, the
fill-separate-heterogeneous-words-with-space variable was introduced,
which gives the behaviour that Dan wants (and is similar to Chong's
patch, only guarded by that variable).
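For readers who land here with the same problem, enabling the variable Lars mentions should bring the space back. A minimal init-file sketch, assuming the option is off by default (Lars describes the new behavior as guarded by it), that you want it globally, and that you run Emacs 26.1 or later:

```elisp
;; Opt in to separating "heterogeneous" words (e.g. Latin and CJK)
;; with a space when filling joins two lines.  Emacs 26.1 or later.
(setq fill-separate-heterogeneous-words-with-space t)
```

With this set, M-q on the asdf / 國 example should join the lines as "asdf 國" rather than "asdf國"; two adjacent CJK characters are still joined without a space.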

So I'm closing this bug report.





Added tag(s) fixed. Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Sat, 23 Nov 2019 14:03:01 GMT)

bug marked as fixed in version 26.1, send any further explanations to 9286 <at> debbugs.gnu.org and jidanni <at> jidanni.org Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Sat, 23 Nov 2019 14:03:04 GMT)

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 22 Dec 2019 12:24:05 GMT)


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.