GNU bug report logs - #78561
[PATCH] Add semantic linefeed support for paragraph filling


Package: emacs;

Reported by: Roi Martin <jroi.martin <at> gmail.com>

Date: Fri, 23 May 2025 09:59:02 UTC

Severity: normal

Tags: patch

To reply to this bug, email your comments to 78561 AT debbugs.gnu.org.


Report forwarded to mbork <at> mbork.pl, bug-gnu-emacs <at> gnu.org:
bug#78561; Package emacs. (Fri, 23 May 2025 09:59:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Roi Martin <jroi.martin <at> gmail.com>:
New bug report received and forwarded. Copy sent to mbork <at> mbork.pl, bug-gnu-emacs <at> gnu.org. (Fri, 23 May 2025 09:59:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Roi Martin <jroi.martin <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: [PATCH] Add semantic linefeed support for paragraph filling
Date: Fri, 23 May 2025 11:58:02 +0200
Tags: patch

This patch adds semantic linefeed support for paragraph filling.  The
functionality has been discussed in the emacs-devel mailing list in the
following threads:

- Fill paragraph using semantic linefeeds: https://lists.gnu.org/archive/html/emacs-devel/2025-03/msg00035.html
- [GNU ELPA] New package: semlf: https://lists.gnu.org/archive/html/emacs-devel/2025-03/msg00702.html

In the second thread we agreed on sending a patch to core instead of
adding a new package to GNU ELPA.

Given that this is a first version, I have not added any reference to
the manuals.  If you think it makes sense, please let me know and I'll
modify the patch accordingly.

What follows is a detailed explanation of the term semantic linefeeds,
so we have all the information in one single place.

The term "semantic linefeeds" or "semantic line breaks" refers to a set
of conventions for using insensitive vertical whitespace to structure
prose along semantic boundaries.

The concept was first introduced by Brian Kernighan in "UNIX for
Beginners" [1] in October 1974.

  Hints for Preparing Documents
  
  Most documents go through several versions (always more than you
  expected) before they are finally finished.  Accordingly, you should
  do whatever possible to make the job of changing them easy.
  
  First, when you do the purely mechanical operations of typing, type so
  subsequent editing will be easy.  Start each sentence on a new line.
  Make lines short, and break lines at natural places, such as after
  commas and semicolons, rather than randomly.  Since most people change
  documents by rewriting phrases and adding, deleting and rearranging
  sentences, these precautions simplify any editing you have to do
  later.

Semantic linefeeds are usually used with markup languages that are not
sensitive to newlines when exported to a different format (e.g. Org,
Texinfo, Markdown).

Let's say that we have the following paragraph in an Org document:

  Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod
  tempor.  Incididunt ut labore et dolore magna aliqua.  Ut enim ad minim
  veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea
  commodo consequat.

After filling the paragraph using semantic linefeeds, the result is:

  Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod
  tempor.
  Incididunt ut labore et dolore magna aliqua.
  Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi
  ut aliquip ex ea commodo consequat.

However, when exported, in both cases the result is:

  Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod
  tempor.  Incididunt ut labore et dolore magna aliqua.  Ut enim ad minim
  veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea
  commodo consequat.

So, what are the benefits?

One of the greatest benefits is that semantic linefeeds are "diff
friendly".

For example,

  -Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod
  -tempor.  Incididunt ut labore et dolore magna aliqua.  Ut enim ad minim
  -veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea
  -commodo consequat.
  +Lorem ipsum dolor sit amet, XXXXX consectetur adipiscing elit, sed do
  +eiusmod tempor.  Incididunt ut labore et dolore magna aliqua.  Ut enim
  +ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut
  +aliquip ex ea commodo consequat.

Versus,

  -Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod
  -tempor.
  +Lorem ipsum dolor sit amet, XXXXX consectetur adipiscing elit, sed do
  +eiusmod tempor.
   Incididunt ut labore et dolore magna aliqua.
   Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi
   ut aliquip ex ea commodo consequat.

Semantic linefeeds make it easier to spot that the word "XXXXX" was added
in the first line.

Also, they are convenient during code reviews.  Shorter diffs and
separating "ideas" with newlines allow reviewers to be more accurate when
adding comments.

The site "Semantic Line Breaks" [2] by Mattt and the blog post "Semantic
Linefeeds" [3] by Brandon Rhodes are both excellent references.

[1] https://web.archive.org/web/20130108163017if_/http://miffy.tom-yam.or.jp:80/2238/ref/beg.pdf
[2] https://sembr.org/
[3] https://rhodesmill.org/brandon/2012/one-sentence-per-line/

[0001-Add-semantic-linefeed-support-for-paragraph-filling.patch (text/patch, attachment)]

Message #8 received at 78561 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Roi Martin <jroi.martin <at> gmail.com>
Cc: mbork <at> mbork.pl, 78561 <at> debbugs.gnu.org
Subject: Re: bug#78561: [PATCH] Add semantic linefeed support for paragraph
 filling
Date: Fri, 23 May 2025 14:11:35 +0300
> Cc: Marcin Borkowski <mbork <at> mbork.pl>
> From: Roi Martin <jroi.martin <at> gmail.com>
> Date: Fri, 23 May 2025 11:58:02 +0200

Thanks.

> +(defun fill-paragraph-semlf (&optional justify)
> +  "Fill paragraph at or after point using semantic linefeeds.
> +
> +This function ensures that a newline character follows every
> +sentence, as punctuated by a period (.), exclamation mark (!), or
> +question mark (?).

This explanation of what "semantic linefeeds" are is a good starting
point, but it is not enough.  For starters, "ensures" hints but
doesn't say explicitly that if there's no newline there, it is
inserted.  Also, I think a URL to at least one site explaining what
"semantic linefeeds" are should be in the doc string.

> +	  (when (and (> (point) (line-beginning-position))
> +		     (< (point) (line-end-position)))
> +	    (delete-horizontal-space)
> +	    (newline)

Are you sure 'newline' is the right function to call here?  It doesn't
just insert the newline character, at least not in all the cases.
Perhaps inserting a literal newline character is better?




Message #11 received at 78561 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Roi Martin <jroi.martin <at> gmail.com>
Cc: Marcin Borkowski <mbork <at> mbork.pl>, 78561 <at> debbugs.gnu.org
Subject: Re: bug#78561: [PATCH] Add semantic linefeed support for paragraph
 filling
Date: Fri, 23 May 2025 11:04:48 -0400
> Given that this is a first version, I have not added any reference to
> the manuals.  If you think it makes sense, please let me know and I'll
> modify the patch accordingly.

Maybe a short version of the explanation you give below would be good to
have in the manual (though Eli suggests a URL instead, so maybe that's
good enough?).

> +(defun fill-paragraph-semlf (&optional justify)
> +  "Fill paragraph at or after point using semantic linefeeds.
> +
> +This function ensures that a newline character follows every
> +sentence, as punctuated by a period (.), exclamation mark (!), or
> +question mark (?).

This seems inaccurate: it just uses whichever definition of sentence is
used by `forward-sentence`, so it may ignore some of those chars or pay
attention to others.
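
As a rough illustration of that point, the sketch below counts how many
times `forward-sentence` has to be called to cross the same text under
each setting of `sentence-end-double-space`.  The helper name
`my-count-sentences` and the sample string are made up for this example
and are not part of the patch.

```elisp
;; Hypothetical helper, for illustration only; not part of the patch.
(defun my-count-sentences (text double-space)
  "Count the sentences `forward-sentence' finds in TEXT.
DOUBLE-SPACE is the value bound to `sentence-end-double-space'."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((sentence-end-double-space double-space)
          (count 0))
      ;; Each call moves point to the end of the next sentence, or to
      ;; the end of the paragraph when no further sentence end matches.
      (while (not (eobp))
        (forward-sentence)
        (setq count (1+ count)))
      count)))

;; The period after "One" is followed by a single space only.
(list (my-count-sentences "One. Two.  Three." t)
      (my-count-sentences "One. Two.  Three." nil))
```

With default settings one would expect the first count to be smaller
than the second, because "One. Two." is treated as a single sentence
when a double space is required, so a function built on
`forward-sentence` would not break the line there.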

> +If JUSTIFY is non-nil (interactively, with prefix argument), justify as
> +well.  If `sentence-end-double-space' is non-nil, then period followed
> +by one space does not end a sentence, so don't break a line there.  The
> +variable `fill-column' controls the width for filling."

I'd move the "The" to the last line.  🙂

> +  (interactive "P")
> +  (save-excursion
> +    (let ((end (progn
> +		 (fill-forward-paragraph 1)
> +		 (backward-word)
> +		 (end-of-line)
> +		 (point)))
> +	  (start (progn
> +		   (fill-forward-paragraph -1)
> +		   (forward-word)
> +		   (beginning-of-line)
> +		   (point)))
> +	  pfx)
> +      (with-restriction start end
> +	(let ((fill-column (point-max)))
> +	  (setq pfx (or (fill-region-as-paragraph (point-min) (point-max)) "")))
> +	(goto-char (point-min))
> +	(while (not (eobp))
> +	  (let ((fill-prefix pfx))
> +	    (fill-region-as-paragraph (point)
> +				      (progn (forward-sentence) (point))
> +				      justify))
> +	  (when (and (> (point) (line-beginning-position))
> +		     (< (point) (line-end-position)))
> +	    (delete-horizontal-space)
> +	    (newline)
> +	    (insert pfx))))))
> +  t)

Please try and separate it into a `fill-region-semlf` function and then
another one which applies it to a paragraph, so that it can also be used
to fill a specific user-specified region (or the whole buffer).
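
One possible shape for that split is sketched below.  It reuses the loop
from the patch as posted, with a literal newline inserted instead of
calling `newline`, and assumes the region holds a single paragraph.  The
`my-` prefixed names are hypothetical placeholders, not the names or the
code that any later version of the patch uses.

```elisp
;; Sketch only: the `my-' names are hypothetical, not from the patch.
(defun my-fill-region-semlf (start end &optional justify)
  "Fill the text between START and END, one sentence per line.
Sentence boundaries are whatever `forward-sentence' recognizes.
The region is treated as a single paragraph."
  (interactive "*r\nP")
  (save-excursion
    (save-restriction
      (narrow-to-region start end)
      ;; First join the text into one long line and remember the
      ;; fill prefix that was used, if any.
      (let ((pfx (let ((fill-column (point-max)))
                   (or (fill-region-as-paragraph (point-min) (point-max))
                       ""))))
        (goto-char (point-min))
        (while (not (eobp))
          ;; Fill one sentence at a time, as in the posted patch.
          (let ((fill-prefix pfx))
            (fill-region-as-paragraph (point)
                                      (progn (forward-sentence) (point))
                                      justify))
          ;; If more text follows on the same line, break the line here.
          (when (and (> (point) (line-beginning-position))
                     (< (point) (line-end-position)))
            (delete-horizontal-space)
            (insert "\n" pfx)))))))

(defun my-fill-paragraph-semlf (&optional justify)
  "Fill the paragraph at or after point, one sentence per line."
  (interactive "*P")
  (save-excursion
    ;; Paragraph bounds are computed the same way as in the patch.
    (let ((end (progn (fill-forward-paragraph 1)
                      (backward-word)
                      (end-of-line)
                      (point)))
          (start (progn (fill-forward-paragraph -1)
                        (forward-word)
                        (beginning-of-line)
                        (point))))
      (my-fill-region-semlf start end justify)))
  t)
```

With this arrangement the paragraph command only locates the paragraph,
and the region function can also be called directly on a region that the
user selected.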


        Stefan





Message #14 received at 78561 <at> debbugs.gnu.org (full text, mbox):

From: Roi Martin <jroi.martin <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: mbork <at> mbork.pl, monnier <at> iro.umontreal.ca, 78561 <at> debbugs.gnu.org
Subject: Re: bug#78561: [PATCH] Add semantic linefeed support for paragraph
 filling
Date: Sat, 24 May 2025 14:15:37 +0200
Eli Zaretskii <eliz <at> gnu.org> writes:

>> +(defun fill-paragraph-semlf (&optional justify)
>> +  "Fill paragraph at or after point using semantic linefeeds.
>> +
>> +This function ensures that a newline character follows every
>> +sentence, as punctuated by a period (.), exclamation mark (!), or
>> +question mark (?).
>
> This explanation of what "semantic linefeeds" are is a good starting
> point, but it is not enough.  For starters, "ensures" hints but
> doesn't say explicitly that if there's no newline there, it is
> inserted.  Also, I think a URL to at least one site explaining what
> "semantic linefeeds" are should be in the doc string.

I would prefer to avoid depending on external URLs to explain the
concept.  I'd link to an external reference if, for instance, this
feature was backed by a standard located in a well-known site
(e.g. IETF RFCs).  In this case, the concept is quite simple and I agree
with Stefan in that we can provide our own interpretation in the manual
and link to the Info node from the doc string.  If you prefer to avoid
changing the manual until this is well tested, then we can provide a
more detailed explanation in the doc string itself.  What do you think?

>> +	  (when (and (> (point) (line-beginning-position))
>> +		     (< (point) (line-end-position)))
>> +	    (delete-horizontal-space)
>> +	    (newline)
>
> Are you sure 'newline' is the right function to call here?  It doesn't
> just insert the newline character, at least not in all the cases.
> Perhaps inserting a literal newline character is better?

The reason behind using 'newline' is to support documents that follow
other conventions to represent newlines (e.g. '\r\n' or '\r').  Does it
make sense?  Is this the right approach?




Message #17 received at 78561 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Roi Martin <jroi.martin <at> gmail.com>
Cc: mbork <at> mbork.pl, monnier <at> iro.umontreal.ca, 78561 <at> debbugs.gnu.org
Subject: Re: bug#78561: [PATCH] Add semantic linefeed support for paragraph
 filling
Date: Sat, 24 May 2025 16:02:15 +0300
> From: Roi Martin <jroi.martin <at> gmail.com>
> Cc: 78561 <at> debbugs.gnu.org, mbork <at> mbork.pl, monnier <at> iro.umontreal.ca
> Date: Sat, 24 May 2025 14:15:37 +0200
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> > This explanation of what "semantic linefeeds" are is a good starting
> > point, but it is not enough.  For starters, "ensures" hints but
> > doesn't say explicitly that if there's no newline there, it is
> > inserted.  Also, I think a URL to at least one site explaining what
> > "semantic linefeeds" are should be in the doc string.
> 
> I would prefer to avoid depending on external URLs to explain the
> concept.

Since the concept came from outside, why not?

> I'd link to an external reference if, for instance, this
> feature was backed by a standard located in a well-known site
> (e.g. IETF RFCs).  In this case, the concept is quite simple and I agree
> with Stefan in that we can provide our own interpretation in the manual
> and link to the Info node from the doc string.

There's no contradiction: we could describe this in our documentation
and also mention the external references.  We do that, for example,
for Unicode-related features.

> >> +	  (when (and (> (point) (line-beginning-position))
> >> +		     (< (point) (line-end-position)))
> >> +	    (delete-horizontal-space)
> >> +	    (newline)
> >
> > Are you sure 'newline' is the right function to call here?  It doesn't
> > just insert the newline character, at least not in all the cases.
> > Perhaps inserting a literal newline character is better?
> 
> The reason behind using 'newline' is to support documents that follow
> other conventions to represent newlines (e.g. '\r\n' or '\r').  Does it
> make sense?  Is this the right approach?

In Emacs, there's only one "newline convention", the one that uses the
newline (LFD) character.  The different end-of-line conventions are
supported during I/O: we "encode" newlines as CR-LF pair for Windows,
for example, when saving buffers to files, and "decode" CR-LF back
into a single newline when reading files into buffers.

By contrast, the 'newline' function does other things, in addition to
inserting the newline character; see its documentation for the
details.  It seems to me that some of those additional actions are not
something this feature will want, because this feature is _only_ about
where to break text into physical lines.
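
One of those additional actions can be observed with a small experiment.
The sketch below assumes `use-hard-newlines` is enabled in the buffer;
the function name `my-newline-demo` is hypothetical and only serves this
example.

```elisp
;; Hypothetical demo, for illustration only; not part of the patch.
(defun my-newline-demo ()
  "Return (VIA-NEWLINE VIA-INSERT) for a buffer with hard newlines on.
Each element is the value of the `hard' text property on the newline
character produced by `newline' and by `insert', respectively."
  (with-temp-buffer
    (setq-local use-hard-newlines t)
    (insert "one")
    ;; `newline' goes through `self-insert-command' and post-processing.
    (newline)
    (let ((via-newline (get-text-property (1- (point)) 'hard)))
      ;; `insert' only adds the characters it is given.
      (insert "two" "\n")
      (list via-newline (get-text-property (1- (point)) 'hard)))))
```

If `newline` marks its newline as hard, as its documentation describes,
the first element of the returned list should be non-nil and the second
should be nil.  Filling treats hard newlines differently from soft ones,
which is one reason a plain insertion may be preferable here.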




Message #20 received at 78561 <at> debbugs.gnu.org (full text, mbox):

From: Roi Martin <jroi.martin <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: mbork <at> mbork.pl, monnier <at> iro.umontreal.ca, 78561 <at> debbugs.gnu.org
Subject: Re: bug#78561: [PATCH] Add semantic linefeed support for paragraph
 filling
Date: Sat, 24 May 2025 15:38:40 +0200
Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: Roi Martin <jroi.martin <at> gmail.com>
>> Cc: 78561 <at> debbugs.gnu.org, mbork <at> mbork.pl, monnier <at> iro.umontreal.ca
>> Date: Sat, 24 May 2025 14:15:37 +0200
>> 
>> Eli Zaretskii <eliz <at> gnu.org> writes:
>> 
>> > This explanation of what "semantic linefeeds" are is a good starting
>> > point, but it is not enough.  For starters, "ensures" hints but
>> > doesn't say explicitly that if there's no newline there, it is
>> > inserted.  Also, I think a URL to at least one site explaining what
>> > "semantic linefeeds" are should be in the doc string.
>> 
>> I would prefer to avoid depending on external URLs to explain the
>> concept.
>
> Since the concept came from outside, why not?
>
>> I'd link to an external reference if, for instance, this
>> feature was backed by a standard located in a well-known site
>> (e.g. IETF RFCs).  In this case, the concept is quite simple and I agree
>> with Stefan in that we can provide our own interpretation in the manual
>> and link to the Info node from the doc string.
>
> There's no contradiction: we could describe this in our documentation
> and also mention the external references.  We do that, for example,
> for Unicode-related features.

OK.  I'll update the patch accordingly.

>> >> +	  (when (and (> (point) (line-beginning-position))
>> >> +		     (< (point) (line-end-position)))
>> >> +	    (delete-horizontal-space)
>> >> +	    (newline)
>> >
>> > Are you sure 'newline' is the right function to call here?  It doesn't
>> > just insert the newline character, at least not in all the cases.
>> > Perhaps inserting a literal newline character is better?
>> 
>> The reason behind using 'newline' is to support documents that follow
>> other conventions to represent newlines (e.g. '\r\n' or '\r').  Does it
>> make sense?  Is this the right approach?
>
> In Emacs, there's only one "newline convention", the one that uses the
> newline (LFD) character.  The different end-of-line conventions are
> supported during I/O: we "encode" newlines as CR-LF pair for Windows,
> for example, when saving buffers to files, and "decode" CR-LF back
> into a single newline when reading files into buffers.

Got it.  Thanks a lot for the explanation.  That simplifies things a
lot.

> By contrast, the 'newline' function does other things, in addition to
> inserting the newline character; see its documentation for the
> details.  It seems to me that some of those additional actions are not
> something this feature will want, because this feature is _only_ about
> where to break text into physical lines.

You are right.  I replaced it with

  (insert "\n")

Thanks!



