Package: emacs
Reported by: Roi Martin <jroi.martin <at> gmail.com>
Date: Fri, 23 May 2025 09:59:02 UTC
Severity: normal
Tags: patch
To reply to this bug, email your comments to 78561 AT debbugs.gnu.org.
mbork <at> mbork.pl, bug-gnu-emacs <at> gnu.org
:bug#78561
; Package emacs
.
Message #5 received at submit <at> debbugs.gnu.org (Fri, 23 May 2025 09:59:02 GMT):

From: Roi Martin <jroi.martin <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: [PATCH] Add semantic linefeed support for paragraph filling
Date: Fri, 23 May 2025 11:58:02 +0200
[Message part 1 (text/plain, inline)]
Tags: patch

This patch adds semantic linefeed support for paragraph filling. The
functionality has been discussed in the emacs-devel mailing list in the
following threads:

- Fill paragraph using semantic linefeeds: https://lists.gnu.org/archive/html/emacs-devel/2025-03/msg00035.html
- [GNU ELPA] New package: semlf: https://lists.gnu.org/archive/html/emacs-devel/2025-03/msg00702.html

In the second thread we agreed on sending a patch to core instead of
adding a new package to GNU ELPA.

Given that this is a first version, I have not added any reference to
the manuals. If you think it makes sense, please let me know and I'll
modify the patch accordingly.

What follows is a detailed explanation of the term semantic linefeeds,
so we have all the information in one single place.

The term "semantic linefeeds" or "semantic line breaks" refers to a set
of conventions for using insensitive vertical whitespace to structure
prose along semantic boundaries.

The concept was first introduced by Brian Kernighan in "UNIX for
Beginners" [1] in October 1974.

    Hints for Preparing Documents

    Most documents go through several versions (always more than you
    expected) before they are finally finished. Accordingly, you should
    do whatever possible to make the job of changing them easy.

    First, when you do the purely mechanical operations of typing, type so
    subsequent editing will be easy. Start each sentence on a new line.
    Make lines short, and break lines at natural places, such as after
    commas and semicolons, rather than randomly. Since most people change
    documents by rewriting phrases and adding, deleting and rearranging
    sentences, these precautions simplify any editing you have to do
    later.

Semantic linefeeds are usually used with markup languages that are not
sensitive to newlines when exported to a different format (e.g. Org,
Texinfo, Markdown).

Let's say that we have the following paragraph in an Org document:

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod
tempor. Incididunt ut labore et dolore magna aliqua. Ut enim ad minim
veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea
commodo consequat.

After filling the paragraph using semantic linefeeds, the result is:

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod
tempor.
Incididunt ut labore et dolore magna aliqua.
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi
ut aliquip ex ea commodo consequat.

However, when exported, in both cases the result is:

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod
tempor. Incididunt ut labore et dolore magna aliqua. Ut enim ad minim
veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea
commodo consequat.

So, what are the benefits?

One of the greatest benefits is that semantic linefeeds are "diff
friendly".

For example,

-Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod
-tempor. Incididunt ut labore et dolore magna aliqua. Ut enim ad minim
-veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea
-commodo consequat.
+Lorem ipsum dolor sit amet, XXXXX consectetur adipiscing elit, sed do
+eiusmod tempor. Incididunt ut labore et dolore magna aliqua. Ut enim
+ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut
+aliquip ex ea commodo consequat.

Versus,

-Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod
-tempor.
+Lorem ipsum dolor sit amet, XXXXX consectetur adipiscing elit, sed do
+eiusmod tempor.
 Incididunt ut labore et dolore magna aliqua.
 Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi
 ut aliquip ex ea commodo consequat.

Semantic linefeeds make it easier to spot that the word "XXXXX" was added
in the first line.

Also, they are convenient during code reviews. Shorter diffs and
separating "ideas" with newlines make it possible to be more accurate
when adding comments.

The site "Semantic Line Breaks" [2] by Mattt and the blog post "Semantic
Linefeeds" [3] by Brandon Rhodes are both excellent references.

[1] https://web.archive.org/web/20130108163017if_/http://miffy.tom-yam.or.jp:80/2238/ref/beg.pdf
[2] https://sembr.org/
[3] https://rhodesmill.org/brandon/2012/one-sentence-per-line/
[0001-Add-semantic-linefeed-support-for-paragraph-filling.patch (text/patch, attachment)]
Message #8 received at 78561 <at> debbugs.gnu.org (Fri, 23 May 2025 11:12:02 GMT):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Roi Martin <jroi.martin <at> gmail.com>
Cc: mbork <at> mbork.pl, 78561 <at> debbugs.gnu.org
Subject: Re: bug#78561: [PATCH] Add semantic linefeed support for paragraph filling
Date: Fri, 23 May 2025 14:11:35 +0300

> Cc: Marcin Borkowski <mbork <at> mbork.pl>
> From: Roi Martin <jroi.martin <at> gmail.com>
> Date: Fri, 23 May 2025 11:58:02 +0200
>
> This patch adds semantic linefeed support for paragraph filling. The
> functionality has been discussed in the emacs-devel mailing list in the
> following threads:
>
> - Fill paragraph using semantic linefeeds: https://lists.gnu.org/archive/html/emacs-devel/2025-03/msg00035.html
> - [GNU ELPA] New package: semlf: https://lists.gnu.org/archive/html/emacs-devel/2025-03/msg00702.html
>
> In the second thread we agreed on sending a patch to core instead of
> adding a new package to GNU ELPA.
>
> Given that this is a first version, I have not added any reference to
> the manuals. If you think it makes sense, please let me know and I'll
> modify the patch accordingly.
>
> What follows is a detailed explanation of the term semantic linefeeds,
> so we have all the information in one single place.
>
> The term "semantic linefeeds" or "semantic line breaks" refers to a set
> of conventions for using insensitive vertical whitespace to structure
> prose along semantic boundaries.
>
> The concept was first introduced by Brian Kernighan in "UNIX for
> Beginners" [1] in October 1974.
>
>     Hints for Preparing Documents
>
>     Most documents go through several versions (always more than you
>     expected) before they are finally finished. Accordingly, you should
>     do whatever possible to make the job of changing them easy.
>
>     First, when you do the purely mechanical operations of typing, type so
>     subsequent editing will be easy. Start each sentence on a new line.
>     Make lines short, and break lines at natural places, such as after
>     commas and semicolons, rather than randomly. Since most people change
>     documents by rewriting phrases and adding, deleting and rearranging
>     sentences, these precautions simplify any editing you have to do
>     later.
>
> Semantic linefeeds are usually used with markup languages that are not
> sensitive to newlines when exported to a different format (e.g. Org,
> Texinfo, Markdown).
>
> Let's say that we have the following paragraph in an Org document:
>
> Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod
> tempor. Incididunt ut labore et dolore magna aliqua. Ut enim ad minim
> veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea
> commodo consequat.
>
> After filling the paragraph using semantic linefeeds, the result is:
>
> Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod
> tempor.
> Incididunt ut labore et dolore magna aliqua.
> Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi
> ut aliquip ex ea commodo consequat.
>
> However, when exported, in both cases the result is:
>
> Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod
> tempor. Incididunt ut labore et dolore magna aliqua. Ut enim ad minim
> veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea
> commodo consequat.
>
> So, what are the benefits?
>
> One of the greatest benefits is that semantic linefeeds are "diff
> friendly".
>
> For example,
>
> -Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod
> -tempor. Incididunt ut labore et dolore magna aliqua. Ut enim ad minim
> -veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea
> -commodo consequat.
> +Lorem ipsum dolor sit amet, XXXXX consectetur adipiscing elit, sed do
> +eiusmod tempor. Incididunt ut labore et dolore magna aliqua. Ut enim
> +ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut
> +aliquip ex ea commodo consequat.
>
> Versus,
>
> -Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod
> -tempor.
> +Lorem ipsum dolor sit amet, XXXXX consectetur adipiscing elit, sed do
> +eiusmod tempor.
>  Incididunt ut labore et dolore magna aliqua.
>  Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi
>  ut aliquip ex ea commodo consequat.
>
> Semantic linefeeds make it easier to spot that the word "XXXXX" was added
> in the first line.
>
> Also, they are convenient during code reviews. Shorter diffs and
> separating "ideas" with newlines make it possible to be more accurate
> when adding comments.
>
> The site "Semantic Line Breaks" [2] by Mattt and the blog post "Semantic
> Linefeeds" [3] by Brandon Rhodes are both excellent references.
>
> [1] https://web.archive.org/web/20130108163017if_/http://miffy.tom-yam.or.jp:80/2238/ref/beg.pdf
> [2] https://sembr.org/
> [3] https://rhodesmill.org/brandon/2012/one-sentence-per-line/

Thanks.

> +(defun fill-paragraph-semlf (&optional justify)
> +  "Fill paragraph at or after point using semantic linefeeds.
> +
> +This function ensures that a newline character follows every
> +sentence, as punctuated by a period (.), exclamation mark (!), or
> +question mark (?).

This explanation of what is "semantic linefeeds" is a good starting
point, but it is not enough. For starters, "ensures" hints but
doesn't say explicitly that if there's no newline there, it is
inserted. Also, I think a URL to at least one site explaining what
"semantic linefeeds" are should be in the doc string.

> +          (when (and (> (point) (line-beginning-position))
> +                     (< (point) (line-end-position)))
> +            (delete-horizontal-space)
> +            (newline)

Are you sure 'newline' is the right function to call here? It doesn't
just insert the newline character, at least not in all the cases.
Perhaps inserting a literal newline character is better?
Message #11 received at 78561 <at> debbugs.gnu.org (Fri, 23 May 2025 15:06:02 GMT):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Roi Martin <jroi.martin <at> gmail.com>
Cc: Marcin Borkowski <mbork <at> mbork.pl>, 78561 <at> debbugs.gnu.org
Subject: Re: bug#78561: [PATCH] Add semantic linefeed support for paragraph filling
Date: Fri, 23 May 2025 11:04:48 -0400

> Given that this is a first version, I have not added any reference to
> the manuals. If you think it makes sense, please let me know and I'll
> modify the patch accordingly.

Maybe a short version of the explanation you give below would be good to
have in the manual (tho Eli suggests a URL instead, so maybe that's good
enough?).

> +(defun fill-paragraph-semlf (&optional justify)
> +  "Fill paragraph at or after point using semantic linefeeds.
> +
> +This function ensures that a newline character follows every
> +sentence, as punctuated by a period (.), exclamation mark (!), or
> +question mark (?).

This seems inaccurate: it just uses whichever definition of sentence is
used by `forward-sentence`, so it may ignore some of those chars or pay
attention to others.

> +If JUSTIFY is non-nil (interactively, with prefix argument), justify as
> +well. If `sentence-end-double-space' is non-nil, then period followed
> +by one space does not end a sentence, so don't break a line there. The
> +variable `fill-column' controls the width for filling."

I'd move the "The" to the last line. 🙂

> +  (interactive "P")
> +  (save-excursion
> +    (let ((end (progn
> +                 (fill-forward-paragraph 1)
> +                 (backward-word)
> +                 (end-of-line)
> +                 (point)))
> +          (start (progn
> +                   (fill-forward-paragraph -1)
> +                   (forward-word)
> +                   (beginning-of-line)
> +                   (point)))
> +          pfx)
> +      (with-restriction start end
> +        (let ((fill-column (point-max)))
> +          (setq pfx (or (fill-region-as-paragraph (point-min) (point-max)) "")))
> +        (goto-char (point-min))
> +        (while (not (eobp))
> +          (let ((fill-prefix pfx))
> +            (fill-region-as-paragraph (point)
> +                                      (progn (forward-sentence) (point))
> +                                      justify))
> +          (when (and (> (point) (line-beginning-position))
> +                     (< (point) (line-end-position)))
> +            (delete-horizontal-space)
> +            (newline)
> +            (insert pfx))))))
> +  t)

Please try and separate it into a `fill-region-semlf` function and then
another one which applies it to a paragraph, so that it can also be used
to fill a specific user-specified region (or the whole buffer).

        Stefan
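[Editorial sketch, not part of the thread.] The split Stefan asks for could look roughly like the following. This is only an illustration under stated assumptions: the names `my-fill-region-semlf' and `my-fill-paragraph-semlf' are hypothetical, the body is adapted from the loop quoted above rather than taken from any later revision of the patch, `save-restriction' plus `narrow-to-region' stand in for `with-restriction', and a literal "\n" is inserted as Eli suggested.

```elisp
;;; -*- lexical-binding: t -*-
;; Hypothetical names; adapted from the loop in the quoted patch.

(defun my-fill-region-semlf (from to &optional justify)
  "Fill the text between FROM and TO, breaking the line after each sentence.
Sentence boundaries are whatever `forward-sentence' considers them to be.
Optional JUSTIFY is passed on to `fill-region-as-paragraph'."
  (interactive "*r\nP")
  (save-excursion
    (save-restriction
      (narrow-to-region (min from to) (max from to))
      (let (pfx)
        ;; First pass: join the region into one long line and remember
        ;; the fill prefix that was detected.
        (let ((fill-column (point-max)))
          (setq pfx (or (fill-region-as-paragraph (point-min) (point-max))
                        "")))
        (goto-char (point-min))
        ;; Second pass: fill one sentence at a time, then break the line.
        (while (not (eobp))
          (let ((fill-prefix pfx))
            (fill-region-as-paragraph (point)
                                      (progn (forward-sentence) (point))
                                      justify))
          (when (and (> (point) (line-beginning-position))
                     (< (point) (line-end-position)))
            (delete-horizontal-space)
            (insert "\n")
            (insert pfx)))))))

(defun my-fill-paragraph-semlf (&optional justify)
  "Apply `my-fill-region-semlf' to the paragraph at or after point."
  (interactive "*P")
  (save-excursion
    (let ((end (progn (fill-forward-paragraph 1)
                      (backward-word)
                      (end-of-line)
                      (point)))
          (start (progn (fill-forward-paragraph -1)
                        (forward-word)
                        (beginning-of-line)
                        (point))))
      (my-fill-region-semlf start end justify)))
  t)
```

With this shape, the region function can be called on an arbitrary region or on (point-min)/(point-max) for the whole buffer, while the paragraph command only computes the bounds and delegates.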
Message #14 received at 78561 <at> debbugs.gnu.org (Sat, 24 May 2025 12:16:01 GMT):

From: Roi Martin <jroi.martin <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: mbork <at> mbork.pl, monnier <at> iro.umontreal.ca, 78561 <at> debbugs.gnu.org
Subject: Re: bug#78561: [PATCH] Add semantic linefeed support for paragraph filling
Date: Sat, 24 May 2025 14:15:37 +0200

Eli Zaretskii <eliz <at> gnu.org> writes:

>> +(defun fill-paragraph-semlf (&optional justify)
>> +  "Fill paragraph at or after point using semantic linefeeds.
>> +
>> +This function ensures that a newline character follows every
>> +sentence, as punctuated by a period (.), exclamation mark (!), or
>> +question mark (?).
>
> This explanation of what is "semantic linefeeds" is a good starting
> point, but it is not enough. For starters, "ensures" hints but
> doesn't say explicitly that if there's no newline there, it is
> inserted. Also, I think a URL to at least one site explaining what
> "semantic linefeeds" are should be in the doc string.

I would prefer to avoid depending on external URLs to explain the
concept. I'd link to an external reference if, for instance, this
feature was backed by a standard located in a well-known site
(e.g. IETF RFCs). In this case, the concept is quite simple and I agree
with Stefan in that we can provide our own interpretation in the manual
and link to the Info node from the doc string.

If you prefer to avoid changing the manual until this is well tested,
then we can provide a more detailed explanation in the doc string
itself.

What do you think?

>> +          (when (and (> (point) (line-beginning-position))
>> +                     (< (point) (line-end-position)))
>> +            (delete-horizontal-space)
>> +            (newline)
>
> Are you sure 'newline' is the right function to call here? It doesn't
> just insert the newline character, at least not in all the cases.
> Perhaps inserting a literal newline character is better?

The reason behind using 'newline' is to support documents that follow
other conventions to represent newlines (e.g. '\r\n' or '\r'). Does it
make sense? Is this the right approach?
Message #17 received at 78561 <at> debbugs.gnu.org (Sat, 24 May 2025 13:03:03 GMT):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Roi Martin <jroi.martin <at> gmail.com>
Cc: mbork <at> mbork.pl, monnier <at> iro.umontreal.ca, 78561 <at> debbugs.gnu.org
Subject: Re: bug#78561: [PATCH] Add semantic linefeed support for paragraph filling
Date: Sat, 24 May 2025 16:02:15 +0300

> From: Roi Martin <jroi.martin <at> gmail.com>
> Cc: 78561 <at> debbugs.gnu.org, mbork <at> mbork.pl, monnier <at> iro.umontreal.ca
> Date: Sat, 24 May 2025 14:15:37 +0200
>
> Eli Zaretskii <eliz <at> gnu.org> writes:
>
> > This explanation of what is "semantic linefeeds" is a good starting
> > point, but it is not enough. For starters, "ensures" hints but
> > doesn't say explicitly that if there's no newline there, it is
> > inserted. Also, I think a URL to at least one site explaining what
> > "semantic linefeeds" are should be in the doc string.
>
> I would prefer to avoid depending on external URLs to explain the
> concept.

Since the concept came from outside, why not?

> I'd link to an external reference if, for instance, this
> feature was backed by a standard located in a well-known site
> (e.g. IETF RFCs). In this case, the concept is quite simple and I agree
> with Stefan in that we can provide our own interpretation in the manual
> and link to the Info node from the doc string.

There's no contradiction: we could describe this in our documentation
and also mention the external references. We do that, for example,
for Unicode-related features.

> >> +          (when (and (> (point) (line-beginning-position))
> >> +                     (< (point) (line-end-position)))
> >> +            (delete-horizontal-space)
> >> +            (newline)
> >
> > Are you sure 'newline' is the right function to call here? It doesn't
> > just insert the newline character, at least not in all the cases.
> > Perhaps inserting a literal newline character is better?
>
> The reason behind using 'newline' is to support documents that follow
> other conventions to represent newlines (e.g. '\r\n' or '\r'). Does it
> make sense? Is this the right approach?

In Emacs, there's only one "newline convention", the one that uses the
newline (LFD) character. The different end-of-line conventions are
supported during I/O: we "encode" newlines as CR-LF pair for Windows,
for example, when saving buffers to files, and "decode" CR-LF back
into a single newline when reading files into buffers.

By contrast, the 'newline' function does other things, in addition to
inserting the newline character; see its documentation for the
details. It seems to me that some of those additional actions are not
something this feature will want, because this feature is _only_ about
where to break text into physical lines.
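[Editorial sketch, not part of the thread.] The point about end-of-line conventions being applied only during I/O can be checked with the standard coding-system functions. The helper names below are hypothetical and exist only for this illustration; `encode-coding-string', `decode-coding-string' and the `utf-8-unix', `utf-8-dos' and `utf-8-mac' coding systems are part of Emacs.

```elisp
;;; -*- lexical-binding: t -*-
;; Hypothetical helpers showing that the EOL convention belongs to the
;; coding system, not to the text held in a buffer or string.

(defun semlf-demo-eol-forms (text)
  "Return TEXT encoded with the Unix, DOS and Mac end-of-line conventions."
  (list (encode-coding-string text 'utf-8-unix)   ; LF stays LF
        (encode-coding-string text 'utf-8-dos)    ; LF becomes CR LF
        (encode-coding-string text 'utf-8-mac)))  ; LF becomes CR

(defun semlf-demo-decode-dos (bytes)
  "Decode BYTES as read from a DOS-style file.
Each CR LF pair is turned back into a single newline character."
  (decode-coding-string bytes 'utf-8-dos))
```

Because the conversion happens at this layer, code that edits buffer text, such as the filling function discussed here, only ever needs to insert "\n".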
Message #20 received at 78561 <at> debbugs.gnu.org (Sat, 24 May 2025 13:39:02 GMT):

From: Roi Martin <jroi.martin <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: mbork <at> mbork.pl, monnier <at> iro.umontreal.ca, 78561 <at> debbugs.gnu.org
Subject: Re: bug#78561: [PATCH] Add semantic linefeed support for paragraph filling
Date: Sat, 24 May 2025 15:38:40 +0200

Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: Roi Martin <jroi.martin <at> gmail.com>
>> Cc: 78561 <at> debbugs.gnu.org, mbork <at> mbork.pl, monnier <at> iro.umontreal.ca
>> Date: Sat, 24 May 2025 14:15:37 +0200
>>
>> Eli Zaretskii <eliz <at> gnu.org> writes:
>>
>> > This explanation of what is "semantic linefeeds" is a good starting
>> > point, but it is not enough. For starters, "ensures" hints but
>> > doesn't say explicitly that if there's no newline there, it is
>> > inserted. Also, I think a URL to at least one site explaining what
>> > "semantic linefeeds" are should be in the doc string.
>>
>> I would prefer to avoid depending on external URLs to explain the
>> concept.
>
> Since the concept came from outside, why not?
>
>> I'd link to an external reference if, for instance, this
>> feature was backed by a standard located in a well-known site
>> (e.g. IETF RFCs). In this case, the concept is quite simple and I agree
>> with Stefan in that we can provide our own interpretation in the manual
>> and link to the Info node from the doc string.
>
> There's no contradiction: we could describe this in our documentation
> and also mention the external references. We do that, for example,
> for Unicode-related features.

OK. I'll update the patch accordingly.

>> >> +          (when (and (> (point) (line-beginning-position))
>> >> +                     (< (point) (line-end-position)))
>> >> +            (delete-horizontal-space)
>> >> +            (newline)
>> >
>> > Are you sure 'newline' is the right function to call here? It doesn't
>> > just insert the newline character, at least not in all the cases.
>> > Perhaps inserting a literal newline character is better?
>>
>> The reason behind using 'newline' is to support documents that follow
>> other conventions to represent newlines (e.g. '\r\n' or '\r'). Does it
>> make sense? Is this the right approach?
>
> In Emacs, there's only one "newline convention", the one that uses the
> newline (LFD) character. The different end-of-line conventions are
> supported during I/O: we "encode" newlines as CR-LF pair for Windows,
> for example, when saving buffers to files, and "decode" CR-LF back
> into a single newline when reading files into buffers.

Got it. Thanks a lot for the explanation. That simplifies things a
lot.

> By contrast, the 'newline' function does other things, in addition to
> inserting the newline character; see its documentation for the
> details. It seems to me that some of those additional actions are not
> something this feature will want, because this feature is _only_ about
> where to break text into physical lines.

You are right. I replaced it with

    (insert "\n")

Thanks!