GNU bug report logs -
#78213
should produce a diff with similar lines grouped together if possible
To reply to this bug, email your comments to 78213 AT debbugs.gnu.org.
Report forwarded to bug-diffutils <at> gnu.org: bug#78213; Package diffutils. (Fri, 02 May 2025 14:19:02 GMT)
Acknowledgement sent to Vincent Lefevre <vincent <at> vinc17.net>: New bug report received and forwarded. Copy sent to bug-diffutils <at> gnu.org. (Fri, 02 May 2025 14:19:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org:
"diff" should produce a diff with similar lines grouped together if
possible, so that it is easier to read and word-diff
friendly.
Issues can occur on text files with paragraphs separated by a blank
line (like in wiki source files and LaTeX files), where a modification
consists of
* some paragraph being split, and
* some of the following paragraphs being slightly modified.
In such a case, the slightly modified lines can be shifted relative to
their counterparts, which makes the diff hard to read and breaks
word-diff (e.g. when one opens the diff file with GNU Emacs).
I've attached an example:
* file1 and file2: the files to be diff'ed (file2 is similar to
file1, with the first paragraph split).
* file-bad.diff: the diff produced by diff 3.10.
* file-ok.diff: the diff I would expect.
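A hypothetical miniature of this situation can be set up as sketched below. The file contents are invented for illustration and are not the attached files. The sketch only builds two files of the described shape (first paragraph split, following paragraphs slightly modified) and runs "diff -u" on them; it does not assert that a given diff version shows the shifted grouping on such a small input.

```shell
#!/bin/sh
# Hypothetical miniature of the report; these are NOT the attached files.
workdir=$(mktemp -d) || exit 1

# file1: three one-line paragraphs separated by blank lines.
printf '%s\n' \
    'alpha beta gamma delta' '' \
    'one two three' '' \
    'red green blue' > "$workdir/file1"

# file2: the first paragraph is split in two and the following
# paragraphs are slightly modified at the end of the line.
printf '%s\n' \
    'alpha beta' '' \
    'gamma delta' '' \
    'one two three four' '' \
    'red green blue black' > "$workdir/file2"

# diff exits with status 1 when the files differ, so record the status
# instead of treating it as a failure.
status=0
diff -u "$workdir/file1" "$workdir/file2" > "$workdir/out.diff" || status=$?
cat "$workdir/out.diff"
```

In this miniature, as in the report, the only lines the two files share are the blank separators, so their pairing decides how the changed lines are grouped.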
--
Vincent Lefèvre <vincent <at> vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / Pascaline project (LIP, ENS-Lyon)
[file1 (text/plain, attachment)]
[file2 (text/plain, attachment)]
[file-bad.diff (text/plain, attachment)]
[file-ok.diff (text/plain, attachment)]
Information forwarded to bug-diffutils <at> gnu.org: bug#78213; Package diffutils. (Sat, 03 May 2025 08:14:02 GMT)
Message #8 received at 78213 <at> debbugs.gnu.org:
file-bad.diff was created with the '-u' option.
I used diff v3.6 and got identical results.
The only lines that match between file1 and file2 are the empty ones.
This bash command shows only an empty line in common:
    comm -12 -- <(sort -u file1) <(sort -u file2)
Note: The line terminator on the provided files is CRLF.
Long lines - Use 'less -S' :-)
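For shells without process substitution, the same check can be sketched with temporary files. The function name common_lines and the sample files below are hypothetical; LC_ALL=C is set as an assumption so that sort and comm agree on the collating order.

```shell
#!/bin/sh
# Print the distinct lines that occur in both files.
# POSIX sh variant of the bash one-liner, using temporary files
# instead of process substitution. "common_lines" is a hypothetical name.
common_lines() {
    tmp=$(mktemp -d) || return 1
    # comm needs both inputs sorted in the same collating order.
    LC_ALL=C sort -u -- "$1" > "$tmp/a"
    LC_ALL=C sort -u -- "$2" > "$tmp/b"
    LC_ALL=C comm -12 -- "$tmp/a" "$tmp/b"
    rm -rf "$tmp"
}

# Invented sample files: only the blank separator lines are shared.
workdir=$(mktemp -d) || exit 1
printf '%s\n' 'alpha beta gamma delta' '' 'one two three' > "$workdir/file1"
printf '%s\n' 'alpha beta' '' 'gamma delta' '' 'one two three four' > "$workdir/file2"

# Prints a single empty line, the only line the two files have in common.
common_lines "$workdir/file1" "$workdir/file2"
```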
Information forwarded to bug-diffutils <at> gnu.org: bug#78213; Package diffutils. (Sat, 03 May 2025 09:25:02 GMT)
Message #11 received at 78213 <at> debbugs.gnu.org:
On 2025-05-03 01:12:44 -0700, Robert Webb wrote:
> The diff to create file-bad.diff was with the '-u' option.
Yes, I forgot to mention that as this is what I *always* use
(and I always see diffs generated with this option).
> The only lines that match between the file1 and file2 are empty ones.
> This bash command shows only an empty line in common:
> comm -12 -- <(sort -u file1) <(sort -u file2)
>
> Note: The line terminator on the provided files is CRLF.
No, a single LF as usual (I suspect that some mail software converts
them to CRLF when saving).
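One way to settle which terminator a saved copy actually has is to count the lines that end in a carriage return. This is only a sketch: the function name count_crlf_lines and the two sample files are hypothetical.

```shell
#!/bin/sh
# Count the lines of a file that end in CR, i.e. CRLF-terminated lines.
# "count_crlf_lines" is a hypothetical helper name.
count_crlf_lines() {
    # Command substitution strips trailing newlines only, so the CR survives.
    cr=$(printf '\r')
    # grep -c prints 0 and exits non-zero when nothing matches;
    # that exit status is deliberately ignored here.
    grep -c -- "${cr}\$" "$1" || true
}

# Invented sample files with LF and CRLF terminators.
workdir=$(mktemp -d) || exit 1
printf 'one\n\ntwo\n' > "$workdir/lf.txt"
printf 'one\r\n\r\ntwo\r\n' > "$workdir/crlf.txt"

count_crlf_lines "$workdir/lf.txt"     # prints 0
count_crlf_lines "$workdir/crlf.txt"   # prints 3
```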
> Long lines - Use 'less -S' :-)
Well, be careful: the differences between the lines are at the end.
I generated the files with "lorem -p 5", as the goal was to generate
5 paragraphs (and there is no way to get shorter paragraphs), and then
slightly edited them. I now think that "lorem -s 5" (to generate
5 sentences in a single paragraph) would have been better here, since
I had to add blank lines for the testcase anyway.
Information forwarded to bug-diffutils <at> gnu.org: bug#78213; Package diffutils. (Sat, 03 May 2025 09:46:02 GMT)
Message #14 received at 78213 <at> debbugs.gnu.org:
On 2025-05-03 11:24:10 +0200, Vincent Lefevre wrote:
> On 2025-05-03 01:12:44 -0700, Robert Webb wrote:
> > Long lines - Use 'less -S' :-)
>
> Well, be careful: the differences between the lines are at the end.
> I generated the files with "lorem -p 5", as the goal was to generate
> 5 paragraphs (and there is no way to get shorter paragraphs), and then
> slightly edited them. I now think that "lorem -s 5" (to generate
> 5 sentences in a single paragraph) would have been better here, since
> I had to add blank lines for the testcase anyway.
Here's a new version of the testcase, with files having short lines.
Note: To obtain file-ok.diff, I first added a character in the
first empty line of "file2", generated the diff with "diff -u"
(with the added character, the issue with the first common empty
lines disappears, so that the diff is good), then removed the
character from the generated diff.
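The procedure in the note above can be sketched as a small shell function. The name diff_with_marker, the marker character "#" and the sample files are hypothetical, and the sketch assumes that no line of either file consists of the marker alone. The --label option is used so that the header names the original files rather than the temporary copy.

```shell
#!/bin/sh
# Diff file1 against a copy of file2 whose first empty line carries a
# marker character, then strip the marker from the unified diff.
# Assumption: no line in either file is exactly "#".
diff_with_marker() {
    marked=$(mktemp) || return 1
    # Put the marker on the first empty line only; copy the rest unchanged.
    awk '!seen && $0 == "" { print "#"; seen = 1; next } { print }' "$2" > "$marked"
    # The marked line is new relative to file1, so it shows up as "+#";
    # sed turns it back into an added empty line.
    diff -u --label "$1" --label "$2" "$1" "$marked" | sed 's/^+#$/+/'
    rm -f "$marked"
}

# Invented sample files of the shape described in the report.
workdir=$(mktemp -d) || exit 1
printf '%s\n' 'alpha beta gamma delta' '' 'one two three' '' 'red green blue' \
    > "$workdir/file1"
printf '%s\n' 'alpha beta' '' 'gamma delta' '' 'one two three four' '' \
    'red green blue black' > "$workdir/file2"

diff_with_marker "$workdir/file1" "$workdir/file2"
```

Because the marked line no longer matches any empty line of file1, diff cannot pair the first blank separators with each other, which is the effect the note relies on.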
[file1 (text/plain, attachment)]
[file2 (text/plain, attachment)]
[file-bad.diff (text/plain, attachment)]
[file-ok.diff (text/plain, attachment)]
Information forwarded to bug-diffutils <at> gnu.org: bug#78213; Package diffutils. (Sat, 03 May 2025 10:14:02 GMT)
Message #17 received at 78213 <at> debbugs.gnu.org:
Yes, I was mistaken about the line terminators. Your files were attached
as "Content-Type: text/plain; charset=us-ascii" with normal line endings.
Using Gmail (in Firefox), I tried saving all the attachments in a zip file,
and individually too. Either way the saved files had CRLF terminators, although
Gmail should have kept them as LF, at least when saving the individual files.
This bug report was last modified 3 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.