GNU bug report logs - #31787
Newline badly matched/substituted

Package: sed

Reported by: Gabriel Czernikier <gabocze <at> gmail.com>

Date: Mon, 11 Jun 2018 18:31:02 UTC

Severity: normal

Done: Assaf Gordon <assafgordon <at> gmail.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 31787 in the body.
You can then email your comments to 31787 AT debbugs.gnu.org in the normal way.

Report forwarded to bug-sed <at> gnu.org:
bug#31787; Package sed. (Mon, 11 Jun 2018 18:31:02 GMT)

Acknowledgement sent to Gabriel Czernikier <gabocze <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-sed <at> gnu.org. (Mon, 11 Jun 2018 18:31:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Gabriel Czernikier <gabocze <at> gmail.com>
To: bug-sed <at> gnu.org
Subject: Newline badly matched/substituted
Date: Mon, 11 Jun 2018 15:29:41 -0300
sed --version
sed (GNU sed) 4.2.2

uname -o
Cygwin

uname -r
1.7.28(0.271/5/3)


Given a file named threaddump.txt whose contents (delimited by ='s) are:
==========================
                at
org/springframework/security/ui/SpringSecurityFilter.doFilter(SpringSecurityFilter.java:53)[optimized]

                at
org/springframework/security/util/FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:406)[optimized]

==========================

The file uses only line feed as the line separator (i.e. there is no
carriage return at all) and contains only ASCII characters.

To be unambiguous, here is the same input printed through sed -n l
(delimited by ='s):
==========================
                at org/springframework/security/ui/SpringSecurityFilt\
er.doFilter(SpringSecurityFilter.java:53)[optimized]$
          $
                at org/springframework/security/util/FilterChainProxy\
$VirtualFilterChain.doFilter(FilterChainProxy.java:406)[optimized]$
          $
==========================

The wanted result is the input text with each newline replaced, one for
one, by, say, a literal X (without quotes). The result, delimited by ='s
and filtered through sed -n l, would be:
==========================
                at org/springframework/security/ui/SpringSecurityFilt\
er.doFilter(SpringSecurityFilter.java:53)X          X                \
at org/springframework/security/util/FilterChainProxy$VirtualFilterCh\
ain.doFilter(FilterChainProxy.java:406)X          X$
==========================

Of the several scripts that would seem appropriate for this, none
produces the result without stray newlines or extra X's. Each program
invocation is shown below, followed by its result.
sed '$!N; s/\n/X/g;' threaddump.txt | sed -n l
==========================
                at org/springframework/security/ui/SpringSecurityFilt\
er.doFilter(SpringSecurityFilter.java:53)X          $
                at org/springframework/security/util/FilterChainProxy\
$VirtualFilterChain.doFilter(FilterChainProxy.java:406)X          $
==========================

sed '$!N; s/$/X/mg;' threaddump.txt | sed -n l
==========================
                at org/springframework/security/ui/SpringSecurityFilt\
er.doFilter(SpringSecurityFilter.java:53)X$
          X$
                at org/springframework/security/util/FilterChainProxy\
$VirtualFilterChain.doFilter(FilterChainProxy.java:406)X$
          X$
==========================

sed '$!N; s/$/X/mg; s/\n/X/g;' threaddump.txt | sed -n l
==========================
                at org/springframework/security/ui/SpringSecurityFilt\
er.doFilter(SpringSecurityFilter.java:53)XX          X$
                at org/springframework/security/util/FilterChainProxy\
$VirtualFilterChain.doFilter(FilterChainProxy.java:406)XX          X$
==========================

sed '$!N; s/\n/X/g; s/$/X/mg;' threaddump.txt | sed -n l
==========================
                at org/springframework/security/ui/SpringSecurityFilt\
er.doFilter(SpringSecurityFilter.java:53)X          X$
                at org/springframework/security/util/FilterChainProxy\
$VirtualFilterChain.doFilter(FilterChainProxy.java:406)X          X$
==========================

Regards,
Gabriel Czernikier

Information forwarded to bug-sed <at> gnu.org:
bug#31787; Package sed. (Tue, 12 Jun 2018 08:50:01 GMT)

Message #8 received at 31787 <at> debbugs.gnu.org:

From: Assaf Gordon <assafgordon <at> gmail.com>
To: Gabriel Czernikier <gabocze <at> gmail.com>, 31787 <at> debbugs.gnu.org
Subject: Re: bug#31787: Newline badly matched/substituted
Date: Tue, 12 Jun 2018 02:49:47 -0600
tag 31787 notabug
close 31787
stop


Hello,

On 11/06/18 12:29 PM, Gabriel Czernikier wrote:
> sed --version
> sed (GNU sed) 4.2.2
> 
> uname -o
> Cygwin
> 
> uname -r
> 1.7.28(0.271/5/3)

First,
sed 4.2.2 is rather old (almost six years now).
sed 4.4 was released in 2017,
and sed 4.5 was released just 3 months ago.

Second,
Cygwin 1.7.28 also seems rather old (released in 2014).
The most recent version currently is 2.10.

I'm mentioning this because there have been some fundamental
changes in Cygwin's default handling of newlines
(and sed is directly affected). If you are already spending time
on sed+cygwin+newlines, it is likely worth
keeping an eye on the latest developments. For more details, see:
http://lists.gnu.org/archive/html/bug-sed/2017-05/msg00001.html
(the rest of that thread is also instructive).


> Given a file named threaddump.txt whose contents (delimited by ='s) are:
> ==========================
>                  at
> org/springframework/security/ui/SpringSecurityFilter.doFilter(SpringSecurityFilter.java:53)[optimized]
> 
>                  at
> org/springframework/security/util/FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:406)[optimized]
> 
> ==========================
> 
> The file uses only line feed as the line separator (i.e. there is no
> carriage return at all) and contains only ASCII characters.
> 
> To be unambiguous, here is the same input printed through sed -n l
> (delimited by ='s):
> ==========================
>                  at org/springframework/security/ui/SpringSecurityFilt\
> er.doFilter(SpringSecurityFilter.java:53)[optimized]$
>            $
>                  at org/springframework/security/util/FilterChainProxy\
> $VirtualFilterChain.doFilter(FilterChainProxy.java:406)[optimized]$
>            $
> ==========================
> 
> The wanted result is the input text with each newline replaced, one for
> one, by, say, a literal X (without quotes). The result, delimited by ='s
> and filtered through sed -n l, would be:
> ==========================
>                  at org/springframework/security/ui/SpringSecurityFilt\
> er.doFilter(SpringSecurityFilter.java:53)X          X                \
> at org/springframework/security/util/FilterChainProxy$VirtualFilterCh\
> ain.doFilter(FilterChainProxy.java:406)X          X$
> ==========================

Since you know exactly which ASCII value you want to replace,
perhaps using 'tr' would be easier?

Replacing the newline (ASCII 0x0A, octal \012) with another single
character can be done like so, regardless of which operating system
you are using or which line endings the file has:


    tr '\012' X < threaddump.txt > output.txt
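
As a quick sanity check, here is a minimal sketch of the same tr call run
on a tiny two-line stand-in input instead of threaddump.txt (the sample
text and the variable name "out" are only illustrative):

```shell
# Replace every line feed (octal 012) with a literal X.
# The two-line input is a made-up stand-in for threaddump.txt.
out=$(printf 'one\ntwo\n' | tr '\012' X)
printf '%s\n' "$out"    # prints: oneXtwoX
```

Note that tr also converts the final newline, so the output no longer
ends with one.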

---

Regarding your examples:

> Of the several scripts that would seem appropriate for this, none
> produces the result without stray newlines or extra X's. Each program
> invocation is shown below, followed by its result.[...]
> sed '$!N; s/\n/X/g;' threaddump.txt | sed -n l
> sed '$!N; s/$/X/mg;' threaddump.txt | sed -n l
> sed '$!N; s/$/X/mg; s/\n/X/g;' threaddump.txt | sed -n l
> sed '$!N; s/\n/X/g; s/$/X/mg;' threaddump.txt | sed -n l

There might be a misunderstanding of what "N" does.
"N" reads the next input line (just one line) and appends it to
the current buffer (called the "pattern space"). "N" does not restart the
cycle - it allows the sed program to continue to the next command.

So if the statement "$!N;" was meant as:
 "read all input lines and append them to one buffer,
  and only at the last line process them"
Then that is not what "N" does.
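
A small sketch of what "$!N" actually does, assuming GNU sed and a
made-up five-line input: each cycle joins only two lines, so the
substitution sees a single embedded newline, and the newline that sed
prints at the end of each cycle is left untouched.

```shell
# Each cycle holds just two input lines: "1\n2", then "3\n4", then "5".
pairs=$(printf '1\n2\n3\n4\n5\n' | sed '$!N; s/\n/X/g')
printf '%s\n' "$pairs"
# prints:
# 1X2
# 3X4
# 5
```

This is the pattern seen in the first reported result: an X after every
odd-numbered line and a real newline after every even-numbered one.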

Instead, the following two programs achieve it:

  $ seq 5 | sed 'H; $!d; x; l; s/\n// ; s/\n/***/g ; s/$/***/'
  \n1\n2\n3\n4\n5$
  1***2***3***4***5***

  $ seq 5 | sed ':x ; N; $!bx ; l; s/\n/***/g ; s/$/***/'
  1\n2\n3\n4\n5$
  1***2***3***4***5***

In the first program:
1. "H"  appends the line to the hold buffer.
2. "$!d" deletes the line and restarts the cycle
   (skipping all other commands), except on the last line.
3. "x" swaps the hold buffer and the pattern buffer.
   The hold buffer contains the entire file, and after "x"
   the pattern space will contain the entire file.
4. "l" - used just for illustrative purposes - it shows
   the content of the pattern space - which contains the entire input
   with newlines.
5. "s/\n//" - removes the first newline (which is a side effect of
   using H on the first line).
6. "s/\n/***/g" replaces all embedded newlines with a marker.
7. "s/$/***/" - appends the same marker at the end of the buffer, where
   the implied last newline would be.

In the second program:
1. ":x" - a label at the beginning of the program, will be used below.
2. "N"  - read the next line from the input and append to the buffer.
3. "$!bx" - jump to label "x" without restarting the cycle,
   effectively accumulating more lines due to the "N",
   until the last line.
The rest is like in the first program.
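
Applied to input shaped like the reported file, the second program might
look like the sketch below. It assumes GNU sed; the four-line input is a
made-up stand-in for threaddump.txt, and "joined" is just an illustrative
variable name.

```shell
# Made-up stand-in for threaddump.txt: two "at ..." lines, each followed
# by a line that holds a single space.
joined=$(printf ' at A\n \n at B\n \n' |
  sed ':x; N; $!bx; s/\n/X/g; s/$/X/')
printf '%s\n' "$joined"    # prints: " at AX X at BX X" (without the quotes)
```

sed still writes one newline after the final X, which matches the "X$"
at the end of the wanted result. One caveat: if the input has only a
single line, GNU sed's "N" finds no next line, prints the pattern space
and exits, so the substitutions would not run in that case.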

More information can be found here:
https://www.gnu.org/software/sed/manual/sed.html#Multiline-techniques

-----

Lastly,
The regex end-of-line anchor '$' is zero-width: it matches the position
just before a newline (or the end of the buffer), never the newline
character itself, so you can't use it to replace newlines, even with
"s///m".

Observe the following:

  $ seq 5 | sed ':x ; N;$!bx ; l; s/$/***/mg ; l'
  1\n2\n3\n4\n5$
  1***\n2***\n3***\n4***\n5***$
  1***
  2***
  3***
  4***
  5***

  $ seq 5 | sed ':x ; N;$!bx ; l; s/$/***/g ; l'
  1\n2\n3\n4\n5$
  1\n2\n3\n4\n5***$
  1
  2
  3
  4
  5***
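
The contrast can be put side by side in a short sketch (GNU sed assumed;
the three-line input and the variable names are only illustrative):

```shell
# "$" with the m flag matches just before each newline, so the newlines stay.
anchor_out=$(printf '1\n2\n3\n' | sed ':x; N; $!bx; s/$/X/mg')
# "\n" matches the newline characters themselves, so they are replaced.
newline_out=$(printf '1\n2\n3\n' | sed ':x; N; $!bx; s/\n/X/g')
printf '%s\n---\n%s\n' "$anchor_out" "$newline_out"
# prints:
# 1X
# 2X
# 3X
# ---
# 1X2X3
```

Only the "\n" form changes the number of lines; the "$" form merely
inserts text in front of each newline.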


----

I hope this resolves the issue.
I'm marking this item as closed, but discussion can continue by replying
to this thread.

regards,
 - assaf

Reply sent to Assaf Gordon <assafgordon <at> gmail.com>:
You have taken responsibility. (Mon, 08 Oct 2018 23:55:02 GMT)

Notification sent to Gabriel Czernikier <gabocze <at> gmail.com>:
bug acknowledged by developer. (Mon, 08 Oct 2018 23:55:02 GMT)

Message #13 received at 31787-done <at> debbugs.gnu.org:

From: Assaf Gordon <assafgordon <at> gmail.com>
To: 31787-done <at> debbugs.gnu.org
Subject: Re: bug#31787: Newline badly matched/substituted
Date: Mon, 8 Oct 2018 17:54:21 -0600
With no further replies, I'm closing this bug.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 06 Nov 2018 12:24:06 GMT)

This bug report was last modified 5 years and 172 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.