GNU bug report logs - #77462
"/s" instability? I think this is a bug.

Package: sed;

Reported by: "gnudborgonly <at> s-epost.no" <gnudborgonly <at> s-epost.no>

Date: Wed, 2 Apr 2025 15:21:01 UTC

Severity: normal

To reply to this bug, email your comments to 77462 AT debbugs.gnu.org.


Report forwarded to bug-sed <at> gnu.org:
bug#77462; Package sed. (Wed, 02 Apr 2025 15:21:02 GMT)

Acknowledgement sent to "gnudborgonly <at> s-epost.no" <gnudborgonly <at> s-epost.no>:
New bug report received and forwarded. Copy sent to bug-sed <at> gnu.org. (Wed, 02 Apr 2025 15:21:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: "gnudborgonly <at> s-epost.no" <gnudborgonly <at> s-epost.no>
To: bug-sed <at> gnu.org
Subject: "/s" instability? I think this is a bug.
Date: Wed, 02 Apr 2025 16:00:40 +0200
This seems to qualify as a bug:

The sed version included in my Linux seems to be unstable when using
the '\s' and/or '\S' regex extensions:

Example:

   id <at> pc:~$ echo '[ subCA2 ]' | sed -n 's|^\[\s\+\([^\s]\+[0-9]\+\)\s\+\]\s*$|\1|p'
   id <at> pc:~$ echo '[ subCA2 ]' | sed -n 's:^\[\s\+\([\S]\+[0-9]\+\)\s\+\]\s*$:\1:p'
   id <at> pc:~$ echo '[ rootCA1 ]' | sed -n 's|^\[\s\+\([^\s]\+[0-9]\+\)\s\+\]\s*$|\1|p'
   rootCA1
   
If I replace '\s' with '[ \t]' (and '\S' with '[^ \t]') things work as
expected:
   id <at> pc:~$ echo '[ subCA2 ]' | sed -n 's:^\[[ \t]\+\([^ \t]\+[0-9]\+\)[ \t]\+\][ \t]*$:\1:p'
   subCA2
   id <at> pc:~$ echo '[ rootCA2 ]' | sed -n 's:^\[[ \t]\+\([^ \t]\+[0-9]\+\)[ \t]\+\][ \t]*$:\1:p'
   rootCA2

-----------------------------------------------------------------------
My Linux version:
Debian 12.10 as of 2025-04-02, fully updated; terminal session in an
Xfce4 desktop environment.

My sed version:
   id <at> pc:~$ sed --version
   sed (GNU sed) 4.9
   Packaged by Debian
   Copyright (C) 2022 Free Software Foundation, Inc.
   License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
   This is free software: you are free to change and redistribute it.
   There is NO WARRANTY, to the extent permitted by law.
   
   Written by Jay Fenlason, Tom Lord, Ken Pizzini,
   Paolo Bonzini, Jim Meyering, and Assaf Gordon.
   
   This sed program was built with SELinux support.
   SELinux is disabled on this system.
   
   GNU sed home page: <https://www.gnu.org/software/sed/>.
   General help using GNU software: <https://www.gnu.org/gethelp/>.
   E-mail bug reports to: <bug-sed <at> gnu.org>.
   id <at> pc:~$ 
-----------------------------------------------------------------------

Regards,
Vidar Hanto
Norway

Information forwarded to bug-sed <at> gnu.org:
bug#77462; Package sed. (Wed, 02 Apr 2025 20:08:01 GMT)

Message #8 received at 77462 <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: "gnudborgonly <at> s-epost.no" <gnudborgonly <at> s-epost.no>
Cc: 77462 <at> debbugs.gnu.org
Subject: Re: bug#77462: "/s" instability? I think this is a bug.
Date: Wed, 2 Apr 2025 13:07:14 -0700
On Wed, Apr 2, 2025 at 8:41 AM gnudborgonly <at> s-epost.no via
<bug-sed <at> gnu.org> wrote:
> This seems to qualify as a bug:

Thanks for the report.
You can fix your usage by not putting "[...]" around those uses of "\s" or "\S".

> The sed version included in my Linux seems to be unstable when using
> the '\s' and/or '\S' regex extensions:
>
> Example:
>
>    id <at> pc:~$ echo '[ subCA2 ]' | sed -n 's|^\[\s\+\([^\s]\+[0-9]\+\)\s\+\]\s*$|\1|p'
>    id <at> pc:~$ echo '[ subCA2 ]' | sed -n 's:^\[\s\+\([\S]\+[0-9]\+\)\s\+\]\s*$:\1:p'

Drop the square brackets and it works. I.e., change the latter to this:

  $ echo '[ subCA2 ]' | sed -n 's:^\[\s\+\(\S\+[0-9]\+\)\s\+\]\s*$:\1:p'
  subCA2

Or better still, use sed's -E option to make the regular expression
more readable, eliding **six** backslashes:

  echo '[ subCA2 ]' | sed -nE 's:^\[\s+(\S+[0-9]+)\s+\]\s*$:\1:p'
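A minimal sketch of why the bracketed forms fail, assuming GNU sed 4.x as in this report: inside a bracket expression, "\s" is not a whitespace class but the two literal characters "\" and "s", so "[^\s]" cannot match the leading "s" of "subCA2". The POSIX class "[^[:space:]]" is the form that does work inside brackets. The variable names below are only for illustration.

```shell
# Assumes GNU sed (the \s and \+ extensions are not POSIX).

# [^\s] means "any character except a backslash or the letter s".
bracketed_s=$(echo 'subCA2'  | sed -n 's/^\([^\s]\+\)$/\1/p')   # contains "s": no match
bracketed_r=$(echo 'rootCA1' | sed -n 's/^\([^\s]\+\)$/\1/p')   # no "s", no "\": matches

# A POSIX character class is recognized inside a bracket expression.
posix_class=$(echo '[ subCA2 ]' |
  sed -n 's|^\[\s\+\([^[:space:]]\+[0-9]\+\)\s\+\]\s*$|\1|p')

printf 'bracketed_s=<%s> bracketed_r=<%s> posix_class=<%s>\n' \
  "$bracketed_s" "$bracketed_r" "$posix_class"
# prints: bracketed_s=<> bracketed_r=<rootCA1> posix_class=<subCA2>
```

If the negated class has to stay inside brackets, for example to combine it with other characters, "[^[:space:]]" is the variant to reach for; otherwise a bare "\S" as shown above is shorter.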

I admit this is an unpleasant irregularity in GNU sed's "\s" and "\S"
extensions: unlike in PCRE, they are not recognized inside "[...]".
This is one of the reasons I urge people to use Perl instead of sed
(another is that PCRE lets you use "\d" and non-greedy modifiers
like "\S+?" below):

  $ echo '[ subCA2 ]' | perl -nle 'm{^\[\s+(\S+?\d+)\s+\]\s*$} and print $1'
  subCA2

Searching sed's sources and documentation for any mention of how \S and \s
behave inside bracket expressions, I found nothing, but did see this 4.1
NEWS entry:

  * removed documentation for \s and \S which worked incorrectly

I'll leave this bug report open, because this is a wart that needs to
be documented.




Information forwarded to bug-sed <at> gnu.org:
bug#77462; Package sed. (Wed, 02 Apr 2025 20:32:04 GMT)

Message #11 received at 77462 <at> debbugs.gnu.org:

From: "gnudborgonly <at> s-epost.no" <gnudborgonly <at> s-epost.no>
To: 77462 <at> debbugs.gnu.org
Subject: Re: bug#77462: Acknowledgement ("\s" instability? I think this is a bug.)
Date: Wed, 02 Apr 2025 22:18:26 +0200
I think I have confirmed the bug. Consider this bash-script:

   #! /bin/bash
   
   for x in A B C D E F G H I J K L M N O P Q R S T U V W X Y Z \
            a b c d e f g h i j k l m n o p q r s t u v w x y z; do
     echo "[ ${x}XXtxt1 ]" | sed -n 's|^\[\s\+\([^\s]\+[0-9]\+\)\s\+\]\s*$|\1|p'
   done
   
sed prints a result in every case except when the captured text
starts with a lowercase 's'.
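That result is consistent with "[^\s]" being read as "any character except a backslash or s". A sketch under that assumption, for GNU sed, that records which leading letters produce no output, once for the bracketed pattern and once for a variant using the POSIX class "[^[:space:]]" (the helper name failing_letters is hypothetical):

```shell
# Assumes GNU sed. failing_letters is a hypothetical helper: it prints the
# leading letters for which the given sed script produces no output.
failing_letters() {
  script=$1
  failed=""
  for x in A B C D E F G H I J K L M N O P Q R S T U V W X Y Z \
           a b c d e f g h i j k l m n o p q r s t u v w x y z; do
    out=$(echo "[ ${x}XXtxt1 ]" | sed -n "$script")
    if [ -z "$out" ]; then
      failed="$failed$x"
    fi
  done
  echo "$failed"
}

# Pattern from the report: \s inside a bracket expression.
bracketed=$(failing_letters 's|^\[\s\+\([^\s]\+[0-9]\+\)\s\+\]\s*$|\1|p')
# Same pattern with a POSIX character class inside the brackets.
posix_class=$(failing_letters 's|^\[\s\+\([^[:space:]]\+[0-9]\+\)\s\+\]\s*$|\1|p')

printf 'bracketed pattern fails for: %s\n' "$bracketed"
printf 'POSIX class fails for: %s\n' "$posix_class"
```

Only the lowercase "s" should be listed for the bracketed pattern, and nothing for the POSIX class variant.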

Regards,
Vidar Hanto
Norway

This bug report was last modified 2 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.