GNU bug report logs - #51462
sed bug: ASCII NUL not handled in simple pattern

Package: sed;

Reported by: Frances Wingerter <fw <at> immunant.com>

Date: Thu, 28 Oct 2021 16:49:02 UTC

Severity: normal

To reply to this bug, email your comments to 51462 AT debbugs.gnu.org.


Report forwarded to bug-sed <at> gnu.org:
bug#51462; Package sed. (Thu, 28 Oct 2021 16:49:02 GMT)

Acknowledgement sent to Frances Wingerter <fw <at> immunant.com>:
New bug report received and forwarded. Copy sent to bug-sed <at> gnu.org. (Thu, 28 Oct 2021 16:49:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Frances Wingerter <fw <at> immunant.com>
To: bug-sed <at> gnu.org
Subject: sed bug: ASCII NUL not handled in simple pattern
Date: Thu, 28 Oct 2021 15:25:42 +0000

I'm using sed 4.8 (`sed (GNU sed) 4.8` per `sed --version`) on x86_64
Arch Linux.

Compare the output of these two sed invocations:
```
$ echo -e 'a\nb\n\0\nc\n' | sed -e '/\0/,$d'
a
b

c

```
and
```
$ echo -e 'a\nb\n\v\nc\n' | sed -e '/\v/,$d'
a
b
```

The latter is the expected behavior, but when input and pattern use
`\0`, sed seems to miss the matches and never triggers.

Hopefully this should be an easy fix.
Thanks,
Frances
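
A note on reproducing the input: `echo -e` behaves differently across shells, so the payload can also be built with `printf`. The sketch below is only an illustration. The function name `report_input` is made up here, and the payload omits the extra trailing newline that `echo` adds in the report. It checks that exactly one NUL byte reaches the pipe before sed is involved.

```shell
# Same lines as the report (a, b, NUL, c), built with printf.
# \000 is the octal escape for the NUL byte.
report_input() { printf 'a\nb\n\000\nc\n'; }

# Keep only NUL bytes and count them.
nul_count=$(report_input | tr -cd '\000' | wc -c | tr -d ' ')
echo "NUL bytes in input: $nul_count"

# Line count of the input, for comparison with what sed prints.
line_count=$(report_input | wc -l | tr -d ' ')
echo "input lines: $line_count"
```

If the NUL count is 1 and sed still prints every line, the byte is present in the input and the address `/\0/` is simply not matching it.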




Information forwarded to bug-sed <at> gnu.org:
bug#51462; Package sed. (Thu, 28 Oct 2021 17:33:01 GMT)

Message #8 received at submit <at> debbugs.gnu.org:

From: Davide Brini <dave_br <at> gmx.com>
To: bug-sed <at> gnu.org
Subject: Re: bug#51462: sed bug: ASCII NUL not handled in simple pattern
Date: Thu, 28 Oct 2021 19:32:02 +0200

On Thu, 28 Oct 2021 15:25:42 +0000, Frances Wingerter <fw <at> immunant.com>
wrote:

> I'm using sed 4.8 (`sed (GNU sed) 4.8` per `sed --version`) on x86_64
> Arch Linux.
>
> Compare the output of these two sed invocations:
> ```
> $ echo -e 'a\nb\n\0\nc\n' | sed -e '/\0/,$d'
> a
> b
>
> c
>
> ```

This works:

$ echo -ne 'a\nb\n\0\nc\n' | sed -e '/\d000/,$d'

(\o000, \x00 also work). All documented here:

https://www.gnu.org/software/sed/manual/sed.html#Escapes

Whether sed maintainers want to also allow the \0 syntax, up to them of
course.

--
D.
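
The three escapes mentioned above can be compared side by side. This is a minimal sketch that assumes GNU sed, since `\dNNN`, `\oNNN` and `\xHH` are GNU extensions. The helper name `make_input` is invented for the example, and `printf` stands in for `echo -ne`.

```shell
# Input: four lines, the third holding a single NUL byte.
make_input() { printf 'a\nb\n\000\nc\n'; }

for esc in '\d000' '\o000' '\x00'; do
    # Delete from the first line containing NUL through the end of input.
    out=$(make_input | sed -e "/$esc/,\$d")
    # Show the surviving lines on one row, separated by spaces.
    printf '%s -> %s\n' "$esc" "$(printf '%s' "$out" | tr '\n' ' ')"
done
```

If all three spellings match the NUL line, each row should list only `a b`.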




Information forwarded to bug-sed <at> gnu.org:
bug#51462; Package sed. (Sat, 30 Oct 2021 07:12:01 GMT)

Message #11 received at 51462 <at> debbugs.gnu.org:

From: Assaf Gordon <assafgordon <at> gmail.com>
To: Davide Brini <dave_br <at> gmx.com>, 51462 <at> debbugs.gnu.org,
 Frances Wingerter <fw <at> immunant.com>, Eric Blake <eblake <at> redhat.com>
Subject: Re: bug#51462: sed bug: ASCII NUL not handled in simple pattern
Date: Sat, 30 Oct 2021 01:11:35 -0600

(Adding Eric Blake for POSIX opinion)

Hello,

On 2021-10-28 11:32 a.m., Davide Brini wrote:
> On Thu, 28 Oct 2021 15:25:42 +0000, Frances Wingerter <fw <at> immunant.com>
> wrote:
>>
>> Compare the output of these two sed invocations:
>> ```
>> $ echo -e 'a\nb\n\0\nc\n' | sed -e '/\0/,$d'
>> ```
> $ echo -ne 'a\nb\n\0\nc\n' | sed -e '/\d000/,$d'
> 
> (\o000, \x00 also work). All documented here:
> https://www.gnu.org/software/sed/manual/sed.html#Escapes
> 
> Whether sed maintainers want to also allow the \0 syntax, up to them of
> course.

Thanks Davide for the reply.

In GNU sed, "\0" in the replacement part acts identically to "&" - 
referencing the whole matched portion.

This is the implemented behavior (though undocumented?) since GNU sed
version 3, released in December 1995 - so not likely to be changed.

For comparison, in BSDs "\0" acts as literal zero (ASCII 48).

Interestingly, POSIX defines a "BACKREF" as:

   [...] The character string consisting of a <backslash> character
   followed by a single-digit numeral, '1' to '9'.
   (from: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_05)

And so one could argue that this is a GNU extension that should be
disabled when used with "sed --posix".

I think we should keep "\0" undocumented to prevent proliferation of
this non-standard behavior.

regards,
 - assaf
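
The replacement-side behavior described above can be checked locally. This is a small sketch, not a statement about any particular sed build. The variable names are illustrative, and the pattern is written as `ll*` to avoid relying on the GNU-only `\+`.

```shell
# Replace the first run of "l" characters, wrapping the match in brackets.
with_amp=$(printf 'hello\n' | sed -e 's/ll*/[&]/')
with_zero=$(printf 'hello\n' | sed -e 's/ll*/[\0]/')

printf '&  gives: %s\n' "$with_amp"
printf '\\0 gives: %s\n' "$with_zero"

# Equal results mean this sed treats \0 as the whole match, like &.
if [ "$with_amp" = "$with_zero" ]; then
    printf '%s\n' 'this sed treats \0 like & in the replacement'
else
    printf '%s\n' 'this sed treats \0 differently from &'
fi
```

With GNU sed both results should be `he[ll]o`. A sed that treats `\0` as a literal zero, as described for the BSDs, would instead give `he[0]o` for the second command.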






This bug report was last modified 3 years and 59 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.