GNU bug report logs - #51462
sed bug: ASCII NUL not handled in simple pattern
To reply to this bug, email your comments to 51462 AT debbugs.gnu.org.
Report forwarded to bug-sed <at> gnu.org: bug#51462; Package sed. (Thu, 28 Oct 2021 16:49:02 GMT)
Acknowledgement sent to Frances Wingerter <fw <at> immunant.com>: New bug report received and forwarded. Copy sent to bug-sed <at> gnu.org. (Thu, 28 Oct 2021 16:49:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
I'm using sed 4.8 (`sed (GNU sed) 4.8` per `sed --version`) on x86_64
Arch Linux.
Compare the output of these two sed invocations:
```
$ echo -e 'a\nb\n\0\nc\n' | sed -e '/\0/,$d'
a
b

c

```
and
```
$ echo -e 'a\nb\n\v\nc\n' | sed -e '/\v/,$d'
a
b
```
The latter is the expected behavior, but when input and pattern use
`\0`, sed seems to miss the matches and never triggers.
Hopefully this should be an easy fix.
Thanks,
Frances
Information forwarded to bug-sed <at> gnu.org: bug#51462; Package sed. (Thu, 28 Oct 2021 17:33:01 GMT)
Message #8 received at submit <at> debbugs.gnu.org (full text, mbox):
On Thu, 28 Oct 2021 15:25:42 +0000, Frances Wingerter <fw <at> immunant.com>
wrote:
> I'm using sed 4.8 (`sed (GNU sed) 4.8` per `sed --version`) on x86_64
> Arch Linux.
>
> Compare the output of these two sed invocations:
> ```
> $ echo -e 'a\nb\n\0\nc\n' | sed -e '/\0/,$d'
> a
> b
>
> c
>
This works:
$ echo -ne 'a\nb\n\0\nc\n' | sed -e '/\d000/,$d'
(\o000 and \x00 also work.) These escapes are all documented here:
https://www.gnu.org/software/sed/manual/sed.html#Escapes
Whether the sed maintainers also want to allow the \0 syntax is, of course,
up to them.
--
D.
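The three escapes mentioned in the message above can be compared side by side. This is only a minimal sketch, assuming GNU sed (the `\dNNN`, `\oNNN` and `\xHH` escapes are GNU extensions); the helper name `nul_input` and the variable names are made up for illustration, and `printf` stands in for `echo -e` so the input bytes do not depend on the shell's `echo`.

```shell
# Assumption: `sed` is GNU sed; other implementations may not accept these escapes.
# nul_input is a hypothetical helper that emits the test input from the report:
# the lines "a", "b", a line holding a single NUL byte, and "c".
nul_input() { printf 'a\nb\n\000\nc\n'; }

# Delete from the first line containing a NUL byte through the end of input,
# spelling the NUL byte three different ways.
out_d=$(nul_input | sed -e '/\d000/,$d')   # decimal escape
out_o=$(nul_input | sed -e '/\o000/,$d')   # octal escape
out_x=$(nul_input | sed -e '/\x00/,$d')    # hexadecimal escape

printf 'decimal:\n%s\n' "$out_d"
printf 'octal:\n%s\n' "$out_o"
printf 'hex:\n%s\n' "$out_x"
```

If the escapes behave as described in the message, all three variables should hold only the lines `a` and `b`.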
Information forwarded to bug-sed <at> gnu.org: bug#51462; Package sed. (Sat, 30 Oct 2021 07:12:01 GMT)
Message #11 received at 51462 <at> debbugs.gnu.org (full text, mbox):
(Adding Eric Blake for POSIX opinion)
Hello,
On 2021-10-28 11:32 a.m., Davide Brini wrote:
> On Thu, 28 Oct 2021 15:25:42 +0000, Frances Wingerter <fw <at> immunant.com>
> wrote:
>>
>> Compare the output of these two sed invocations:
>> ```
>> $ echo -e 'a\nb\n\0\nc\n' | sed -e '/\0/,$d'
>>
> $ echo -ne 'a\nb\n\0\nc\n' | sed -e '/\d000/,$d'
>
> (\o000, \x00 also work). All documented here:
> https://www.gnu.org/software/sed/manual/sed.html#Escapes
>
> Whether sed maintainers want to also allow the \0 syntax, up to them of
> course.
Thanks Davide for the reply.
In GNU sed, "\0" in the replacement part acts identically to "&" -
referencing the whole matched portion.
This is the implemented behavior (though undocumented?) since GNU sed
version 3, released in December 1995 - so not likely to be changed.
For comparison, in BSDs "\0" acts as literal zero (ASCII 48).
Interestingly, POSIX defines a "BACKREF" as:
[...] The character string consisting of a <backslash> character
followed by a single-digit numeral, '1' to '9'.
(from: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_05)
And so one could argue that this is a GNU extension that should be
disabled when used with "sed --posix".
I think we should keep "\0" undocumented to prevent proliferation of
this non-standard behavior.
regards,
- assaf
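The replacement-side behavior described in the message above (`\0` acting like `&`) can be checked with a short comparison. This is a sketch that assumes GNU sed; per the message, BSD seds treat `\0` differently, so the two results are not expected to agree there. The variable names are illustrative only.

```shell
# Assumption: `sed` is GNU sed, where \0 in the replacement is an
# undocumented synonym for & (the whole matched text).
with_amp=$(printf 'hello\n' | sed -e 's/ll/[&]/')
with_backslash0=$(printf 'hello\n' | sed -e 's/ll/[\0]/')

printf '&  gives: %s\n' "$with_amp"
printf '\\0 gives: %s\n' "$with_backslash0"
```

With GNU sed both commands should wrap the matched `ll` in brackets, which is why `\0` cannot simply be redefined to mean a NUL byte without changing existing behavior.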
This bug report was last modified 3 years and 59 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.