GNU bug report logs - #40242
n as delimiter alias

Please note: This is a static page, with minimal formatting, updated once a day.
Click here to see this page with the latest information and nicer formatting.

Package: sed; Reported by: Oğuz <oguzismailuysal@HIDDEN>; Keywords: confirmed; merged with #40239; dated Thu, 26 Mar 2020 15:31:02 UTC; Maintainer for sed is bug-sed@HIDDEN.
Added tag(s) confirmed. Request was from Assaf Gordon <assafgordon@HIDDEN> to control <at> debbugs.gnu.org. Full text available.
Merged 40239 40242. Request was from Assaf Gordon <assafgordon@HIDDEN> to control <at> debbugs.gnu.org. Full text available.

Message received at 40242 <at> debbugs.gnu.org:


Received: (at 40242) by debbugs.gnu.org; 31 Mar 2020 04:42:21 +0000
Subject: Re: bug#40242: n as delimiter alias
To: =?UTF-8?B?T8SfdXo=?= <oguzismailuysal@HIDDEN>, 40242 <at> debbugs.gnu.org
References: <CAH7i3LrffNCULxT6pjKt_niTTmxf7DED-oun8EEmgsm_v4GKoA@HIDDEN>
From: Assaf Gordon <assafgordon@HIDDEN>
Message-ID: <d90ac2cf-9e4b-dfd9-de3e-5e488e6b0787@HIDDEN>
Date: Mon, 30 Mar 2020 22:42:09 -0600
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101
 Thunderbird/68.6.0
MIME-Version: 1.0
In-Reply-To: <CAH7i3LrffNCULxT6pjKt_niTTmxf7DED-oun8EEmgsm_v4GKoA@HIDDEN>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 8bit

tags 40242 confirmed
stop

Hello,

On 2020-03-25 11:30 p.m., Oğuz wrote:
> While '\t' matches a literal 't' when 't' is the delimiter, '\n' does not
> match 'n' when 'n' is the delimiter. See:
> 
> $ echo t | sed 'st\ttt' | xxd
> 00000000: 0a                                       .
> $
> $ echo n | sed 'sn\nnn' | xxd
> 00000000: 6e0a
> 
> Is this a bug or is there a sound logic behind this?

Thank you for finding this interesting edge-case.

I think it is a (very old) bug. I'm not sure about its origin,
perhaps Jim or Paolo can comment.

First, let's start with what's expected
(slightly modifying your examples):

In the canonical usage, "\t" is a TAB, so the "t" is not replaced:

    $ printf t | sed 's/\t//' | od -a -An
       t

Using a different delimiter character ("q" instead of "/") works the same way:

    $ printf t | sed 'sq\tqq' | od -a -An
       t

The sed manual says (in section "3.3 The s command"):
       "
       The / characters may be uniformly replaced by any other single
       character within any given s command.

       The / character (or whatever other character is used in its
       stead) can appear in the regexp or replacement only if it is
       preceded by a \ character.
       "

This is why "\t" represents a regular "t" (not a TAB)
*if* the substitute command's delimiter is also "t":

       $ printf t | sed 'st\ttt' | od -a -An
       [no output, as expected]

And similarly for other characters:

       $ printf x | sed 'sx\xxx' | od -a -An
       $ printf a | sed 'sa\aaa' | od -a -An
       $ printf z | sed 'sz\zzz' | od -a -An
       [no output, as expected]
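To make the manual's rule concrete, here is a minimal C sketch of extracting the regex part of an s command under that rule alone. It is not sed's code, and the function name parse_s_regex is hypothetical. An escaped delimiter becomes the literal delimiter character, and every other escape (including "\t" and "\n") is passed through with its backslash, on the assumption that the regex compiler interprets it later.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, NOT sed's implementation: models only the manual's
   rule that "\<delim>" is a literal delimiter inside an s command.
   script must start with 's' followed by the delimiter character.
   Copies the regex part into out (NUL-terminated) and returns its length,
   or -1 if there is no closing delimiter or out is too small. */
static int parse_s_regex(const char *script, char *out, size_t outsz)
{
    assert(script != NULL && out != NULL && outsz > 0);
    if (script[0] != 's' || script[1] == '\0')
        return -1;

    char delim = script[1];
    size_t n = 0;

    for (const char *p = script + 2; *p != '\0'; p++) {
        char ch = *p;
        if (ch == delim) {              /* unescaped delimiter ends the regex */
            out[n] = '\0';
            return (int)n;
        }
        if (ch == '\\') {
            ch = *++p;
            if (ch == '\0')
                return -1;              /* trailing backslash */
            if (ch != delim) {          /* ordinary escape: keep the backslash */
                if (n + 1 >= outsz)
                    return -1;
                out[n++] = '\\';
            }
        }
        if (n + 1 >= outsz)
            return -1;
        out[n++] = ch;                  /* escaped delimiter lands here as-is */
    }
    return -1;                          /* no closing delimiter */
}
```

Under this rule alone, 'st\ttt' yields the pattern "t", 's/\t//' keeps "\t" for later interpretation, and 'sn\nnn' would yield "n".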

---

Second, the "\n" case behaves differently, regardless of which
delimiter is used: it is always treated as a newline,
never as a literal "n", even when the delimiter is "n":

These are correct, as expected:
     $ printf n | sed 's/\n//' | od -a -An
        n
     $ printf n | sed 'sx\nxx' | od -a -An
        n

Here, we'd expect "\n" to be treated as a literal "n" character
rather than a newline, but it is not (as you've found):

     $ printf n | sed 'sn\nnn' | od -a -An
        n

----

In the code, the "match_slash" function [1] is used to find
the delimiters of the "s" command (typically slashes).
Special handling happens when a backslash is found [2],
and lines 557-558 contain this conditional:

               else if (ch == 'n' && regex)
                 ch = '\n';

This forces any "\n" to become a newline, regardless of whether
the delimiter itself is "n".

[1] https://git.savannah.gnu.org/cgit/sed.git/tree/sed/compile.c#n531
[2] https://git.savannah.gnu.org/cgit/sed.git/tree/sed/compile.c#n552
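The effect of that ordering can be reproduced with a simplified C model of the loop. This is a sketch under stated assumptions, not sed's code: the function name scan_regex_as_quoted is hypothetical, it models only the regex part (so the "regex" flag is implied), and it omits backslash-newline handling and the replacement-side "&" rule. Because the 'n' test comes before the delimiter test, "\n" becomes a newline even when "n" is the delimiter, while "\t" with delimiter "t" still becomes a literal "t".

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical model of the escape handling quoted above
   (regex part only). The 'n' branch is tested before the delimiter
   branch, so "\n" always becomes a newline, even if 'n' is the delimiter.
   Returns the regex length, or -1 on error; out is NUL-terminated. */
static int scan_regex_as_quoted(const char *script, char *out, size_t outsz)
{
    assert(script != NULL && out != NULL && outsz > 0);
    if (script[0] != 's' || script[1] == '\0')
        return -1;

    char slash = script[1];
    size_t n = 0;

    for (const char *p = script + 2; *p != '\0'; p++) {
        char ch = *p;
        if (ch == slash) {                 /* unescaped delimiter ends the regex */
            out[n] = '\0';
            return (int)n;
        }
        if (ch == '\\') {
            ch = *++p;
            if (ch == '\0')
                return -1;                 /* trailing backslash */
            else if (ch == 'n')
                ch = '\n';                 /* tested first: wins over the delimiter */
            else if (ch != slash) {        /* ordinary escape: keep the backslash */
                if (n + 1 >= outsz)
                    return -1;
                out[n++] = '\\';
            }
        }
        if (n + 1 >= outsz)
            return -1;
        out[n++] = ch;
    }
    return -1;                             /* no closing delimiter */
}
```

With this model, 'sn\nnn' produces a one-character pattern containing a newline, which cannot match the input "n"; that is consistent with the unchanged output shown above.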

In older sed versions, these two lines were protected by
"#ifndef REG_PERL" [3] so perhaps it had something to do with regex 
variants. But the origin of this line predates the git history.
Jim/Paolo - any ideas what this relates to?

[3] https://git.savannah.gnu.org/cgit/sed.git/tree/sed/compile.c?id=41a169a9a14b5bdc736313eb411f02bcbe1c046d#n551

---

Interestingly, removing these two lines does not cause
any test failures, so this might be easy to fix without causing
any regressions.
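Another option, sketched here with a simplified C model, is to keep the 'n' rule but test for the delimiter first, so that an escaped delimiter is always literal as the manual describes. The function name scan_regex_delim_first is hypothetical, the model covers only the regex part, and this is not necessarily how the bug was eventually fixed in sed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fix sketch (simplified model, not sed's code): the
   delimiter check runs before the 'n' rule, so "\<delim>" is always the
   literal delimiter and "\n" means newline only when 'n' is not the
   delimiter. Regex part only; backslash-newline and '&' are ignored.
   Returns the regex length, or -1 on error; out is NUL-terminated. */
static int scan_regex_delim_first(const char *script, char *out, size_t outsz)
{
    assert(script != NULL && out != NULL && outsz > 0);
    if (script[0] != 's' || script[1] == '\0')
        return -1;

    char slash = script[1];
    size_t n = 0;

    for (const char *p = script + 2; *p != '\0'; p++) {
        char ch = *p;
        if (ch == slash) {                 /* unescaped delimiter ends the regex */
            out[n] = '\0';
            return (int)n;
        }
        if (ch == '\\') {
            ch = *++p;
            if (ch == '\0')
                return -1;                 /* trailing backslash */
            if (ch != slash) {             /* escaped delimiter stays literal */
                if (ch == 'n') {
                    ch = '\n';             /* "\n" -> newline */
                } else {                   /* other escapes keep the backslash */
                    if (n + 1 >= outsz)
                        return -1;
                    out[n++] = '\\';
                }
            }
        }
        if (n + 1 >= outsz)
            return -1;
        out[n++] = ch;
    }
    return -1;                             /* no closing delimiter */
}
```

With the delimiter tested first, 'sn\nnn' yields the literal pattern "n" (so the "n" would be deleted), while 's/\n//' and 'sx\nxx' still yield a newline.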


For now I'm leaving this item open until we decide how to deal with it.

regards,
  - assaf








Information forwarded to bug-sed@HIDDEN:
bug#40242; Package sed. Full text available.

Message received at submit <at> debbugs.gnu.org:


Received: (at submit) by debbugs.gnu.org; 26 Mar 2020 15:30:27 +0000
MIME-Version: 1.0
From: =?UTF-8?B?T8SfdXo=?= <oguzismailuysal@HIDDEN>
Date: Thu, 26 Mar 2020 07:30:16 +0200
Message-ID: <CAH7i3LrffNCULxT6pjKt_niTTmxf7DED-oun8EEmgsm_v4GKoA@HIDDEN>
Subject: n as delimiter alias
To: bug-sed@HIDDEN
Content-Type: multipart/alternative; boundary="000000000000a9f87105a1bb47ba"

--000000000000a9f87105a1bb47ba
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

$ sed --version
sed (GNU sed) 4.7
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Jay Fenlason, Tom Lord, Ken Pizzini,
Paolo Bonzini, Jim Meyering, and Assaf Gordon.
GNU sed home page: <https://www.gnu.org/software/sed/>.
General help using GNU software: <https://www.gnu.org/gethelp/>.
E-mail bug reports to: <bug-sed@HIDDEN>.

While '\t' matches a literal 't' when 't' is the delimiter, '\n' does not
match 'n' when 'n' is the delimiter. See:

$ echo t | sed 'st\ttt' | xxd
00000000: 0a                                       .
$
$ echo n | sed 'sn\nnn' | xxd
00000000: 6e0a

Is this a bug or is there a sound logic behind this?


-- 
Oğuz

--000000000000a9f87105a1bb47ba--




Acknowledgement sent to Oğuz <oguzismailuysal@HIDDEN>:
New bug report received and forwarded. Copy sent to bug-sed@HIDDEN. Full text available.
Report forwarded to bug-sed@HIDDEN:
bug#40242; Package sed. Full text available.
Last modified: Tue, 31 Mar 2020 05:00:02 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.