GNU bug report logs - #44704
uniq: replace repeated lines with a message about how many repeated lines

Please note: This is a static page, with minimal formatting, updated once a day.
Click here to see this page with the latest information and nicer formatting.

Package: coreutils; Severity: wishlist; Reported by: "Brian J. Murrell" <brian@HIDDEN>; Keywords: notabug; dated Tue, 17 Nov 2020 14:14:01 UTC; Maintainer for coreutils is bug-coreutils@HIDDEN.

Message received at 44704 <at> debbugs.gnu.org:


Received: (at 44704) by debbugs.gnu.org; 18 Nov 2020 11:25:21 +0000
Subject: Re: bug#44704: uniq: replace repeated lines with a message about how
 many repeated lines
To: 44704 <at> debbugs.gnu.org
References: <b898eca3e980f661156db1d268733149b0c47179.camel@HIDDEN>
From: Chris Elvidge <celvidge001@HIDDEN>
Message-ID: <7e7b68bc-e6b3-a1df-1d5e-c4a47435cf63@HIDDEN>
Date: Wed, 18 Nov 2020 11:25:11 +0000
Cc: "Brian J. Murrell" <brian@HIDDEN>

On 17/11/2020 01:32 pm, Brian J. Murrell wrote:
> It would be a useful enhancement to uniq to replace all lines
> considered non-uniq (i.e. those that would be removed from the output)
> with a message about how many times the previous line was repeated.
> 
> I.e.
> 
> $ cat <<EOF | uniq --replace-with-message '[previous line repeated %d times]'
> first line
> second line
> repeated line
> repeated line
> repeated line
> repeated line
> repeated line
> third line
> EOF
> first line
> second line
> repeated line
> [previous line repeated 4 times]
> third line
> 
> Cheers,
> b.
> 
> 

You could write your own function to do it. E.g.

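unique() {
  [ "$1" ] || { echo "Usage: unique FILE" >&2; return 1; }
  [ -r "$1" ] || { echo "Needs a readable file to test" >&2; return 1; }
  local L R="" N=0 first=1
  # IFS= and -r keep whitespace and backslashes; the || test handles a
  # final line that lacks a trailing newline.
  while IFS= read -r L || [ -n "$L" ]; do
    # A line equal to the previous one is counted, not printed.
    if [ -z "$first" ] && [ "$L" = "$R" ]; then N=$((N + 1)); continue; fi
    if [ "$N" -gt 0 ]; then echo "[Previous line repeated $N times]"; N=0; fi
    first=""; R=$L
    printf '%s\n' "$L"
  done < "$1"
  # Report a run of repeats that reaches the end of the file.
  if [ "$N" -gt 0 ]; then echo "[Previous line repeated $N times]"; fi
}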


-- 

Chris Elvidge





Information forwarded to bug-coreutils@HIDDEN:
bug#44704; Package coreutils.

Message received at 44704 <at> debbugs.gnu.org:


Received: (at 44704) by debbugs.gnu.org; 17 Nov 2020 22:18:21 +0000
Message-ID: <3eb9b58be3a6757c1b5f824ec9f75e1cb686c89f.camel@HIDDEN>
Subject: Re: bug#44704: uniq: replace repeated lines with a message about
 how many repeated lines
From: "Brian J. Murrell" <brian@HIDDEN>
To: Paul Eggert <eggert@HIDDEN>
Date: Tue, 17 Nov 2020 17:18:07 -0500
In-Reply-To: <e7fc262b-d243-deba-c1dd-658b0fe9e3ea@HIDDEN>
References: <b898eca3e980f661156db1d268733149b0c47179.camel@HIDDEN>
 <e7fc262b-d243-deba-c1dd-658b0fe9e3ea@HIDDEN>
Cc: 44704 <at> debbugs.gnu.org

On Tue, 2020-11-17 at 14:10 -0800, Paul Eggert wrote:
> On 11/17/20 5:32 AM, Brian J. Murrell wrote:
>  > [previous line repeated 4 times]
>
> uniq -c already does something like that, though it outputs "5"
> instead of "4".

Right.  I had considered that.  Something like:

$ cat /tmp/in | uniq -c | while read c line; do
> echo $line
> if [ $c -gt 1 ]; then
> echo "Last line repeated $((c-1)) times"
> fi
> done

But that eats leading whitespace on $line.
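A whitespace-preserving variant of this loop is sketched below. It assumes each "uniq -c" record consists of blank padding, a decimal count, one space and then the original line (GNU uniq pads the count to a width of 7), and it reads whole records with IFS= read -r, splitting off the count with parameter expansion instead of letting read split fields. The function name collapse_counts is hypothetical.

```shell
# Hypothetical helper: post-process "uniq -c" output without losing
# leading whitespace. Assumes records of the form: padding, count, one
# space, original line (GNU uniq pads the count to width 7).
collapse_counts() {
  uniq -c | while IFS= read -r rec; do
    rec=${rec#"${rec%%[![:space:]]*}"}  # strip the padding before the count
    c=${rec%% *}                        # count: text up to the first space
    line=${rec#* }                      # line: everything after that space
    printf '%s\n' "$line"
    if [ "$c" -gt 1 ]; then
      printf '[previous line repeated %d times]\n' "$((c - 1))"
    fi
  done
}
```

Usage: collapse_counts < /tmp/in. Only the single space after the count is consumed, so indentation and blank lines in the input survive.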

> Not sure it's worth gussying up 'uniq' to provide exactly the
> functionality requested, as output reformatting is easy enough to do
> yourself using awk or Python or whatever.

Right.  But if I were going to pull out such a big hammer, I'd just,
again, eliminate uniq and do everything in awk or Python or whatever.

Anyway, it was just a suggestion.  Doesn't seem like it will go much of
anywhere.  That's fine.  If it really itched me enough, I guess I'd
just submit a patch.

Cheers,
b.







Information forwarded to bug-coreutils@HIDDEN:
bug#44704; Package coreutils.

Message received at 44704 <at> debbugs.gnu.org:


Received: (at 44704) by debbugs.gnu.org; 17 Nov 2020 22:11:10 +0000
Subject: Re: bug#44704: uniq: replace repeated lines with a message about how
 many repeated lines
To: "Brian J. Murrell" <brian@HIDDEN>
References: <b898eca3e980f661156db1d268733149b0c47179.camel@HIDDEN>
From: Paul Eggert <eggert@HIDDEN>
Organization: UCLA Computer Science Department
Message-ID: <e7fc262b-d243-deba-c1dd-658b0fe9e3ea@HIDDEN>
Date: Tue, 17 Nov 2020 14:10:56 -0800
Cc: 44704 <at> debbugs.gnu.org

On 11/17/20 5:32 AM, Brian J. Murrell wrote:
 > [previous line repeated 4 times]

uniq -c already does something like that, though it outputs "5" instead of "4". 
Not sure it's worth gussying up 'uniq' to provide exactly the functionality 
requested, as output reformatting is easy enough to do yourself using awk or 
Python or whatever.
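The awk route mentioned here can be sketched in a few lines. The hypothetical shell function collapse_repeats below compares each line with the previous one, prints the first line of every run, and follows a run of repeats with the requested message; its optional argument stands in for the format string of the proposed --replace-with-message option, which uniq does not actually have.

```shell
# Hypothetical helper emulating the proposed --replace-with-message
# option in one awk pass. $1 (optional) is a printf format with one %d.
collapse_repeats() {
  awk -v fmt="${1:-[previous line repeated %d times]}" '
    NR > 1 && $0 == prev { n++; next }      # a repeat: count it, print nothing
    n > 0 { printf (fmt "\n", n); n = 0 }   # a run just ended: report it
    { print; prev = $0 "" }                 # a new line: print and remember it
    END { if (n > 0) printf (fmt "\n", n) } # report a run at end of input
  '
}
```

Storing prev as $0 "" makes awk compare lines as strings, so lines such as "1" and "1.0" are not merged as equal numbers. The format string is handed to awk's printf, so it should contain exactly one %d and no other % directives.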




Information forwarded to bug-coreutils@HIDDEN:
bug#44704; Package coreutils.

Message received at 44704 <at> debbugs.gnu.org:


Received: (at 44704) by debbugs.gnu.org; 17 Nov 2020 15:28:59 +0000
Message-ID: <afab5b5bf0dd890ea8da21af84b21bd248a0f71a.camel@HIDDEN>
Subject: Re: bug#44704: uniq: replace repeated lines with a message about
 how many repeated lines
From: "Brian J. Murrell" <brian@HIDDEN>
To: Assaf Gordon <assafgordon@HIDDEN>, 44704 <at> debbugs.gnu.org
Date: Tue, 17 Nov 2020 10:28:53 -0500
In-Reply-To: <d83080a3-b122-ae92-dff6-e5f0003898ca@HIDDEN>
References: <b898eca3e980f661156db1d268733149b0c47179.camel@HIDDEN>
 <d83080a3-b122-ae92-dff6-e5f0003898ca@HIDDEN>

On Tue, 2020-11-17 at 08:05 -0700, Assaf Gordon wrote:
>
> Hello,

Hi,

> uniq supports the "--group" option, which adds a blank line after each
> group of identical lines - this can be used down-stream to process
> groups in any way you want.

But there is no way to have it remove the repeated lines also, correct?

By down-stream process, I feel like you are leaving it up to the down-
stream to remove the duplicate lines as well as add the "repeated %s
times" messages.  Is that correct?

If so, uniq really adds no value.  The down-stream might as well just
do the adjacent line comparison also in such a case.

> And with counting:
>
> $ cat in | uniq --group=append \
>       | awk 'BEGIN { c = 0 } ;
>              $0=="" { print "Group has " c " lines" ; c=0 ; next } ;
>              1 { print ; c++ }'
>    first line
>    Group has 1 lines
>    second line
>    Group has 1 lines
>    repeated line
>    repeated line
>    repeated line
>    repeated line
>    repeated line
>    Group has 5 lines
>    third line
>    Group has 1 lines

This still doesn't really achieve the original stated goal as the
repeated lines are not being replaced by your "Group has %d lines".

I think once you add the repeated-line suppression, you will see that
adding a simple adjacent-line comparison and not using uniq at all is
only a small increment on top of the down-stream processing (which has
by then become the main program).

Cheers,
b.







Information forwarded to bug-coreutils@HIDDEN:
bug#44704; Package coreutils.
Severity set to 'wishlist' from 'normal'. Request was from Assaf Gordon <assafgordon@HIDDEN> to control <at> debbugs.gnu.org.
Added tag(s) notabug. Request was from Assaf Gordon <assafgordon@HIDDEN> to control <at> debbugs.gnu.org.

Message received at 44704 <at> debbugs.gnu.org:


Received: (at 44704) by debbugs.gnu.org; 17 Nov 2020 15:05:57 +0000
Subject: Re: bug#44704: uniq: replace repeated lines with a message about how
 many repeated lines
To: "Brian J. Murrell" <brian@HIDDEN>, 44704 <at> debbugs.gnu.org
References: <b898eca3e980f661156db1d268733149b0c47179.camel@HIDDEN>
From: Assaf Gordon <assafgordon@HIDDEN>
Message-ID: <d83080a3-b122-ae92-dff6-e5f0003898ca@HIDDEN>
Date: Tue, 17 Nov 2020 08:05:46 -0700

tag 44704 notabug
severity 44704 wishlist
stop

Hello,

On 2020-11-17 6:32 a.m., Brian J. Murrell wrote:
> It would be a useful enhancement to uniq to replace all lines
> considered non-uniq (i.e. those that would be removed from the output)
> with a message about how many times the previous line was repeated.
> 
> I.e.
> 
> $ cat <<EOF | uniq --replace-with-message '[previous line repeated %d times]'
[...]

uniq supports the "--group" option, which adds a blank line after each
group of identical lines - this can be used down-stream to process
groups in any way you want.

Example:
   $ cat <<EOF > in
   first line
   second line
   repeated line
   repeated line
   repeated line
   repeated line
   repeated line
   third line
   EOF

   $ cat in | uniq --group=append
   first line

   second line

   repeated line
   repeated line
   repeated line
   repeated line
   repeated line

   third line


   $ cat in | uniq --group=append \
       | awk '$0=="" { print "do something after group" ; next } ;
              1 { print }'
   first line
   do something after group
   second line
   do something after group
   repeated line
   repeated line
   repeated line
   repeated line
   repeated line
   do something after group
   third line
   do something after group

And with counting:

   $ cat in | uniq --group=append \
       | awk 'BEGIN { c = 0 } ;
              $0=="" { print "Group has " c " lines" ; c=0 ; next } ;
              1 { print ; c++ }'
   first line
   Group has 1 lines
   second line
   Group has 1 lines
   repeated line
   repeated line
   repeated line
   repeated line
   repeated line
   Group has 5 lines
   third line
   Group has 1 lines
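The same --group=append output can also be post-processed so that the repeated lines are suppressed rather than just counted, which is closer to the original request. A sketch, assuming GNU uniq (which provides --group) and a hypothetical function name group_collapse:

```shell
# Hypothetical helper building on GNU "uniq --group=append": keep the
# first line of each group and replace the rest with a count.
# Assumes the input has no blank lines, since those would be mistaken
# for the blank separator that --group=append emits.
group_collapse() {
  uniq --group=append | awk '
    $0 == "" {                        # separator: the group has ended
      if (c > 1) printf "[previous line repeated %d times]\n", c - 1
      c = 0
      next
    }
    c++ == 0 { print }                # print only the first line of a group
  '
}
```

Caveat: a blank line in the input cannot be told apart from the blank separator that --group=append emits, so this sketch only suits input without blank lines.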


Hope this helps.
More information about "uniq --group=X" is here:
https://www.gnu.org/software/coreutils/manual/html_node/uniq-invocation.html

I'm marking this as "notabug/wishlist", but will likely close it soon as
"wontfix" unless we come up with a convincing argument why "--group"
is not sufficient for your use case.

Regardless of the status, discussion can continue by replying to this 
thread.

regards,
  - assaf





Information forwarded to bug-coreutils@HIDDEN:
bug#44704; Package coreutils.

Message received at submit <at> debbugs.gnu.org:


Received: (at submit) by debbugs.gnu.org; 17 Nov 2020 14:13:28 +0000
Message-ID: <b898eca3e980f661156db1d268733149b0c47179.camel@HIDDEN>
Subject: uniq: replace repeated lines with a message about how many repeated
 lines
From: "Brian J. Murrell" <brian@HIDDEN>
To: bug-coreutils@HIDDEN
Date: Tue, 17 Nov 2020 08:32:36 -0500

It would be a useful enhancement to uniq to replace all lines
considered non-uniq (i.e. those that would be removed from the output)
with a message about how many times the previous line was repeated.

I.e.

$ cat <<EOF | uniq --replace-with-message '[previous line repeated %d times]'
first line
second line
repeated line
repeated line
repeated line
repeated line
repeated line
third line
EOF
first line
second line
repeated line
[previous line repeated 4 times]
third line

Cheers,
b.








Acknowledgement sent to "Brian J. Murrell" <brian@HIDDEN>:
New bug report received and forwarded. Copy sent to bug-coreutils@HIDDEN.
Report forwarded to bug-coreutils@HIDDEN:
bug#44704; Package coreutils.
Last modified: Wed, 18 Nov 2020 11:30:01 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.