From: Stefan Nowak <p.org@HIDDEN>
To: bug-coreutils@HIDDEN
Subject: Feature request: uniq --field-separator="SEP" --consider-fields="a, b, c" --ignore-fields="x, y, z"
Date: Sun, 19 Sep 2010 00:44:17 +0200

Hello developers!

CURRENT SYNTAX:
http://www.gnu.org/software/coreutils/manual/html_node/uniq-invocation.html

--skip-fields=n
Skip n fields on each line before checking for uniqueness. Use a null string for comparison if a line has fewer than n fields.
Fields are sequences of non-space non-tab characters that are separated from each other by at least one space or tab.

--- FEATURE REQUEST #1 ---

--field-separator="SEP", -F

EXAMPLE:

Scenario: Imagine a filesystem listing. Because of its hierarchical nature, all entries are unique. Now I want to ignore the file path prefix (skip the field(s) with -f, after setting the separator with -F), consider only the basename, and see how many instances of it exist and where (all duplicate instances with -D).

Input:
folder a<TAB>file 1
folder b<TAB>file 1
folder b<TAB>file 2
folder c<TAB>file 3

Command line:
cat sample.txt | guniq -D -F "\t" -f 1

Output:
folder a<TAB>file 1
folder b<TAB>file 1

BENEFIT: If you can define the separator character (e.g. TAB), your column data may contain any character other than SEP; for instance, a column could then contain SPACE characters.

--- FEATURE SUGGESTION #2 ---

--consider-fields=a[,b,c,...]
Build the comparison string of a line from these field(s).

--ignore-fields=x[,y,z,...]
Build the comparison string of a line by excluding these field(s).

EXAMPLE:

Input:
folder a<TAB>file 1<TAB>suffixA
folder b<TAB>file 1<TAB>suffixB
folder b<TAB>file 2<TAB>suffixA
folder c<TAB>file 3<TAB>suffixA

Command line:
cat sample.txt | guniq -D -F "\t" --consider-fields="2"

Equivalent to:
cat sample.txt | guniq -D -F "\t" --ignore-fields="1,3"

Output:
folder a<TAB>file 1<TAB>suffixA
folder b<TAB>file 1<TAB>suffixB

WORKAROUND MEANWHILE: Insert a regex find/replace step in the pipe before uniq that moves all the data to be ignored to the front of each line, then use --skip-fields.

BENEFIT: Of course, it would be much more convenient to work with the data as-is and have the options --consider-fields and --ignore-fields.

Regards,
Stefan Nowak
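
Until such options exist, the behaviour requested in the second example can be approximated with awk rather than a regex rewrite. What follows is only a minimal sketch under stated assumptions, not a uniq feature: dup_by_field is a hypothetical helper name, the input is assumed to be TAB-separated, and, like uniq -D, it reports only runs of adjacent lines whose chosen field is equal.

```shell
#!/bin/sh
# Hypothetical helper (not part of coreutils): print every line that belongs
# to a run of adjacent lines sharing the same value in field number $1.
# Assumes TAB-separated input; a missing field compares as an empty string.
dup_by_field() {
    awk -F '\t' -v k="$1" '
        {
            key = $k
            if (NR > 1 && key == prev_key) {
                # First repeat in a run: emit the line that started the run.
                if (!printed) print prev_line
                print
                printed = 1
            } else {
                # Key changed: a new run starts here.
                printed = 0
            }
            prev_key = key
            prev_line = $0
        }'
}

# Sample data from the second example, compared on field 2 only.
printf 'folder a\tfile 1\tsuffixA\nfolder b\tfile 1\tsuffixB\nfolder b\tfile 2\tsuffixA\nfolder c\tfile 3\tsuffixA\n' |
    dup_by_field 2
```

For the sample input this prints the "folder a" and "folder b" lines that share "file 1", which is the output asked for above. Because only adjacent lines are compared, unsorted data would first need a sort on the same field.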
bug#7068; Package coreutils. Submitted by Stefan Nowak <p.org@HIDDEN> to bug-coreutils@HIDDEN.