Bob Proulx <bob@HIDDEN> to control <at> debbugs.gnu.org.
Received: (at 6780) by debbugs.gnu.org; 2 Aug 2010 20:30:04 +0000
From: Bob Proulx <bob@HIDDEN>
To: Bill <bill3@HIDDEN>
Cc: 6780 <at> debbugs.gnu.org
Subject: Re: bug#6780: Problem with the cut command
Date: Mon, 2 Aug 2010 14:30:24 -0600

tags 6780 + wishlist
retitle 6780 Add cut multi-character/expression delimiters
thanks

Bill wrote:
> I'm not sure if this is a bug, a question or a feature request,
> but there is a problem with the cut command, specifically with
> its delimiter option '-d'.
>
> In older times disk space was scarce and every byte was
> conserved. Fields in data files were delimited with a single
> character such as ':'. This practice continues today. But
> sometimes it does not and fields in some files are separated
> with multiple characters. Space is no longer precious.

Sure. But I think none of that is relevant to changing stable
program interfaces and behavior. It is a good argument for creating
a new program that has no legacy, however. The world is wide open
for adding new programs. Feel free to go for it there.

> Suppose I wish to import information about a disk partition
> into my backup script. I want to assign the type of filesystem
> to a variable. Compare the output of these two commands.
>
> cat /etc/fstab |grep home | cut -d ' ' -f3
> yields a blank output line

It is data dependent. The output depends upon what you have as
input. For some files it would be one way and for others a
different way. But that just points out that cut is the wrong tool
for the task. As your note shows you are well aware, cut works with
single-character delimiters. But fstab may separate its fields with
runs of whitespace. This makes cut an inappropriate tool for the job.

> cat /etc/fstab |grep opt | awk -F " " '{print $3}'
> yields the desired output - reiserfs.

Awk is a much better tool for the task. But that command line has
many inefficiencies. There are much better ways. Try this instead.

  awk '/opt/{print$3}' /etc/fstab

However, that doesn't account for comments that may also match. To
avoid problems, comments should be removed first.

  awk '/#/{gsub("#.*","")}/opt/{print$3}' /etc/fstab

And I am inclined to say that it is better to match on a particular
field.

  awk '/#/{gsub("#.*","")}$2=="/opt"{print$3}' /etc/fstab

> The problem is that the cut command can't handle multiple
> instances of the same delimiter. It's designed to handle
> a single character like ':', but can't cope with repeating
> characters like '::' or a series of spaces as in /etc/fstab.

All correct. The cut command is not the appropriate tool for your
task.

> So my question is shouldn't the cut delimiter handle
> multiple instances of the same character internally or
> failing that, shouldn't there be some way of specifying a
> series of single delimiter characters such as -d':'+ ?

In my opinion no, it should not. It is feature creep and code
bloat. Cut is used not just on large servers and large desktops but
also on wristwatches and toaster ovens. Should the increase in size
be multiplied by every system in the known universe?

And even if this feature were added to cut, the program would still
be insufficient for the task, since it has no capability to handle
comments or line selection (although your combination with grep is
fine with me, good in fact, though sed would be better since it
enables checking return status).

Furthermore, the feature is already implemented and fully supported
by awk. Using awk is a much better fit than using cut. The solution
already exists in awk and therefore is not needed in cut. The awk
program is standardized and portable. To me awk is the
best-in-class tool for this task.

Bob
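Bob's successive refinements can be checked against sample data. The sketch below is a minimal illustration, assuming a POSIX shell and awk; the fstab contents are made up (including a hypothetical /opt/data mount), and piping through printf stands in for passing /etc/fstab as a file argument. The comment line and the /opt/data line show why the plain /opt/ pattern over-matches while the comment-stripping, exact-field version does not.

```shell
#!/bin/sh
# Made-up fstab contents (hypothetical sample data, not from a real system).
sample='# /opt was moved to its own partition
/dev/sda1   /          ext3       defaults   0   1
/dev/sda3   /opt       reiserfs   defaults   0   2
/dev/sdb1   /opt/data  ext3       defaults   0   2'

# Plain pattern match: also hits the comment line (its $3 is "was")
# and the /opt/data line (its $3 is "ext3").
naive=$(printf '%s\n' "$sample" | awk '/opt/{print$3}')

# Strip comments first, then compare the mount-point field exactly.
fstype=$(printf '%s\n' "$sample" | awk '/#/{gsub("#.*","")}$2=="/opt"{print$3}')

printf 'naive match:\n%s\n' "$naive"
printf 'fstype=%s\n' "$fstype"
```

On a real system the same awk program would read the file directly, e.g. fstype=$(awk '...' /etc/fstab), which also covers Bill's goal of putting the filesystem type into a shell variable.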
owner <at> debbugs.gnu.org, bug-coreutils@HIDDEN: bug#6780; Package coreutils.
Received: (at submit) by debbugs.gnu.org; 2 Aug 2010 19:09:46 +0000
From: Davide Brini <dave_br@HIDDEN>
To: bug-coreutils@HIDDEN
Subject: Re: bug#6780: Problem with the cut command
Date: Mon, 2 Aug 2010 19:56:43 +0100

On Mon, 02 Aug 2010 05:56:31 -0700
Bill <bill3@HIDDEN> wrote:

> I'm not sure if this is a bug, a question or a feature request,
> but there is a problem with the cut command, specifically with
> its delimiter option '-d'.
>
> In older times disk space was scarce and every byte was
> conserved. Fields in data files were delimited with a single
> character such as ':'. This practice continues today. But
> sometimes it does not and fields in some files are separated
> with multiple characters. Space is no longer precious.
>
> Suppose I wish to import information about a disk partition
> into my backup script. I want to assign the type of filesystem
> to a variable. Compare the output of these two commands.
>
> cat /etc/fstab |grep home | cut -d ' ' -f3
> yields a blank output line
>
> cat /etc/fstab |grep opt | awk -F " " '{print $3}'
> yields the desired output - reiserfs.
>
> The problem is that the cut command can't handle multiple
> instances of the same delimiter. It's designed to handle
> a single character like ':', but can't cope with repeating
> characters like '::' or a series of spaces as in /etc/fstab.
>
> So my question is shouldn't the cut delimiter handle
> multiple instances of the same character internally or
> failing that, shouldn't there be some way of specifying a
> series of single delimiter characters such as -d':'+ ?

cut is required by POSIX to treat every separator character as
delimiting a field: "Output fields shall be separated by a single
occurrence of the field delimiter character."

However, what you suggest might be implemented as an extension,
which the user would have to enable explicitly (although I wouldn't
bet that the maintainers would think this is a good idea; I may be
wrong).

On a side note, you mention awk, which happens to work fine in your
specific example of space as the separator. However, a single space
is special-cased in awk; with any other single-character separator,
awk works exactly like cut:

  echo 'a::b:c' | awk -F':' '{print "-"$1"--"$2"--"$3"--"$4"-"}'
  -a----b--c-

Note the empty second field. But of course in awk, unlike cut, you
can say -F ':+' and get the behavior you want.

--
D.
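Davide's three claims (cut and a single-character awk FS both produce an empty field, a regular-expression FS such as ':+' collapses runs, and a single space is special-cased) can be compared side by side. This is a minimal sketch, assuming a POSIX shell with cut and a POSIX-conforming awk; the bracketed printf format and the '[ ]' separator are illustrative choices, not taken from the thread.

```shell
#!/bin/sh
input='a::b:c'

# cut -d ':' treats every colon as a delimiter, so field 2 is empty.
cut_f2=$(printf '%s\n' "$input" | cut -d ':' -f2)
cut_f3=$(printf '%s\n' "$input" | cut -d ':' -f3)

# A single-character FS other than space behaves like cut: NF is 4, $2 is empty.
awk_single=$(printf '%s\n' "$input" | awk -F ':' '{printf "%d [%s] [%s]\n", NF, $2, $3}')

# A multi-character FS is an extended regular expression; ':+' makes a run
# of colons count as one separator, leaving 3 fields.
awk_regex=$(printf '%s\n' "$input" | awk -F ':+' '{printf "%d [%s] [%s]\n", NF, $2, $3}')

# FS=" " is the special case: runs of blanks form one separator.
# Writing it as the regex '[ ]' matches exactly one space, like cut -d ' '.
spaces='x  y'
nf_default=$(printf '%s\n' "$spaces" | awk -F ' ' '{print NF}')
nf_literal=$(printf '%s\n' "$spaces" | awk -F '[ ]' '{print NF}')

printf 'cut -f2: [%s]  cut -f3: [%s]\n' "$cut_f2" "$cut_f3"
printf 'awk -F ":":  %s\n' "$awk_single"
printf 'awk -F ":+": %s\n' "$awk_regex"
printf 'NF with FS=" ": %s, with FS="[ ]": %s\n' "$nf_default" "$nf_literal"
```

The last pair shows why Bill's awk -F " " command worked: it is the default whitespace splitting, not a literal single-space delimiter.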
owner <at> debbugs.gnu.org, bug-coreutils@HIDDEN: bug#6780; Package coreutils.
Received: (at submit) by debbugs.gnu.org; 2 Aug 2010 15:55:19 +0000
From: Bill <bill3@HIDDEN>
To: bug-coreutils@HIDDEN
Subject: Problem with the cut command
Date: Mon, 02 Aug 2010 05:56:31 -0700

Hello,

I'm not sure if this is a bug, a question or a feature request,
but there is a problem with the cut command, specifically with
its delimiter option '-d'.

In older times disk space was scarce and every byte was
conserved. Fields in data files were delimited with a single
character such as ':'. This practice continues today. But
sometimes it does not and fields in some files are separated
with multiple characters. Space is no longer precious.

Suppose I wish to import information about a disk partition
into my backup script. I want to assign the type of filesystem
to a variable. Compare the output of these two commands.

cat /etc/fstab |grep home | cut -d ' ' -f3
yields a blank output line

cat /etc/fstab |grep opt | awk -F " " '{print $3}'
yields the desired output - reiserfs.

The problem is that the cut command can't handle multiple
instances of the same delimiter. It's designed to handle
a single character like ':', but can't cope with repeating
characters like '::' or a series of spaces as in /etc/fstab.

So my question is shouldn't the cut delimiter handle
multiple instances of the same character internally or
failing that, shouldn't there be some way of specifying a
series of single delimiter characters such as -d':'+ ?

I hope this is useful feedback and look forward to your reply.

Bill McGrath
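The blank output Bill reports can be reproduced without touching /etc/fstab. This is a minimal sketch, assuming a POSIX shell with cut and awk available; the fstab line is made-up sample data whose fields are separated by runs of three spaces.

```shell
#!/bin/sh
# Made-up fstab entry (hypothetical sample data), fields separated by runs of spaces.
line='/dev/sda3   /opt   reiserfs   defaults   0   2'

# cut treats every single space as a delimiter: field 1 is "/dev/sda3",
# and fields 2 and 3 are the empty strings between consecutive spaces.
cut_result=$(printf '%s\n' "$line" | cut -d ' ' -f3)

# awk's default field splitting treats any run of blanks as one separator,
# so $3 is the filesystem type.
awk_result=$(printf '%s\n' "$line" | awk '{print $3}')

printf 'cut: [%s]\n' "$cut_result"
printf 'awk: [%s]\n' "$awk_result"
```

The cut result is empty because the third "field" is the gap between the second and third space, which matches the blank line Bill saw.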
Bill <bill3@HIDDEN>: bug-coreutils@HIDDEN.
owner <at> debbugs.gnu.org, bug-coreutils@HIDDEN: bug#6780; Package coreutils.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997 nCipher Corporation Ltd,
1994-97 Ian Jackson.