GNU bug report logs - #6780
Add cut multi-character/expression delimiters

Please note: This is a static page, with minimal formatting, updated once a day.
Click here to see this page with the latest information and nicer formatting.

Package: coreutils; Severity: wishlist; Reported by: Bill <bill3@HIDDEN>; dated Mon, 2 Aug 2010 15:56:02 UTC; Maintainer for coreutils is bug-coreutils@HIDDEN.
Severity set to 'wishlist' from 'normal' Request was from Bob Proulx <bob@HIDDEN> to control <at> debbugs.gnu.org. Full text available.
Changed bug title to 'Add cut multi-character/expression delimiters' from 'Problem with the cut command' Request was from Bob Proulx <bob@HIDDEN> to control <at> debbugs.gnu.org. Full text available.

Message received at 6780 <at> debbugs.gnu.org:


Received: (at 6780) by debbugs.gnu.org; 2 Aug 2010 20:30:04 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Mon Aug 02 16:30:03 2010
Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.69)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1Og1e7-0000yR-5l
	for submit <at> debbugs.gnu.org; Mon, 02 Aug 2010 16:30:03 -0400
Received: from joseki.proulx.com ([216.17.153.58])
	by debbugs.gnu.org with esmtp (Exim 4.69)
	(envelope-from <bob@HIDDEN>)
	id 1Og1e4-0000y0-RL; Mon, 02 Aug 2010 16:30:01 -0400
Received: from dementia.proulx.com (dementia.proulx.com [192.168.230.115])
	by joseki.proulx.com (Postfix) with ESMTP id 160422130E;
	Mon,  2 Aug 2010 14:30:24 -0600 (MDT)
Received: by dementia.proulx.com (Postfix, from userid 1000)
	id 0B30A3CC39A; Mon,  2 Aug 2010 14:30:24 -0600 (MDT)
Date: Mon, 2 Aug 2010 14:30:24 -0600
From: Bob Proulx <bob@HIDDEN>
To: Bill <bill3@HIDDEN>
Subject: Re: bug#6780: Problem with the cut command
Message-ID: <20100802203023.GA10969@HIDDEN>
References: <1280753791.2833.135.camel@HIDDEN>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1280753791.2833.135.camel@HIDDEN>
User-Agent: Mutt/1.5.18 (2008-05-17)
X-Spam-Score: -1.1 (-)
X-Debbugs-Envelope-To: 6780
Cc: 6780 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.11
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <http://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>,
	<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <http://debbugs.gnu.org/pipermail/debbugs-submit>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <http://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>,
	<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Sender: debbugs-submit-bounces <at> debbugs.gnu.org
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
X-Spam-Score: -2.4 (--)

tags 6780 + wishlist
retitle 6780 Add cut multi-character/expression delimiters
thanks

Bill wrote:
> I'm not sure if this is a bug, a question or a feature request,
> but there is a problem with the cut command, specifically with
> its delimiter option '-d'.
> 
> In older times disk space was scarce and every byte was 
> conserved. Fields in data files were delimited with a single
> character such as ':'. This practise continues today. But 
> sometimes it does not and fields in some files are separated
> with multiple characters. Space is no longer precious.

Sure.  But I think none of that is relevant to changing stable program
interfaces and behavior.  That is a good point for creating a new
program that has no legacy however.  The world is wide open for adding
new programs.  Feel free to go for it there.

> Suppose I wish to import information about a disk partition
> into my backup script. I want to assign the type of filesystem
> to a variable. Compare the output of these two commands.
> 
> cat /etc/fstab |grep home | cut -d ' ' -f3
> yields a blank output line

It is data dependent.  The output depends upon what you have as input.
For some files it would be one way and for others a different way.
But that just points out that using cut is the wrong tool for the
task.  As your note makes clear, cut works with single-character
delimiters.  But fields in fstab may be separated by runs of whitespace.
This makes cut an inappropriate tool for the job.
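A minimal sketch of the data dependence described above. The fstab-style line is made up for illustration and uses runs of spaces between fields; the variable names are hypothetical.

```shell
#!/bin/sh
# Hypothetical fstab-style line with several spaces between fields.
line='/dev/sdb1   /opt   reiserfs   defaults   0 2'

# cut treats every single space as a delimiter, so the empty strings
# between consecutive spaces count as fields: fields 2 and 3 are empty.
cut_field3=$(printf '%s\n' "$line" | cut -d ' ' -f3)

# awk's default field splitting treats any run of blanks as one separator.
awk_field3=$(printf '%s\n' "$line" | awk '{print $3}')

printf 'cut: [%s]\n' "$cut_field3"   # prints "cut: []"
printf 'awk: [%s]\n' "$awk_field3"   # prints "awk: [reiserfs]"
```

With single spaces between the fields the two commands would agree, which is why the result depends on how the particular fstab happens to be laid out.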

> cat /etc/fstab |grep opt | awk -F " " '{print $3}'
> yields the desired output - reiserfs.

Awk is a much better tool for the task.  But the inefficiencies
present in that command line are many.  There are much better ways.
Try this instead.

  awk '/opt/{print$3}' /etc/fstab

However that doesn't account for comments that may also match.  To
avoid problems comments should be removed first.

  awk '/#/{gsub("#.*","")}/opt/{print$3}' /etc/fstab

And I am inclined to say that it is better to just match on a
particular field.

  awk '/#/{gsub("#.*","")}$2=="/opt"{print$3}' /etc/fstab
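To see how the three awk commands above differ, here is a sketch that runs each of them against a small made-up fstab. The sample contents and the `sample_fstab` helper are hypothetical, and the data comes from a here-document instead of /etc/fstab. The last command also stores the result in a variable, which is what the original report wanted to do in a backup script.

```shell
#!/bin/sh
# Hypothetical fstab contents for illustration; a real /etc/fstab differs.
sample_fstab() {
    cat <<'EOF'
# /dev/sdb1 was /opt before the disk swap
/dev/sda1   /           ext3      defaults   1 1
/dev/sdb1   /opt        reiserfs  defaults   0 2
/dev/sdd1   /opt/data   ext3      defaults   0 2
/dev/sdc1   /home       ext3      defaults   0 2
EOF
}

# 1. Plain pattern match: also hits the comment line and /opt/data.
naive=$(sample_fstab | awk '/opt/{print$3}')

# 2. Blank out comments first, then match the pattern.
nocomments=$(sample_fstab | awk '/#/{gsub("#.*","")}/opt/{print$3}')

# 3. Blank out comments and compare the mount-point field exactly.
fstype=$(sample_fstab | awk '/#/{gsub("#.*","")}$2=="/opt"{print$3}')

printf '%s\n' "$naive"           # was, reiserfs, ext3 (one per line)
printf '%s\n' "$nocomments"      # reiserfs, ext3 (one per line)
printf 'fstype=%s\n' "$fstype"   # fstype=reiserfs
```

Only the exact field comparison picks out the single /opt entry. The comment stripping matters because after `gsub` empties the record, the `/opt/` pattern is tested against the modified line and no longer matches.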

> The problem is that the cut command can't handle multiple 
> instances of the same delimiter. It's designed to handle
> a single character like ':', but can't cope with repeating
> characters like '::' or a series of spaces as in /etc/fstab.

All correct.  The cut command is not the appropriate tool for your
task.

> So my question is shouldn't the cut delimiter handle 
> multiple instances of the same character internally or 
> failing that, shouldn't there be some way of specifying a 
> series of single delimiter characters such as -d':'+  ?

In my opinion no, it should not.  It is feature creep and code bloat.
Cut is not just used on large servers and large desktops but also on
wristwatches and toaster ovens.  Should the increase in size be
multiplied by every system in the known universe?  And even if this
feature were added to cut the program would still be insufficient to
the task since it has no capability to handle comments nor line
selection (although your combination with grep is fine with me, good
in fact though sed would be better since it enables checking return
status).  Furthermore the feature is already implemented and fully
supported by awk.  Using awk is a much better fit than using cut.  The
solution already exists in awk and therefore is not needed in cut.
The awk program is standardized and portable.  To me awk is the best
in class tool for this task.

Bob




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils@HIDDEN:
bug#6780; Package coreutils. Full text available.

Message received at submit <at> debbugs.gnu.org:


Received: (at submit) by debbugs.gnu.org; 2 Aug 2010 19:09:46 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Mon Aug 02 15:09:46 2010
Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.69)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1Og0OQ-0000PI-DW
	for submit <at> debbugs.gnu.org; Mon, 02 Aug 2010 15:09:46 -0400
Received: from mx10.gnu.org ([199.232.76.166])
	by debbugs.gnu.org with esmtp (Exim 4.69)
	(envelope-from <dave_br@HIDDEN>) id 1Og0OO-0000PB-Db
	for submit <at> debbugs.gnu.org; Mon, 02 Aug 2010 15:09:45 -0400
Received: from lists.gnu.org ([199.232.76.165]:60455)
	by monty-python.gnu.org with esmtps
	(TLS-1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.60)
	(envelope-from <dave_br@HIDDEN>) id 1Og0Om-0007Sl-Gq
	for submit <at> debbugs.gnu.org; Mon, 02 Aug 2010 15:10:08 -0400
Received: from [140.186.70.92] (port=40261 helo=eggs.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1Og0Ol-0001uH-0G
	for bug-coreutils@HIDDEN; Mon, 02 Aug 2010 15:10:08 -0400
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on eggs.gnu.org
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,FREEMAIL_FROM,
	RCVD_IN_DNSWL_NONE,T_RP_MATCHES_RCVD,T_TO_NO_BRKTS_FREEMAIL
	autolearn=unavailable version=3.3.1
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.69)
	(envelope-from <dave_br@HIDDEN>) id 1Og0Oj-0005IT-Cb
	for bug-coreutils@HIDDEN; Mon, 02 Aug 2010 15:10:06 -0400
Received: from mailout-eu.gmx.com ([213.165.64.42]:37714)
	by eggs.gnu.org with smtp (Exim 4.69)
	(envelope-from <dave_br@HIDDEN>) id 1Og0Oj-0005I5-0f
	for bug-coreutils@HIDDEN; Mon, 02 Aug 2010 15:10:05 -0400
Received: (qmail invoked by alias); 02 Aug 2010 19:10:02 -0000
Received: from hex.aaisp.net.uk (EHLO scooter.muppet.show) [90.155.53.9]
	by mail.gmx.com (mp-eu002) with SMTP; 02 Aug 2010 21:10:02 +0200
X-Authenticated: #48875277
X-Provags-ID: V01U2FsdGVkX18he0lC8X4hU4cUCJX6TP9dQYrQWy3E6Z0wMsVz16
	JR7wq/TbFY3ygi
Date: Mon, 2 Aug 2010 19:56:43 +0100
From: Davide Brini <dave_br@HIDDEN>
To: bug-coreutils@HIDDEN
Subject: Re: bug#6780: Problem with the cut command
Message-ID: <20100802195643.57236244@HIDDEN>
In-Reply-To: <1280753791.2833.135.camel@HIDDEN>
References: <1280753791.2833.135.camel@HIDDEN>
X-Mailer: Claws Mail 3.7.6 (GTK+ 2.20.1; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-Y-GMX-Trusted: 0
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3)
X-detected-operating-system: by monty-python.gnu.org: GNU/Linux 2.6,
	seldom 2.4 (older, 4)
X-Spam-Score: -5.5 (-----)
X-Debbugs-Envelope-To: submit
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.11
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <http://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>,
	<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <http://debbugs.gnu.org/pipermail/debbugs-submit>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <http://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>,
	<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Sender: debbugs-submit-bounces <at> debbugs.gnu.org
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
X-Spam-Score: -5.6 (-----)

On Mon, 02 Aug 2010 05:56:31 -0700 Bill <bill3@HIDDEN> wrote:

> I'm not sure if this is a bug, a question or a feature request,
> but there is a problem with the cut command, specifically with
> its delimiter option '-d'.
> 
> In older times disk space was scarce and every byte was 
> conserved. Fields in data files were delimited with a single
> character such as ':'. This practise continues today. But 
> sometimes it does not and fields in some files are separated
> with multiple characters. Space is no longer precious.
> 
> Suppose I wish to import information about a disk partition
> into my backup script. I want to assign the type of filesystem
> to a variable. Compare the output of these two commands.
> 
> cat /etc/fstab |grep home | cut -d ' ' -f3
> yields a blank output line
> 
> cat /etc/fstab |grep opt | awk -F " " '{print $3}'
> yields the desired output - reiserfs.
> 
> The problem is that the cut command can't handle multiple 
> instances of the same delimiter. It's designed to handle
> a single character like ':', but can't cope with repeating
> characters like '::' or a series of spaces as in /etc/fstab.
> 
> So my question is shouldn't the cut delimiter handle 
> multiple instances of the same character internally or 
> failing that, shouldn't there be some way of specifying a 
> series of single delimiter characters such as -d':'+  ?

cut is required by POSIX to treat every separator character as delimiting a
field. 

"Output fields shall be separated by a single occurrence of the field
delimiter character."
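A small sketch of that POSIX behavior, using the same 'a::b:c' record as the awk example further down. Each ':' ends a field, so the empty string between the two colons is field 2. When several fields are selected, they are rejoined with a single ':', so the empty field shows up as a leading colon.

```shell
#!/bin/sh
# Every ':' ends a field, so 'a::b:c' has four fields: a, (empty), b, c.
rec='a::b:c'

f2=$(printf '%s\n' "$rec" | cut -d ':' -f2)       # the empty field
f3=$(printf '%s\n' "$rec" | cut -d ':' -f3)       # b
f2to4=$(printf '%s\n' "$rec" | cut -d ':' -f2-4)  # fields 2-4 rejoined by ':'

printf '[%s] [%s] [%s]\n' "$f2" "$f3" "$f2to4"    # prints "[] [b] [:b:c]"
```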

However, what you suggest might be implemented as an extension, which the
user would have to enable explicitly (although I wouldn't bet that the
maintainers think this is a good idea, but I may be wrong).

On a side note, you mention awk which in your specific example of space as
separator happens to work fine. However, that is specifically special-cased
in awk; with any other single-character separator, awk works exactly like
cut:

echo 'a::b:c' | awk -F':' '{print "-"$1"--"$2"--"$3"--"$4"-"}'
-a----b--c-

Note the empty second field. But of course in awk, unlike cut, you can say
-F ':+' and get the behavior you want.
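A sketch comparing the two field separators on the same record. In POSIX awk, a single-character FS (other than a space) is used literally, while a longer FS is treated as an extended regular expression, so ':+' matches a whole run of colons. The variable names are illustrative only.

```shell
#!/bin/sh
rec='a::b:c'

# Single-character FS: the empty field between the two colons is kept,
# just as cut would keep it.
plain=$(printf '%s\n' "$rec" | awk -F ':' '{printf "%d fields, $2=[%s]\n", NF, $2}')

# Multi-character FS is an extended regular expression, so ':+' makes
# any run of colons act as a single separator.
squeezed=$(printf '%s\n' "$rec" | awk -F ':+' '{printf "%d fields, $2=[%s]\n", NF, $2}')

printf '%s\n%s\n' "$plain" "$squeezed"
# prints:
# 4 fields, $2=[]
# 3 fields, $2=[b]
```

One difference from awk's default whitespace splitting is worth keeping in mind: with a regular-expression FS such as ':+', a leading run of colons (as in '::a') still produces an empty $1, whereas leading blanks are ignored when FS is the default single space.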

-- 
D.




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils@HIDDEN:
bug#6780; Package coreutils. Full text available.

Message received at submit <at> debbugs.gnu.org:


Received: (at submit) by debbugs.gnu.org; 2 Aug 2010 15:55:19 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Mon Aug 02 11:55:19 2010
Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.69)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1OfxMF-0006BP-5c
	for submit <at> debbugs.gnu.org; Mon, 02 Aug 2010 11:55:19 -0400
Received: from mail.gnu.org ([199.232.76.166] helo=mx10.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.69)
	(envelope-from <bill3@HIDDEN>) id 1OfuYy-0004uV-Nd
	for submit <at> debbugs.gnu.org; Mon, 02 Aug 2010 08:56:17 -0400
Received: from lists.gnu.org ([199.232.76.165]:36511)
	by monty-python.gnu.org with esmtps
	(TLS-1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.60)
	(envelope-from <bill3@HIDDEN>) id 1OfuZM-0007qP-47
	for submit <at> debbugs.gnu.org; Mon, 02 Aug 2010 08:56:40 -0400
Received: from [140.186.70.92] (port=49868 helo=eggs.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1OfuZK-0002Em-F8
	for bug-coreutils@HIDDEN; Mon, 02 Aug 2010 08:56:39 -0400
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on eggs.gnu.org
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00
	autolearn=unavailable version=3.3.1
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.69)
	(envelope-from <bill3@HIDDEN>) id 1OfuZI-0007Yz-V6
	for bug-coreutils@HIDDEN; Mon, 02 Aug 2010 08:56:38 -0400
Received: from smtp-relay2f.uniserve.ca ([216.113.194.204]:36531
	helo=smtp-relay2.uniserve.ca) by eggs.gnu.org with esmtp (Exim 4.69)
	(envelope-from <bill3@HIDDEN>) id 1OfuZI-0007YR-QW
	for bug-coreutils@HIDDEN; Mon, 02 Aug 2010 08:56:36 -0400
Received: from [216.113.198.149] (helo=zefram.soho.lan)
	by smtp-relay2.uniserve.ca with esmtp (Exim 4.69)
	(envelope-from <bill3@HIDDEN>)
	id 1OfuZE-0003bg-2q; Mon, 02 Aug 2010 05:56:32 -0700
Subject: Problem with the cut command
From: Bill <bill3@HIDDEN>
To: bug-coreutils@HIDDEN
Content-Type: text/plain
Date: Mon, 02 Aug 2010 05:56:31 -0700
Message-Id: <1280753791.2833.135.camel@HIDDEN>
Mime-Version: 1.0
X-Mailer: Evolution 2.6.3 
Content-Transfer-Encoding: 7bit
X-Sender-Info: bill3@HIDDEN
X-Scanner: OK. Scanned.
X-Uniserve-Spam-Score: 2.7 27 (++)
X-Uniserve-Spam-Report: Content analysis details:   (2.7 points)
	pts rule name              description
	---- ----------------------
	--------------------------------------------------
	2.6 HELO_LH_HOME           HELO_LH_HOME
	0.1 RDNS_NONE Delivered to trusted network by a host with no rDNS
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6, seldom 2.4 (older,
	4)
X-detected-operating-system: by monty-python.gnu.org: GNU/Linux 2.6,
	seldom 2.4 (older, 4)
X-Spam-Score: -3.4 (---)
X-Debbugs-Envelope-To: submit
X-Mailman-Approved-At: Mon, 02 Aug 2010 11:55:16 -0400
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.11
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <http://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>,
	<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <http://debbugs.gnu.org/pipermail/debbugs-submit>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <http://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>,
	<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Sender: debbugs-submit-bounces <at> debbugs.gnu.org
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
X-Spam-Score: -4.7 (----)

Hello,

I'm not sure if this is a bug, a question or a feature request,
but there is a problem with the cut command, specifically with
its delimiter option '-d'.

In older times disk space was scarce and every byte was 
conserved. Fields in data files were delimited with a single
character such as ':'. This practise continues today. But 
sometimes it does not and fields in some files are separated
with multiple characters. Space is no longer precious.

Suppose I wish to import information about a disk partition
into my backup script. I want to assign the type of filesystem
to a variable. Compare the output of these two commands.

cat /etc/fstab |grep home | cut -d ' ' -f3
yields a blank output line

cat /etc/fstab |grep opt | awk -F " " '{print $3}'
yields the desired output - reiserfs.

The problem is that the cut command can't handle multiple 
instances of the same delimiter. It's designed to handle
a single character like ':', but can't cope with repeating
characters like '::' or a series of spaces as in /etc/fstab.

So my question is shouldn't the cut delimiter handle 
multiple instances of the same character internally or 
failing that, shouldn't there be some way of specifying a 
series of single delimiter characters such as -d':'+  ?

I hope this is useful feedback and look forward to your reply.

	Bill McGrath






Acknowledgement sent to Bill <bill3@HIDDEN>:
New bug report received and forwarded. Copy sent to bug-coreutils@HIDDEN. Full text available.
Report forwarded to owner <at> debbugs.gnu.org, bug-coreutils@HIDDEN:
bug#6780; Package coreutils. Full text available.
Last modified: Fri, 31 Oct 2014 17:00:04 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.