GNU bug report logs - #14224
Feature request for the `cut`: record delimiter

Package: coreutils; Severity: wishlist; Reported by: George Brink <siberianowl@HIDDEN>; dated Wed, 17 Apr 2013 22:40:02 UTC; Maintainer for coreutils is bug-coreutils@HIDDEN.

Message received at 14224 <at> debbugs.gnu.org:


Received: (at 14224) by debbugs.gnu.org; 18 Apr 2013 19:03:04 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Thu Apr 18 15:03:04 2013
Received: from localhost ([127.0.0.1]:58989 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.72)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1USu6p-0004I3-R0
	for submit <at> debbugs.gnu.org; Thu, 18 Apr 2013 15:03:04 -0400
Received: from joseki.proulx.com ([216.17.153.58]:45880)
	by debbugs.gnu.org with esmtp (Exim 4.72)
	(envelope-from <bob@HIDDEN>) id 1USu6n-0004Hf-Oh
	for 14224 <at> debbugs.gnu.org; Thu, 18 Apr 2013 15:03:02 -0400
Received: from hysteria.proulx.com (hysteria.proulx.com [192.168.230.119])
	by joseki.proulx.com (Postfix) with ESMTP id 8E173211DA;
	Thu, 18 Apr 2013 12:58:31 -0600 (MDT)
Received: by hysteria.proulx.com (Postfix, from userid 1000)
	id 52B952DCE3; Thu, 18 Apr 2013 12:58:31 -0600 (MDT)
Date: Thu, 18 Apr 2013 12:58:31 -0600
From: Bob Proulx <bob@HIDDEN>
To: 14224 <at> debbugs.gnu.org
Subject: Re: bug#14224: Feature request for the `cut`: record delimiter
Message-ID: <20130418185831.GD8048@HIDDEN>
References: <CAGXyeuhDGzByME80xwOkO5BtyepMexKUNgrP8rxv025-oQfH6w@HIDDEN>
	<20130417230913.GA19399@HIDDEN>
	<516F2F33.3070508@HIDDEN>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <516F2F33.3070508@HIDDEN>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-Spam-Score: -0.7 (/)
X-Debbugs-Envelope-To: 14224
Cc: George Brink <siberianowl@HIDDEN>
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <http://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>,
	<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <http://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <http://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>,
	<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Sender: debbugs-submit-bounces <at> debbugs.gnu.org
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
X-Spam-Score: -2.6 (--)

Eric Blake wrote:
> Should we patch README to include this URL to current HACKING contents,
> since we don't ship HACKING in our tarballs?  Or, should we reconsider
> our position and start shipping HACKING in the tarballs?  Of the
> statements currently in README:
> 
> > If you obtained this file as part of a "git clone", then see the
> > README-hacking file.  If this file came to you as part of a tar archive,
> > then see the file INSTALL for compilation and installation instructions.
> 
> This one makes sense (HACKING won't be present unless you are working
> from git), except that you are not told _how_ to do a "git clone".
> 
> > If you would like to suggest a patch, see the files README-hacking
> > and HACKING for tips.
> 
> But this one doesn't mention anything about the files being git-only.

I think it would definitely make sense to include some information
about the preferred method of getting the source in the main README
file.  That file is usually the one included in downstream
distributions.  It would enable people to bootstrap themselves to the
source.  And GNU is all about access to the source.  So I think that
would make a lot of sense.

Bob




Information forwarded to bug-coreutils@HIDDEN:
bug#14224; Package coreutils. Full text available.

Message received at 14224 <at> debbugs.gnu.org:


Received: (at 14224) by debbugs.gnu.org; 18 Apr 2013 19:00:56 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Thu Apr 18 15:00:56 2013
Received: from localhost ([127.0.0.1]:58983 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.72)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1USu4l-0004Cb-NC
	for submit <at> debbugs.gnu.org; Thu, 18 Apr 2013 15:00:56 -0400
Received: from joseki.proulx.com ([216.17.153.58]:45868)
	by debbugs.gnu.org with esmtp (Exim 4.72)
	(envelope-from <bob@HIDDEN>) id 1USu4i-0004CQ-AP
	for 14224 <at> debbugs.gnu.org; Thu, 18 Apr 2013 15:00:53 -0400
Received: from hysteria.proulx.com (hysteria.proulx.com [192.168.230.119])
	by joseki.proulx.com (Postfix) with ESMTP id 72D7B211DA;
	Thu, 18 Apr 2013 12:56:21 -0600 (MDT)
Received: by hysteria.proulx.com (Postfix, from userid 1000)
	id 463932DCE3; Thu, 18 Apr 2013 12:56:21 -0600 (MDT)
Date: Thu, 18 Apr 2013 12:56:21 -0600
From: Bob Proulx <bob@HIDDEN>
To: George Brink <siberianowl@HIDDEN>
Subject: Re: bug#14224: Feature request for the `cut`: record delimiter
Message-ID: <20130418185621.GC8048@HIDDEN>
References: <CAGXyeuhDGzByME80xwOkO5BtyepMexKUNgrP8rxv025-oQfH6w@HIDDEN>
	<516F48CD.5000502@HIDDEN>
	<CAGXyeujSP03XA9XQGVvRV-zFLPTaH5nNQZGMuqXekA8eOr=hoQ@HIDDEN>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
In-Reply-To: <CAGXyeujSP03XA9XQGVvRV-zFLPTaH5nNQZGMuqXekA8eOr=hoQ@HIDDEN>
User-Agent: Mutt/1.5.21 (2010-09-15)
Content-Transfer-Encoding: quoted-printable
X-Spam-Score: 0.1 (/)
X-Debbugs-Envelope-To: 14224
Cc: 14224 <at> debbugs.gnu.org
X-Spam-Score: -2.6 (--)

George Brink wrote:
> Actually I just found yet another way to solve my problem:
> perl -0002 -F"\001" -an -e "print((join \"\001\", @F[0..2,14..46]), \"\002\");" data.dat >new_data.dat
> It works fine,

I was thinking of Perl's -0 option when I asked if you would say a few
words about the file and task.  But since you hadn't described it yet I
was hesitant to suggest it.

> but I am a little concerned about the speed. I have over three
> hundred such files, from 3Mb to 30Mb each. And this process should be
> run every day... I thought that by using cut (which just looks for
> delimiters) I can gain a few minutes on the whole process.

I always recommend benchmarking before optimizing.  Knuth is quoted as
"We should forget about small efficiencies, say about 97% of the time:
premature optimization is the root of all evil".
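
A minimal sketch of such a benchmark, assuming a POSIX shell and a `date` that supports `+%s` (GNU and BSD both do). `time_cmd` is a hypothetical helper name, not part of coreutils. It reports whole seconds only, which is coarse but should be enough to compare runs over a few hundred multi-megabyte files:

```shell
# time_cmd LABEL COMMAND [ARG...]
# Run COMMAND with its standard output discarded and print the elapsed
# wall-clock time in whole seconds.  (Hypothetical helper for illustration.)
time_cmd() {
    label=$1
    shift
    start=$(date +%s)      # seconds since the epoch before the run
    "$@" >/dev/null        # the command being measured
    end=$(date +%s)        # seconds since the epoch after the run
    echo "$label: $((end - start))s"
}
```

For example, `time_cmd perl ./extract-perl.sh data.dat` and `time_cmd cut ./extract-cut.sh data.dat`, where both script names are placeholders for the two candidate pipelines.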

Don't forget programmer productivity either.  You might shave 10% off
of something now but make it incomprehensible to future admin
maintainers who need to understand it later.  Simply upgrading the
hardware might give a 50% increase in performance.  In which case I
would leave the algorithm simple and more easily understood and not
worry about the performance.  Simple and easy to understand is better
than raw speed.

> Bob,
> I understand your desire to receive a discussion of features not inside the
> bug related mail list, but here is an extract from the README:
> > Mail suggestions and bug reports for these programs to
> > the address on the last line of --help output.
> And guess what, the `cut --help` has the bug-coreutils email in the last
> line! The coreutils email is not mentioned inside README at all. And
> bug-coreutils is mentioned several times in different context.
> I apologize for using this mail-list inappropriately, but I did not know
> about any other mail-lists

As Pádraig said, no worries.  I didn't mean it to sound mean or
snarky.  But I can see that my last sentence did come out that way.
Sorry.

But if I didn't say anything then you wouldn't have said anything and
then we wouldn't have been reminded that the contact address hadn't
been updated in your version.  So it ended well.  The way to get the
word out is by continuing to talk about it.  If people even just read
it in passing then they might be informed for the future.

Bob




Information forwarded to bug-coreutils@HIDDEN:
bug#14224; Package coreutils. Full text available.

Message received at 14224 <at> debbugs.gnu.org:


Received: (at 14224) by debbugs.gnu.org; 18 Apr 2013 17:16:34 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Thu Apr 18 13:16:34 2013
Received: from localhost ([127.0.0.1]:58843 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.72)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1USsRl-00006u-K5
	for submit <at> debbugs.gnu.org; Thu, 18 Apr 2013 13:16:33 -0400
Received: from mail-wi0-f179.google.com ([209.85.212.179]:38441)
	by debbugs.gnu.org with esmtp (Exim 4.72)
	(envelope-from <siberianowl@HIDDEN>) id 1USsRj-00006m-Mx
	for 14224 <at> debbugs.gnu.org; Thu, 18 Apr 2013 13:16:32 -0400
Received: by mail-wi0-f179.google.com with SMTP id l13so2826497wie.12
	for <14224 <at> debbugs.gnu.org>; Thu, 18 Apr 2013 10:12:01 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
	h=mime-version:x-received:in-reply-to:references:date:message-id
	:subject:from:to:cc:content-type;
	bh=mpHV6S5XSSz+mVR6FnnTvxkPAB1VHogTz+VtLI7oLLY=;
	b=T2D/uadwkLj2XS0zNebK6GRxRiUikEYJf6mofsvB6WYAXoQZ++mmytenapiwFirtpe
	iHRo3fY4yMEmnK1WoXqtW4BNlynH5DFpijtFa1LniDtkAsEyQ/VYRkoInGcUw5qIp4JG
	+GeJq3OniYcYOiXHRDTEX3wUE2VR7dd96MQ2jrOJnr9qSSxDd4ohjcQX73RgEfeZiSNk
	KEeVX8foOWeymZMK6jZ5BOHtMZvWWO+ShCfp5/8Lk7liSHu4y4ZNpHRDoG2lfEBBYM9o
	4v33EQzeWLByLZiPyCZhZY4UvTA+btzYHlS6B3SbQbDr+nfYMolWBx6KOCniPkqR+FcK
	UKaw==
MIME-Version: 1.0
X-Received: by 10.194.122.166 with SMTP id lt6mr7190917wjb.14.1366305121508;
	Thu, 18 Apr 2013 10:12:01 -0700 (PDT)
Received: by 10.194.55.4 with HTTP; Thu, 18 Apr 2013 10:12:01 -0700 (PDT)
In-Reply-To: <51701CD6.6050006@HIDDEN>
References: <CAGXyeuhDGzByME80xwOkO5BtyepMexKUNgrP8rxv025-oQfH6w@HIDDEN>
	<516F48CD.5000502@HIDDEN>
	<CAGXyeujSP03XA9XQGVvRV-zFLPTaH5nNQZGMuqXekA8eOr=hoQ@HIDDEN>
	<51701CD6.6050006@HIDDEN>
Date: Thu, 18 Apr 2013 13:12:01 -0400
Message-ID: <CAGXyeugw-kMy9ArnP28Hs+Q35Q5s1uZf-UWWbaEx9FZMrskUZQ@HIDDEN>
Subject: Re: bug#14224: Feature request for the `cut`: record delimiter
From: George Brink <siberianowl@HIDDEN>
To: Pádraig Brady <P@HIDDEN>
Content-Type: multipart/alternative; boundary=089e01227ed86c229d04daa5b36a
X-Spam-Score: -0.7 (/)
X-Debbugs-Envelope-To: 14224
Cc: 14224 <at> debbugs.gnu.org
X-Spam-Score: -2.6 (--)

--089e01227ed86c229d04daa5b36a
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

On Thu, Apr 18, 2013 at 12:18 PM, Pádraig Brady <P@HIDDEN> wrote:

>
> awk is often suggested too as an alternative to cut.
>
No, I looked at awk, but it does not have a convenient way to specify lists
of printed fields.
awk -e "BEGIN{FS="☺"; RS="☻"; OFS=FS; ORS=RS;}; {print $1,$2,$3,$15,$16,$17 ??? ) }
You got the picture...
It is possible to repeat a cut in awk (and documentation for awk does show
how), but this would be a creation of an external application, not a
one-liner with a tool from the box.
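
For illustration, the field list can be passed to awk as a string and expanded in a loop, which keeps the call close to a one-liner once it is wrapped in a shell function. This is only a minimal sketch, not the cut.awk program from the awk documentation: `awk_cut` is a hypothetical name, the delimiters are assumed to be the \001 and \002 bytes used in the perl example, and open-ended ranges such as `15-` are not handled.

```shell
# awk_cut LIST
# Print the fields named in LIST (e.g. "1-3,15-47") from standard input.
# Assumes \001 separates fields and \002 terminates records, so newlines
# inside a field are passed through untouched.  (Hypothetical helper.)
awk_cut() {
    awk -v list="$1" '
    BEGIN {
        FS = OFS = "\001"              # field separator, in and out
        RS = ORS = "\002"              # record separator, in and out
        n = split(list, part, ",")     # "1-3,15-47" -> part[1]="1-3", part[2]="15-47"
    }
    {
        out = ""
        first = 1
        for (i = 1; i <= n; i++) {
            lo = hi = part[i]          # a single field number by default
            if (split(part[i], r, "-") == 2) { lo = r[1]; hi = r[2] }
            for (f = lo + 0; f <= hi + 0 && f <= NF; f++) {
                sep = first ? "" : OFS
                out = out sep $f
                first = 0
            }
        }
        print out
    }'
}
```

Under those assumptions the call would look like `awk_cut 1-3,15-47 <data.dat >new_data.dat`.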

--089e01227ed86c229d04daa5b36a--




Information forwarded to bug-coreutils@HIDDEN:
bug#14224; Package coreutils. Full text available.

Message received at 14224 <at> debbugs.gnu.org:


Received: (at 14224) by debbugs.gnu.org; 18 Apr 2013 16:23:10 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Thu Apr 18 12:23:10 2013
Received: from localhost ([127.0.0.1]:58799 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.72)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1USrc5-0006Rc-Pq
	for submit <at> debbugs.gnu.org; Thu, 18 Apr 2013 12:23:10 -0400
Received: from mx1.redhat.com ([209.132.183.28]:19551)
	by debbugs.gnu.org with esmtp (Exim 4.72)
	(envelope-from <P@HIDDEN>) id 1USrc2-0006RR-Qw
	for 14224 <at> debbugs.gnu.org; Thu, 18 Apr 2013 12:23:08 -0400
Received: from int-mx11.intmail.prod.int.phx2.redhat.com
	(int-mx11.intmail.prod.int.phx2.redhat.com [10.5.11.24])
	by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id r3IGIafP027771
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK);
	Thu, 18 Apr 2013 12:18:36 -0400
Received: from [10.36.116.75] (ovpn-116-75.ams2.redhat.com [10.36.116.75])
	by int-mx11.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP
	id r3IGIV8l014627
	(version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO);
	Thu, 18 Apr 2013 12:18:34 -0400
Message-ID: <51701CD6.6050006@HIDDEN>
Date: Thu, 18 Apr 2013 09:18:30 -0700
From: Pádraig Brady <P@HIDDEN>
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
	rv:17.0) Gecko/20130110 Thunderbird/17.0.2
MIME-Version: 1.0
To: George Brink <siberianowl@HIDDEN>
Subject: Re: bug#14224: Feature request for the `cut`: record delimiter
References: <CAGXyeuhDGzByME80xwOkO5BtyepMexKUNgrP8rxv025-oQfH6w@HIDDEN>
	<516F48CD.5000502@HIDDEN>
	<CAGXyeujSP03XA9XQGVvRV-zFLPTaH5nNQZGMuqXekA8eOr=hoQ@HIDDEN>
In-Reply-To: <CAGXyeujSP03XA9XQGVvRV-zFLPTaH5nNQZGMuqXekA8eOr=hoQ@HIDDEN>
X-Enigmail-Version: 1.5.1
Content-Type: text/plain; charset=UTF-8
X-Scanned-By: MIMEDefang 2.68 on 10.5.11.24
Content-Transfer-Encoding: quoted-printable
X-MIME-Autoconverted: from 8bit to quoted-printable by mx1.redhat.com id
	r3IGIafP027771
X-Spam-Score: -6.9 (------)
X-Debbugs-Envelope-To: 14224
Cc: 14224 <at> debbugs.gnu.org, Bob Proulx <bob@HIDDEN>
X-Spam-Score: -6.9 (------)

On 04/18/2013 08:41 AM, George Brink wrote:
> On Wed, Apr 17, 2013 at 9:13 PM, Pádraig Brady <P@HIDDEN> wrote:
>
>> On 04/17/2013 02:26 PM, George Brink wrote:
>>> Hello,
>>>
>>> I have a task of extracting several "fields" from the text file. The
>>> standard `cut` tool could be a perfect tool for a job, but...
>>> In my file the '\n' character is a legal symbol inside fields and therefore
>>> the text file uses other symbol for record-separator. And the `cut` has a
>>> hard-coded '\n' for record separator (I just checked the source from the
>>> coreutils-8.21 package).
>>
>> The patch would be simple but not without compatibility cost.
>> I.E. scripts using this would immediately become incompatible
>> with any systems without this feature.
>>
>> So you'd like something like tac -s, --separator
>> However cut -s is taken, so we'd have to avoid the short -s at least.
>> Also tac -s takes a string rather than a character, so
>> that gives some extra credence (and complexity) to that option there.
>>
>> Also related would be to support the -z, --zero-terminated option.
>> join, sort and uniq all have this option to use NUL as the record
>> separator,
>> however they're all closely related sort dependent utilities
>> and we're trying to unify options between them.
>>
>> If it is just a character you want to separate on,
>> then you can always use tr to convert before processing,
>> albeit with associated data copying overhead.
>>
>> SEP=^
>> tr "$SEP"'\n' '\n'"$SEP" | cut ... | tr "$SEP"'\n' '\n'"$SEP"
>>
>> So given that cut is not special here among the text filters,
>> and there is a workaround available, I'm 60:40 against
>> adding this feature.
>>
>> thanks,
>> Pádraig.
>>
>
> Pádraig,
>
> Thank you for alternative suggestions.
> Actually I just found yet another way to solve my problem:
> perl -0002 -F"\001" -an -e "print((join \"\001\", @F[0..2,14..46]), \"\002\");" data.dat >new_data.dat
> It works fine, but I am a little concerned about the speed. I have over three
> hundred such files, from 3Mb to 30Mb each. And this process should be
> run every day... I thought that by using cut (which just looks for
> delimiters) I can gain a few minutes on the whole process.
>
> Originally I thought of adding "-r, --record-delimiter=DELIM" and
> "--output-record-delimiter=DELIM" keys to the cut.
> Then the example above could be done with
> cut -d☺ -r☻ --output-delimiter=☺ --output-record-delimiter=☻ -f1-3,15-47 data.dat >new_data.dat
> I think it is feasible and would be more convenient (and hopefully faster)
> than using a whole perl or two calls to tr.

Yes they're the tradeoffs.
awk is often suggested too as an alternative to cut.
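
The tr round trip quoted above can be sketched for the delimiters of the perl example. This is a minimal sketch under stated assumptions: fields are separated by \001, every record (including the last) ends with \002, and `swap_cut` is a hypothetical helper name.

```shell
# swap_cut LIST
# Select the fields in LIST (cut -f syntax) from standard input.
# The first tr swaps the record separator \002 with newline so that cut
# sees one record per line; the second tr swaps the two bytes back.
# (Hypothetical helper for illustration.)
swap_cut() {
    tr '\002\n' '\n\002' |
        cut -d "$(printf '\001')" -f "$1" |
        tr '\002\n' '\n\002'
}
```

With that in place, `swap_cut 1-3,15-47 <data.dat >new_data.dat` would select the same fields as the proposed cut invocation. Because the two tr calls swap the bytes rather than delete them, newlines embedded in fields come back unchanged; the cost is the extra data copying already mentioned.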

> Bob,
> I understand your desire to receive a discussion of features not inside the
> bug related mail list, but here is an extract from the README:
>> Mail suggestions and bug reports for these programs to
>> the address on the last line of --help output.
> And guess what, the `cut --help` has the bug-coreutils email in the last
> line! The coreutils email is not mentioned inside README at all. And
> bug-coreutils is mentioned several times in different context.
> I apologize for using this mail-list inappropriately, but I did not know
> about any other mail-lists

No worries.  I saw no issue with your mails.
In future cut --help will just point at the
following URL which hopefully is easier to follow:
http://www.gnu.org/software/coreutils/

thanks,
Pádraig.




Information forwarded to bug-coreutils@HIDDEN:
bug#14224; Package coreutils. Full text available.

Message received at 14224 <at> debbugs.gnu.org:


Received: (at 14224) by debbugs.gnu.org; 18 Apr 2013 15:45:51 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Thu Apr 18 11:45:51 2013
Received: from localhost ([127.0.0.1]:58738 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.72)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1USr1y-0005F5-UH
	for submit <at> debbugs.gnu.org; Thu, 18 Apr 2013 11:45:51 -0400
Received: from mail-we0-f171.google.com ([74.125.82.171]:46285)
	by debbugs.gnu.org with esmtp (Exim 4.72)
	(envelope-from <siberianowl@HIDDEN>) id 1USr1v-0005Ev-Hk
	for 14224 <at> debbugs.gnu.org; Thu, 18 Apr 2013 11:45:49 -0400
Received: by mail-we0-f171.google.com with SMTP id i48so2547354wef.16
	for <14224 <at> debbugs.gnu.org>; Thu, 18 Apr 2013 08:41:17 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
	h=mime-version:x-received:in-reply-to:references:date:message-id
	:subject:from:to:cc:content-type;
	bh=vIuVDwxmPuPdEV1KbnG2B+sAAmhrGYYkI37xbgEK4WI=;
	b=B4eL/3uUPwLJiiKKhp2W/VgDzF4pdGMjc51H/7RiTYPEmZ/u2PtBfGV1m9cMBq1Wns
	dlsXshPhGfTyZUXt5oH6/Yik7jpo3kdkqv4iIvWDMhmMGAuMvvX452f7w3bKGxhLjOvj
	/OnuqXYxUkKivCQRJyfKg/S0/bT/kENUzHNir4+/mhGrwVeKQ7RXGxTl4EFPioZJBcwX
	Vm1sXpHjy8t7i84QCyaUAgh3jMtfkvK50LM1HFoDenbie4gp9swm1JaIIlEEpbkHmDAt
	i8dT3IyMT341EXHNuCGJF/GxEF+KmYK3/ezrij8yadEBpDqtrzEPSoJSea1O9rxx5AaN
	8zSQ==
MIME-Version: 1.0
X-Received: by 10.194.122.166 with SMTP id lt6mr6587820wjb.14.1366299677739;
	Thu, 18 Apr 2013 08:41:17 -0700 (PDT)
Received: by 10.194.55.4 with HTTP; Thu, 18 Apr 2013 08:41:17 -0700 (PDT)
In-Reply-To: <516F48CD.5000502@HIDDEN>
References: <CAGXyeuhDGzByME80xwOkO5BtyepMexKUNgrP8rxv025-oQfH6w@HIDDEN>
	<516F48CD.5000502@HIDDEN>
Date: Thu, 18 Apr 2013 11:41:17 -0400
Message-ID: <CAGXyeujSP03XA9XQGVvRV-zFLPTaH5nNQZGMuqXekA8eOr=hoQ@HIDDEN>
Subject: Re: bug#14224: Feature request for the `cut`: record delimiter
From: George Brink <siberianowl@HIDDEN>
To: Pádraig Brady <P@HIDDEN>,
	Bob Proulx <bob@HIDDEN>
Content-Type: multipart/alternative; boundary=089e01227ed8f2cf3204daa46e9e
X-Spam-Score: -2.6 (--)
X-Debbugs-Envelope-To: 14224
Cc: 14224 <at> debbugs.gnu.org
X-Spam-Score: -2.6 (--)

--089e01227ed8f2cf3204daa46e9e
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Pádraig,

Thank you for alternative suggestions.
Actually I just found yet another way to solve my problem:
perl -0002 -F"\001" -an -e "print((join \"\001\", @F[0..2,14..46]), \"\002\");" data.dat >new_data.dat
It works fine, but I am a little concerned about the speed. I have over three
hundred such files, from 3Mb to 30Mb each. And this process should be
run every day... I thought that by using cut (which just looks for
delimiters) I can gain a few minutes on the whole process.

Originally I thought of adding "-r, --record-delimiter=DELIM" and
"--output-record-delimiter=DELIM" keys to the cut.
Then the example above could be done with
cut -d☺ -r☻ --output-delimiter=☺ --output-record-delimiter=☻ -f1-3,15-47 data.dat >new_data.dat
I think it is feasible and would be more convenient (and hopefully faster)
than using a whole perl or two calls to tr.




Bob,
I understand your desire to receive a discussion of features not inside the
bug related mail list, but here is an extract from the README:
> Mail suggestions and bug reports for these programs to
> the address on the last line of --help output.
And guess what, the `cut --help` has the bug-coreutils email in the last
line! The coreutils email is not mentioned inside README at all. And
bug-coreutils is mentioned several times in different context.
I apologize for using this mail-list inappropriately, but I did not know
about any other mail-lists



On Wed, Apr 17, 2013 at 9:13 PM, Pádraig Brady <P@HIDDEN> wrote:

> On 04/17/2013 02:26 PM, George Brink wrote:
> > Hello,
> >
> > I have a task of extracting several "fields" from the text file. The
> > standard `cut` tool could be a perfect tool for a job, but...
> > In my file the '\n' character is a legal symbol inside fields and therefore
> > the text file uses other symbol for record-separator. And the `cut` has a
> > hard-coded '\n' for record separator (I just checked the source from the
> > coreutils-8.21 package).
>
> The patch would be simple but not without compatibility cost.
> I.E. scripts using this would immediately become incompatible
> with any systems without this feature.
>
> So you'd like something like tac -s, --separator
> However cut -s is taken, so we'd have to avoid the short -s at least.
> Also tac -s takes a string rather than a character, so
> that gives some extra credence (and complexity) to that option there.
>
> Also related would be to support the -z, --zero-terminated option.
> join, sort and uniq all have this option to use NUL as the record
> separator,
> however they're all closely related sort dependent utilities
> and we're trying to unify options between them.
>
> If it is just a character you want to separate on,
> then you can always use tr to convert before processing,
> albeit with associated data copying overhead.
>
> SEP=^
> tr "$SEP"'\n' '\n'"$SEP" | cut ... | tr "$SEP"'\n' '\n'"$SEP"
>
> So given that cut is not special here among the text filters,
> and there is a workaround available, I'm 60:40 against
> adding this feature.
>
> thanks,
> Pádraig.
>

--089e01227ed8f2cf3204daa46e9e--




Information forwarded to bug-coreutils@HIDDEN:
bug#14224; Package coreutils. Full text available.

Message received at 14224 <at> debbugs.gnu.org:


Received: (at 14224) by debbugs.gnu.org; 18 Apr 2013 01:18:34 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Wed Apr 17 21:18:34 2013
Received: from localhost ([127.0.0.1]:57556 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.72)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1USdUf-00013B-9e
	for submit <at> debbugs.gnu.org; Wed, 17 Apr 2013 21:18:34 -0400
Received: from mx1.redhat.com ([209.132.183.28]:16588)
	by debbugs.gnu.org with esmtp (Exim 4.72)
	(envelope-from <P@HIDDEN>) id 1USdUd-000133-5c
	for 14224 <at> debbugs.gnu.org; Wed, 17 Apr 2013 21:18:32 -0400
Received: from int-mx09.intmail.prod.int.phx2.redhat.com
	(int-mx09.intmail.prod.int.phx2.redhat.com [10.5.11.22])
	by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id r3I1E36w027027
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK);
	Wed, 17 Apr 2013 21:14:04 -0400
Received: from [10.36.116.20] (ovpn-116-20.ams2.redhat.com [10.36.116.20])
	by int-mx09.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP
	id r3I1DxuB007019
	(version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO);
	Wed, 17 Apr 2013 21:14:02 -0400
Message-ID: <516F48CD.5000502@HIDDEN>
Date: Wed, 17 Apr 2013 18:13:49 -0700
From: Pádraig Brady <P@HIDDEN>
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
	rv:17.0) Gecko/20130110 Thunderbird/17.0.2
MIME-Version: 1.0
To: George Brink <siberianowl@HIDDEN>
Subject: Re: bug#14224: Feature request for the `cut`: record delimiter
References: <CAGXyeuhDGzByME80xwOkO5BtyepMexKUNgrP8rxv025-oQfH6w@HIDDEN>
In-Reply-To: <CAGXyeuhDGzByME80xwOkO5BtyepMexKUNgrP8rxv025-oQfH6w@HIDDEN>
X-Enigmail-Version: 1.5.1
Content-Type: text/plain; charset=ISO-8859-1
X-Scanned-By: MIMEDefang 2.68 on 10.5.11.22
Content-Transfer-Encoding: quoted-printable
X-MIME-Autoconverted: from 8bit to quoted-printable by mx1.redhat.com id
	r3I1E36w027027
X-Spam-Score: -4.2 (----)
X-Debbugs-Envelope-To: 14224
Cc: 14224 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <http://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>,
	<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <http://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <http://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>,
	<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Sender: debbugs-submit-bounces <at> debbugs.gnu.org
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
X-Spam-Score: -6.9 (------)

On 04/17/2013 02:26 PM, George Brink wrote:
> Hello,
>
> I have a task of extracting several "fields" from the text file. The
> standard `cut` tool could be a perfect tool for a job, but...
> In my file the '\n' character is a legal symbol inside fields and therefore
> the text file uses other symbol for record-separator. And the `cut` has a
> hard-coded '\n' for record separator (I just checked the source from the
> coreutils-8.21 package).

The patch would be simple but not without compatibility cost.
I.e., scripts using this would immediately become incompatible
with any systems without this feature.

So you'd like something like tac -s, --separator.
However cut -s is taken, so we'd have to avoid the short -s at least.
Also tac -s takes a string rather than a character, so
that gives some extra credence (and complexity) to that option there.

Also related would be to support the -z, --zero-terminated option.
join, sort and uniq all have this option to use NUL as the record separator,
however they're all closely related sort dependent utilities
and we're trying to unify options between them.
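
As an illustration of what -z does in a utility that already has it (this
example is not from the original message; it assumes a sort that supports
-z, such as GNU sort), NUL-terminated records can contain newlines and
still be treated as single records:

```shell
# Two NUL-terminated records, each containing an embedded newline.
# sort -z uses NUL, not newline, as the record terminator, so each
# record stays intact.  tr then turns NUL into '^' for display only.
sorted_records() {
  printf 'b\nx\0a\ny\0' | sort -z | tr '\0' '^'
}
sorted_records
```

The helper name sorted_records is made up for this sketch.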

If it is just a character you want to separate on,
then you can always use tr to convert before processing,
albeit with associated data copying overhead.

SEP=^
tr "$SEP"'\n' '\n'"$SEP" | cut ... | tr "$SEP"'\n' '\n'"$SEP"
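
A minimal sketch of the same swap wrapped in a function, assuming the
separator is a single ordinary character (not a backslash or a leading
'-', which tr would interpret specially).  The name cut_records and the
sample data are made up for illustration:

```shell
# Hypothetical wrapper (not part of coreutils): run cut on records
# terminated by the single character $1 instead of newline.
# The remaining arguments are passed to cut unchanged.
cut_records() {
  sep=$1
  shift
  # Swap the separator and newline, cut line by line, then swap back.
  tr "$sep"'\n' '\n'"$sep" | cut "$@" | tr "$sep"'\n' '\n'"$sep"
}

# Field 2 of each '^'-terminated record; field 2 of the first record
# contains an embedded newline.
printf 'a:b\nb:c^d:e:f^' | cut_records '^' -d: -f2
```

Here the embedded newline survives in the first record's output, and
each output record is again terminated by '^'.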

So given that cut is not special here among the text filters,
and there is a workaround available, I'm 60:40 against
adding this feature.

thanks,
Pádraig.




Information forwarded to bug-coreutils@HIDDEN:
bug#14224; Package coreutils. Full text available.

Message received at 14224 <at> debbugs.gnu.org:


Received: (at 14224) by debbugs.gnu.org; 17 Apr 2013 23:29:07 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Wed Apr 17 19:29:06 2013
Received: from localhost ([127.0.0.1]:57422 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.72)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1USbmk-00066e-Jc
	for submit <at> debbugs.gnu.org; Wed, 17 Apr 2013 19:29:06 -0400
Received: from mx1.redhat.com ([209.132.183.28]:40933)
	by debbugs.gnu.org with esmtp (Exim 4.72)
	(envelope-from <eblake@HIDDEN>) id 1USbmh-00066U-4Q
	for 14224 <at> debbugs.gnu.org; Wed, 17 Apr 2013 19:29:05 -0400
Received: from int-mx01.intmail.prod.int.phx2.redhat.com
	(int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11])
	by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id r3HNOajO010634
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK);
	Wed, 17 Apr 2013 19:24:36 -0400
Received: from [10.3.113.85] (ovpn-113-85.phx2.redhat.com [10.3.113.85])
	by int-mx01.intmail.prod.int.phx2.redhat.com (8.13.8/8.13.8) with ESMTP
	id r3HNOZCY028587; Wed, 17 Apr 2013 19:24:35 -0400
Message-ID: <516F2F33.3070508@HIDDEN>
Date: Wed, 17 Apr 2013 17:24:35 -0600
From: Eric Blake <eblake@HIDDEN>
Organization: Red Hat, Inc.
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
	rv:17.0) Gecko/20130402 Thunderbird/17.0.5
MIME-Version: 1.0
To: Bob Proulx <bob@HIDDEN>
Subject: Re: bug#14224: Feature request for the `cut`: record delimiter
References: <CAGXyeuhDGzByME80xwOkO5BtyepMexKUNgrP8rxv025-oQfH6w@HIDDEN>
	<20130417230913.GA19399@HIDDEN>
In-Reply-To: <20130417230913.GA19399@HIDDEN>
X-Enigmail-Version: 1.5.1
OpenPGP: url=http://people.redhat.com/eblake/eblake.gpg
Content-Type: multipart/signed; micalg=pgp-sha256;
	protocol="application/pgp-signature";
	boundary="----enig2HISBXLPQQXSRRODLGKHA"
X-Scanned-By: MIMEDefang 2.67 on 10.5.11.11
X-Spam-Score: -5.7 (-----)
X-Debbugs-Envelope-To: 14224
Cc: 14224 <at> debbugs.gnu.org, George Brink <siberianowl@HIDDEN>
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <http://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>,
	<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <http://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <http://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>,
	<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Sender: debbugs-submit-bounces <at> debbugs.gnu.org
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
X-Spam-Score: -7.6 (-------)

This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
------enig2HISBXLPQQXSRRODLGKHA
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

On 04/17/2013 05:09 PM, Bob Proulx wrote:

In addition to Bob's (highly useful!) comments,

>> The README in coreutils suggests to read README-hacking and HACKING for
>> guidelines on making a patch, but there are no such files in the
>> coreutils-8.21.tar.xz.
>
> Anyone working on the source code is expected to be working from the
> version control files.  Because the pace of change is rapid and doing
> so just makes it easier all around.
>
> Here is the current HACKING file from the vcs online web frontend.
>
>   http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=blob;f=HACKING;hb=HEAD

Should we patch README to include this URL to current HACKING contents,
since we don't ship HACKING in our tarballs?  Or, should we reconsider
our position and start shipping HACKING in the tarballs?  Of the
statements currently in README:

> If you obtained this file as part of a "git clone", then see the
> README-hacking file.  If this file came to you as part of a tar archive,
> then see the file INSTALL for compilation and installation instructions.

This one makes sense (HACKING won't be present unless you are working
from git), except that you are not told _how_ to do a "git clone".

> If you would like to suggest a patch, see the files README-hacking
> and HACKING for tips.

But this one doesn't mention anything about the files being git-only.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


------enig2HISBXLPQQXSRRODLGKHA
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
Comment: Public key at http://people.redhat.com/eblake/eblake.gpg
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBCAAGBQJRby8zAAoJEKeha0olJ0NqwAQIAIPXFGsblAU2GE/SV7BG5vIW
WWmhafWj8fFiBl2VeYfNifFz1VrWh7KQEiqohHUt7AlhgQ3Ws5C7nrB5PnX7DmPJ
9qgP+FaGofBcEZbJEYQT903S/G5auOkN31dCKAihMKfiCE+prhb2f5mzYVxnDIkU
Zw2LdUbE8RH9sZaaGoXFxmnk1/NRCPyBIhy7RYfybPL+I4BNZIT++mQd0rL0zvm4
hQ94cVRaBnIdHdn2amWwTNMxiUExcKlNL2TtxMIchshhw31ioXXuifW2Q+rIFLYa
GJh6mj/RY/q9TntImJk6oPBSPwFxZFW/ZKwqSxic6+a4cfkn5oi6e+9rXLo8yPA=
=HAlz
-----END PGP SIGNATURE-----

------enig2HISBXLPQQXSRRODLGKHA--




Information forwarded to bug-coreutils@HIDDEN:
bug#14224; Package coreutils. Full text available.

Message received at 14224 <at> debbugs.gnu.org:


Received: (at 14224) by debbugs.gnu.org; 17 Apr 2013 23:13:44 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Wed Apr 17 19:13:44 2013
Received: from localhost ([127.0.0.1]:57400 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.72)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1USbXr-0004kt-Rk
	for submit <at> debbugs.gnu.org; Wed, 17 Apr 2013 19:13:44 -0400
Received: from joseki.proulx.com ([216.17.153.58]:40402)
	by debbugs.gnu.org with esmtp (Exim 4.72)
	(envelope-from <bob@HIDDEN>)
	id 1USbXn-0004kd-OE; Wed, 17 Apr 2013 19:13:42 -0400
Received: from hysteria.proulx.com (hysteria.proulx.com [192.168.230.119])
	by joseki.proulx.com (Postfix) with ESMTP id 62E6F211DF;
	Wed, 17 Apr 2013 17:09:13 -0600 (MDT)
Received: by hysteria.proulx.com (Postfix, from userid 1000)
	id 270BC2DCE3; Wed, 17 Apr 2013 17:09:13 -0600 (MDT)
Date: Wed, 17 Apr 2013 17:09:13 -0600
From: Bob Proulx <bob@HIDDEN>
To: George Brink <siberianowl@HIDDEN>
Subject: Re: bug#14224: Feature request for the `cut`: record delimiter
Message-ID: <20130417230913.GA19399@HIDDEN>
References: <CAGXyeuhDGzByME80xwOkO5BtyepMexKUNgrP8rxv025-oQfH6w@HIDDEN>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CAGXyeuhDGzByME80xwOkO5BtyepMexKUNgrP8rxv025-oQfH6w@HIDDEN>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-Spam-Score: 0.1 (/)
X-Debbugs-Envelope-To: 14224
Cc: 14224 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <http://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>,
	<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <http://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <http://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>,
	<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Sender: debbugs-submit-bounces <at> debbugs.gnu.org
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
X-Spam-Score: -2.6 (--)

severity 14224 wishlist
thanks

George Brink wrote:
> I have a task of extracting several "fields" from the text file. The
> standard `cut` tool could be a perfect tool for a job, but...

Thank you for the bug report.  However note that 'cut' is often not
the right tool for the job.  Almost always, when people want more than
cut offers, it turns out that they should be using awk or another tool.
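
For instance, awk can use a record separator other than newline.  The
following is only a sketch under assumptions: the input format is not
known at this point in the thread, so the '^' record separator, the ':'
field separator, and the helper name awk_field are all hypothetical.

```shell
# Hypothetical helper: print field N of every record.
#   $1 = single-character record separator
#   $2 = single-character field separator
#   $3 = field number
awk_field() {
  awk -v rs="$1" -v fs="$2" -v n="$3" \
    'BEGIN { RS = rs; FS = fs; ORS = rs } { print $n }'
}

# Field 2 of each '^'-terminated record; newlines inside a field are
# treated as ordinary data because RS is not newline.
printf 'a:b\nb:c^d:e:f^' | awk_field '^' ':' 2
```

Setting ORS to the same character keeps the output in the same record
format as the input.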

> In my file the '\n' character is a legal symbol inside fields and therefore
> the text file uses other symbol for record-separator.

Then it isn't a text file.  By definition.

  http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap03.html

    3.392 Text File

    A file that contains characters organized into one or more lines.
    The lines do not contain NUL characters and none can exceed
    {LINE_MAX} bytes in length, including the <newline>.  Although IEEE
    Std 1003.1-2001 does not distinguish between text files and binary
    files (see the ISO C standard), many utilities only produce
    predictable or meaningful output when operating on text files.  The
    standard utilities that have such restrictions always specify "text
    files" in their STDIN or INPUT FILES sections.

  http://pubs.opengroup.org/onlinepubs/009695399/utilities/cut.html

  INPUT FILES

    The input files shall be text files, except that line lengths
    shall be unlimited.

Of course GNU isn't Unix (nor POSIX) and we can extend them usefully
if it makes sense to do so.  However creeping featurism is the Evil
and therefore will need discussion and justification.

Could you please give a description of your input syntax in more
detail?  Usually people will suggest a better tool for the job and
that often solves the problem immediately.

> The fix for this should be a simple one. I can probably make it
> myself but where to send the patch?

Since it isn't a bug then it isn't a "fix".  It would be an enhancement.
I have set the bug severity appropriately.

> The README in coreutils suggests to read README-hacking and HACKING for
> guidelines on making a patch, but there are no such files in the
> coreutils-8.21.tar.xz.

Anyone working on the source code is expected to be working from the
version control files.  Because the pace of change is rapid and doing
so just makes it easier all around.

Here is the current HACKING file from the vcs online web frontend.

  http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=blob;f=HACKING;hb=HEAD

Please read through that document.  It should give you all of the
information you need to submit patches to the project.  Be sure to
read the "Copyright assignment" section so that it doesn't come as a
surprise later after a lot of work has been put into it.  Any
non-trivial contribution needs an assignment and it is good to get
that started early.

If you have any questions please ask them.  Since this bug is already
created it is okay to follow-up with questions here.  Please keep the
bug log address in the recipient list.

But if you are asking questions or generating random discussion then
please use the coreutils@HIDDEN mailing list instead of the bug
tracker.  We often spend a lot of time closing bug reports that are
doing nothing but asking questions.

Bob




Information forwarded to bug-coreutils@HIDDEN:
bug#14224; Package coreutils. Full text available.

Message received at submit <at> debbugs.gnu.org:


Received: (at submit) by debbugs.gnu.org; 17 Apr 2013 22:40:01 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Wed Apr 17 18:40:01 2013
Received: from localhost ([127.0.0.1]:57359 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.72)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1USb1E-0003oI-NG
	for submit <at> debbugs.gnu.org; Wed, 17 Apr 2013 18:40:01 -0400
Received: from eggs.gnu.org ([208.118.235.92]:49662)
	by debbugs.gnu.org with esmtp (Exim 4.72)
	(envelope-from <siberianowl@HIDDEN>) id 1USZwJ-0001y2-0E
	for submit <at> debbugs.gnu.org; Wed, 17 Apr 2013 17:30:51 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <siberianowl@HIDDEN>) id 1USZrz-0006ha-1J
	for submit <at> debbugs.gnu.org; Wed, 17 Apr 2013 17:26:25 -0400
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,FREEMAIL_FROM,
	HTML_MESSAGE,T_DKIM_INVALID autolearn=disabled version=3.3.2
Received: from lists.gnu.org ([208.118.235.17]:41504)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <siberianowl@HIDDEN>) id 1USZry-0006hU-Ui
	for submit <at> debbugs.gnu.org; Wed, 17 Apr 2013 17:26:22 -0400
Received: from eggs.gnu.org ([208.118.235.92]:32769)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <siberianowl@HIDDEN>) id 1USZrw-0005Zd-1A
	for bug-coreutils@HIDDEN; Wed, 17 Apr 2013 17:26:22 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <siberianowl@HIDDEN>) id 1USZrt-0006e3-H2
	for bug-coreutils@HIDDEN; Wed, 17 Apr 2013 17:26:19 -0400
Received: from mail-wi0-x22d.google.com ([2a00:1450:400c:c05::22d]:33491)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <siberianowl@HIDDEN>) id 1USZrt-0006ct-AQ
	for bug-coreutils@HIDDEN; Wed, 17 Apr 2013 17:26:17 -0400
Received: by mail-wi0-f173.google.com with SMTP id c10so905958wiw.12
	for <bug-coreutils@HIDDEN>; Wed, 17 Apr 2013 14:26:16 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
	h=mime-version:x-received:date:message-id:subject:from:to
	:content-type; bh=9ML1Foz8o9dBQS1b688Usp0W4clHt/eGXESX4NS5YzM=;
	b=Vc5kf+1O8tETplta1kcf4NwZIuuvxyfti3EAhUfS+7/A8Hrz/SwYF0zIkb6stDcKVS
	kWq+Kuz/YTqNx7NxIigrq6PR6Lkt7qK1kICI4Tk5FortHdnaYTsEHSorjNf7lxDrwfFT
	8Bv5U/A/3OUdxpttwnrIff7ai4zdmdJmIdIwXnfGLEgXEZlUYcWszcb+TegjhHCKM3sx
	jWeXtbazl5x9UGPm8jKliTq/pCyhsMpmLsZ1JCc0xkilVTUKUjJw4e1yQ/Qd1ZiD77VA
	yQqkBCpxVQ95GFNzxh1Hm7Ge311vfGziLRVx6WnxGv3jWC1s+NuuzZ28yDEeJ7yEu/Ed
	knag==
MIME-Version: 1.0
X-Received: by 10.180.109.197 with SMTP id hu5mr13940494wib.22.1366233976090; 
	Wed, 17 Apr 2013 14:26:16 -0700 (PDT)
Received: by 10.194.55.4 with HTTP; Wed, 17 Apr 2013 14:26:16 -0700 (PDT)
Date: Wed, 17 Apr 2013 17:26:16 -0400
Message-ID: <CAGXyeuhDGzByME80xwOkO5BtyepMexKUNgrP8rxv025-oQfH6w@HIDDEN>
Subject: Feature request for the `cut`: record delimiter
From: George Brink <siberianowl@HIDDEN>
To: bug-coreutils@HIDDEN
Content-Type: multipart/alternative; boundary=e89a8f2356f7d335ee04da952286
X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address
	(bad octet value).
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x
X-Received-From: 208.118.235.17
X-Spam-Score: -3.4 (---)
X-Debbugs-Envelope-To: submit
X-Mailman-Approved-At: Wed, 17 Apr 2013 18:40:00 -0400
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <http://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>,
	<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <http://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <http://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>,
	<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Sender: debbugs-submit-bounces <at> debbugs.gnu.org
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
X-Spam-Score: -6.1 (------)

--e89a8f2356f7d335ee04da952286
Content-Type: text/plain; charset=ISO-8859-1

Hello,

I have a task of extracting several "fields" from the text file. The
standard `cut` tool could be a perfect tool for a job, but...
In my file the '\n' character is a legal symbol inside fields and therefore
the text file uses other symbol for record-separator. And the `cut` has a
hard-coded '\n' for record separator (I just checked the source from the
coreutils-8.21 package).
The fix for this should be a simple one. I can probably make it myself, but
where to send the patch?
The README in coreutils suggests to read README-hacking and HACKING for
guidelines on making a patch, but there are no such files in the
coreutils-8.21.tar.xz.

--e89a8f2356f7d335ee04da952286--




Acknowledgement sent to George Brink <siberianowl@HIDDEN>:
New bug report received and forwarded. Copy sent to bug-coreutils@HIDDEN. Full text available.
Report forwarded to bug-coreutils@HIDDEN:
bug#14224; Package coreutils. Full text available.
Last modified: Fri, 31 Oct 2014 17:00:04 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.