Received: (at 14224) by debbugs.gnu.org; 18 Apr 2013 19:03:04 +0000
Date: Thu, 18 Apr 2013 12:58:31 -0600
From: Bob Proulx <bob@HIDDEN>
To: 14224 <at> debbugs.gnu.org
Cc: George Brink <siberianowl@HIDDEN>
Subject: Re: bug#14224: Feature request for the `cut`: record delimiter
Message-ID: <20130418185831.GD8048@HIDDEN>
In-Reply-To: <516F2F33.3070508@HIDDEN>

Eric Blake wrote:
> Should we patch README to include this URL to current HACKING contents,
> since we don't ship HACKING in our tarballs?  Or, should we reconsider
> our position and start shipping HACKING in the tarballs?  Of the
> statements currently in README:
>
> > If you obtained this file as part of a "git clone", then see the
> > README-hacking file.  If this file came to you as part of a tar archive,
> > then see the file INSTALL for compilation and installation instructions.
>
> This one makes sense (HACKING won't be present unless you are working
> from git), except that you are not told _how_ to do a "git clone".
>
> > If you would like to suggest a patch, see the files README-hacking
> > and HACKING for tips.
>
> But this one doesn't mention anything about the files being git-only.

I think it would definitely make sense to include some information
about the preferred method of getting the source in the main README
file.  That file is usually the one included in downstream
distributions.  It would enable people to bootstrap themselves to the
source.  And GNU is all about access to the source.  So I think that
would make a lot of sense.

Bob
bug-coreutils@HIDDEN: bug#14224; Package coreutils.
Received: (at 14224) by debbugs.gnu.org; 18 Apr 2013 19:00:56 +0000
Date: Thu, 18 Apr 2013 12:56:21 -0600
From: Bob Proulx <bob@HIDDEN>
To: George Brink <siberianowl@HIDDEN>
Cc: 14224 <at> debbugs.gnu.org
Subject: Re: bug#14224: Feature request for the `cut`: record delimiter
Message-ID: <20130418185621.GC8048@HIDDEN>
In-Reply-To: <CAGXyeujSP03XA9XQGVvRV-zFLPTaH5nNQZGMuqXekA8eOr=hoQ@HIDDEN>

George Brink wrote:
> Actually I just found yet another way to solve my problem:
> perl -0002 -F"\001" -an -e "print((join \"\001\", @F[0..2,14..46]), \"\002\");" data.dat >new_data.dat
> It works fine,

I was thinking of Perl's -0 option when I asked if you would say a few
words about the file and task.  But since you hadn't described it yet I
was hesitant to suggest it.

> but I am a little concerned of the speed. I have over three
> hundreds of such files, from 3Mb to 30Mb each. And this process should be
> run every day... I thought that by using cut (which just looks for
> delimiters) I can gain a few minutes on the whole process.

I always recommend benchmarking before optimizing.  Knuth is quoted as
"We should forget about small efficiencies, say about 97% of the time:
premature optimization is the root of all evil".

Don't forget programmer productivity either.  You might shave 10% off
of something now but make it incomprehensible to future admin
maintainers who need to understand it later.  Simply upgrading the
hardware might give a 50% increase in performance.  In which case I
would leave the algorithm simple and more easily understood and not
worry about the performance.  Simple and easy to understand is better
than raw speed.

> Bob,
> I understand your desire to receive a discussion of features not inside the
> bug related mail list, but here is a extract from the README:
> > Mail suggestions and bug reports for these programs to
> > the address on the last line of --help output.
> And guess what, the `cut --help` has the bug-coreutils email in the last
> line! The coreutils email is not mentioned inside README at all. And
> bug-coreutils is mentioned several times in different context.
> I apologize for using this mail-list inappropriately, but I did not know
> about any other mail-lists

As Pádraig said, no worries.  I didn't mean it to sound mean or
snarky.  But I can see that my last sentence did come out that way.
Sorry.  But if I didn't say anything then you wouldn't have said
anything and then we wouldn't have been reminded that the contact
address hadn't been updated in your version.  So it ended well.

The way to get the word out is by continuing to talk about it.  If
people even just read it in passing then they might be informed for
the future.

Bob
Received: (at 14224) by debbugs.gnu.org; 18 Apr 2013 17:16:34 +0000
Date: Thu, 18 Apr 2013 13:12:01 -0400
From: George Brink <siberianowl@HIDDEN>
To: Pádraig Brady <P@HIDDEN>
Cc: 14224 <at> debbugs.gnu.org
Subject: Re: bug#14224: Feature request for the `cut`: record delimiter
Message-ID: <CAGXyeugw-kMy9ArnP28Hs+Q35Q5s1uZf-UWWbaEx9FZMrskUZQ@HIDDEN>
In-Reply-To: <51701CD6.6050006@HIDDEN>

On Thu, Apr 18, 2013 at 12:18 PM, Pádraig Brady <P@HIDDEN> wrote:
>
> awk is often suggested too as an alternative to cut.
>
No, I looked at awk, but it does not have a convenient way to specify
lists of printed fields.
awk -e "BEGIN{FS="☺"; RS="☻"; OFS=FS; ORS=RS;}; {print $1,$2,$3,$15,$16,$17 ??? ) }
You get the picture...
It is possible to reimplement cut in awk (and the documentation for awk
does show how), but that would mean writing a separate program, not a
one-liner with a tool out of the box.
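For comparison, the field-list problem described in the message above can be handled inside awk with a short loop. The following is only a sketch and not something posted in the thread: the function name awk_cut and its list argument are hypothetical, it assumes \001 between fields and \002 after each record (the bytes used in the perl one-liner elsewhere in this thread), and unlike cut it prints fields in the order the list gives them.

```shell
#!/bin/sh
# awk_cut LIST: print the fields named in LIST (for example 1-3,15-47)
# from records that use \001 between fields and \002 after each record.
# Hypothetical helper for illustration; the name and argument are assumptions.
awk_cut() {
    awk -v list="$1" '
    BEGIN {
        FS = OFS = "\001"
        RS = ORS = "\002"
        # Turn "1-3,15-47" into parallel arrays of range bounds.
        n = split(list, parts, ",")
        for (k = 1; k <= n; k++) {
            m = split(parts[k], r, "-")
            lo[k] = r[1] + 0
            hi[k] = (m > 1 ? r[2] : r[1]) + 0
        }
    }
    {
        sep = ""
        for (k = 1; k <= n; k++)
            for (i = lo[k]; i <= hi[k] && i <= NF; i++) {
                printf "%s%s", sep, $i
                sep = OFS
            }
        printf "%s", ORS
    }'
}
```

A call such as `awk_cut 1-3,15-47 <data.dat >new_data.dat` would then correspond to the field selection discussed in the thread. Open-ended ranges like `5-` are not handled by this sketch.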
Received: (at 14224) by debbugs.gnu.org; 18 Apr 2013 16:23:10 +0000
Date: Thu, 18 Apr 2013 09:18:30 -0700
From: Pádraig Brady <P@HIDDEN>
To: George Brink <siberianowl@HIDDEN>
Cc: 14224 <at> debbugs.gnu.org, Bob Proulx <bob@HIDDEN>
Subject: Re: bug#14224: Feature request for the `cut`: record delimiter
Message-ID: <51701CD6.6050006@HIDDEN>
In-Reply-To: <CAGXyeujSP03XA9XQGVvRV-zFLPTaH5nNQZGMuqXekA8eOr=hoQ@HIDDEN>

On 04/18/2013 08:41 AM, George Brink wrote:
> On Wed, Apr 17, 2013 at 9:13 PM, Pádraig Brady <P@HIDDEN> wrote:
>
>> On 04/17/2013 02:26 PM, George Brink wrote:
>>> Hello,
>>>
>>> I have a task of extracting several "fields" from the text file. The
>>> standard `cut` tool could be a perfect tool for a job, but...
>>> In my file the '\n' character is a legal symbol inside fields and therefore
>>> the text file uses other symbol for record-separator. And the `cut` has a
>>> hard-coded '\n' for record separator (I just checked the source from the
>>> coreutils-8.21 package).
>>
>> The patch would be simple but not without compatibility cost.
>> I.E. scripts using this would immediately become incompatible
>> with any systems without this feature.
>>
>> So you'd like something like tac -s, --separator
>> However cut -s is taken, so we'd have to avoid the short -s at least.
>> Also tac -s takes a string rather than a character, so
>> that gives some extra credence (and complexity) to that option there.
>>
>> Also related would be to support the -z, --zero-terminated option.
>> join, sort and uniq all have this option to use NUL as the record separator,
>> however they're all closely related sort dependent utilities
>> and we're trying to unify options between them.
>>
>> If it is just a character you want to separate on,
>> then you can always use tr to convert before processing,
>> albeit with associated data copying overhead.
>>
>> SEP=^
>> tr "$SEP"'\n' '\n'"$SEP" | cut ... | tr "$SEP"'\n' '\n'"$SEP"
>>
>> So given that cut is not special here among the text filters,
>> and there is a workaround available, I'm 60:40 against
>> adding this feature.
>>
>> thanks,
>> Pádraig.
>>
>
> Pádraig,
>
> Thank you for alternative suggestions.
> Actually I just found yet another way to solve my problem:
> perl -0002 -F"\001" -an -e "print((join \"\001\", @F[0..2,14..46]), \"\002\");" data.dat >new_data.dat
> It works fine, but I am a little concerned of the speed. I have over three
> hundreds of such files, from 3Mb to 30Mb each. And this process should be
> run every day... I thought that by using cut (which just looks for
> delimiters) I can gain a few minutes on the whole process.
>
> Originally I though of adding "-r, --record-delimiter=DELIM" and
> "--output-record-delimiter=DELIM: keys to the cut.
> Then the example above could be done with
> cut -d☺ -r☻ --output-delimiter=☺ --output-record-delimiter=☻ -f1-3,15-47 data.dat >new_data.dat
> I think it is feasible and would be more convenient (and hopefully faster)
> than using a whole perl or two calls to tr.

Yes they're the tradeoffs.
awk is often suggested too as an alternative to cut.

> Bob,
> I understand your desire to receive a discussion of features not inside the
> bug related mail list, but here is a extract from the README:
>> Mail suggestions and bug reports for these programs to
>> the address on the last line of --help output.
> And guess what, the `cut --help` has the bug-coreutils email in the last
> line! The coreutils email is not mentioned inside README at all. And
> bug-coreutils is mentioned several times in different context.
> I apologize for using this mail-list inappropriately, but I did not know
> about any other mail-lists

No worries. I saw no issue with your mails.
In future cut --help will just point at the following URL
which hopefully is easier to follow:
http://www.gnu.org/software/coreutils/

thanks,
Pádraig.
Received: (at 14224) by debbugs.gnu.org; 18 Apr 2013 15:45:51 +0000
Date: Thu, 18 Apr 2013 11:41:17 -0400
From: George Brink <siberianowl@HIDDEN>
To: Pádraig Brady <P@HIDDEN>, Bob Proulx <bob@HIDDEN>
Cc: 14224 <at> debbugs.gnu.org
Subject: Re: bug#14224: Feature request for the `cut`: record delimiter
Message-ID: <CAGXyeujSP03XA9XQGVvRV-zFLPTaH5nNQZGMuqXekA8eOr=hoQ@HIDDEN>
In-Reply-To: <516F48CD.5000502@HIDDEN>

Pádraig,

Thank you for the alternative suggestions.
Actually I just found yet another way to solve my problem:
perl -0002 -F"\001" -an -e "print((join \"\001\", @F[0..2,14..46]), \"\002\");" data.dat >new_data.dat
It works fine, but I am a little concerned about the speed. I have over
three hundred such files, from 3Mb to 30Mb each. And this process should
be run every day... I thought that by using cut (which just looks for
delimiters) I could gain a few minutes on the whole process.

Originally I thought of adding "-r, --record-delimiter=DELIM" and
"--output-record-delimiter=DELIM" options to cut.
Then the example above could be done with
cut -d☺ -r☻ --output-delimiter=☺ --output-record-delimiter=☻ -f1-3,15-47 data.dat >new_data.dat
I think it is feasible and would be more convenient (and hopefully faster)
than using a whole perl or two calls to tr.

Bob,
I understand your desire to keep discussion of features off the
bug-related mailing list, but here is an extract from the README:
> Mail suggestions and bug reports for these programs to
> the address on the last line of --help output.
And guess what, `cut --help` has the bug-coreutils email in the last
line! The coreutils email is not mentioned inside README at all. And
bug-coreutils is mentioned several times in different contexts.
I apologize for using this mailing list inappropriately, but I did not
know about any other mailing lists.

On Wed, Apr 17, 2013 at 9:13 PM, Pádraig Brady <P@HIDDEN> wrote:

> On 04/17/2013 02:26 PM, George Brink wrote:
> > Hello,
> >
> > I have a task of extracting several "fields" from the text file. The
> > standard `cut` tool could be a perfect tool for a job, but...
> > In my file the '\n' character is a legal symbol inside fields and therefore
> > the text file uses other symbol for record-separator. And the `cut` has a
> > hard-coded '\n' for record separator (I just checked the source from the
> > coreutils-8.21 package).
>
> The patch would be simple but not without compatibility cost.
> I.E. scripts using this would immediately become incompatible
> with any systems without this feature.
>
> So you'd like something like tac -s, --separator
> However cut -s is taken, so we'd have to avoid the short -s at least.
> Also tac -s takes a string rather than a character, so
> that gives some extra credence (and complexity) to that option there.
>
> Also related would be to support the -z, --zero-terminated option.
> join, sort and uniq all have this option to use NUL as the record separator,
> however they're all closely related sort dependent utilities
> and we're trying to unify options between them.
>
> If it is just a character you want to separate on,
> then you can always use tr to convert before processing,
> albeit with associated data copying overhead.
>
> SEP=^
> tr "$SEP"'\n' '\n'"$SEP" | cut ... | tr "$SEP"'\n' '\n'"$SEP"
>
> So given that cut is not special here among the text filters,
> and there is a workaround available, I'm 60:40 against
> adding this feature.
>
> thanks,
> Pádraig.
>
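The perl one-liner in the message above is easier to follow with shell single quotes around the program. The sketch below is a variation for illustration, not the command as posted: the function name select_fields is hypothetical, the field list is shortened to the first three fields so it can be tried on tiny input, and it adds -l after -0002 so that perl strips the \002 terminator on input and appends it again on output instead of printing it explicitly.

```shell
#!/bin/sh
# select_fields: keep fields 1-3 of every record read from stdin.
# Records end with \002 and fields are separated by \001.
# Hypothetical helper; the thread's real field list was @F[0..2,14..46].
select_fields() {
    # -0002     read records terminated by octal 002
    # -l        chomp that terminator and re-add it on each print
    # -F'\001'  split each record on octal 001 (used with -a)
    # -a -n     autosplit into @F, loop over all records
    perl -0002 -l -F'\001' -ane 'print join("\001", @F[0..2])'
}
```

Usage would be `select_fields <data.dat >new_data.dat`. Because the terminator is chomped before splitting, the last selected field never carries a stray \002, which the explicit `"\002"` in the posted command would otherwise duplicate when the final field of a record is selected.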
Full text available.Received: (at 14224) by debbugs.gnu.org; 18 Apr 2013 01:18:34 +0000 From debbugs-submit-bounces <at> debbugs.gnu.org Wed Apr 17 21:18:34 2013 Received: from localhost ([127.0.0.1]:57556 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>) id 1USdUf-00013B-9e for submit <at> debbugs.gnu.org; Wed, 17 Apr 2013 21:18:34 -0400 Received: from mx1.redhat.com ([209.132.183.28]:16588) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from <P@HIDDEN>) id 1USdUd-000133-5c for 14224 <at> debbugs.gnu.org; Wed, 17 Apr 2013 21:18:32 -0400 Received: from int-mx09.intmail.prod.int.phx2.redhat.com (int-mx09.intmail.prod.int.phx2.redhat.com [10.5.11.22]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id r3I1E36w027027 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Wed, 17 Apr 2013 21:14:04 -0400 Received: from [10.36.116.20] (ovpn-116-20.ams2.redhat.com [10.36.116.20]) by int-mx09.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id r3I1DxuB007019 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Wed, 17 Apr 2013 21:14:02 -0400 Message-ID: <516F48CD.5000502@HIDDEN> Date: Wed, 17 Apr 2013 18:13:49 -0700 From: =?ISO-8859-1?Q?P=E1draig_Brady?= <P@HIDDEN> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2 MIME-Version: 1.0 To: George Brink <siberianowl@HIDDEN> Subject: Re: bug#14224: Feature request for the `cut`: record delimiter References: <CAGXyeuhDGzByME80xwOkO5BtyepMexKUNgrP8rxv025-oQfH6w@HIDDEN> In-Reply-To: <CAGXyeuhDGzByME80xwOkO5BtyepMexKUNgrP8rxv025-oQfH6w@HIDDEN> X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=ISO-8859-1 X-Scanned-By: MIMEDefang 2.68 on 10.5.11.22 Content-Transfer-Encoding: quoted-printable X-MIME-Autoconverted: from 8bit to quoted-printable by mx1.redhat.com id r3I1E36w027027 X-Spam-Score: -4.2 (----) X-Debbugs-Envelope-To: 14224 Cc: 14224 <at> debbugs.gnu.org 
X-BeenThere: debbugs-submit <at> debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: <http://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe> List-Archive: <http://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit> List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help> List-Subscribe: <http://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe> Sender: debbugs-submit-bounces <at> debbugs.gnu.org Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org X-Spam-Score: -6.9 (------)

On 04/17/2013 02:26 PM, George Brink wrote:
> Hello,
>
> I have a task of extracting several "fields" from the text file. The
> standard `cut` tool could be a perfect tool for a job, but...
> In my file the '\n' character is a legal symbol inside fields and therefore
> the text file uses other symbol for record-separator. And the `cut` has a
> hard-coded '\n' for record separator (I just checked the source from the
> coreutils-8.21 package).

The patch would be simple but not without compatibility cost.
I.E. scripts using this would immediately become incompatible
with any systems without this feature.

So you'd like something like tac -s, --separator
However cut -s is taken, so we'd have to avoid the short -s at least.
Also tac -s takes a string rather than a character, so that
gives some extra credence (and complexity) to that option there.

Also related would be to support the -z, --zero-terminated option.
join, sort and uniq all have this option to use NUL as the record separator,
however they're all closely related sort dependent utilities
and we're trying to unify options between them.

If it is just a character you want to separate on,
then you can always use tr to convert before processing,
albeit with associated data copying overhead.

  SEP=^
  tr "$SEP"'\n' '\n'"$SEP" | cut ... | tr "$SEP"'\n' '\n'"$SEP"

So given that cut is not special here among the text filters,
and there is a workaround available, I'm 60:40 against adding this feature.

thanks,
Pádraig.
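
For illustration, the tr workaround above might be exercised roughly as follows. This is only a sketch: the sample data, the ':' field delimiter, the function names and the selected fields are assumptions made up for the example, and it presumes that the '^' separator never occurs inside a field.

```sh
#!/bin/sh
# Assumed input format: '^' terminates each record, ':' separates fields,
# and a field may legitimately contain a newline.
SEP='^'

swap_sep() {
    # Exchange the record separator and newline so that cut sees one
    # record per line; running it a second time restores the original.
    tr "$SEP"'\n' '\n'"$SEP"
}

extract_fields() {
    # Hypothetical selection: keep fields 1 and 2 of every record.
    swap_sep | cut -d: -f1,2 | swap_sep
}

# Two records; field 2 of the first one contains an embedded newline.
printf 'a:multi\nline:c^d:e:f^' | extract_fields
```

The embedded newline survives because it is temporarily turned into '^' while cut runs and is turned back afterwards; the cost is the extra data copying through two tr processes mentioned above.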
bug-coreutils@HIDDEN: bug#14224; Package coreutils.
Full text available.Received: (at 14224) by debbugs.gnu.org; 17 Apr 2013 23:29:07 +0000 From debbugs-submit-bounces <at> debbugs.gnu.org Wed Apr 17 19:29:06 2013 Received: from localhost ([127.0.0.1]:57422 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>) id 1USbmk-00066e-Jc for submit <at> debbugs.gnu.org; Wed, 17 Apr 2013 19:29:06 -0400 Received: from mx1.redhat.com ([209.132.183.28]:40933) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from <eblake@HIDDEN>) id 1USbmh-00066U-4Q for 14224 <at> debbugs.gnu.org; Wed, 17 Apr 2013 19:29:05 -0400 Received: from int-mx01.intmail.prod.int.phx2.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id r3HNOajO010634 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Wed, 17 Apr 2013 19:24:36 -0400 Received: from [10.3.113.85] (ovpn-113-85.phx2.redhat.com [10.3.113.85]) by int-mx01.intmail.prod.int.phx2.redhat.com (8.13.8/8.13.8) with ESMTP id r3HNOZCY028587; Wed, 17 Apr 2013 19:24:35 -0400 Message-ID: <516F2F33.3070508@HIDDEN> Date: Wed, 17 Apr 2013 17:24:35 -0600 From: Eric Blake <eblake@HIDDEN> Organization: Red Hat, Inc. 
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130402 Thunderbird/17.0.5 MIME-Version: 1.0 To: Bob Proulx <bob@HIDDEN> Subject: Re: bug#14224: Feature request for the `cut`: record delimiter References: <CAGXyeuhDGzByME80xwOkO5BtyepMexKUNgrP8rxv025-oQfH6w@HIDDEN> <20130417230913.GA19399@HIDDEN> In-Reply-To: <20130417230913.GA19399@HIDDEN> X-Enigmail-Version: 1.5.1 OpenPGP: url=http://people.redhat.com/eblake/eblake.gpg Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="----enig2HISBXLPQQXSRRODLGKHA" X-Scanned-By: MIMEDefang 2.67 on 10.5.11.11 X-Spam-Score: -5.7 (-----) X-Debbugs-Envelope-To: 14224 Cc: 14224 <at> debbugs.gnu.org, George Brink <siberianowl@HIDDEN> X-BeenThere: debbugs-submit <at> debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: <http://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe> List-Archive: <http://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit> List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help> List-Subscribe: <http://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe> Sender: debbugs-submit-bounces <at> debbugs.gnu.org Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org X-Spam-Score: -7.6 (-------)

On 04/17/2013 05:09 PM, Bob Proulx wrote:

In addition to Bob's (highly useful!) comments,

>> The README in coreutils suggests to read README-hacking and HACKING for
>> guide-lines on making a patch, but there are no such files in the the
>> coreutils-8.21.tar.xz.
>
> Anyone working on the source code is expected to be working from the
> version control files. Because the pace of change is rapid and doing
> so just makes it easier all around.
>
> Here is the current HACKING file from the vcs online web frontend.
>
> http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=blob;f=HACKING;hb=HEAD

Should we patch README to include this URL to current HACKING contents,
since we don't ship HACKING in our tarballs? Or, should we reconsider
our position and start shipping HACKING in the tarballs? Of the
statements currently in README:

> If you obtained this file as part of a "git clone", then see the
> README-hacking file. If this file came to you as part of a tar archive,
> then see the file INSTALL for compilation and installation instructions.

This one makes sense (HACKING won't be present unless you are working
from git), except that you are not told _how_ to do a "git clone".

> If you would like to suggest a patch, see the files README-hacking
> and HACKING for tips.

But this one doesn't mention anything about the files being git-only.

--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
bug-coreutils@HIDDEN: bug#14224; Package coreutils.
Full text available.Received: (at 14224) by debbugs.gnu.org; 17 Apr 2013 23:13:44 +0000 From debbugs-submit-bounces <at> debbugs.gnu.org Wed Apr 17 19:13:44 2013 Received: from localhost ([127.0.0.1]:57400 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>) id 1USbXr-0004kt-Rk for submit <at> debbugs.gnu.org; Wed, 17 Apr 2013 19:13:44 -0400 Received: from joseki.proulx.com ([216.17.153.58]:40402) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from <bob@HIDDEN>) id 1USbXn-0004kd-OE; Wed, 17 Apr 2013 19:13:42 -0400 Received: from hysteria.proulx.com (hysteria.proulx.com [192.168.230.119]) by joseki.proulx.com (Postfix) with ESMTP id 62E6F211DF; Wed, 17 Apr 2013 17:09:13 -0600 (MDT) Received: by hysteria.proulx.com (Postfix, from userid 1000) id 270BC2DCE3; Wed, 17 Apr 2013 17:09:13 -0600 (MDT) Date: Wed, 17 Apr 2013 17:09:13 -0600 From: Bob Proulx <bob@HIDDEN> To: George Brink <siberianowl@HIDDEN> Subject: Re: bug#14224: Feature request for the `cut`: record delimiter Message-ID: <20130417230913.GA19399@HIDDEN> References: <CAGXyeuhDGzByME80xwOkO5BtyepMexKUNgrP8rxv025-oQfH6w@HIDDEN> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <CAGXyeuhDGzByME80xwOkO5BtyepMexKUNgrP8rxv025-oQfH6w@HIDDEN> User-Agent: Mutt/1.5.21 (2010-09-15) X-Spam-Score: 0.1 (/) X-Debbugs-Envelope-To: 14224 Cc: 14224 <at> debbugs.gnu.org X-BeenThere: debbugs-submit <at> debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: <http://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe> List-Archive: <http://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit> List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help> List-Subscribe: 
<http://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe> Sender: debbugs-submit-bounces <at> debbugs.gnu.org Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org X-Spam-Score: -2.6 (--)

severity 14224 wishlist
thanks

George Brink wrote:
> I have a task of extracting several "fields" from the text file. The
> standard `cut` tool could be a perfect tool for a job, but...

Thank you for the bug report. However note that 'cut' is often not
the right tool for the job. Almost always when people want more than
cut offers it is revealed that they should be using awk or another tool.

> In my file the '\n' character is a legal symbol inside fields and therefore
> the text file uses other symbol for record-separator.

Then it isn't a text file. By definition.

  http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap03.html

  3.392 Text File

    A file that contains characters organized into one or more
    lines. The lines do not contain NUL characters and none can exceed
    {LINE_MAX} bytes in length, including the <newline>. Although IEEE
    Std 1003.1-2001 does not distinguish between text files and binary
    files (see the ISO C standard), many utilities only produce
    predictable or meaningful output when operating on text files. The
    standard utilities that have such restrictions always specify
    "text files" in their STDIN or INPUT FILES sections.

  http://pubs.opengroup.org/onlinepubs/009695399/utilities/cut.html

  INPUT FILES

    The input files shall be text files, except that line lengths
    shall be unlimited.

Of course GNU isn't Unix (nor POSIX) and we can extend them usefully
if it makes sense to do so. However creeping featurism is the Evil
and therefore will need discussion and justification.

Could you please give a description of your input syntax in more
detail? Usually people will suggest a better tool for the job and
that often solves the problem immediately.

> The fix for this should be a simple one. I can probably make it
> myself but where to send the patch?

Since it isn't a bug then it isn't a "fix". It would be an
enhancement. I have set the bug severity appropriately.

> The README in coreutils suggests to read README-hacking and HACKING for
> guide-lines on making a patch, but there are no such files in the the
> coreutils-8.21.tar.xz.

Anyone working on the source code is expected to be working from the
version control files. Because the pace of change is rapid and doing
so just makes it easier all around.

Here is the current HACKING file from the vcs online web frontend.

  http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=blob;f=HACKING;hb=HEAD

Please read through that document. It should give you all of the
information you need to submit patches to the project. Be sure to
read the "Copyright assignment" section so that it doesn't come as a
surprise later after a lot of work has been put into it. Any
non-trivial contribution needs an assignment and it is good to get
that started early.

If you have any questions please ask them. Since this bug is already
created it is okay to follow-up with questions here. Please keep the
bug log address in the recipient list. But if you are asking
questions or generating random discussion then please use the
coreutils@HIDDEN mailing list instead of the bug tracker. We often
spend a lot of time closing bug reports that are doing nothing but
asking questions.

Bob
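
As a rough illustration of the "use awk" suggestion above, a record separator other than newline can be handled by setting awk's RS variable. The following is a sketch only: the '^' record separator, the ':' field separator, the function name and the sample data are assumptions for the example and are not taken from the reporter's actual file.

```sh
#!/bin/sh
# Assumed input format: '^' terminates each record, ':' separates fields,
# and a field may contain a newline.
first_two_fields() {
    # RS/ORS set the input/output record separators, FS/OFS the field ones.
    awk 'BEGIN { RS = "^"; ORS = "^"; FS = ":"; OFS = ":" } { print $1, $2 }'
}

# Two records; field 2 of the first one contains an embedded newline.
printf 'a:multi\nline:c^d:e:f^' | first_two_fields
```

Because FS is set to a single ':' here, a newline inside a record is treated as ordinary field data and is not a field separator.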
bug-coreutils@HIDDEN: bug#14224; Package coreutils.
Full text available.Received: (at submit) by debbugs.gnu.org; 17 Apr 2013 22:40:01 +0000 From debbugs-submit-bounces <at> debbugs.gnu.org Wed Apr 17 18:40:01 2013 Received: from localhost ([127.0.0.1]:57359 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>) id 1USb1E-0003oI-NG for submit <at> debbugs.gnu.org; Wed, 17 Apr 2013 18:40:01 -0400 Received: from eggs.gnu.org ([208.118.235.92]:49662) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from <siberianowl@HIDDEN>) id 1USZwJ-0001y2-0E for submit <at> debbugs.gnu.org; Wed, 17 Apr 2013 17:30:51 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from <siberianowl@HIDDEN>) id 1USZrz-0006ha-1J for submit <at> debbugs.gnu.org; Wed, 17 Apr 2013 17:26:25 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,FREEMAIL_FROM, HTML_MESSAGE,T_DKIM_INVALID autolearn=disabled version=3.3.2 Received: from lists.gnu.org ([208.118.235.17]:41504) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from <siberianowl@HIDDEN>) id 1USZry-0006hU-Ui for submit <at> debbugs.gnu.org; Wed, 17 Apr 2013 17:26:22 -0400 Received: from eggs.gnu.org ([208.118.235.92]:32769) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from <siberianowl@HIDDEN>) id 1USZrw-0005Zd-1A for bug-coreutils@HIDDEN; Wed, 17 Apr 2013 17:26:22 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from <siberianowl@HIDDEN>) id 1USZrt-0006e3-H2 for bug-coreutils@HIDDEN; Wed, 17 Apr 2013 17:26:19 -0400 Received: from mail-wi0-x22d.google.com ([2a00:1450:400c:c05::22d]:33491) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from <siberianowl@HIDDEN>) id 1USZrt-0006ct-AQ for bug-coreutils@HIDDEN; Wed, 17 Apr 2013 17:26:17 -0400 Received: by mail-wi0-f173.google.com with SMTP id c10so905958wiw.12 for <bug-coreutils@HIDDEN>; Wed, 17 Apr 
2013 14:26:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:x-received:date:message-id:subject:from:to :content-type; bh=9ML1Foz8o9dBQS1b688Usp0W4clHt/eGXESX4NS5YzM=; b=Vc5kf+1O8tETplta1kcf4NwZIuuvxyfti3EAhUfS+7/A8Hrz/SwYF0zIkb6stDcKVS kWq+Kuz/YTqNx7NxIigrq6PR6Lkt7qK1kICI4Tk5FortHdnaYTsEHSorjNf7lxDrwfFT 8Bv5U/A/3OUdxpttwnrIff7ai4zdmdJmIdIwXnfGLEgXEZlUYcWszcb+TegjhHCKM3sx jWeXtbazl5x9UGPm8jKliTq/pCyhsMpmLsZ1JCc0xkilVTUKUjJw4e1yQ/Qd1ZiD77VA yQqkBCpxVQ95GFNzxh1Hm7Ge311vfGziLRVx6WnxGv3jWC1s+NuuzZ28yDEeJ7yEu/Ed knag== MIME-Version: 1.0 X-Received: by 10.180.109.197 with SMTP id hu5mr13940494wib.22.1366233976090; Wed, 17 Apr 2013 14:26:16 -0700 (PDT) Received: by 10.194.55.4 with HTTP; Wed, 17 Apr 2013 14:26:16 -0700 (PDT) Date: Wed, 17 Apr 2013 17:26:16 -0400 Message-ID: <CAGXyeuhDGzByME80xwOkO5BtyepMexKUNgrP8rxv025-oQfH6w@HIDDEN> Subject: Feature request for the `cut`: record delimiter From: George Brink <siberianowl@HIDDEN> To: bug-coreutils@HIDDEN Content-Type: multipart/alternative; boundary=e89a8f2356f7d335ee04da952286 X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address (bad octet value). 
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x X-Received-From: 208.118.235.17 X-Spam-Score: -3.4 (---) X-Debbugs-Envelope-To: submit X-Mailman-Approved-At: Wed, 17 Apr 2013 18:40:00 -0400 X-BeenThere: debbugs-submit <at> debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: <http://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe> List-Archive: <http://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit> List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help> List-Subscribe: <http://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe> Sender: debbugs-submit-bounces <at> debbugs.gnu.org Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org X-Spam-Score: -6.1 (------)

Hello,

I have a task of extracting several "fields" from the text file. The
standard `cut` tool could be a perfect tool for a job, but...
In my file the '\n' character is a legal symbol inside fields and therefore
the text file uses other symbol for record-separator. And the `cut` has a
hard-coded '\n' for record separator (I just checked the source from the
coreutils-8.21 package).
The fix for this should be a simple one. I can probably make it myself
but where to send the patch?
The README in coreutils suggests to read README-hacking and HACKING for
guide-lines on making a patch, but there are no such files in the the
coreutils-8.21.tar.xz.
George Brink <siberianowl@HIDDEN>: bug-coreutils@HIDDEN.
bug-coreutils@HIDDEN: bug#14224; Package coreutils.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997 nCipher Corporation Ltd,
1994-97 Ian Jackson.