GNU bug report logs - #51792
coreutils - csplit - feature request


Package: coreutils;

Reported by: Rodolfo Aramayo <raramayo <at> tamu.edu>

Date: Fri, 12 Nov 2021 17:08:02 UTC

Severity: wishlist

To reply to this bug, email your comments to 51792 AT debbugs.gnu.org.



Report forwarded to bug-coreutils <at> gnu.org:
bug#51792; Package coreutils. (Fri, 12 Nov 2021 17:08:02 GMT)

Acknowledgement sent to Rodolfo Aramayo <raramayo <at> tamu.edu>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Fri, 12 Nov 2021 17:08:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Rodolfo Aramayo <raramayo <at> tamu.edu>
To: bug-coreutils <at> gnu.org
Subject: coreutils - csplit - feature request
Date: Fri, 12 Nov 2021 11:05:09 -0600
Dear Coreutils Maintainers,

First, thank you for your work. I use coreutils daily both for my research
and teaching. It is a great set of tools.

Second, I recently needed to extract Coding Sequences information from a
GenBank file. GenBank files are used in Computational
Genomics/Bioinformatics extensively. I used csplit, and it works like a
charm.

The command I used is:

csplit -sz -n 5 --prefix=02_ 01_00001
/[[:space:]][[:space:]][[:space:]][[:space:]][[:space:]]CDS[[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]]/
{*};

I was unable to use "[[:space:]]\+", which I expected to work in POSIX-aware code.

My question is: Is csplit POSIX compatible? And if it is not, can we make
it POSIX compatible?

Many Thanks

Rodolfo

--
Dr. Rodolfo Aramayo, PhD
Faculty of Biology and Genetics
Department of Biology, Texas A&M University

Information forwarded to bug-coreutils <at> gnu.org:
bug#51792; Package coreutils. (Fri, 12 Nov 2021 18:24:01 GMT)

Message #8 received at 51792 <at> debbugs.gnu.org:

From: Pádraig Brady <P <at> draigBrady.com>
To: Rodolfo Aramayo <raramayo <at> tamu.edu>, 51792 <at> debbugs.gnu.org
Subject: Re: bug#51792: coreutils - csplit - feature request
Date: Fri, 12 Nov 2021 18:23:37 +0000
On 12/11/2021 17:05, Rodolfo Aramayo wrote:
> Dear Coreutils Maintainers,
> 
> First, thank you for your work. I use coreutils daily both for my research
> and teaching. It is a great set of tools.
> 
> Second, I recently needed to extract Coding Sequences information from a
> GenBank file. GenBank files are used in Computational
> Genomics/Bioinformatics extensively. I used csplit, and it works like a
> charm.
> 
> The command I used is:
> 
> csplit -sz -n 5 --prefix=02_ 01_00001
> /[[:space:]][[:space:]][[:space:]][[:space:]][[:space:]]CDS[[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]]/
> {*};
> 
> I was unable to declare: "[[:space:]]\+" as I expected for POSIX aware code.
> 
> My question is: Is csplit POSIX compatible? and if it is not, can we make
> it POSIX compatible?


Well, POSIX defines both BRE and ERE, and csplit supports the former.
From the code we have:

  re_syntax_options =
    RE_SYNTAX_POSIX_BASIC & ~RE_CONTEXT_INVALID_DUP & ~RE_NO_EMPTY_RANGES;

Generally one can replace the '+' operator from ERE with '\{1,\}' in BRE.
So you'd be using something like:

  [[:space:]]\{1,\}CDS[[:space:]]\{1,\}

We might add an option to use ERE, though I don't think there is
a big need for that in csplit use cases.
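
As a minimal sketch of the suggestion above, the following script runs csplit with the interval pattern on a small GenBank-like sample. The sample text, the cds_ prefix, and the temporary directory are assumptions made for illustration and do not come from the report. The pattern is single-quoted so that the shell passes the backslashes through to csplit unchanged.

```shell
#!/bin/sh
# Sketch: split a tiny GenBank-like sample at each CDS feature line using a
# BRE interval expression. Sample text and the "cds_" prefix are hypothetical.
set -eu

# Work in a scratch directory (left in place so the pieces can be inspected).
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample: a header line followed by two gene/CDS features.
printf '%s\n' \
  'FEATURES             Location/Qualifiers' \
  '     gene            1..90' \
  '     CDS             1..90' \
  '                     /product="alpha"' \
  '     gene            100..190' \
  '     CDS             100..190' \
  '                     /product="beta"' > sample.gb

# Single quotes keep the backslashes intact, so csplit receives \{1,\}.
# '{*}' repeats the pattern as many times as possible.
csplit -sz -n 5 --prefix=cds_ sample.gb \
  '/[[:space:]]\{1,\}CDS[[:space:]]\{1,\}/' '{*}'

# List the pieces that were written.
ls cds_*
```

With this sample, the split should yield three pieces: the lines before the first CDS line, then one piece starting at each CDS line.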

cheers,
Pádraig




Information forwarded to bug-coreutils <at> gnu.org:
bug#51792; Package coreutils. (Wed, 17 Nov 2021 20:08:02 GMT)

Message #11 received at 51792 <at> debbugs.gnu.org:

From: Rodolfo Aramayo <raramayo <at> tamu.edu>
To: Pádraig Brady <p <at> draigbrady.com>
Cc: Rodolfo Aramayo <raramayo <at> tamu.edu>, 51792 <at> debbugs.gnu.org
Subject: Re: bug#51792: coreutils - csplit - feature request
Date: Wed, 17 Nov 2021 13:33:24 -0600
Pádraig,

Thank you for your response.

Unfortunately, even the pattern you are proposing as an alternative:

  [[:space:]]\{1,\}CDS[[:space:]]\{1,\}

does not work. Therefore, I have to conclude that csplit is neither BRE
nor ERE compatible.

Thanks for your help

R
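
The thread does not record why the interval pattern failed. One possible cause, offered here only as an assumption, is shell quoting: the command in the original report is shown unquoted, and in a POSIX shell an unquoted backslash is removed before the program sees the argument, so \{1,\} would arrive as {1,}, which a BRE treats as literal text. The sketch below illustrates that effect with printf and grep; the sample line and the variable names are made up for illustration.

```shell
#!/bin/sh
# Sketch: what the shell passes along when a BRE is left unquoted.
# This is an assumed explanation, not something confirmed in the thread.

# Unquoted: the shell removes each backslash before the program runs.
unquoted=$(printf '%s\n' /[[:space:]]\{1,\}CDS[[:space:]]\{1,\}/)
# Single-quoted: the backslashes survive.
quoted=$(printf '%s\n' '/[[:space:]]\{1,\}CDS[[:space:]]\{1,\}/')

printf 'unquoted: %s\n' "$unquoted"
printf 'quoted:   %s\n' "$quoted"

# Hypothetical GenBank-style feature line.
line='     CDS             1..90'

# In a BRE, {1,} without backslashes is literal text, so nothing matches.
# grep exits non-zero on no match, hence the "|| true".
literal_hits=$(printf '%s\n' "$line" | grep -c '[[:space:]]{1,}CDS' || true)
# With the backslashes intact, \{1,\} means "one or more".
interval_hits=$(printf '%s\n' "$line" | grep -c '[[:space:]]\{1,\}CDS' || true)

printf 'literal: %s, interval: %s\n' "$literal_hits" "$interval_hits"
```

If quoting is indeed the cause, wrapping the csplit pattern and the {*} repeat count in single quotes would let the BRE interval reach csplit unchanged.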





-- 
Dr. Rodolfo Aramayo, PhD
Faculty of Biology and Genetics
Department of Biology, Texas A&M University

This bug report was last modified 2 years and 159 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.