GNU bug report logs -
#51792
coreutils - csplit - feature request
To reply to this bug, email your comments to 51792 AT debbugs.gnu.org.
Report forwarded to bug-coreutils <at> gnu.org: bug#51792; Package coreutils. (Fri, 12 Nov 2021 17:08:02 GMT)
Acknowledgement sent to Rodolfo Aramayo <raramayo <at> tamu.edu>: New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Fri, 12 Nov 2021 17:08:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Dear Coreutils Maintainers,
First, thank you for your work. I use coreutils daily both for my research
and teaching. It is a great set of tools.
Second, I recently needed to extract Coding Sequences information from a
GenBank file. GenBank files are used in Computational
Genomics/Bioinformatics extensively. I used csplit, and it works like a
charm.
The command I used is:
csplit -sz -n 5 --prefix=02_ 01_00001 /[[:space:]][[:space:]][[:space:]][[:space:]][[:space:]]CDS[[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]]/ {*};
I was unable to use "[[:space:]]\+", which I expected POSIX-aware code to accept.
My question is: Is csplit POSIX compatible? If it is not, can we make
it POSIX compatible?
Many Thanks
Rodolfo
--
Dr. Rodolfo Aramayo, PhD
Faculty of Biology and Genetics
Department of Biology, Texas A&M University
Information forwarded to bug-coreutils <at> gnu.org: bug#51792; Package coreutils. (Fri, 12 Nov 2021 18:24:01 GMT)
Message #8 received at 51792 <at> debbugs.gnu.org (full text, mbox):
On 12/11/2021 17:05, Rodolfo Aramayo wrote:
> The command I used is:
>
> csplit -sz -n 5 --prefix=02_ 01_00001 /[[:space:]][[:space:]][[:space:]][[:space:]][[:space:]]CDS[[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]]/ {*};
>
> I was unable to declare: "[[:space:]]\+" as I expected for POSIX aware code.
>
> My question is: Is csplit POSIX compatible? and if it is not, can we make
> it POSIX compatible?
Well, POSIX defines both BRE and ERE, and csplit supports the former.
From the code we have:

  re_syntax_options =
    RE_SYNTAX_POSIX_BASIC & ~RE_CONTEXT_INVALID_DUP & ~RE_NO_EMPTY_RANGES;

Generally, one can replace the '+' operator from ERE with '\{1,\}' in BRE.
So you'd be using something like:

  [[:space:]]\{1,\}CDS[[:space:]]\{1,\}

We might add an option to use ERE, though I don't think there is a big need
for that in csplit use cases.
cheers,
Pádraig
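Editorial illustration: the BRE interval syntax suggested in this reply can be tried on a small input. This is only a sketch under stated assumptions. The sample text, the scratch directory, and the `part_` prefix are made up for demonstration and do not come from the report. The pattern is single-quoted so that the shell passes the backslashes to csplit unchanged.

```shell
# Work in a scratch directory so the output pieces are easy to inspect.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical GenBank-like input (an assumption, not data from the report):
# two CDS feature lines, each indented and followed by a run of spaces.
printf '%s\n' \
  'LOCUS       SAMPLE' \
  '     CDS             1..90' \
  '                     /gene="a"' \
  '     CDS             100..190' \
  '                     /gene="b"' > sample.gb

# In BRE, \{1,\} means "one or more", like + in ERE.
# Single quotes keep the backslashes away from the shell.
csplit -sz -n 5 --prefix=part_ sample.gb \
  '/[[:space:]]\{1,\}CDS[[:space:]]\{1,\}/' '{*}'

# Count the pieces that were written.
set -- part_*
pieces=$#
echo "pieces written: $pieces"
```

Each piece after the first should begin at a CDS line, which can be checked with `head -n 1 part_00001`.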
Information forwarded to bug-coreutils <at> gnu.org: bug#51792; Package coreutils. (Wed, 17 Nov 2021 20:08:02 GMT)
Message #11 received at 51792 <at> debbugs.gnu.org (full text, mbox):
Pádraig,
Thank you for your response.
Unfortunately, even the pattern you propose as an alternative:
[[:space:]]\{1,\}CDS[[:space:]]\{1,\}
does not work. I therefore have to conclude that csplit is neither BRE
nor ERE compatible.
Thanks for your help.
R
On Fri, Nov 12, 2021 at 12:23 PM Pádraig Brady <P <at> draigbrady.com> wrote:
> Well POSIX defines BRE and ERE, with csplit supporting the former.
> From the code we have:
>
> re_syntax_options =
> RE_SYNTAX_POSIX_BASIC & ~RE_CONTEXT_INVALID_DUP & ~RE_NO_EMPTY_RANGES;
>
> Generally one can replace '+' functionality from ERE, with '\{1,\}' in BRE.
> So you'd be using something like:
>
> [[:space:]]\{1,\}CDS[[:space:]]\{1,\}
--
Dr. Rodolfo Aramayo, PhD
Faculty of Biology and Genetics
Department of Biology, Texas A&M University
PeerJ - the Journal of Life & Environmental Sciences <https://peerj.com/>
Academic Editor: peerj.com/RodolfoAramayo
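Editorial illustration: the thread does not record why the suggested pattern failed. One possibility, offered here only as an assumption, is shell quoting. The commands in the report are written without quotes, and a POSIX shell removes an unquoted backslash before the program sees the argument, so `\{1,\}` would arrive as the literal text `{1,}`. The sketch below prints each argument as a program would receive it, then uses grep (which also defaults to BRE) on a made-up sample line to show the effect of losing the backslashes.

```shell
# Unquoted: the shell strips each backslash, so the interval operator is lost.
unquoted=$(printf '%s' /[[:space:]]\{1,\}CDS[[:space:]]\{1,\}/)

# Single-quoted: the backslashes reach the program intact.
quoted=$(printf '%s' '/[[:space:]]\{1,\}CDS[[:space:]]\{1,\}/')

echo "unquoted: $unquoted"
echo "quoted:   $quoted"

# Hypothetical sample line (an assumption, not data from the report).
sample='     CDS             1..90'

# With the backslashes, \{1,\} is a BRE interval and the line matches.
with_backslashes=$(printf '%s\n' "$sample" | grep -c '[[:space:]]\{1,\}CDS' || true)

# Without them, {1,} is ordinary text in BRE and the line does not match.
without_backslashes=$(printf '%s\n' "$sample" | grep -c '[[:space:]]{1,}CDS' || true)

echo "matches with backslashes:    $with_backslashes"
echo "matches without backslashes: $without_backslashes"
```

If this is what happened, wrapping the pattern and the `{*}` repeat count in single quotes on the csplit command line would be the thing to try.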
This bug report was last modified 3 years and 92 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.