GNU bug report logs - #31979
csplit: a regexp pattern does not consider the negative offset of a previous regexp pattern


Package: coreutils

Reported by: Stéphane Campinas <stephane.campinas <at> gmail.com>

Date: Tue, 26 Jun 2018 15:12:02 UTC

Severity: normal

Tags: notabug

Done: Pádraig Brady <P <at> draigBrady.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 31979 in the body.
You can then email your comments to 31979 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-coreutils <at> gnu.org:
bug#31979; Package coreutils. (Tue, 26 Jun 2018 15:12:02 GMT)

Acknowledgement sent to Stéphane Campinas <stephane.campinas <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Tue, 26 Jun 2018 15:12:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Stéphane Campinas <stephane.campinas <at> gmail.com>
To: bug-coreutils <at> gnu.org
Subject: csplit: a regexp pattern does not consider the negative offset of a
 previous regexp pattern
Date: Tue, 26 Jun 2018 10:24:29 +0200
Hi,

When two consecutive regexp patterns are used and a negative offset is
applied to the first one, the second one does not start matching at the
line where the first pattern's output ended.

According to the csplit invocation page [0], it should:

	> [...] If it is given, the input up to (but not including) the
	  matching line plus or minus offset is put into the output file,
	  and the line after that begins the next section of input.

Here is an example of the problem, where I split a 50-line file
containing the numbers 1 to 50, one per line.

# My environment:

	- Linux mars 4.17.2-1-ARCH #1 SMP PREEMPT Sat Jun 16 11:08:59 UTC 2018 x86_64 GNU/Linux
	- csplit (GNU coreutils) 8.29

# A failing example with the unexpected behavior

	$ csplit numbers50.txt /15/-5 /12/
	18
	csplit: ‘/12/’: match not found
	123

# A working example when using a regexp pattern followed by a linenum pattern

	$ csplit numbers50.txt /15/-5 12
	18
	6
	117
	
	$ head xx*
	==> xx00 <==
	1
	2
	3
	4
	5
	6
	7
	8
	9
	
	==> xx01 <==
	10
	11
	
	==> xx02 <==
	12
	13
	14
	15
	16
	17
	18
	19
	20
	21

I think that both invocations should work and produce the same output.
I found this while porting csplit to Rust; see [1] for more information,
where I tried to trace the cause of this behavior in the code.

Cheers,

[0] https://www.gnu.org/software/coreutils/manual/html_node/csplit-invocation.html#csplit-invocation
[1] https://github.com/uutils/coreutils/issues/501#issuecomment-399569870

-- 
Stephane Campinas

Information forwarded to bug-coreutils <at> gnu.org:
bug#31979; Package coreutils. (Mon, 24 Sep 2018 12:41:02 GMT)

Message #8 received at 31979 <at> debbugs.gnu.org:

From: Stéphane Campinas <stephane.campinas <at> gmail.com>
To: 31979 <at> debbugs.gnu.org
Subject: csplit: a regexp pattern does not consider the negative offset of a
Date: Mon, 24 Sep 2018 14:40:45 +0200
Hi,

After attempting to port csplit, I think I understand why it is like
that: it is to stop the iteration in case a pattern should be executed
several times. Therefore, maybe an easy fix is to alter the
documentation to indicate that lines within a negative offset are not
matched in subsequent patterns, with the exception of the line-based
pattern.
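
The behavior discussed in this thread can be checked with a short script. This is a minimal sketch rather than part of the original report: it assumes GNU csplit, seq, paste and mktemp are on PATH, and the variable names are made up for illustration. The expected values in the comments come from the outputs quoted in the first message (csplit 8.29).

```shell
#!/bin/sh
# Reproduction sketch; assumes GNU coreutils (csplit, seq, paste, mktemp).
# Work in a scratch directory so the xxNN output files do not clutter anything.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the input described in the report: the numbers 1..50, one per line.
seq 1 50 > numbers50.txt

# Case 1: a regexp pattern following a regexp with a negative offset.
# Per the report this fails with "match not found" for /12/.
csplit numbers50.txt /15/-5 /12/ > /dev/null 2>&1
status_regexp=$?

# Case 2: a line-number pattern following the same negative-offset regexp.
# csplit prints the byte count of each piece it writes, one per line.
csplit numbers50.txt /15/-5 12 > sizes.txt
status_linenum=$?
sizes_linenum=$(paste -s -d ' ' sizes.txt)
first_of_xx02=$(head -n 1 xx02)

echo "regexp after negative offset: exit status $status_regexp"    # non-zero
echo "linenum after negative offset: exit status $status_linenum"  # 0
echo "piece sizes: $sizes_linenum"                                  # 18 6 117
echo "first line of xx02: $first_of_xx02"                           # 12
```

Running it side by side makes the asymmetry visible: the line-number pattern can still select line 12, while the regexp pattern cannot match it.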

Cheers,

-- 
Stephane Campinas

Information forwarded to bug-coreutils <at> gnu.org:
bug#31979; Package coreutils. (Tue, 25 Sep 2018 06:54:02 GMT)

Message #11 received at 31979 <at> debbugs.gnu.org:

From: Pádraig Brady <P <at> draigBrady.com>
To: Stéphane Campinas <stephane.campinas <at> gmail.com>,
 31979 <at> debbugs.gnu.org
Subject: Re: bug#31979: csplit: a regexp pattern does not consider the
 negative offset of a
Date: Mon, 24 Sep 2018 23:52:57 -0700
tag 31979 notabug
close 31979
stop

On 24/09/18 05:40, Stéphane Campinas wrote:
> Hi,
> 
> After attempting to port csplit, I think I understand why it is like
> that: it is to stop the iteration in case a pattern should be executed
> several times. Therefore, maybe an easy fix is to alter the
> documentation to indicate that lines within a negative offset are not
> matched in subsequent patterns, with the exception of the line-based
> pattern.

Thanks for following up.
I pushed that clarification in your name at:
https://git.sv.gnu.org/cgit/coreutils.git/commit/?id=7262994

cheers,
Pádraig




Added tag(s) notabug. Request was from Pádraig Brady <P <at> draigBrady.com> to control <at> debbugs.gnu.org. (Tue, 25 Sep 2018 06:54:03 GMT)

Bug closed; send any further explanations to 31979 <at> debbugs.gnu.org and Stéphane Campinas <stephane.campinas <at> gmail.com>. Request was from Pádraig Brady <P <at> draigBrady.com> to control <at> debbugs.gnu.org. (Tue, 25 Sep 2018 06:54:03 GMT)

Bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 23 Oct 2018 11:24:05 GMT)

