GNU bug report logs - #25832: split (v 8.25) with numeric suffixes beyond 89
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 25832 in the body.
You can then email your comments to 25832 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-coreutils <at> gnu.org: bug#25832; Package coreutils. (Wed, 22 Feb 2017 00:58:01 GMT)
Acknowledgement sent to Holger Wolff <holger-bug-coreutils <at> wolffh.de>: New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Wed, 22 Feb 2017 00:58:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Hello
Incorrect numeric suffixes are sometimes produced when going beyond
number 89:
Assume a file "test.txt" with 1000 lines, and the command
$ split -d -l 10 test.txt test_
I expect files test_00 through test_99, but what I get are test_00
through test_89 and test_9000 through test_9009.
The same happens when I use
$ split --numeric-suffixes -l 10 test.txt test_
but not when I use this line:
$ split --numeric-suffixes=0 -l 10 test.txt test_
I have not found this bug mentioned before, but if I missed this, I am
sorry.
$ split --version
split (GNU coreutils) 8.25
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Torbjörn Granlund and Richard M. Stallman.
Thank you
Holger
Information forwarded to bug-coreutils <at> gnu.org: bug#25832; Package coreutils. (Wed, 22 Feb 2017 02:41:01 GMT)
Message #8 received at 25832 <at> debbugs.gnu.org (full text, mbox):
Hello,
> On Feb 21, 2017, at 19:55, Holger Wolff <holger-bug-coreutils <at> wolffh.de> wrote:
>
> Incorrect numeric suffixes are sometimes produced when going beyond number 89:
> Assume a file "test.txt" with 1000 lines, and the command
>
> $ split -d -l 10 test.txt test_
>
> I expect files test_00 through test_99, but what I get are test_00 through test_89 and test_9000 through test_9009.
Thank you for the bug report.
I can confirm this is reproducible in the latest revision.
The immediate reason is that, without a starting value,
coreutils' split has a feature to automatically 'widen' the suffix,
but the widening logic follows the scheme designed for alphabetic suffixes
and gives surprising results for numeric ones.
That is, when not using numeric suffixes,
the suffix following 'yz' is widened to 'zaaa':
$ seq 1000 | split -l 1 - foo_
will result in:
...
foo_yy
foo_yz
foo_zaaa
foo_zaab
...
And you are seeing the last two digits ('89')
widened in the same logic (to '9000').
Technically, if 'numeric_suffix_start'
is left as NULL when --numeric-suffixes is parsed:
http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/split.c#n1455
then the widening logic behaves as if those were letters, not digits
in 'split.c:next_file_name()':
http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/split.c#n403
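The widening behaviour described above can be sketched in a few lines of Python. This is only a model that reproduces the file names seen in this report, not a translation of split.c; the function name split_suffixes and its parameters are made up for illustration.

```python
from itertools import islice, product
import string


def split_suffixes(alphabet, width=2):
    """Yield suffixes in the order described above (a model, not split.c itself).

    The last character of the alphabet is never used as the leading character
    of a suffix. When it would be needed, it is appended to a fixed prefix,
    the suffix grows by one character, and counting restarts.
    """
    last = alphabet[-1]
    prefix = ""
    while True:
        for combo in product(alphabet, repeat=width):
            if combo[0] == last:
                break  # leading character would become 'z' or '9': widen instead
            yield prefix + "".join(combo)
        prefix += last
        width += 1


letters = list(islice(split_suffixes(string.ascii_lowercase), 652))
digits = list(islice(split_suffixes(string.digits), 100))
print(letters[648:652])  # ['yy', 'yz', 'zaaa', 'zaab']
print(digits[88:92])     # ['88', '89', '9000', '9001']
print(digits[-1])        # 9009
```

The same rule applied to both alphabets gives 'yz' followed by 'zaaa' for letters and '89' followed by '9000' for digits, which matches the output in the original report.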
An immediate band-aid of defaulting to numeric_suffix_start=0
would have an unintended consequence (a regression, perhaps):
an explicit numeric start value prevents filename widening, so split fails
when more files need to be created than the suffix length allows
(this wasn't the case in your example because 1000 lines fit in exactly 100 files of 10 lines):
# Works, filenames will be widened to 9010.
$ seq 1001 | split -l 10 --numeric-suffix - foo_
# Widening is not allowed (the suffix stays at the default of 2 digits), so split fails:
$ seq 1001 | split -l 10 --numeric-suffix=0 - foo_
split: output file suffixes exhausted
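The arithmetic behind this failure can be sketched as follows. This is a rough Python model under the assumption that an explicit start value simply counts upward at a fixed width; the helper names are hypothetical and do not come from split.c.

```python
import math


def files_needed(total_lines, lines_per_file):
    """Number of output files for a line-based split."""
    return math.ceil(total_lines / lines_per_file)


def fixed_width_suffixes(start=0, width=2):
    """Model of numeric suffixes with an explicit start value: no widening."""
    for n in range(start, 10 ** width):
        yield f"{n:0{width}d}"
    # Mirrors the error message shown above once every suffix is used up.
    raise RuntimeError("output file suffixes exhausted")


def name_outputs(total_lines, lines_per_file, start=0, width=2):
    """Return the suffixes for each output file, or raise if they run out."""
    gen = fixed_width_suffixes(start, width)
    return [next(gen) for _ in range(files_needed(total_lines, lines_per_file))]


print(files_needed(1000, 10), name_outputs(1000, 10)[-1])  # 100 99
print(files_needed(1001, 10))                               # 101
try:
    name_outputs(1001, 10)
except RuntimeError as err:
    print(err)  # output file suffixes exhausted
```

With two digits there are exactly 100 names, so 1000 lines fit and 1001 lines do not.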
What do others think: default to no-widening for numeric suffixes,
or add code to 'next_file_name()' for numeric widening ?
-assaf
Information forwarded to bug-coreutils <at> gnu.org: bug#25832; Package coreutils. (Wed, 22 Feb 2017 03:33:02 GMT)
Message #11 received at 25832 <at> debbugs.gnu.org (full text, mbox):
unarchive 20874
forcemerge 20874 25832
stop
On 21/02/17 18:40, Assaf Gordon wrote:
> What do others think: default to no-widening for numeric suffixes,
> or add code to 'next_file_name()' for numeric widening ?
This was discussed at http://bugs.gnu.org/20874
I'm not sure anything needs to be done here,
since for backward compat for concat operations
expecting lexical sort we use the current auto widening scheme.
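To illustrate the lexical-sort argument, here is a small Python check. The two name lists are built by hand for 101 output files: one follows the current scheme reported in this thread, the other is a hypothetical "intuitive" scheme that simply continues from 99 to 100.

```python
def is_lexically_ordered(names):
    """True if plain string sorting keeps the names in creation order."""
    return sorted(names) == list(names)


# Current auto-widening scheme: 00..89, then 9000..9010.
current = [f"{n:02d}" for n in range(90)] + [f"9{n:03d}" for n in range(11)]

# Hypothetical alternative: 00..99, then 100.
naive = [f"{n:02d}" for n in range(101)]

print(is_lexically_ordered(current))  # True
print(is_lexically_ordered(naive))    # False
print(sorted(naive)[10:13])           # ['10', '100', '11']
```

Under the alternative scheme, anything that relies on lexical order would see file 100 between 10 and 11, so the pieces would be reassembled out of order.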
cheers,
Pádraig
Information forwarded to bug-coreutils <at> gnu.org: bug#25832; Package coreutils. (Wed, 22 Feb 2017 04:02:02 GMT)
Message #14 received at 25832 <at> debbugs.gnu.org (full text, mbox):
> On Feb 21, 2017, at 22:32, Pádraig Brady <P <at> draigBrady.com> wrote:
>
> This was discussed at http://bugs.gnu.org/20874
Missed that - sorry. I should've looked through the archives first...
> I'm not sure anything needs to be done here,
> since for backward compat for concat operations
> expecting lexical sort we use the current auto widening scheme.
I wonder if users who ask for --numeric-suffixes also
implicitly prefer an intuitive order (one that won't work
for lexical sorting but would with version sort).
But that is a new feature, and perhaps a backwards-incompatible one.
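As a rough illustration of the version-sort idea, the Python sketch below uses a simple natural-sort key. It only approximates what a real version sort does, and the foo_ names with a plain 99 to 100 rollover are hypothetical, not something split produces today.

```python
import re


def natural_key(name):
    """Split a name into text and integer chunks so digit runs compare as numbers."""
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", name)]


# Hypothetical "intuitive" names: foo_00 .. foo_99, then foo_100.
intuitive = [f"foo_{n:02d}" for n in range(101)]

# Names as produced by the current scheme: foo_00 .. foo_89, foo_9000 .. foo_9010.
current = ([f"foo_{n:02d}" for n in range(90)]
           + [f"foo_9{n:03d}" for n in range(11)])

print(sorted(intuitive) == intuitive)                   # False
print(sorted(intuitive, key=natural_key) == intuitive)  # True
print(sorted(current) == current)                       # True
print(sorted(current, key=natural_key) == current)      # True
```

The intuitive names only come back in creation order with the numeric-aware key, while the current names are in order under both plain and numeric-aware sorting.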
However, the fact that "--numeric-suffixes=0" and "--numeric-suffixes"
both start from zero but behave differently if there are more than 90 output
files is a bit unintuitive (because '=0' implies a fixed maximum suffix length, i.e. no widening).
Perhaps worth adding to the 'coreutils gotchas' page?
Attached is a suggestion for such text.
regards,
-assaf
[split-gotcha.patch (application/octet-stream, attachment)]
Information forwarded to bug-coreutils <at> gnu.org: bug#25832; Package coreutils. (Wed, 22 Feb 2017 05:00:02 GMT)
Message #17 received at 25832 <at> debbugs.gnu.org (full text, mbox):
On 21/02/17 20:01, Assaf Gordon wrote:
> Perhaps worth adding to the 'coreutils gotchas' page?
> Attached is a suggestion for such text.
Excellent, I used that as the basis for the update at:
https://www.pixelbeat.org/docs/coreutils-gotchas.html#split
thanks!
Pádraig
Forcibly Merged 20874 25832. Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Mon, 29 Oct 2018 02:45:02 GMT)
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 26 Nov 2018 12:24:06 GMT)
This bug report was last modified 5 years and 124 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.