GNU bug report logs - #9418
case sensitivity buggy in sort

Package: coreutils

Reported by: Michał Janke <jankeso <at> gmail.com>

Date: Thu, 1 Sep 2011 16:05:02 UTC

Severity: normal

Tags: notabug

Done: Eric Blake <eblake <at> redhat.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 9418 in the body.
You can then email your comments to 9418 AT debbugs.gnu.org in the normal way.


Report forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#9418; Package coreutils. (Thu, 01 Sep 2011 16:05:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Michał Janke <jankeso <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Thu, 01 Sep 2011 16:05:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Michał Janke <jankeso <at> gmail.com>
To: bug-coreutils <at> gnu.org
Subject: case sensitivity buggy in sort
Date: Thu, 1 Sep 2011 10:58:58 +0200
sort (GNU coreutils) 8.12

The case-sensitivity looks buggy in sort. Have a look at these examples:

$ cat bbb
A B b 0
a B b 0
A b b 1

$ sort bbb
a B b 0
A B b 0
A b b 1

$ sort -k1,2 bbb
a B b 0
A b b 1
A B b 0


$ cat ccc
A 2 b 0
a 2 b 0
A 1 b 1

$ sort ccc
A 1 b 1
a 2 b 0
A 2 b 0

$ sort -k1 ccc
A 1 b 1
a 2 b 0
A 2 b 0

$ sort -k1,2 ccc
A 1 b 1
a 2 b 0
A 2 b 0

$ sort -k1,1 ccc
a 2 b 0
A 1 b 1
A 2 b 0


$ cat ddd
A2 b 0
a2 b 0
A1 b 1

$ sort ddd
A1 b 1
a2 b 0
A2 b 0

$ sort -k1 ddd
A1 b 1
a2 b 0
A2 b 0

$ sort -k1,1 ddd
A1 b 1
a2 b 0
A2 b 0

$ sort -k1,2 ddd
A1 b 1
a2 b 0
A2 b 0

$ sort -k1,3 ddd
A1 b 1
a2 b 0
A2 b 0




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#9418; Package coreutils. (Thu, 01 Sep 2011 17:31:02 GMT) Full text and rfc822 format available.

Message #8 received at 9418 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Michał Janke <jankeso <at> gmail.com>
Cc: 9418 <at> debbugs.gnu.org
Subject: Re: bug#9418: case sensitivity buggy in sort
Date: Thu, 01 Sep 2011 10:27:04 -0700
This is surely a problem with your locale.
Please try setting LC_ALL=C in your environment
before running the tests.  E.g., in bash:

export LC_ALL=C

If that fixes the problem, it's definitely your locale.
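
A minimal sketch of the check suggested above, setting LC_ALL=C for a single command rather than exporting it for the whole session. The data is the `bbb` file from the original report; writing it to a temporary file via mktemp is an assumption made only for this illustration.

```shell
# Recreate the reporter's sample file "bbb" in a temporary location.
tmp=$(mktemp)
printf '%s\n' 'A B b 0' 'a B b 0' 'A b b 1' > "$tmp"

# Sort with the locale overridden for this one command only.
c_sorted=$(LC_ALL=C sort "$tmp")
printf '%s\n' "$c_sorted"

# For comparison: the same sort under whatever locale is currently in effect.
sort "$tmp"

rm -f "$tmp"
```

If the two outputs differ, the ordering comes from the active locale's collation rules rather than from sort itself.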




Reply sent to Eric Blake <eblake <at> redhat.com>:
You have taken responsibility. (Thu, 01 Sep 2011 17:36:01 GMT) Full text and rfc822 format available.

Notification sent to Michał Janke <jankeso <at> gmail.com>:
bug acknowledged by developer. (Thu, 01 Sep 2011 17:36:02 GMT) Full text and rfc822 format available.

Message #13 received at 9418-done <at> debbugs.gnu.org (full text, mbox):

From: Eric Blake <eblake <at> redhat.com>
To: Michał Janke <jankeso <at> gmail.com>
Cc: 9418-done <at> debbugs.gnu.org
Subject: Re: bug#9418: case sensitivity buggy in sort
Date: Thu, 01 Sep 2011 10:32:45 -0600
tag 9418 notabug
thanks

On 09/01/2011 02:58 AM, Michał Janke wrote:
> sort (GNU coreutils) 8.12
>
> The case-sensitivity looks buggy in sort. Have a look at these examples:

Thanks for the report.  However, this is most likely due to your choice 
of locale, and not a bug in sort; this is a FAQ:
https://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021

Using 'sort --debug' will help expose the issue.

> $ sort -k1,2 bbb
> a B b 0
> A b b 1
> A B b 0

$ sort --debug bbb -k1,2
sort: using `en_US.UTF-8' sorting rules
sort: leading blanks are significant in key 1; consider also specifying `b'
a B b 0
___
_______
A b b 1
___
_______
A B b 0
___
_______
$ LC_ALL=C ../coreutils/src/sort --debug bbb -k1,2
../coreutils/src/sort: using simple byte comparison
A B b 0
___
_______
A b b 1
___
_______
a B b 0
___
_______

See the difference?  In the C locale, you get ASCII sorting (A comes 
before B comes before a comes before b); in the en_US.UTF-8 locale, you 
get dictionary collation sorting (a comes before A comes before b comes 
before B).
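
The C-locale order follows directly from the byte values of the characters. A small sketch that prints them, relying on the POSIX printf rule that a numeric argument starting with a quote yields the code of the following character; the helper name `byte_of` is invented for this illustration.

```shell
# byte_of CHAR: print the numeric code of a single-byte character.
byte_of() {
  printf '%d\n' "'$1"
}

# Every uppercase ASCII letter has a smaller code than any lowercase one,
# which is why "A B" sorts before "a B" under simple byte comparison.
for ch in A B a b; do
  printf '%s = %s\n' "$ch" "$(byte_of "$ch")"
done
```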

-- 
Eric Blake   eblake <at> redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org




Message #14 received at 9418-done <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Michał Janke <jankeso <at> gmail.com>
Cc: 9418-done <at> debbugs.gnu.org
Subject: Re: bug#9418: case sensitivity buggy in sort
Date: Thu, 01 Sep 2011 18:35:14 +0100
On 09/01/2011 06:27 PM, Paul Eggert wrote:
> This is surely a problem with your locale.
> Please try setting LC_ALL=C in your environment
> before running the tests.  E.g., in bash:
> 
> export LC_ALL=C
> 
> If that fixes the problem, it's definitely your locale.

I'm marking this done as it's a locale issue.
Your locale is treating 'a' and 'A' as equal if
there is some other part of the string to distinguish on,
or else is putting lower case before upper case.

Do as Paul suggested above to disable this.
Also note the -s and --debug options.

$ printf "%s\n" 'A' 'a' 'A 1' 'a 2' | sort -bs --debug
sort: using `en_US.utf8' sorting rules
a
_
A
_
A 1
___
a 2
___

cheers,
Pádraig




Message #17 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Mike Frysinger <vapier <at> gentoo.org>
To: bug-coreutils <at> gnu.org
Cc: 9418 <at> debbugs.gnu.org, Michał Janke <jankeso <at> gmail.com>
Subject: Re: bug#9418: case sensitivity buggy in sort
Date: Thu, 1 Sep 2011 15:04:52 -0400
On Thursday, September 01, 2011 04:58:58 Michał Janke wrote:
> sort (GNU coreutils) 8.12
> 
> The case-sensitivity looks buggy in sort. Have a look at these examples:

the million dollar question: what are your locale settings ?  post the output 
of running `locale`.  then see if you get different results after you do: 
export LC_COLLATE=C (unless you've got an LC_ALL clobber in effect, then set 
LC_ALL instead).
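
A rough sketch of the precedence described above: LC_ALL, when set and non-empty, overrides LC_COLLATE, which in turn overrides LANG. The function name `effective_collate` is hypothetical, and falling back to "C" when nothing is set is an assumption (the real default is implementation-defined).

```shell
# effective_collate: print the locale name that governs collation,
# following the precedence LC_ALL > LC_COLLATE > LANG.
# Empty values count as unset; "C" stands in for the implementation default.
effective_collate() {
  printf '%s\n' "${LC_ALL:-${LC_COLLATE:-${LANG:-C}}}"
}

effective_collate
```

This only inspects the variables; `locale` itself remains the authoritative way to see what the system resolved.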
-mike

Message #23 received at 9418 <at> debbugs.gnu.org (full text, mbox):

From: Michał Janke <jankeso <at> gmail.com>
To: 9418 <at> debbugs.gnu.org
Subject: Re: bug#9418: closed (Re: bug#9418: case sensitivity buggy in sort)
Date: Fri, 2 Sep 2011 08:46:23 +0200

I definitely don't agree with the "locale issue" explanation. This is not
a problem of one letter being treated as greater or less than another;
the problem is that it is _sometimes_ one way, sometimes the other!
Please have a closer look at this one:

$ cat aaa
aa 1
AA 1
Aa 0

Now consider what should be the output of sort in two cases: A>a and A<a.
If A>a, the result should be
aa 1
Aa 0
AA 1

If A<a, it should be
AA 1
Aa 0
aa 1

And now the actual result:

$ sort aaa
Aa 0
aa 1
AA 1

So the lines are sorted primarily by the second column!

But true, when the locale is changed to the native POSIX one, the sorting is done reasonably:

$ LC_ALL=C sort aaa
AA 1
Aa 0
aa 1

So yes, the bug is visible only with a non-POSIX locale, but
_no_ - the results under those other locales are not correct.
Capital and lower-case letters seem to be simply aliased.




Message #26 received at 9418 <at> debbugs.gnu.org (full text, mbox):

From: Michał Janke <jankeso <at> gmail.com>
To: 9418 <at> debbugs.gnu.org
Subject: Re: bug#9418: closed (Re: bug#9418: case sensitivity buggy in sort)
Date: Fri, 2 Sep 2011 08:57:14 +0200

If it is the _locale_ that decides on upper and lower case letters
being equal, then the bug is in locale - the results look absurd.
Where should a bugreport about locale go?




Message #29 received at 9418 <at> debbugs.gnu.org (full text, mbox):

From: Michał Janke <jankeso <at> gmail.com>
To: 9418 <at> debbugs.gnu.org
Subject: Fwd: bug#9418: case sensitivity buggy in sort
Date: Fri, 2 Sep 2011 09:11:30 +0200
---------- Forwarded message ----------
From: Michał Janke <jankeso <at> gmail.com>
Date: 2011/9/2
Subject: Re: bug#9418: case sensitivity buggy in sort
To: Pádraig Brady <P <at> draigbrady.com>


2011/9/1 Pádraig Brady <P <at> draigbrady.com>:
> On 09/01/2011 06:27 PM, Paul Eggert wrote:
>> This is surely a problem with your locale.
>> Please try setting LC_ALL=C in your environment
>> before running the tests.  E.g., in bash:
>>
>> export LC_ALL=C
>>
>> If that fixes the problem, it's definitely your locale.
>
> I'm marking this done as it's a locale issue.
> Your locale is treating 'a' and 'A' as the equal if
> there is some other part of the string to distinguish on,
> or else is putting lower case before upper case.
>

Yes, that is exactly the case - why on earth would someone want that?
This results in just some sorting madness!





Message #32 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Davide Brini <dave_br <at> gmx.com>
To: bug-coreutils <at> gnu.org
Subject: Re: bug#9418: closed (Re: bug#9418: case sensitivity buggy in sort)
Date: Fri, 2 Sep 2011 11:10:26 +0200
On Fri, 2 Sep 2011 08:46:23 +0200, Michał Janke <jankeso <at> gmail.com> wrote:

> I definitely don't agree with "locale issue" explanation. This is not
> a problem of some letter being treated as > or < than other
> - the problem is that it is _sometimes_ one way, sometimes the other!
> Please have a closer look at this one:
> 
> $ cat aaa
> aa 1
> AA 1
> Aa 0
> 
> Now consider what should be the output of sort in two cases: A>a and A<a.
> If A>a, the result should be
> aa 1
> Aa 0
> AA 1
> 
> If A<a, it should be
> AA 1
> Aa 0
> aa 1
> 
> And now the actual result:
> 
> $ sort aaa
> Aa 0
> aa 1
> AA 1
> 
> So the lines are sorted in first place according to the second column!

I think what you're seeing is that Unicode collation is
multilevel. In a nutshell (and very simplified), "A" and "a", for Unicode,
are "the same base letter" and so are equivalent when compared with "1" or
"0", so the second column in your example is what determines the sort order.
Between themselves, however, "a" sorts before "A", which explains lines 2
and 3 of your output. Again, this is a gross oversimplification, but
hopefully it gives you the idea.

A bit less simplified (but still quite far from the real thing):

- If there are any differences in base letters, that determines the result
- Otherwise, if there are any differences in accents*, that determines the
  results
- Otherwise, if there are any differences in case*, that determines the
  results
- Otherwise, if there are any differences in punctuation*, that determines
  the results

(taken from one of the pages linked below)
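
The levels listed above can be imitated, very roughly, with plain byte-order tools. The sketch below is not the Unicode Collation Algorithm: it handles ASCII letters only, skips the accent and punctuation levels, and assumes input lines contain no tab characters. The function name `multilevel_sort` is made up for this illustration.

```shell
# multilevel_sort: emulate two collation levels using byte comparison.
#   level 1: the line with case folded away (base letters only)
#   level 2: the case-swapped line, so lowercase wins ties over uppercase
multilevel_sort() {
  LC_ALL=C awk '
    {
      primary = tolower($0)
      tiebreak = ""
      for (i = 1; i <= length($0); i++) {
        c = substr($0, i, 1)
        tiebreak = tiebreak ((c == tolower(c)) ? toupper(c) : tolower(c))
      }
      # Decorate each line with its two keys, separated by tabs.
      printf "%s\t%s\t%s\n", primary, tiebreak, $0
    }' |
  LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2 |
  cut -f3-
}

# The "aaa" example from the report.
printf '%s\n' 'aa 1' 'AA 1' 'Aa 0' | multilevel_sort
```

On the `aaa` data this prints "Aa 0", "aa 1", "AA 1", the same order the reporter saw: the first level sees "aa 0" and two copies of "aa 1", and only then does case separate "aa 1" from "AA 1".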

You may want to read 

http://www.unicode.org/reports/tr10/

and play with (for example)

http://demo.icu-project.org/icu-bin/locexp?_=en_US&d_=en&x=col

(choosing the locale of your choice) to get an idea of how it works.

With your example and the US locale, the tool gives

03: Aa 0
27 27 04 12 01 08 01 8f 07 00
01: aa 1
27 27 04 14 01 08 01 08 00
02: AA 1
27 27 04 14 01 08 01 8f 8f 06 00 

which you can then interpret with the help of the document that explains
the unicode collation algorithm.

(not that I agree with any of the localization madness, but
understanding always helps).

-- 
D.




Message #35 received at 9418 <at> debbugs.gnu.org (full text, mbox):

From: James Cloos <cloos <at> jhcloos.com>
To: Michał Janke <jankeso <at> gmail.com>
Cc: 9418 <at> debbugs.gnu.org
Subject: Re: bug#9418: Fwd: bug#9418: case sensitivity buggy in sort
Date: Sun, 04 Sep 2011 10:39:58 -0400
>>>>> "MJ" == Michał Janke <jankeso <at> gmail.com> writes:

MJ> Yes, that is exactly the case - why on earth would someone want that?
MJ> This results in just some sorting madness!

Complaints have been made about glibc's absurd and insane preference for
case insensitive collation (at least in en and the euro locales) for
nearly 20 years now.  All w/o resolution.

The other place this hits, and where many first saw it, is that it
allows a command like 'rm [a-z]*' to unlink(2) files like Makefile.
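
One way to sidestep the range problem described above is to match on a POSIX character class instead of a range, since `[[:lower:]]` is defined by character classification rather than by collation order. A hedged sketch; the function name and the file names are illustrative only, and nothing is deleted.

```shell
# is_lower_initial NAME: succeed if NAME starts with a lowercase letter.
# [[:lower:]] does not depend on where the locale places "M" relative
# to "a" and "z", unlike the range [a-z].
is_lower_initial() {
  case $1 in
    [[:lower:]]*) return 0 ;;
    *)            return 1 ;;
  esac
}

for name in Makefile main.c README notes.txt; do
  if is_lower_initial "$name"; then
    printf 'would match: %s\n' "$name"
  else
    printf 'left alone:  %s\n' "$name"
  fi
done
```

Running a script under LC_ALL=C, as suggested elsewhere in this thread, is the other common precaution.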

-JimC
-- 
James Cloos <cloos <at> jhcloos.com>         OpenPGP: 1024D/ED7DAEA6




Added tag(s) notabug. Request was from Eric Blake <eblake <at> redhat.com> to control <at> debbugs.gnu.org. (Tue, 20 Sep 2011 18:01:02 GMT) Full text and rfc822 format available.

Message #40 received at 9418 <at> debbugs.gnu.org (full text, mbox):

From: Michał Janke <jankeso <at> gmail.com>
To: James Cloos <cloos <at> jhcloos.com>
Cc: 9418 <at> debbugs.gnu.org
Subject: Re: bug#9418: Fwd: bug#9418: case sensitivity buggy in sort
Date: Mon, 17 Oct 2011 11:02:49 +0200
2011/9/4 James Cloos <cloos <at> jhcloos.com>:
>>>>>> "MJ" == Michał Janke <jankeso <at> gmail.com> writes:
>
> MJ> Yes, that is exactly the case - why on earth would someone want that?
> MJ> This results in just some sorting madness!
>
> Complaints have been made about glibc's absurd and insane preference for
> case insensitive collation (at least in en and the euro locales) for
> nearly 20 years now.  All w/o resolution.
>
> The other place this hits, and where many first saw it, is that it
> allows a command like 'rm [a-z]*' to unlink(2) files like Makefile.
>
> -JimC

This is much more severe than I imagined. I'd like to look into this:
how could this behavior have been ignored or accepted for so many
years? There must be some explanation for it; I'm just
wondering how good that explanation is for people who get their stuff
deleted by scripts that failed to reset the locale settings.




Message #43 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Philipp Thomas <pth <at> suse.de>
To: bug-coreutils <at> gnu.org
Subject: Re: bug#9418: Fwd: bug#9418: case sensitivity buggy in sort
Date: Mon, 17 Oct 2011 14:01:40 +0200
* James Cloos (cloos <at> jhcloos.com) [20110904 16:41]:

> Complaints have been made about glibc's absurd and insane preference for
> case insensitive collation (at least in en and the euro locales)

It's not glibc's preference but the collation rules for a given locale! If
you want to complain, do so to the relevant ISO standardisation committee.


> The other place this hits, and where many first saw it, is that it
> allows a command like 'rm [a-z]*' to unlink(2) files like Makefile.

Set your locale to C and you don't have that problem.

Philipp





Message #46 received at 9418 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Michał Janke <jankeso <at> gmail.com>
Cc: 9418 <at> debbugs.gnu.org, James Cloos <cloos <at> jhcloos.com>
Subject: Re: bug#9418: Fwd: bug#9418: case sensitivity buggy in sort
Date: Mon, 17 Oct 2011 14:15:03 +0200
tags 9418 + notabug
close 9418
thanks

Michał Janke wrote:
...
> This is much more severe than I imagined. I'd like to look into this -
> how come, for so many years, this behavior could have been
> ignored/accepted? There must be some explanation to it, I'm just
> wondering how good it is for people who get their stuff deleted by
> scripts which failed to reset locale settings.

Thanks for the report, but this is not a coreutils bug.
You or your distribution chose which locale you use,
and that is what determines whether you see this behavior.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 14 Nov 2011 12:24:03 GMT) Full text and rfc822 format available.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.