GNU bug report logs -
#9612
sort: avoid a NaN-induced infloop
Previous Next
Reported by: Jim Meyering <jim <at> meyering.net>
Date: Tue, 27 Sep 2011 14:50:02 UTC
Severity: normal
Done: Jim Meyering <jim <at> meyering.net>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 9612 in the body.
You can then email your comments to 9612 AT debbugs.gnu.org in the normal way.
Toggle the display of automated, internal messages from the tracker.
Report forwarded
to
bug-coreutils <at> gnu.org
:
bug#9612
; Package
coreutils
.
(Tue, 27 Sep 2011 14:50:02 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
Jim Meyering <jim <at> meyering.net>
:
New bug report received and forwarded. Copy sent to
bug-coreutils <at> gnu.org
.
(Tue, 27 Sep 2011 14:50:02 GMT)
Full text and
rfc822 format available.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
This was reported by Aaron Denney in http://bugs.debian.org/642557.
Who would have thought that including a few NaNs in the input to sort
would make it infloop.
The original failure arose only when sort was reading from a pipe:
yes -- -nan | head -156903 | sort -g > /dev/null
But it's not always easy to reproduce.
For example, on a uni-core Debian unstable amd64 system, the bug strikes
every time with the distro-supplied binary, but not at all on a fedora
rawhide-supplied binary in a multi-core VM. However, when I compile
coreutils-8.13 myself on that same rawhide system, *that* sort fails
every time.
Setting up to debug was a little tricky.
In order to invoke sort-reading-from-pipe from gdb, I had to use a fifo:
mkfifo sort-test-pipe
yes -- -nan | head -156903 | sort-test-pipe &
Then, with gdb, you can do this:
gdb sort
run -g < sort-test-pipe
Debugging it, I was surprised to see the infinite loop iterating through
these "*"-marked lines, with nfiles == 2 and always resetting i = 0:
(gdb) l
2893 Since this only reorders two items if one is strictly greater than
2894 the other, it is stable. */
2895 for (i = 0; i < nfiles; ++i)
2896 ord[i] = i;
* 2897 for (i = 1; i < nfiles; ++i)
* 2898 if (0 < compare (cur[ord[i - 1]], cur[ord[i]]))
* 2899 t = ord[i - 1], ord[i - 1] = ord[i], ord[i] = t, i = 0;
2900
2901 /* Repeatedly output the smallest line until no input remains. */
2902 while (nfiles)
(gdb) p ord[i-1]
$6 = 0
(gdb) p ord[i]
$7 = 1
(gdb) p compare (cur[0], cur[1])
$8 = 64
(gdb) p compare (cur[1], cur[0])
$9 = 64
That shows cur[0] > cur[1], and then, on the very next line
a contradiction, cur[1] > cur[0]
But it's not the classic NaN problem at all. That would be too easy ;-)
That code already handles NaNs (via the general_numcompare function),
detecting the unusual case and comparing NaNs with memcmp.
Here's most of the general_numcompare function:
static int
general_numcompare (char const *sa, char const *sb)
{
char *ea;
char *eb;
long_double a = strtold (sa, &ea);
long_double b = strtold (sb, &eb);
...
/* Sort numbers in the usual way, where -0 == +0. Put NaNs after
conversion errors but before numbers; sort them by internal
bit-pattern, for lack of a more portable alternative. */
return (a < b ? -1
: a > b ? 1
: a == b ? 0
: b == b ? -1
: a == a ? 1
: memcmp (&a, &b, sizeof a));
}
What could cause such bogosity?
Here's the quick answer:
The statement "long double x = NAN;" (inside glibc's strtold) leaves many
bits of "x" uninitialized. For most uses of a NaN, that doesn't matter,
but here, since we're actually trying to order the different NaNs, it makes
all of the difference. [ I'm about to file the glibc bug -- or maybe it's
a bug in the definition of gcc's __builtin_nan. Haven't looked yet. ]
What happens when we infloop is that the surrounding code always happens
to ensure that differing bits are 0x40 in the left operand and 0x00 in the
right operand to memcmp. Then you will see precisely the above behavior.
One "fix" is to return 0 rather than bothering with the memcmp call above,
thus treating all NaNs as equal, rather than attempting to order them
based on their internal bit patterns. But that is undesirable since
it would render sort's output indeterminate whenever it contains different
types of NaN values. E.g., NaN, NaN(1), NaN(2), etc. all result in
different bit patterns.
For example, currently, this works as you'd expect on a system with a
NaN(...)-aware strtold, like anything glibc-based:
$ for i in $(seq 9); do echo "NaN($i)";done|sort -gr
NaN(9)
NaN(8)
NaN(7)
NaN(6)
NaN(5)
NaN(4)
NaN(3)
NaN(2)
NaN(1)
If we were to remove the memcmp, those would not be sorted.
----------------------------------------
A kludgey fix that serves to demonstrate the problem would be to
replace the two strtold invocations with these:
long_double a = 0; a = strtold (sa, &ea);
long_double b = 0; b = strtold (sb, &eb);
That does the job at least when compiling with gcc -ggdb3,
but minimal optimization could well eliminate the dead stores.
----------------------------------------
The real solution is to change glibc's strtold so that its "long double"
result contains no uninitialized bit. In the shorter term, I'd like
a gnulib strtold module that works around the problem.
However, gnulib's strtod.c simply ignores the digits in
NaN(DDDDD), while glibc's implementation actually uses them.
That should be easy to fix, right? Just merge in the glibc changes?
But no, they rely on ieee754.h's "union ieee854_long_double",
and that is not portable. So I have resorted to an even shorter-term
work-around in the patch below.
=-=================================
Consequences: whenever qsort's comparison function "lies" (i.e., returns
1 for a < b, -1 for a > b, or 0 for a != b, qsort's behavior is undefined
(i.e., it may well segfault). GNU sort(1) uses qsort only incidentally,
to sort month names, and nowhere else, but now, you've seen that sort(1)
had a similar problem when its comparison function is inconsistent.
--------------------------------------------------
BTW, once I found the problem code, I realized I could provoke
the infloop with two one-line inputs using sort's --merge option:
echo nan > 1; sort -m -g 1 1
--------------------------------
Here's a little program to demonstrate the problem.
Note how the results differ with compilation options:
#include <stdlib.h>
#include <stdio.h>
static char *
fmt_nan (long double x)
{
unsigned int i;
static char buf[33];
unsigned char const *p = (unsigned char const *) &x;
for (i = 0; i < sizeof x; i++)
sprintf (buf + 2*i, "%02x", *p++);
return buf;
}
int
main ()
{
const char *q = "nan";
long double x = strtold (q, NULL);
printf ("%s\n", fmt_nan (x));
x = 0;
x = strtold (q, NULL);
printf ("%s\n", fmt_nan (x));
return 0;
}
$ gcc -O0 -Wall -Wextra -W /t/strtold-bogosity.c && ./a.out
00000000000000c0ff7f400000000000
00000000000000c0ff7f000000000000
$ gcc -O1 -Wall -Wextra -W /t/strtold-bogosity.c && ./a.out
00000000000000c0ff7f000000000000
00000000000000c0ff7f000000000000
-------------------------------------------------------
And here's the patch I expect to apply for now:
From 17f70721d10f02543ae9e2d4e2c4b2babe0206a7 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Tue, 27 Sep 2011 16:32:35 +0200
Subject: [PATCH] sort: avoid a NaN-induced infloop
These commands would fail to terminate:
yes -- -nan | head -156903 | sort -g > /dev/null
echo nan > F; sort -m -g F F
That can happen with any strtold implementation that includes
uninitialized data in its return value. The problem arises in the
mergefps function when bubble-sorting the two or more lines, each
from one of the input streams being merged: compare(a,b) returns 64,
yet compare(b,a) also returns a positive value. With a broken
comparison function like that, the bubble sort never terminates.
Why do the long-double bit strings corresponding to two identical
"nan" strings not compare equal? Because some parts of the result
are uninitialized and thus depend on the state of the stack.
For more details, see http://bugs.gnu.org/FIXME.
* src/sort.c (nan_compare): New function.
(general_numcompare): Use it rather than bare memcmp.
Reported by Aaron Denney in http://bugs.debian.org/642557.
* NEWS (Bug fixes): Mention it.
* tests/misc/sort-NaN-infloop: New file.
* tests/Makefile.am (TESTS): Add it.
---
NEWS | 4 ++++
src/sort.c | 20 +++++++++++++++++++-
tests/Makefile.am | 1 +
tests/misc/sort-NaN-infloop | 28 ++++++++++++++++++++++++++++
4 files changed, 52 insertions(+), 1 deletions(-)
create mode 100755 tests/misc/sort-NaN-infloop
diff --git a/NEWS b/NEWS
index 140e6fa..f05a088 100644
--- a/NEWS
+++ b/NEWS
@@ -2,6 +2,10 @@ GNU coreutils NEWS -*- outline -*-
* Noteworthy changes in release ?.? (????-??-??) [?]
+** Bug fixes
+
+ sort -g no longer infloops for certain inputs containing NaNs
+
** Improvements
md5sum --check now supports the -r format from the corresponding BSD tool.
diff --git a/src/sort.c b/src/sort.c
index 3d3119d..3e94a6e 100644
--- a/src/sort.c
+++ b/src/sort.c
@@ -1910,6 +1910,24 @@ numcompare (char const *a, char const *b)
return strnumcmp (a, b, decimal_point, thousands_sep);
}
+/* Work around a problem whereby the long double value returned by glibc's
+ strtold ("NaN", ...) contains uninitialized bits: clear all bytes of
+ A and B before calling strtold. FIXME: remove this function once
+ gnulib guarantees that strtold's result is always well defined. */
+static int
+nan_compare (char const *sa, char const *sb)
+{
+ long_double a;
+ memset (&a, 0, sizeof a);
+ a = strtold (sa, NULL);
+
+ long_double b;
+ memset (&b, 0, sizeof b);
+ b = strtold (sb, NULL);
+
+ return memcmp (&a, &b, sizeof a);
+}
+
static int
general_numcompare (char const *sa, char const *sb)
{
@@ -1935,7 +1953,7 @@ general_numcompare (char const *sa, char const *sb)
: a == b ? 0
: b == b ? -1
: a == a ? 1
- : memcmp (&a, &b, sizeof a));
+ : nan_compare (sa, sb));
}
/* Return an integer in 1..12 of the month name MONTH.
diff --git a/tests/Makefile.am b/tests/Makefile.am
index eeb4cab..2cf409a 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -250,6 +250,7 @@ TESTS = \
misc/sort-unique \
misc/sort-unique-segv \
misc/sort-version \
+ misc/sort-NaN-infloop \
split/filter \
split/suffix-length \
split/b-chunk \
diff --git a/tests/misc/sort-NaN-infloop b/tests/misc/sort-NaN-infloop
new file mode 100755
index 0000000..ead871e
--- /dev/null
+++ b/tests/misc/sort-NaN-infloop
@@ -0,0 +1,28 @@
+#!/bin/sh
+# exercise the NaN-infloop bug in coreutils-8.13
+
+# Copyright (C) 2011 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+
+. "${srcdir=.}/init.sh"; path_prepend_ ../src
+print_ver_ sort
+
+echo nan > F || fail=1
+printf 'nan\nnan\n' > exp || fail=1
+timeout 10 sort -g -m F F > out || fail=1
+
+compare out exp || fail=1
+
+Exit $fail
--
1.7.7.rc0.362.g5a14
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#9612
; Package
coreutils
.
(Tue, 27 Sep 2011 17:15:01 GMT)
Full text and
rfc822 format available.
Message #8 received at 9612 <at> debbugs.gnu.org (full text, mbox):
Jim Meyering <jim <at> meyering.net> writes:
> The statement "long double x = NAN;" (inside glibc's strtold) leaves many
> bits of "x" uninitialized.
You are looking at padding bits, which have unspecified contents.
Andreas.
--
Andreas Schwab, schwab <at> linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#9612
; Package
coreutils
.
(Tue, 27 Sep 2011 17:26:03 GMT)
Full text and
rfc822 format available.
Message #11 received at 9612 <at> debbugs.gnu.org (full text, mbox):
Andreas Schwab wrote:
> Jim Meyering <jim <at> meyering.net> writes:
>
>> The statement "long double x = NAN;" (inside glibc's strtold) leaves many
>> bits of "x" uninitialized.
>
> You are looking at padding bits, which have unspecified contents.
I realize that they are unspecified.
That is why I did not claim that this is a POSIX violation.
POSIX says little about what strtold must do for a "NaN" string.
However, I am dismayed that with glibc's strtold the values of those bits
is not deterministic. That makes sorting NaN values obtained from strtold
unnecessarily hard.
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#9612
; Package
coreutils
.
(Tue, 27 Sep 2011 20:09:01 GMT)
Full text and
rfc822 format available.
Message #14 received at 9612 <at> debbugs.gnu.org (full text, mbox):
Jim Meyering <jim <at> meyering.net> writes:
> However, I am dismayed that with glibc's strtold the values of those bits
> is not deterministic.
Padding bits can change any time.
Andreas.
--
Andreas Schwab, schwab <at> linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#9612
; Package
coreutils
.
(Tue, 27 Sep 2011 20:13:01 GMT)
Full text and
rfc822 format available.
Message #17 received at 9612 <at> debbugs.gnu.org (full text, mbox):
On 09/27/11 13:07, Andreas Schwab wrote:
> Padding bits can change any time.
Is there any way to compare the non-padding parts of long doubles?
There ought to be *some* way to get the fractional part of a NaN, no?
For example, with glibc, is there some way to sprintf to a buffer
and get the fraction out of the text in the buffer?
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#9612
; Package
coreutils
.
(Tue, 27 Sep 2011 20:34:02 GMT)
Full text and
rfc822 format available.
Message #20 received at 9612 <at> debbugs.gnu.org (full text, mbox):
Paul Eggert wrote:
> On 09/27/11 13:07, Andreas Schwab wrote:
>> Padding bits can change any time.
>
> Is there any way to compare the non-padding parts of long doubles?
> There ought to be *some* way to get the fractional part of a NaN, no?
> For example, with glibc, is there some way to sprintf to a buffer
> and get the fraction out of the text in the buffer?
Good idea.
Since the problem may be glibc-specific the work-around can be, too.
/usr/include/ieee754.h defines this:
union ieee854_long_double
{
long double d;
/* This is the IEEE 854 double-extended-precision format. */
struct
{
#if __BYTE_ORDER == __BIG_ENDIAN
unsigned int negative:1;
unsigned int exponent:15;
unsigned int empty:16;
unsigned int mantissa0:32;
unsigned int mantissa1:32;
#endif
#if __BYTE_ORDER == __LITTLE_ENDIAN
# if __FLOAT_WORD_ORDER == __BIG_ENDIAN
unsigned int exponent:15;
unsigned int negative:1;
unsigned int empty:16;
unsigned int mantissa0:32;
unsigned int mantissa1:32;
# else
unsigned int mantissa1:32;
unsigned int mantissa0:32;
unsigned int exponent:15;
unsigned int negative:1;
unsigned int empty:16;
# endif
#endif
} ieee;
...
I went ahead and committed my expedient patch,
with adjusted URL in the log.
I'm sure we'll get something better, though...
BTW, this was introduced between 8.5 and 8.6, and I'll update NEWS with that.
While I haven't found the specific commit, it's probably for this feature:
* Noteworthy changes in release 8.6 (2010-10-15) [stable]
sort -g now uses long doubles for greater range and precision.
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#9612
; Package
coreutils
.
(Tue, 27 Sep 2011 20:54:02 GMT)
Full text and
rfc822 format available.
Message #23 received at submit <at> debbugs.gnu.org (full text, mbox):
> Is there any way to compare the non-padding parts of long doubles?
> There ought to be *some* way to get the fractional part of a NaN, no?
frexp() [man 3 frexp] would be the right idea,
except that it fails explicitly for NaN.
--
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#9612
; Package
coreutils
.
(Tue, 27 Sep 2011 21:43:02 GMT)
Full text and
rfc822 format available.
Message #26 received at 9612 <at> debbugs.gnu.org (full text, mbox):
Paul Eggert <eggert <at> cs.ucla.edu> writes:
> On 09/27/11 13:07, Andreas Schwab wrote:
>> Padding bits can change any time.
>
> Is there any way to compare the non-padding parts of long doubles?
By ignoring the padding.
> There ought to be *some* way to get the fractional part of a NaN, no?
You need to inspect the bytes of the representation yourself.
Andreas.
--
Andreas Schwab, schwab <at> linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#9612
; Package
coreutils
.
(Sat, 01 Oct 2011 17:44:02 GMT)
Full text and
rfc822 format available.
Message #29 received at 9612 <at> debbugs.gnu.org (full text, mbox):
Jim Meyering wrote:
> Who would have thought that including a few NaNs in the input to sort
> would make it infloop.
>
> The original failure arose only when sort was reading from a pipe:
>
> yes -- -nan | head -156903 | sort -g > /dev/null
>
> But it's not always easy to reproduce.
...
> [ I'm about to file the glibc bug
...
I've done that:
RFE: strtold: do not include uninitialized bytes when converting "NaN"
http://sourceware.org/bugzilla/show_bug.cgi?id=13246
Reply sent
to
Jim Meyering <jim <at> meyering.net>
:
You have taken responsibility.
(Thu, 06 Oct 2011 14:58:02 GMT)
Full text and
rfc822 format available.
Notification sent
to
Jim Meyering <jim <at> meyering.net>
:
bug acknowledged by developer.
(Thu, 06 Oct 2011 14:58:02 GMT)
Full text and
rfc822 format available.
Message #34 received at 9612-done <at> debbugs.gnu.org (full text, mbox):
Jim Meyering wrote:
> Jim Meyering wrote:
>> Who would have thought that including a few NaNs in the input to sort
>> would make it infloop.
>>
>> The original failure arose only when sort was reading from a pipe:
>>
>> yes -- -nan | head -156903 | sort -g > /dev/null
>>
>> But it's not always easy to reproduce.
> ...
>> [ I'm about to file the glibc bug
> ...
>
> I've done that:
>
> RFE: strtold: do not include uninitialized bytes when converting "NaN"
> http://sourceware.org/bugzilla/show_bug.cgi?id=13246
In the above BZ comments, Jakub Jelinek proposed something more robust,
but I think it's not necessary. If we find an optimizer that thinks the
added memset is useless, we'll discover it via the failing test, so I'd
prefer not to add the volatile/union hack.
I'm closing this.
bug archived.
Request was from
Debbugs Internal Request <help-debbugs <at> gnu.org>
to
internal_control <at> debbugs.gnu.org
.
(Fri, 04 Nov 2011 11:24:02 GMT)
Full text and
rfc822 format available.
This bug report was last modified 13 years and 23 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.