GNU bug report logs - #9612
sort: avoid a NaN-induced infloop

Previous Next

Package: coreutils;

Reported by: Jim Meyering <jim <at> meyering.net>

Date: Tue, 27 Sep 2011 14:50:02 UTC

Severity: normal

Done: Jim Meyering <jim <at> meyering.net>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 9612 in the body.
You can then email your comments to 9612 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-coreutils <at> gnu.org:
bug#9612; Package coreutils. (Tue, 27 Sep 2011 14:50:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Jim Meyering <jim <at> meyering.net>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Tue, 27 Sep 2011 14:50:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: bug-coreutils <at> gnu.org
Subject: sort: avoid a NaN-induced infloop
Date: Tue, 27 Sep 2011 16:48:45 +0200
This was reported by Aaron Denney in http://bugs.debian.org/642557.

Who would have thought that including a few NaNs in the input to sort
would make it infloop.

The original failure arose only when sort was reading from a pipe:

    yes -- -nan | head -156903 | sort -g > /dev/null

But it's not always easy to reproduce.
For example, on a uni-core Debian unstable amd64 system, the bug strikes
every time with the distro-supplied binary, but not at all on a fedora
rawhide-supplied binary in a multi-core VM.  However, when I compile
coreutils-8.13 myself on that same rawhide system, *that* sort fails
every time.

Setting up to debug was a little tricky.
In order to invoke sort-reading-from-pipe from gdb, I had to use a fifo:

    mkfifo sort-test-pipe
    yes -- -nan | head -156903 | sort-test-pipe &

Then, with gdb, you can do this:

    gdb sort
    run -g < sort-test-pipe


Debugging it, I was surprised to see the infinite loop iterating through
these "*"-marked lines, with nfiles == 2 and always resetting i = 0:

  (gdb) l
  2893         Since this only reorders two items if one is strictly greater than
  2894         the other, it is stable. */
  2895      for (i = 0; i < nfiles; ++i)
  2896        ord[i] = i;
* 2897      for (i = 1; i < nfiles; ++i)
* 2898        if (0 < compare (cur[ord[i - 1]], cur[ord[i]]))
* 2899          t = ord[i - 1], ord[i - 1] = ord[i], ord[i] = t, i = 0;
  2900
  2901      /* Repeatedly output the smallest line until no input remains. */
  2902      while (nfiles)
  (gdb) p ord[i-1]
  $6 = 0
  (gdb) p ord[i]
  $7 = 1
  (gdb) p compare (cur[0], cur[1])
  $8 = 64
  (gdb) p compare (cur[1], cur[0])
  $9 = 64

That shows       cur[0] > cur[1], and then, on the very next line
a contradiction, cur[1] > cur[0]

But it's not the classic NaN problem at all.  That would be too easy ;-)
That code already handles NaNs (via the general_numcompare function),
detecting the unusual case and comparing NaNs with memcmp.
Here's most of the general_numcompare function:

    static int
    general_numcompare (char const *sa, char const *sb)
    {
      char *ea;
      char *eb;
      long_double a = strtold (sa, &ea);
      long_double b = strtold (sb, &eb);
      ...
      /* Sort numbers in the usual way, where -0 == +0.  Put NaNs after
         conversion errors but before numbers; sort them by internal
         bit-pattern, for lack of a more portable alternative.  */
      return (a < b ? -1
              : a > b ? 1
              : a == b ? 0
              : b == b ? -1
              : a == a ? 1
              : memcmp (&a, &b, sizeof a));
    }

What could cause such bogosity?
Here's the quick answer:

The statement "long double x = NAN;" (inside glibc's strtold) leaves many
bits of "x" uninitialized.  For most uses of a NaN, that doesn't matter,
but here, since we're actually trying to order the different NaNs, it makes
all of the difference.  [ I'm about to file the glibc bug -- or maybe it's
a bug in the definition of gcc's __builtin_nan.  Haven't looked yet.  ]

What happens when we infloop is that the surrounding code always happens
to ensure that differing bits are 0x40 in the left operand and 0x00 in the
right operand to memcmp.  Then you will see precisely the above behavior.

One "fix" is to return 0 rather than bothering with the memcmp call above,
thus treating all NaNs as equal, rather than attempting to order them
based on their internal bit patterns.  But that is undesirable since
it would render sort's output indeterminate whenever it contains different
types of NaN values.   E.g., NaN, NaN(1), NaN(2), etc. all result in
different bit patterns.

For example, currently, this works as you'd expect on a system with a
NaN(...)-aware strtold, like anything glibc-based:

    $ for i in $(seq 9); do echo "NaN($i)";done|sort -gr
    NaN(9)
    NaN(8)
    NaN(7)
    NaN(6)
    NaN(5)
    NaN(4)
    NaN(3)
    NaN(2)
    NaN(1)

If we were to remove the memcmp, those would not be sorted.

----------------------------------------
A kludgey fix that serves to demonstrate the problem would be to
replace the two strtold invocations with these:

      long_double a = 0; a = strtold (sa, &ea);
      long_double b = 0; b = strtold (sb, &eb);

That does the job at least when compiling with gcc -ggdb3,
but minimal optimization could well eliminate the dead stores.

----------------------------------------
The real solution is to change glibc's strtold so that its "long double"
result contains no uninitialized bit.  In the shorter term, I'd like
a gnulib strtold module that works around the problem.
However, gnulib's strtod.c simply ignores the digits in
NaN(DDDDD), while glibc's implementation actually uses them.
That should be easy to fix, right?  Just merge in the glibc changes?
But no, they rely on ieee754.h's "union ieee854_long_double",
and that is not portable.  So I have resorted to an even shorter-term
work-around in the patch below.

=-=================================

Consequences: whenever qsort's comparison function "lies" (i.e., returns
1 for a < b, -1 for a > b, or 0 for a != b, qsort's behavior is undefined
(i.e., it may well segfault).  GNU sort(1) uses qsort only incidentally,
to sort month names, and nowhere else, but now, you've seen that sort(1)
had a similar problem when its comparison function is inconsistent.

--------------------------------------------------
BTW, once I found the problem code, I realized I could provoke
the infloop with two one-line inputs using sort's --merge option:

    echo nan > 1; sort -m -g 1 1

--------------------------------
Here's a little program to demonstrate the problem.
Note how the results differ with compilation options:

  #include <stdlib.h>
  #include <stdio.h>

  static char *
  fmt_nan (long double x)
  {
    unsigned int i;
    static char buf[33];
    unsigned char const *p = (unsigned char const *) &x;
    for (i = 0; i < sizeof x; i++)
      sprintf (buf + 2*i, "%02x", *p++);
    return buf;
  }

  int
  main ()
  {
    const char *q = "nan";
    long double x = strtold (q, NULL);
    printf ("%s\n", fmt_nan (x));

    x = 0;
    x = strtold (q, NULL);
    printf ("%s\n", fmt_nan (x));

    return 0;
  }

  $ gcc -O0 -Wall -Wextra -W /t/strtold-bogosity.c && ./a.out
  00000000000000c0ff7f400000000000
  00000000000000c0ff7f000000000000
  $ gcc -O1 -Wall -Wextra -W /t/strtold-bogosity.c && ./a.out
  00000000000000c0ff7f000000000000
  00000000000000c0ff7f000000000000

-------------------------------------------------------
And here's the patch I expect to apply for now:

From 17f70721d10f02543ae9e2d4e2c4b2babe0206a7 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Tue, 27 Sep 2011 16:32:35 +0200
Subject: [PATCH] sort: avoid a NaN-induced infloop

These commands would fail to terminate:
    yes -- -nan | head -156903 | sort -g > /dev/null
    echo nan > F; sort -m -g F F
That can happen with any strtold implementation that includes
uninitialized data in its return value.  The problem arises in the
mergefps function when bubble-sorting the two or more lines, each
from one of the input streams being merged: compare(a,b) returns 64,
yet compare(b,a) also returns a positive value.  With a broken
comparison function like that, the bubble sort never terminates.
Why do the long-double bit strings corresponding to two identical
"nan" strings not compare equal?  Because some parts of the result
are uninitialized and thus depend on the state of the stack.
For more details, see http://bugs.gnu.org/FIXME.
* src/sort.c (nan_compare): New function.
(general_numcompare): Use it rather than bare memcmp.
Reported by Aaron Denney in http://bugs.debian.org/642557.
* NEWS (Bug fixes): Mention it.
* tests/misc/sort-NaN-infloop: New file.
* tests/Makefile.am (TESTS): Add it.
---
 NEWS                        |    4 ++++
 src/sort.c                  |   20 +++++++++++++++++++-
 tests/Makefile.am           |    1 +
 tests/misc/sort-NaN-infloop |   28 ++++++++++++++++++++++++++++
 4 files changed, 52 insertions(+), 1 deletions(-)
 create mode 100755 tests/misc/sort-NaN-infloop

diff --git a/NEWS b/NEWS
index 140e6fa..f05a088 100644
--- a/NEWS
+++ b/NEWS
@@ -2,6 +2,10 @@ GNU coreutils NEWS                                    -*- outline -*-

 * Noteworthy changes in release ?.? (????-??-??) [?]

+** Bug fixes
+
+  sort -g no longer infloops for certain inputs containing NaNs
+
 ** Improvements

   md5sum --check now supports the -r format from the corresponding BSD tool.
diff --git a/src/sort.c b/src/sort.c
index 3d3119d..3e94a6e 100644
--- a/src/sort.c
+++ b/src/sort.c
@@ -1910,6 +1910,24 @@ numcompare (char const *a, char const *b)
   return strnumcmp (a, b, decimal_point, thousands_sep);
 }

+/* Work around a problem whereby the long double value returned by glibc's
+   strtold ("NaN", ...) contains uninitialized bits: clear all bytes of
+   A and B before calling strtold.  FIXME: remove this function once
+   gnulib guarantees that strtold's result is always well defined.  */
+static int
+nan_compare (char const *sa, char const *sb)
+{
+  long_double a;
+  memset (&a, 0, sizeof a);
+  a = strtold (sa, NULL);
+
+  long_double b;
+  memset (&b, 0, sizeof b);
+  b = strtold (sb, NULL);
+
+  return memcmp (&a, &b, sizeof a);
+}
+
 static int
 general_numcompare (char const *sa, char const *sb)
 {
@@ -1935,7 +1953,7 @@ general_numcompare (char const *sa, char const *sb)
           : a == b ? 0
           : b == b ? -1
           : a == a ? 1
-          : memcmp (&a, &b, sizeof a));
+          : nan_compare (sa, sb));
 }

 /* Return an integer in 1..12 of the month name MONTH.
diff --git a/tests/Makefile.am b/tests/Makefile.am
index eeb4cab..2cf409a 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -250,6 +250,7 @@ TESTS =						\
   misc/sort-unique				\
   misc/sort-unique-segv				\
   misc/sort-version				\
+  misc/sort-NaN-infloop				\
   split/filter					\
   split/suffix-length				\
   split/b-chunk					\
diff --git a/tests/misc/sort-NaN-infloop b/tests/misc/sort-NaN-infloop
new file mode 100755
index 0000000..ead871e
--- /dev/null
+++ b/tests/misc/sort-NaN-infloop
@@ -0,0 +1,28 @@
+#!/bin/sh
+# exercise the NaN-infloop bug in coreutils-8.13
+
+# Copyright (C) 2011 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+. "${srcdir=.}/init.sh"; path_prepend_ ../src
+print_ver_ sort
+
+echo nan > F || fail=1
+printf 'nan\nnan\n' > exp || fail=1
+timeout 10 sort -g -m F F > out || fail=1
+
+compare out exp || fail=1
+
+Exit $fail
--
1.7.7.rc0.362.g5a14




Information forwarded to bug-coreutils <at> gnu.org:
bug#9612; Package coreutils. (Tue, 27 Sep 2011 17:15:01 GMT) Full text and rfc822 format available.

Message #8 received at 9612 <at> debbugs.gnu.org (full text, mbox):

From: Andreas Schwab <schwab <at> linux-m68k.org>
To: Jim Meyering <jim <at> meyering.net>
Cc: 9612 <at> debbugs.gnu.org
Subject: Re: bug#9612: sort: avoid a NaN-induced infloop
Date: Tue, 27 Sep 2011 19:13:20 +0200
Jim Meyering <jim <at> meyering.net> writes:

> The statement "long double x = NAN;" (inside glibc's strtold) leaves many
> bits of "x" uninitialized.

You are looking at padding bits, which have unspecified contents.

Andreas.

-- 
Andreas Schwab, schwab <at> linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."




Information forwarded to bug-coreutils <at> gnu.org:
bug#9612; Package coreutils. (Tue, 27 Sep 2011 17:26:03 GMT) Full text and rfc822 format available.

Message #11 received at 9612 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Andreas Schwab <schwab <at> linux-m68k.org>
Cc: 9612 <at> debbugs.gnu.org
Subject: Re: bug#9612: sort: avoid a NaN-induced infloop
Date: Tue, 27 Sep 2011 19:24:25 +0200
Andreas Schwab wrote:
> Jim Meyering <jim <at> meyering.net> writes:
>
>> The statement "long double x = NAN;" (inside glibc's strtold) leaves many
>> bits of "x" uninitialized.
>
> You are looking at padding bits, which have unspecified contents.

I realize that they are unspecified.
That is why I did not claim that this is a POSIX violation.
POSIX says little about what strtold must do for a "NaN" string.
However, I am dismayed that with glibc's strtold the values of those bits
is not deterministic.  That makes sorting NaN values obtained from strtold
unnecessarily hard.




Information forwarded to bug-coreutils <at> gnu.org:
bug#9612; Package coreutils. (Tue, 27 Sep 2011 20:09:01 GMT) Full text and rfc822 format available.

Message #14 received at 9612 <at> debbugs.gnu.org (full text, mbox):

From: Andreas Schwab <schwab <at> linux-m68k.org>
To: Jim Meyering <jim <at> meyering.net>
Cc: 9612 <at> debbugs.gnu.org
Subject: Re: bug#9612: sort: avoid a NaN-induced infloop
Date: Tue, 27 Sep 2011 22:07:52 +0200
Jim Meyering <jim <at> meyering.net> writes:

> However, I am dismayed that with glibc's strtold the values of those bits
> is not deterministic.

Padding bits can change any time.

Andreas.

-- 
Andreas Schwab, schwab <at> linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."




Information forwarded to bug-coreutils <at> gnu.org:
bug#9612; Package coreutils. (Tue, 27 Sep 2011 20:13:01 GMT) Full text and rfc822 format available.

Message #17 received at 9612 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Andreas Schwab <schwab <at> linux-m68k.org>
Cc: Jim Meyering <jim <at> meyering.net>, 9612 <at> debbugs.gnu.org
Subject: Re: bug#9612: sort: avoid a NaN-induced infloop
Date: Tue, 27 Sep 2011 13:11:33 -0700
On 09/27/11 13:07, Andreas Schwab wrote:
> Padding bits can change any time.

Is there any way to compare the non-padding parts of long doubles?
There ought to be *some* way to get the fractional part of a NaN, no?
For example, with glibc, is there some way to sprintf to a buffer
and get the fraction out of the text in the buffer?




Information forwarded to bug-coreutils <at> gnu.org:
bug#9612; Package coreutils. (Tue, 27 Sep 2011 20:34:02 GMT) Full text and rfc822 format available.

Message #20 received at 9612 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Andreas Schwab <schwab <at> linux-m68k.org>, 9612 <at> debbugs.gnu.org
Subject: Re: bug#9612: sort: avoid a NaN-induced infloop
Date: Tue, 27 Sep 2011 22:32:54 +0200
Paul Eggert wrote:
> On 09/27/11 13:07, Andreas Schwab wrote:
>> Padding bits can change any time.
>
> Is there any way to compare the non-padding parts of long doubles?
> There ought to be *some* way to get the fractional part of a NaN, no?
> For example, with glibc, is there some way to sprintf to a buffer
> and get the fraction out of the text in the buffer?

Good idea.

Since the problem may be glibc-specific the work-around can be, too.
/usr/include/ieee754.h defines this:

union ieee854_long_double
  {
    long double d;

    /* This is the IEEE 854 double-extended-precision format.  */
    struct
      {
#if     __BYTE_ORDER == __BIG_ENDIAN
        unsigned int negative:1;
        unsigned int exponent:15;
        unsigned int empty:16;
        unsigned int mantissa0:32;
        unsigned int mantissa1:32;
#endif
#if     __BYTE_ORDER == __LITTLE_ENDIAN
# if    __FLOAT_WORD_ORDER == __BIG_ENDIAN
        unsigned int exponent:15;
        unsigned int negative:1;
        unsigned int empty:16;
        unsigned int mantissa0:32;
        unsigned int mantissa1:32;
# else
        unsigned int mantissa1:32;
        unsigned int mantissa0:32;
        unsigned int exponent:15;
        unsigned int negative:1;
        unsigned int empty:16;
# endif
#endif
      } ieee;

      ...

I went ahead and committed my expedient patch,
with adjusted URL in the log.

I'm sure we'll get something better, though...

BTW, this was introduced between 8.5 and 8.6, and I'll update NEWS with that.
While I haven't found the specific commit, it's probably for this feature:

    * Noteworthy changes in release 8.6 (2010-10-15) [stable]
      sort -g now uses long doubles for greater range and precision.




Information forwarded to bug-coreutils <at> gnu.org:
bug#9612; Package coreutils. (Tue, 27 Sep 2011 20:54:02 GMT) Full text and rfc822 format available.

Message #23 received at submit <at> debbugs.gnu.org (full text, mbox):

From: John Reiser <jreiser <at> bitwagon.com>
To: bug-coreutils <at> gnu.org
Subject: Re: bug#9612: sort: avoid a NaN-induced infloop
Date: Tue, 27 Sep 2011 13:54:38 -0700
> Is there any way to compare the non-padding parts of long doubles?
> There ought to be *some* way to get the fractional part of a NaN, no?

frexp() [man 3 frexp] would be the right idea,
except that it fails explicitly for NaN.

-- 




Information forwarded to bug-coreutils <at> gnu.org:
bug#9612; Package coreutils. (Tue, 27 Sep 2011 21:43:02 GMT) Full text and rfc822 format available.

Message #26 received at 9612 <at> debbugs.gnu.org (full text, mbox):

From: Andreas Schwab <schwab <at> linux-m68k.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Jim Meyering <jim <at> meyering.net>, 9612 <at> debbugs.gnu.org
Subject: Re: bug#9612: sort: avoid a NaN-induced infloop
Date: Tue, 27 Sep 2011 23:41:05 +0200
Paul Eggert <eggert <at> cs.ucla.edu> writes:

> On 09/27/11 13:07, Andreas Schwab wrote:
>> Padding bits can change any time.
>
> Is there any way to compare the non-padding parts of long doubles?

By ignoring the padding.

> There ought to be *some* way to get the fractional part of a NaN, no?

You need to inspect the bytes of the representation yourself.

Andreas.

-- 
Andreas Schwab, schwab <at> linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."




Information forwarded to bug-coreutils <at> gnu.org:
bug#9612; Package coreutils. (Sat, 01 Oct 2011 17:44:02 GMT) Full text and rfc822 format available.

Message #29 received at 9612 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: 9612 <at> debbugs.gnu.org
Subject: Re: bug#9612: sort: avoid a NaN-induced infloop
Date: Sat, 01 Oct 2011 19:41:53 +0200
Jim Meyering wrote:
> Who would have thought that including a few NaNs in the input to sort
> would make it infloop.
>
> The original failure arose only when sort was reading from a pipe:
>
>     yes -- -nan | head -156903 | sort -g > /dev/null
>
> But it's not always easy to reproduce.
...
> [ I'm about to file the glibc bug
...

I've done that:

  RFE: strtold: do not include uninitialized bytes when converting "NaN"
  http://sourceware.org/bugzilla/show_bug.cgi?id=13246




Reply sent to Jim Meyering <jim <at> meyering.net>:
You have taken responsibility. (Thu, 06 Oct 2011 14:58:02 GMT) Full text and rfc822 format available.

Notification sent to Jim Meyering <jim <at> meyering.net>:
bug acknowledged by developer. (Thu, 06 Oct 2011 14:58:02 GMT) Full text and rfc822 format available.

Message #34 received at 9612-done <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: 9612-done <at> debbugs.gnu.org
Subject: Re: bug#9612: sort: avoid a NaN-induced infloop
Date: Thu, 06 Oct 2011 16:56:57 +0200
Jim Meyering wrote:
> Jim Meyering wrote:
>> Who would have thought that including a few NaNs in the input to sort
>> would make it infloop.
>>
>> The original failure arose only when sort was reading from a pipe:
>>
>>     yes -- -nan | head -156903 | sort -g > /dev/null
>>
>> But it's not always easy to reproduce.
> ...
>> [ I'm about to file the glibc bug
> ...
>
> I've done that:
>
>   RFE: strtold: do not include uninitialized bytes when converting "NaN"
>   http://sourceware.org/bugzilla/show_bug.cgi?id=13246

In the above BZ comments, Jakub Jelinek proposed something more robust,
but I think it's not necessary.  If we find an optimizer that thinks the
added memset is useless, we'll discover it via the failing test, so I'd
prefer not to add the volatile/union hack.

I'm closing this.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 04 Nov 2011 11:24:02 GMT) Full text and rfc822 format available.

This bug report was last modified 13 years and 23 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.