GNU bug report logs - #78476
GNU 'factor' problems with 128-bit word

Previous Next

Package: coreutils;

Reported by: Paul Eggert <eggert <at> cs.ucla.edu>

Date: Sun, 18 May 2025 07:40:02 UTC

Severity: normal

To reply to this bug, email your comments to 78476 AT debbugs.gnu.org.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-coreutils <at> gnu.org:
bug#78476; Package coreutils. (Sun, 18 May 2025 07:40:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Paul Eggert <eggert <at> cs.ucla.edu>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Sun, 18 May 2025 07:40:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Coreutils bugs <bug-coreutils <at> gnu.org>
Cc: Torbjorn Granlund <tg <at> gmplib.org>
Subject: GNU 'factor' problems with 128-bit word
Date: Sun, 18 May 2025 00:39:38 -0700
I recently found out that GNU coreutils 'factor' misbehaves if its 
internal word is 128 bits rather than the usual 64. This could happen if 
one builds Coreutils 9.7 on a platform with 128-bit uintmax_t, something 
that POSIX allows (though it's only theoretical now as far as I know).

The problem occurs when src/factor.c's prime_p function assumes that as 
a quick test, its argument N must be prime if N is less than 
FIRST_OMITTED_PRIME * FIRST_OMITTED_PRIME. I guess this quick-test 
assumption is true for 64-bit uintmax_t, but not for 128-bit.

I worked around the bug by installed a patch 
<https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=fe51b33859d2e7286425d6d5600288ce9ae43bd1>. 
This patch changes prime_p to rely on the quick-test assumption only 
when the internal word size is 64 bits. On platforms with wider words, 
the code now tests smaller numbers for primes too; this works around the 
bug, but is slower.

It'd be nice if the underlying problem were fixed, so that the code is 
more portable to 128-bit word.

The issue might be related to Bug#25135 "Infinite loop in factor" 
<https://bugs.gnu.org/25135> so I am cc'ing this to Torbjörn. The 
current factor.c is not the same as what Torbjörn contributed years ago 
so it is of course possible that I or someone else introduced the bug 
more recently.

To reproduce the problem on Fedora 42 x86-64:

git clone https://git.savannah.gnu.org/git/coreutils.git
cd coreutils
git checkout fe51b33859d2e7286425d6d5600288ce9ae43bd1
./bootstrap
./configure
make CFLAGS='-DUSE_INT128 -DEXHIBIT_INT128_BUG' WERROR_CFLAGS=
echo 340282366920938463463374607431768211355 | src/factor

The output is:

340282366920938463463374607431768211355: 155 
2195370109167344925570158757624311041

which is wrong, as 155 is not prime.





Information forwarded to bug-coreutils <at> gnu.org:
bug#78476; Package coreutils. (Sun, 18 May 2025 09:08:02 GMT) Full text and rfc822 format available.

Message #8 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Torbjörn Granlund <tg <at> gmplib.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Coreutils bugs <bug-coreutils <at> gnu.org>
Subject: Re: GNU 'factor' problems with 128-bit word
Date: Sun, 18 May 2025 11:07:01 +0200
Just a quick note.

I don't think is makes much sense to handle two 128-bit words in this
code.  In fact, the use of uintmax_t was a mistake, it should use
"unsigned long" or "unsigned long long" whichever is efficiently
supported directly by the hardware.

While uintmax_t could be made to work also for the cases you report, it
will be inefficient.

A few months ago, I contributed a suggested new core factoring function
to GNU coreutils, which is, IIRC, 2-3 times faster than the current code
for the bignum range.  It uses GMP's mpn functions with "Montgomery
multiplication".

There were two problems with my new core code:

1. It will not work with mini-GMP, which I think is undesiable.

2. It did not optimise for smaller factoring tasks.  The fix for this
would be to merge that from the current coreutils factor.c.

-- 
Torbjörn
Please encrypt, key id 0xC8601622




Information forwarded to bug-coreutils <at> gnu.org:
bug#78476; Package coreutils. (Mon, 19 May 2025 19:28:02 GMT) Full text and rfc822 format available.

Message #11 received at 78476 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Torbjörn Granlund <tg <at> gmplib.org>
Cc: 78476 <at> debbugs.gnu.org
Subject: Re: bug#78476: GNU 'factor' problems with 128-bit word
Date: Mon, 19 May 2025 12:27:27 -0700
On 2025-05-18 02:07, Torbjörn Granlund wrote:

> I don't think is makes much sense to handle two 128-bit words in this
> code.  In fact, the use of uintmax_t was a mistake, it should use
> "unsigned long" or "unsigned long long" whichever is efficiently
> supported directly by the hardware.

Fine, but the 128-bit bug is independent of uintmax_t. Suppose we change 
factor.c to never use uintmax_t, and suppose we have a (theoretical but 
allowed) platform where unsigned long is 128 bits and where factor.c 
uses that type for its words. Then before my Saturday coreutils 
commit[1], this code in prime_p:

  /* We have already cast out small primes.  */
  if (n < (wide_uint) FIRST_OMITTED_PRIME * FIRST_OMITTED_PRIME)
    return true;

was incorrect. The problem is that when given 2**128 - 101, 
factor_using_pollard_rho generates 155 and tests it with prime_p, and 
the above code causes prime_p to incorrectly report that 155 is prime.


> While uintmax_t could be made to work also for the cases you report, it
> will be inefficient.
> 
> A few months ago, I contributed a suggested new core factoring function
> to GNU coreutils, which is, IIRC, 2-3 times faster than the current code
> for the bignum range.  It uses GMP's mpn functions with "Montgomery
> multiplication".
> 
> There were two problems with my new core code:
> 
> 1. It will not work with mini-GMP, which I think is undesiable.
> 
> 2. It did not optimise for smaller factoring tasks.  The fix for this
> would be to merge that from the current coreutils factor.c.

Thanks, I didn't know about all that. I plan to take a look at it.


[1]: 
https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=fe51b33859d2e7286425d6d5600288ce9ae43bd1




Information forwarded to bug-coreutils <at> gnu.org:
bug#78476; Package coreutils. (Mon, 19 May 2025 22:24:02 GMT) Full text and rfc822 format available.

Message #14 received at 78476 <at> debbugs.gnu.org (full text, mbox):

From: Torbjörn Granlund <tg <at> gmplib.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 78476 <at> debbugs.gnu.org
Subject: Re: bug#78476: GNU 'factor' problems with 128-bit word
Date: Tue, 20 May 2025 00:23:50 +0200
Paul Eggert <eggert <at> cs.ucla.edu> writes:

  Fine, but the 128-bit bug is independent of uintmax_t. Suppose we
  change factor.c to never use uintmax_t, and suppose we have a
  (theoretical but allowed) platform where unsigned long is 128 bits and
  where factor.c uses that type for its words. Then before my Saturday
  coreutils commit[1], this code in prime_p:

    /* We have already cast out small primes.  */
    if (n < (wide_uint) FIRST_OMITTED_PRIME * FIRST_OMITTED_PRIME)
      return true;

  was incorrect.

Why is it incorrect?

I read you change, and I strongly disgree with it.

I actually find it extremely worrying that a change is motivated by "We
don't know why the code misbehaves when wide_uint is wider;" and then
the presumed bug is masked with a pretty questionable change.  Really?

How about debugging it?

Furthernorem tinkering with mature code for porting it to something with
does not exist and likely will never exist, is a bad idea.

No, we are very unlikely to see systems with 128x128->256 bit hardware
multiply support, which is the only case where forcing factor.c to do
128-bit arithmetic.

Let's use our time and effort on improving things for the word we live
in.

-- 
Torbjörn
Please encrypt, key id 0xC8601622




Information forwarded to bug-coreutils <at> gnu.org:
bug#78476; Package coreutils. (Mon, 19 May 2025 22:52:01 GMT) Full text and rfc822 format available.

Message #17 received at 78476 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Torbjörn Granlund <tg <at> gmplib.org>
Cc: 78476 <at> debbugs.gnu.org
Subject: Re: bug#78476: GNU 'factor' problems with 128-bit word
Date: Mon, 19 May 2025 15:51:12 -0700
On 5/19/25 15:23, Torbjörn Granlund wrote:
> Paul Eggert <eggert <at> cs.ucla.edu> writes:
> 
>    before my Saturday coreutils commit[1], this code in prime_p:
> 
>      /* We have already cast out small primes.  */
>      if (n < (wide_uint) FIRST_OMITTED_PRIME * FIRST_OMITTED_PRIME)
>        return true;
> 
>    was incorrect.
> 
> Why is it incorrect?

Because when you run it on a machine with 128-bit unsigned long (or 
whatever type is being used for the word size), N can be 155 and 155 is 
not prime.


> I actually find it extremely worrying that a change is motivated by "We
> don't know why the code misbehaves when wide_uint is wider;" and then
> the presumed bug is masked with a pretty questionable change.  Really?

The change is questionable only because it hurts performance when the 
word size exceeds 64 bits. It's not questionable on correctness grounds. 
And it doesn't affect performance in the usual 64-bit case.


> How about debugging it?

Yes, that'd be better and that's why I filed the bug report. However, 
what's a good way to debug it?

My *guess* is that prime_p is sometimes called in a context where it's 
already guaranteed that we have cast out small primes, and it's 
sometimes called in a context where there is no such guarantee. The 
latter context is the bug. It appears that this context can occur in 
factor_using_pollard_rho (n, a, factors), which contains this code after 
the "factor_found:" label:

      if (!prime_p (g))
        factor_using_pollard_rho (g, a + 1, factors);
      else
        factor_insert (factors, g);

If I understand things correctly, there's no guarantee here that G 
cannot be a small prime, which means factor_insert can be called even 
when G is not prime, and that is a bug.


> Let's use our time and effort on improving things for the word we live
> in.

That would be reasonable if we knew that the existing code works for all 
cases in 64 bits. Is there some way to show that? (We can't exhaustively 
test it.)

Even a comment would be helpful. And, if the code is not designed to 
work with any word size other than 64 bits, there should be a static 
check to that effect. (There's already a static check that words must be 
at least 64 bits.)





Information forwarded to bug-coreutils <at> gnu.org:
bug#78476; Package coreutils. (Tue, 20 May 2025 07:05:02 GMT) Full text and rfc822 format available.

Message #20 received at 78476 <at> debbugs.gnu.org (full text, mbox):

From: Torbjörn Granlund <tg <at> gmplib.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 78476 <at> debbugs.gnu.org
Subject: Re: bug#78476: GNU 'factor' problems with 128-bit word
Date: Tue, 20 May 2025 09:04:24 +0200
Paul Eggert <eggert <at> cs.ucla.edu> writes:

  >      if (n < (wide_uint) FIRST_OMITTED_PRIME * FIRST_OMITTED_PRIME)
  >        return true;
  >    was incorrect.
  > Why is it incorrect?

  Because when you run it on a machine with 128-bit unsigned long (or
  whatever type is being used for the word size), N can be 155 and 155
  is not prime.

I thought you said that test was wrong, and I asked what was wrong with
it, and why did you make that test return false.

Now I understand that you made the test return false in order to mask an
underlying bug.  Please don't do that!

  The change is questionable only because it hurts performance when the
  word size exceeds 64 bits. It's not questionable on correctness
  grounds.

I cannot tell for sure that you change cannot coase other problems.
Say, an infinite loop?  (I will not lose sleep over that until somebody
creates a computer with native 128-bit multiply hardware.)

  However, what's a good way to debug it?

As a last resort, compare the execution for a working base type with one
which fails?  When do they diverge?  Why do they diverge?

I am too busy to take care of this.  Sorry.

The current factor.c code is becoming insanely complex.  Contructs
masking bugs triggered on hypothetical systems doesn't help with
managing complexity.

Preferably, this problem should be fully understood.  If it can be
argued that it is a real bug which affects real systems, it needs a
clean fix.

-- 
Torbjörn
Please encrypt, key id 0xC8601622




Information forwarded to bug-coreutils <at> gnu.org:
bug#78476; Package coreutils. (Fri, 23 May 2025 07:08:02 GMT) Full text and rfc822 format available.

Message #23 received at 78476 <at> debbugs.gnu.org (full text, mbox):

From: Bernhard Voelker <mail <at> bernhard-voelker.de>
To: Torbjörn Granlund <tg <at> gmplib.org>,
 Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 78476 <at> debbugs.gnu.org
Subject: Re: bug#78476: GNU 'factor' problems with 128-bit word
Date: Fri, 23 May 2025 09:07:26 +0200
On 5/20/25 09:04, Torbjörn Granlund wrote:
> The current factor.c code is becoming insanely complex.  Contructs
> masking bugs triggered on hypothetical systems doesn't help with
> managing complexity.

If such hardware seems to be quite some time away, what about
guarding the code for the ranges we know it to work, and throw
an error - no matter whether at compile time or runtime - with
a diagnostic to revisit the problem once such hardware will be
available?

Have a nice day,
Berny




This bug report was last modified 1 day ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.