GNU bug report logs - #78476
GNU 'factor' problems with 128-bit word

Reported by: Paul Eggert <eggert <at> cs.ucla.edu>

Date: Sun, 18 May 2025 07:40:02 UTC

Severity: normal

To reply to this bug, email your comments to 78476 AT debbugs.gnu.org.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox

Report forwarded to bug-coreutils <at> gnu.org:
bug#78476; Package coreutils. (Sun, 18 May 2025 07:40:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Paul Eggert <eggert <at> cs.ucla.edu>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Sun, 18 May 2025 07:40:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Coreutils bugs <bug-coreutils <at> gnu.org>
Cc: Torbjorn Granlund <tg <at> gmplib.org>
Subject: GNU 'factor' problems with 128-bit word
Date: Sun, 18 May 2025 00:39:38 -0700

I recently found out that GNU coreutils 'factor' misbehaves if its
internal word is 128 bits rather than the usual 64. This could happen if
one builds Coreutils 9.7 on a platform with 128-bit uintmax_t, something
that POSIX allows (though it's only theoretical now as far as I know).

The problem occurs when src/factor.c's prime_p function assumes that as
a quick test, its argument N must be prime if N is less than
FIRST_OMITTED_PRIME * FIRST_OMITTED_PRIME. I guess this quick-test
assumption is true for 64-bit uintmax_t, but not for 128-bit.

I worked around the bug by installed a patch
<https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=fe51b33859d2e7286425d6d5600288ce9ae43bd1>.
This patch changes prime_p to rely on the quick-test assumption only
when the internal word size is 64 bits. On platforms with wider words,
the code now tests smaller numbers for primes too; this works around the
bug, but is slower.

It'd be nice if the underlying problem were fixed, so that the code is
more portable to 128-bit word.

The issue might be related to Bug#25135 "Infinite loop in factor"
<https://bugs.gnu.org/25135> so I am cc'ing this to Torbjörn. The
current factor.c is not the same as what Torbjörn contributed years ago
so it is of course possible that I or someone else introduced the bug
more recently.

To reproduce the problem on Fedora 42 x86-64:

git clone https://git.savannah.gnu.org/git/coreutils.git
cd coreutils
git checkout fe51b33859d2e7286425d6d5600288ce9ae43bd1
./bootstrap
./configure
make CFLAGS='-DUSE_INT128 -DEXHIBIT_INT128_BUG' WERROR_CFLAGS=
echo 340282366920938463463374607431768211355 | src/factor

The output is:

340282366920938463463374607431768211355: 155
2195370109167344925570158757624311041

which is wrong, as 155 is not prime.

Information forwarded to bug-coreutils <at> gnu.org:
bug#78476; Package coreutils. (Sun, 18 May 2025 09:08:02 GMT) Full text and rfc822 format available.

Message #8 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Torbjörn Granlund <tg <at> gmplib.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Coreutils bugs <bug-coreutils <at> gnu.org>
Subject: Re: GNU 'factor' problems with 128-bit word
Date: Sun, 18 May 2025 11:07:01 +0200

Just a quick note.

I don't think is makes much sense to handle two 128-bit words in this
code.  In fact, the use of uintmax_t was a mistake, it should use
"unsigned long" or "unsigned long long" whichever is efficiently
supported directly by the hardware.

While uintmax_t could be made to work also for the cases you report, it
will be inefficient.

A few months ago, I contributed a suggested new core factoring function
to GNU coreutils, which is, IIRC, 2-3 times faster than the current code
for the bignum range.  It uses GMP's mpn functions with "Montgomery
multiplication".

There were two problems with my new core code:

1. It will not work with mini-GMP, which I think is undesiable.

2. It did not optimise for smaller factoring tasks.  The fix for this
would be to merge that from the current coreutils factor.c.

-- 
Torbjörn
Please encrypt, key id 0xC8601622

Information forwarded to bug-coreutils <at> gnu.org:
bug#78476; Package coreutils. (Mon, 19 May 2025 19:28:02 GMT) Full text and rfc822 format available.

Message #11 received at 78476 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Torbjörn Granlund <tg <at> gmplib.org>
Cc: 78476 <at> debbugs.gnu.org
Subject: Re: bug#78476: GNU 'factor' problems with 128-bit word
Date: Mon, 19 May 2025 12:27:27 -0700

On 2025-05-18 02:07, Torbjörn Granlund wrote:

> I don't think is makes much sense to handle two 128-bit words in this
> code.  In fact, the use of uintmax_t was a mistake, it should use
> "unsigned long" or "unsigned long long" whichever is efficiently
> supported directly by the hardware.

Fine, but the 128-bit bug is independent of uintmax_t. Suppose we change 
factor.c to never use uintmax_t, and suppose we have a (theoretical but 
allowed) platform where unsigned long is 128 bits and where factor.c 
uses that type for its words. Then before my Saturday coreutils 
commit[1], this code in prime_p:

  /* We have already cast out small primes.  */
  if (n < (wide_uint) FIRST_OMITTED_PRIME * FIRST_OMITTED_PRIME)
    return true;

was incorrect. The problem is that when given 2**128 - 101, 
factor_using_pollard_rho generates 155 and tests it with prime_p, and 
the above code causes prime_p to incorrectly report that 155 is prime.

> While uintmax_t could be made to work also for the cases you report, it
> will be inefficient.
> 
> A few months ago, I contributed a suggested new core factoring function
> to GNU coreutils, which is, IIRC, 2-3 times faster than the current code
> for the bignum range.  It uses GMP's mpn functions with "Montgomery
> multiplication".
> 
> There were two problems with my new core code:
> 
> 1. It will not work with mini-GMP, which I think is undesiable.
> 
> 2. It did not optimise for smaller factoring tasks.  The fix for this
> would be to merge that from the current coreutils factor.c.

Thanks, I didn't know about all that. I plan to take a look at it.

[1]: 
https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=fe51b33859d2e7286425d6d5600288ce9ae43bd1

Information forwarded to bug-coreutils <at> gnu.org:
bug#78476; Package coreutils. (Mon, 19 May 2025 22:24:02 GMT) Full text and rfc822 format available.

Message #14 received at 78476 <at> debbugs.gnu.org (full text, mbox):

From: Torbjörn Granlund <tg <at> gmplib.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 78476 <at> debbugs.gnu.org
Subject: Re: bug#78476: GNU 'factor' problems with 128-bit word
Date: Tue, 20 May 2025 00:23:50 +0200

Paul Eggert <eggert <at> cs.ucla.edu> writes:

  Fine, but the 128-bit bug is independent of uintmax_t. Suppose we
  change factor.c to never use uintmax_t, and suppose we have a
  (theoretical but allowed) platform where unsigned long is 128 bits and
  where factor.c uses that type for its words. Then before my Saturday
  coreutils commit[1], this code in prime_p:

    /* We have already cast out small primes.  */
    if (n < (wide_uint) FIRST_OMITTED_PRIME * FIRST_OMITTED_PRIME)
      return true;

  was incorrect.

Why is it incorrect?

I read you change, and I strongly disgree with it.

I actually find it extremely worrying that a change is motivated by "We
don't know why the code misbehaves when wide_uint is wider;" and then
the presumed bug is masked with a pretty questionable change.  Really?

How about debugging it?

Furthernorem tinkering with mature code for porting it to something with
does not exist and likely will never exist, is a bad idea.

No, we are very unlikely to see systems with 128x128->256 bit hardware
multiply support, which is the only case where forcing factor.c to do
128-bit arithmetic.

Let's use our time and effort on improving things for the word we live
in.

-- 
Torbjörn
Please encrypt, key id 0xC8601622

Information forwarded to bug-coreutils <at> gnu.org:
bug#78476; Package coreutils. (Mon, 19 May 2025 22:52:01 GMT) Full text and rfc822 format available.

Message #17 received at 78476 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Torbjörn Granlund <tg <at> gmplib.org>
Cc: 78476 <at> debbugs.gnu.org
Subject: Re: bug#78476: GNU 'factor' problems with 128-bit word
Date: Mon, 19 May 2025 15:51:12 -0700

On 5/19/25 15:23, Torbjörn Granlund wrote:
> Paul Eggert <eggert <at> cs.ucla.edu> writes:
> 
>    before my Saturday coreutils commit[1], this code in prime_p:
> 
>      /* We have already cast out small primes.  */
>      if (n < (wide_uint) FIRST_OMITTED_PRIME * FIRST_OMITTED_PRIME)
>        return true;
> 
>    was incorrect.
> 
> Why is it incorrect?

Because when you run it on a machine with 128-bit unsigned long (or 
whatever type is being used for the word size), N can be 155 and 155 is 
not prime.

> I actually find it extremely worrying that a change is motivated by "We
> don't know why the code misbehaves when wide_uint is wider;" and then
> the presumed bug is masked with a pretty questionable change.  Really?

The change is questionable only because it hurts performance when the 
word size exceeds 64 bits. It's not questionable on correctness grounds. 
And it doesn't affect performance in the usual 64-bit case.

> How about debugging it?

Yes, that'd be better and that's why I filed the bug report. However, 
what's a good way to debug it?

My *guess* is that prime_p is sometimes called in a context where it's 
already guaranteed that we have cast out small primes, and it's 
sometimes called in a context where there is no such guarantee. The 
latter context is the bug. It appears that this context can occur in 
factor_using_pollard_rho (n, a, factors), which contains this code after 
the "factor_found:" label:

      if (!prime_p (g))
        factor_using_pollard_rho (g, a + 1, factors);
      else
        factor_insert (factors, g);

If I understand things correctly, there's no guarantee here that G 
cannot be a small prime, which means factor_insert can be called even 
when G is not prime, and that is a bug.

> Let's use our time and effort on improving things for the word we live
> in.

That would be reasonable if we knew that the existing code works for all 
cases in 64 bits. Is there some way to show that? (We can't exhaustively 
test it.)

Even a comment would be helpful. And, if the code is not designed to 
work with any word size other than 64 bits, there should be a static 
check to that effect. (There's already a static check that words must be 
at least 64 bits.)

Information forwarded to bug-coreutils <at> gnu.org:
bug#78476; Package coreutils. (Tue, 20 May 2025 07:05:02 GMT) Full text and rfc822 format available.

Message #20 received at 78476 <at> debbugs.gnu.org (full text, mbox):

From: Torbjörn Granlund <tg <at> gmplib.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 78476 <at> debbugs.gnu.org
Subject: Re: bug#78476: GNU 'factor' problems with 128-bit word
Date: Tue, 20 May 2025 09:04:24 +0200

Paul Eggert <eggert <at> cs.ucla.edu> writes:

  >      if (n < (wide_uint) FIRST_OMITTED_PRIME * FIRST_OMITTED_PRIME)
  >        return true;
  >    was incorrect.
  > Why is it incorrect?

  Because when you run it on a machine with 128-bit unsigned long (or
  whatever type is being used for the word size), N can be 155 and 155
  is not prime.

I thought you said that test was wrong, and I asked what was wrong with
it, and why did you make that test return false.

Now I understand that you made the test return false in order to mask an
underlying bug.  Please don't do that!

  The change is questionable only because it hurts performance when the
  word size exceeds 64 bits. It's not questionable on correctness
  grounds.

I cannot tell for sure that you change cannot coase other problems.
Say, an infinite loop?  (I will not lose sleep over that until somebody
creates a computer with native 128-bit multiply hardware.)

  However, what's a good way to debug it?

As a last resort, compare the execution for a working base type with one
which fails?  When do they diverge?  Why do they diverge?

I am too busy to take care of this.  Sorry.

The current factor.c code is becoming insanely complex.  Contructs
masking bugs triggered on hypothetical systems doesn't help with
managing complexity.

Preferably, this problem should be fully understood.  If it can be
argued that it is a real bug which affects real systems, it needs a
clean fix.

-- 
Torbjörn
Please encrypt, key id 0xC8601622

Information forwarded to bug-coreutils <at> gnu.org:
bug#78476; Package coreutils. (Fri, 23 May 2025 07:08:02 GMT) Full text and rfc822 format available.

Message #23 received at 78476 <at> debbugs.gnu.org (full text, mbox):

From: Bernhard Voelker <mail <at> bernhard-voelker.de>
To: Torbjörn Granlund <tg <at> gmplib.org>,
 Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 78476 <at> debbugs.gnu.org
Subject: Re: bug#78476: GNU 'factor' problems with 128-bit word
Date: Fri, 23 May 2025 09:07:26 +0200

On 5/20/25 09:04, Torbjörn Granlund wrote:
> The current factor.c code is becoming insanely complex.  Contructs
> masking bugs triggered on hypothetical systems doesn't help with
> managing complexity.

If such hardware seems to be quite some time away, what about
guarding the code for the ranges we know it to work, and throw
an error - no matter whether at compile time or runtime - with
a diagnostic to revisit the problem once such hardware will be
available?

Have a nice day,
Berny

Information forwarded to bug-coreutils <at> gnu.org:
bug#78476; Package coreutils. (Thu, 10 Jul 2025 05:45:02 GMT) Full text and rfc822 format available.

Message #26 received at 78476 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Torbjörn Granlund <tg <at> gmplib.org>
Cc: 78476 <at> debbugs.gnu.org
Subject: Re: bug#78476: GNU 'factor' problems with 128-bit word
Date: Wed, 9 Jul 2025 22:44:32 -0700

[Message part 1 (text/plain, inline)]

On 2025-05-20 00:04, Torbjörn Granlund wrote:
> I am too busy to take care of this.  Sorry.

Understandable. Thanks for taking the time to look at my email.

> Preferably, this problem should be fully understood.

Yes, that'd be better but it appears that nobody has the time to do so. 
However, we can address this in a different way, by removing the buggy 
code instead of understanding it. I did so by re-reading this bug report 
along with your January proposal[1], looking at factor.c again, and 
committing changes mostly along the lines you suggested, as follows:

* Merge the January proposed factor.c into GNU coreutils.

* Port to mini-gmp.c, which lacks mpn_tdiv_qr.

* Don't use uintmax_t or unsigned __int128 as a word; just use the same 
mp_limb_t type that GMP uses.

* Remove the USE_SQUFOF code (which was unused).

* Don't try to prove primality; trust Baillie-PSW as being good enough. 
Use mpz_probab_prime_p for that, even for numbers that fit into two 
limbs or less.

* Port to unlikely platforms lacking 32- or 64-bit words.

* Minor speedups, e.g., convert strings to uuint faster, and decrement 
MR_REPS from 25 to 24 as that's a sweet spot in GMP.

* Statically check for unsupported (and unlikely) build configurations.

* Add some comments.

Two FIXMEs remain in factor.c. To address the first one, would GMP 
accept longlong.h changes to make it standalone even with 
-Wunused-macros? Currently longlong.h is not standalone in that sense, 
even if LONGLONG_STANDALONE is defined. The changes should be minor.

You can grab all the coreutils updates from Savannah by cloning [2] as 
usual. For convenience in reviewing I am attaching the factors-related 
changes between the old and new Coreutils as a single patch. Comments 
are of course welcome.

[1]: https://lists.gnu.org/r/coreutils/2025-01/msg00000.html
[2]: https://git.savannah.gnu.org/git/coreutils.git

[factor1.patch (text/x-patch, attachment)]

This bug report was last modified 1 day ago.

Previous Next

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.

GNU bug report logs - #78476 GNU 'factor' problems with 128-bit word

GNU bug report logs - #78476
GNU 'factor' problems with 128-bit word