GNU bug report logs -
#78476
GNU 'factor' problems with 128-bit word
To reply to this bug, email your comments to 78476 AT debbugs.gnu.org.
Report forwarded to bug-coreutils <at> gnu.org: bug#78476; Package coreutils. (Sun, 18 May 2025 07:40:02 GMT)
Acknowledgement sent to Paul Eggert <eggert <at> cs.ucla.edu>: New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Sun, 18 May 2025 07:40:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
I recently found out that GNU coreutils 'factor' misbehaves if its
internal word is 128 bits rather than the usual 64. This could happen if
one builds Coreutils 9.7 on a platform with 128-bit uintmax_t, something
that POSIX allows (though it's only theoretical now as far as I know).
The problem occurs when src/factor.c's prime_p function assumes that as
a quick test, its argument N must be prime if N is less than
FIRST_OMITTED_PRIME * FIRST_OMITTED_PRIME. I guess this quick-test
assumption is true for 64-bit uintmax_t, but not for 128-bit.
I worked around the bug by installing a patch
<https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=fe51b33859d2e7286425d6d5600288ce9ae43bd1>.
This patch changes prime_p to rely on the quick-test assumption only
when the internal word size is 64 bits. On platforms with wider words,
the code now tests smaller numbers for primes too; this works around the
bug, but is slower.
It'd be nice if the underlying problem were fixed, so that the code is
more portable to 128-bit words.
The issue might be related to Bug#25135 "Infinite loop in factor"
<https://bugs.gnu.org/25135> so I am cc'ing this to Torbjörn. The
current factor.c is not the same as what Torbjörn contributed years ago
so it is of course possible that I or someone else introduced the bug
more recently.
To reproduce the problem on Fedora 42 x86-64:
git clone https://git.savannah.gnu.org/git/coreutils.git
cd coreutils
git checkout fe51b33859d2e7286425d6d5600288ce9ae43bd1
./bootstrap
./configure
make CFLAGS='-DUSE_INT128 -DEXHIBIT_INT128_BUG' WERROR_CFLAGS=
echo 340282366920938463463374607431768211355 | src/factor
The output is:
340282366920938463463374607431768211355: 155 2195370109167344925570158757624311041
which is wrong, as 155 is not prime.
Information forwarded to bug-coreutils <at> gnu.org: bug#78476; Package coreutils. (Sun, 18 May 2025 09:08:02 GMT)
Message #8 received at submit <at> debbugs.gnu.org (full text, mbox):
Just a quick note.
I don't think it makes much sense to handle two-word (128-bit) arithmetic
in this code. In fact, the use of uintmax_t was a mistake; it should use
"unsigned long" or "unsigned long long", whichever is efficiently
supported directly by the hardware.
While uintmax_t could be made to work also for the cases you report, it
will be inefficient.
A few months ago, I contributed a suggested new core factoring function
to GNU coreutils, which is, IIRC, 2-3 times faster than the current code
for the bignum range. It uses GMP's mpn functions with "Montgomery
multiplication".
There were two problems with my new core code:
1. It will not work with mini-GMP, which I think is undesirable.
2. It did not optimise for smaller factoring tasks. The fix for this
would be to merge that part from the current coreutils factor.c.
--
Torbjörn
Please encrypt, key id 0xC8601622
Information forwarded to bug-coreutils <at> gnu.org: bug#78476; Package coreutils. (Mon, 19 May 2025 19:28:02 GMT)
Message #11 received at 78476 <at> debbugs.gnu.org (full text, mbox):
On 2025-05-18 02:07, Torbjörn Granlund wrote:
> I don't think it makes much sense to handle two-word (128-bit) arithmetic
> in this code. In fact, the use of uintmax_t was a mistake; it should use
> "unsigned long" or "unsigned long long", whichever is efficiently
> supported directly by the hardware.
Fine, but the 128-bit bug is independent of uintmax_t. Suppose we change
factor.c to never use uintmax_t, and suppose we have a (theoretical but
allowed) platform where unsigned long is 128 bits and where factor.c
uses that type for its words. Then before my Saturday coreutils
commit[1], this code in prime_p:
/* We have already cast out small primes. */
if (n < (wide_uint) FIRST_OMITTED_PRIME * FIRST_OMITTED_PRIME)
return true;
was incorrect. The problem is that when given 2**128 - 101,
factor_using_pollard_rho generates 155 and tests it with prime_p, and
the above code causes prime_p to incorrectly report that 155 is prime.
> While uintmax_t could be made to work also for the cases you report, it
> will be inefficient.
>
> A few months ago, I contributed a suggested new core factoring function
> to GNU coreutils, which is, IIRC, 2-3 times faster than the current code
> for the bignum range. It uses GMP's mpn functions with "Montgomery
> multiplication".
>
> There were two problems with my new core code:
>
> 1. It will not work with mini-GMP, which I think is undesirable.
>
> 2. It did not optimise for smaller factoring tasks. The fix for this
> would be to merge that from the current coreutils factor.c.
Thanks, I didn't know about all that. I plan to take a look at it.
[1]:
https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=fe51b33859d2e7286425d6d5600288ce9ae43bd1
Information forwarded to bug-coreutils <at> gnu.org: bug#78476; Package coreutils. (Mon, 19 May 2025 22:24:02 GMT)
Message #14 received at 78476 <at> debbugs.gnu.org (full text, mbox):
Paul Eggert <eggert <at> cs.ucla.edu> writes:
Fine, but the 128-bit bug is independent of uintmax_t. Suppose we
change factor.c to never use uintmax_t, and suppose we have a
(theoretical but allowed) platform where unsigned long is 128 bits and
where factor.c uses that type for its words. Then before my Saturday
coreutils commit[1], this code in prime_p:
/* We have already cast out small primes. */
if (n < (wide_uint) FIRST_OMITTED_PRIME * FIRST_OMITTED_PRIME)
return true;
was incorrect.
Why is it incorrect?
I read your change, and I strongly disagree with it.
I actually find it extremely worrying that a change is motivated by "We
don't know why the code misbehaves when wide_uint is wider;" and then
the presumed bug is masked with a pretty questionable change. Really?
How about debugging it?
Furthermore, tinkering with mature code to port it to something that
does not exist, and likely never will, is a bad idea.
No, we are very unlikely to see systems with 128x128->256 bit hardware
multiply support, which is the only case where forcing factor.c to do
128-bit arithmetic would make sense.
Let's use our time and effort on improving things for the world we live
in.
--
Torbjörn
Please encrypt, key id 0xC8601622
Information forwarded to bug-coreutils <at> gnu.org: bug#78476; Package coreutils. (Mon, 19 May 2025 22:52:01 GMT)
Message #17 received at 78476 <at> debbugs.gnu.org (full text, mbox):
On 5/19/25 15:23, Torbjörn Granlund wrote:
> Paul Eggert <eggert <at> cs.ucla.edu> writes:
>
> before my Saturday coreutils commit[1], this code in prime_p:
>
> /* We have already cast out small primes. */
> if (n < (wide_uint) FIRST_OMITTED_PRIME * FIRST_OMITTED_PRIME)
> return true;
>
> was incorrect.
>
> Why is it incorrect?
Because when you run it on a machine with 128-bit unsigned long (or
whatever type is being used for the word size), N can be 155 and 155 is
not prime.
> I actually find it extremely worrying that a change is motivated by "We
> don't know why the code misbehaves when wide_uint is wider;" and then
> the presumed bug is masked with a pretty questionable change. Really?
The change is questionable only because it hurts performance when the
word size exceeds 64 bits. It's not questionable on correctness grounds.
And it doesn't affect performance in the usual 64-bit case.
> How about debugging it?
Yes, that'd be better and that's why I filed the bug report. However,
what's a good way to debug it?
My *guess* is that prime_p is sometimes called in a context where it's
already guaranteed that we have cast out small primes, and it's
sometimes called in a context where there is no such guarantee. The
latter context is the bug. It appears that this context can occur in
factor_using_pollard_rho (n, a, factors), which contains this code after
the "factor_found:" label:
if (!prime_p (g))
factor_using_pollard_rho (g, a + 1, factors);
else
factor_insert (factors, g);
If I understand things correctly, there's no guarantee here that G
cannot be a small prime, which means factor_insert can be called even
when G is not prime, and that is a bug.
> Let's use our time and effort on improving things for the word we live
> in.
That would be reasonable if we knew that the existing code works for all
cases in 64 bits. Is there some way to show that? (We can't exhaustively
test it.)
Even a comment would be helpful. And, if the code is not designed to
work with any word size other than 64 bits, there should be a static
check to that effect. (There's already a static check that words must be
at least 64 bits.)
Information forwarded to bug-coreutils <at> gnu.org: bug#78476; Package coreutils. (Tue, 20 May 2025 07:05:02 GMT)
Message #20 received at 78476 <at> debbugs.gnu.org (full text, mbox):
Paul Eggert <eggert <at> cs.ucla.edu> writes:
> if (n < (wide_uint) FIRST_OMITTED_PRIME * FIRST_OMITTED_PRIME)
> return true;
> was incorrect.
> Why is it incorrect?
Because when you run it on a machine with 128-bit unsigned long (or
whatever type is being used for the word size), N can be 155 and 155
is not prime.
I thought you said that test was wrong, and I asked what was wrong with
it, and why you made that test return false.
Now I understand that you made the test return false in order to mask an
underlying bug. Please don't do that!
The change is questionable only because it hurts performance when the
word size exceeds 64 bits. It's not questionable on correctness
grounds.
I cannot tell for sure that your change cannot cause other problems.
Say, an infinite loop? (I will not lose sleep over that until somebody
creates a computer with native 128-bit multiply hardware.)
However, what's a good way to debug it?
As a last resort, compare the execution for a working base type with one
which fails? When do they diverge? Why do they diverge?
I am too busy to take care of this. Sorry.
The current factor.c code is becoming insanely complex. Constructs
masking bugs triggered on hypothetical systems don't help with
managing complexity.
Preferably, this problem should be fully understood. If it can be
argued that it is a real bug which affects real systems, it needs a
clean fix.
--
Torbjörn
Please encrypt, key id 0xC8601622
Information forwarded to bug-coreutils <at> gnu.org: bug#78476; Package coreutils. (Fri, 23 May 2025 07:08:02 GMT)
Message #23 received at 78476 <at> debbugs.gnu.org (full text, mbox):
On 5/20/25 09:04, Torbjörn Granlund wrote:
> The current factor.c code is becoming insanely complex. Contructs
> masking bugs triggered on hypothetical systems doesn't help with
> managing complexity.
If such hardware is quite some time away, what about restricting
the code to the ranges where we know it works, and throwing
an error - whether at compile time or at runtime - with
a diagnostic to revisit the problem once such hardware becomes
available?
Have a nice day,
Berny
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.