GNU bug report logs - #45343
factor: missed optimizations

Please note: This is a static page, with minimal formatting, updated once a day.
Click here to see this page with the latest information and nicer formatting.

Package: coreutils; Reported by: Denys Vlasenko <dvlasenk@HIDDEN>; dated Sun, 20 Dec 2020 18:06:01 UTC; Maintainer for coreutils is bug-coreutils@HIDDEN.

Message received at submit <at> debbugs.gnu.org:


Received: (at submit) by debbugs.gnu.org; 20 Dec 2020 18:05:24 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Sun Dec 20 13:05:24 2020
Received: from localhost ([127.0.0.1]:45582 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1kr358-0006yT-NL
	for submit <at> debbugs.gnu.org; Sun, 20 Dec 2020 13:05:24 -0500
Received: from lists.gnu.org ([209.51.188.17]:49834)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <dvlasenk@HIDDEN>) id 1kr1OU-00005t-5X
 for submit <at> debbugs.gnu.org; Sun, 20 Dec 2020 11:17:14 -0500
Received: from eggs.gnu.org ([2001:470:142:3::10]:43938)
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <dvlasenk@HIDDEN>)
 id 1kr1OL-0002dZ-O5
 for bug-coreutils@HIDDEN; Sun, 20 Dec 2020 11:17:10 -0500
Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:36142)
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA1:256)
 (Exim 4.90_1) (envelope-from <dvlasenk@HIDDEN>)
 id 1kr1OF-0003Zk-Q6
 for bug-coreutils@HIDDEN; Sun, 20 Dec 2020 11:17:03 -0500
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;
 s=mimecast20190719; t=1608481014;
 h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
 to:to:cc:mime-version:mime-version:content-type:content-type;
 bh=tTMiRZSj207/JKpyM0F4UTby8BZMItnp6wrjWKWMV0Q=;
 b=Lh/1E+GTyXhbTUuLLrvC1WPZjhC9JRx08Xbk5+gbrM8D0kDYvz4mienLoJXGPo1BuVzHwz
 B3IK1tEusIhPQoRDWR1gbSbXiSnE1PRtKVQh5X2KPvvNbnAFLdJTam8PJT+9maXSyLy/eM
 nvLefPaXqi1pV0hwlzO0rBjV6aWyMI4=
Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com
 [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id
 us-mta-467-vxPYPNPsONOB7_BDNXcNQg-1; Sun, 20 Dec 2020 11:16:52 -0500
X-MC-Unique: vxPYPNPsONOB7_BDNXcNQg-1
Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com
 [10.5.11.15])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 39B1A801AC0;
 Sun, 20 Dec 2020 16:16:51 +0000 (UTC)
Received: from [10.40.192.23] (unknown [10.40.192.23])
 by smtp.corp.redhat.com (Postfix) with ESMTP id 6CD095B692;
 Sun, 20 Dec 2020 16:16:50 +0000 (UTC)
To: coreutils@HIDDEN, bug-coreutils@HIDDEN
From: Denys Vlasenko <dvlasenk@HIDDEN>
Subject: factor: missed optimizations
Message-ID: <9436ddae-b31b-7e1f-30a2-b56746344d85@HIDDEN>
Date: Sun, 20 Dec 2020 17:16:49 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101
 Thunderbird/60.4.0
MIME-Version: 1.0
X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15
Authentication-Results: relay.mimecast.com;
 auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=dvlasenk@HIDDEN
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
Content-Type: multipart/mixed; boundary="------------CF1065E22818613C93660811"
Content-Language: en-US
Received-SPF: pass client-ip=216.205.24.124; envelope-from=dvlasenk@HIDDEN;
 helo=us-smtp-delivery-124.mimecast.com
X-Spam_score_int: -20
X-Spam_score: -2.1
X-Spam_bar: --
X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001,
 DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1,
 RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01,
 SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no
X-Spam_action: no action
X-Spam-Score: -1.4 (-)
X-Debbugs-Envelope-To: submit
X-Mailman-Approved-At: Sun, 20 Dec 2020 13:05:20 -0500
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -2.4 (--)

This is a multi-part message in MIME format.
--------------CF1065E22818613C93660811
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit

factor ---debug output is rather rudimentary.
In order to understand the logic, I added more printouts
(try attached debug patch)
and immediately noticed some obvious snafus:

$ src/factor ---debug 340282366920938461286658806734041124249
[using arbitrary-precision arithmetic] 340282366920938461286658806734041124249:[trial division 340282366920938461286658806734041124249] [trial stopped 4999(max)] [is number 34028236692093846128665880673404112424
9 prime?] [no] [pollard-rho (1)] 1:1:2:4:8:16:32:64:128:256:512:1024:2048:4096:8192:16384:32768:65536:131072:262144:524288:1048576:2097152:4194304:8388608:16777216:33554432:67108864:134217728:268435456:536870912
[factor found][rho factor 18446744073709551557] [trial division 18446744073709551556] [factor 2] [factor 2] [factor 11] [factor 137] [factor 547] [trial stopped 4999(max)] [is number 5594472617641 prime?] [trial
  division 5594472617640] [factor 2] [factor 2] [factor 2] [factor 3] [factor 5] [factor 1427] [factor 2131] [trial stopped 2131] [is number 15331 prime?] [yes] [factor 15331] [yes] ...

Look at the ens of the last line.
We don't need to check whether 15331 is prime. We know it is: we just stopped
trial division because next prime divisor to try is larger than
sqrt(number_we_try_to_factor) - therefore number_we_try_to_factor
has no more divisors, ergo it's prime. The fix:

    for (unsigned int i = 1; i <= PRIMES_PTAB_ENTRIES;)
      {
        if (! mpz_divisible_ui_p (t, p))
          {
            p += primes_diff[i++];
            if (mpz_cmp_ui (t, p * p) < 0)
+            mp_factor_insert (factors, t);
+            mpz_set_ui (t, 1);
              break;
          }
        else


... [factor 5594472617641] [factor
  18446744073709551557] [trial division 18446744073709551556] [factor 2] [factor 2] [factor 11] [factor 137] [factor 547] [trial stopped 4999(max)] [is number 5594472617641 prime?] [trial division 5594472617640]
[factor 2] [factor 2] [factor 2] [factor 3] [factor 5] [factor 1427] [factor 2131] [trial stopped 2131] [is number 15331 prime?] [yes] [factor 15331] [yes] [factor 5594472617641] [factor 18446744073709551557] [p
ollard-rho end]
   18446744073709551557 18446744073709551557

Second optimization is this: as you see, I deliberately fed it a square
(the factorization result is 18446744073709551557^2).
It happily did not notice it and spent an awful lot of time in pollard rho:

  1:1:2:4:8:16:32:64:128:256:512:1024:2048:4096:8192:16384:32768:65536:131072:262144:524288:1048576:2097152:4194304:8388608:16777216:33554432:67108864:134217728:268435456:536870912

A quick check for square drastically cuts down execution time:


  /* Use Pollard-rho to compute the prime factors of
     arbitrary-precision T, and put the results in FACTORS.  */
  static void
+mp_factor_big (mpz_t t, struct mp_factors *factors)
+{
+  /* the _big() function assumes trial division was already tried */
+  devmsg ("[is number prime?] ");
+  if (mp_prime_p (t))
+    mp_factor_insert (factors, t);
+  else
+    if (mpz_perfect_square_p (t))
+      {
+        struct mp_factors f2;
+        devmsg ("[it is square] ");
+        mpz_sqrt (t, t);
+        mp_factor_init (&f2);
+        mp_factor_big (t, &f2);
+        for (unsigned long int i = 0; i < f2.nfactors; i++)
+          {
+            unsigned long int e = f2.e[i];
+            while (e--) {                           //
+              mp_factor_insert (factors, f2.p[i]);  //FIXME: obvious optimization:
+              mp_factor_insert (factors, f2.p[i]);  // need to introduce/use mp_factor_insert_multiplicity(..., e*2);
+            }
+          }
+        mp_factor_clear (&f2);
+      }
+    else
+      mp_factor_using_pollard_rho (t, 1, factors);
+}
+
+static void
  mp_factor (mpz_t t, struct mp_factors *factors)
  {
    mp_factor_init (factors);
@@ -2265,11 +2309,7 @@ mp_factor (mpz_t t, struct mp_factors *f
.
        if (mpz_cmp_ui (t, 1) != 0)
          {
-          devmsg ("[is number prime?] ");
-          if (mp_prime_p (t))
-            mp_factor_insert (factors, t);
-          else
-            mp_factor_using_pollard_rho (t, 1, factors);
+          mp_factor_big (t, factors);
          }
      }
  }


--------------CF1065E22818613C93660811
Content-Type: text/x-patch;
 name="zz.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="zz.diff"

diff -urp coreutils-8.32/src/factor.c coreutils-8.32-TT/src/factor.c
--- coreutils-8.32/src/factor.c	2020-01-01 15:13:12.000000000 +0100
+++ coreutils-8.32-TT/src/factor.c	2020-12-20 17:08:33.022081567 +0100
@@ -620,6 +620,8 @@ mp_factor_insert (struct mp_factors *fac
   unsigned long int *e  = factors->e;
   long i;
 
+gmp_printf ("[factor %Zd] ", prime);
+
   /* Locate position for insert new or increment e.  */
   for (i = nfactors - 1; i >= 0; i--)
     {
@@ -842,7 +844,8 @@ mp_factor_using_division (mpz_t t, struc
   mpz_t q;
   unsigned long int p;
 
-  devmsg ("[trial division] ");
+gmp_printf ("[trial division %Zd] ", t);
+//  devmsg ("[trial division] ");
 
   mpz_init (q);
 
@@ -861,6 +864,8 @@ mp_factor_using_division (mpz_t t, struc
         {
           p += primes_diff[i++];
           if (mpz_cmp_ui (t, p * p) < 0)
+            mp_factor_insert (factors, t);
+            mpz_set_ui (t, 1);
             break;
         }
       else
@@ -1407,6 +1412,7 @@ mp_prime_p (mpz_t n)
   if (!mp_millerrabin (n, nm1, a, tmp, q, k))
     {
       is_prime = false;
+gmp_printf ("[ML1:no] ");
       goto ret2;
     }
 
@@ -1414,6 +1420,7 @@ mp_prime_p (mpz_t n)
     {
       /* Factor n-1 for Lucas.  */
       mpz_set (tmp, nm1);
+gmp_printf ("[ML does not know, trying Lucas] ");
       mp_factor (tmp, &factors);
     }
 
@@ -1438,13 +1445,16 @@ mp_prime_p (mpz_t n)
         }
 
       if (is_prime)
+{gmp_printf ("[Lucas:yes] ");
         goto ret1;
+}
 
       mpz_add_ui (a, a, primes_diff[r]);        /* Establish new base.  */
 
       if (!mp_millerrabin (n, nm1, a, tmp, q, k))
         {
           is_prime = false;
+gmp_printf ("[ML2:no] ");
           goto ret1;
         }
     }
@@ -1692,6 +1702,7 @@ mp_factor_using_pollard_rho (mpz_t n, un
     {
       for (;;)
         {
+printf("%llu", k);
           do
             {
               mpz_mul (t, x, x);
@@ -1715,6 +1726,7 @@ mp_factor_using_pollard_rho (mpz_t n, un
           mpz_set (z, x);
           k = l;
           l = 2 * l;
+devmsg(":");
           for (unsigned long long int i = 0; i < k; i++)
             {
               mpz_mul (t, x, x);
@@ -1725,6 +1737,7 @@ mp_factor_using_pollard_rho (mpz_t n, un
         }
 
     factor_found:
+gmp_printf("[factor found]");
       do
         {
           mpz_mul (t, y, y);
@@ -1738,6 +1751,7 @@ mp_factor_using_pollard_rho (mpz_t n, un
 
       mpz_divexact (n, n, t);   /* divide by t, before t is overwritten */
 
+gmp_printf("[rho factor %Zd] ", t);
       if (!mp_prime_p (t))
         {
           devmsg ("[composite factor--restarting pollard-rho] ");
@@ -1759,6 +1773,7 @@ mp_factor_using_pollard_rho (mpz_t n, un
       mpz_mod (y, y, n);
     }
 
+  devmsg ("[pollard-rho end] ");
   mpz_clears (P, t2, t, z, x, y, NULL);
 }
 #endif
@@ -2255,6 +2270,35 @@ factor (uintmax_t t1, uintmax_t t0, stru
 /* Use Pollard-rho to compute the prime factors of
    arbitrary-precision T, and put the results in FACTORS.  */
 static void
+mp_factor_big (mpz_t t, struct mp_factors *factors)
+{
+  /* the _big() function assumes trial division was already tried */
+  devmsg ("[is number prime?] ");
+  if (mp_prime_p (t))
+    mp_factor_insert (factors, t);
+  else
+    if (mpz_perfect_square_p (t))
+      {
+        struct mp_factors f2;
+        devmsg ("[it is square] ");
+        mpz_sqrt (t, t);
+        mp_factor_init (&f2);
+        mp_factor_big (t, &f2);
+        for (unsigned long int i = 0; i < f2.nfactors; i++)
+          {
+            unsigned long int e = f2.e[i];
+            while (e--) {
+              mp_factor_insert (factors, f2.p[i]);
+              mp_factor_insert (factors, f2.p[i]);
+            }
+          }
+        mp_factor_clear (&f2);
+      }
+    else
+      mp_factor_using_pollard_rho (t, 1, factors);
+}
+
+static void
 mp_factor (mpz_t t, struct mp_factors *factors)
 {
   mp_factor_init (factors);
@@ -2265,11 +2309,7 @@ mp_factor (mpz_t t, struct mp_factors *f
 
       if (mpz_cmp_ui (t, 1) != 0)
         {
-          devmsg ("[is number prime?] ");
-          if (mp_prime_p (t))
-            mp_factor_insert (factors, t);
-          else
-            mp_factor_using_pollard_rho (t, 1, factors);
+          mp_factor_big (t, factors);
         }
     }
 }
@@ -2599,6 +2639,8 @@ do_stdin (void)
 int
 main (int argc, char **argv)
 {
+  setvbuf(stdout, NULL, _IONBF, 0);
+
   initialize_main (&argc, &argv);
   set_program_name (argv[0]);
   setlocale (LC_ALL, "");

--------------CF1065E22818613C93660811--





Acknowledgement sent to Denys Vlasenko <dvlasenk@HIDDEN>:
New bug report received and forwarded. Copy sent to bug-coreutils@HIDDEN. Full text available.
Report forwarded to bug-coreutils@HIDDEN:
bug#45343; Package coreutils. Full text available.
Please note: This is a static page, with minimal formatting, updated once a day.
Click here to see this page with the latest information and nicer formatting.
Last modified: Sun, 20 Dec 2020 18:15:02 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.