GNU logs - #45343, boring messages


Message sent to bug-coreutils@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#45343: factor: missed optimizations
Resent-From: Denys Vlasenko <dvlasenk@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-coreutils@HIDDEN
Resent-Date: Sun, 20 Dec 2020 18:06:01 +0000
Resent-Message-ID: <handler.45343.B.160848752426818 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: report 45343
X-GNU-PR-Package: coreutils
X-GNU-PR-Keywords: 
To: coreutils@HIDDEN, 45343 <at> debbugs.gnu.org
X-Debbugs-Original-To: coreutils@HIDDEN, bug-coreutils@HIDDEN
Received: via spool by submit <at> debbugs.gnu.org id=B.160848752426818
          (code B ref -1); Sun, 20 Dec 2020 18:06:01 +0000
Received: (at submit) by debbugs.gnu.org; 20 Dec 2020 18:05:24 +0000
Received: from localhost ([127.0.0.1]:45582 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1kr358-0006yT-NL
	for submit <at> debbugs.gnu.org; Sun, 20 Dec 2020 13:05:24 -0500
Received: from lists.gnu.org ([209.51.188.17]:49834)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <dvlasenk@HIDDEN>) id 1kr1OU-00005t-5X
 for submit <at> debbugs.gnu.org; Sun, 20 Dec 2020 11:17:14 -0500
Received: from eggs.gnu.org ([2001:470:142:3::10]:43938)
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <dvlasenk@HIDDEN>)
 id 1kr1OL-0002dZ-O5
 for bug-coreutils@HIDDEN; Sun, 20 Dec 2020 11:17:10 -0500
Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:36142)
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA1:256)
 (Exim 4.90_1) (envelope-from <dvlasenk@HIDDEN>)
 id 1kr1OF-0003Zk-Q6
 for bug-coreutils@HIDDEN; Sun, 20 Dec 2020 11:17:03 -0500
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;
 s=mimecast20190719; t=1608481014;
 h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
 to:to:cc:mime-version:mime-version:content-type:content-type;
 bh=tTMiRZSj207/JKpyM0F4UTby8BZMItnp6wrjWKWMV0Q=;
 b=Lh/1E+GTyXhbTUuLLrvC1WPZjhC9JRx08Xbk5+gbrM8D0kDYvz4mienLoJXGPo1BuVzHwz
 B3IK1tEusIhPQoRDWR1gbSbXiSnE1PRtKVQh5X2KPvvNbnAFLdJTam8PJT+9maXSyLy/eM
 nvLefPaXqi1pV0hwlzO0rBjV6aWyMI4=
Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com
 [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id
 us-mta-467-vxPYPNPsONOB7_BDNXcNQg-1; Sun, 20 Dec 2020 11:16:52 -0500
X-MC-Unique: vxPYPNPsONOB7_BDNXcNQg-1
Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com
 [10.5.11.15])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 39B1A801AC0;
 Sun, 20 Dec 2020 16:16:51 +0000 (UTC)
Received: from [10.40.192.23] (unknown [10.40.192.23])
 by smtp.corp.redhat.com (Postfix) with ESMTP id 6CD095B692;
 Sun, 20 Dec 2020 16:16:50 +0000 (UTC)
From: Denys Vlasenko <dvlasenk@HIDDEN>
Message-ID: <9436ddae-b31b-7e1f-30a2-b56746344d85@HIDDEN>
Date: Sun, 20 Dec 2020 17:16:49 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101
 Thunderbird/60.4.0
MIME-Version: 1.0
X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15
Authentication-Results: relay.mimecast.com;
 auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=dvlasenk@HIDDEN
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
Content-Type: multipart/mixed; boundary="------------CF1065E22818613C93660811"
Content-Language: en-US
Received-SPF: pass client-ip=216.205.24.124; envelope-from=dvlasenk@HIDDEN;
 helo=us-smtp-delivery-124.mimecast.com
X-Spam_score_int: -20
X-Spam_score: -2.1
X-Spam_bar: --
X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001,
 DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1,
 RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01,
 SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no
X-Spam_action: no action
X-Spam-Score: -1.4 (-)
X-Mailman-Approved-At: Sun, 20 Dec 2020 13:05:20 -0500
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -2.4 (--)

This is a multi-part message in MIME format.
--------------CF1065E22818613C93660811
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit

factor ---debug output is rather rudimentary.
In order to understand the logic, I added more printouts
(try attached debug patch)
and immediately noticed some obvious snafus:

$ src/factor ---debug 340282366920938461286658806734041124249
[using arbitrary-precision arithmetic] 340282366920938461286658806734041124249:[trial division 340282366920938461286658806734041124249] [trial stopped 4999(max)] [is number 34028236692093846128665880673404112424
9 prime?] [no] [pollard-rho (1)] 1:1:2:4:8:16:32:64:128:256:512:1024:2048:4096:8192:16384:32768:65536:131072:262144:524288:1048576:2097152:4194304:8388608:16777216:33554432:67108864:134217728:268435456:536870912
[factor found][rho factor 18446744073709551557] [trial division 18446744073709551556] [factor 2] [factor 2] [factor 11] [factor 137] [factor 547] [trial stopped 4999(max)] [is number 5594472617641 prime?] [trial
  division 5594472617640] [factor 2] [factor 2] [factor 2] [factor 3] [factor 5] [factor 1427] [factor 2131] [trial stopped 2131] [is number 15331 prime?] [yes] [factor 15331] [yes] ...

Look at the ens of the last line.
We don't need to check whether 15331 is prime. We know it is: we just stopped
trial division because next prime divisor to try is larger than
sqrt(number_we_try_to_factor) - therefore number_we_try_to_factor
has no more divisors, ergo it's prime. The fix:

    for (unsigned int i = 1; i <= PRIMES_PTAB_ENTRIES;)
      {
        if (! mpz_divisible_ui_p (t, p))
          {
            p += primes_diff[i++];
            if (mpz_cmp_ui (t, p * p) < 0)
+            mp_factor_insert (factors, t);
+            mpz_set_ui (t, 1);
              break;
          }
        else


... [factor 5594472617641] [factor
  18446744073709551557] [trial division 18446744073709551556] [factor 2] [factor 2] [factor 11] [factor 137] [factor 547] [trial stopped 4999(max)] [is number 5594472617641 prime?] [trial division 5594472617640]
[factor 2] [factor 2] [factor 2] [factor 3] [factor 5] [factor 1427] [factor 2131] [trial stopped 2131] [is number 15331 prime?] [yes] [factor 15331] [yes] [factor 5594472617641] [factor 18446744073709551557] [p
ollard-rho end]
   18446744073709551557 18446744073709551557

Second optimization is this: as you see, I deliberately fed it a square
(the factorization result is 18446744073709551557^2).
It happily did not notice it and spent an awful lot of time in pollard rho:

  1:1:2:4:8:16:32:64:128:256:512:1024:2048:4096:8192:16384:32768:65536:131072:262144:524288:1048576:2097152:4194304:8388608:16777216:33554432:67108864:134217728:268435456:536870912

A quick check for square drastically cuts down execution time:


  /* Use Pollard-rho to compute the prime factors of
     arbitrary-precision T, and put the results in FACTORS.  */
  static void
+mp_factor_big (mpz_t t, struct mp_factors *factors)
+{
+  /* the _big() function assumes trial division was already tried */
+  devmsg ("[is number prime?] ");
+  if (mp_prime_p (t))
+    mp_factor_insert (factors, t);
+  else
+    if (mpz_perfect_square_p (t))
+      {
+        struct mp_factors f2;
+        devmsg ("[it is square] ");
+        mpz_sqrt (t, t);
+        mp_factor_init (&f2);
+        mp_factor_big (t, &f2);
+        for (unsigned long int i = 0; i < f2.nfactors; i++)
+          {
+            unsigned long int e = f2.e[i];
+            while (e--) {                           //
+              mp_factor_insert (factors, f2.p[i]);  //FIXME: obvious optimization:
+              mp_factor_insert (factors, f2.p[i]);  // need to introduce/use mp_factor_insert_multiplicity(..., e*2);
+            }
+          }
+        mp_factor_clear (&f2);
+      }
+    else
+      mp_factor_using_pollard_rho (t, 1, factors);
+}
+
+static void
  mp_factor (mpz_t t, struct mp_factors *factors)
  {
    mp_factor_init (factors);
@@ -2265,11 +2309,7 @@ mp_factor (mpz_t t, struct mp_factors *f
.
        if (mpz_cmp_ui (t, 1) != 0)
          {
-          devmsg ("[is number prime?] ");
-          if (mp_prime_p (t))
-            mp_factor_insert (factors, t);
-          else
-            mp_factor_using_pollard_rho (t, 1, factors);
+          mp_factor_big (t, factors);
          }
      }
  }


--------------CF1065E22818613C93660811
Content-Type: text/x-patch;
 name="zz.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="zz.diff"

diff -urp coreutils-8.32/src/factor.c coreutils-8.32-TT/src/factor.c
--- coreutils-8.32/src/factor.c	2020-01-01 15:13:12.000000000 +0100
+++ coreutils-8.32-TT/src/factor.c	2020-12-20 17:08:33.022081567 +0100
@@ -620,6 +620,8 @@ mp_factor_insert (struct mp_factors *fac
   unsigned long int *e  = factors->e;
   long i;
 
+gmp_printf ("[factor %Zd] ", prime);
+
   /* Locate position for insert new or increment e.  */
   for (i = nfactors - 1; i >= 0; i--)
     {
@@ -842,7 +844,8 @@ mp_factor_using_division (mpz_t t, struc
   mpz_t q;
   unsigned long int p;
 
-  devmsg ("[trial division] ");
+gmp_printf ("[trial division %Zd] ", t);
+//  devmsg ("[trial division] ");
 
   mpz_init (q);
 
@@ -861,6 +864,8 @@ mp_factor_using_division (mpz_t t, struc
         {
           p += primes_diff[i++];
           if (mpz_cmp_ui (t, p * p) < 0)
+            mp_factor_insert (factors, t);
+            mpz_set_ui (t, 1);
             break;
         }
       else
@@ -1407,6 +1412,7 @@ mp_prime_p (mpz_t n)
   if (!mp_millerrabin (n, nm1, a, tmp, q, k))
     {
       is_prime = false;
+gmp_printf ("[ML1:no] ");
       goto ret2;
     }
 
@@ -1414,6 +1420,7 @@ mp_prime_p (mpz_t n)
     {
       /* Factor n-1 for Lucas.  */
       mpz_set (tmp, nm1);
+gmp_printf ("[ML does not know, trying Lucas] ");
       mp_factor (tmp, &factors);
     }
 
@@ -1438,13 +1445,16 @@ mp_prime_p (mpz_t n)
         }
 
       if (is_prime)
+{gmp_printf ("[Lucas:yes] ");
         goto ret1;
+}
 
       mpz_add_ui (a, a, primes_diff[r]);        /* Establish new base.  */
 
       if (!mp_millerrabin (n, nm1, a, tmp, q, k))
         {
           is_prime = false;
+gmp_printf ("[ML2:no] ");
           goto ret1;
         }
     }
@@ -1692,6 +1702,7 @@ mp_factor_using_pollard_rho (mpz_t n, un
     {
       for (;;)
         {
+printf("%llu", k);
           do
             {
               mpz_mul (t, x, x);
@@ -1715,6 +1726,7 @@ mp_factor_using_pollard_rho (mpz_t n, un
           mpz_set (z, x);
           k = l;
           l = 2 * l;
+devmsg(":");
           for (unsigned long long int i = 0; i < k; i++)
             {
               mpz_mul (t, x, x);
@@ -1725,6 +1737,7 @@ mp_factor_using_pollard_rho (mpz_t n, un
         }
 
     factor_found:
+gmp_printf("[factor found]");
       do
         {
           mpz_mul (t, y, y);
@@ -1738,6 +1751,7 @@ mp_factor_using_pollard_rho (mpz_t n, un
 
       mpz_divexact (n, n, t);   /* divide by t, before t is overwritten */
 
+gmp_printf("[rho factor %Zd] ", t);
       if (!mp_prime_p (t))
         {
           devmsg ("[composite factor--restarting pollard-rho] ");
@@ -1759,6 +1773,7 @@ mp_factor_using_pollard_rho (mpz_t n, un
       mpz_mod (y, y, n);
     }
 
+  devmsg ("[pollard-rho end] ");
   mpz_clears (P, t2, t, z, x, y, NULL);
 }
 #endif
@@ -2255,6 +2270,35 @@ factor (uintmax_t t1, uintmax_t t0, stru
 /* Use Pollard-rho to compute the prime factors of
    arbitrary-precision T, and put the results in FACTORS.  */
 static void
+mp_factor_big (mpz_t t, struct mp_factors *factors)
+{
+  /* the _big() function assumes trial division was already tried */
+  devmsg ("[is number prime?] ");
+  if (mp_prime_p (t))
+    mp_factor_insert (factors, t);
+  else
+    if (mpz_perfect_square_p (t))
+      {
+        struct mp_factors f2;
+        devmsg ("[it is square] ");
+        mpz_sqrt (t, t);
+        mp_factor_init (&f2);
+        mp_factor_big (t, &f2);
+        for (unsigned long int i = 0; i < f2.nfactors; i++)
+          {
+            unsigned long int e = f2.e[i];
+            while (e--) {
+              mp_factor_insert (factors, f2.p[i]);
+              mp_factor_insert (factors, f2.p[i]);
+            }
+          }
+        mp_factor_clear (&f2);
+      }
+    else
+      mp_factor_using_pollard_rho (t, 1, factors);
+}
+
+static void
 mp_factor (mpz_t t, struct mp_factors *factors)
 {
   mp_factor_init (factors);
@@ -2265,11 +2309,7 @@ mp_factor (mpz_t t, struct mp_factors *f
 
       if (mpz_cmp_ui (t, 1) != 0)
         {
-          devmsg ("[is number prime?] ");
-          if (mp_prime_p (t))
-            mp_factor_insert (factors, t);
-          else
-            mp_factor_using_pollard_rho (t, 1, factors);
+          mp_factor_big (t, factors);
         }
     }
 }
@@ -2599,6 +2639,8 @@ do_stdin (void)
 int
 main (int argc, char **argv)
 {
+  setvbuf(stdout, NULL, _IONBF, 0);
+
   initialize_main (&argc, &argv);
   set_program_name (argv[0]);
   setlocale (LC_ALL, "");

--------------CF1065E22818613C93660811--





Message sent:


Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-Mailer: MIME-tools 5.505 (Entity 5.505)
Content-Type: text/plain; charset=utf-8
X-Loop: help-debbugs@HIDDEN
From: help-debbugs@HIDDEN (GNU bug Tracking System)
To: Denys Vlasenko <dvlasenk@HIDDEN>
Subject: bug#45343: Acknowledgement (factor: missed optimizations)
Message-ID: <handler.45343.B.160848752426818.ack <at> debbugs.gnu.org>
References: <9436ddae-b31b-7e1f-30a2-b56746344d85@HIDDEN>
X-Gnu-PR-Message: ack 45343
X-Gnu-PR-Package: coreutils
Reply-To: 45343 <at> debbugs.gnu.org
Date: Sun, 20 Dec 2020 18:06:01 +0000

Thank you for filing a new bug report with debbugs.gnu.org.

This is an automatically generated reply to let you know your message
has been received.

Your message is being forwarded to the package maintainers and other
interested parties for their attention; they will reply in due course.

Your message has been sent to the package maintainer(s):
 bug-coreutils@HIDDEN

If you wish to submit further information on this problem, please
send it to 45343 <at> debbugs.gnu.org.

Please do not send mail to help-debbugs@HIDDEN unless you wish
to report a problem with the Bug-tracking system.

--=20
45343: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=3D45343
GNU Bug Tracking System
Contact help-debbugs@HIDDEN with problems



Last modified: Sun, 20 Dec 2020 18:15:02 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.