GNU bug report logs - #48114
Disarchive occasionally fails tests


Package: guix;

Reported by: Ludovic Courtès <ludovic.courtes <at> inria.fr>

Date: Fri, 30 Apr 2021 10:02:02 UTC

Severity: normal

Done: Timothy Sample <samplet <at> ngyro.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 48114 in the body.
You can then email your comments to 48114 AT debbugs.gnu.org in the normal way.



Report forwarded to samplet <at> ngyro.com, bug-guix <at> gnu.org:
bug#48114; Package guix. (Fri, 30 Apr 2021 10:02:02 GMT)

Acknowledgement sent to Ludovic Courtès <ludovic.courtes <at> inria.fr>:
New bug report received and forwarded. Copy sent to samplet <at> ngyro.com, bug-guix <at> gnu.org. (Fri, 30 Apr 2021 10:02:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Ludovic Courtès <ludovic.courtes <at> inria.fr>
To: <bug-guix <at> gnu.org>
Subject: Disarchive occasionally fails tests
Date: Fri, 30 Apr 2021 12:00:36 +0200
Hi Timothy,

Disarchive 0.2.0 occasionally fails two tests:

  FAIL: tests/kinds/octal.scm - [prop] Writing is reversible
  FAIL: tests/kinds/octal.scm - [prop] Serializing is reversible

(Thanks, Quickcheck! :-))

I added ‘pk’ calls like so:

--8<---------------cut here---------------start------------->8---
(test-assert "[prop] Writing is reversible"
  (quickcheck
   (property ((octal $octal))
     (test-when (valid-octal? octal)
       (begin
         (equal? (pk 'oct octal) (pk 'decode (decode-octal (encode-octal octal)))))))))

(test-assert "[prop] Serializing is reversible"
  (quickcheck
   (property ((octal $octal))
     (test-when (valid-octal? octal)
       (equal? (pk 'OCT octal) (pk 'DECODE (serdeser -octal- octal)))))))
--8<---------------cut here---------------end--------------->8---

and got this output:

--8<---------------cut here---------------start------------->8---
;;; (oct #<<unstructured-octal> value: 0 source: #<<zero-string> value: "\U0f94a4\u0912\U025627\U10e96a\u9576\u2077\u048f\U0f2f60\U0f744b" trailer: #vu8(172 156 23 48 25 29 159 226 210)>>)

;;; (decode #<<unstructured-octal> value: 0 source: #<<zero-string> value: "\U0f94a4\u0912\U025627\U10e96a\u9576\u2077\u048f\U0f2f60\U0f744b" trailer: #vu8(172 156 23 48 25 29 159 226 210)>>)
actual-value: #f
actual-error:
+ (out-of-range
+   #f
+   "Value out of range ~S to ~S: ~S"
+   (8 9 10)
+   (10))
result: FAIL

[…]

;;; (OCT #<<unstructured-octal> value: 0 source: #<<zero-string> value: "\U0f94a4\u0912\U025627\U10e96a\u9576\u2077\u048f\U0f2f60\U0f744b" trailer: #vu8(172 156 23 48 25 29 159 226 210)>>)

;;; (DECODE #<<unstructured-octal> value: 0 source: #<<zero-string> value: "\U0f94a4\u0912\U025627\U10e96a\u9576\u2077\u048f\U0f2f60\U0f744b" trailer: #vu8(172 156 23 48 25 29 159 226 210)>>)
actual-value: #f
actual-error:
+ (out-of-range
+   #f
+   "Value out of range ~S to ~S: ~S"
+   (8 9 10)
+   (10))
result: FAIL
--8<---------------cut here---------------end--------------->8---

I’m not sure where the exception comes from though.

Thoughts?

Thanks,
Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#48114; Package guix. (Fri, 30 Apr 2021 19:51:01 GMT)

Message #8 received at 48114 <at> debbugs.gnu.org:

From: Timothy Sample <samplet <at> ngyro.com>
To: Ludovic Courtès <ludovic.courtes <at> inria.fr>
Cc: 48114 <at> debbugs.gnu.org
Subject: Re: bug#48114: Disarchive occasionally fails tests
Date: Fri, 30 Apr 2021 15:49:52 -0400
Hey,

Ludovic Courtès <ludovic.courtes <at> inria.fr> writes:

> Disarchive 0.2.0 occasionally fails two tests:
>
>   FAIL: tests/kinds/octal.scm - [prop] Writing is reversible
>   FAIL: tests/kinds/octal.scm - [prop] Serializing is reversible

These two tests have a bit of a problem.  They occasionally fail by
“giving up”, which is when too many test cases are discarded rather than
used.  (This happens because you might write a generator for a superset
of the values you’re interested in, and then filter out some values with
“test-when”.)  I don’t think this is happening here, though.  You would
see something like “Gave up! Passed only 0 ests [sic].”
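To make the “giving up” behavior concrete, here is a toy property runner. It is a sketch under stated assumptions, not Guile-QuickCheck’s implementation: the procedure ‘run-property’ and its ‘#:tests’ and ‘#:max-discards’ parameters are hypothetical. Candidates rejected by the precondition (the role ‘test-when’ plays in the tests above) are counted as discards rather than tests, and the run gives up once too many are discarded before enough cases have passed.

```scheme
;; Toy QuickCheck-style runner (hypothetical; not Guile-QuickCheck's API).
;; GENERATE maps a random state to a candidate value, PRECONDITION
;; filters candidates (the role `test-when' plays), and PROPERTY is the
;; predicate under test.
(define* (run-property generate precondition property
                       #:key (tests 100) (max-discards 1000)
                       (state (seed->random-state 42)))
  (let loop ((passed 0) (discarded 0))
    (cond
     ((= passed tests)                 ; enough cases passed
      (list 'ok passed))
     ((>= discarded max-discards)      ; too many candidates rejected
      (list 'gave-up passed))
     (else
      (let ((value (generate state)))
        (cond
         ((not (precondition value))   ; discarded; not counted as a test
          (loop passed (+ discarded 1)))
         ((property value)             ; one more passing case
          (loop (+ passed 1) discarded))
         (else                         ; counterexample found
          (list 'failed value))))))))
```

With a precondition that never holds, (run-property (lambda (s) (random 1000 s)) (lambda (n) #f) (lambda (n) #t)) returns (gave-up 0), the analogue of “Passed only 0 ests”.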

> I added ‘pk’ calls like so:
>
> (test-assert "[prop] Writing is reversible"
>   (quickcheck
>    (property ((octal $octal))
>      (test-when (valid-octal? octal)
>        (begin
>          (equal? (pk 'oct octal) (pk 'decode (decode-octal (encode-octal octal)))))))))
>
> (test-assert "[prop] Serializing is reversible"
>   (quickcheck
>    (property ((octal $octal))
>      (test-when (valid-octal? octal)
>        (equal? (pk 'OCT octal) (pk 'DECODE (serdeser -octal- octal)))))))
>
>
> and got this output:
>
> ;;; (oct #<<unstructured-octal> value: 0 source: #<<zero-string> value: "\U0f94a4\u0912\U025627\U10e96a\u9576\u2077\u048f\U0f2f60\U0f744b" trailer: #vu8(172 156 23 48 25 29 159 226 210)>>)
>
> ;;; (decode #<<unstructured-octal> value: 0 source: #<<zero-string> value: "\U0f94a4\u0912\U025627\U10e96a\u9576\u2077\u048f\U0f2f60\U0f744b" trailer: #vu8(172 156 23 48 25 29 159 226 210)>>)
> actual-value: #f
> actual-error:
> + (out-of-range
> +   #f
> +   "Value out of range ~S to ~S: ~S"
> +   (8 9 10)
> +   (10))
> result: FAIL
>
> […]
>
> ;;; (OCT #<<unstructured-octal> value: 0 source: #<<zero-string> value: "\U0f94a4\u0912\U025627\U10e96a\u9576\u2077\u048f\U0f2f60\U0f744b" trailer: #vu8(172 156 23 48 25 29 159 226 210)>>)
>
> ;;; (DECODE #<<unstructured-octal> value: 0 source: #<<zero-string> value: "\U0f94a4\u0912\U025627\U10e96a\u9576\u2077\u048f\U0f2f60\U0f744b" trailer: #vu8(172 156 23 48 25 29 159 226 210)>>)
> actual-value: #f
> actual-error:
> + (out-of-range
> +   #f
> +   "Value out of range ~S to ~S: ~S"
> +   (8 9 10)
> +   (10))
> result: FAIL
>
> I’m not sure where the exception comes from though.

I can’t seem to reproduce this.  I’ve run the test suite many, many
times, but I also tried:

    ,use (disarchive kinds octal)
    ,use (disarchive kinds zero-string)
    ,use (disarchive serialization)
    (define the-zero-string
      (make-zero-string
       "\U0f94a4\u0912\U025627\U10e96a\u9576\u2077\u048f\U0f2f60\U0f744b"
       #vu8(172 156 23 48 25 29 159 226 210)))
    (define the-octal
      (make-unstructured-octal 0 the-zero-string))
    (equal? the-octal (decode-octal (encode-octal the-octal)))
    (equal? the-octal (serdeser -octal- the-octal))

Which works fine.  (Does it work for you?)

However, isn’t it possible that these values aren’t the culprits?  With
the “pk” calls you added, isn’t it printing the last OK value without
telling us the value causing the issue?

What if you run it with the following?

    (test-assert "[prop] Writing is reversible"
      (quickcheck
       (property ((octal $octal))
         (test-when (valid-octal? octal)
           (false-if-exception  ; <-- changed!
             (equal? octal (decode-octal (encode-octal octal))))))))

This way, Guile-QuickCheck should print the offending value and the seed
used for the tests, which could be helpful for reproducing.  (The fact
that it doesn’t handle exceptions well is a known bug!)
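The effect of ‘false-if-exception’ can be seen in isolation. In this sketch (the helper name ‘exception->failure’ is hypothetical), a property that raises is turned into one that simply returns #f, so a checker can treat the case as an ordinary counterexample instead of aborting. The example property mimics the ‘out-of-range’ error reported above: taking (substring s 0 2) of a one-character string raises.

```scheme
;; Wrap a one-argument property so that an exception raised while
;; checking it is reported as a plain failure (#f) instead of escaping.
(define (exception->failure property)
  (lambda (value)
    (false-if-exception (property value))))

;; Raises out-of-range for strings shorter than two characters.
(define first-two-chars
  (exception->failure (lambda (s) (substring s 0 2))))

(first-two-chars "abc")     ; => "ab"
(first-two-chars "\u2492")  ; => #f, since the string has only one character
```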


-- Tim




Information forwarded to bug-guix <at> gnu.org:
bug#48114; Package guix. (Sun, 02 May 2021 19:58:02 GMT)

Message #11 received at 48114 <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: Timothy Sample <samplet <at> ngyro.com>
Cc: 48114 <at> debbugs.gnu.org
Subject: Re: bug#48114: Disarchive occasionally fails tests
Date: Sun, 02 May 2021 21:57:43 +0200
Hello!

Timothy Sample <samplet <at> ngyro.com> skribis:

> I can’t seem to reproduce this.  I’ve run the test suite many, many
> times, but I also tried:

I can reproduce it quickly with:

  while make check TESTS=tests/kinds/octal.scm -j5 ; do : ; done

… in C locale (LC_ALL & co. all unset).

> However, isn’t it possible that these values aren’t the culprits?  With
> the “pk” calls you added, isn’t it printing the last OK value without
> telling us the value causing the issue?

You’re right, the values printed are not the culprit.  The problem comes
from the generator (I had to raise the (quickcheck …) form out of
‘test-assert’ so I could get a backtrace):

--8<---------------cut here---------------start------------->8---
Backtrace:
          13 (primitive-load "/data/src/disarchive/./build-aux/test-driver.scm")
In ice-9/eval.scm:
    619:8 12 (_ #(#(#<directory (guile-user) 7fccb09d9f00> ((() "./tests/kinds/octal.scm") (# . "no") (# . #) ?)) #))
    619:8 11 (_ #(#(#(#(#(#(#(#(#<directory (guile-user) 7fccb09d9f00> ("./tests/kinds/octal?") ?)) ?) ?) ?) ?) ?) ?))
In ice-9/boot-9.scm:
    142:2 10 (dynamic-wind _ _ #<procedure 7fccaf5b81a0 at ice-9/eval.scm:330:13 ()>)
In unknown file:
           9 (primitive-load "./tests/kinds/octal.scm")
In quickcheck.scm:
    118:6  8 (check #<<quickcheck-config> seed: 321557891 stop?: #<procedure 7fccaf8c3540 at ice-9/eval.scm:336:13?> ?)
    98:12  7 (check-results _ #<<property> names: (octal) gen/arbs: (#<<arbitrary> gen: #<<generator> proc: #<proce?>)
In quickcheck/generator.scm:
     65:2  6 (_ 7 #<<rng-state> start: #(1907167801 2749187034 1190323419 1039883844 766725436 3567744198) s1: #(29?>)
     65:2  5 (_ 7 #<<rng-state> start: #(1907167801 2749187034 1190323419 1039883844 766725436 3567744198) s1: #(29?>)
    78:17  4 (_ 7 #<<rng-state> start: #(1907167801 2749187034 1190323419 1039883844 766725436 3567744198) s1: #(28?>)
   105:22  3 (_ _)
In tests/kinds.scm:
    84:22  2 (fix-unstructured-octal-value #<<unstructured-octal> value: 7 source: #<<zero-string> value: "\U0f99aa?>)
    86:47  1 (_ _)
In unknown file:
           0 (substring "\U0f99aa?\U0ff7c1\U0fb97a\U0ff933?\U0fe7a1" 6 8)

ERROR: In procedure substring:
Value out of range 6 to 7: 8
--8<---------------cut here---------------end--------------->8---

Note that this is in C locale, which may mean that ‘regexp-exec’, which
passes strings to libc, gets offsets wrong somehow (see
‘fixup_multibyte_match’ in libguile), though I couldn’t reproduce it
with the string above.

Anyway, ‘guix build disarchive’ builds in en_US.utf8 locale, so the
thing above is probably a wrong lead.

If I switch to en_US.utf8, I occasionally get the following error
instead:

--8<---------------cut here---------------start------------->8---
test-name: [prop] Serializing is reversible
location: tests/kinds/octal.scm:154
source:
+ (test-assert
+   "[prop] Serializing is reversible"
+   (quickcheck
+     (property
+       ((octal $octal))
+       (test-when
+         (valid-octal? octal)
+         (equal?
+           (pk 'OCT octal)
+           (pk 'DECODE (serdeser -octal- octal)))))))

;;; (OCT #<<unstructured-octal> value: 0 source: #<<zero-string> value: "" trailer: "">>)

;;; (DECODE #<<unstructured-octal> value: 0 source: #<<zero-string> value: "" trailer: "">>)
Gave up! Passed only 1 est.
actual-value: #f
result: FAIL
--8<---------------cut here---------------end--------------->8---

This is more in line with what you described.  Any ideas on how to
address that?

Thanks,
Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#48114; Package guix. (Mon, 03 May 2021 02:25:01 GMT)

Message #14 received at 48114 <at> debbugs.gnu.org:

From: Timothy Sample <samplet <at> ngyro.com>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 48114 <at> debbugs.gnu.org
Subject: Re: bug#48114: Disarchive occasionally fails tests
Date: Sun, 02 May 2021 22:24:04 -0400
Hi,

Ludovic Courtès <ludo <at> gnu.org> writes:

[...]

> ERROR: In procedure substring:
> Value out of range 6 to 7: 8
>
> Note that this is in C locale, which may mean that ‘regexp-exec’, which
> passes strings to libc, gets offsets wrong somehow (see
> ‘fixup_multibyte_match’ in libguile), though I couldn’t reproduce it
> with the string above.

I’m still looking into this, but I wanted to quickly post this
reproducer for the Guile bug:

    (use-modules (ice-9 regex))
    (define str "\U101514\U103ab0\U0f6e6e\U02e278\U01d9eb\U10b996\U1089b5\uea15\U0fa074\U101e41\U02e330\u0177\u2492")
    (match:substring (string-match "[0-8]+" str))

This triggers the out-of-range error when run with “LC_ALL=C”.


-- Tim




Information forwarded to bug-guix <at> gnu.org:
bug#48114; Package guix. (Mon, 03 May 2021 04:03:02 GMT)

Message #17 received at 48114 <at> debbugs.gnu.org:

From: Timothy Sample <samplet <at> ngyro.com>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 48114 <at> debbugs.gnu.org
Subject: Re: bug#48114: Disarchive occasionally fails tests
Date: Mon, 03 May 2021 00:02:09 -0400
Timothy Sample <samplet <at> ngyro.com> writes:

> I’m still looking into this, but I wanted to quickly post this
> reproducer for the Guile bug:
>
>     (use-modules (ice-9 regex))
>     (define str
> "\U101514\U103ab0\U0f6e6e\U02e278\U01d9eb\U10b996\U1089b5\uea15\U0fa074\U101e41\U02e330\u0177\u2492")
>     (match:substring (string-match "[0-8]+" str))
>
> This triggers the out-of-range error when run with “LC_ALL=C”.

It turns out that all that’s needed is the last code point, which is
“Number Eleven Full Stop”, or ‘⒒’.  When Guile converts this to an ASCII
C string using ‘u32_conv_from_encoding’, it becomes “11.”.  The regex
(“[0-8]+”) matches the “11” part with start index 0 and end index 2.
The ‘fixup_multibyte_match’ function does nothing (it only matters when
the locale encoding is multibyte) [1].  Guile then builds the match
vector with the original string but keeps the ASCII offsets.  In other
words, it thinks the match substring goes from 0 to 2 in a single code
point string:

    ,use (ice-9 regex)
    (string-match "11" "\u2492")
    => #("\u2492" (0 . 2))

I’m not sure there’s any way to solve this nicely in Guile.  It would be
clearer if the match vector included the string as libc matched it, but
it’s still surprising that the match happens with a different string.
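Because the offsets come from the converted C string, code that consumes a match in the C locale cannot fully trust them. One defensive option, sketched here with a hypothetical helper name, is to check the reported offsets against the Scheme string before calling ‘substring’. This only detects the symptom: it turns the out-of-range crash into #f, but offsets that happen to fall within bounds can still be wrong.

```scheme
(use-modules (ice-9 regex))

;; Return submatch N of match M, or #f when that submatch did not take
;; part in the match or its offsets do not fit the original string.
(define* (checked-match-substring m #:optional (n 0))
  (let ((str   (match:string m))
        (start (match:start m n))
        (end   (match:end m n)))
    (and start end
         (<= 0 start end (string-length str))
         (substring str start end))))
```

For an ASCII target such as (string-match "[0-7]+" "abc0755def"), the helper returns "0755", the same as ‘match:substring’.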

In Disarchive, I can rewrite the generator without regex.  I’ll do that
and see what I can do about the “Gave up!” issue.
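A regex-free digit search avoids the locale round trip entirely, because it scans the Scheme string’s code points directly and every index it produces refers to that same string. This is only a sketch of the idea, not the code that went into Disarchive; ‘first-run-in’ and ‘octal-digits’ are hypothetical names, and the octal character set is an assumption (the reproducer above uses “[0-8]”).

```scheme
;; Return the first maximal run of characters from the char-set CHARS in
;; STR, or #f if there is none.  Indices are code-point offsets into STR
;; itself; no conversion to a C string takes place.
(define (first-run-in str chars)
  (let ((start (string-index str chars)))             ; first char in CHARS
    (and start
         (let ((end (or (string-skip str chars start) ; first char not in CHARS
                        (string-length str))))        ; run reaches the end
           (substring str start end)))))

(define octal-digits (string->char-set "01234567"))
```

Here (first-run-in "abc0755def" octal-digits) returns "0755", while (first-run-in "\u2492" octal-digits) returns #f: ‘⒒’ is a single code point that is not an ASCII digit, so it never becomes “11”.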

[1] It works on the converted-to-ASCII C string, which means that the
byte offsets and code point offsets are the same.  Hence, it has nothing
to do.


-- Tim




Information forwarded to bug-guix <at> gnu.org:
bug#48114; Package guix. (Mon, 03 May 2021 06:21:01 GMT)

Message #20 received at 48114 <at> debbugs.gnu.org:

From: Bengt Richter <bokr <at> bokr.com>
To: Timothy Sample <samplet <at> ngyro.com>
Cc: 48114 <at> debbugs.gnu.org, Ludovic Courtès <ludo <at> gnu.org>
Subject: Re: bug#48114: Disarchive occasionally fails tests
Date: Mon, 3 May 2021 08:19:50 +0200
Hi Timothy, Ludo,

On +2021-05-03 00:02:09 -0400, Timothy Sample wrote:
> Timothy Sample <samplet <at> ngyro.com> writes:
> 
> > I’m still looking into this, but I wanted to quickly post this
> > reproducer for the Guile bug:
> >
> >     (use-modules (ice-9 regex))
> >     (define str
> > "\U101514\U103ab0\U0f6e6e\U02e278\U01d9eb\U10b996\U1089b5\uea15\U0fa074\U101e41\U02e330\u0177\u2492")
> >     (match:substring (string-match "[0-8]+" str))
> >
> > This triggers the out-of-range error when run with “LC_ALL=C”.
> 
> It turns out that all that’s needed is the last code point, which is
> “Number Eleven Full Stop”, or ‘⒒’.  When Guile converts this to an ASCII
> C string using ‘u32_conv_from_encoding’, it becomes “11.”.  The regex
> (“[0-8]+”) matches the “11” part with start index 0 and end index 2.
> The ‘fixup_multibyte_match’ function does nothing (it only matters when
> the locale encoding is multibyte) [1].  Guile then builds the match
> vector with the original string but keeps the ASCII offsets.  In other
> words, it thinks the match substring goes from 0 to 2 in a single code
> point string:
> 
>     ,use (ice-9 regex)
>     (string-match "11" "\u2492")
>     => #("\u2492" (0 . 2))
> 
> I’m not sure there’s any way to solve this nicely in Guile.  It would be
> clearer if the match vector included the string as libc matched it, but
> it’s still surprising that the match happens with a different string.
> 
> In Disarchive, I can rewrite the generator without regex.  I’ll do that
> and see what I can do about the “Gave up!” issue.
> 
> [1] It works on the converted-to-ASCII C string, which means that the
> byte offsets and code point offsets are the same.  Hence, it has nothing
> to do.
What happens with these?
(code points in decimal)

    8554 _Ⅺ_ "ROMAN NUMERAL ELEVEN"
    8570 _ⅺ_ "SMALL ROMAN NUMERAL ELEVEN"
    9322 _⑪_ "CIRCLED NUMBER ELEVEN"
    9342 _⑾_ "PARENTHESIZED NUMBER ELEVEN"
    9362 _⒒_ "NUMBER ELEVEN FULL STOP"
    9451 _⓫_ "NEGATIVE CIRCLED NUMBER ELEVEN"
   13155 _㍣_ "IDEOGRAPHIC TELEGRAPH SYMBOL FOR HOUR ELEVEN"
   13290 _㏪_ "IDEOGRAPHIC TELEGRAPH SYMBOL FOR DAY ELEVEN"

I would argue that none of these should be "decoded" into ASCII polyglyphs,
since they are atomic character glyphs. IMO, it is an over-eager transformation
to turn them into ASCII polyglyphs.
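For comparison, the mapping from such characters to ASCII sequences is spelled out in Unicode’s compatibility decompositions (NFKD), which Guile exposes as ‘string-normalize-nfkd’. Whether libc’s conversion to ASCII uses exactly these mappings is an assumption; the sketch only shows that ‘⒒’ (9362) is one character whose compatibility form is the three-character string “11.”, and that ‘Ⅺ’ (8554) decomposes to “XI”. The helper name ‘compat-decomposition’ is hypothetical.

```scheme
;; Compatibility decomposition (NFKD) of a single code point.
(define (compat-decomposition code-point)
  (string-normalize-nfkd (string (integer->char code-point))))

;; Print each "eleven" character from the list above with its NFKD form.
(for-each (lambda (cp)
            (write (list cp (compat-decomposition cp)))
            (newline))
          '(8554 8570 9322 9342 9362 9451 13155 13290))
```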

Super/subscript placement metadata is another thing to consider:
"decode" that to ASCII art?? ;-)

Unicode characters representing mathematical values in
other languages are different. Those are subject to natural language
translation with locale-dependent semantics.

These might be candidates for that:
(code points in decimal)

    8544 _Ⅰ_ "ROMAN NUMERAL ONE"
    8545 _Ⅱ_ "ROMAN NUMERAL TWO"
    8546 _Ⅲ_ "ROMAN NUMERAL THREE"
    8547 _Ⅳ_ "ROMAN NUMERAL FOUR"
    8548 _Ⅴ_ "ROMAN NUMERAL FIVE"
    8549 _Ⅵ_ "ROMAN NUMERAL SIX"
    8550 _Ⅶ_ "ROMAN NUMERAL SEVEN"
    8551 _Ⅷ_ "ROMAN NUMERAL EIGHT"
    8552 _Ⅸ_ "ROMAN NUMERAL NINE"
    8553 _Ⅹ_ "ROMAN NUMERAL TEN"
    8554 _Ⅺ_ "ROMAN NUMERAL ELEVEN"
    8555 _Ⅻ_ "ROMAN NUMERAL TWELVE"
    8556 _Ⅼ_ "ROMAN NUMERAL FIFTY"
    8557 _Ⅽ_ "ROMAN NUMERAL ONE HUNDRED"
    8558 _Ⅾ_ "ROMAN NUMERAL FIVE HUNDRED"
    8559 _Ⅿ_ "ROMAN NUMERAL ONE THOUSAND"
    8560 _ⅰ_ "SMALL ROMAN NUMERAL ONE"
    8561 _ⅱ_ "SMALL ROMAN NUMERAL TWO"
    8562 _ⅲ_ "SMALL ROMAN NUMERAL THREE"
    8563 _ⅳ_ "SMALL ROMAN NUMERAL FOUR"
    8564 _ⅴ_ "SMALL ROMAN NUMERAL FIVE"
    8565 _ⅵ_ "SMALL ROMAN NUMERAL SIX"
    8566 _ⅶ_ "SMALL ROMAN NUMERAL SEVEN"
    8567 _ⅷ_ "SMALL ROMAN NUMERAL EIGHT"
    8568 _ⅸ_ "SMALL ROMAN NUMERAL NINE"
    8569 _ⅹ_ "SMALL ROMAN NUMERAL TEN"
    8570 _ⅺ_ "SMALL ROMAN NUMERAL ELEVEN"
    8571 _ⅻ_ "SMALL ROMAN NUMERAL TWELVE"
    8572 _ⅼ_ "SMALL ROMAN NUMERAL FIFTY"
    8573 _ⅽ_ "SMALL ROMAN NUMERAL ONE HUNDRED"
    8574 _ⅾ_ "SMALL ROMAN NUMERAL FIVE HUNDRED"
    8575 _ⅿ_ "SMALL ROMAN NUMERAL ONE THOUSAND"
    8576 _ↀ_ "ROMAN NUMERAL ONE THOUSAND C D"
    8577 _ↁ_ "ROMAN NUMERAL FIVE THOUSAND"
    8578 _ↂ_ "ROMAN NUMERAL TEN THOUSAND"
    8579 _Ↄ_ "ROMAN NUMERAL REVERSED ONE HUNDRED"
    8581 _ↅ_ "ROMAN NUMERAL SIX LATE FORM"
    8582 _ↆ_ "ROMAN NUMERAL FIFTY EARLY FORM"
    8583 _ↇ_ "ROMAN NUMERAL FIFTY THOUSAND"
    8584 _ↈ_ "ROMAN NUMERAL ONE HUNDRED THOUSAND"
   12321 _〡_ "HANGZHOU NUMERAL ONE"
   12322 _〢_ "HANGZHOU NUMERAL TWO"
   12323 _〣_ "HANGZHOU NUMERAL THREE"
   12324 _〤_ "HANGZHOU NUMERAL FOUR"
   12325 _〥_ "HANGZHOU NUMERAL FIVE"
   12326 _〦_ "HANGZHOU NUMERAL SIX"
   12327 _〧_ "HANGZHOU NUMERAL SEVEN"
   12328 _〨_ "HANGZHOU NUMERAL EIGHT"
   12329 _〩_ "HANGZHOU NUMERAL NINE"
   12344 _〸_ "HANGZHOU NUMERAL TEN"
   12345 _〹_ "HANGZHOU NUMERAL TWENTY"
   12346 _〺_ "HANGZHOU NUMERAL THIRTY"

Just my intuitive reaction, no academic creds to back it up ;)

-- 
Regards,
Bengt Richter




Information forwarded to bug-guix <at> gnu.org:
bug#48114; Package guix. (Mon, 03 May 2021 20:05:02 GMT)

Message #23 received at 48114 <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: Timothy Sample <samplet <at> ngyro.com>
Cc: 48114 <at> debbugs.gnu.org
Subject: Re: bug#48114: Disarchive occasionally fails tests
Date: Mon, 03 May 2021 22:03:59 +0200
Hi!

Timothy Sample <samplet <at> ngyro.com> skribis:

> Timothy Sample <samplet <at> ngyro.com> writes:
>
>> I’m still looking into this, but I wanted to quickly post this
>> reproducer for the Guile bug:
>>
>>     (use-modules (ice-9 regex))
>>     (define str
>> "\U101514\U103ab0\U0f6e6e\U02e278\U01d9eb\U10b996\U1089b5\uea15\U0fa074\U101e41\U02e330\u0177\u2492")
>>     (match:substring (string-match "[0-8]+" str))
>>
>> This triggers the out-of-range error when run with “LC_ALL=C”.
>
> It turns out that all that’s needed is the last code point, which is
> “Number Eleven Full Stop”, or ‘⒒’.

Whaaat? “Number Eleven Full Stop”, I wonder how the Unicode folks came
up with that one.  ㊷ = ㉚ + ⒓

> When Guile converts this to an ASCII C string using
> ‘u32_conv_from_encoding’, it becomes “11.”.  The regex (“[0-8]+”)
> matches the “11” part with start index 0 and end index 2.  The
> ‘fixup_multibyte_match’ function does nothing (it only matters when
> the locale encoding is multibyte) [1].  Guile then builds the match
> vector with the original string but keeps the ASCII offsets.  In other
> words, it thinks the match substring goes from 0 to 2 in a single code
> point string:
>
>     ,use (ice-9 regex)
>     (string-match "11" "\u2492")
>     => #("\u2492" (0 . 2))
>
> I’m not sure there’s any way to solve this nicely in Guile.  It would be
> clearer if the match vector included the string as libc matched it, but
> it’s still surprising that the match happens with a different string.

Yeah, I don’t think there’s much we can do.  It’s a lot of fun anyway.

Thanks for investigating!

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#48114; Package guix. (Thu, 13 May 2021 21:05:01 GMT)

Message #26 received at 48114 <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: Timothy Sample <samplet <at> ngyro.com>
Cc: 48114 <at> debbugs.gnu.org
Subject: Re: bug#48114: Disarchive occasionally fails tests
Date: Thu, 13 May 2021 23:04:26 +0200
Hi!

Timothy Sample <samplet <at> ngyro.com> skribis:

> In Disarchive, I can rewrite the generator without regex.  I’ll do that
> and see what I can do about the “Gave up!” issue.

Did you have a chance to look into it?

I’d like to make ‘guix’ and ‘guix-daemon’ depend on Disarchive, but not
before we can be sure its test suite passes.

Thanks,
Ludo’.




Reply sent to Timothy Sample <samplet <at> ngyro.com>:
You have taken responsibility. (Fri, 14 May 2021 03:07:01 GMT)

Notification sent to Ludovic Courtès <ludovic.courtes <at> inria.fr>:
bug acknowledged by developer. (Fri, 14 May 2021 03:07:02 GMT)

Message #31 received at 48114-done <at> debbugs.gnu.org:

From: Timothy Sample <samplet <at> ngyro.com>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 48114-done <at> debbugs.gnu.org
Subject: Re: bug#48114: Disarchive occasionally fails tests
Date: Thu, 13 May 2021 23:06:07 -0400
Heyo,

Ludovic Courtès <ludo <at> gnu.org> writes:

> Timothy Sample <samplet <at> ngyro.com> skribis:
>
>> In Disarchive, I can rewrite the generator without regex.  I’ll do that
>> and see what I can do about the “Gave up!” issue.
>
> Did you have a chance to look into it?

I just pushed b9f0e78238e6186d28d738c7c5355a56557ce84f, which updates
Disarchive to 0.2.1, which has fixes for the test suite.  The giving up
problem has not been solved outright, but it should be practically
impossible to trigger.  (In fact, it probably *is* impossible to trigger
given how few PRNG states there are....)

> I’d like to make ‘guix’ and ‘guix-daemon’ depend on Disarchive, but not
> before we can be sure its test suite passes.

Exciting!


-- Tim




Information forwarded to bug-guix <at> gnu.org:
bug#48114; Package guix. (Fri, 14 May 2021 13:52:01 GMT)

Message #34 received at 48114-done <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: Timothy Sample <samplet <at> ngyro.com>
Cc: 48114-done <at> debbugs.gnu.org
Subject: Re: bug#48114: Disarchive occasionally fails tests
Date: Fri, 14 May 2021 15:51:10 +0200
Hi Timothy,

Timothy Sample <samplet <at> ngyro.com> skribis:

> Ludovic Courtès <ludo <at> gnu.org> writes:
>
>> Timothy Sample <samplet <at> ngyro.com> skribis:
>>
>>> In Disarchive, I can rewrite the generator without regex.  I’ll do that
>>> and see what I can do about the “Gave up!” issue.
>>
>> Did you have a chance to look into it?
>
> I just pushed b9f0e78238e6186d28d738c7c5355a56557ce84f, which updates
> Disarchive to 0.2.1, which has fixes for the test suite.  The giving up
> problem has not been solved outright, but it should be practically
> impossible to trigger.  (In fact, it probably *is* impossible to trigger
> given how few PRNG states there are....)

Yay!  Thanks for the quick reply!

I’ll have ‘guix’ depend on Disarchive and report back.

Ludo’.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 12 Jun 2021 11:24:03 GMT)




GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.