GNU bug report logs - #51332
[PATCH 0/2] Detect early and gracefully handle invalid Texinfo


Package: guix-patches;

Reported by: Ludovic Courtès <ludo <at> gnu.org>

Date: Fri, 22 Oct 2021 12:42:01 UTC

Severity: normal

Tags: patch

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 51332 in the body.
You can then email your comments to 51332 AT debbugs.gnu.org in the normal way.




Report forwarded to guix-patches <at> gnu.org:
bug#51332; Package guix-patches. (Fri, 22 Oct 2021 12:42:01 GMT)

Acknowledgement sent to Ludovic Courtès <ludo <at> gnu.org>:
New bug report received and forwarded. Copy sent to guix-patches <at> gnu.org. (Fri, 22 Oct 2021 12:42:01 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: guix-patches <at> gnu.org
Cc: Ludovic Courtès <ludo <at> gnu.org>
Subject: [PATCH 0/2] Detect early and gracefully handle invalid Texinfo
Date: Fri, 22 Oct 2021 14:40:52 +0200
Hello!

It’s a fact that we occasionally push invalid Texinfo markup in
package descriptions/synopses (probably even more so in external
channels), even though ‘guix lint’ flags it.

The problem is that some of the tools were designed around the idea
that invalid Texinfo “does not happen”.  For example, if a single
package contains invalid markup, ‘guix search’ and ‘guix show’ crash
badly:

--8<---------------cut here---------------start------------->8---
$ guix search ghc citations
name: ghc-citeproc
version: 0.4.0.1
outputs: out
systems: x86_64-linux i686-linux
dependencies: ghc-aeson-pretty@0.8.8 ghc-aeson@1.5.6.0 ghc-attoparsec@0.13.2.5
+ ghc-base-compat@0.11.2 ghc-case-insensitive@1.2.1.0 ghc-data-default@0.7.1.1 ghc-diff@0.4.0
+ ghc-file-embed@0.0.15.0 ghc-pandoc-types@1.22 ghc-safe@0.3.19 ghc-scientific@0.3.7.0
+ ghc-timeit@2.0 ghc-unicode-collation@0.1.3 ghc-uniplate@1.6.13 ghc-vector@0.12.3.0
+ ghc-xml-conduit@1.9.1.1
location: gnu/packages/haskell-xyz.scm:15823:2
homepage: https://hackage.haskell.org/package/citeproc
license: FreeBSD
synopsis: Generate citations and bibliography from CSL styles  
Backtrace:
          13 (primitive-load "/home/ludo/.config/guix/current/bin/gu…")
In guix/ui.scm:
   2185:7 12 (run-guix . _)
  2148:10 11 (run-guix-command _ . _)
In ice-9/boot-9.scm:
  1752:10 10 (with-exception-handler _ _ #:unwind? _ # _)
In guix/scripts/package.scm:
    896:9  9 (_)
In ice-9/boot-9.scm:
  1747:15  8 (with-exception-handler #<procedure 7fb7f469a6c0 at ic…> …)
In guix/ui.scm:
  1677:23  7 (call-with-paginated-output-port _ #:less-options _)
  1712:11  6 (_ #<output: #{write pipe}# 15>)
  1558:14  5 (package->recutils _ #<output: #{write pipe}# 15> _ # _ …)
  1432:23  4 (texi->plain-text _)
In texinfo.scm:
  1132:22  3 (parse _)
   967:36  2 (loop #<input: string 7fb7f4a4bc40> (*fragment*) #<pro…> …)
     92:2  1 (command-spec _)
In ice-9/boot-9.scm:
  1685:16  0 (raise-exception _ #:continuable? _)

ice-9/boot-9.scm:1685:16: In procedure raise-exception:
Throw to key `parser-error' with args `(#f "Unknown command" urefhttps)'.
--8<---------------cut here---------------end--------------->8---

(This one was fixed in c3c502896b1454b345ee9f17d20063853652a35a.)
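The crash comes from Guile's (texinfo) parser: `texi-fragment->stexi` throws to the `parser-error` key when it meets an unknown command such as `@urefhttps`, which is what a missing `{` after `@uref` produces. A minimal sketch for checking a single blurb in isolation, assuming Guile 3.0 with its bundled (texinfo) module; `texinfo-valid?` is a hypothetical helper, not part of Guix:

```scheme
(use-modules (texinfo))

;; Hypothetical helper (not part of Guix): return #t if STR parses as a
;; Texinfo fragment, #f if the parser throws to 'parser-error.
(define (texinfo-valid? str)
  (catch 'parser-error
    (lambda ()
      (texi-fragment->stexi str)
      #t)
    (lambda (key . args)
      #f)))

;; The markup from the backtrace: '@uref' is missing its opening brace,
;; so the parser sees an unknown command named "urefhttps".
(texinfo-valid? "see @urefhttps://citationstyles.org/}.")   ;; ⇒ #f

;; The corrected markup.
(texinfo-valid? "see @uref{https://citationstyles.org/}.")  ;; ⇒ #t
```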

This series does two things:

  1. Emit a warning when invalid markup is encountered but keep going.

  2. Raise a syntax error, at macro-expansion time, when invalid markup
     is encountered.
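The idea behind #1 can be sketched outside of Guix as follows. This is an illustration under assumptions, not the patch itself: `texi->plain-text/lenient` is a hypothetical name, and it writes to the error port with `format` where the actual patch uses Guix's `warning` procedure together with the package location.

```scheme
(use-modules (texinfo)
             (texinfo plain-text))

;; Hypothetical stand-alone version of the fallback from patch 1/2:
;; render STR as plain text, or warn and return STR verbatim when the
;; Texinfo parser rejects it, instead of letting the exception escape.
(define (texi->plain-text/lenient str)
  (catch 'parser-error
    (lambda ()
      (stexi->plain-text (texi-fragment->stexi str)))
    (lambda (key . args)
      (format (current-error-port)
              "warning: invalid Texinfo markup: ~s~%" str)
      str)))

;; Prints the warning on stderr, then the raw, unrendered string.
(display (texi->plain-text/lenient "see @urefhttps://citationstyles.org/}."))
(newline)
```

Returning the raw string keeps ‘guix search’ output usable: the user sees the unrendered markup instead of a backtrace.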

Obviously #2 incurs some overhead, since it parses Texinfo strings at
expansion time, so it’s enabled only when ‘GUIX_UNINSTALLED’ is set, that
is, when working on a checkout with ./pre-inst-env.  The expanded code
is exactly the same as before, though, so there is no run-time overhead.
Concretely, that means that ‘make’ fails and you just don’t see the
package until the error has been fixed:

--8<---------------cut here---------------start------------->8---
$ make
[…]
[ 78%] LOAD     gnu/packages/haskell-xyz.scm
;;; note: source file ./gnu/packages/haskell-xyz.scm
;;;       newer than compiled /home/ludo/src/guix/gnu/packages/haskell-xyz.go
;;; note: source file ./gnu/packages/haskell-xyz.scm
;;;       newer than compiled /home/ludo/src/guix/gnu/packages/haskell-xyz.go
gnu/packages/haskell-xyz.scm:15855:5: error: "@code{ghc-citeproc} parses @acronym{Citation Style Language, CSL} style files\nand uses them to generate a list of formatted citations and bibliography\nentries.  For more information about CSL, see @urefhttps://citationstyles.org/}.": invalid Texinfo markup
make[2]: *** [Makefile:7131: make-packages-go] Error 1
--8<---------------cut here---------------end--------------->8---
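The expansion-time check (#2) boils down to a `syntax-case` macro that inspects literal strings while the file is compiled. A simplified sketch, assuming Guile 3.0: `checked-blurb` is a hypothetical name, it reads `GUIX_UNINSTALLED` at each expansion rather than once, and it leaves out the thread-safety workaround for Guile <= 3.0.7 that the real `validate-texinfo` macro needs when files are compiled in parallel.

```scheme
(use-modules (texinfo))

;; Hypothetical, simplified take on the expansion-time check from
;; patch 2/2.  Validation is opt-in via GUIX_UNINSTALLED, and the
;; expansion is the literal string itself, so run time is unaffected.
(define-syntax checked-blurb
  (lambda (s)
    (syntax-case s ()
      ((_ str)
       (string? (syntax->datum #'str))
       (if (getenv "GUIX_UNINSTALLED")
           (catch 'parser-error
             (lambda ()
               ;; Parse only to detect errors; the result is discarded.
               (texi-fragment->stexi (syntax->datum #'str))
               #'str)
             (lambda _
               ;; Reported at compile time, with the source location.
               (syntax-violation 'checked-blurb
                                 "invalid Texinfo markup" #'str)))
           #'str))
      ;; Non-literal arguments cannot be checked at expansion time.
      ((_ obj)
       #'obj))))

(display (checked-blurb "Generate citations from CSL @code{style} files"))
(newline)
```

Passing a non-literal expression falls through to the second clause unchanged, which mirrors the `((_ obj) #'obj)` clause of the patch.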

Feedback welcome!

Ludo’.

Ludovic Courtès (2):
  ui: Gracefully handle invalid Texinfo markup in package blurbs.
  packages: Optionally validate Texinfo markup at expansion time.

 guix/packages.scm | 52 ++++++++++++++++++++++++++++++++++++++++++++---
 guix/ui.scm       | 17 ++++++++++++++--
 2 files changed, 64 insertions(+), 5 deletions(-)


base-commit: e1261ddd38cf02a0f046f3a5360502d659b4e7d4
-- 
2.33.0





Information forwarded to guix-patches <at> gnu.org:
bug#51332; Package guix-patches. (Fri, 22 Oct 2021 12:47:01 GMT)

Message #8 received at 51332 <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: 51332 <at> debbugs.gnu.org
Cc: Ludovic Courtès <ludo <at> gnu.org>
Subject: [PATCH 1/2] ui: Gracefully handle invalid Texinfo markup in package
 blurbs.
Date: Fri, 22 Oct 2021 14:45:18 +0200
Previously 'guix search' & co. would crash when encountering invalid
Texinfo.

* guix/ui.scm (texi->plain-text*): New procedure.
(package-field-string, package->recutils): Use it.
---
 guix/ui.scm | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/guix/ui.scm b/guix/ui.scm
index 1428c254b3..eb7f0afcfd 100644
--- a/guix/ui.scm
+++ b/guix/ui.scm
@@ -1431,10 +1431,22 @@ (define (texi->plain-text str)
   (with-fluids ((%default-port-encoding "UTF-8"))
     (stexi->plain-text (texi-fragment->stexi str))))
 
+(define (texi->plain-text* package str)
+  "Same as 'texi->plain-text', but gracefully handle Texinfo errors."
+  (catch 'parser-error
+    (lambda ()
+      (texi->plain-text str))
+    (lambda args
+      (warning (package-location package)
+               (G_ "~a: invalid Texinfo markup~%")
+               (package-full-name package))
+      str)))
+
 (define (package-field-string package field-accessor)
   "Return a plain-text representation of PACKAGE field."
   (and=> (field-accessor package)
-         (compose texi->plain-text P_)))
+         (lambda (str)
+           (texi->plain-text* package (P_ str)))))
 
 (define (package-description-string package)
   "Return a plain-text representation of PACKAGE description field."
@@ -1555,7 +1567,8 @@ (define (package<? p1 p2)
             (parameterize ((%text-width width*))
               ;; Call 'texi->plain-text' on the concatenated string to account
               ;; for the width of "description:" in paragraph filling.
-              (texi->plain-text
+              (texi->plain-text*
+               p
                (string-append "description: "
                               (or (and=> (package-description p) P_)
                                   ""))))
-- 
2.33.0





Information forwarded to guix-patches <at> gnu.org:
bug#51332; Package guix-patches. (Fri, 22 Oct 2021 12:47:02 GMT)

Message #11 received at 51332 <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: 51332 <at> debbugs.gnu.org
Cc: Ludovic Courtès <ludo <at> gnu.org>
Subject: [PATCH 2/2] packages: Optionally validate Texinfo markup at expansion
 time.
Date: Fri, 22 Oct 2021 14:45:19 +0200
* guix/packages.scm (validate-texinfo): New macro.
(<package>)[synopsis, description]: Add 'sanitize' property.
---
 guix/packages.scm | 52 ++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 49 insertions(+), 3 deletions(-)

diff --git a/guix/packages.scm b/guix/packages.scm
index e5a9d08bce..394f6aa39e 100644
--- a/guix/packages.scm
+++ b/guix/packages.scm
@@ -49,6 +49,7 @@ (define-module (guix packages)
   #:use-module (srfi srfi-35)
   #:use-module (rnrs bytevectors)
   #:use-module (web uri)
+  #:autoload   (texinfo) (texi-fragment->stexi)
   #:re-export (%current-system
                %current-target-system
                search-path-specification)         ;for convenience
@@ -437,6 +438,49 @@ (define location
                                   (lambda (s) #,location)))
              body ...))))))
 
+(define-syntax validate-texinfo
+  (let ((validate? (getenv "GUIX_UNINSTALLED")))
+    (define ensure-thread-safe-texinfo-parser!
+      ;; Work around <https://issues.guix.gnu.org/51264> for Guile <= 3.0.7.
+      (let ((patched? (or (> (string->number (major-version)) 3)
+                          (> (string->number (minor-version)) 0)
+                          (> (string->number (micro-version)) 7)))
+            (next-token-of/thread-safe
+             (lambda (pred port)
+               (let loop ((chars '()))
+                 (match (read-char port)
+                   ((? eof-object?)
+                    (list->string (reverse! chars)))
+                   (chr
+                    (let ((chr* (pred chr)))
+                      (if chr*
+                          (loop (cons chr* chars))
+                          (begin
+                            (unread-char chr port)
+                            (list->string (reverse! chars)))))))))))
+        (lambda ()
+          (unless patched?
+            (set! (@@ (texinfo) next-token-of) next-token-of/thread-safe)
+            (set! patched? #t)))))
+
+    (lambda (s)
+      "Raise a syntax error when passed a literal string that is not valid
+Texinfo.  Otherwise, return the string."
+      (syntax-case s ()
+        ((_ str)
+         (string? (syntax->datum #'str))
+         (if validate?
+             (catch 'parser-error
+               (lambda ()
+                 (ensure-thread-safe-texinfo-parser!)
+                 (texi-fragment->stexi (syntax->datum #'str))
+                 #'str)
+               (lambda _
+                 (syntax-violation 'package "invalid Texinfo markup" #'str)))
+             #'str))
+        ((_ obj)
+         #'obj)))))
+
 ;; A package.
 (define-record-type* <package>
   package make-package
@@ -471,9 +515,11 @@ (define-record-type* <package>
   (replacement package-replacement                ; package | #f
                (default #f) (thunked) (innate))
 
-  (synopsis package-synopsis)                    ; one-line description
-  (description package-description)              ; one or two paragraphs
-  (license package-license)                      ; <license> instance or list
+  (synopsis package-synopsis
+            (sanitize validate-texinfo))          ; one-line description
+  (description package-description
+               (sanitize validate-texinfo))       ; one or two paragraphs
+  (license package-license)                       ; <license> instance or list
   (home-page package-home-page)
   (supported-systems package-supported-systems    ; list of strings
                      (default %supported-systems))
-- 
2.33.0





Reply sent to Ludovic Courtès <ludo <at> gnu.org>:
You have taken responsibility. (Thu, 28 Oct 2021 19:47:02 GMT)

Notification sent to Ludovic Courtès <ludo <at> gnu.org>:
bug acknowledged by developer. (Thu, 28 Oct 2021 19:47:02 GMT)

Message #16 received at 51332-done <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: 51332-done <at> debbugs.gnu.org
Subject: Re: bug#51332: [PATCH 0/2] Detect early and gracefully handle
 invalid Texinfo
Date: Thu, 28 Oct 2021 21:46:14 +0200
Ludovic Courtès <ludo <at> gnu.org> wrote:

>   ui: Gracefully handle invalid Texinfo markup in package blurbs.
>   packages: Optionally validate Texinfo markup at expansion time.

Pushed as e171182a20962c4119e12439b92bbbfd59b1495e!

Ludo'.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 26 Nov 2021 12:24:05 GMT)




GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.