GNU bug report logs - #77387: [PATCH 0/2] man-db: Better parsing of man macros.
To reply to this bug, email your comments to 77387 AT debbugs.gnu.org.
Report forwarded to guix <at> cbaines.net, dev <at> jpoiret.xyz, ludo <at> gnu.org, othacehe <at> gnu.org, zimon.toutoune <at> gmail.com, me <at> tobias.gr, guix-patches <at> gnu.org: bug#77387; Package guix-patches. (Sun, 30 Mar 2025 14:27:02 GMT)
Acknowledgement sent to Sergey Trofimov <sarg <at> sarg.org.ru>: New bug report received and forwarded. Copy sent to guix <at> cbaines.net, dev <at> jpoiret.xyz, ludo <at> gnu.org, othacehe <at> gnu.org, zimon.toutoune <at> gmail.com, me <at> tobias.gr, guix-patches <at> gnu.org. (Sun, 30 Mar 2025 14:27:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Hey Guix, I've noticed that quite a lot of man pages are reported by `man -k` as belonging to the wrong section:
--8<---------------cut here---------------start------------->8---
$ man -k "" | grep "(0)"
...
ssh-pkcs11-helper (0) - (unknown subject)
ssh-sk-helper (0) - (unknown subject)
ssh_config (0) - (unknown subject)
sshd (0) - (unknown subject)
sshd_config (0) - (unknown subject)
sudo (0) - (unknown subject)
sudo.conf (0) - (unknown subject)
tc-cgroup (0) - control group based traffic control filter
tc-connmark (0) - (unknown subject)
...
--8<---------------cut here---------------end--------------->8---
A side effect is that `M-x man` doesn't list such pages in auto-completion. I've attempted to fix that; see the following patches.
With the patches applied, `man -k` and `M-x man` work properly:
--8<---------------cut here---------------start------------->8---
$ man -k sudo
cvtsudoers (1) - (unknown subject)
sudo (8) - (unknown subject)
sudo.conf (5) - (unknown subject)
sudo_logsrv.proto (5) - (unknown subject)
sudo_logsrvd (8) - (unknown subject)
...
--8<---------------cut here---------------end--------------->8---
Note that synopsis extraction also needs improvement; however, it turns out to
be more complicated, as proper formatting requires cleaning up and expanding macros.
Sergey Trofimov (2):
man-db: Parse man macro arguments better.
man-db: Support mdoc-formatted man pages.
guix/man-db.scm | 52 ++++++++++++++++++++++++++++++++++++++++---------
1 file changed, 43 insertions(+), 9 deletions(-)
base-commit: 2ed28b5c24c599b2f9bc60dfc93151cf489ca477
--
2.49.0
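To illustrate one way whitespace-only tokenization can misreport a section, here is a small Guile sketch. The `.TH` line is made up for illustration, and `naive-section` is a hypothetical helper rather than code from guix/man-db.scm; it only mimics reading the section from the third whitespace-separated token and falling back to 0. Whether this is the exact cause for every page listed above is an assumption.

```scheme
;; Hypothetical `.TH' line whose quoted title contains a space.
(define line ".TH \"MY PAGE\" \"8\" \"March 2025\"")

;; By default `string-tokenize' splits on whitespace and knows nothing
;; about roff quoting, so the quoted title is cut in two.
(define tokens (string-tokenize line))

;; Hypothetical helper: take the third token as the section and fall
;; back to 0 when it is not a number.
(define (naive-section tokens)
  (or (and (> (length tokens) 2)
           (string->number (list-ref tokens 2)))
      0))

(write tokens)                  ; the title is split into "\"MY" and "PAGE\""
(newline)
(write (naive-section tokens))  ; the third token is not a number, so 0
(newline)
```

A quote-aware tokenizer, as proposed in this series, would keep "MY PAGE" as a single argument so that "8" lands in the section position.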
Information forwarded to guix <at> cbaines.net, dev <at> jpoiret.xyz, ludo <at> gnu.org, othacehe <at> gnu.org, zimon.toutoune <at> gmail.com, me <at> tobias.gr, guix-patches <at> gnu.org: bug#77387; Package guix-patches. (Sun, 30 Mar 2025 15:51:01 GMT)
Message #8 received at 77387 <at> debbugs.gnu.org (full text, mbox):
* guix/man-db.scm (man-macro-tokenize): New procedure to parse man
macros.
(man-page->entry): Parse macro line using man-macro-tokenize.
Change-Id: Iea0ffbc65290757df746138e0a6174646b5a3eb8
---
guix/man-db.scm | 52 ++++++++++++++++++++++++++++++++++++++++---------
1 file changed, 43 insertions(+), 9 deletions(-)
diff --git a/guix/man-db.scm b/guix/man-db.scm
index bba90ed473..44c01ac298 100644
--- a/guix/man-db.scm
+++ b/guix/man-db.scm
@@ -161,16 +161,50 @@ (define (read-synopsis port)
(line
(loop (cons line lines))))))
+(define (man-macro-tokenize input)
+  (let loop ((pos 0)
+             (tokens '())
+             (current '())
+             (in-string? #f))
+    (if (>= pos (string-length input))
+        ;; End of input
+        (unless in-string?
+          (reverse (if (null? current)
+                       tokens
+                       (cons (list->string (reverse current)) tokens))))
+        (let ((c (string-ref input pos)))
+          (cond
+           ;; Inside a string
+           (in-string?
+            (if (char=? c #\")
+                (if (and (< (+ pos 1) (string-length input))
+                         (char=? (string-ref input (+ pos 1)) #\"))
+                    ;; Double quote inside string
+                    (loop (+ pos 2) tokens (cons #\" current) #t)
+                    ;; End of string
+                    (loop (+ pos 1) (cons (list->string (reverse current)) tokens) '() #f))
+                ;; Regular character in string
+                (loop (+ pos 1) tokens (cons c current) #t)))
+
+           ;; Whitespace outside string
+           ((char-whitespace? c)
+            (if (null? current)
+                (loop (+ pos 1) tokens '() #f)
+                (loop (+ pos 1) (cons (list->string (reverse current)) tokens) '() #f)))
+
+           ;; Start of string
+           ((char=? c #\")
+            (if (null? current)
+                (loop (+ pos 1) tokens '() #t)
+                (loop pos (cons (list->string (reverse current)) tokens) '() #f)))
+
+           ;; Symbol character
+           (else
+            (loop (+ pos 1) tokens (cons c current) #f)))))))
+
(define* (man-page->entry file #:optional (resolve identity))
"Parse FILE, a gzip or zstd compressed man page, and return a <mandb-entry>
for it."
- (define (string->number* str)
- (if (and (string-prefix? "\"" str)
- (> (string-length str) 1)
- (string-suffix? "\"" str))
- (string->number (string-drop (string-drop-right str 1) 1))
- (string->number str)))
-
(define call-with-input-port*
(cond
((gzip-compressed? file) call-with-gzip-input-port)
@@ -189,8 +223,8 @@ (define* (man-page->entry file #:optional (resolve identity))
(if (eof-object? line)
(mandb-entry file name (or section 0) (or synopsis "")
kind)
- (match (string-tokenize line)
- ((".TH" name (= string->number* section) _ ...)
+ (match (if (string-prefix? "." line) (man-macro-tokenize line) #f)
+ ((".TH" name (= string->number section) _ ...)
(loop name section synopsis kind))
((".SH" (or "NAME" "\"NAME\""))
(loop name section (read-synopsis port) kind))
--
2.49.0
Information forwarded to guix <at> cbaines.net, dev <at> jpoiret.xyz, ludo <at> gnu.org, othacehe <at> gnu.org, zimon.toutoune <at> gmail.com, me <at> tobias.gr, guix-patches <at> gnu.org: bug#77387; Package guix-patches. (Sun, 30 Mar 2025 15:51:02 GMT)
Message #11 received at 77387 <at> debbugs.gnu.org (full text, mbox):
* guix/man-db.scm (man-page->entry): Extract man name and section from
.Dt macro.
Change-Id: I02dc99d73dceecdb077315805025efad9a650e91
---
guix/man-db.scm | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/guix/man-db.scm b/guix/man-db.scm
index 44c01ac298..44668a3ebf 100644
--- a/guix/man-db.scm
+++ b/guix/man-db.scm
@@ -224,7 +224,7 @@ (define* (man-page->entry file #:optional (resolve identity))
(mandb-entry file name (or section 0) (or synopsis "")
kind)
(match (if (string-prefix? "." line) (man-macro-tokenize line) #f)
- ((".TH" name (= string->number section) _ ...)
+ (((or ".TH" ".Dt") name (= string->number section) _ ...)
(loop name section synopsis kind))
((".SH" (or "NAME" "\"NAME\""))
(loop name section (read-synopsis port) kind))
--
2.49.0
Information forwarded to guix-patches <at> gnu.org: bug#77387; Package guix-patches. (Tue, 01 Apr 2025 12:09:04 GMT)
Message #14 received at 77387 <at> debbugs.gnu.org (full text, mbox):
Hi!
Glad you fixed this problem. :-)
Sergey Trofimov <sarg <at> sarg.org.ru> skribis:
> * guix/man-db.scm (man-macro-tokenize): New procedure to parse man
> macros.
> (man-page->entry): Parse macro line using man-macro-tokenize.
>
> Change-Id: Iea0ffbc65290757df746138e0a6174646b5a3eb8
[...]
> +(define (man-macro-tokenize input)
Could you add a docstring explaining what it takes and what it returns?
> + (let loop ((pos 0)
> + (tokens '())
> + (current '())
Maybe s/current/characters/ ?
> + (in-string? #f))
> + (if (>= pos (string-length input))
> + ;; End of input
> + (unless in-string?
> + (reverse (if (null? current)
> + tokens
> + (cons (list->string (reverse current)) tokens))))
So this procedure can return *unspecified*, right? Sounds fishy.
> @@ -189,8 +223,8 @@ (define* (man-page->entry file #:optional (resolve identity))
> (if (eof-object? line)
> (mandb-entry file name (or section 0) (or synopsis "")
> kind)
> - (match (string-tokenize line)
> - ((".TH" name (= string->number* section) _ ...)
> + (match (if (string-prefix? "." line) (man-macro-tokenize line) #f)
> + ((".TH" name (= string->number section) _ ...)
Please add a comment above ‘match’ explaining what’s happening (why we
call ‘man-macro-tokenize’ etc.).
Also: (and (string-prefix? "." line) (man-macro-tokenize line))
Ludo’.
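The two review points above can be checked at a Guile REPL. The sketch below uses a made-up input line and the stock `string-tokenize` as a stand-in for the patch's tokenizer; it shows that the `and` form yields the same value as the `if ... #f` form, and that `unless` with a true condition yields the unspecified value.

```scheme
;; Made-up line that is not a macro invocation (no leading dot).
(define line "Plain text, not a macro invocation")

;; The form used in the first version of the patch ...
(define with-if
  (if (string-prefix? "." line) (string-tokenize line) #f))

;; ... and the shorter equivalent suggested in review.
(define with-and
  (and (string-prefix? "." line) (string-tokenize line)))

;; `unless' with a true condition evaluates no body at all, so its
;; value is Guile's unspecified value rather than a list.
(define no-result (unless #t '("never" "returned")))

(write (list with-if with-and (unspecified? no-result)))
(newline)
```

Both #f and the unspecified value fail to match a list pattern in `match`, which is why the caller keeps working; the open question in the review is whether a tokenizer should return anything other than a list.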
Information forwarded to guix-patches <at> gnu.org: bug#77387; Package guix-patches. (Tue, 01 Apr 2025 12:09:05 GMT)
Message #17 received at 77387 <at> debbugs.gnu.org (full text, mbox):
Sergey Trofimov <sarg <at> sarg.org.ru> skribis:
> * guix/man-db.scm (man-page->entry): Extract man name and section from
> .Dt macro.
>
> Change-Id: I02dc99d73dceecdb077315805025efad9a650e91
[...]
> (match (if (string-prefix? "." line) (man-macro-tokenize line) #f)
> - ((".TH" name (= string->number section) _ ...)
> + (((or ".TH" ".Dt") name (= string->number section) _ ...)
Likewise, please add a short comment above the clause explaining that
‘.Dt’ is produced by ‘mandoc’ (did I get that right?).
Ludo’.
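For context, `.Dt` is the "document title" macro of the mdoc macro package, the counterpart of `.TH` in the man macro package. The sketch below isolates the pattern from the patch in a hypothetical helper, `title-tokens->name+section`, which takes an already tokenized macro line; the helper name and the sample token lists are assumptions for illustration.

```scheme
(use-modules (ice-9 match))

;; Hypothetical helper: TOKENS is a list of strings such as the one
;; `man-macro-tokenize' is meant to return.
(define (title-tokens->name+section tokens)
  (match tokens
    ;; `or' accepts either macro name; `=' applies `string->number' to
    ;; the third element and binds the result (possibly #f) to SECTION.
    (((or ".TH" ".Dt") name (= string->number section) _ ...)
     (cons name (or section 0)))
    (_ #f)))

(write (title-tokens->name+section '(".Dt" "SSHD" "8")))
(newline)
(write (title-tokens->name+section '(".TH" "SUDO" "8" "March 2025")))
(newline)
(write (title-tokens->name+section '(".SH" "NAME")))
(newline)
```

Note that a section with a non-numeric suffix, such as "3p", is not a number for `string->number`, so this sketch falls back to 0 in that case.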
Information forwarded to sarg <at> sarg.org.ru, ludo <at> gnu.org, guix <at> cbaines.net, dev <at> jpoiret.xyz, othacehe <at> gnu.org, zimon.toutoune <at> gmail.com, me <at> tobias.gr, guix-patches <at> gnu.org: bug#77387; Package guix-patches. (Tue, 01 Apr 2025 19:33:02 GMT)
Message #20 received at 77387 <at> debbugs.gnu.org (full text, mbox):
* guix/man-db.scm (man-macro-tokenize): New procedure to parse man
macros.
(man-page->entry): Parse macro line using man-macro-tokenize.
Change-Id: Iea0ffbc65290757df746138e0a6174646b5a3eb8
---
guix/man-db.scm | 56 +++++++++++++++++++++++++++++++++++++++++--------
1 file changed, 47 insertions(+), 9 deletions(-)
diff --git a/guix/man-db.scm b/guix/man-db.scm
index bba90ed473..94231264f0 100644
--- a/guix/man-db.scm
+++ b/guix/man-db.scm
@@ -161,16 +161,52 @@ (define (read-synopsis port)
(line
(loop (cons line lines))))))
+(define (man-macro-tokenize input)
+  "Split INPUT string, a man macro invocation, into a list containing the macro's
+name followed by its arguments."
+  (let loop ((pos 0)
+             (tokens '())
+             (characters '())
+             (in-string? #f))
+    (if (>= pos (string-length input))
+        ;; End of input
+        (unless in-string?
+          (reverse (if (null? characters)
+                       tokens
+                       (cons (list->string (reverse characters)) tokens))))
+        (let ((c (string-ref input pos)))
+          (cond
+           ;; Inside a string
+           (in-string?
+            (if (char=? c #\")
+                (if (and (< (+ pos 1) (string-length input))
+                         (char=? (string-ref input (+ pos 1)) #\"))
+                    ;; Double quote inside string
+                    (loop (+ pos 2) tokens (cons #\" characters) #t)
+                    ;; End of string
+                    (loop (+ pos 1) (cons (list->string (reverse characters)) tokens) '() #f))
+                ;; Regular character in string
+                (loop (+ pos 1) tokens (cons c characters) #t)))
+
+           ;; Whitespace outside string
+           ((char-whitespace? c)
+            (if (null? characters)
+                (loop (+ pos 1) tokens '() #f)
+                (loop (+ pos 1) (cons (list->string (reverse characters)) tokens) '() #f)))
+
+           ;; Start of string
+           ((char=? c #\")
+            (if (null? characters)
+                (loop (+ pos 1) tokens '() #t)
+                (loop pos (cons (list->string (reverse characters)) tokens) '() #f)))
+
+           ;; Symbol character
+           (else
+            (loop (+ pos 1) tokens (cons c characters) #f)))))))
+
(define* (man-page->entry file #:optional (resolve identity))
"Parse FILE, a gzip or zstd compressed man page, and return a <mandb-entry>
for it."
- (define (string->number* str)
- (if (and (string-prefix? "\"" str)
- (> (string-length str) 1)
- (string-suffix? "\"" str))
- (string->number (string-drop (string-drop-right str 1) 1))
- (string->number str)))
-
(define call-with-input-port*
(cond
((gzip-compressed? file) call-with-gzip-input-port)
@@ -189,8 +225,10 @@ (define* (man-page->entry file #:optional (resolve identity))
(if (eof-object? line)
(mandb-entry file name (or section 0) (or synopsis "")
kind)
- (match (string-tokenize line)
- ((".TH" name (= string->number* section) _ ...)
+ ;; man 7 groff groff_mdoc groff_man
+ ;; look for metadata in macro invocations (lines starting with .)
+ (match (and (string-prefix? "." line) (man-macro-tokenize line))
+ ((".TH" name (= string->number section) _ ...)
(loop name section synopsis kind))
((".SH" (or "NAME" "\"NAME\""))
(loop name section (read-synopsis port) kind))
base-commit: 5735c278e16517d9be5e26235fe68dea9bae3527
prerequisite-patch-id: f9cc903b8048c8c6fde576fbf38ab110263020e3
prerequisite-patch-id: 220ddf11addf3a6c7ab3b349077bca6849241556
prerequisite-patch-id: fc7d254c8dc198bc2f083e1c8aea18960c73b165
prerequisite-patch-id: b6d30068ce4971d4d8e67517229916df4e76c529
--
2.49.0
Information forwarded to sarg <at> sarg.org.ru, ludo <at> gnu.org, guix <at> cbaines.net, dev <at> jpoiret.xyz, othacehe <at> gnu.org, zimon.toutoune <at> gmail.com, me <at> tobias.gr, guix-patches <at> gnu.org: bug#77387; Package guix-patches. (Tue, 01 Apr 2025 19:33:02 GMT)
Message #23 received at 77387 <at> debbugs.gnu.org (full text, mbox):
* guix/man-db.scm (man-page->entry): Extract man name and section from
.Dt macro.
Change-Id: I02dc99d73dceecdb077315805025efad9a650e91
---
guix/man-db.scm | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/guix/man-db.scm b/guix/man-db.scm
index 94231264f0..7601580c40 100644
--- a/guix/man-db.scm
+++ b/guix/man-db.scm
@@ -228,10 +228,13 @@ (define* (man-page->entry file #:optional (resolve identity))
;; man 7 groff groff_mdoc groff_man
;; look for metadata in macro invocations (lines starting with .)
(match (and (string-prefix? "." line) (man-macro-tokenize line))
- ((".TH" name (= string->number section) _ ...)
+ ;; "Title Header" or "Document title"
+ (((or ".TH" ".Dt") name (= string->number section) _ ...)
(loop name section synopsis kind))
+ ;; "Section Header"
((".SH" (or "NAME" "\"NAME\""))
(loop name section (read-synopsis port) kind))
+ ;; include source
((".so" link)
(match (and=> (resolve link)
(cut man-page->entry <> resolve))
--
2.49.0
Information forwarded to guix-patches <at> gnu.org: bug#77387; Package guix-patches. (Tue, 01 Apr 2025 19:43:02 GMT)
Message #26 received at 77387 <at> debbugs.gnu.org (full text, mbox):
Hi Ludovic,
I've sent an amended series.
Ludovic Courtès <ludo <at> gnu.org> writes:
>> + (in-string? #f))
>> + (if (>= pos (string-length input))
>> + ;; End of input
>> + (unless in-string?
>> + (reverse (if (null? current)
>> + tokens
>> + (cons (list->string (reverse current)) tokens))))
>
> So this procedure can return *unspecified*, right? Sounds fishy.
>
Why is it fishy? Is it unconventional? Such a return value is handled
correctly by the calling code (`match`).
Information forwarded to guix-patches <at> gnu.org: bug#77387; Package guix-patches. (Tue, 08 Apr 2025 15:31:01 GMT)
Message #29 received at 77387 <at> debbugs.gnu.org (full text, mbox):
Hi,
Sergey Trofimov <sarg <at> sarg.org.ru> skribis:
> Ludovic Courtès <ludo <at> gnu.org> writes:
>
>>> + (in-string? #f))
>>> + (if (>= pos (string-length input))
>>> + ;; End of input
>>> + (unless in-string?
>>> + (reverse (if (null? current)
>>> + tokens
>>> + (cons (list->string (reverse current)) tokens))))
>>
>> So this procedure can return *unspecified*, right? Sounds fishy.
>>
> Why is it fishy? Is it unconventional? Such a return value is handled
> correctly by the calling code (`match`).
It’s unconventional; usually, procedures are monomorphic and in this
case, the expectation is that it always returns a list of tokens.
I would either return the empty list in the ‘in-string?’ case or throw
an exception (because that means we failed to parse the thing).
Does that make sense?
Thanks,
Ludo’.
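As a sketch of the exception-throwing option suggested above, the variant below always returns a list of strings and raises an error when a quoted argument is left open. It is not the code from the series: the name `man-macro-tokenize*` and the `flush` helper are hypothetical, and the sample input lines are made up.

```scheme
(define (man-macro-tokenize* input)
  "Split INPUT, a macro invocation, into a list of strings.  Raise an error
when a quoted argument is not terminated."
  (define len (string-length input))

  (define (flush characters tokens)
    ;; Turn the pending CHARACTERS, if any, into a new token.
    (if (null? characters)
        tokens
        (cons (list->string (reverse characters)) tokens)))

  (let loop ((pos 0) (tokens '()) (characters '()) (in-string? #f))
    (if (>= pos len)
        (if in-string?
            (error "unterminated quoted argument:" input)
            (reverse (flush characters tokens)))
        (let ((c (string-ref input pos)))
          (cond
           ;; Inside a quoted argument.
           (in-string?
            (cond
             ((not (char=? c #\"))
              (loop (+ pos 1) tokens (cons c characters) #t))
             ((and (< (+ pos 1) len)
                   (char=? (string-ref input (+ pos 1)) #\"))
              ;; Two consecutive quotes stand for one literal quote.
              (loop (+ pos 2) tokens (cons #\" characters) #t))
             (else
              ;; Closing quote: emit the argument, even if it is empty.
              (loop (+ pos 1)
                    (cons (list->string (reverse characters)) tokens)
                    '() #f))))
           ;; Whitespace ends the current unquoted token.
           ((char-whitespace? c)
            (loop (+ pos 1) (flush characters tokens) '() #f))
           ;; An opening quote ends any pending token and starts a string.
           ((char=? c #\")
            (loop (+ pos 1) (flush characters tokens) '() #t))
           ;; Any other character is part of the current token.
           (else
            (loop (+ pos 1) tokens (cons c characters) #f)))))))

(write (man-macro-tokenize* ".TH \"MY PAGE\" \"8\" \"March 2025\""))
(newline)

;; A caller that prefers the empty list over an exception can wrap the call.
(write (or (false-if-exception (man-macro-tokenize* ".TH \"oops")) '()))
(newline)
```

With this shape the return type is always a list, and the caller decides whether a malformed line is fatal or simply skipped, which covers both alternatives mentioned in the review.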