#49870 - [PATCH] Improve syntax source location accuracy

Package: guile

Reported by: Vivien Kraus <vivien <at> planete-kraus.eu>

Date: Wed, 4 Aug 2021 09:52:02 UTC

Severity: normal

Tags: patch

To reply to this bug, email your comments to 49870 AT debbugs.gnu.org.

Report forwarded to bug-guile <at> gnu.org:
bug#49870; Package guile. (Wed, 04 Aug 2021 09:52:02 GMT)

Acknowledgement sent to Vivien Kraus <vivien <at> planete-kraus.eu>:
New bug report received and forwarded. Copy sent to bug-guile <at> gnu.org. (Wed, 04 Aug 2021 09:52:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Vivien Kraus <vivien <at> planete-kraus.eu>
To: bug-guile <at> gnu.org
Subject: [PATCH] Improve syntax source location accuracy
Date: Wed, 04 Aug 2021 11:50:58 +0200

Dear Guile developers,

The algorithm to compute the column number has a couple of flaws:

1. The text location may decrease (with #\backspace and #\return), which
   makes source locations ambiguous:

     (syntax-case (call-with-input-string "(a\r b)" read-syntax) ()
       ((a b)
        (values (syntax-source #'a) (syntax-source #'b))))
     => $1 = ((line . 0) (column . 1))
        $2 = ((line . 0) (column . 1))

   and:

     (syntax-case (call-with-input-string "('a\b\b\b b)" read-syntax) ()
       ((a b)
        (values (syntax-source #'a) (syntax-source #'b))))
     => $1 = ((line . 0) (column . 1))
        $2 = ((line . 0) (column . 1))

   This behavior is not desirable for programs that want to inspect the
   source at this location.

2. Many Unicode characters need to span multiple columns, especially
   non-Latin ones.

The best solution I found [1] was to use this algorithm:

1. Tabs stop every 8 columns (that’s what Guile already implements);
2. Printable ASCII characters span 1 column;
3. Other, unprintable ASCII characters span 0 columns (I guess? That
   would be compatible with #\alarm, which has its own special test case,
   so I figure it’s important for it to have 0 width);
4. The width of non-ASCII characters is computed with wcwidth.

The wcwidth function takes a Unicode character and, depending on the
LC_CTYPE locale, determines how many columns it spans. It may fail if
LC_CTYPE is not set, and in that case it returns -1.

For this to work, we need to include the wcwidth gnulib module. You can
do it yourself, but in order not to forget it, I include it in the first
2 patches. Don’t be afraid if these patches are huge.

The real work is to first expose the wcwidth function to Scheme, to
implement the algorithm for the suspendable ports (commit 3), and then
to use wcwidth both in the C ports implementation and in the Scheme
suspendable ports module (commit 4).

Best regards,

Vivien

[1]: https://www.gnu.org/prep/standards/html_node/Errors.html#Errors
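
For readers who want to see the four rules of the proposed algorithm in one place, here is a minimal C sketch. It is not the code from the patch series: the names advance_column and column_after are hypothetical, and treating a wcwidth result of -1 as width 0 is an assumption, since the message only says that wcwidth can fail.

```c
#define _XOPEN_SOURCE 700   /* asks the C library to declare wcwidth */
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Explicit prototype, in case feature-test macros were fixed before
   this point and <wchar.h> did not declare the function. */
extern int wcwidth (wchar_t c);

/* Hypothetical helper: the column reached after reading character C
   when the current (0-based) column is COL. */
static size_t
advance_column (size_t col, wchar_t c)
{
  if (c == L'\t')
    return col + 8 - col % 8;   /* rule 1: jump to the next tab stop */
  if (c >= 0x20 && c < 0x7f)
    return col + 1;             /* rule 2: printable ASCII, 1 column */
  if (c >= 0 && c < 0x80)
    return col;                 /* rule 3: other ASCII, 0 columns */

  /* Rule 4: ask wcwidth, which depends on the LC_CTYPE locale.
     Assumption: a failure (-1) counts as 0 columns. */
  int width = wcwidth (c);
  return col + (width > 0 ? (size_t) width : 0);
}

/* Hypothetical helper: the column reached after reading the whole
   string S, starting from column 0.  Newlines are not handled. */
static size_t
column_after (const wchar_t *s)
{
  assert (s != NULL);
  size_t col = 0;
  for (; *s != L'\0'; s++)
    col = advance_column (col, *s);
  return col;
}
```

Under these rules #\return and #\backspace leave the column unchanged instead of moving it backwards, so column_after (L"(a\r b)") yields 5, and the two symbols in the first example of the message would no longer share a column.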

This bug report was last modified 2 years and 274 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.
