GNU bug report logs - #49870
[PATCH] Improve syntax source location accuracy


Package: guile

Reported by: Vivien Kraus <vivien <at> planete-kraus.eu>

Date: Wed, 4 Aug 2021 09:52:02 UTC

Severity: normal

Tags: patch

To reply to this bug, email your comments to 49870 AT debbugs.gnu.org.



Report forwarded to bug-guile <at> gnu.org:
bug#49870; Package guile. (Wed, 04 Aug 2021 09:52:02 GMT)

Acknowledgement sent to Vivien Kraus <vivien <at> planete-kraus.eu>:
New bug report received and forwarded. Copy sent to bug-guile <at> gnu.org. (Wed, 04 Aug 2021 09:52:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Vivien Kraus <vivien <at> planete-kraus.eu>
To: bug-guile <at> gnu.org
Subject: [PATCH] Improve syntax source location accuracy
Date: Wed, 04 Aug 2021 11:50:58 +0200

Dear guile developers,

The algorithm to compute the column number has a couple of flaws:

1. The column may decrease (with #\backspace and #\return), which
makes source locations ambiguous:

(syntax-case
    (call-with-input-string "(a\r b)" read-syntax) ()
  ((a b)
   (values (syntax-source #'a) (syntax-source #'b))))

=>

$1 = ((line . 0) (column . 1))
$2 = ((line . 0) (column . 1))

and:

(syntax-case
    (call-with-input-string "('a\b\b\b b)" read-syntax) ()
  ((a b)
   (values (syntax-source #'a) (syntax-source #'b))))

=>

$1 = ((line . 0) (column . 1))
$2 = ((line . 0) (column . 1))

In both cases a and b are reported at the same column. This behavior is
not desirable for programs that want to inspect the source at this
location.
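
The two results above are consistent with a column counter that steps
back on #\backspace and returns to 0 on #\return. Here is a minimal C
sketch of such a counter. It is a model inferred from the examples, not
a copy of Guile's port code; the function names are made up for
illustration, and the handling of newline is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: column after reading byte c at column col.
   This models the behavior implied by the examples, nothing more. */
static int legacy_next_column(int col, unsigned char c)
{
    assert(col >= 0);
    switch (c) {
    case '\a': return col;                    /* alarm: no movement */
    case '\b': return col > 0 ? col - 1 : 0;  /* backspace: one column back */
    case '\n':                                /* assumption: line count not modeled */
    case '\r': return 0;                      /* return: back to column 0 */
    case '\t': return col + 8 - col % 8;      /* next tab stop */
    default:   return col + 1;                /* everything else: one column */
    }
}

/* Column at which the byte at position index of text starts. */
static int legacy_column_at(const char *text, size_t index)
{
    int col = 0;
    for (size_t i = 0; i < index && text[i] != '\0'; i++)
        col = legacy_next_column(col, (unsigned char) text[i]);
    return col;
}
```

With this model, legacy_column_at("(a\r b)", 1) and
legacy_column_at("(a\r b)", 4) are both 1, which is the collision shown
in the first example.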

2. Many Unicode characters, especially non-Latin ones, need to span
multiple columns.

The best solution I found [1] was to use this algorithm:
1. Tabs stop every 8 columns (that’s what Guile already implements);
2. Printable ASCII characters span 1 column;
3. Other, unprintable ASCII characters span 0 columns (I guess? That
would be compatible with #\alarm, which has its own test case, so I
figure it’s important for it to have 0 width);
4. The width of non-ASCII characters is computed with wcwidth.
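
The four rules above can be sketched in C as follows. This is only an
illustration, not the code from the attached patches: the function names
are hypothetical, and two choices are my own assumptions, namely that a
newline still resets the column to 0 and that a wcwidth result of -1 is
counted as 0 columns.

```c
#define _XOPEN_SOURCE 700   /* asks <wchar.h> to declare wcwidth() */
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Fallback prototype in case the macro above is defined too late. */
int wcwidth(wchar_t c);

/* Hypothetical helper: column after reading character c at column col. */
static int proposed_next_column(int col, wchar_t c)
{
    assert(col >= 0);
    if (c == L'\n')
        return 0;                      /* assumption: a new line starts at 0 */
    if (c == L'\t')
        return col + 8 - col % 8;      /* rule 1: tab stops every 8 columns */
    if (c >= 0x20 && c <= 0x7e)
        return col + 1;                /* rule 2: printable ASCII */
    if ((unsigned long) c < 0x80)
        return col;                    /* rule 3: other ASCII, 0 columns */
    int w = wcwidth(c);                /* rule 4: locale-dependent width */
    return w > 0 ? col + w : col;      /* assumption: -1 counts as 0 */
}

/* Column at which the character at position index of text starts. */
static int proposed_column_at(const wchar_t *text, size_t index)
{
    int col = 0;
    for (size_t i = 0; i < index && text[i] != L'\0'; i++)
        col = proposed_next_column(col, text[i]);
    return col;
}
```

Within a line the column can no longer decrease: for L"(a\r b)" the
sketch puts a at column 1 and b at column 3, so the two locations are
distinct.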

The wcwidth function takes a Unicode character and, depending on the
LC_CTYPE locale, determines how many columns it spans. It may fail if
LC_CTYPE is not set, and in that case it returns -1.
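
Because the result depends on LC_CTYPE, a caller has to select a locale
before asking for widths and has to decide what a -1 means. A small C
sketch of both steps follows; the helper names and the idea of passing a
fallback width are assumptions made for illustration, not part of the
patches.

```c
#define _XOPEN_SOURCE 700   /* asks <wchar.h> to declare wcwidth() */
#include <locale.h>
#include <stddef.h>
#include <wchar.h>

/* Fallback prototype in case the macro above is defined too late. */
int wcwidth(wchar_t c);

/* Hypothetical helper: take LC_CTYPE from the environment.
   Returns 1 on success, 0 if the requested locale is unavailable. */
static int use_environment_ctype(void)
{
    return setlocale(LC_CTYPE, "") != NULL;
}

/* Hypothetical helper: width of c in columns, or fallback when
   wcwidth() reports -1 for this character in the current locale. */
static int width_or(wchar_t c, int fallback)
{
    int w = wcwidth(c);
    return w < 0 ? fallback : w;
}
```

A program would call use_environment_ctype() once at startup and then
width_or(c, 0) for each character, so that an unknown character never
moves the column backwards.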

For this to work, we need to import the wcwidth Gnulib module. You can
do it yourself, but so that it is not forgotten, I include it as the
first 2 patches. Don’t be alarmed by the size of these patches.

The real work is first to expose the wcwidth function to Scheme, so that
the algorithm can be implemented for the suspendable ports (commit 3),
and then to use wcwidth both in the C ports implementation and in the
Scheme suspendable ports module (commit 4).

Best regards,

Vivien

[1]: https://www.gnu.org/prep/standards/html_node/Errors.html#Errors

[0001-Update-Gnulib.patch (text/x-patch, attachment)]
[0002-gnulib-import-wcwidth.patch (text/x-patch, attachment)]
[0003-Export-the-wcwidth-function.patch (text/x-patch, attachment)]
[0004-Use-wcwidth-to-compute-the-textual-port-column.patch (text/x-patch, attachment)]

This bug report was last modified 2 years and 274 days ago.
