Package: emacs;
Reported by: "Jacob S. Gordon" <jacob.as.gordon <at> gmail.com>
Date: Wed, 21 May 2025 07:04:03 UTC
Severity: normal
Tags: patch
To reply to this bug, email your comments to 78528 AT debbugs.gnu.org.
Message #5 received at submit <at> debbugs.gnu.org:
From: "Jacob S. Gordon" <jacob.as.gordon <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: [PATCH v1] calc: Allow strings with higher character codes
Date: Tue, 20 May 2025 21:34:02 -0400
[Message part 1 (text/plain, inline)]
Tags: patch

Hello all,

Please find below a feature proposal for strings in `calc', and a first
draft of a patch attached to this message.


Motivation
==========

Suppose you're working with Unicode code points in `calc', and you end
up with the following vector on the stack. You'd like to know what a
string composed of these character codes would look like, so you toggle
`calc-display-strings' (`d "') and … nothing happens.

,----
| 1: [383, 117, 99, 99, 101, 383, 115]
`----

Later, in an `org-mode' file, you have the following table with a list
of dates in the first column. Since [formulas] can be any algebraic
expression understood by `calc', and `calc' [understands dates], you try
to insert a Unicode character for rows where the first column is in the
past. When you evaluate the formula (`C-c C-c' on the `#+TBLFM:' line),
`calc' stops short of displaying the string.

,----
| | Date             | Past?           |
| |------------------+-----------------|
| | [2025-05-01 Thu] | string([10003]) |
| | [2026-05-01 Fri] |                 |
| #+TBLFM: $2 = if($1 < now(), string("✓"), string(""))
`----

Both of these problems arise because some or all of the character codes
lie outside the `Latin-1' (8-bit) range. If we replace this hard-coded
limit with a custom variable and increase its value, both use cases can
be supported.

,----
| 1: "ſucceſs"
`----

,----
| | Date             | Past? |
| |------------------+-------|
| | [2025-05-01 Thu] | ✓     |
| | [2026-05-01 Fri] |       |
| #+TBLFM: $2 = if($1 < now(), string("✓"), string(""))
`----

The alternative is for the user to leave `calc' (or its syntax) and dip
into `Lisp':

,----
| (concat '(383 117 99 99 101 383 115))
`----

,----
| | Date             | Past? |
| |------------------+-------|
| | [2025-05-01 Thu] | ✓     |
| | [2026-05-01 Fri] |       |
| #+TBLFM: $2 = '(if (time-less-p (org-read-date t t $1) (current-time)) "✓" "")
`----


[formulas]
<https://orgmode.org/manual/Formula-syntax-for-Calc.html>

[understands dates]
<https://www.gnu.org/software/emacs/manual/html_node/calc/Date-Forms.html>


Proposal & Impact
=================

The attached patch introduces a custom variable
`calc-string-maximum-character' (optimistically versioned for `31.1'),
which replaces a hard-coded maximum in the function
`math-vector-is-string'. The variable defaults to `0xFF' to preserve the
current behaviour, but can otherwise be set to any character up to
`(max-char)'. Since the vector contents are passed to
`math-vector-to-string', the Unicode-aware `concat' has no problem with
the higher characters:

,----
| (defun math-vector-to-string (a &optional quoted)
|   (setq a (concat (mapcar (lambda (x) (if (consp x) (nth 1 x) x))
|                           (cdr a))))
|   […])
`----

Here are the outstanding issues I've identified for discussion:

1. Since users can bypass the variable's declared type and set
   `calc-string-maximum-character' to /anything/, I'm not sure the
   patch's error handling is sufficient. If a hapless user sets it to
   something invalid like a string (`"invalid"', say), then with the
   current patch they'll encounter at least two kinds of errors:

   a) With the following vector on the stack, executing
      `calc-display-strings' (`d "') displays
      `Wrong type argument: number-or-marker-p, "invalid"'
      in the minibuffer, /and/ enters a string display mode in which
      the vector is no longer rendered, as the second block below
      shows.

      ,----
      | 1: [0, 1, 2]
      `----

      ,----
      | 1: .
      `----

      Only executing `calc-display-strings' (`d "') again toggles the
      display mode back and shows the original vector. This is a bad
      experience for the user, and should be mitigated by raising an
      error in `calc-display-strings' before the display mode is
      toggled.
   b) If a user tries to enter a string algebraically with
      `calc-algebraic-entry' (`''), say `string("abc")', the same
      message as in the first error appears in the minibuffer and the
      string is not pushed onto the stack. This is slightly cryptic,
      but not as bad an experience as the first error.

2. With a higher value of `calc-string-maximum-character', the displayed
   string could contain right-to-left text, or a bidirectional mixture
   of characters, that could conceivably interfere with the `calc'
   alignment functions `calc-left-justify' (`d <'),
   `calc-center-justify' (`d ='), and `calc-right-justify' (`d >').
   Toggling the display of the following vectors reveals a misalignment
   of the fully Arabic string under center justification, and a
   misalignment of the fully Arabic and mixed strings under right
   justification. None of these contain any of the bidirectional Unicode
   control characters, so I'm not sure whether there are other problems
   lurking.

   ,----
   | 3: [108, 101, 102, 116, 45, 116, 111, 45, 114, 105, 103, 104, 116]
   | 2: [1605, 1606, 32, 1575, 1604, 1610, 1605, 1610, 1606, 32, 1573, 1604, 1609, 32, 1575, 1604, 1610, 1587, 1575, 1585]
   | 1: [108, 101, 102, 116, 45, 1610, 1605, 1610, 1606]
   `----

   ,----
   | 3: "left-to-right"
   | 2: "من اليمين إلى اليسار"
   | 1: "left-يمين"
   `----

   ,----
   | 3: "left-to-right"
   | 2: "من اليمين إلى اليسار"
   | 1: "left-يمين"
   `----

   ,----
   | 3: "left-to-right"
   | 2: "من اليمين إلى اليسار"
   | 1: "left-يمين"
   `----

   Also, combining diacritical marks appear as separate characters, but
   I'm not sure whether this is the expected behaviour and/or related to
   my configuration.

   ,----
   | 1: [117, 776]
   `----

   ,----
   | 1: "ü"
   `----

3. I haven't found any internal references to `math-vector-is-string'
   that look like they could conflict with this change
   (`math-format-flat-expr-fancy', `math-compose-expr',
   `calc-kbd-query').
   Existing references are mostly related to displaying strings from
   vectors, `string' or `bstring' objects, and composite objects
   involving vectors or strings, but I could use an extra set of eyes to
   confirm. Since `org-mode' uses `calc' expressions in tables, I might
   need the Org maintainers to agree to the change. I'm unaware of any
   third-party dependencies on this function.

4. For unit tests, are there any naming conventions I should follow? For
   now, I've put all of the tests for `math-vector-is-string' in one
   place.

Thanks for your consideration!

-- 
Jacob S. Gordon
jacob.as.gordon <at> gmail.com

=========================

Please avoid sending me HTML emails and MS Office documents.
https://useplaintext.email/#etiquette
https://www.gnu.org/philosophy/no-word-attachments.html


In GNU Emacs 30.1 (build 1, x86_64-pc-linux-gnu, GTK+ Version 3.24.49, cairo version 1.18.4)
System Description: Arch Linux

Configured using:
 'configure --with-pgtk --sysconfdir=/etc --prefix=/usr --libexecdir=/usr/lib --localstatedir=/var --disable-build-details --with-cairo --with-harfbuzz --with-libsystemd --with-modules --with-native-compilation=aot --with-tree-sitter 'CFLAGS=-march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions -Wp,-D_FORTIFY_SOURCE=3 -Wformat -Werror=format-security -fstack-clash-protection -fcf-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -g -ffile-prefix-map=/build/emacs/src=/usr/src/debug/emacs -flto=auto' 'LDFLAGS=-Wl,-O1 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,-z,pack-relative-relocs -flto=auto' 'CXXFLAGS=-march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions -Wp,-D_FORTIFY_SOURCE=3 -Wformat -Werror=format-security -fstack-clash-protection -fcf-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -Wp,-D_GLIBCXX_ASSERTIONS -g -ffile-prefix-map=/build/emacs/src=/usr/src/debug/emacs -flto=auto''
[v1-0001-calc-Allow-strings-with-higher-character-codes.patch (text/patch, attachment)]
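Issue 1 in the message proposes validating the new variable in `calc-display-strings' before the display mode is toggled. The following is a minimal, self-contained sketch of that idea under stated assumptions; it is not the code from the attached patch, whose contents aren't reproduced here. Every `my-calc-' name is hypothetical: `my-calc-string-maximum-character' stands in for the proposed `calc-string-maximum-character', `my-calc-display-strings' stands in for calc's string display flag, and vectors are assumed to use calc's `(vec ELT...)' form, with complex entries written `(cplx RE IM)'.

```elisp
;;; -*- lexical-binding: t -*-
(require 'cl-lib)

;; Hypothetical stand-in for the proposed `calc-string-maximum-character'.
;; The default of #xFF mirrors the current Latin-1 limit.
(defcustom my-calc-string-maximum-character #xFF
  "Largest character code a vector may contain to display as a string."
  :type 'character
  :version "31.1"
  :group 'calc)

;; Hypothetical stand-in for calc's string display toggle state.
(defvar my-calc-display-strings nil
  "Non-nil means vectors of character codes are displayed as strings.")

(defun my-calc-check-string-maximum ()
  "Signal a `user-error' unless the limit is a valid character code."
  ;; `characterp' accepts exactly the integers 0 .. (max-char), so this
  ;; rejects strings, negative numbers and out-of-range integers alike.
  (unless (characterp my-calc-string-maximum-character)
    (user-error "Invalid `my-calc-string-maximum-character': %S"
                my-calc-string-maximum-character)))

(defun my-calc-char-code-p (x)
  "Non-nil if calc element X can stand for a character in a string.
X is either an integer or a complex number (cplx RE 0) whose
imaginary part is zero."
  (let ((code (if (eq (car-safe x) 'cplx)
                  (and (eq (nth 2 x) 0) (nth 1 x))
                x)))
    (and (natnump code)
         (<= code my-calc-string-maximum-character))))

(defun my-calc-vector-is-string (a)
  "Non-nil if calc vector A, i.e. (vec ELT...), holds only allowed codes."
  (my-calc-check-string-maximum)
  (and (eq (car-safe a) 'vec)
       (cl-every #'my-calc-char-code-p (cdr a))))

(defun my-calc-toggle-display-strings ()
  "Toggle string display, validating the limit before changing state."
  (interactive)
  ;; Validate first: on a bad value we signal an error and leave the
  ;; display mode untouched, avoiding the half-toggled state of issue 1a.
  (my-calc-check-string-maximum)
  (setq my-calc-display-strings (not my-calc-display-strings)))
```

The design choice here is simply ordering: the check runs before any state changes, so a bad value produces one readable `user-error' and the display flag keeps its previous value. With the default limit, `(my-calc-vector-is-string '(vec 383 117))' returns nil; with the limit raised to `(max-char)' it returns non-nil, and `(concat '(383 117 99 99 101 383 115))' builds the string from the motivation example.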