GNU bug report logs - #78528
[PATCH v1] calc: Allow strings with higher character codes

Package: emacs;

Reported by: "Jacob S. Gordon" <jacob.as.gordon <at> gmail.com>

Date: Wed, 21 May 2025 07:04:03 UTC

Severity: normal

Tags: patch

To reply to this bug, email your comments to 78528 AT debbugs.gnu.org.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#78528; Package emacs. (Wed, 21 May 2025 07:04:03 GMT)

Acknowledgement sent to "Jacob S. Gordon" <jacob.as.gordon <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Wed, 21 May 2025 07:04:03 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: "Jacob S. Gordon" <jacob.as.gordon <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: [PATCH v1] calc: Allow strings with higher character codes
Date: Tue, 20 May 2025 21:34:02 -0400
Tags: patch

Hello all,

Please find below a feature proposal for strings in `calc', and a first
draft of a patch attached to this message.

Motivation
==========

Suppose you're working with Unicode code points in `calc', and you end
up with the following vector on the stack. You'd like to know what a
string composed of these character codes would look like, so you toggle
`calc-display-strings' (`d "') and … nothing happens.

,----
| 1:  [383, 117, 99, 99, 101, 383, 115]
`----

Later, in an `org-mode' file you have the following table with a list of
dates in the first column. Since [formulas] can be any algebraic
expression understood by `calc', and `calc' [understands dates], you try
to insert a Unicode character for rows where the first column is in the
past. When you evaluate the formula (`C-c C-c' on the `#+TBLFM:' line)
`calc' stops short of displaying the string.

,----
| | Date             | Past?           |
| |------------------+-----------------|
| | [2025-05-01 Thu] | string([10003]) |
| | [2026-05-01 Fri] |                 |
| #+TBLFM: $2 = if($1 < now(), string("✓"), string(""))
`----

Both problems arise because some or all of the character codes are
outside the `Latin-1' (8-bit) range. If we replace this hard-coded
limit with a custom variable and increase its value, both use-cases
can be supported.

,----
| 1:  "ſucceſs"
`----

,----
| | Date             | Past? |
| |------------------+-------|
| | [2025-05-01 Thu] | ✓     |
| | [2026-05-01 Fri] |       |
| #+TBLFM: $2 = if($1 < now(), string("✓"), string(""))
`----

Otherwise, the user has to leave `calc' (or its syntax) and drop into
`Lisp':

,----
| (concat '(383 117 99 99 101 383 115))
`----

,----
| | Date             | Past? |
| |------------------+-------|
| | [2025-05-01 Thu] | ✓     |
| | [2026-05-01 Fri] |       |
| #+TBLFM: $2 = '(if (time-less-p (org-read-date t t $1) (current-time)) "✓" "")
`----

[formulas] <https://orgmode.org/manual/Formula-syntax-for-Calc.html>

[understands dates]
<https://www.gnu.org/software/emacs/manual/html_node/calc/Date-Forms.html>

Proposal & Impact
=================

The attached patch introduces a custom variable
`calc-string-maximum-character' (optimistically versioned for `31.1'),
which replaces a hard-coded maximum in the function
`math-vector-is-string'. This variable defaults to `0xFF' in order to
preserve the current behaviour, but otherwise can be any character up to
`(max-char)'. Since the vector contents are passed to
`math-vector-to-string', the Unicode-aware `concat' has no problem with
the higher characters:

,----
| (defun math-vector-to-string (a &optional quoted)
|   (setq a (concat (mapcar (lambda (x) (if (consp x) (nth 1 x) x))
|                           (cdr a))))
|   […])
`----
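
To make the idea concrete, here is a minimal sketch of a string check
with a configurable upper bound. It is not the code from the attached
patch: the names `my-calc-string-max-char', `my-vector-is-string-p' and
`my-vector-to-string' are hypothetical stand-ins, and the sketch only
handles plain integers in a list of the form `(vec N1 N2 ...)'.

```elisp
(require 'seq)

;; Hypothetical stand-in for the proposed custom variable.
(defvar my-calc-string-max-char #xFF
  "Largest character code treated as part of a string (assumed name).")

(defun my-vector-is-string-p (vec)
  "Return non-nil if every element of VEC is a valid character code.
VEC is a list of the form (vec N1 N2 ...).  An element is valid if it
is a non-negative integer no larger than `my-calc-string-max-char'."
  (seq-every-p (lambda (x)
                 (and (natnump x) (<= x my-calc-string-max-char)))
               (cdr vec)))

(defun my-vector-to-string (vec)
  "Build a string from the character codes in VEC using `concat'."
  (concat (cdr vec)))

;; With the default bound the check fails, because 383 > 255.
;; Raising the bound to (max-char) lets the conversion go through.
(let ((v '(vec 383 117 99 99 101 383 115)))
  (list (my-vector-is-string-p v)
        (let ((my-calc-string-max-char (max-char)))
          (and (my-vector-is-string-p v)
               (my-vector-to-string v)))))
```

Keeping the default at `0xFF' in such a sketch mirrors the proposal's
goal of preserving the current behaviour unless the user opts in.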

Here are the outstanding issues I've identified for discussion:

1. Since users can blow past the variable type and set
   `calc-string-maximum-character' to /anything/, I'm not sure the
   patch's error handling is enough. If a hapless user sets it to
   something invalid like a string (`"invalid"', let's say), then with
   the current patch they'll encounter at least two kinds of errors:

   a) With the following vector on the stack, executing
      `calc-display-strings' (`d "') will display `Wrong type argument:
      number-or-marker-p, "invalid"' in the minibuffer, /and/ enter a
      string display mode where the vector isn't rendered, as seen in the
      second block below.

      ,----
      | 1:  [0, 1, 2]
      `----

      ,----
      | 1:  .
      `----

      Only executing `calc-display-strings' (`d "') again will toggle
      the display mode and show the original vector. This is a bad
      experience for the user, and should be mitigated by raising an
      error in `calc-display-strings' before the display mode is
      toggled.

   b) If a user tries to enter a string algebraically with
      `calc-algebraic-entry' (`''), say `string("abc")', the same
      message from the first error will appear in the minibuffer, but
      the string is not added to the stack. This is slightly cryptic,
      but not as bad an experience as the first error.
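
   One way to surface a clearer message for both cases might be to
   validate the variable up front, before any display mode is toggled
   or any entry is parsed. The following is only a sketch under that
   assumption; `my-calc-string-max-char' and `my-check-max-char' are
   hypothetical names, not part of the patch or of `calc'.

   ```elisp
   ;; Hypothetical stand-in for the proposed custom variable.
   (defvar my-calc-string-max-char #xFF
     "Largest character code treated as part of a string (assumed name).")

   (defun my-check-max-char ()
     "Return `my-calc-string-max-char' if it is a valid character.
   Signal a `user-error' otherwise, so callers can bail out early."
     (unless (characterp my-calc-string-max-char)
       (user-error "Invalid maximum character: %S"
                   my-calc-string-max-char))
     my-calc-string-max-char)

   ;; Simulate the misconfiguration described above and capture the
   ;; resulting message instead of letting the error propagate.
   (let ((my-calc-string-max-char "invalid"))
     (condition-case err
         (my-check-max-char)
       (user-error (error-message-string err))))
   ```

   `characterp' accepts exactly the integers from 0 to `(max-char)', so
   it rejects strings, floats, and out-of-range integers alike.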

2. With a higher value of `calc-string-maximum-character', the displayed
   string could contain right-to-left or a bidirectional mixture of
   characters that could conceivably interfere with the `calc' alignment
   functions `calc-left-justify' (`d <'), `calc-center-justify' (`d ='),
   and `calc-right-justify' (`d >'). Toggling the display of the
   following vectors reveals a misalignment of the fully Arabic string
   under center justification, and misalignment of the full- and
   mixed-Arabic strings under right justification. None of these contain
   any of the funky bidirectional Unicode markers, so I'm not sure if
   there are other problems lurking.

   ,----
   | 3:  [108, 101, 102, 116, 45, 116, 111, 45, 114, 105, 103, 104, 116]
   | 2:  [1605, 1606, 32, 1575, 1604, 1610, 1605, 1610, 1606, 32, 1573, 1604, 1609, 32, 1575, 1604, 1610, 1587, 1575, 1585]
   | 1:  [108, 101, 102, 116, 45, 1610, 1605, 1610, 1606]
   `----

   ,----
   | 3:  "left-to-right"
   | 2:  "من اليمين إلى اليسار"
   | 1:  "left-يمين"
   `----

   ,----
   | 3:                       "left-to-right"
   | 2:                   "من اليمين إلى اليسار"
   | 1:                         "left-يمين"
   `----

   ,----
   | 3:                                               "left-to-right"
   | 2:                                        "من اليمين إلى اليسار"
   | 1:                                                   "left-يمين"
   `----

   Also, combining diacritical marks appear as separate characters, but
   I'm not sure if this is the expected behaviour and/or related to my
   configuration.

   ,----
   | 1:  [117, 776]
   `----

   ,----
   | 1:  "ü"
   `----
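
   For what it's worth, the vector above holds two code points (a base
   letter and a combining diaeresis), and `concat' keeps them as two
   characters. The sketch below, with the hypothetical helper
   `my-nfc-codes', uses the `ucs-normalize' library to show that NFC
   normalization would merge them into one precomposed character.
   Whether `calc' should normalize at all is an open question; this
   only illustrates the difference.

   ```elisp
   (require 'ucs-normalize)

   (defun my-nfc-codes (codes)
     "Return the code points of CODES after NFC normalization.
   CODES is a list of character codes.  Hypothetical helper."
     (string-to-list (ucs-normalize-NFC-string (concat codes))))

   ;; The unnormalized string has two characters; after NFC the pair
   ;; becomes the single code point 252 (U+00FC).
   (list (length (concat '(117 776)))
         (my-nfc-codes '(117 776)))
   ```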

3. I haven't found any internal references to `math-vector-is-string'
   that look like they could conflict with this change
   (`math-format-flat-expr-fancy', `math-compose-expr',
   `calc-kbd-query'). Existing references are mostly related to
   displaying strings from vectors, `string' or `bstring' objects, and
   composite objects involving vectors or strings, but I could use an
   extra set of eyes to confirm. Since `org-mode' uses `calc'
   expressions in tables, I might need to get their concurrence with the
   change. I'm unaware of any third-party dependencies on this function.

4. For unit tests, are there any naming conventions I should follow? I
   just stuck all of the tests in one place for `math-vector-is-string'.


Thanks for your consideration!

--
Jacob S. Gordon
jacob.as.gordon <at> gmail.com

=========================

Please avoid sending me HTML emails and MS Office documents.
https://useplaintext.email/#etiquette
https://www.gnu.org/philosophy/no-word-attachments.html

In GNU Emacs 30.1 (build 1, x86_64-pc-linux-gnu, GTK+ Version 3.24.49,
cairo version 1.18.4)
System Description: Arch Linux

Configured using:
 'configure --with-pgtk --sysconfdir=/etc --prefix=/usr
 --libexecdir=/usr/lib --localstatedir=/var --disable-build-details
 --with-cairo --with-harfbuzz --with-libsystemd --with-modules
 --with-native-compilation=aot --with-tree-sitter 'CFLAGS=-march=x86-64
 -mtune=generic -O2 -pipe -fno-plt -fexceptions -Wp,-D_FORTIFY_SOURCE=3
 -Wformat -Werror=format-security -fstack-clash-protection
 -fcf-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -g
 -ffile-prefix-map=/build/emacs/src=/usr/src/debug/emacs -flto=auto'
 'LDFLAGS=-Wl,-O1 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro
 -Wl,-z,now -Wl,-z,pack-relative-relocs -flto=auto'
 'CXXFLAGS=-march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions
 -Wp,-D_FORTIFY_SOURCE=3 -Wformat -Werror=format-security
 -fstack-clash-protection -fcf-protection -fno-omit-frame-pointer
 -mno-omit-leaf-frame-pointer -Wp,-D_GLIBCXX_ASSERTIONS -g
 -ffile-prefix-map=/build/emacs/src=/usr/src/debug/emacs -flto=auto''

[v1-0001-calc-Allow-strings-with-higher-character-codes.patch (text/patch, attachment)]

This bug report was last modified 3 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.