GNU bug report logs - #63518
28.2; shr.el seems to break inline latex (mathjax) in html

Package: emacs;

Reported by: mousebot <mousebot <at> riseup.net>

Date: Mon, 15 May 2023 13:35:01 UTC

Severity: normal

Found in version 28.2

To reply to this bug, email your comments to 63518 AT debbugs.gnu.org.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#63518; Package emacs. (Mon, 15 May 2023 13:35:01 GMT)

Acknowledgement sent to mousebot <mousebot <at> riseup.net>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Mon, 15 May 2023 13:35:01 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: mousebot <mousebot <at> riseup.net>
To: bug-gnu-emacs <at> gnu.org
Subject: 28.2; shr.el seems to break inline latex (mathjax) in html
Date: Mon, 15 May 2023 13:21:51 +0200

hi emacs,

The fediverse client I maintain, mastodon.el, uses shr-render-region to render individual posts. Some instances, e.g. https://mathstodon.xyz, allow users to post inline latex using mathjax notation.

When shr.el renders inline latex, it often breaks it as it fills the text. It inserts a newline in between the two characters that open an inline latex block: `\(` or `\[`. Normal fill commands (fill-region, fill-paragraph) do not split latex in this way, from what I could gather.

When digging around and debugging a little, I found that in shr-find-fill-point, the check (shr-char-kinsoku-eol-p (following-char)) in the when condition returns t when point is in between \ and ( or [, meaning that shr-find-fill-point considers that position to be a breakable point. Commenting out that single check seems to largely prevent the undesired splitting. (Behaviour confirmed by my checks and also by another mastodon.el user.)

I don't really understand the significance of the checks that shr-find-fill-point runs, nor whether they can be temporarily deactivated or worked around in some other way.

I read around a little, and asked on emacs.stackexchange, but received no replies. So I'm still unsure if shr is able to handle mathjax notation or not, or how to patch it so that it would respect it.

An example thread containing inline latex: https://mathstodon.xyz/@bones/110249960030484103.

An example of the html that may break if window width is close to a latex snippet (from the above thread):

<p>Apéry’s proof utilized two surprising sequences of numbers \\(A_n\\) and \\(B_n\\), which satisfy the recurrence relation \\[(n+1)^3 x_{n+1}-\\left(34 n^3+51 n^2+27 n+5\\right) x_n+n^3 x_{n-1}=0 \\]  with initial conditions \\((A_0,A_1)=(1,5)\\), \\(B_0,B_1)=(0,6)\\). Apéry showed that \\(A_n \\in \\mathbb{Z}\\) for all \\(n \\geq 0 \\), which is quite surprising! You can check that the first few numbers in the sequences are given by \\[A_n = 1, 5, 73, 1445, \\dots \\] \\[B_n = 0, 6, \\frac{351}{4}, \\dots \\]  The result of Apéry is that the sequence \\(B_n/A_n \\to \\zeta(3)\\) sufficiently fast to guarantee that \\(\\zeta(3)\\) is irrational, by Dirichlet’s irrationality criterion.</p>

This is my first time reporting a bug in emacs, apologies if there's anything wrong with the report. I'm happy to provide further details if needed.

Thanks,
Marty.

original report on mastodon.el repo: https://codeberg.org/martianh/mastodon.el/issues/464

stack exchange question: https://emacs.stackexchange.com/questions/77214/shr-filling-dont-split-inline-latex-in-html

report-emacs-bug details from a minimal emacs:

In GNU Emacs 28.2 (build 1, x86_64-pc-linux-gnu, GTK+ Version 3.24.24, cairo version 1.16.0)
 of 2023-01-21 built on t470s
Windowing system distributor 'The X.Org Foundation', version 11.0.12011000
System Description: Debian GNU/Linux 11 (bullseye)

Configured using:
 'configure --prefix=/home/mouse/programmes/emacs-28.2/
 --bindir=/home/mouse/bin'

Configured features:
CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GSETTINGS HARFBUZZ JPEG JSON
LIBOTF LIBSELINUX LIBXML2 M17N_FLT MODULES NOTIFY INOTIFY PDUMPER PNG
RSVG SECCOMP SOUND THREADS TIFF TOOLKIT_SCROLL_BARS X11 XDBE XIM XPM
GTK3 ZLIB

Important settings:
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Mastodon

Minor modes in effect:
  delete-selection-mode: t
  cua-mode: t
  vertico-mode: t
  emojify-mode: t
  straight-use-package-mode: t
  straight-package-neutering-mode: t
  tooltip-mode: t
  global-eldoc-mode: t
  show-paren-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  buffer-read-only: t
  line-number-mode: t
  indent-tabs-mode: t
  transient-mark-mode: t

Load-path shadows:
/home/mouse/code/elisp/mastodon.el/lisp/mastodon-iso hides /home/mouse/.emacs.d/straight/build/mastodon/mastodon-iso
/home/mouse/code/elisp/mastodon.el/lisp/mastodon-search hides /home/mouse/.emacs.d/straight/build/mastodon/mastodon-search
/home/mouse/code/elisp/mastodon.el/lisp/mastodon-async hides /home/mouse/.emacs.d/straight/build/mastodon/mastodon-async
/home/mouse/code/elisp/mastodon.el/lisp/mastodon-http hides /home/mouse/.emacs.d/straight/build/mastodon/mastodon-http
/home/mouse/code/elisp/mastodon.el/lisp/mastodon hides /home/mouse/.emacs.d/straight/build/mastodon/mastodon
/home/mouse/code/elisp/mastodon.el/lisp/mastodon-media hides /home/mouse/.emacs.d/straight/build/mastodon/mastodon-media
/home/mouse/code/elisp/mastodon.el/lisp/mastodon-discover hides /home/mouse/.emacs.d/straight/build/mastodon/mastodon-discover
/home/mouse/code/elisp/mastodon.el/lisp/mastodon-client hides /home/mouse/.emacs.d/straight/build/mastodon/mastodon-client
/home/mouse/code/elisp/mastodon.el/lisp/mastodon-auth hides /home/mouse/.emacs.d/straight/build/mastodon/mastodon-auth
/home/mouse/code/elisp/mastodon.el/lisp/mastodon-notifications hides /home/mouse/.emacs.d/straight/build/mastodon/mastodon-notifications
/home/mouse/code/elisp/mastodon.el/lisp/mastodon-profile hides /home/mouse/.emacs.d/straight/build/mastodon/mastodon-profile
/home/mouse/code/elisp/mastodon.el/lisp/mastodon-views hides /home/mouse/.emacs.d/straight/build/mastodon/mastodon-views
/home/mouse/code/elisp/mastodon.el/lisp/mastodon-tl hides /home/mouse/.emacs.d/straight/build/mastodon/mastodon-tl
/home/mouse/code/elisp/mastodon.el/lisp/mastodon-toot hides /home/mouse/.emacs.d/straight/build/mastodon/mastodon-toot
/home/mouse/code/elisp/mastodon.el/lisp/mastodon-inspect hides /home/mouse/.emacs.d/straight/build/mastodon/mastodon-inspect

Features:
(shadow sort mail-extr emacsbug message dired dired-loaddefs rfc822 mml
mml-sec epa gnus-util rmail rmail-loaddefs mm-decode mm-bodies mm-encode
mailabbrev gmm-utils sendmail compile latexenc ox-odt rng-loc rng-uri
rng-parse rng-match rng-dt rng-util rng-pttrn nxml-parse nxml-ns
nxml-enc xmltok nxml-util ox-latex ox-icalendar org-agenda org-refile
ox-html table ox-ascii ox-publish ox org-element avl-tree generator
mastodon-media mastodon-profile parse-time gnutls network-stream
url-http mail-parse rfc2231 rfc2047 rfc2045 mm-util ietf-drums
mail-prsvr url-gw nsm rmc url-cache url-auth mastodon-auth
mastodon-client plstore epg rfc6068 epg-config delsel cua-base vertico
compat compat-29 vertico-autoloads compat-autoloads wombat-theme
mastodon derived mastodon-search mastodon-toot mastodon-tl let-alist
thingatpt shr kinsoku puny svg xml dom browse-url text-property-search
facemenu mastodon-iso mastodon-http mastodon-autoloads mpv tq org-timer
org-clock org ob ob-tangle ob-ref ob-lob ob-table ob-exp org-macro
org-footnote org-src ob-comint org-pcomplete pcomplete comint ansi-color
ring org-list org-faces org-entities noutline outline easy-mmode
org-version ob-emacs-lisp ob-core ob-eval org-table oc-basic bibtex
iso8601 time-date ol rx org-keys oc org-compat org-macs org-loaddefs
format-spec find-func cal-menu calendar cal-loaddefs mpv-autoloads
company edmacro kmacro company-autoloads ts s ts-autoloads s-autoloads
persist persist-autoloads request mailheader mail-utils url url-proxy
url-privacy url-expand url-methods url-history url-cookie url-domsuf
url-util url-parse auth-source eieio eieio-core eieio-loaddefs
password-cache url-vars mailcap request-autoloads emojify advice apropos
tar-mode arc-mode archive-mode pcase json map ht dash emojify-autoloads
ht-autoloads dash-autoloads finder-inf use-package-core
use-package-autoloads info bind-key-autoloads straight-autoloads cl-seq
cl-extra help-mode seq byte-opt straight subr-x cl-macs gv cl-loaddefs
cl-lib bytecomp byte-compile cconv iso-transl tooltip eldoc paren
electric uniquify ediff-hook vc-hooks lisp-float-type elisp-mode mwheel
term/x-win x-win term/common-win x-dnd tool-bar dnd fontset image
regexp-opt fringe tabulated-list replace newcomment text-mode lisp-mode
prog-mode register page tab-bar menu-bar rfn-eshadow isearch easymenu
timer select scroll-bar mouse jit-lock font-lock syntax font-core
term/tty-colors frame minibuffer cl-generic cham georgian utf-8-lang
misc-lang vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms
cp51932 hebrew greek romanian slovak czech european ethiopic indian
cyrillic chinese composite emoji-zwj charscript charprop case-table
epa-hook jka-cmpr-hook help simple abbrev obarray cl-preloaded nadvice
button loaddefs faces cus-face macroexp files window text-properties
overlay sha1 md5 base64 format env code-pages mule custom widget
hashtable-print-readable backquote threads dbusbind inotify
dynamic-setting system-font-setting font-render-setting cairo
move-toolbar gtk x-toolkit x multi-tty make-network-process emacs)

Memory information:
((conses 16 311432 41501)
 (symbols 48 25123 6)
 (strings 32 147666 7861)
 (string-bytes 1 6525722)
 (vectors 16 73699)
 (vector-slots 8 3025398 126294)
 (floats 8 187 493)
 (intervals 56 1691 262)
 (buffers 992 15))

-- 
some writing: https://anarchive.mooo.com
an internets: https://pleasantlybabykid.tumblr.com/
.
xmpp: mousebot <at> ghost.noho.st
.
gpg pub key: 0x582C8EAF0B0D77C9
fingerprint: DA24 B943 36EF C491 E22F A70B 582C 8EAF 0B0D 77C9




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63518; Package emacs. (Mon, 15 May 2023 14:02:01 GMT)

Message #8 received at 63518 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: mousebot <mousebot <at> riseup.net>
Cc: 63518 <at> debbugs.gnu.org
Subject: Re: bug#63518: 28.2; shr.el seems to break inline latex (mathjax) in html
Date: Mon, 15 May 2023 17:01:47 +0300

> Date: Mon, 15 May 2023 13:21:51 +0200
> From: mousebot <mousebot <at> riseup.net>
> 
> The fediverse client I maintain, mastodon.el, uses shr-render-region to render individual posts. Some instances, e.g. https://mathstodon.xyz, allow users to post inline latex using mathjax notation.
> 
> When shr.el renders inline latex, it often breaks it as it fills the text. It inserts a newline in between the two characters that open an inline latex block: `\(` or `\[`. Normal fill commands (fill-region, fill-paragraph) do not split latex in this way, from what I could gather.
> 
> When digging around and debugging a little, I found that in shr-find-fill-point, the check (shr-char-kinsoku-eol-p (following-char)) in the when condition returns t when point is in between \ and ( or [, meaning that shr-find-fill-point considers that position to be a breakable point. Commenting out that single check seems to largely prevent the undesired splitting. (Behaviour confirmed by my checks and also by another mastodon.el user.)
> 
> I don't really understand the significance of the checks that shr-find-fill-point runs, nor whether they can be temporarily deactivated or worked around in some other way.

That function looks for a suitable place to break the line in two.

The question is whether we can reliably determine that we are inside
inline latex, so that we augment the conditions for a break point.
Turning that off unconditionally is not an option.  Do you happen to
know about some criteria to be applied to distinguish this special
case?

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63518; Package emacs. (Mon, 15 May 2023 14:30:02 GMT)

Message #11 received at 63518 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: mousebot <mousebot <at> riseup.net>
Cc: 63518 <at> debbugs.gnu.org
Subject: Re: bug#63518: 28.2; shr.el seems to break inline latex (mathjax) in html
Date: Mon, 15 May 2023 17:29:36 +0300

[Please use Reply All to keep the bug tracker CC'ed.]

> Date: Mon, 15 May 2023 16:14:47 +0200
> From: mousebot <mousebot <at> riseup.net>
> 
> Thanks for your response Eli.
> 
> Yes, I'm aware that the function does that. What I meant is I don't understand how the kinsoku functions in the when clause work, so I don't feel qualified to hack around with them.
> 
> > 
> > The question is whether we can reliably determine that we are inside
> > inline latex, so that we augment the conditions for a break point.
> > Turning that off unconditionally is not an option.  Do you happen to
> > know about some criteria to be applied to distinguish this special
> > case?
> 
> I wondered if we couldn't modify the functionality to flag that the html being rendered (may) contain inline latex? (An optional argument say, so that it only tries to render inline latex if specified.)

The problem is that HTML that includes inline latex can also include
other text that needs the kinsoku treatment.  So this cannot be a
global flag, it must be raised only while processing the inline latex
part.

> Re inline latex, I don't know much about it myself. From what I have seen on the mathjax website and the examples in the thread I shared, it is enclosed in \[...\] or \(...\). I also read that it can be enclosed in $...$, but I haven't seen that on mathstodon.xyz.
> 
> I wrote a (probably *un*reliable!) fill-predicate function with regexes, one set to check if we were in between the \ and ( or [, and one to check if we were somewhere in between a \( or \[ and a \) or \]. But then I realized that shr seemingly doesn't work with fill predicates, but makes its own filling decisions.
> 
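
The two regex checks described in the quoted text are not shown anywhere in this thread. A minimal sketch of what such checks might look like follows, assuming only the `\(...\)` and `\[...\]` delimiters (not `$...$`) and balanced, non-nested spans. The function names are hypothetical and are not part of shr.el or mastodon.el.

```elisp
;; Hypothetical helpers for illustration only; nothing here is shr.el API.

(defun my-mathjax-splits-delimiter-p (&optional pos)
  "Return non-nil if POS (default point) sits inside a MathJax delimiter.
That is, directly between a backslash and a round or square bracket."
  (let ((pos (or pos (point))))
    (and (eq (char-before pos) ?\\)
         (memq (char-after pos) '(?\( ?\[ ?\) ?\]))
         t)))

(defun my-mathjax-inside-p (&optional pos)
  "Return non-nil if POS (default point) appears to be inside a MathJax span.
Only the text before POS is examined: the nearest complete delimiter
behind POS must be an opening one."
  (save-excursion
    (goto-char (or pos (point)))
    ;; Find the closest backslash followed by one of ] [ ( ).
    (and (re-search-backward "\\\\[][()]" nil t)
         ;; Point is now on the backslash; look at the bracket after it.
         (memq (char-after (1+ (point))) '(?\( ?\[))
         t)))

;; Example on the buffer text: sum \(A_n\) end
(with-temp-buffer
  (insert "sum \\(A_n\\) end")
  (list (my-mathjax-splits-delimiter-p 6) ; between \ and (
        (my-mathjax-inside-p 8)           ; on the underscore in A_n
        (my-mathjax-inside-p 13)))        ; after the closing delimiter
;; => (t t nil)
```

A break-point search could skip any position where either predicate returns non-nil, while still applying the kinsoku rules to the rest of the text. Whether and where shr-find-fill-point should consult such a check is the open question in this report.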




This bug report was last modified 339 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.