GNU bug report logs - #69381
mumi does not correctly display (some?) non-ascii characters

Package: mumi;

Reported by: Tomas Volf <~@wolfsden.cz>

Date: Sun, 25 Feb 2024 13:27:03 UTC

Severity: normal

Tags: patch

Done: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>

To reply to this bug, email your comments to 69381 AT debbugs.gnu.org.
There is no need to reopen the bug first.


Report forwarded to bug-mumi <at> gnu.org:
bug#69381; Package mumi. (Sun, 25 Feb 2024 13:27:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to Tomas Volf <~@wolfsden.cz>:
New bug report received and forwarded. Copy sent to bug-mumi <at> gnu.org. (Sun, 25 Feb 2024 13:27:03 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Tomas Volf <~@wolfsden.cz>
To: bug-mumi <at> gnu.org
Subject: mumi does not correctly display (some?) non-ascii characters
Date: Sun, 25 Feb 2024 14:04:10 +0100
Hi,

when I compare mumi page[0] with debbugs page[1], the from field displays "???"
in mumi, but "宋文武" in debbugs.

Have a nice day,
Tomas Volf

0: https://issues.guix.gnu.org/57268
1: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=57268

--
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.

Information forwarded to bug-mumi <at> gnu.org:
bug#69381; Package mumi. (Tue, 14 May 2024 23:14:01 GMT) Full text and rfc822 format available.

Message #8 received at 69381 <at> debbugs.gnu.org (full text, mbox):

From: Felix Lechner <felix.lechner <at> lease-up.com>
To: 69381 <at> patchwise.org
Cc: Tomas Volf <~@wolfsden.cz>, Felix Lechner <felix.lechner <at> lease-up.com>
Subject: [PATCH] Convert HTML to UTF-8 ourselves. (Closes: #69381)
Date: Tue, 14 May 2024 16:12:49 -0700
This fixes a host of encoding issues in Mumi, including the diff
problems that are not mentioned in the bug.  An example is here:

    https://issues.guix.gnu.org/63508#4

The procedure version may one day be more efficient but does not work.
Based on comments in the Guile source code, the procedure style may
one day enable more advanced response formats.  The author is unclear
as to why the procedure does not work.  There may be a complex
interaction involving the response headers.

A preview of this code is live at patchwise.org.

The solution of this bug may depend on the patch in Bug#70907.  This
patch furthermore depends on the patch in Bug#70906, but the solution
of the bug may not.
---
 mumi/web/render.scm | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/mumi/web/render.scm b/mumi/web/render.scm
index 316ca4c..9b16f8d 100644
--- a/mumi/web/render.scm
+++ b/mumi/web/render.scm
@@ -28,6 +28,7 @@
   #:use-module ((ice-9 textual-ports)
                 #:select (get-string-all put-string))
   #:use-module (ice-9 match)
+  #:use-module (rnrs bytevectors)
   #:use-module (web http)
   #:use-module (web request)
   #:use-module (web response)
@@ -104,13 +105,13 @@
 (define* (render-html sxml #:key (extra-headers '()))
   (values (append extra-headers
                   '((content-type . (text/html (charset . "utf-8")))))
-          (lambda (port)
-            (sxml->html sxml port))))
+          (string->utf8
+           (sxml->html-string sxml))))
 
 (define (render-json json)
   (values '((content-type . (application/json (charset . "utf-8"))))
-          (lambda (port)
-            (scm->json json port))))
+          (string->utf8
+           (scm->json-string json))))
 
 (define (not-found uri)
   (values (build-response #:code 404)
-- 
2.41.0





Added blocking bug(s) 70907 and 70906 Request was from Felix Lechner <felix.lechner <at> lease-up.com> to control <at> debbugs.gnu.org. (Tue, 14 May 2024 23:16:02 GMT) Full text and rfc822 format available.

Added tag(s) patch. Request was from Felix Lechner <felix.lechner <at> lease-up.com> to control <at> debbugs.gnu.org. (Tue, 14 May 2024 23:16:02 GMT) Full text and rfc822 format available.

Information forwarded to ~@wolfsden.cz, felix.lechner <at> lease-up.com, bug-mumi <at> gnu.org:
bug#69381; Package mumi. (Sat, 02 Nov 2024 00:07:02 GMT) Full text and rfc822 format available.

Message #15 received at 69381 <at> debbugs.gnu.org (full text, mbox):

From: noe <at> xn--no-cja.eu
To: 69381 <at> debbugs.gnu.org
Cc: Noé Lopez <noelopez <at> free.fr>
Subject: [PATCH] web: Use string to avoid losing unicode characters.
Date: Sat,  2 Nov 2024 01:07:28 +0100
From: Noé Lopez <noelopez <at> free.fr>

I don’t really understand why the unicode characters were lost in the
first place, maybe something in the sanitize-response of (fibers web
server)?  Specifically, strings and procedures don’t take the same
path there.

* mumi/web/render.scm (render-html): Return string instead of procedure.
---
 mumi/web/render.scm | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mumi/web/render.scm b/mumi/web/render.scm
index 168f3bc..c28a26f 100644
--- a/mumi/web/render.scm
+++ b/mumi/web/render.scm
@@ -105,8 +105,9 @@
 (define* (render-html sxml #:key (extra-headers '()))
   (values (append extra-headers
                   '((content-type . (text/html (charset . "utf-8")))))
-          (lambda (port)
-            (sxml->html sxml port))))
+          (call-with-output-string
+	    (lambda (port)
+              (sxml->html sxml port)))))
 
 (define (render-json json)
   (values '((content-type . (application/json)))
-- 
2.46.0





Information forwarded to bug-mumi <at> gnu.org:
bug#69381; Package mumi. (Sat, 02 Nov 2024 00:14:02 GMT) Full text and rfc822 format available.

Message #18 received at 69381 <at> debbugs.gnu.org (full text, mbox):

From: Noé Lopez <noe <at> xn--no-cja.eu>
To: 69381 <at> debbugs.gnu.org
Subject: [PATCH] web: Use string to avoid losing unicode characters.
Date: Sat, 02 Nov 2024 01:14:15 +0100
Hi,

Wanted to send this patch separately but had this issue selected in mumi
so it sent it here, oops.

I recognize this solution is not optimal (a hack even), but it should be
heavily considered as the issue is rampant among international users.

I suspect the actual issue lies in fibers, as said in the commit message
and I’ll try to fix it there but this patch is still important in the
meanwhile.

Good night,
Noé




Information forwarded to bug-mumi <at> gnu.org:
bug#69381; Package mumi. (Sat, 02 Nov 2024 02:23:02 GMT) Full text and rfc822 format available.

Message #21 received at 69381 <at> debbugs.gnu.org (full text, mbox):

From: Noé Lopez <noe <at> xn--no-cja.eu>
To: 69381 <at> debbugs.gnu.org
Subject: [PATCH] web: Use string to avoid losing unicode characters.
Date: Sat, 02 Nov 2024 03:23:24 +0100
Small update,

I’ve investigated the issue in fibers and I now blame the guile web
library for the issue.  Apparently it sets the port to ISO-8859-1
encoding each time you call read-request, but it acts like « yeah don’t
worry just use utf-8 for your body » in the docs.

That’s fine UNLESS you use chunked transfers (omitting content-length in
fibers), in which case it just decides to blow up :///// (it assumes one
character = one byte)

In the end I’m pretty sure any of this could have been avoided by just
not replacing every character with question marks.  Had it kept the
invalid bytes intact they would have translated back with no issue.
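
The effect described in this message can be reproduced outside of mumi. What follows is a minimal Guile sketch, not code taken from mumi, fibers, or Guile's web modules. It assumes the (ice-9 iconv) and (rnrs bytevectors) modules, and every variable name in it is illustrative. It encodes the sender name from the original report in three ways: through ISO-8859-1 with substitution (the lossy path), as UTF-8 up front (what the string->utf8 patch does), and through ISO-8859-1 with the bytes kept intact (the recoverable case).

```scheme
(use-modules (rnrs bytevectors)   ; string->utf8, utf8->string, bytevector-length
             (ice-9 iconv))       ; string->bytevector, bytevector->string

;; The sender name from the report, written with escapes so the example
;; does not depend on the encoding of this source file or the terminal.
(define name "\u5b8b\u6587\u6b66")

;; Lossy path: ISO-8859-1 cannot represent these characters, so with the
;; 'substitute strategy each one is replaced by "?" (byte 63).
(define substituted
  (string->bytevector name "ISO-8859-1" 'substitute))

;; Lossless path: encode to UTF-8 before handing the body over.
(define utf8-bytes (string->utf8 name))

;; Character count and byte count differ, which is why assuming
;; "one character = one byte" miscounts a chunk of the body.
(define char-count (string-length name))
(define byte-count (bytevector-length utf8-bytes))

;; If the UTF-8 bytes pass through ISO-8859-1 untouched, decoding them
;; as UTF-8 afterwards recovers the original string.
(define round-tripped
  (utf8->string
   (string->bytevector (bytevector->string utf8-bytes "ISO-8859-1")
                       "ISO-8859-1")))
```

Here substituted holds three bytes of value 63, char-count is 3 while byte-count is 9, and round-tripped is equal to name. Only the substituted form has lost information, which matches the "???" seen in the From field.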




Information forwarded to bug-mumi <at> gnu.org:
bug#69381; Package mumi. (Fri, 07 Feb 2025 14:56:01 GMT) Full text and rfc822 format available.

Message #24 received at 69381 <at> debbugs.gnu.org (full text, mbox):

From: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
To: Noé Lopez <noe <at> xn--no-cja.eu>
Cc: 69381 <at> debbugs.gnu.org
Subject: Re: bug#69381: mumi does not correctly display (some?) non-ascii
 characters
Date: Fri, 07 Feb 2025 23:54:41 +0900
Hi Noé,

Noé Lopez <noe <at> noé.eu> writes:

> Small update,
>
> I’ve investigated the issue in fibers and I now blame the guile web
> library for the issue.  Apparently it sets the port to ISO-8859-1
> encoding each time you call read-request, but it acts like « yeah don’t
> worry just use utf-8 for your body » in the docs.
>
> That’s fine UNLESS you use chunked transfers (omitting content-length in
> fibers), in which case it just decides to blow up :///// (it assumes one
> character = one byte)
>
> In the end I’m pretty sure any of this could have been avoided by just
> not replacing every character with question marks.  Had it kept the
> invalid bytes intact they would have translated back with no issue.

Nice investigation!  Did you create an issue at bug-guile <at> gnu.org?  I
don't see it on the tracker.  Or perhaps this could be tackled from the
angle of fibers?  For example by adding a new failing test reproducing
the problem to its test suite, and going from there.

-- 
Thanks,
Maxim




Information forwarded to bug-mumi <at> gnu.org:
bug#69381; Package mumi. (Fri, 07 Feb 2025 15:12:02 GMT) Full text and rfc822 format available.

Message #27 received at 69381 <at> debbugs.gnu.org (full text, mbox):

From: Noé Lopez <noe <at> xn--no-cja.eu>
To: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
Cc: Christopher Baines <mail <at> cbaines.net>, 69381 <at> debbugs.gnu.org
Subject: Re: bug#69381: mumi does not correctly display (some?) non-ascii
 characters
Date: Fri, 07 Feb 2025 16:11:12 +0100
Maxim Cournoyer <maxim.cournoyer <at> gmail.com> writes:

> Hi Noé,
>
> Noé Lopez <noe <at> noé.eu> writes:
>
>> Small update,
>>
>> I’ve investigated the issue in fibers and I now blame the guile web
>> library for the issue.  Apparently it sets the port to ISO-8859-1
>> encoding each time you call read-request, but it acts like « yeah don’t
>> worry just use utf-8 for your body » in the docs.
>>
>> That’s fine UNLESS you use chunked transfers (omitting content-length in
>> fibers), in which case it just decides to blow up :///// (it assumes one
>> character = one byte)
>>
>> In the end I’m pretty sure any of this could have been avoided by just
>> not replacing every character with question marks.  Had it kept the
>> invalid bytes intact they would have translated back with no issue.
>
> Nice investigation!  Did you create an issue at bug-guile <at> gnu.org?  I
> don't see it on the tracker.  Or perhaps this could be tackled from the
> angle of fibers?  For example by adding a new failing test reproducing
> the problem to its test suite, and going from there.
>

I talked about this with Christopher Baines at FOSDEM and he seemed to
know much more about it than me, so maybe he can suggest a way forward?

Starting with a failing test seems like a good idea.

Have a nice day,
Noé




Information forwarded to bug-mumi <at> gnu.org:
bug#69381; Package mumi. (Sat, 08 Feb 2025 09:41:01 GMT) Full text and rfc822 format available.

Message #30 received at 69381 <at> debbugs.gnu.org (full text, mbox):

From: Christopher Baines <mail <at> cbaines.net>
To: Noé Lopez <noe <at> xn--no-cja.eu>
Cc: 69381 <at> debbugs.gnu.org, Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
Subject: Re: bug#69381: mumi does not correctly display (some?) non-ascii
 characters
Date: Sat, 08 Feb 2025 09:40:40 +0000
Noé Lopez <noe <at> noé.eu> writes:

> Maxim Cournoyer <maxim.cournoyer <at> gmail.com> writes:
>
>> Hi Noé,
>>
>> Noé Lopez <noe <at> noé.eu> writes:
>>
>>> Small update,
>>>
>>> I’ve investigated the issue in fibers and I now blame the guile web
>>> library for the issue.  Apparently it sets the port to ISO-8859-1
>>> encoding each time you call read-request, but it acts like « yeah don’t
>>> worry just use utf-8 for your body » in the docs.
>>>
>>> That’s fine UNLESS you use chunked transfers (omitting content-length in
>>> fibers), in which case it just decides to blow up :///// (it assumes one
>>> character = one byte)
>>>
>>> In the end I’m pretty sure any of this could have been avoided by just
>>> not replacing every character with question marks.  Had it kept the
>>> invalid bytes intact they would have translated back with no issue.
>>
>> Nice investigation!  Did you create an issue at bug-guile <at> gnu.org?  I
>> don't see it on the tracker.  Or perhaps this could be tackled from the
>> angle of fibers?  For example by adding a new failing test reproducing
>> the problem to its test suite, and going from there.
>>
>
> I talked about this with Christopher Baines at FOSDEM and he seemed to
> know much more about it than me, so maybe he can suggest a way forward?
>
> Starting with a failing test seems like a good idea.

I've raised a Pull Request which I think should help in fibers:

  https://github.com/wingo/fibers/pull/116

I think this issue should be possible to work around in Mumi as well,
the encoding on the port needs to be set, and I think Guile 3.0.10 needs
to be used.

Information forwarded to bug-mumi <at> gnu.org:
bug#69381; Package mumi. (Tue, 11 Feb 2025 22:13:01 GMT) Full text and rfc822 format available.

Message #33 received at 69381 <at> debbugs.gnu.org (full text, mbox):

From: Felix Lechner <felix.lechner <at> lease-up.com>
To: Christopher Baines <mail <at> cbaines.net>
Cc: Noé Lopez <noe <at> xn--no-cja.eu>, 69381 <at> debbugs.gnu.org,
 Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
Subject: Re: bug#69381: mumi does not correctly display (some?) non-ascii
 characters
Date: Tue, 11 Feb 2025 14:12:33 -0800
Hi,

On Sat, Feb 08 2025, Christopher Baines wrote:

> this issue should be possible to work around in Mumi as well,

What is wrong with my proposed patch, please?  Just because a lambda
will eventually save memory and enable chunking?

Kind regards,
Felix




Information forwarded to bug-mumi <at> gnu.org:
bug#69381; Package mumi. (Tue, 11 Feb 2025 22:43:02 GMT) Full text and rfc822 format available.

Message #36 received at 69381 <at> debbugs.gnu.org (full text, mbox):

From: Christopher Baines <mail <at> cbaines.net>
To: Felix Lechner <felix.lechner <at> lease-up.com>
Cc: 69381 <at> debbugs.gnu.org
Subject: Re: bug#69381: mumi does not correctly display (some?) non-ascii
 characters
Date: Tue, 11 Feb 2025 22:42:28 +0000
Felix Lechner <felix.lechner <at> lease-up.com> writes:

> Hi,
>
> On Sat, Feb 08 2025, Christopher Baines wrote:
>
>> this issue should be possible to work around in Mumi as well,
>
> What is wrong with my proposed patch, please?  Just because a lambda
> will eventually save memory and enable chunking?

I'm late to this thread, but looking at your patch, it looks like it
should work.  There's nothing wrong with it; I was just looking at
fixing the issue with chunked responses.

Information forwarded to bug-mumi <at> gnu.org:
bug#69381; Package mumi. (Wed, 12 Feb 2025 02:50:02 GMT) Full text and rfc822 format available.

Message #39 received at 69381 <at> debbugs.gnu.org (full text, mbox):

From: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
To: Felix Lechner <felix.lechner <at> lease-up.com>
Cc: Noé Lopez <noe <at> xn--no-cja.eu>,
 Christopher Baines <mail <at> cbaines.net>, 69381 <at> debbugs.gnu.org
Subject: Re: bug#69381: mumi does not correctly display (some?) non-ascii
 characters
Date: Wed, 12 Feb 2025 11:49:30 +0900
Hi Felix,

Felix Lechner <felix.lechner <at> lease-up.com> writes:

> Hi,
>
> On Sat, Feb 08 2025, Christopher Baines wrote:
>
>> this issue should be possible to work around in Mumi as well,
>
> What is wrong with my proposed patch, please?  Just because a lambda
> will eventually save memory and enable chunking?

There's nothing wrong with it; I believe it could be pushed already,
while a more definitive fix in fibers or guile is pursued!  I just
haven't gotten round to it yet.  Anyone with commit access please feel
free to beat me to it, and update mumi on berlin with it.

-- 
Thanks,
Maxim




Information forwarded to bug-mumi <at> gnu.org:
bug#69381; Package mumi. (Wed, 12 Feb 2025 13:02:02 GMT) Full text and rfc822 format available.

Message #42 received at 69381 <at> debbugs.gnu.org (full text, mbox):

From: Arun Isaac <arunisaac <at> systemreboot.net>
To: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>, Felix Lechner
 <felix.lechner <at> lease-up.com>
Cc: Noé Lopez <noe <at> xn--no-cja.eu>,
 Christopher Baines <mail <at> cbaines.net>, 69381 <at> debbugs.gnu.org
Subject: Re: bug#69381: mumi does not correctly display (some?) non-ascii
 characters
Date: Wed, 12 Feb 2025 13:01:26 +0000
Hi Maxim,

I am redeploying mumi on berlin today for a bunch of other commits. I
can take this over, if you don't mind.

Regards,
Arun




Reply sent to Maxim Cournoyer <maxim.cournoyer <at> gmail.com>:
You have taken responsibility. (Wed, 12 Feb 2025 15:00:03 GMT) Full text and rfc822 format available.

Notification sent to Tomas Volf <~@wolfsden.cz>:
bug acknowledged by developer. (Wed, 12 Feb 2025 15:00:03 GMT) Full text and rfc822 format available.

Message #47 received at 69381-done <at> debbugs.gnu.org (full text, mbox):

From: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
To: Arun Isaac <arunisaac <at> systemreboot.net>
Cc: Noé Lopez <noe <at> xn--no-cja.eu>,
 Christopher Baines <mail <at> cbaines.net>, 69381-done <at> debbugs.gnu.org,
 Felix Lechner <felix.lechner <at> lease-up.com>
Subject: Re: bug#69381: mumi does not correctly display (some?) non-ascii
 characters
Date: Wed, 12 Feb 2025 23:59:30 +0900
Hi Arun,

Arun Isaac <arunisaac <at> systemreboot.net> writes:

> Hi Maxim,
>
> I am redeploying mumi on berlin today for a bunch of other commits. I
> can take this over, if you don't mind.

Yes, please :-).  Thanks a lot.

-- 
Thanks,
Maxim




Information forwarded to bug-mumi <at> gnu.org:
bug#69381; Package mumi. (Thu, 13 Feb 2025 11:55:01 GMT) Full text and rfc822 format available.

Message #50 received at 69381-done <at> debbugs.gnu.org (full text, mbox):

From: Arun Isaac <arunisaac <at> systemreboot.net>
To: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>, Felix Lechner
 <felix.lechner <at> lease-up.com>, Noé Lopez
 <noe <at> xn--no-cja.eu>, Christopher
 Baines <mail <at> cbaines.net>, 69381-done <at> debbugs.gnu.org
Subject: Re: bug#69381: mumi does not correctly display (some?) non-ascii
 characters
Date: Thu, 13 Feb 2025 11:53:37 +0000
Hi all,

I have updated mumi on berlin. But, berlin needs to be rebooted for a
shepherd update before the mumi update can take effect. And, someone
needs to be at the data center when this happens, just in case there's
any trouble with the reboot. So, this is going to take a little while,
hopefully under a few days or a week.

Regards,
Arun




This bug report was last modified 18 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.