GNU bug report logs - #12051
24.1; rcirc-send-message doesn't take multibyte into account.


Package: emacs;

Reported by: Li Ian-Xue <b4283 <at> bephor.org>

Date: Thu, 26 Jul 2012 01:17:02 UTC

Severity: normal

Found in version 24.1

Done: Leo <sdl.web <at> gmail.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 12051 in the body.
You can then email your comments to 12051 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#12051; Package emacs. (Thu, 26 Jul 2012 01:17:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Li Ian-Xue <b4283 <at> bephor.org>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Thu, 26 Jul 2012 01:17:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Li Ian-Xue <b4283 <at> bephor.org>
To: bug-gnu-emacs <at> gnu.org
Subject: 24.1; rcirc-send-message doesn't take multibyte into account.
Date: Thu, 26 Jul 2012 00:18:29 +0800
[Message part 1 (text/plain, inline)]
Hello developers,

I recently discovered that the IRC client `rcirc', although it has a
max-message-length setting, simply uses (length str) to measure the
output length. This is undesirable for multibyte users because our
characters usually encode to more than one byte, so the client can
actually send more bytes than the standard allows (512 bytes, to my
understanding).

This limit is easily reached, since Chinese characters are usually
encoded with 3 bytes per character.

When this happens, if the server truncates the resulting string purely
by bytes, the string is known to show up entirely scrambled in xchat.

I'm attaching a patch that performs a binary search for multibyte
strings. It should not have any penalty for existing ASCII users, since
it begins with a (multibyte-string-p) check to decide which style to use.

[rcirc-fix-multibyte-overflow.patch (text/x-patch, inline)]
--- rcirc.el	2012-07-25 23:52:41.813226461 +0800
+++ rcirc-1.el	2012-07-25 23:55:20.813220626 +0800
@@ -792,21 +792,40 @@
 (defvar rcirc-max-message-length 420
   "Messages longer than this value will be split.")
 
+(defun rcirc-multibyte-position-at-byte (str bytes)
+  (if (multibyte-string-p str)
+      (rcirc-multibyte-position-at-byte-1 str bytes 0 0)
+      bytes))
+
+(defun rcirc-multibyte-position-at-byte-1 (str bytes now-chars now-bytes)
+  (let ((len (length str)))
+    (if (<= len 1)
+        now-chars
+      (let* ((half-len (/ len 2))
+             (lstr (substring str 0 half-len))
+             (rstr (substring str half-len len))
+             (now-bytes-1 (+ now-bytes (string-bytes lstr))))
+        (if (> now-bytes-1 bytes)
+            (rcirc-multibyte-position-at-byte-1 lstr bytes now-chars now-bytes)
+          (rcirc-multibyte-position-at-byte-1 rstr bytes (+ half-len now-chars) now-bytes-1))))))
+
 (defun rcirc-send-message (process target message &optional noticep silent)
   "Send TARGET associated with PROCESS a privmsg with text MESSAGE.
 If NOTICEP is non-nil, send a notice instead of privmsg.
 If SILENT is non-nil, do not print the message in any irc buffer."
   ;; max message length is 512 including CRLF
   (let* ((response (if noticep "NOTICE" "PRIVMSG"))
-         (oversize (> (length message) rcirc-max-message-length))
+         (oversize (> (string-bytes message) rcirc-max-message-length))
+         (adjusted-pos (if oversize
+                            (rcirc-multibyte-position-at-byte message rcirc-max-message-length)))
          (text (if oversize
-                   (substring message 0 rcirc-max-message-length)
+                   (substring message 0 adjusted-pos)
                  message))
          (text (if (string= text "")
                    " "
                  text))
          (more (if oversize
-                   (substring message rcirc-max-message-length))))
+                   (substring message adjusted-pos))))
     (rcirc-get-buffer-create process target)
     (rcirc-send-string process (concat response " " target " :" text))
     (unless silent
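
To make the character/byte distinction in the report concrete, here is a minimal sketch (not part of either patch in this thread). The sample string is an assumption chosen for illustration; its two CJK characters are written as \u escapes so the snippet does not depend on how the file itself is encoded.

```elisp
;; Illustrative only: compare the character count with byte counts.
;; "\u4f60\u597d" is two CJK characters, followed by seven ASCII characters.
(let ((msg "\u4f60\u597d, rcirc"))
  (list (length msg)                                  ; characters
        (string-bytes msg)                            ; bytes in Emacs's internal representation
        (length (encode-coding-string msg 'utf-8))))  ; bytes once encoded as UTF-8
;; => (9 13 13)
```

A check based on `length' would count this message as 9 units, while 13 bytes would go out on the wire under UTF-8.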

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12051; Package emacs. (Mon, 13 Aug 2012 02:05:01 GMT) Full text and rfc822 format available.

Message #8 received at 12051 <at> debbugs.gnu.org (full text, mbox):

From: Leo <sdl.web <at> gmail.com>
To: Li Ian-Xue <b4283 <at> bephor.org>
Cc: 12051 <at> debbugs.gnu.org
Subject: Re: bug#12051: 24.1;
	rcirc-send-message doesn't take multibyte into account.
Date: Mon, 13 Aug 2012 09:53:42 +0800
[Message part 1 (text/plain, inline)]
On 2012-07-26 00:18 +0800, Li Ian-Xue wrote:
> I recently discovered that the IRC client `rcirc', although it has a
> max-message-length setting, simply uses (length str) to measure the
> output length. This is undesirable for multibyte users because our
> characters usually encode to more than one byte, so the client can
> actually send more bytes than the standard allows (512 bytes, to my
> understanding).

Could you test if the attached patch fixes this problem? Thanks.

[0001-Fix-bug-12051.patch (text/x-patch, inline)]
diff --git a/lisp/net/rcirc.el b/lisp/net/rcirc.el
index e34b7c79..19f54a8e 100644
--- a/lisp/net/rcirc.el
+++ b/lisp/net/rcirc.el
@@ -794,26 +794,34 @@ (defun rcirc-buffer-nick (&optional buffer)
 (defvar rcirc-max-message-length 420
   "Messages longer than this value will be split.")
 
+(defun rcirc-split-message (message)
+  (with-temp-buffer
+    (insert message)
+    (goto-char (point-min))
+    (let (result)
+      (while (not (eobp))
+	(goto-char (or (byte-to-position rcirc-max-message-length)
+		       (point-max)))
+	(while (and (not (bobp))
+		    (> (length
+			(encode-coding-region (point-min) (point)
+					      rcirc-encode-coding-system t))
+		       rcirc-max-message-length))
+	  (forward-char -1))
+	(push (delete-and-extract-region (point-min) (point)) result))
+      (nreverse result))))
+
 (defun rcirc-send-message (process target message &optional noticep silent)
   "Send TARGET associated with PROCESS a privmsg with text MESSAGE.
 If NOTICEP is non-nil, send a notice instead of privmsg.
 If SILENT is non-nil, do not print the message in any irc buffer."
   ;; max message length is 512 including CRLF
-  (let* ((response (if noticep "NOTICE" "PRIVMSG"))
-         (oversize (> (length message) rcirc-max-message-length))
-         (text (if oversize
-                   (substring message 0 rcirc-max-message-length)
-                 message))
-         (text (if (string= text "")
-                   " "
-                 text))
-         (more (if oversize
-                   (substring message rcirc-max-message-length))))
+  (let ((response (if noticep "NOTICE" "PRIVMSG")))
     (rcirc-get-buffer-create process target)
-    (rcirc-send-string process (concat response " " target " :" text))
-    (unless silent
-      (rcirc-print process (rcirc-nick process) response target text))
-    (when more (rcirc-send-message process target more noticep))))
+    (dolist (msg (rcirc-split-message message))
+      (rcirc-send-string process (concat response " " target " :" msg))
+      (unless silent
+	(rcirc-print process (rcirc-nick process) response target msg)))))
 
 (defvar rcirc-input-ring nil)
 (defvar rcirc-input-ring-index 0)
-- 
1.7.9.6 (Apple Git-31.1)


Reply sent to Leo <sdl.web <at> gmail.com>:
You have taken responsibility. (Tue, 14 Aug 2012 13:21:01 GMT) Full text and rfc822 format available.

Notification sent to Li Ian-Xue <b4283 <at> bephor.org>:
bug acknowledged by developer. (Tue, 14 Aug 2012 13:21:02 GMT) Full text and rfc822 format available.

Message #13 received at 12051-done <at> debbugs.gnu.org (full text, mbox):

From: Leo <sdl.web <at> gmail.com>
To: 12051-done <at> debbugs.gnu.org
Subject: Re: bug#12051: 24.1;
	rcirc-send-message doesn't take multibyte into account.
Date: Tue, 14 Aug 2012 21:11:15 +0800
Fixed in emacs 24.2




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12051; Package emacs. (Tue, 14 Aug 2012 15:55:02 GMT) Full text and rfc822 format available.

Message #16 received at 12051 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Leo <sdl.web <at> gmail.com>
Cc: b4283 <at> bephor.org, 12051 <at> debbugs.gnu.org
Subject: Re: bug#12051: 24.1;
	rcirc-send-message doesn't take multibyte into account.
Date: Tue, 14 Aug 2012 18:45:46 +0300
> From: Leo <sdl.web <at> gmail.com>
> Date: Mon, 13 Aug 2012 09:53:42 +0800
> Cc: 12051 <at> debbugs.gnu.org
> 
> Could you test if the attached patch fixes this problem? Thanks.
> 
> 
> [2:text/x-patch Hide]
> 
> diff --git a/lisp/net/rcirc.el b/lisp/net/rcirc.el
> index e34b7c79..19f54a8e 100644
> --- a/lisp/net/rcirc.el
> +++ b/lisp/net/rcirc.el
> @@ -794,26 +794,34 @@ (defun rcirc-buffer-nick (&optional buffer)
>  (defvar rcirc-max-message-length 420
>    "Messages longer than this value will be split.")
>  
> +(defun rcirc-split-message (message)
> +  (with-temp-buffer
> +    (insert message)
> +    (goto-char (point-min))
> +    (let (result)
> +      (while (not (eobp))
> +	(goto-char (or (byte-to-position rcirc-max-message-length)
> +		       (point-max)))
> +	(while (and (not (bobp))
> +		    (> (length
> +			(encode-coding-region (point-min) (point)
> +					      rcirc-encode-coding-system t))
> +		       rcirc-max-message-length))
> +	  (forward-char -1))
> +	(push (delete-and-extract-region (point-min) (point)) result))
> +      (nreverse result))))
> +
>  (defun rcirc-send-message (process target message &optional noticep silent)
>    "Send TARGET associated with PROCESS a privmsg with text MESSAGE.
>  If NOTICEP is non-nil, send a notice instead of privmsg.
>  If SILENT is non-nil, do not print the message in any irc buffer."
>    ;; max message length is 512 including CRLF
> -  (let* ((response (if noticep "NOTICE" "PRIVMSG"))
> -         (oversize (> (length message) rcirc-max-message-length))
> -         (text (if oversize
> -                   (substring message 0 rcirc-max-message-length)
> -                 message))
> -         (text (if (string= text "")
> -                   " "
> -                 text))
> -         (more (if oversize
> -                   (substring message rcirc-max-message-length))))
> +  (let ((response (if noticep "NOTICE" "PRIVMSG")))
>      (rcirc-get-buffer-create process target)
> -    (rcirc-send-string process (concat response " " target " :" text))
> -    (unless silent
> -      (rcirc-print process (rcirc-nick process) response target text))
> -    (when more (rcirc-send-message process target more noticep))))
> +    (dolist (msg (rcirc-split-message message))
> +      (rcirc-send-string process (concat response " " target " :" msg))
> +      (unless silent
> +	(rcirc-print process (rcirc-nick process) response target msg)))))

Isn't it better and simpler to split the string after it is encoded
inside rcirc-send-string?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12051; Package emacs. (Wed, 15 Aug 2012 03:07:01 GMT) Full text and rfc822 format available.

Message #19 received at 12051 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Li Ian-Xue <b4283 <at> bephor.org>
Cc: 12051 <at> debbugs.gnu.org
Subject: Re: bug#12051: 24.1;
	rcirc-send-message doesn't take multibyte into account.
Date: Wed, 15 Aug 2012 05:57:55 +0300
> From: Li Ian-Xue <b4283 <at> bephor.org>
> Date: Wed, 15 Aug 2012 09:59:05 +0800
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> > Isn't it better and simpler to split the string after it is encoded
> > inside rcirc-send-string?
> But by then one wouldn't know at which byte to cut. For example when
> it's a multibyte-ascii mixed string.

You can always stop at known characters, like whitespace.

Anyway, another way is simply to assume the worst possible expansion
ratio, and estimate the byte count before encoding.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12051; Package emacs. (Wed, 15 Aug 2012 13:20:02 GMT) Full text and rfc822 format available.

Message #22 received at 12051 <at> debbugs.gnu.org (full text, mbox):

From: Leo <sdl.web <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Li Ian-Xue <b4283 <at> bephor.org>, 12051 <at> debbugs.gnu.org
Subject: Re: bug#12051: 24.1;
	rcirc-send-message doesn't take multibyte into account.
Date: Wed, 15 Aug 2012 21:10:45 +0800
On 2012-08-15 10:57 +0800, Eli Zaretskii wrote:
> You can always stop at known characters, like whitespace.
>
> Anyway, another way is simply to assume the worst possible expansion
> ratio, and estimate the byte count before encoding.

I make a small optimisation that calls encode-coding-region char by
char. Comments?

=== modified file 'lisp/net/rcirc.el'
--- lisp/net/rcirc.el	2012-08-15 12:26:48 +0000
+++ lisp/net/rcirc.el	2012-08-15 13:06:15 +0000
@@ -797,22 +797,24 @@
 (defun rcirc-split-message (message)
   "Split MESSAGE into chunks within `rcirc-max-message-length'."
   ;; `rcirc-encode-coding-system' can have buffer-local value.
-  (let ((encoding rcirc-encode-coding-system))
+  (let ((encoding rcirc-encode-coding-system)
+	result oversize)
     (with-temp-buffer
       (insert message)
       (goto-char (point-min))
-      (let (result)
-	(while (not (eobp))
-	  (goto-char (or (byte-to-position rcirc-max-message-length)
-			 (point-max)))
-	  ;; max message length is 512 including CRLF
-	  (while (and (not (bobp))
-		      (> (length (encode-coding-region
-				  (point-min) (point) encoding t))
-			 rcirc-max-message-length))
-	    (forward-char -1))
-	  (push (delete-and-extract-region (point-min) (point)) result))
-	(nreverse result)))))
+      (while (not (eobp))
+	(goto-char (or (byte-to-position rcirc-max-message-length)
+		       (point-max)))
+	;; Max message length is 512 including CRLF
+	(setq oversize (- (length (encode-coding-region
+				   (point-min) (point) encoding t))
+			  rcirc-max-message-length))
+	(while (and (not (bobp)) (> oversize 0))
+	  (decf oversize (length (encode-coding-region
+				  (1- (point)) (point) encoding t)))
+	  (forward-char -1))
+	(push (delete-and-extract-region (point-min) (point)) result)))
+    (nreverse result)))
 
 (defun rcirc-send-message (process target message &optional noticep silent)
   "Send TARGET associated with PROCESS a privmsg with text MESSAGE.





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12051; Package emacs. (Wed, 15 Aug 2012 13:38:02 GMT) Full text and rfc822 format available.

Message #25 received at 12051 <at> debbugs.gnu.org (full text, mbox):

From: Chong Yidong <cyd <at> gnu.org>
To: Leo <sdl.web <at> gmail.com>
Cc: Li Ian-Xue <b4283 <at> bephor.org>, Eli Zaretskii <eliz <at> gnu.org>,
	12051 <at> debbugs.gnu.org
Subject: Re: bug#12051: 24.1;
	rcirc-send-message doesn't take multibyte into account.
Date: Wed, 15 Aug 2012 21:29:04 +0800
Leo <sdl.web <at> gmail.com> writes:

> I make a small optimisation that calls encode-coding-region char by
> char. Comments?

If you are still working on this bug and not really sure what's the best
way to solve it, please revert it ASAP and move the bugfix to the trunk.
It is holding up the 24.2 release.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12051; Package emacs. (Wed, 15 Aug 2012 15:55:02 GMT) Full text and rfc822 format available.

Message #28 received at 12051 <at> debbugs.gnu.org (full text, mbox):

From: Chong Yidong <cyd <at> gnu.org>
To: Leo <sdl.web <at> gmail.com>
Cc: Li Ian-Xue <b4283 <at> bephor.org>, Eli Zaretskii <eliz <at> gnu.org>,
	12051 <at> debbugs.gnu.org
Subject: Re: bug#12051: 24.1;
	rcirc-send-message doesn't take multibyte into account.
Date: Wed, 15 Aug 2012 23:45:19 +0800
Chong Yidong <cyd <at> gnu.org> writes:

> Leo <sdl.web <at> gmail.com> writes:
>
>> I make a small optimisation that calls encode-coding-region char by
>> char. Comments?
>
> If you are still working on this bug and not really sure what's the best
> way to solve it, please revert it ASAP and move the bugfix to the trunk.
> It is holding up the 24.2 release.

OK, I just took a look at the relevant patches.  Personally I think the
previous patch is easier to understand, so unless this new version has
significantly improved performance, I wouldn't bother with it.  But in
any case, please commit future versions of this fix to the trunk, not
the branch, unless you find a regression with it.  Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12051; Package emacs. (Wed, 15 Aug 2012 17:09:01 GMT) Full text and rfc822 format available.

Message #31 received at 12051 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Leo <sdl.web <at> gmail.com>
Cc: b4283 <at> bephor.org, 12051 <at> debbugs.gnu.org
Subject: Re: bug#12051: 24.1;
	rcirc-send-message doesn't take multibyte into account.
Date: Wed, 15 Aug 2012 19:59:36 +0300
> From:  Leo <sdl.web <at> gmail.com>
> Cc: Li Ian-Xue <b4283 <at> bephor.org>,  12051 <at> debbugs.gnu.org
> Date: Wed, 15 Aug 2012 21:10:45 +0800
> 
> On 2012-08-15 10:57 +0800, Eli Zaretskii wrote:
> > You can always stop at known characters, like whitespace.
> >
> > Anyway, another way is simply to assume the worst possible expansion
> > ratio, and estimate the byte count before encoding.
> 
> I make a small optimisation that calls encode-coding-region char by
> char. Comments?

I think it's terribly inefficient to encode the string one character
at a time.

Again, assuming the worst expansion is IMO much better, and will not
force you to encode in a loop.  But that's me.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12051; Package emacs. (Wed, 15 Aug 2012 18:05:02 GMT) Full text and rfc822 format available.

Message #34 received at 12051 <at> debbugs.gnu.org (full text, mbox):

From: Li Ian-Xue <b4283 <at> bephor.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 12051 <at> debbugs.gnu.org, Leo <sdl.web <at> gmail.com>
Subject: Re: bug#12051: 24.1;
	rcirc-send-message doesn't take multibyte into account.
Date: Thu, 16 Aug 2012 01:53:55 +0800
Eli Zaretskii <eliz <at> gnu.org> writes:

>> I make a small optimisation that calls encode-coding-region char by
>> char. Comments?
> I think it's terribly inefficient to encode the string one character
> at a time.
> Again, assuming the worst expansion is IMO much better, and will not
> force you to encode in a loop.  But that's me.

Why is no one interested in the patch I submitted in July?
That patch used a binary search.

http://lists.gnu.org/archive/html/bug-gnu-emacs/2012-07/msg00937.html

--
PS: it might need a little rework because the mailing list wrapped the
lines. I'll re-post it on a pastebin if anyone is still interested.

http://paste.debian.net/?show=183709





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12051; Package emacs. (Wed, 15 Aug 2012 23:04:01 GMT) Full text and rfc822 format available.

Message #37 received at 12051 <at> debbugs.gnu.org (full text, mbox):

From: Leo <sdl.web <at> gmail.com>
To: Li Ian-Xue <b4283 <at> bephor.org>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 12051 <at> debbugs.gnu.org
Subject: Re: bug#12051: 24.1;
	rcirc-send-message doesn't take multibyte into account.
Date: Thu, 16 Aug 2012 06:54:17 +0800
On 2012-08-16 01:53 +0800, Li Ian-Xue wrote:
> Why is no one interested in the patch I submitted in July?
> That patch used a binary search.
>
> http://lists.gnu.org/archive/html/bug-gnu-emacs/2012-07/msg00937.html

Note that your patch works with Emacs's internal encoding and does not
take into account that rcirc allows its own encoding
(`rcirc-encode-coding-system').

Leo
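
As a hedged illustration of this point (a sketch, not code from the thread): `string-bytes' reports the size of Emacs's internal representation, which need not equal the number of bytes produced by the coding system used for sending. The sample word and the choice of `iso-8859-1' are assumptions made for the example.

```elisp
;; Illustrative only: the internal byte size can differ from the
;; encoded size, depending on the outgoing coding system.
(let ((msg "caf\u00e9"))                 ; four characters, one non-ASCII
  (list (length msg)                                        ; characters
        (string-bytes msg)                                  ; internal bytes
        (length (encode-coding-string msg 'utf-8))          ; bytes as UTF-8
        (length (encode-coding-string msg 'iso-8859-1))))   ; bytes as Latin-1
;; => (4 5 5 4)
```

Here the internal size agrees with UTF-8 but overestimates the Latin-1 size by one byte, which is why a split point computed from `string-bytes' alone is only exact when the outgoing encoding is UTF-8.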




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12051; Package emacs. (Wed, 15 Aug 2012 23:05:02 GMT) Full text and rfc822 format available.

Message #40 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Leo <sdl.web <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: Re: bug#12051: 24.1;
	rcirc-send-message doesn't take multibyte into account.
Date: Thu, 16 Aug 2012 06:55:30 +0800
On 2012-08-15 23:45 +0800, Chong Yidong wrote:
> OK, I just took a look at the relevant patches.  Personally I think the
> previous patch is easier to understand, so unless this new version has
> significantly improved performance, I wouldn't bother with it.  But in
> any case, please commit future versions of this fix to the trunk, not
> the branch, unless you find a regression with it.  Thanks.

OK. Optimisation if any will be done after the 24.2 release.

Leo





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12051; Package emacs. (Wed, 15 Aug 2012 23:10:01 GMT) Full text and rfc822 format available.

Message #43 received at 12051 <at> debbugs.gnu.org (full text, mbox):

From: Leo <sdl.web <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 12051 <at> debbugs.gnu.org
Subject: Re: bug#12051: 24.1;
	rcirc-send-message doesn't take multibyte into account.
Date: Thu, 16 Aug 2012 07:00:39 +0800
On 2012-08-16 00:59 +0800, Eli Zaretskii wrote:
> Again, assuming the worst expansion is IMO much better, and will not
> force you to encode in a loop.  But that's me.

By worst expansion, do you mean assuming each char to be 5 bytes? Please
feel free to make the improvement (after 24.2 release per yidong's
request).

Leo




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12051; Package emacs. (Thu, 16 Aug 2012 03:00:02 GMT) Full text and rfc822 format available.

Message #46 received at 12051 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Leo <sdl.web <at> gmail.com>
Cc: 12051 <at> debbugs.gnu.org
Subject: Re: bug#12051: 24.1;
	rcirc-send-message doesn't take multibyte into account.
Date: Thu, 16 Aug 2012 05:50:52 +0300
> From:  Leo <sdl.web <at> gmail.com>
> Cc: 12051 <at> debbugs.gnu.org
> Date: Thu, 16 Aug 2012 07:00:39 +0800
> 
> On 2012-08-16 00:59 +0800, Eli Zaretskii wrote:
> > Again, assuming the worst expansion is IMO much better, and will not
> > force you to encode in a loop.  But that's me.
> 
> By worst expansion, do you mean assuming each char to be 5 bytes?

Yes.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12051; Package emacs. (Thu, 16 Aug 2012 03:26:01 GMT) Full text and rfc822 format available.

Message #49 received at 12051 <at> debbugs.gnu.org (full text, mbox):

From: Leo <sdl.web <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 12051 <at> debbugs.gnu.org
Subject: Re: bug#12051: 24.1;
	rcirc-send-message doesn't take multibyte into account.
Date: Thu, 16 Aug 2012 11:16:04 +0800
On 2012-08-16 10:50 +0800, Eli Zaretskii wrote:
>> By worst expansion, do you mean assuming each char to be 5 bytes?
>
> Yes.

That will split English text at a boundary of 84 chars, which seems
sub-optimal.

In the current implementation of rcirc-split-message, the inner loop
might not be run if the encoding is utf-8, which we can assume to be 90%
of the cases. So my suggestion is to leave it alone until we hit a real
case of inefficiency. What do you think?

Leo
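
The arithmetic behind the 84-character figure can be sketched as follows. This is only an illustration of the worst-case-expansion idea discussed above, not code that was proposed or committed; `my-split-worst-case' and its arguments are hypothetical names.

```elisp
;; Hypothetical helper: split by a fixed number of characters, chosen so
;; that even if every character took BYTES-PER-CHAR bytes the chunk would
;; still fit in MAX-BYTES.  No encoding is performed at all.
(defun my-split-worst-case (message max-bytes bytes-per-char)
  "Split MESSAGE into chunks of at most MAX-BYTES / BYTES-PER-CHAR characters."
  (let ((step (max 1 (/ max-bytes bytes-per-char))) ; 420 / 5 = 84
        (pos 0)
        (len (length message))
        chunks)
    (while (< pos len)
      (push (substring message pos (min len (+ pos step))) chunks)
      (setq pos (+ pos step)))
    (nreverse chunks)))

;; A 100-character ASCII message is cut after 84 characters:
(mapcar #'length (my-split-worst-case (make-string 100 ?a) 420 5))
;; => (84 16)
```

The trade-off is visible in the example: plain ASCII text, which would fit 420 characters per message, is split five times as often.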




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12051; Package emacs. (Thu, 16 Aug 2012 15:31:02 GMT) Full text and rfc822 format available.

Message #52 received at 12051 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Leo <sdl.web <at> gmail.com>
Cc: 12051 <at> debbugs.gnu.org
Subject: Re: bug#12051: 24.1;
	rcirc-send-message doesn't take multibyte into account.
Date: Thu, 16 Aug 2012 18:20:40 +0300
> From: Leo <sdl.web <at> gmail.com>
> Cc: 12051 <at> debbugs.gnu.org
> Date: Thu, 16 Aug 2012 11:16:04 +0800
> 
> On 2012-08-16 10:50 +0800, Eli Zaretskii wrote:
> >> By worst expansion, do you mean assuming each char to be 5 bytes?
> >
> > Yes.
> 
> That will split English text at a boundary of 84 chars, which seems
> sub-optimal.

Why is it suboptimal?  (I don't know anything about rcirc.)

If it's important to be better in this case, you could detect it
(e.g., by matching the string against [:ascii:]).

Another idea is to use string-bytes to find out where to break a string
on a character boundary without exceeding the maximum allowed byte
count in a message.

> In the current implementation of rcirc-split-message, the inner loop
> might not be run if the encoding is utf-8, which we can assume to be 90%
> of the cases. So my suggestion is to leave it alone until we hit a real
> case of inefficiency. What do you think?

I'm okay with the current code if you are, but I still think a more
elegant solution should be possible.
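
One possible reading of the `string-bytes' idea, as a rough sketch under stated assumptions rather than the code that went into Emacs: walk the string one character at a time, add up each character's internal byte size, and start a new chunk before the limit would be exceeded. `my-split-by-bytes' is a hypothetical name, and the sketch measures Emacs's internal representation, so it only matches the wire size when messages are sent as UTF-8.

```elisp
;; Hypothetical helper: split on character boundaries so that each
;; chunk's internal byte size (per `string-bytes') is at most MAX-BYTES.
(defun my-split-by-bytes (message max-bytes)
  "Split MESSAGE into chunks of at most MAX-BYTES internal bytes."
  (let ((chunks '())
        (start 0)                       ; index where the current chunk begins
        (bytes 0)                       ; bytes accumulated in the current chunk
        (pos 0)
        (len (length message)))
    (while (< pos len)
      (let ((n (string-bytes (char-to-string (aref message pos)))))
        ;; Close the current chunk if this character would overflow it.
        (when (and (> (+ bytes n) max-bytes) (> pos start))
          (push (substring message start pos) chunks)
          (setq start pos
                bytes 0))
        (setq bytes (+ bytes n)
              pos (1+ pos))))
    (when (< start len)
      (push (substring message start) chunks))
    (nreverse chunks)))

;; Four 3-byte characters with a 6-byte limit give two chunks of two
;; characters each:
(mapcar #'length (my-split-by-bytes "\u4f60\u597d\u4e16\u754c" 6))
;; => (2 2)
```

Unlike the worst-case estimate, this keeps ASCII text at the full limit per chunk, at the cost of one `string-bytes' call per character.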




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12051; Package emacs. (Thu, 16 Aug 2012 17:27:02 GMT) Full text and rfc822 format available.

Message #55 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Leo <sdl.web <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: Re: bug#12051: 24.1;
	rcirc-send-message doesn't take multibyte into account.
Date: Fri, 17 Aug 2012 01:17:25 +0800
On 2012-08-16 23:20 +0800, Eli Zaretskii wrote:
> I'm okay with the current code if you are, but I still think a more
> elegant solution should be possible.

OK let's leave it here. Thanks for suggesting.

Leo





bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 14 Sep 2012 11:24:04 GMT) Full text and rfc822 format available.

This bug report was last modified 11 years and 247 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.