GNU bug report logs - #10857
ucs-insert deals inconsistently with errors

Package: emacs;

Reported by: Juanma Barranquero <lekktu <at> gmail.com>

Date: Mon, 20 Feb 2012 15:58:01 UTC

Severity: minor

Tags: patch

Done: Juri Linkov <juri <at> jurta.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 10857 in the body.
You can then email your comments to 10857 AT debbugs.gnu.org in the normal way.

Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#10857; Package emacs. (Mon, 20 Feb 2012 15:58:01 GMT)

Acknowledgement sent to Juanma Barranquero <lekktu <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Mon, 20 Feb 2012 15:58:01 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Juanma Barranquero <lekktu <at> gmail.com>
To: Bug-Gnu-Emacs <bug-gnu-emacs <at> gnu.org>
Subject: ucs-insert deals inconsistently with errors
Date: Mon, 20 Feb 2012 16:53:56 +0100

Package: emacs
Severity: minor


`ucs-insert' does not deal very consistently with errors.

Two anomalies:

1)  M-x ucs-insert <RET> zzz <RET>   => "Not a Unicode character code: nil"
    Which is caused by `read-char-by-name' not having a way to pass
back what the user really typed. Still, I typed "zzz", not "nil", so
the message is unhelpful.

2) When called from lisp code, it deals differently with erroneous
strings and erroneous non-strings:
    (ucs-insert 'zzz)  =>  "Not a Unicode character code: zzz"   ;; correct
    (ucs-insert "zzz")  =>  any non-hex string is turned into ^@ and
inserted, and no error is produced.

The second problem can be trivially fixed with (not (string-match-p
"[^[:xdigit:]]" character)), though the docstring of `ucs-insert' does
not really say much about the valid forms the CHARACTER arg can take.

    Juanma
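The second anomaly and the proposed guard can be reproduced with a few forms. This is a minimal Emacs Lisp sketch: `my-unchecked-code' and `my-hex-string-p' are hypothetical helper names, not Emacs functions. It relies on `string-to-number' returning 0 for text it cannot parse, which is how "zzz" turns into ^@.

```elisp
;; Hypothetical helper: the unguarded conversion applied to any string.
;; `string-to-number' returns 0 when it finds no digits to parse,
;; so "zzz" becomes code point 0, which is displayed as ^@.
(defun my-unchecked-code (s)
  "Convert S as a hexadecimal number without validating it."
  (string-to-number s 16))

;; Hypothetical helper: the guard proposed in the report.  It is true
;; when S contains no character outside [:xdigit:].
(defun my-hex-string-p (s)
  "Return non-nil if every character of S is a hex digit."
  (not (string-match-p "[^[:xdigit:]]" s)))

(my-unchecked-code "zzz")   ; => 0
(my-unchecked-code "2318")  ; => 8984, i.e. #x2318
(my-hex-string-p "2318")    ; => t
(my-hex-string-p "zzz")     ; => nil
```

One detail of this form of the guard: it also accepts the empty string, because there is nothing in "" for [^[:xdigit:]] to match, and (string-to-number "" 16) is 0 again. A pattern with a + quantifier, such as the one quoted from `read-char-by-name' in the next message, requires at least one digit.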

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#10857; Package emacs. (Tue, 21 Feb 2012 01:19:04 GMT)

Message #8 received at 10857 <at> debbugs.gnu.org:

From: Juri Linkov <juri <at> jurta.org>
To: Juanma Barranquero <lekktu <at> gmail.com>
Cc: 10857 <at> debbugs.gnu.org
Subject: Re: bug#10857: ucs-insert deals inconsistently with errors
Date: Tue, 21 Feb 2012 02:37:38 +0200

> 1)  M-x ucs-insert <RET> zzz <RET>   => "Not a Unicode character code: nil"
>     Which is caused by `read-char-by-name' not having a way to pass
> back what the user really typed. Still, I typed "zzz", not "nil", so
> the message is unhelpful.

Wouldn't it be too weird for `read-char-by-name' to return "zzz"
when the purpose of this function is to return a character,
not a string the user typed?

> 2) When called from lisp code, it deals differently with erroneous
> strings and erroneous non-strings:
>     (ucs-insert 'zzz)  =>  "Not a Unicode character code: zzz"   ;; correct
>     (ucs-insert "zzz")  =>  any non-hex string is turned into ^@ and
> inserted, and no error is produced.
>
> The second problem can be trivially fixed with
> (not (string-match-p "[^[:xdigit:]]" character)),

In `read-char-by-name', the condition for this purpose is:

  (string-match-p "^[0-9a-fA-F]+$" input)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#10857; Package emacs. (Tue, 21 Feb 2012 01:29:01 GMT)

Message #11 received at 10857 <at> debbugs.gnu.org:

From: Juanma Barranquero <lekktu <at> gmail.com>
To: Juri Linkov <juri <at> jurta.org>
Cc: 10857 <at> debbugs.gnu.org
Subject: Re: bug#10857: ucs-insert deals inconsistently with errors
Date: Tue, 21 Feb 2012 02:25:47 +0100

On Tue, Feb 21, 2012 at 01:37, Juri Linkov <juri <at> jurta.org> wrote:

> Wouldn't it be too weird for `read-char-by-name' to return "zzz"
> when the purpose of this function is to return a character,
> not a string the user typed?

Yes. I don't think `read-char-by-name' should return "zzz", I think
`ucs-insert' should not say the "nil" part. Perhaps just "Not a
Unicode character".

>> The second problem can be trivially fixed with
>> (not (string-match-p "[^[:xdigit:]]" character)),
>
> In `read-char-by-name', the condition for this purpose is:
>
>  (string-match-p "^[0-9a-fA-F]+$" input)

They are equivalent, aren't they? But my point was that the docstring
does not say what to expect in CHARACTER.

    Juanma

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#10857; Package emacs. (Tue, 21 Feb 2012 09:19:01 GMT)

Message #14 received at 10857 <at> debbugs.gnu.org:

From: Andreas Schwab <schwab <at> linux-m68k.org>
To: Juanma Barranquero <lekktu <at> gmail.com>
Cc: Juri Linkov <juri <at> jurta.org>, 10857 <at> debbugs.gnu.org
Subject: Re: bug#10857: ucs-insert deals inconsistently with errors
Date: Tue, 21 Feb 2012 10:16:02 +0100

Juanma Barranquero <lekktu <at> gmail.com> writes:

> On Tue, Feb 21, 2012 at 01:37, Juri Linkov <juri <at> jurta.org> wrote:
>
>>> The second problem can be trivially fixed with
>>> (not (string-match-p "[^[:xdigit:]]" character)),
>>
>> In `read-char-by-name', the condition for this purpose is:
>>
>>  (string-match-p "^[0-9a-fA-F]+$" input)
>
> They are equivalent, aren't they?

No.  The latter ignores anything before or after a newline character, as
long as there is a match on the other side of it.  That can be fixed by
using "\\`[0-9a-fA-F]+\\'".

Andreas.

-- 
Andreas Schwab, schwab <at> linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#10857; Package emacs. (Tue, 21 Feb 2012 10:43:02 GMT)

Message #17 received at 10857 <at> debbugs.gnu.org:

From: Juanma Barranquero <lekktu <at> gmail.com>
To: Andreas Schwab <schwab <at> linux-m68k.org>
Cc: Juri Linkov <juri <at> jurta.org>, 10857 <at> debbugs.gnu.org
Subject: Re: bug#10857: ucs-insert deals inconsistently with errors
Date: Tue, 21 Feb 2012 11:39:00 +0100

On Tue, Feb 21, 2012 at 10:16, Andreas Schwab <schwab <at> linux-m68k.org> wrote:

> No.  The latter ignores anything before or after a newline character, as
> long as there is a match on the other side of it.  That can be fixed by
> using "\\`[0-9a-fA-F]+\\'".

I didn't say "identical". That seems like a corner case.

    Juanma
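Andreas's point, and why Juanma calls it a corner case, can be checked directly: in Emacs regexps ^ and $ match at line boundaries anywhere in the string, while \` and \' (written "\\`" and "\\'" inside a Lisp string) match only at its very start and end. A minimal sketch with hypothetical helper names; the difference only shows up when the input contains a newline.

```elisp
;; Hypothetical helper: hex check anchored with ^ and $ (line boundaries).
(defun my-hex-by-line-p (s)
  "Return t if some line of S consists only of hex digits."
  (and (string-match-p "^[0-9a-fA-F]+$" s) t))

;; Hypothetical helper: hex check anchored at the string boundaries.
(defun my-hex-whole-string-p (s)
  "Return t if the whole of S consists only of hex digits."
  (and (string-match-p "\\`[0-9a-fA-F]+\\'" s) t))

;; Hex digits on the line after the newline satisfy ^...$ by themselves.
(my-hex-by-line-p "zzz\n2318")       ; => t
(my-hex-whole-string-p "zzz\n2318")  ; => nil
;; The same holds for hex digits before the newline, and
;; `string-to-number' then silently ignores everything after them.
(my-hex-by-line-p "2318\nzzz")       ; => t
(string-to-number "2318\nzzz" 16)    ; => 8984
;; Without a newline both checks agree.
(list (my-hex-by-line-p "2318") (my-hex-whole-string-p "2318"))  ; => (t t)
```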

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#10857; Package emacs. (Wed, 22 Feb 2012 00:14:02 GMT)

Message #20 received at 10857 <at> debbugs.gnu.org:

From: Juri Linkov <juri <at> jurta.org>
To: Juanma Barranquero <lekktu <at> gmail.com>
Cc: 10857 <at> debbugs.gnu.org
Subject: Re: bug#10857: ucs-insert deals inconsistently with errors
Date: Wed, 22 Feb 2012 02:09:20 +0200

tags 10857 patch
thanks

> Yes. I don't think `read-char-by-name' should return "zzz", I think
> `ucs-insert' should not say the "nil" part. Perhaps just "Not a
> Unicode character".
>
>> In `read-char-by-name', the condition for this purpose is:
>>
>>  (string-match-p "^[0-9a-fA-F]+$" input)
>
> They are equivalent, aren't they? But my point was that the docstring
> does not say what to expect in CHARACTER.

This should be fixed by this patch:

=== modified file 'lisp/international/mule-cmds.el'
--- lisp/international/mule-cmds.el	2012-02-10 19:35:28 +0000
+++ lisp/international/mule-cmds.el	2012-02-22 00:07:34 +0000
@@ -2949,7 +2949,7 @@ (defun read-char-by-name (prompt)
                        '(metadata (category . unicode-name))
                      (complete-with-action action (ucs-names) string pred))))))
     (cond
-     ((string-match-p "^[0-9a-fA-F]+$" input)
+     ((string-match-p "\\`[0-9a-fA-F]+\\'" input)
       (string-to-number input 16))
      ((string-match-p "^#" input)
       (read input))
@@ -2967,6 +2967,10 @@ (defun ucs-insert (character &optional c
 the characters whose names include that substring, not necessarily
 at the beginning of the name.
 
+This function also accepts a hexadecimal number of Unicode code
+point or a number in hash notation, e.g. #o21430 for octal,
+#x2318 for hex, or #10r8984 for decimal.
+
 The optional third arg INHERIT (non-nil when called interactively),
 says to inherit text properties from adjoining text, if those
 properties are sticky."
@@ -2975,9 +2979,12 @@ (defun ucs-insert (character &optional c
 	 (prefix-numeric-value current-prefix-arg)
 	 t))
   (unless count (setq count 1))
-  (if (stringp character)
+  (if (and (stringp character)
+	   (string-match-p "\\`[0-9a-fA-F]+\\'" character))
       (setq character (string-to-number character 16)))
   (cond
+   ((null character)
+    (error "Not a Unicode character"))
    ((not (integerp character))
     (error "Not a Unicode character code: %S" character))
    ((or (< character 0) (> character #x10FFFF))
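The effect of the patch on `ucs-insert' can be sketched as a standalone function that runs the same checks but returns the code point instead of inserting it. This is a hedged sketch: `my-ucs-check' is a hypothetical name, and because the excerpt above stops at the range test, the out-of-range message here is placeholder wording, not the text in mule-cmds.el. The final form evaluates the hash-notation examples from the new docstring with `read', which is what the "#" branch of `read-char-by-name' uses.

```elisp
(defun my-ucs-check (character)
  "Return CHARACTER as a code point, or signal an error.
Hypothetical helper mirroring the argument checks of the patched
`ucs-insert'."
  ;; Convert only strings made up entirely of hex digits.
  (if (and (stringp character)
           (string-match-p "\\`[0-9a-fA-F]+\\'" character))
      (setq character (string-to-number character 16)))
  (cond
   ;; Per the report, `read-char-by-name' yields nil for input like "zzz".
   ((null character)
    (error "Not a Unicode character"))
   ((not (integerp character))
    (error "Not a Unicode character code: %S" character))
   ((or (< character 0) (> character #x10FFFF))
    ;; Placeholder wording; the excerpt does not show the real message.
    (error "Code point out of range: %S" character))
   (t character)))

(my-ucs-check "2318")  ; => 8984
;; Error messages, caught so that the examples can be evaluated in order.
(condition-case err (my-ucs-check nil)
  (error (error-message-string err)))   ; => "Not a Unicode character"
(condition-case err (my-ucs-check "zzz")
  (error (error-message-string err)))   ; => "Not a Unicode character code: \"zzz\""
(condition-case err (my-ucs-check 'zzz)
  (error (error-message-string err)))   ; => "Not a Unicode character code: zzz"

;; Hash notation from the new docstring: octal, hex and radix 10.
(mapcar #'read '("#o21430" "#x2318" "#10r8984"))  ; => (8984 8984 8984)
```

With these checks, (ucs-insert "zzz") signals the same kind of error as (ucs-insert 'zzz) instead of inserting ^@, and the interactive case no longer reports "nil".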

Added tag(s) patch. Request was from Juri Linkov <juri <at> jurta.org> to control <at> debbugs.gnu.org. (Wed, 22 Feb 2012 00:14:02 GMT)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#10857; Package emacs. (Wed, 22 Feb 2012 09:07:01 GMT)

Message #25 received at 10857 <at> debbugs.gnu.org:

From: Andreas Schwab <schwab <at> linux-m68k.org>
To: Juri Linkov <juri <at> jurta.org>
Cc: Juanma Barranquero <lekktu <at> gmail.com>, 10857 <at> debbugs.gnu.org
Subject: Re: bug#10857: ucs-insert deals inconsistently with errors
Date: Wed, 22 Feb 2012 10:03:42 +0100

Juri Linkov <juri <at> jurta.org> writes:

> This should be fixed by this patch:
>
> === modified file 'lisp/international/mule-cmds.el'
> --- lisp/international/mule-cmds.el	2012-02-10 19:35:28 +0000
> +++ lisp/international/mule-cmds.el	2012-02-22 00:07:34 +0000
> @@ -2949,7 +2949,7 @@ (defun read-char-by-name (prompt)
>                         '(metadata (category . unicode-name))
>                       (complete-with-action action (ucs-names) string pred))))))
>      (cond
> -     ((string-match-p "^[0-9a-fA-F]+$" input)
> +     ((string-match-p "\\`[0-9a-fA-F]+\\'" input)
>        (string-to-number input 16))
>       ((string-match-p "^#" input)

This should also use \`.

Andreas.

-- 
Andreas Schwab, schwab <at> linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

Reply sent to Juri Linkov <juri <at> jurta.org>:
You have taken responsibility. (Wed, 22 Feb 2012 23:39:02 GMT)

Notification sent to Juanma Barranquero <lekktu <at> gmail.com>:
bug acknowledged by developer. (Wed, 22 Feb 2012 23:39:02 GMT)

Message #30 received at 10857-done <at> debbugs.gnu.org:

From: Juri Linkov <juri <at> jurta.org>
To: Andreas Schwab <schwab <at> linux-m68k.org>
Cc: Juanma Barranquero <lekktu <at> gmail.com>, 10857-done <at> debbugs.gnu.org
Subject: Re: bug#10857: ucs-insert deals inconsistently with errors
Date: Thu, 23 Feb 2012 01:35:30 +0200

>>       ((string-match-p "^#" input)
>
> This should also use \`.

All right, installed.
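Andreas's last remark concerns the "^#" test in `read-char-by-name', which decides whether the input is handed to `read'. A minimal sketch of what the unanchored version lets through; `my-input' is a hypothetical value chosen only for illustration, and the thread does not show the installed line itself.

```elisp
;; Hypothetical input: a name-like word, a newline, then hash notation.
(defvar my-input "zzz\n#x41"
  "Example minibuffer input, for illustration only.")

;; ^ also matches right after the newline, so the old test succeeds.
(string-match-p "^#" my-input)    ; => 4
;; With the \` anchor the match must start at position 0, which holds "z".
(string-match-p "\\`#" my-input)  ; => nil
;; `read' returns the first object in the string, the symbol zzz,
;; which `ucs-insert' then rejects as "Not a Unicode character code: zzz".
(read my-input)                   ; => zzz
;; Input that really starts with "#" still reads as a number.
(read "#x41")                     ; => 65
```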

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 22 Mar 2012 11:24:04 GMT)
