GNU bug report logs - #65496
30.0.50; Issue with the regexp used to auto-detect PBM image data

Package: emacs;

Reported by: David Ponce <da_vid <at> orange.fr>

Date: Thu, 24 Aug 2023 10:56:02 UTC

Severity: normal

Tags: patch

Found in version 30.0.50

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#65496; Package emacs. (Thu, 24 Aug 2023 10:56:02 GMT)

Acknowledgement sent to David Ponce <da_vid <at> orange.fr>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Thu, 24 Aug 2023 10:56:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: David Ponce <da_vid <at> orange.fr>
To: bug-gnu-emacs <at> gnu.org
Subject: 30.0.50; Issue with the regexp used to auto-detect PBM image data
Date: Thu, 24 Aug 2023 12:55:03 +0200
Hello,

While experimenting with code to create images from data, I encountered
an issue with the regexp in `image-type-header-regexps' used to
auto-detect the PBM image type from the first bytes of image data. That is:

"\\`P[1-6]\\(?:\
\\(?:\\(?:#[^\r\n]*[\r\n]\\)*[[:space:]]\\)+\
\\(?:\\(?:#[^\r\n]*[\r\n]\\)*[0-9]\\)+\
\\)\\{2\\}"

Here is a simple recipe to illustrate the issue:

In *scratch* buffer eval:
-------------------------
;; Get content of a pbm file.
(setq test-data
      (with-current-buffer
          (find-file-noselect "[YourEmacsPath]/etc/images/splash.pbm")
        (prog1 (buffer-substring-no-properties (point-min) (point-max))
          (kill-buffer (current-buffer)))))

;; Checking the string data fails to detect the pbm image type!
(image-type-from-data test-data)
>>> nil
;; With a temp buffer current, the same test works!
(with-temp-buffer
 (image-type-from-data test-data))
>>> pbm
-------------------------

After further digging, I found that the problem might be due to the use
of the [:space:] character class, whose meaning, according to the manual,
depends on which characters have whitespace syntax in the current buffer.
So, using explicit character values in place of the character class seems
to solve the issue:

(setcar (nth 1 image-type-header-regexps)
        "\\`P[1-6]\\(?:\
\\(?:\\(?:#[^\r\n]*[\r\n]\\)*[ \t\r\n]\\)+\
\\(?:\\(?:#[^\r\n]*[\r\n]\\)*[0-9]\\)+\
\\)\\{2\\}")

(image-type-from-data test-data)
>>> pbm

I attached a patch proposal.
Hope it will help.
Regards
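
Note on the recipe above: a likely reason for the differing results is that *scratch* normally runs in lisp-interaction-mode, whose syntax table (inherited from Emacs Lisp mode) gives newline comment-end syntax rather than whitespace syntax. The following is a minimal sketch of that difference, not part of the original report. `my/newline-space-check' is a hypothetical helper name, and using `emacs-lisp-mode-syntax-table' as a stand-in for the *scratch* buffer's table is an assumption.

```elisp
;; Sketch only: `my/newline-space-check' is a hypothetical helper,
;; not an Emacs function.
(defun my/newline-space-check (table)
  "Return (SYNTAX CLASS-MATCH EXPLICIT-MATCH) for a newline under TABLE.
SYNTAX is the syntax class designator of newline as a string.
CLASS-MATCH is the result of matching \"[[:space:]]\" against a newline.
EXPLICIT-MATCH is the result of matching an explicit character set."
  (with-syntax-table table
    (list (string (char-syntax ?\n))
          (string-match-p "[[:space:]]" "\n")
          (string-match-p "[ \t\r\n]" "\n"))))

;; Standard syntax table: newline has whitespace syntax.
(my/newline-space-check (standard-syntax-table))
;; expected: (" " 0 0)

;; Emacs Lisp syntax table (assumed to behave like *scratch*):
;; newline is a comment ender, so [[:space:]] no longer matches it.
(my/newline-space-check emacs-lisp-mode-syntax-table)
;; expected: (">" nil 0)
```

The explicit set [ \t\r\n] gives the same answer under both tables, which is the property the proposed patch relies on.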


In GNU Emacs 30.0.50 (build 3, x86_64-pc-linux-gnu, GTK+ Version
 3.24.38, cairo version 1.17.8) of 2023-08-23
Repository revision: 26ca3e84e167f975afb4e9e9a838935bfe4a19a7
Repository branch: master
Windowing system distributor 'The X.Org Foundation', version 11.0.12014000
System Description: Fedora Linux 38 (KDE Plasma)

Configured using:
 'configure --with-x-toolkit=gtk3
 --with-native-compilation=no
 PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:/usr/lib/pkgconfig'

Configured features:
ACL CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GPM GSETTINGS HARFBUZZ JPEG
JSON LCMS2 LIBOTF LIBSELINUX LIBSYSTEMD LIBXML2 M17N_FLT MODULES NOTIFY
INOTIFY PDUMPER PNG RSVG SECCOMP SOUND SQLITE3 THREADS TIFF
TOOLKIT_SCROLL_BARS TREE_SITTER WEBP X11 XDBE XIM XINPUT2 XPM GTK3 ZLIB

Important settings:
  value of $LC_TIME: fr_FR.utf8
  value of $LANG: fr_FR.UTF-8
  locale-coding-system: utf-8-unix
[image-type-header-regexps-patch-V0.patch (text/x-patch, attachment)]

Added tag(s) patch. Request was from Stefan Kangas <stefankangas <at> gmail.com> to control <at> debbugs.gnu.org. (Thu, 24 Aug 2023 20:46:02 GMT)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#65496; Package emacs. (Mon, 04 Sep 2023 16:33:01 GMT)

Message #10 received at 65496 <at> debbugs.gnu.org:

From: David Ponce <da_vid <at> orange.fr>
To: 65496 <at> debbugs.gnu.org
Subject: Re: 30.0.50; Issue with the regexp used to auto-detect PBM image data
Date: Mon, 4 Sep 2023 18:32:22 +0200
On 24/08/2023 12:55, David Ponce wrote:

[...]

Some additions.

Basic string matching recipe:

In *scratch* buffer eval:
-------------------------

(let ((re "\\`P[1-6]\\(?:\
\\(?:\\(?:#[^\r\n]*[\r\n]\\)*[[:space:]]\\)+\
\\(?:\\(?:#[^\r\n]*[\r\n]\\)*[0-9]\\)+\
\\)\\{2\\}")
      (text "P4
333 233"))
  (string-match-p re text))
>>> nil

(with-syntax-table (standard-syntax-table)
  (let ((re "\\`P[1-6]\\(?:\
\\(?:\\(?:#[^\r\n]*[\r\n]\\)*[[:space:]]\\)+\
\\(?:\\(?:#[^\r\n]*[\r\n]\\)*[0-9]\\)+\
\\)\\{2\\}")
        (text "P4
333 233"))
    (string-match-p re text)))
>>> 0

I wonder whether it is expected that matching a regular expression against a string
object depends on the syntax table set up in the current buffer?
Shouldn't (standard-syntax-table) be implied when matching a regexp against a string
object, that is, regardless of any buffer context?

Regards
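
Note: for callers that want string matching to be independent of the current buffer, the `with-syntax-table' recipe above can be wrapped in a small helper. This is a sketch under that assumption only. `my/string-match-p-standard' is a hypothetical name, not an existing Emacs function, and it is not what the patch in this report does.

```elisp
;; Sketch only: `my/string-match-p-standard' is a hypothetical wrapper.
(defun my/string-match-p-standard (regexp string &optional start)
  "Like `string-match-p', but match under the standard syntax table.
Syntax-dependent constructs such as [[:space:]] then no longer depend
on the syntax table of whichever buffer happens to be current."
  (with-syntax-table (standard-syntax-table)
    (string-match-p regexp string start)))

;; Usage: simulate a buffer whose table does not treat newline as
;; whitespace, then match through the wrapper.
(with-syntax-table emacs-lisp-mode-syntax-table
  (my/string-match-p-standard "[[:space:]]" "\n"))
;; expected: 0
```

The trade-off is that every call site must remember to use the wrapper, whereas a regexp written with explicit characters needs no special handling.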




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#65496; Package emacs. (Mon, 04 Sep 2023 17:37:02 GMT)

Message #13 received at 65496 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: David Ponce <da_vid <at> orange.fr>
Cc: 65496 <at> debbugs.gnu.org
Subject: Re: bug#65496: 30.0.50;
 Issue with the regexp used to auto-detect PBM image data
Date: Mon, 04 Sep 2023 20:36:06 +0300
> Date: Mon, 4 Sep 2023 18:32:22 +0200
> From: David Ponce <da_vid <at> orange.fr>
> 
> I wonder whether it is expected that matching a regular expression
> against a string object depends on the syntax table set up in the
> current buffer?  Shouldn't (standard-syntax-table) be implied when
> matching a regexp against a string object, that is, regardless of
> any buffer context?

Not necessarily, because you wouldn't expect, say, looking-at to
return a different result than (string-match-p (buffer-string)), would
you?

This belongs to the gray areas of Emacs.  The same situation exists
with functions like downcase, which use the buffer-local value of
case-table.
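
Note: the consistency argument above can be checked with a short sketch that runs the same regexp through a buffer match and a string match under one syntax table. `my/buffer-vs-string-match' is a hypothetical helper name introduced only for illustration.

```elisp
;; Sketch only: `my/buffer-vs-string-match' is a hypothetical helper.
(defun my/buffer-vs-string-match (regexp text table)
  "Return (LOOKING-AT STRING-MATCH) for REGEXP on TEXT under syntax TABLE.
LOOKING-AT is the result of `looking-at-p' at the start of a temporary
buffer containing TEXT; STRING-MATCH is the result of `string-match-p'
on that buffer's contents as a string."
  (with-temp-buffer
    (set-syntax-table table)
    (insert text)
    (goto-char (point-min))
    (list (looking-at-p regexp)
          (string-match-p regexp (buffer-string)))))

;; Both forms succeed under the standard table ...
(my/buffer-vs-string-match "[[:space:]]" "\n" (standard-syntax-table))
;; expected: (t 0)

;; ... and both fail under the Emacs Lisp table, so they stay consistent.
(my/buffer-vs-string-match "[[:space:]]" "\n" emacs-lisp-mode-syntax-table)
;; expected: (nil nil)
```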




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#65496; Package emacs. (Tue, 05 Sep 2023 11:09:01 GMT)

Message #16 received at 65496 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: David Ponce <da_vid <at> orange.fr>
Cc: 65496 <at> debbugs.gnu.org
Subject: Re: bug#65496: 30.0.50; Issue with the regexp used to auto-detect PBM
 image data
Date: Tue, 05 Sep 2023 14:08:15 +0300
[I presume you didn't intend to discuss this only with me in private.]

> Date: Mon, 4 Sep 2023 23:43:56 +0200
> From: David Ponce <da_vid <at> orange.fr>
> 
> On 04/09/2023 19:36, Eli Zaretskii wrote:
> >> Date: Mon, 4 Sep 2023 18:32:22 +0200
> >> From: David Ponce <da_vid <at> orange.fr>
> >>
> >> I wonder whether it is expected that matching a regular expression
> >> against a string object depends on the syntax table set up in the
> >> current buffer?  Shouldn't (standard-syntax-table) be implied when
> >> matching a regexp against a string object, that is, regardless of
> >> any buffer context?
> > 
> > Not necessarily, because you wouldn't expect, say, looking-at to
> > return a different result than (string-match-p (buffer-string)), would
> > you?
> 
> Sure, from this perspective you are right.  However, for other cases
> where the string object is not related to a buffer value, it's not so
> clear ;-)
> 
> > This belongs to the gray areas of Emacs.  The same situation exists
> > with functions like downcase, which use the buffer-local value of
> > case-table.
> 
> I can understand that.  Many things are not only black or white ;-)
> 
> Maybe for the use case of auto-detecting image type from image data,
> my proposed patch to replace character class by a list of unambiguous
> explicit character values in the regexp could make sense?

Yes, it makes sense, but are you sure you mention there all the
characters that can happen in PBM images, and only those characters?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#65496; Package emacs. (Wed, 06 Sep 2023 14:06:02 GMT)

Message #19 received at 65496 <at> debbugs.gnu.org:

From: David Ponce <da_vid <at> orange.fr>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 65496 <at> debbugs.gnu.org
Subject: Re: bug#65496: 30.0.50; Issue with the regexp used to auto-detect PBM
 image data
Date: Wed, 6 Sep 2023 16:05:39 +0200
On 05/09/2023 13:08, Eli Zaretskii wrote:
> [I presume you didn't intend to discuss this only with me in private.]

Hi Eli,

You are right, my mistake: I used "reply" instead of "reply to all" :-\
I am sorry.

> 
>> Date: Mon, 4 Sep 2023 23:43:56 +0200
>> From: David Ponce <da_vid <at> orange.fr>
>>
>> On 04/09/2023 19:36, Eli Zaretskii wrote:
>>>> Date: Mon, 4 Sep 2023 18:32:22 +0200
>>>> From: David Ponce <da_vid <at> orange.fr>
>>>>
>>>> I wonder whether it is expected that matching a regular expression
>>>> against a string object depends on the syntax table set up in the
>>>> current buffer?  Shouldn't (standard-syntax-table) be implied when
>>>> matching a regexp against a string object, that is, regardless of
>>>> any buffer context?
>>>
>>> Not necessarily, because you wouldn't expect, say, looking-at to
>>> return a different result than (string-match-p (buffer-string)), would
>>> you?
>>
>> Sure, from this perspective you are right.  However, for other cases
>> where the string object is not related to a buffer value, it's not so
>> clear ;-)
>>
>>> This belongs to the gray areas of Emacs.  The same situation exists
>>> with functions like downcase, which use the buffer-local value of
>>> case-table.
>>
>> I can understand that.  Many things are not only black or white ;-)
>>
>> Maybe for the use case of auto-detecting image type from image data,
>> my proposed patch to replace character class by a list of unambiguous
>> explicit character values in the regexp could make sense?
> 
> Yes, it makes sense, but are you sure you mention there all the
> characters that can happen in PBM images, and only those characters?

Yes, according to the specification of pbm available at
<https://netpbm.sourceforge.net/doc/pbm.html>:

  "Each PBM image consists of the following:

    * A "magic number" for identifying the file type.
      A pbm image's magic number is the two characters "P4".

==> * Whitespace (blanks, TABs, CRs, LFs). <==

    * The width in pixels of the image, formatted as ASCII characters in decimal.

    ..."

Thanks
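
Note: the four whitespace characters named by the specification can be exercised against the regexp proposed earlier in this report. The sketch below copies that regexp verbatim. `my/pbm-header-regexp' and `my/pbm-header-p' are hypothetical names, and the sample headers are made up for illustration, not taken from real image files.

```elisp
;; Sketch only: both names below are hypothetical.
(defconst my/pbm-header-regexp
  "\\`P[1-6]\\(?:\
\\(?:\\(?:#[^\r\n]*[\r\n]\\)*[ \t\r\n]\\)+\
\\(?:\\(?:#[^\r\n]*[\r\n]\\)*[0-9]\\)+\
\\)\\{2\\}"
  "PBM header regexp from the proposed patch, using explicit whitespace.")

(defun my/pbm-header-p (data)
  "Return t if the string DATA starts with something like a PBM header."
  (and (string-match-p my/pbm-header-regexp data) t))

;; Blank, TAB, CR and LF are all accepted as separators.
(my/pbm-header-p "P4\n333 233")               ; expected: t
(my/pbm-header-p "P1\t# comment\n10\r\n20")   ; expected: t
;; No whitespace after the magic number, or a bad magic number.
(my/pbm-header-p "P4333 233")                 ; expected: nil
(my/pbm-header-p "P7\n1 1")                   ; expected: nil

;; The result does not change under a table where newline is not
;; whitespace (Emacs Lisp table used here as an assumed example).
(with-syntax-table emacs-lisp-mode-syntax-table
  (my/pbm-header-p "P4\n333 233"))            ; expected: t
```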






Reply sent to Eli Zaretskii <eliz <at> gnu.org>:
You have taken responsibility. (Wed, 06 Sep 2023 16:02:02 GMT)

Notification sent to David Ponce <da_vid <at> orange.fr>:
bug acknowledged by developer. (Wed, 06 Sep 2023 16:02:02 GMT)

Message #24 received at 65496-done <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: David Ponce <da_vid <at> orange.fr>
Cc: 65496-done <at> debbugs.gnu.org
Subject: Re: bug#65496: 30.0.50; Issue with the regexp used to auto-detect PBM
 image data
Date: Wed, 06 Sep 2023 19:00:28 +0300
> Date: Wed, 6 Sep 2023 16:05:39 +0200
> Cc: 65496 <at> debbugs.gnu.org
> From: David Ponce <da_vid <at> orange.fr>
> 
> >> Maybe for the use case of auto-detecting image type from image data,
> >> my proposed patch to replace character class by a list of unambiguous
> >> explicit character values in the regexp could make sense?
> > 
> > Yes, it makes sense, but are you sure you mention there all the
> > characters that can happen in PBM images, and only those characters?
> 
> Yes, according to the specification of pbm available at
> <https://netpbm.sourceforge.net/doc/pbm.html>:
> 
>    "Each PBM image consists of the following:
> 
>      * A "magic number" for identifying the file type.
>        A pbm image's magic number is the two characters "P4".
> 
> ==> * Whitespace (blanks, TABs, CRs, LFs). <==
> 
>      * The width in pixels of the image, formatted as ASCII characters in decimal.
> 
>      ..."

Thanks, I've now installed your patch on the emacs-29 branch, and I'm
closing this bug.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#65496; Package emacs. (Wed, 06 Sep 2023 16:20:01 GMT)

Message #27 received at 65496-done <at> debbugs.gnu.org:

From: David Ponce <da_vid <at> orange.fr>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 65496-done <at> debbugs.gnu.org
Subject: Re: bug#65496: 30.0.50; Issue with the regexp used to auto-detect PBM
 image data
Date: Wed, 6 Sep 2023 18:19:33 +0200
On 06/09/2023 18:00, Eli Zaretskii wrote:

[...]
> 
> Thanks, I've now installed your patch on the emacs-29 branch, and I'm
> closing this bug.

Great! Thank you very much!




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 05 Oct 2023 11:24:14 GMT)

This bug report was last modified 1 year and 256 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.