GNU bug report logs - #36251
Regex library doesn't recognize ']' in a character class


Package: guile;

Reported by: Abdulrahman Semrie <hsamireh <at> gmail.com>

Date: Sun, 16 Jun 2019 18:32:01 UTC

Severity: normal

Tags: notabug

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.


Report forwarded to bug-guile <at> gnu.org:
bug#36251; Package guile. (Sun, 16 Jun 2019 18:32:01 GMT)

Acknowledgement sent to Abdulrahman Semrie <hsamireh <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-guile <at> gnu.org. (Sun, 16 Jun 2019 18:32:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Abdulrahman Semrie <hsamireh <at> gmail.com>
To: bug-guile <at> gnu.org
Subject: Regex library doesn't recognize ']' in a character class
Date: Sun, 16 Jun 2019 20:16:29 +0300
I am using the pattern [\\[\\]a-zA-Z]+ to match a string with a left or right bracket in it. However, the string-match function doesn’t match the ‘]’ character. To demonstrate with an example, try the following function call:

(string-match "[\\[\\]a-zA-Z]+" "Text[ab]")

The result of the above call should have been a match structure with Text[ab] matched. However, string-match returns #f, which is incorrect. To test whether the pattern I am using was right, I tried it on regex101.com and it works. Here (https://regex101.com/r/VAl6aI/1) is the link that demonstrates that it works.

Hence, the above leads me to believe there is a bug in the regex library that mishandles the ] character in character classes.

—

Regards,

Abdulrahman Semrie


Information forwarded to bug-guile <at> gnu.org:
bug#36251; Package guile. (Sun, 16 Jun 2019 19:41:01 GMT)

Message #8 received at submit <at> debbugs.gnu.org:

From: <tomas <at> tuxteam.de>
To: bug-guile <at> gnu.org
Subject: Re: bug#36251: Regex library doesn't recognize ']' in a character class
Date: Sun, 16 Jun 2019 21:40:08 +0200
On Sun, Jun 16, 2019 at 08:16:29PM +0300, Abdulrahman Semrie wrote:
> 
> I am using the pattern [\\[\\]a-zA-Z]+ to match a string with a left or right bracket in it. However, the string-match function doesn’t match the ‘]’ character. To demonstrate with an example, try the following function call:
> 
> (string-match "[\\[\\]a-zA-Z]+" "Text[ab]")
> 
> The result of the above call should have been a match structure with Text[ab] matched. However, string-match returns #f, which is incorrect. To test whether the pattern I am using was right, I tried it on regex101.com and it works. Here (https://regex101.com/r/VAl6aI/1) is the link that demonstrates that it works.
> 
> Hence, the above leads me to believe there is a bug in the regex library that mishandles the ] character in character classes.

If I understood you correctly, you are using POSIX regular
expressions. Within a bracket expression ([...]), you can't
escape ']' with a backslash. Just put the ] as first character,
like so:

  [][a-zA-Z]

Quoting the man page (regex(7)):

   A bracket expression is a list of characters enclosed in "[]".
   It normally matches any single character from the list (but see
   below).  If the list begins  with  '^', it  matches  any  single
   character  (but see below) not from the rest of the list. [...]

   To  include  a  literal ']' in the list, make it the first
   character (following a possible '^').  To include a literal
   '-', make it the first or last character, or the second endpoint
   of a range [...]

See also [1], but the man page is more complete.

(I'm assuming your Guile is linked against some POSIX regex library).
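
A minimal sketch of the suggestion above, assuming Guile with the (ice-9 regex) module and a system regex library that follows the POSIX rules quoted from regex(7). The variable names m and n are only for illustration:

```scheme
(use-modules (ice-9 regex))

;; ']' is the first character of the bracket expression, so it is taken
;; literally; '[' needs no escaping inside the brackets.
(define m (string-match "[][a-zA-Z]+" "Text[ab]"))

(display (match:substring m))   ; should print Text[ab]
(newline)

;; With a leading '^', the literal ']' goes right after it. This pattern
;; matches a run of characters that are neither ']' nor '['.
(define n (string-match "[^][]+" "Text[ab]"))

(display (match:substring n))   ; should print Text
(newline)
```

The only change to the original pattern is inside the brackets: the two backslash escapes are dropped and ']' moves to the front.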

Cheers
-- t

Information forwarded to bug-guile <at> gnu.org:
bug#36251; Package guile. (Tue, 18 Jun 2019 11:11:02 GMT)

Message #11 received at 36251 <at> debbugs.gnu.org:

From: Mark H Weaver <mhw <at> netris.org>
To: Abdulrahman Semrie <hsamireh <at> gmail.com>
Cc: 36251 <at> debbugs.gnu.org
Subject: Re: bug#36251: Regex library doesn't recognize ']' in a character class
Date: Tue, 18 Jun 2019 07:08:06 -0400
Hi,

Abdulrahman Semrie <hsamireh <at> gmail.com> writes:

> I am using the pattern [\\[\\]a-zA-Z]+ to match a string with left or
> right bracket in it. However, the string-match function doesn’t match
> the ‘]’ character. To demonstrate with an example, try the following
> function:
>
> (string-match "[\\[\\]a-zA-Z]+" "Text[ab]")
>
> The result for the above function should have been a match structure
> with Text[ab] matched. However, the string-match returns #f which is
> incorrect. To test if the pattern I am using was right, I tried on
> regex101.com and it works. Here (https://regex101.com/r/VAl6aI/1) is
> the link that demonstrates that it works.

It turns out that there are several flavors of regular expressions in
common use, with different features and syntax.  The link you provided
is using PCRE (PHP) regular expressions (see the "flavor" pane on the
left), and there are three other supported flavors on that web site.

Guile's (ice-9 regex) module provides a simpler flavor of regexps known
as "POSIX extended regular expressions", implemented as a thin wrapper
around your system's POSIX regular expression library ('regcomp' and
'regexec').  The web site you referenced does not appear to support
POSIX extended regular expressions, but here are some links about them:

  https://en.wikibooks.org/wiki/Regular_Expressions/POSIX-Extended_Regular_Expressions
  https://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html#tag_09_04

One of the notable differences is that in POSIX extended regular
expressions, character classes do not support backslash escapes, but
instead use a more ad-hoc approach as <tomas <at> tuxteam.de> described.
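
To make the difference concrete, here is a small sketch of how a POSIX extended regexp library would be expected to read the original pattern. It assumes Guile with (ice-9 regex) on a glibc-like system; the helper name try is hypothetical. Because a backslash is an ordinary character inside a bracket expression, the expression ends at the first ']', and the rest of the pattern is literal text:

```scheme
(use-modules (ice-9 regex))

;; The Scheme string below is the regexp  [\[\]a-zA-Z]+
;; POSIX reading:  [\[\]    one character, either '\' or '['
;;                 a-zA-Z   the literal text "a-zA-Z"
;;                 ]+       one or more literal ']'
(define pattern "[\\[\\]a-zA-Z]+")

;; Hypothetical helper: return the matched substring, or #f if no match.
(define (try str)
  (let ((m (string-match pattern str)))
    (and m (match:substring m))))

(write (try "Text[ab]"))  (newline)   ; should print #f, as in the report
(write (try "[a-zA-Z]"))  (newline)   ; should print "[a-zA-Z]"
```

Under this reading the #f result for "Text[ab]" is consistent with the specification rather than a library bug.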

     Regards,
       Mark




Information forwarded to bug-guile <at> gnu.org:
bug#36251; Package guile. (Tue, 18 Jun 2019 11:21:01 GMT)

Message #14 received at 36251 <at> debbugs.gnu.org:

From: <tomas <at> tuxteam.de>
To: Mark H Weaver <mhw <at> netris.org>
Cc: Abdulrahman Semrie <hsamireh <at> gmail.com>, 36251 <at> debbugs.gnu.org
Subject: Re: bug#36251: Regex library doesn't recognize ']' in a character class
Date: Tue, 18 Jun 2019 13:20:07 +0200
On Tue, Jun 18, 2019 at 07:08:06AM -0400, Mark H Weaver wrote:
> Hi,
> 
> Abdulrahman Semrie <hsamireh <at> gmail.com> writes:
> 
> > I am using the pattern [\\[\\]a-zA-Z]+ to match a string with left or
> > right bracket in it [...]

> It turns out that there are several flavors of regular expressions in
> common use, with different features and syntax.  The link you provided
> is using PCRE (PHP) regular expressions (see the "flavor" pane on the
> left), and there are three other supported flavors on that web site.
> 
> Guile's (ice-9 regex) module provides a simpler flavor of regexps known
> as "POSIX extended regular expressions" [...]

D'oh! I forgot about Perl compatible regexps. In those, you /can/ escape
things with a backslash within [...]. This would have explained Abdulrahman's
confusion better.

Thanks, Mark
-- t

Information forwarded to bug-guile <at> gnu.org:
bug#36251; Package guile. (Fri, 28 Jun 2019 11:22:02 GMT)

Message #17 received at 36251 <at> debbugs.gnu.org:

From: David Pirotte <david <at> altosw.be>
To: Mark H Weaver <mhw <at> netris.org>
Cc: Abdulrahman Semrie <hsamireh <at> gmail.com>, 36251 <at> debbugs.gnu.org
Subject: Re: bug#36251: Regex library doesn't recognize ']' in a character class
Date: Fri, 28 Jun 2019 08:21:08 -0300
Hello,

> ...
> It turns out that there are several flavors of regular expressions in
> common use, with different features and syntax.  The link you provided
> is using PCRE (PHP) regular expressions (see the "flavor" pane on the
> left), and there are three other supported flavors on that web site.
> ...

Fwiw, I just came across a pcre binding for guile(*), here:

	https://github.com/NalaGinrut/guile-pcre-ffi

I didn't try it and I have no idea about the general quality and robustness of the
binding, last updated 4y ago it seems, but the code is really small, uses the ffi,
so it should be quite easy to patch if necessary and may be fun to 'resurrect' ...

David

(*)	I found it while looking for something else, here:

		http://sph.mn/foreign/guile-software.html

Added tag(s) notabug. Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Sun, 30 Jun 2019 19:40:02 GMT)

Bug closed; send any further explanations to 36251 <at> debbugs.gnu.org and Abdulrahman Semrie <hsamireh <at> gmail.com>. Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Sun, 30 Jun 2019 19:40:02 GMT)

Bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 29 Jul 2019 11:24:05 GMT)

This bug report was last modified 4 years and 263 days ago.


