GNU bug report logs - #17373
24.3.50; match data is incorrect if there are too many groups

Please note: This is a static page, with minimal formatting, updated once a day.
Click here to see this page with the latest information and nicer formatting.

Package: emacs; Severity: minor; Reported by: Nicolas Richard <theonewiththeevillook@HIDDEN>; Keywords: confirmed; dated Tue, 29 Apr 2014 19:20:02 UTC; Maintainer for emacs is bug-gnu-emacs@HIDDEN.
Added tag(s) confirmed. Request was from Noam Postavsky <npostavs@HIDDEN> to control <at> debbugs.gnu.org.
Severity set to 'minor' from 'normal'. Request was from Noam Postavsky <npostavs@HIDDEN> to control <at> debbugs.gnu.org.
Bug marked as found in versions 25.0.94. Request was from Noam Postavsky <npostavs@HIDDEN> to control <at> debbugs.gnu.org.

Message received at 17373 <at> debbugs.gnu.org:


Received: (at 17373) by debbugs.gnu.org; 10 Feb 2016 17:11:59 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Wed Feb 10 12:11:59 2016
Received: from localhost ([127.0.0.1]:35215 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1aTYJ5-0004oZ-JU
	for submit <at> debbugs.gnu.org; Wed, 10 Feb 2016 12:11:59 -0500
Received: from msg.wmi.amu.edu.pl ([150.254.78.50]:40634)
 by debbugs.gnu.org with esmtp (Exim 4.84)
 (envelope-from <mbork@HIDDEN>) id 1aTYJ3-0004oQ-Vi
 for 17373 <at> debbugs.gnu.org; Wed, 10 Feb 2016 12:11:58 -0500
Received: from localhost (localhost [127.0.0.1])
 by msg.wmi.amu.edu.pl (Postfix) with ESMTP id 41BA87C964;
 Wed, 10 Feb 2016 18:11:56 +0100 (CET)
Received: from msg.wmi.amu.edu.pl ([127.0.0.1])
 by localhost (msg.wmi.amu.edu.pl [127.0.0.1]) (amavisd-new, port 10024)
 with ESMTP id usnFlpKAyCE8; Wed, 10 Feb 2016 18:11:56 +0100 (CET)
Received: from localhost (unknown [109.232.24.28])
 by msg.wmi.amu.edu.pl (Postfix) with ESMTPSA id E1F847C940;
 Wed, 10 Feb 2016 18:11:55 +0100 (CET)
From: Marcin Borkowski <mbork@HIDDEN>
To: 17373 <at> debbugs.gnu.org
Subject: Re: bug#17373: 24.3.50;
 match data is incorrect if there are too many groups
References: <87ppk0hrkg.fsf@HIDDEN> <53799AF5.9090708@HIDDEN>
 <3dc9fa47-c3d8-40e2-b6e4-3f362a0c1b6e@default>
Date: Wed, 10 Feb 2016 18:11:54 +0100
In-Reply-To: <3dc9fa47-c3d8-40e2-b6e4-3f362a0c1b6e@default> (Drew Adams's
 message of "Mon, 19 May 2014 06:48:16 -0700 (PDT)")
Message-ID: <8737t0qylh.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1.50 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-Spam-Score: -0.2 (/)
X-Debbugs-Envelope-To: 17373
Cc: Paul Eggert <eggert@HIDDEN>, Drew Adams <drew.adams@HIDDEN>
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -0.2 (/)

On 2014-05-19, at 07:48, Drew Adams <drew.adams@HIDDEN> wrote:

>> Yes, unfortunately Emacs currently has a limit of at most 256 groups of
>> match data: one for the entire pattern, and 255 for parenthesized
>> subpatterns.  If you go over the limit, the excess matches are silently
>> discarded.  I don't see this limitation documented anywhere; it should
>> be.  Or better yet, the limitation should be removed.
>
> Good to know.  +1, to documenting it, at least.

I can write a patch to the manual, but I'm a bit afraid that if this
gets documented, the limit will stay there forever.  Is there a chance
that someone fluent in C could fix this?

(Incidentally, I have one package of mine where this limit could strike,
too.)

Best,

--
Marcin Borkowski




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#17373; Package emacs.

Message received at 17373 <at> debbugs.gnu.org:


Received: (at 17373) by debbugs.gnu.org; 19 May 2014 13:48:37 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Mon May 19 09:48:37 2014
Received: from localhost ([127.0.0.1]:52968 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.80)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1WmNvh-0002RJ-1G
	for submit <at> debbugs.gnu.org; Mon, 19 May 2014 09:48:37 -0400
Received: from userp1040.oracle.com ([156.151.31.81]:17978)
 by debbugs.gnu.org with esmtp (Exim 4.80)
 (envelope-from <drew.adams@HIDDEN>) id 1WmNve-0002R1-NZ
 for 17373 <at> debbugs.gnu.org; Mon, 19 May 2014 09:48:35 -0400
Received: from acsinet22.oracle.com (acsinet22.oracle.com [141.146.126.238])
 by userp1040.oracle.com (Sentrion-MTA-4.3.2/Sentrion-MTA-4.3.2) with ESMTP id
 s4JDmKc6031194
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK);
 Mon, 19 May 2014 13:48:21 GMT
Received: from userz7022.oracle.com (userz7022.oracle.com [156.151.31.86])
 by acsinet22.oracle.com (8.14.4+Sun/8.14.4) with ESMTP id s4JDmI14001371
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
 Mon, 19 May 2014 13:48:20 GMT
Received: from abhmp0014.oracle.com (abhmp0014.oracle.com [141.146.116.20])
 by userz7022.oracle.com (8.14.5+Sun/8.14.4) with ESMTP id s4JDmGIJ016865;
 Mon, 19 May 2014 13:48:17 GMT
MIME-Version: 1.0
Message-ID: <3dc9fa47-c3d8-40e2-b6e4-3f362a0c1b6e@default>
Date: Mon, 19 May 2014 06:48:16 -0700 (PDT)
From: Drew Adams <drew.adams@HIDDEN>
To: Paul Eggert <eggert@HIDDEN>, 17373 <at> debbugs.gnu.org
Subject: RE: bug#17373: 24.3.50; match data is incorrect if there are too many
 groups
References: <87ppk0hrkg.fsf@HIDDEN> <53799AF5.9090708@HIDDEN>
In-Reply-To: <53799AF5.9090708@HIDDEN>
X-Priority: 3
X-Mailer: Oracle Beehive Extensions for Outlook 2.0.1.8  (707110) [OL
 12.0.6691.5000 (x86)]
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Source-IP: acsinet22.oracle.com [141.146.126.238]
X-Spam-Score: -3.0 (---)
X-Debbugs-Envelope-To: 17373
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <http://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <http://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <http://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.0 (---)

> Yes, unfortunately Emacs currently has a limit of at most 256 groups of
> match data: one for the entire pattern, and 255 for parenthesized
> subpatterns.  If you go over the limit, the excess matches are silently
> discarded.  I don't see this limitation documented anywhere; it should
> be.  Or better yet, the limitation should be removed.

Good to know.  +1, to documenting it, at least.




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#17373; Package emacs.

Message received at 17373 <at> debbugs.gnu.org:


Received: (at 17373) by debbugs.gnu.org; 19 May 2014 05:47:51 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Mon May 19 01:47:51 2014
Received: from localhost ([127.0.0.1]:52820 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.80)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1WmGQR-00034E-1Q
	for submit <at> debbugs.gnu.org; Mon, 19 May 2014 01:47:51 -0400
Received: from smtp.cs.ucla.edu ([131.179.128.62]:58369)
 by debbugs.gnu.org with esmtp (Exim 4.80)
 (envelope-from <eggert@HIDDEN>) id 1WmGQP-00033u-BV
 for 17373 <at> debbugs.gnu.org; Mon, 19 May 2014 01:47:50 -0400
Received: from localhost (localhost.localdomain [127.0.0.1])
 by smtp.cs.ucla.edu (Postfix) with ESMTP id 6C5CC39E807B
 for <17373 <at> debbugs.gnu.org>; Sun, 18 May 2014 22:47:42 -0700 (PDT)
X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu
Received: from smtp.cs.ucla.edu ([127.0.0.1])
 by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024)
 with ESMTP id V4Kr0ESndZvq for <17373 <at> debbugs.gnu.org>;
 Sun, 18 May 2014 22:47:33 -0700 (PDT)
Received: from [192.168.1.9] (pool-108-0-233-62.lsanca.fios.verizon.net
 [108.0.233.62])
 by smtp.cs.ucla.edu (Postfix) with ESMTPSA id ABB7139E801D
 for <17373 <at> debbugs.gnu.org>; Sun, 18 May 2014 22:47:33 -0700 (PDT)
Message-ID: <53799AF5.9090708@HIDDEN>
Date: Sun, 18 May 2014 22:47:33 -0700
From: Paul Eggert <eggert@HIDDEN>
Organization: UCLA Computer Science Department
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
 rv:24.0) Gecko/20100101 Thunderbird/24.5.0
MIME-Version: 1.0
To: 17373 <at> debbugs.gnu.org
Subject: Re: 24.3.50; match data is incorrect if there are too many groups
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Score: -3.0 (---)
X-Debbugs-Envelope-To: 17373
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <http://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <http://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <http://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.0 (---)

Yes, unfortunately Emacs currently has a limit of at most 256 groups of 
match data: one for the entire pattern, and 255 for parenthesized 
subpatterns.  If you go over the limit, the excess matches are silently 
discarded.  I don't see this limitation documented anywhere; it should 
be.  Or better yet, the limitation should be removed.

The limitation is wired into the representation of the 'start_memory'
code in compiled regular expressions: this code has a one-byte operand.
As far as I know, the limitation is specific to Emacs, and is not
present in the Gnulib or glibc versions of the regexp matcher.
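
A minimal sketch of how calling code might guard against this limit
before relying on match data.  The names `my-regexp-max-groups' and
`my-regexp-within-group-limit-p' are hypothetical, and the value 255 is
taken from the description above rather than from the Emacs sources.
Note that `regexp-opt-depth' counts only plain \( groups (it skips shy
and explicitly numbered ones), so the result is an estimate:

#+BEGIN_SRC emacs-lisp
  (require 'regexp-opt)

  ;; Assumption: 255 parenthesized groups, as stated in this report.
  (defconst my-regexp-max-groups 255
    "Assumed maximum number of capturing groups in an Emacs regexp.")

  (defun my-regexp-within-group-limit-p (regexp &optional limit)
    "Return non-nil if REGEXP has at most LIMIT capturing groups.
  LIMIT defaults to `my-regexp-max-groups'."
    (<= (regexp-opt-depth regexp) (or limit my-regexp-max-groups)))

  ;; Two groups: under the default limit, so this returns non-nil.
  (my-regexp-within-group-limit-p "\\(foo\\)\\|\\(bar\\)")
  ;; Two groups checked against a limit of 1: returns nil.
  (my-regexp-within-group-limit-p "\\(foo\\)\\|\\(bar\\)" 1)
#+END_SRC

A caller could signal an error or split the regexp into several smaller
ones when the check fails, instead of trusting truncated match data.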




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#17373; Package emacs.

Message received at submit <at> debbugs.gnu.org:


Received: (at submit) by debbugs.gnu.org; 29 Apr 2014 19:19:40 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Tue Apr 29 15:19:40 2014
Received: from localhost ([127.0.0.1]:45404 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.80)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1WfDZ6-0007t3-5r
	for submit <at> debbugs.gnu.org; Tue, 29 Apr 2014 15:19:40 -0400
Received: from eggs.gnu.org ([208.118.235.92]:47347)
 by debbugs.gnu.org with esmtp (Exim 4.80)
 (envelope-from <theonewiththeevillook@HIDDEN>) id 1WfDZ4-0007sr-0p
 for submit <at> debbugs.gnu.org; Tue, 29 Apr 2014 15:19:38 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from <theonewiththeevillook@HIDDEN>) id 1WfDYr-0004NW-By
 for submit <at> debbugs.gnu.org; Tue, 29 Apr 2014 15:19:32 -0400
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org
X-Spam-Level: 
X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50,FREEMAIL_FROM
 autolearn=disabled version=3.3.2
Received: from lists.gnu.org ([2001:4830:134:3::11]:52629)
 by eggs.gnu.org with esmtp (Exim 4.71)
 (envelope-from <theonewiththeevillook@HIDDEN>) id 1WfDYr-0004NS-9s
 for submit <at> debbugs.gnu.org; Tue, 29 Apr 2014 15:19:25 -0400
Received: from eggs.gnu.org ([2001:4830:134:3::10]:48992)
 by lists.gnu.org with esmtp (Exim 4.71)
 (envelope-from <theonewiththeevillook@HIDDEN>) id 1WfDYl-0006RG-1y
 for bug-gnu-emacs@HIDDEN; Tue, 29 Apr 2014 15:19:25 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from <theonewiththeevillook@HIDDEN>) id 1WfDYe-0004Jq-PP
 for bug-gnu-emacs@HIDDEN; Tue, 29 Apr 2014 15:19:18 -0400
Received: from mailrelay006.isp.belgacom.be ([195.238.6.172]:55562)
 by eggs.gnu.org with esmtp (Exim 4.71)
 (envelope-from <theonewiththeevillook@HIDDEN>) id 1WfDYe-0004Jj-GB
 for bug-gnu-emacs@HIDDEN; Tue, 29 Apr 2014 15:19:12 -0400
X-Belgacom-Dynamic: yes
Received: from 41.233-178-91.adsl-dyn.isp.belgacom.be (HELO LDLC-portable)
 ([91.178.233.41])
 by relay.skynet.be with ESMTP; 29 Apr 2014 21:19:10 +0200
From: Nicolas Richard <theonewiththeevillook@HIDDEN>
To: bug-gnu-emacs@HIDDEN
Subject: 24.3.50; match data is incorrect if there are too many groups
Date: Tue, 29 Apr 2014 21:19:11 +0200
Message-ID: <87ppk0hrkg.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3.50 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-detected-operating-system: by eggs.gnu.org: Genre and OS details not
 recognized.
X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address
 (bad octet value).
X-Received-From: 2001:4830:134:3::11
X-Spam-Score: -5.0 (-----)
X-Debbugs-Envelope-To: submit
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <http://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <http://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <http://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -5.0 (-----)

Hi,

The following form evaluates to 2.  Replace 255 with 254 and it returns
512, as expected:
#+BEGIN_SRC emacs-lisp
  (with-temp-buffer
    (insert "bar")
    (when
        (re-search-backward
         (concat
          (mapconcat (lambda (x) (format "\\(%s\\)" x)) (make-list 255 "foo") "\\|")
          "\\|"
          "\\(bar\\)")
         nil t)
      (length (match-data))))
#+END_SRC

Regexps with many groups are the kind of thing that is used in AUCTeX,
in TeX-auto-parse-region.  What AUCTeX does in that function is
construct a big regexp out of a list of smaller ones (each small one is
made into a group); when the big regexp matches, it then tries to find
out which of the smaller regexps actually matched by checking which
group is non-nil.
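
To illustrate the technique described above, here is a simplified
sketch.  It is not the actual AUCTeX code: the function names
`my-combine-regexps' and `my-matched-alternative' are made up for this
example, and it assumes that none of the small regexps contains
capturing groups of its own:

#+BEGIN_SRC emacs-lisp
  (defun my-combine-regexps (regexps)
    "Wrap each of REGEXPS in a group and join them as alternatives."
    (mapconcat (lambda (re) (concat "\\(" re "\\)")) regexps "\\|"))

  (defun my-matched-alternative (regexps string)
    "Return the index in REGEXPS of the alternative matching STRING.
  Return nil if nothing matched.  Assumes that no element of REGEXPS
  contains capturing groups of its own."
    (when (string-match (my-combine-regexps regexps) string)
      (let ((n (length regexps))
            (i 1)
            found)
        ;; Group I corresponds to element I-1 of REGEXPS; the first
        ;; group with a non-nil start position is the one that matched.
        (while (and (not found) (<= i n))
          (when (match-beginning i)
            (setq found (1- i)))
          (setq i (1+ i)))
        found)))

  (my-matched-alternative '("foo" "bar" "baz") "xxbar") ; => 1
#+END_SRC

With more small regexps than the group limit allows, the groups past
the limit would never appear as non-nil, so a lookup like this one
would miss them.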

In GNU Emacs 24.3.50.7 (i686-pc-linux-gnu, GTK+ Version 2.24.20)
 of 2014-04-10 on LDLC-portable
Windowing system distributor `The X.Org Foundation', version 11.0.11405000
System Description:	Ubuntu 13.10

Configured using:
 `configure 'CFLAGS=-g3 -O2''

Important settings:
  value of $LANG: fr_BE.UTF-8
  locale-coding-system: utf-8-unix

-- 
Nico.




Acknowledgement sent to Nicolas Richard <theonewiththeevillook@HIDDEN>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs@HIDDEN.
Report forwarded to bug-gnu-emacs@HIDDEN:
bug#17373; Package emacs.
Last modified: Mon, 25 Nov 2019 12:00:02 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.