24.4; Stack overflow in regexp matcher
Reported by: lee <at>
Date: Sun, 31 May 2015 17:53:02 UTC
Severity: minor
Tags: wontfix
Found in version 24.4
Done: Lars Ingebrigtsen <larsi <at>>
Message #5 received at submit <at> (full text, mbox):
using projectile, trying to find a tag with C-p j
The TAGS file is 1.8GB.
In GNU Emacs 24.4.1 (x86_64-pc-linux-gnu, X toolkit)
of 2015-03-28 on heimdali
Windowing system distributor `The X.Org Foundation', version 11.0.11604000
Configured using:
`configure --prefix=/usr --build=x86_64-pc-linux-gnu
--host=x86_64-pc-linux-gnu --mandir=/usr/share/man
--infodir=/usr/share/info --datadir=/usr/share --sysconfdir=/etc
--localstatedir=/var/lib --disable-dependency-tracking
--disable-silent-rules --libdir=/usr/lib64 --program-suffix=-emacs-24
--infodir=/usr/share/info/emacs-24 --localstatedir=/var
--with-gameuser=:gamestat --without-compress-install
--with-file-notification=inotify --enable-acl --without-dbus
--with-gnutls --without-gpm --without-hesiod --without-kerberos
--without-kerberos5 --with-xml2 --without-selinux --with-wide-int
--with-zlib --with-sound=alsa --with-x --without-ns --without-gconf
--without-gsettings --without-toolkit-scroll-bars --with-gif
--with-jpeg --with-png --with-rsvg --with-tiff --with-xpm
--with-imagemagick --with-xft --without-libotf --without-m17n-flt
--with-x-toolkit=lucid --with-xaw3d
GENTOO_PACKAGE=app-editors/emacs-24.4-r4 'CFLAGS=-O2 -pipe
-march=native' CPPFLAGS= 'LDFLAGS=-Wl,-O1 -Wl,--as-needed''
Important settings:
value of $LANG: en_GB.utf8
locale-coding-system: utf-8-unix
Major mode: Debugger
Minor modes in effect:
show-paren-mode: t
desktop-save-mode: t
projectile-global-mode: t
projectile-mode: t
global-auto-complete-mode: t
yas-global-mode: t
yas-minor-mode: t
tooltip-mode: t
electric-indent-mode: t
mouse-wheel-mode: t
file-name-shadow-mode: t
global-font-lock-mode: t
font-lock-mode: t
auto-composition-mode: t
auto-encryption-mode: t
auto-compression-mode: t
buffer-read-only: t
size-indication-mode: t
column-number-mode: t
line-number-mode: t
transient-mark-mode: t
Message #8 received at 20703 <at> (full text, mbox):
On 05/31/2015 07:46 PM, lee <at> wrote:
> using projectile, trying to find a tag with C-p j
> The TAGS file is 1.8GB.
What if you try `M-x find-tag'?
If it exhibits the same problem, try splitting the file in half, before
one of the page break characters (they look like ^L in Emacs) preceding
a file name. That will make both parts a valid tags file.
Try `M-x visit-tags-table' on each (say no when Emacs suggests to keep
the old one), and see if both halves have the same problem.
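
A rough sketch of the split described above, done outside Emacs. It assumes the usual etags layout in which every file section begins with a form feed on its own line; the helper name `split_tags` is made up for illustration and is not part of Emacs or etags.

```python
def split_tags(data: bytes) -> tuple[bytes, bytes]:
    """Split TAGS content in two at the section boundary nearest the middle."""
    middle = len(data) // 2
    # A section starts with a form feed (^L, byte 0x0c) followed by a newline.
    before = data.rfind(b"\x0c\n", 0, middle)
    after = data.find(b"\x0c\n", middle)
    # Position 0 is the start of the first section, so it is not a useful cut.
    candidates = [pos for pos in (before, after) if pos > 0]
    if not candidates:
        raise ValueError("no interior section boundary found")
    cut = min(candidates, key=lambda pos: abs(pos - middle))
    # Both halves start with a form feed, so each should be a valid tags file.
    return data[:cut], data[cut:]
```

Reading the file with `open(path, "rb").read()` and writing each half to its own file would give two tables to try with `M-x visit-tags-table'. For a file of 1.8GB this holds everything in memory, which is acceptable for a one-off test but not more than that.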
Message #11 received at 20703 <at> (full text, mbox):
> The TAGS file is 1.8GB.
Could you give us an idea of how you get such a large TAGS file?
Emacs's Lisp directory weighs in at around 60MB, and its TAGS file is
about 3MB, so assuming a similar ratio, your 1.8GB file seems to imply
that the indexed code of your project is more than 30GB in size, which
seems rather unusual.
Do you also index files which are not human-written, maybe?
Message #14 received at 20703 <at> (full text, mbox):
Please keep the bug address in Cc.
On 06/01/2015 09:03 PM, lee wrote:
>> What if you try `M-x find-tag'?
> That works.
What if you type `M-x find-tag TAB' (to ask Emacs for all available tags)?
> I cut off roughly the bottom half of the TAGS file and tried again. With
> that, I'm getting the error when progress is at 85% instead of
> 42%. Cutting off the bottom half again, leaving about 1/4 of the
> original file, does not yield an error and says no matching tags were
> found.
The idea was to split the file in half, and do a sort of binary search.
E.g., try cutting off the top half in the first step now.
> So I guess the problem might have to do with the size of the TAGS file
> ...
Not necessarily. The TAGS file is parsed sequentially, without recursion
in the Lisp code.
In all likelihood, there is a problematic line around 42% of the
original TAGS, and the error goes away when that line is not in the file
anymore.
We need to know that line to fix the bug.
Message #17 received at 20703 <at> (full text, mbox):
Stefan Monnier <monnier <at>> writes:
>> The TAGS file is 1.8GB.
> Could you give us an idea of how you get such a large TAGS file?
I'm not doing anything special I'd be aware of, only C-p R to (re)build
the TAGS file.
> Emacs's Lisp directory weighs in at around 60MB, and its TAGS file is
> about 3MB, so assuming a similar ratio, your 1.8GB file seems to imply
> that the indexed code of your project is more than 30GB in size, which
> seems rather unusual.
It isn't --- the code is available here:
Projectile recognises the repo as a project and lets me build the TAGS
file.
> Do you also index files which are not human-written, maybe?
The compiled version resides in a subdirectory, so it's possible. I
don't know which files are considered by default when creating the TAGS
file with C-p R. I only just started trying out projectile; that
compilation results are included would be a bit unexpected.
Message #20 received at 20703 <at> (full text, mbox):
Dmitry Gutov <dgutov <at>> writes:
> Please keep the bug address in Cc.
> On 06/01/2015 09:03 PM, lee wrote:
>>> What if you try `M-x find-tag'?
>> That works.
> What if you type `M-x find-tag TAB' (to ask Emacs for all available tags)?
Processing goes to 42% before the debugger comes up:
Debugger entered--Lisp error: (error "Stack overflow in regexp matcher")
re-search-forward("^\\(\\([^]+[^-a-zA-Z0-9_+*$:]+\\)?\\([-a-zA-Z0-9_+*$?:]+\\)[^-a-zA-Z0-9_+*$?:]*\\)\\(\\([^\n.]+\\).\\)?\\([0-9]+\\)?,\\([0-9]+\\)?\n" nil t)
#[0 "\303\211C\304\305\"\210\212\306.\242\205. \307!\2038. \262.\211\242\2031.\310\311\312\313\314\315.!\316\"\317\320%.\"\210\202 .\211.\240\210\202 .)\304\321\"\210\211\242\211.\207" [buffer-file-name tags-completion-table-function tags-completion-table nil message "Making tags completion table for %s..." visit-tags-table-buffer t mapatoms make-byte-code 257 "\301\302.!\300\242\"\207" vconcat vector [intern symbol-name] 4 "\n\n(fn SYM)" "Making tags completion table for %s...done"] 9 "\n\n(fn)"]()
funcall(#[0 "\303\211C\304\305\"\210\212\306.\242\205. \307!\2038. \262.\211\242\2031.\310\311\312\313\314\315.!\316\"\317\320%.\"\210\202 .\211.\240\210\202 .)\304\321\"\210\211\242\211.\207" [buffer-file-name tags-completion-table-function tags-completion-table nil message "Making tags completion table for %s..." visit-tags-table-buffer t mapatoms make-byte-code 257 "\301\302.!\300\242\"\207" vconcat vector [intern symbol-name] 4 "\n\n(fn SYM)" "Making tags completion table for %s...done"] 9 "\n\n(fn)"])
#[771 "r\300q\210\212\302.\303 \210)\304.\305 .$*\207" [#<buffer llagent.cpp> enable-recursive-minibuffers t visit-tags-table-buffer complete-with-action tags-completion-table] 8 "\n\n(fn STRING PRED ACTION)"]("" nil metadata)
completion-metadata("" #[771 "r\300q\210\212\302.\303 \210)\304.\305 .$*\207" [#<buffer llagent.cpp> enable-recursive-minibuffers t visit-tags-table-buffer complete-with-action tags-completion-table] 8 "\n\n(fn STRING PRED ACTION)"] nil)
completion--do-completion(28 28)
completion--in-region-1(28 28)
#[1028 "..\n\203!.\304.!\203. .\202. \305.!\305.\306\".F.\307\310!\210\311.\"*\207" [minibuffer-completion-predicate minibuffer-completion-table completion-in-region-mode-predicate completion-in-region--data markerp copy-marker t completion-in-region-mode 1 completion--in-region-1] 8 "\n\n(fn START END COLLECTION PREDICATE)"](28 28 #[771 "r\300q\210\212\302.\303 \210)\304.\305 .$*\207" [#<buffer llagent.cpp> enable-recursive-minibuffers t visit-tags-table-buffer complete-with-action tags-completion-table] 8 "\n\n(fn STRING PRED ACTION)"] nil)
apply(#[1028 "..\n\203!.\304.!\203. .\202. \305.!\305.\306\".F.\307\310!\210\311.\"*\207" [minibuffer-completion-predicate minibuffer-completion-table completion-in-region-mode-predicate completion-in-region--data markerp copy-marker t completion-in-region-mode 1 completion--in-region-1] 8 "\n\n(fn START END COLLECTION PREDICATE)"] (28 28 #[771 "r\300q\210\212\302.\303 \210)\304.\305 .$*\207" [#<buffer llagent.cpp> enable-recursive-minibuffers t visit-tags-table-buffer complete-with-action tags-completion-table] 8 "\n\n(fn STRING PRED ACTION)"] nil))
#[771 ".:\2030.@\301=\203. \300\242\302.A\"\303.#\207\304.@\305\306\307\310\311\312\300!\313\"\314\315%.A.#.#\207\304\316.\"\207" [(#0) t append nil apply apply-partially make-byte-code 642 "\300\242..#\207" vconcat vector [] 7 "\n\n(fn FUNS GLOBAL &rest ARGS)" #[1028 "..\n\203!.\304.!\203. .\202. \305.!\305.\306\".F.\307\310!\210\311.\"*\207" [minibuffer-completion-predicate minibuffer-completion-table completion-in-region-mode-predicate completion-in-region--data markerp copy-marker t completion-in-region-mode 1 completion--in-region-1] 8 "\n\n(fn START END COLLECTION PREDICATE)"]] 12 "\n\n(fn FUNS GLOBAL ARGS)"](nil nil (28 28 #[771 "r\300q\210\212\302.\303 \210)\304.\305 .$*\207" [#<buffer llagent.cpp> enable-recursive-minibuffers t visit-tags-table-buffer complete-with-action tags-completion-table] 8 "\n\n(fn STRING PRED ACTION)"] nil))
completion--in-region(28 28 #[771 "r\300q\210\212\302.\303 \210)\304.\305 .$*\207" [#<buffer llagent.cpp> enable-recursive-minibuffers t visit-tags-table-buffer complete-with-action tags-completion-table] 8 "\n\n(fn STRING PRED ACTION)"] nil)
completion-in-region(28 28 #[771 "r\300q\210\212\302.\303 \210)\304.\305 .$*\207" [#<buffer llagent.cpp> enable-recursive-minibuffers t visit-tags-table-buffer complete-with-action tags-completion-table] 8 "\n\n(fn STRING PRED ACTION)"] nil)
call-interactively(minibuffer-complete nil nil)
read-from-minibuffer("Find tag (default gAgent): " nil (keymap (menu-bar keymap (minibuf "Minibuf" keymap (tab menu-item "Complete" minibuffer-complete :help "Complete as far as possible") (space menu-item "Complete Word" minibuffer-complete-word :help "Complete at most one word") (63 menu-item "List Completions" minibuffer-completion-help :help "Display all possible completions") "Minibuf")) (27 keymap (118 . switch-to-completions)) (prior . switch-to-completions) (63 . minibuffer-completion-help) (32 . minibuffer-complete-word) (9 . minibuffer-complete) keymap (menu-bar keymap (minibuf "Minibuf" keymap (previous menu-item "Previous History Item" previous-history-element :help "Put previous minibuffer history element in the minibuffer") (next menu-item "Next History Item" next-history-element :help "Put next minibuffer history element in the minibuffer") (isearch-backward menu-item "Isearch History Backward" isearch-backward :help "Incrementally search minibuffer history backward") (isearch-forward menu-item "Isearch History Forward" isearch-forward :help "Incrementally search minibuffer history forward") (return menu-item "Enter" exit-minibuffer :key-sequence "." :help "Terminate input and exit minibuffer") (quit menu-item "Quit" abort-recursive-edit :help "Abort input and exit minibuffer") "Minibuf")) (10 . exit-minibuffer) (13 . exit-minibuffer) (7 . abort-recursive-edit) (C-tab . file-cache-minibuffer-complete) (9 . self-insert-command) (XF86Back . previous-history-element) (up . previous-history-element) (prior . previous-history-element) (XF86Forward . next-history-element) (down . next-history-element) (next . next-history-element) (27 keymap (114 . previous-matching-history-element) (115 . next-matching-history-element) (112 . previous-history-element) (110 . next-history-element))) nil nil "gAgent" nil)
completing-read-default("Find tag (default gAgent): " #[771 "r\300q\210\212\302.\303 \210)\304.\305 .$*\207" [#<buffer llagent.cpp> enable-recursive-minibuffers t visit-tags-table-buffer complete-with-action tags-completion-table] 8 "\n\n(fn STRING PRED ACTION)"] nil nil nil nil "gAgent" nil)
completing-read("Find tag (default gAgent): " #[771 "r\300q\210\212\302.\303 \210)\304.\305 .$*\207" [#<buffer llagent.cpp> enable-recursive-minibuffers t visit-tags-table-buffer complete-with-action tags-completion-table] 8 "\n\n(fn STRING PRED ACTION)"] nil nil nil nil "gAgent")
find-tag-tag("Find tag: ")
find-tag-interactive("Find tag: ")
call-interactively(find-tag record nil)
command-execute(find-tag record)
execute-extended-command(nil "find-tag")
call-interactively(execute-extended-command nil nil)
When I use C-p j and interrupt making the completion table with C-g
before the debugger comes up, I can enter what tag I'm searching for and
it can be found.
>> I cut off roughly the bottom half of the TAGS file and tried again. With
>> that, I'm getting the error when progress is at 85% instead of
>> 42%. Cutting off the bottom half again, leaving about 1/4 of the
>> original file, does not yield an error and says no matching tags were
>> found.
> The idea was to split the file in half, and do a sort of binary
> search. E.g., try cutting off the top half in the first step now.
>> So I guess the problem might have to do with the size of the TAGS file
>> ...
> Not necessarily. The TAGS file is parsed sequentially, without
> recursion in the Lisp code.
> In all likelihood, there is a problematic line around 42% of the
> original TAGS, and the error goes away when that line is not in the
> file anymore.
> We need to know that line to fix the bug.
I tried to find the line and only got to the point where so much of the
file was cut out that I didn't manage to go back to a step at which I'm
getting the error. If I have some time this weekend, I can try again.
Isn't there a way to get a better hint than the pretty vague "42%"?
Message #23 received at 20703 <at> (full text, mbox):
>> Could you give us an idea of how you get such a large TAGS file?
> I'm not doing anything special I'd be aware of, only C-p R to (re)build
> the TAGS file.
`C-p' is bound to `previous-line' by default, so "C-p R to (re)build
the TAGS file" does not ring a bell. Can you give us more details about
what this `C-p R' does (e.g. which command does it run)?
>> Emacs's Lisp directory weighs in at around 60MB, and its TAGS file is
>> about 3MB, so assuming a similar ratio, your 1.8GB file seems to imply
>> that the indexed code of your project is more than 30GB in size, which
>> seems rather unusual.
> It isn't --- the code is available here:
So maybe there's a problem in the way the TAGS file was built.
Message #26 received at 20703 <at> (full text, mbox):
On 06/03/2015 12:10 AM, lee wrote:
> Processing goes to 42% before the debugger comes up:
Good. That means this error is not limited to Projectile.
> I tried to find the line and only got to the point where so much of the
> file was cut out that I didn't manage to go back to a step at which I'm
> getting the error. If I have some time this weekend, I can try again.
If you can get it down even to 1000 lines you're comfortable sending,
it'll be good enough.
I'd like to improve our regexp to avoid the overflow problem if
possible, however the line in question is likely simply too long. If
there are a lot of these lines in TAGS (and there probably are, since
it's 1.8GB), you'll need to improve the method of its generation anyway.
With Projectile, it would likely mean adding some directories to the
ignored list, see projectile-tags-exclude-patterns.
> Isn't there a way to get a better hint than the pretty vague "42%"?
You can open the file and isearch-forward-regexp for .\{200,\} (or some
bigger value). That will find abnormally long lines.
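
The same search can be sketched outside Emacs, which also gives a more precise hint than a percentage. This is only an illustration: the function name `long_lines` is hypothetical, the default of 200 simply mirrors the regexp above, and lengths and offsets are counted in bytes, which may differ from Emacs buffer positions when the file contains multibyte characters.

```python
import io
from typing import BinaryIO, Iterator


def long_lines(stream: BinaryIO, limit: int = 200) -> Iterator[tuple[int, int, int]]:
    """Yield (line_number, byte_offset, length) for lines of at least `limit` bytes."""
    offset = 0
    for number, line in enumerate(stream, start=1):
        length = len(line.rstrip(b"\n"))
        if length >= limit:
            yield number, offset, length
        # Track where the next line starts, counting the newline as well.
        offset += len(line)
```

Called on a file opened in binary mode, for example `for hit in long_lines(open("TAGS", "rb")): print(hit)`, it lists every suspicious line with its position.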
Or try this patch:
diff --git a/lisp/progmodes/etags.el b/lisp/progmodes/etags.el
index bf57770..4e6a844 100644
--- a/lisp/progmodes/etags.el
+++ b/lisp/progmodes/etags.el
@@ -1267,18 +1267,22 @@ buffer-local values of tags table format variables."
;; \5 is the explicitly-specified tag name.
;; \6 is the line to start searching at;
;; \7 is the char to start searching at.
- (while (re-search-forward
- "^\\(\\([^\177]+[^-a-zA-Z0-9_+*$:\177]+\\)?\
+ (condition-case err
+ (while (re-search-forward
+ "^\\(\\([^\177]+[^-a-zA-Z0-9_+*$:\177]+\\)?\
- nil t)
- (push (prog1 (if (match-beginning 5)
- ;; There is an explicit tag name.
- (buffer-substring (match-beginning 5) (match-end 5))
- ;; No explicit tag name. Best guess.
- (buffer-substring (match-beginning 3) (match-end 3)))
- (progress-reporter-update progress-reporter (point)))
- table)))
+ nil t)
+ (push (prog1 (if (match-beginning 5)
+ ;; There is an explicit tag name.
+ (buffer-substring (match-beginning 5) (match-end 5))
+ ;; No explicit tag name. Best guess.
+ (buffer-substring (match-beginning 3) (match-end 3)))
+ (progress-reporter-update progress-reporter (point)))
+ table))
+ (error
+ (message "error happened near %d" (point))
+ (error (error-message-string err)))))
(defun etags-snarf-tag (&optional use-explicit) ; Doc string?
Message #29 received at 20703 <at> (full text, mbox):
> From: lee <lee <at>>
> Date: Tue, 02 Jun 2015 23:26:10 +0200
> Cc: 20703 <at>
> > Emacs's Lisp directory weighs in at around 60MB, and its TAGS file is
> > about 3MB, so assuming a similar ratio, your 1.8GB file seems to imply
> > that the indexed code of your project is more than 30GB in size, which
> > seems rather unusual.
> It isn't --- the code is available here:
> Projectile recognises the repo as a project and lets me build the TAGS
> file.
> > Do you also index files which are not human-written, maybe?
> The compiled version resides in a subdirectory, so it's possible.
FWIW, just cloning the Git repository and running etags on it produces
a TAGS file that is about 3.5MB, a far cry from 1.8GB.
Message #32 received at 20703 <at> (full text, mbox):
Hi Sam,
On 01/13/2016 08:54 PM, Sam Halliday wrote:
> I have been seeing a problem that is described in this bug report
> I have applied the suggested patch to etags-tags-completion-table (copied below in full for your convenience) and trapped an error case.
You should try the current version in emacs-25, it's smaller and faster
than previously, although it also probably fails at long-enough lines.
> I'm triggering the error in an extremely long line of code (46,000 characters!). I presume somebody programmatically generated the line and pasted it into the source. A workaround could be to simply filter such lines at the ctag building or loading stage, just something that deletes "long" lines, whatever that may mean. Probably 500 characters is long enough!
> I could also look at adding maximum sizes to my regexes in ctags, but that really isn't a general solution because many ctags patterns do not have such limits.
I can think of some other possible solutions:
- External pre-processor that removes lines that are too long.
- Extra step, together with a custom variable, in visit-tags-table, that
goes through the opened files and does the same.
- re-search-forward with limit, as implemented in the patch below
(against emacs-25), that might work against problematic files like that
(I haven't tested it).
I don't really know if we should install it, though, because it adds a
performance overhead of ~10%. And I don't know if this problem is common
enough.
Because another way to combat it is at the source: through judicious
application of --exclude argument. As a bonus, the generation phase will
become faster as well (sometimes dramatically).
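
For the first option in the list above, the external pre-processor, a minimal sketch could look like this. The name `drop_long_lines` is hypothetical, and the default of 500 is just the figure floated earlier in this thread. One assumption to be aware of: the byte counts recorded in the section header lines are not updated after lines are dropped, and I have not checked whether anything relies on them.

```python
import io
from typing import BinaryIO


def drop_long_lines(src: BinaryIO, dst: BinaryIO, limit: int = 500) -> int:
    """Copy src to dst, skipping lines longer than `limit` bytes.

    Returns the number of lines that were skipped.
    """
    skipped = 0
    for line in src:
        if len(line.rstrip(b"\n")) > limit:
            skipped += 1
            continue
        dst.write(line)
    return skipped
```

Run once over a generated TAGS file (both files opened in binary mode), it leaves the section markers and header lines in place, since those are short, and removes only the over-long tag lines.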
Should we add a validation phase to visit-tags-table instead? Like, one
that would say "your TAGS file contains obviously malformed entries
from file XXX.min.js, go back and ignore it"?
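
To make the validation idea concrete, here is a sketch of such a check as a standalone script rather than as part of visit-tags-table. It assumes each section header has the form "FILENAME,SIZE" on the line after the form feed; the function name `files_with_long_entries` and the limit of 500 are illustrative only.

```python
import io
from collections import Counter
from typing import BinaryIO


def files_with_long_entries(stream: BinaryIO, limit: int = 500) -> Counter:
    """Count over-long tag lines per source file named in the section headers."""
    counts: Counter = Counter()
    current = None
    expect_header = False
    for raw in stream:
        line = raw.rstrip(b"\n")
        if line == b"\x0c":
            # A form feed on its own line announces a new file section.
            expect_header = True
            continue
        if expect_header:
            # Header is "path/to/file,SIZE"; keep what precedes the last comma.
            current = line.rpartition(b",")[0].decode("utf-8", "replace")
            expect_header = False
            continue
        if current is not None and len(line) > limit:
            counts[current] += 1
    return counts
```

The resulting counts name the files that would be worth excluding when the TAGS file is generated.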
diff --git a/lisp/progmodes/etags.el b/lisp/progmodes/etags.el
index 2db7220..9a663d4 100644
--- a/lisp/progmodes/etags.el
+++ b/lisp/progmodes/etags.el
@@ -1252,8 +1252,9 @@ etags-file-of-tag
(expand-file-name str (file-truename default-directory))))))
+(defvar etags--table-line-limit 500)
-(defun etags-tags-completion-table () ; Doc string?
+(defun etags-tags-completion-table () ; Doc string?
(let (table
@@ -1263,10 +1264,13 @@ etags-tags-completion-table
(goto-char (point-min))
;; This regexp matches an explicit tag name or the place where
;; it would start.
- (while (re-search-forward
- "[\f\t\n\r()=,; ]?\177\\\(?:\\([^\n\001]+\\)\001\\)?"
- nil t)
- (push (prog1 (if (match-beginning 1)
+ (while (not (eobp))
+ (if (not (re-search-forward
+ "[\f\t\n\r()=,; ]?\177\\\(?:\\([^\n\001]+\\)\001\\)?"
+ ;; Avoid lines that are too long (bug#20703).
+ (+ (point) etags--table-line-limit) t))
+ (forward-line 1)
+ (push (prog1 (if (match-beginning 1)
;; There is an explicit tag name.
(buffer-substring (match-beginning 1) (match-end 1))
;; No explicit tag name. Backtrack a little,
@@ -1277,7 +1281,7 @@ etags-tags-completion-table
(buffer-substring (point)
(match-beginning 0))
(goto-char (match-end 0))))
(progress-reporter-update progress-reporter (point)))
- table)))
+ table))))
(defun etags-snarf-tag (&optional use-explicit) ; Doc string?
Message #35 received at 20703 <at> (full text, mbox):
Dmitry Gutov <dgutov <at>> writes:
>> I'm triggering the error in an extremely long line of code (46,000
>> characters!).
> - re-search-forward with limit, as implemented in the patch below
> (against emacs-25), that might work against problematic files like
> that (I haven't tested it).
> I don't really know if we should install it, though, because it adds a
> performance overhead of ~10%. And I don't know if this problem is
> common enough.
I think this is a use case (46K long lines) that's really obscure, and a
10% performance hit wouldn't be appropriate.
> Because another way to combat it is at the source: through judicious
> application of --exclude argument. As a bonus, the generation phase
> will become faster as well (sometimes dramatically).
> Should we add a validation phase to visit-tags-table instead? Like,
> one that would say "your TAGS file contains obviously malformed
> entries from file XXX.min.js, go back and ignore it"?
If that can be done efficiently, then that sounds like a good idea.
Otherwise, perhaps we should just say that etags just doesn't support
46K long line source files and close this report as a wontfix?
Message #41 received at 20703 <at> (full text, mbox):
Lars Ingebrigtsen <larsi <at>> writes:
> If that can be done efficiently, then that sounds like a good idea.
> Otherwise, perhaps we should just say that etags just doesn't support
> 46K long line source files and close this report as a wontfix?
No comments in six weeks, so I'm closing this as a wontfix.
