GNU bug report logs - #20703
24.4; Stack overflow in regexp matcher


Package: emacs;

Reported by: lee <at> yagibdah.de

Date: Sun, 31 May 2015 17:53:02 UTC

Severity: minor

Tags: wontfix

Found in version 24.4

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 20703 in the body.
You can then email your comments to 20703 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#20703; Package emacs. (Sun, 31 May 2015 17:53:02 GMT)

Acknowledgement sent to lee <at> yagibdah.de:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Sun, 31 May 2015 17:53:03 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: lee <at> yagibdah.de
To: bug-gnu-emacs <at> gnu.org
Subject: 24.4; Stack overflow in regexp matcher
Date: Sun, 31 May 2015 18:46:21 +0200
using projectile, trying to find a tag with C-p j

The TAGS file is 1.8GB.



In GNU Emacs 24.4.1 (x86_64-pc-linux-gnu, X toolkit)
 of 2015-03-28 on heimdali
Windowing system distributor `The X.Org Foundation', version 11.0.11604000
Configured using:
 `configure --prefix=/usr --build=x86_64-pc-linux-gnu
 --host=x86_64-pc-linux-gnu --mandir=/usr/share/man
 --infodir=/usr/share/info --datadir=/usr/share --sysconfdir=/etc
 --localstatedir=/var/lib --disable-dependency-tracking
 --disable-silent-rules --libdir=/usr/lib64 --program-suffix=-emacs-24
 --infodir=/usr/share/info/emacs-24 --localstatedir=/var
 --enable-locallisppath=/etc/emacs:/usr/share/emacs/site-lisp
 --with-gameuser=:gamestat --without-compress-install
 --with-file-notification=inotify --enable-acl --without-dbus
 --with-gnutls --without-gpm --without-hesiod --without-kerberos
 --without-kerberos5 --with-xml2 --without-selinux --with-wide-int
 --with-zlib --with-sound=alsa --with-x --without-ns --without-gconf
 --without-gsettings --without-toolkit-scroll-bars --with-gif
 --with-jpeg --with-png --with-rsvg --with-tiff --with-xpm
 --with-imagemagick --with-xft --without-libotf --without-m17n-flt
 --with-x-toolkit=lucid --with-xaw3d
 GENTOO_PACKAGE=app-editors/emacs-24.4-r4 'CFLAGS=-O2 -pipe
 -march=native' CPPFLAGS= 'LDFLAGS=-Wl,-O1 -Wl,--as-needed''

Important settings:
  value of $LANG: en_GB.utf8
  locale-coding-system: utf-8-unix

Major mode: Debugger

Minor modes in effect:
  show-paren-mode: t
  desktop-save-mode: t
  projectile-global-mode: t
  projectile-mode: t
  global-auto-complete-mode: t
  yas-global-mode: t
  yas-minor-mode: t
  tooltip-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  buffer-read-only: t
  size-indication-mode: t
  column-number-mode: t
  line-number-mode: t
  transient-mark-mode: t

Recent input:
y q C-x 1 <left> <left> <left> <left> <left> <left> 
<left> <left> <left> <left> <left> <left> <left> <left> 
<left> <left> <left> <left> <left> <left> <left> <left> 
<left> <left> <left> <left> <left> <left> <left> <left> 
<left> <down> C-c p j q q <escape> . <return> <right> 
<right> <right> <right> <right> <right> <right> <right> 
<right> <right> <right> <right> <right> <right> <left> 
<left> <left> C-x b <return> C-x b <return> C-x b <return> 
<left> <left> <left> <left> <left> <left> <left> <left> 
<left> <left> <left> <left> <left> <left> <left> <left> 
<left> <left> <left> <left> <right> <right> <right> 
<right> <right> C-SPC <right> <right> <right> <right> 
<right> <right> <right> <right> <right> <right> <right> 
<right> <right> <right> <right> <right> <right> <escape> 
. L L F o n <tab> q q C-x b T A G <tab> <tab> S <tab> 
<return> C-x b <return> <left> <left> <left> <left> 
<left> <left> <left> <left> <left> <left> <left> C-g 
<right> <right> <right> <right> <right> <right> C-c 
p j <home> C-SPC <end> <escape> w <down> <down> <right> 
<down> <left> <right> <up> <up> <up> <up> <up> C-SPC 
<down> <down> <down> <down> <down> <down> <down> <down> 
<down> <down> <down> <down> <down> <down> <down> <down> 
<escape> w <escape> x r e p o <tab> r <tab> <return>

Recent messages:
Lazy desktop load complete
Quit
Making tags completion table for [some file]
Making tags completion table for [...]/TAGS...42%
Entering debugger...
Mark set
Beginning of buffer
Mark activated
End of buffer [7 times]
Making completion list...

Load-path shadows:
None found.

Features:
(shadow sort flyspell ispell mail-extr emacsbug vc-dispatcher vc-svn
cperl-mode conf-mode info js byte-opt bytecomp byte-compile cconv json
imenu make-mode sh-script smie executable debug gnus-dired etags
nxml-uchnm rng-xsd xsd-regexp rng-cmpct rng-nxml rng-valid rng-loc
rng-uri rng-parse nxml-parse rng-match rng-dt rng-util rng-pttrn nxml-ns
nxml-mode nxml-outln nxml-rap nxml-util nxml-glyph nxml-enc xmltok
org-element org-rmail org-mhe org-irc org-info org-gnus org-docview
doc-view jka-compr image-mode org-bibtex bibtex org-bbdb org-w3m org
org-macro org-footnote org-pcomplete pcomplete org-list org-faces
org-entities noutline outline easy-mmode org-version ob-emacs-lisp ob
ob-tangle ob-ref ob-lob ob-table ob-exp org-src ob-keys ob-comint comint
ansi-color ring ob-core ob-eval org-compat org-macs org-loaddefs
find-func cal-menu calendar cal-loaddefs vc-git cc-langs cc-mode
cc-fonts cc-guess cc-menus cc-cmds cc-styles cc-align cc-engine cc-vars
cc-defs nnir gnus-msg gnus-art mm-uu mml2015 mm-view mml-smime smime
password-cache dig mailcap gnus-sum nnoo gnus-group gnus-undo nnmail
mail-source gnus-start gnus-spec gnus-int gnus-range gnus-win message
sendmail format-spec rfc822 mml mml-sec mm-decode mm-bodies mm-encode
mail-parse rfc2231 rfc2047 rfc2045 ietf-drums mailabbrev gmm-utils
mailheader gnus gnus-ems nnheader gnus-util mail-utils mm-util
mail-prsvr wid-edit server two-column paren cus-start cus-load desktop
frameset dired boxquote rect package epg-config projectile advice
ibuf-ext ibuffer dash thingatpt fvwm-mode hi-lock lsl-mode
auto-complete-config auto-complete popup edmacro kmacro help-fns cl-macs
yasnippet help-mode easymenu cl gv cl-loaddefs cl-lib site-gentoo
bbdb-autoloads bbdb timezone time-date tooltip electric uniquify
ediff-hook vc-hooks lisp-float-type mwheel x-win x-dnd tool-bar dnd
fontset image regexp-opt fringe tabulated-list newcomment lisp-mode
prog-mode register page menu-bar rfn-eshadow timer select scroll-bar
mouse jit-lock font-lock syntax facemenu font-core frame cham georgian
utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean
japanese hebrew greek romanian slovak czech european ethiopic indian
cyrillic chinese case-table epa-hook jka-cmpr-hook help simple abbrev
minibuffer nadvice loaddefs button faces cus-face macroexp files
text-properties overlay sha1 md5 base64 format env code-pages mule
custom widget hashtable-print-readable backquote make-network-process
inotify dynamic-setting font-render-setting x-toolkit x multi-tty emacs)

Memory information:
((conses 16 602091 50577)
 (symbols 48 147518 0)
 (miscs 40 1241 761)
 (strings 32 202695 50296)
 (string-bytes 1 5158936)
 (vectors 16 45193)
 (vector-slots 8 1104101 29866)
 (floats 8 1250 225)
 (intervals 56 25545 63)
 (buffers 960 223)
 (heap 1024 82413 1928))

-- 
Again we must be afraid of speaking of daemons for fear that daemons
might swallow us.  Finally, this fear has become reasonable.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20703; Package emacs. (Sun, 31 May 2015 22:27:02 GMT)

Message #8 received at 20703 <at> debbugs.gnu.org:

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: lee <at> yagibdah.de, 20703 <at> debbugs.gnu.org
Subject: Re: bug#20703: 24.4; Stack overflow in regexp matcher
Date: Mon, 1 Jun 2015 01:26:06 +0300
On 05/31/2015 07:46 PM, lee <at> yagibdah.de wrote:
>
> using projectile, trying to find a tag with C-p j
>
> The TAGS file is 1.8GB.

What if you try `M-x find-tag'?

If it exhibits the same problem, try splitting the file in half, before 
one of the page break characters (they look like ^L in Emacs) preceding 
a file name. That will make both parts a valid tags file.

Try `M-x visit-tags-table' on each (say no when Emacs offers to keep 
the old one), and see if both halves have the same problem.
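A minimal Python sketch of that split follows. It assumes the usual etags layout, in which each section starts with a line holding only ^L, followed by a "FILENAME,SIZE" header. The function name `split_tags` is hypothetical. It cuts at a section boundary near the middle of the file, so both halves still begin with ^L and remain valid tags files.

```python
# Sketch: split an etags TAGS file into two halves that are each valid.
# Assumption: a section starts with a line holding only ^L (form feed),
# followed by a "FILENAME,SIZE" header line.

SECTION_START = b"\n\x0c\n"  # end of the previous line, then a line with only ^L


def split_tags(data: bytes) -> tuple[bytes, bytes]:
    """Cut DATA at a section boundary near its midpoint."""
    mid = len(data) // 2
    # Last boundary that ends before the midpoint...
    before = data.rfind(SECTION_START, 0, mid)
    # ...and the first one at (or straddling) the midpoint.
    after = data.find(SECTION_START, max(mid - len(SECTION_START) + 1, 0))
    # Cut just after the newline so the second half starts with "^L\n".
    cuts = [p + 1 for p in (before, after) if p > 0]
    if not cuts:
        raise ValueError("only one section; nothing to split")
    cut = min(cuts, key=lambda c: abs(c - mid))
    return data[:cut], data[cut:]
```

Read the original file in binary mode and pass its contents to `split_tags`. Write both halves into the same directory as the original TAGS (for example as TAGS-1 and TAGS-2, which are hypothetical names). This matters because relative file names inside a TAGS file are resolved against the directory that contains it.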




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20703; Package emacs. (Mon, 01 Jun 2015 14:11:02 GMT)

Message #11 received at 20703 <at> debbugs.gnu.org:

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: lee <at> yagibdah.de
Cc: 20703 <at> debbugs.gnu.org
Subject: Re: bug#20703: 24.4; Stack overflow in regexp matcher
Date: Mon, 01 Jun 2015 10:10:24 -0400
> The TAGS file is 1.8GB.

Could you give us an idea of how you get such a large TAGS file?
Emacs's Lisp directory weighs in at around 60MB, and its TAGS file is
about 3MB, so assuming a similar ratio, your 1.8GB file seems to imply
that the indexed code of your project is more than 30GB in size, which
seems rather unusual.

Do you also index files which are not human-written, maybe?


        Stefan




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20703; Package emacs. (Mon, 01 Jun 2015 18:46:02 GMT)

Message #14 received at 20703 <at> debbugs.gnu.org:

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: lee <lee <at> yagibdah.de>
Cc: 20703 <at> debbugs.gnu.org
Subject: Re: bug#20703: 24.4; Stack overflow in regexp matcher
Date: Mon, 1 Jun 2015 21:45:13 +0300
Please keep the bug address in Cc.

On 06/01/2015 09:03 PM, lee wrote:

>> What if you try `M-x find-tag'?
>
> That works.

What if you type `M-x find-tag TAB' (to ask Emacs for all available tags)?

> I cut off roughly the bottom half of the TAGS file and tried again. With
> that, I'm getting the error when progress is at 85% instead of
> 42%. Cutting off the bottom half again, leaving about 1/4 of the
> original file, does not yield an error and says no matching tags were
> found.

The idea was to split the file in half, and do a sort of binary search. 
E.g., try cutting off the top half in the first step now.

> So I guess the problem might have to do with the size of the TAGS file
> ...

Not necessarily. The TAGS file is parsed sequentially, without recursion 
in the Lisp code.

In all likelihood, there is a problematic line around 42% of the 
original TAGS, and the error goes away when that line is not in the file 
anymore.

We need to know that line to fix the bug.
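A binary search done by hand is easy to lose track of. Recording the range still under suspicion explicitly keeps it on course. The hedged Python sketch below works on whole sections (one per indexed source file) rather than on raw lines. `fails` is a hypothetical callback that stands in for the manual step: save the chosen sections as a scratch TAGS file, visit it, and check whether `M-x find-tag TAB` still overflows. The sketch assumes a single section is responsible for the error.

```python
import re
from typing import Callable, Sequence


def split_sections(data: bytes) -> list[bytes]:
    """Split raw TAGS contents into sections, each starting with ^L."""
    starts = [0] + [m.start() + 1 for m in re.finditer(rb"\n\x0c\n", data)]
    return [data[a:b] for a, b in zip(starts, starts[1:] + [len(data)])]


def bisect_sections(sections: Sequence[bytes],
                    fails: Callable[[Sequence[bytes]], bool]) -> int:
    """Return the index of the one section that makes `fails` return True."""
    lo, hi = 0, len(sections)  # invariant: the culprit is in sections[lo:hi]
    if not fails(sections[lo:hi]):
        raise ValueError("the complete file does not fail")
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(sections[lo:mid]):
            hi = mid  # culprit is in the first half
        else:
            lo = mid  # so it must be in the second half
    return lo
```

Each step halves the candidate range, so a file with a few thousand sections needs only about a dozen test runs.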




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20703; Package emacs. (Tue, 02 Jun 2015 21:27:02 GMT)

Message #17 received at 20703 <at> debbugs.gnu.org:

From: lee <lee <at> yagibdah.de>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 20703 <at> debbugs.gnu.org
Subject: Re: bug#20703: 24.4; Stack overflow in regexp matcher
Date: Tue, 02 Jun 2015 23:26:10 +0200
Stefan Monnier <monnier <at> iro.umontreal.ca> writes:

>> The TAGS file is 1.8GB.
>
> Could you give us an idea of how you get such a large TAGS file?

I'm not doing anything special I'd be aware of, only C-p R to (re)build
the TAGS file.

> Emacs's Lisp directory weighs in at around 60MB, and its TAGS file is
> about 3MB, so assuming a similar ratio, your 1.8GB file seems to imply
> that the indexed code of your project is more than 30GB in size, which
> seems rather unusual.

It isn't --- the code is available here: https://github.com/Ratany/SingularityViewer

Projectile recognises the repo as a project and lets me build the TAGS
file.

> Do you also index files which are not human-written, maybe?

The compiled version resides in a subdirectory, so it's possible.  I
don't know which files are considered by default when creating the TAGS
file with C-p R.  I only just started trying out projectile; that
compilation results are included would be a bit unexpected.






Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20703; Package emacs. (Tue, 02 Jun 2015 21:27:03 GMT)

Message #20 received at 20703 <at> debbugs.gnu.org:

From: lee <lee <at> yagibdah.de>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: 20703 <at> debbugs.gnu.org
Subject: Re: bug#20703: 24.4; Stack overflow in regexp matcher
Date: Tue, 02 Jun 2015 23:10:33 +0200
Dmitry Gutov <dgutov <at> yandex.ru> writes:

> Please keep the bug address in Cc.
>
> On 06/01/2015 09:03 PM, lee wrote:
>
>>> What if you try `M-x find-tag'?
>>
>> That works.
>
> What if you type `M-x find-tag TAB' (to ask Emacs for all available tags)?

Processing goes to 42% before the debugger comes up:


Debugger entered--Lisp error: (error "Stack overflow in regexp matcher")
  re-search-forward("^\\(\\([^\177]+[^-a-zA-Z0-9_+*$:\177]+\\)?\\([-a-zA-Z0-9_+*$?:]+\\)[^-a-zA-Z0-9_+*$?:\177]*\\)\177\\(\\([^\n\001]+\\)\001\\)?\\([0-9]+\\)?,\\([0-9]+\\)?\n" nil t)
  etags-tags-completion-table()
  #[0 "\303\211C\304\305\"\210\212\306.\242\205.\307!\2038.	 \262.\211\242\2031.\310\311\312\313\314\315.!\316\"\317\320%.\"\210\202	.\211.\240\210\202	.)\304\321\"\210\211\242\211.\207" [buffer-file-name tags-completion-table-function tags-completion-table nil message "Making tags completion table for %s..." visit-tags-table-buffer t mapatoms make-byte-code 257 "\301\302.!\300\242\"\207" vconcat vector [intern symbol-name] 4 "\n\n(fn SYM)" "Making tags completion table for %s...done"] 9 "\n\n(fn)"]()
  funcall(#[0 "\303\211C\304\305\"\210\212\306.\242\205.\307!\2038.	 \262.\211\242\2031.\310\311\312\313\314\315.!\316\"\317\320%.\"\210\202	.\211.\240\210\202	.)\304\321\"\210\211\242\211.\207" [buffer-file-name tags-completion-table-function tags-completion-table nil message "Making tags completion table for %s..." visit-tags-table-buffer t mapatoms make-byte-code 257 "\301\302.!\300\242\"\207" vconcat vector [intern symbol-name] 4 "\n\n(fn SYM)" "Making tags completion table for %s...done"] 9 "\n\n(fn)"])
  tags-completion-table()
  #[771 "r\300q\210\212\302.\303 \210)\304.\305 .$*\207" [#<buffer llagent.cpp> enable-recursive-minibuffers t visit-tags-table-buffer complete-with-action tags-completion-table] 8 "\n\n(fn STRING PRED ACTION)"]("" nil metadata)
  completion-metadata("" #[771 "r\300q\210\212\302.\303 \210)\304.\305 .$*\207" [#<buffer llagent.cpp> enable-recursive-minibuffers t visit-tags-table-buffer complete-with-action tags-completion-table] 8 "\n\n(fn STRING PRED ACTION)"] nil)
  completion--field-metadata(28)
  completion--do-completion(28 28)
  completion--in-region-1(28 28)
  #[1028 "..\n\203!.\304.!\203..\202.\305.!\305.\306\".F.\307\310!\210\311.\"*\207" [minibuffer-completion-predicate minibuffer-completion-table completion-in-region-mode-predicate completion-in-region--data markerp copy-marker t completion-in-region-mode 1 completion--in-region-1] 8 "\n\n(fn START END COLLECTION PREDICATE)"](28 28 #[771 "r\300q\210\212\302.\303 \210)\304.\305 .$*\207" [#<buffer llagent.cpp> enable-recursive-minibuffers t visit-tags-table-buffer complete-with-action tags-completion-table] 8 "\n\n(fn STRING PRED ACTION)"] nil)
  apply(#[1028 "..\n\203!.\304.!\203..\202.\305.!\305.\306\".F.\307\310!\210\311.\"*\207" [minibuffer-completion-predicate minibuffer-completion-table completion-in-region-mode-predicate completion-in-region--data markerp copy-marker t completion-in-region-mode 1 completion--in-region-1] 8 "\n\n(fn START END COLLECTION PREDICATE)"] (28 28 #[771 "r\300q\210\212\302.\303 \210)\304.\305 .$*\207" [#<buffer llagent.cpp> enable-recursive-minibuffers t visit-tags-table-buffer complete-with-action tags-completion-table] 8 "\n\n(fn STRING PRED ACTION)"] nil))
  #[771 ".:\2030.@\301=\203.\300\242\302.A\"\303.#\207\304.@\305\306\307\310\311\312\300!\313\"\314\315%.A.#.#\207\304\316.\"\207" [(#0) t append nil apply apply-partially make-byte-code 642 "\300\242..#\207" vconcat vector [] 7 "\n\n(fn FUNS GLOBAL &rest ARGS)" #[1028 "..\n\203!.\304.!\203..\202.\305.!\305.\306\".F.\307\310!\210\311.\"*\207" [minibuffer-completion-predicate minibuffer-completion-table completion-in-region-mode-predicate completion-in-region--data markerp copy-marker t completion-in-region-mode 1 completion--in-region-1] 8 "\n\n(fn START END COLLECTION PREDICATE)"]] 12 "\n\n(fn FUNS GLOBAL ARGS)"](nil nil (28 28 #[771 "r\300q\210\212\302.\303 \210)\304.\305 .$*\207" [#<buffer llagent.cpp> enable-recursive-minibuffers t visit-tags-table-buffer complete-with-action tags-completion-table] 8 "\n\n(fn STRING PRED ACTION)"] nil))
  completion--in-region(28 28 #[771 "r\300q\210\212\302.\303 \210)\304.\305 .$*\207" [#<buffer llagent.cpp> enable-recursive-minibuffers t visit-tags-table-buffer complete-with-action tags-completion-table] 8 "\n\n(fn STRING PRED ACTION)"] nil)
  completion-in-region(28 28 #[771 "r\300q\210\212\302.\303 \210)\304.\305 .$*\207" [#<buffer llagent.cpp> enable-recursive-minibuffers t visit-tags-table-buffer complete-with-action tags-completion-table] 8 "\n\n(fn STRING PRED ACTION)"] nil)
  minibuffer-complete()
  call-interactively(minibuffer-complete nil nil)
  command-execute(minibuffer-complete)
  read-from-minibuffer("Find tag (default gAgent): " nil (keymap (menu-bar keymap (minibuf "Minibuf" keymap (tab menu-item "Complete" minibuffer-complete :help "Complete as far as possible") (space menu-item "Complete Word" minibuffer-complete-word :help "Complete at most one word") (63 menu-item "List Completions" minibuffer-completion-help :help "Display all possible completions") "Minibuf")) (27 keymap (118 . switch-to-completions)) (prior . switch-to-completions) (63 . minibuffer-completion-help) (32 . minibuffer-complete-word) (9 . minibuffer-complete) keymap (menu-bar keymap (minibuf "Minibuf" keymap (previous menu-item "Previous History Item" previous-history-element :help "Put previous minibuffer history element in the minibuffer") (next menu-item "Next History Item" next-history-element :help "Put next minibuffer history element in the minibuffer") (isearch-backward menu-item "Isearch History Backward" isearch-backward :help "Incrementally search minibuffer history backward") (isearch-forward menu-item "Isearch History Forward" isearch-forward :help "Incrementally search minibuffer history forward") (return menu-item "Enter" exit-minibuffer :key-sequence "." :help "Terminate input and exit minibuffer") (quit menu-item "Quit" abort-recursive-edit :help "Abort input and exit minibuffer") "Minibuf")) (10 . exit-minibuffer) (13 . exit-minibuffer) (7 . abort-recursive-edit) (C-tab . file-cache-minibuffer-complete) (9 . self-insert-command) (XF86Back . previous-history-element) (up . previous-history-element) (prior . previous-history-element) (XF86Forward . next-history-element) (down . next-history-element) (next . next-history-element) (27 keymap (114 . previous-matching-history-element) (115 . next-matching-history-element) (112 . previous-history-element) (110 . next-history-element))) nil nil "gAgent" nil)
  completing-read-default("Find tag (default gAgent): " #[771 "r\300q\210\212\302.\303 \210)\304.\305 .$*\207" [#<buffer llagent.cpp> enable-recursive-minibuffers t visit-tags-table-buffer complete-with-action tags-completion-table] 8 "\n\n(fn STRING PRED ACTION)"] nil nil nil nil "gAgent" nil)
  completing-read("Find tag (default gAgent): " #[771 "r\300q\210\212\302.\303 \210)\304.\305 .$*\207" [#<buffer llagent.cpp> enable-recursive-minibuffers t visit-tags-table-buffer complete-with-action tags-completion-table] 8 "\n\n(fn STRING PRED ACTION)"] nil nil nil nil "gAgent")
  find-tag-tag("Find tag: ")
  find-tag-interactive("Find tag: ")
  call-interactively(find-tag record nil)
  command-execute(find-tag record)
  execute-extended-command(nil "find-tag")
  call-interactively(execute-extended-command nil nil)
  command-execute(execute-extended-command)


When I use C-p j and interrupt making the completion table with C-g
before the debugger comes up, I can enter what tag I'm searching for and
it can be found.

>> I cut off roughly the bottom half of the TAGS file and tried again. With
>> that, I'm getting the error when progress is at 85% instead of
>> 42%. Cutting off the bottom half again, leaving about 1/4 of the
>> original file, does not yield an error and says no matching tags were
>> found.
>
> The idea was to split the file in half, and do a sort of binary
> search. E.g., try cutting off the top half in the first step now.
>
>> So I guess the problem might have to do with the size of the TAGS file
>> ...
>
> Not necessarily. The TAGS file is parsed sequentially, without
> recursion in the Lisp code.
>
> In all likelihood, there is a problematic line around 42% of the
> original TAGS, and the error goes away when that line is not in the
> file anymore.
>
> We need to know that line to fix the bug.

I tried to find the line and only got to the point where so much of the
file was cut out that I didn't manage to go back to a step at which I'm
getting the error.  If I have some time this weekend, I can try again.

Isn't there a way to get a better hint than the pretty vague "42%"?






Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20703; Package emacs. (Wed, 03 Jun 2015 00:47:02 GMT)

Message #23 received at 20703 <at> debbugs.gnu.org:

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: lee <lee <at> yagibdah.de>
Cc: 20703 <at> debbugs.gnu.org
Subject: Re: bug#20703: 24.4; Stack overflow in regexp matcher
Date: Tue, 02 Jun 2015 20:46:08 -0400
>> Could you give us an idea of how you get such a large TAGS file?
> I'm not doing anything special I'd be aware of, only C-p R to (re)build
> the TAGS file.

`C-p' is bound to `previous-line' by default, so "C-p R to (re)build
the TAGS file" does not ring a bell.  Can you give us more details about
what this `C-p R' does (e.g. which command does it run)?

>> Emacs's Lisp directory weighs in at around 60MB, and its TAGS file is
>> about 3MB, so assuming a similar ratio, your 1.8GB file seems to imply
>> that the indexed code of your project is more than 30GB in size, which
>> seems rather unusual.
> It isn't --- the code is available here: https://github.com/Ratany/SingularityViewer

So maybe there's a problem in the way the TAGS file was built.


        Stefan




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20703; Package emacs. (Wed, 03 Jun 2015 00:59:02 GMT)

Message #26 received at 20703 <at> debbugs.gnu.org:

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: lee <lee <at> yagibdah.de>
Cc: 20703 <at> debbugs.gnu.org
Subject: Re: bug#20703: 24.4; Stack overflow in regexp matcher
Date: Wed, 3 Jun 2015 03:58:10 +0300
On 06/03/2015 12:10 AM, lee wrote:

> Processing goes to 42% before the debugger comes up:

Good. That means this error is not limited to Projectile.

> I tried to find the line and only got to the point where so much of the
> file was cut out that I didn't manage to go back to a step at which I'm
> getting the error.  If I have some time this weekend, I can try again.

If you can get it down to even 1000 lines that you're comfortable sending, 
that will be good enough.

I'd like to improve our regexp to avoid the overflow problem if 
possible; however, the line in question is likely simply too long. If 
there are a lot of these lines in TAGS (and there probably are, since 
it's 1.8GB), you'll need to improve the way it is generated anyway.

With Projectile, it would likely mean adding some directories to the 
ignored list, see projectile-tags-exclude-patterns.

> Isn't there a way to get a better hint than the pretty vague "42%"?

You can open the file and isearch-forward-regexp for .\{200,\} (or some 
bigger value). That will find abnormally long lines.
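The same search can be scripted. The Python sketch below assumes the standard etags section layout. For every line longer than `limit` bytes, it reports:

- the line's byte offset;
- that offset as a percentage of the file;
- the source file named in the enclosing section header.

The default of 200 matches the isearch pattern above, and the function name is hypothetical. The percentage is byte-based, whereas Emacs's progress message uses buffer positions. It is therefore only roughly comparable to "42%" when the file contains multibyte text.

```python
def long_lines(data: bytes, limit: int = 200):
    """Yield (offset, percent, source_file, length) for each line over LIMIT bytes."""
    current_file = None
    expect_header = False
    offset = 0
    for line in data.split(b"\n"):
        if expect_header:
            # The line after ^L is the section header "FILENAME,SIZE".
            current_file = line.rpartition(b",")[0].decode(errors="replace")
            expect_header = False
        elif line == b"\x0c":
            expect_header = True
        elif len(line) > limit:
            yield offset, 100 * offset // len(data), current_file, len(line)
        offset += len(line) + 1  # +1 for the newline removed by split()
```

This version reads the whole file into memory, which is heavy for a 1.8GB TAGS file. Iterating over the open file line by line would avoid that.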

Or try this patch:

diff --git a/lisp/progmodes/etags.el b/lisp/progmodes/etags.el
index bf57770..4e6a844 100644
--- a/lisp/progmodes/etags.el
+++ b/lisp/progmodes/etags.el
@@ -1267,18 +1267,22 @@ buffer-local values of tags table format variables."
       ;;   \5 is the explicitly-specified tag name.
       ;;   \6 is the line to start searching at;
       ;;   \7 is the char to start searching at.
-      (while (re-search-forward
-	      "^\\(\\([^\177]+[^-a-zA-Z0-9_+*$:\177]+\\)?\
+      (condition-case err
+          (while (re-search-forward
+                  "^\\(\\([^\177]+[^-a-zA-Z0-9_+*$:\177]+\\)?\
 \\([-a-zA-Z0-9_+*$?:]+\\)[^-a-zA-Z0-9_+*$?:\177]*\\)\177\
 \\(\\([^\n\001]+\\)\001\\)?\\([0-9]+\\)?,\\([0-9]+\\)?\n"
-	      nil t)
-	(push	(prog1 (if (match-beginning 5)
-			   ;; There is an explicit tag name.
-			   (buffer-substring (match-beginning 5) (match-end 5))
-			 ;; No explicit tag name.  Best guess.
-			 (buffer-substring (match-beginning 3) (match-end 3)))
-		  (progress-reporter-update progress-reporter (point)))
-		table)))
+                  nil t)
+            (push	(prog1 (if (match-beginning 5)
+                                   ;; There is an explicit tag name.
+                                   (buffer-substring (match-beginning 5) (match-end 5))
+                                 ;; No explicit tag name.  Best guess.
+                                 (buffer-substring (match-beginning 3) (match-end 3)))
+                          (progress-reporter-update progress-reporter (point)))
+                        table))
+        (error
+         (message "error happened near %d" (point))
+         (error (error-message-string err)))))
     table))

 (defun etags-snarf-tag (&optional use-explicit) ; Doc string?





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20703; Package emacs. (Wed, 03 Jun 2015 14:47:02 GMT)

Message #29 received at 20703 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: lee <lee <at> yagibdah.de>
Cc: 20703 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
Subject: Re: bug#20703: 24.4; Stack overflow in regexp matcher
Date: Wed, 03 Jun 2015 17:46:33 +0300
> From: lee <lee <at> yagibdah.de>
> Date: Tue, 02 Jun 2015 23:26:10 +0200
> Cc: 20703 <at> debbugs.gnu.org
> 
> > Emacs's Lisp directory weighs in at around 60MB, and its TAGS file is
> > about 3MB, so assuming a similar ratio, your 1.8GB file seems to imply
> > that the indexed code of your project is more than 30GB in size, which
> > seems rather unusual.
> 
> It isn't --- the code is available here: https://github.com/Ratany/SingularityViewer
> 
> Projectile recognises the repo as a project and lets me build the TAGS
> file.
> 
> > Do you also index files which are not human-written, maybe?
> 
> The compiled version resides in a subdirectory, so it's possible.

FWIW, just cloning the Git repository and running etags on it produces
a TAGS file that is about 3.5MB, a far cry from 1.8GB.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20703; Package emacs. (Wed, 13 Jan 2016 21:26:01 GMT)

Message #32 received at 20703 <at> debbugs.gnu.org:

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Sam Halliday <sam.halliday <at> gmail.com>, help-gnu-emacs <at> gnu.org
Cc: 20703 <at> debbugs.gnu.org
Subject: Re: BUG 20703 further evidence
Date: Thu, 14 Jan 2016 00:25:32 +0300
Hi Sam,

On 01/13/2016 08:54 PM, Sam Halliday wrote:

> I have been seeing a problem that is described in this bug report
>
>    https://debbugs.gnu.org/db/20/20703.html
>
> I have applied the suggested patch to etags-tags-completion-table (copied below in completeness for your convenience) and trapped an error case.

You should try the current version in emacs-25, it's smaller and faster 
than previously, although it also probably fails at long-enough lines.

> I'm triggering the error in an extremely long line of code (46,000 characters!). I presume somebody programmatically generated the line and pasted it into the source. A workaround could be to simply filter such lines at the ctag building or loading stage, just something that deletes "long" lines, whatever that may mean. Probably 500 characters is long enough!
>
> I could also look at adding maximum sizes to my regexes in ctags, but that really isn't a general solution because many ctags patterns do not have such limits.

I can think of some other possible solutions:

- External pre-processor that removes lines that are too long.

- Extra step, together with a custom variable, in visit-tags-table, that 
goes through the opened files and does the same.

- re-search-forward with limit, as implemented in the patch below 
(against emacs-25), that might work against problematic files like that 
(I haven't tested it).

I don't really know if we should install it, though, because it adds a 
performance overhead of ~10%. And I don't know if this problem is common 
enough.

Because another way to combat it is at the source: through judicious 
application of --exclude argument. As a bonus, the generation phase will 
become faster as well (sometimes dramatically).

Should we add a validation phase to visit-tags-table instead? Like, one 
that would say "your TAGS file contains obviously malformed entries 
from file XXX.min.js, go back and ignore it"?
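Of the options above, the external pre-processor needs no change to Emacs. Below is a minimal Python sketch of one. It rests on these assumptions:

- Each section starts with a ^L line and a "FILENAME,SIZE" header.
- SIZE is the byte length of the section's tag lines, so the sketch recomputes it after filtering.
- "FILENAME,include" headers are passed through unchanged.
- The 500-byte default is borrowed from `etags--table-line-limit` in the patch below.

`filter_tags` is a hypothetical name. Besides the filtered data, it returns how many entries were dropped per source file, which is roughly what the proposed validation phase would report.

```python
import re

MAX_LEN = 500  # borrowed from etags--table-line-limit in the emacs-25 patch


def sections(data: bytes):
    """Yield (header, body) for each ^L-delimited section of a TAGS file."""
    starts = [0] + [m.start() + 1 for m in re.finditer(rb"\n\x0c\n", data)]
    for start, end in zip(starts, starts[1:] + [len(data)]):
        chunk = data[start:end]
        if not chunk.startswith(b"\x0c\n"):
            continue  # e.g. an empty file: nothing that looks like a section
        header, _, body = chunk[2:].partition(b"\n")
        yield header, body


def filter_tags(data: bytes, max_len: int = MAX_LEN):
    """Return (filtered TAGS bytes, {source file: number of dropped entries})."""
    out, dropped = [], {}
    for header, body in sections(data):
        parts = body.split(b"\n")
        lines = [p + b"\n" for p in parts[:-1]]
        if parts[-1]:
            lines.append(parts[-1])  # last line had no trailing newline
        kept = [line for line in lines if len(line.rstrip(b"\n")) <= max_len]
        new_body = b"".join(kept)
        name, sep, size = header.rpartition(b",")
        if sep and size.isdigit():
            # SIZE counts the bytes of the section's tag lines; recompute it.
            header = name + b"," + str(len(new_body)).encode()
        if len(kept) < len(lines):
            dropped[name.decode(errors="replace")] = len(lines) - len(kept)
        out.append(b"\x0c\n" + header + b"\n" + new_body)
    return b"".join(out), dropped
```

A non-empty `dropped` result names the files that are likely candidates for an exclude pattern when the TAGS file is regenerated.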

diff --git a/lisp/progmodes/etags.el b/lisp/progmodes/etags.el
index 2db7220..9a663d4 100644
--- a/lisp/progmodes/etags.el
+++ b/lisp/progmodes/etags.el
@@ -1252,8 +1252,9 @@ etags-file-of-tag
 	  str
 	(expand-file-name str (file-truename default-directory))))))

+(defvar etags--table-line-limit 500)

-(defun etags-tags-completion-table () ; Doc string?
+(defun etags-tags-completion-table ()   ; Doc string?
   (let (table
 	(progress-reporter
 	 (make-progress-reporter
@@ -1263,10 +1264,13 @@ etags-tags-completion-table
       (goto-char (point-min))
       ;; This regexp matches an explicit tag name or the place where
       ;; it would start.
-      (while (re-search-forward
-              "[\f\t\n\r()=,; ]?\177\\\(?:\\([^\n\001]+\\)\001\\)?"
-	      nil t)
-	(push	(prog1 (if (match-beginning 1)
+      (while (not (eobp))
+        (if (not (re-search-forward
+                  "[\f\t\n\r()=,; ]?\177\\\(?:\\([^\n\001]+\\)\001\\)?"
+                  ;; Avoid lines that are too long (bug#20703).
+                  (+ (point) etags--table-line-limit) t))
+            (forward-line 1)
+          (push (prog1 (if (match-beginning 1)
 			   ;; There is an explicit tag name.
 			   (buffer-substring (match-beginning 1) (match-end 1))
 			 ;; No explicit tag name.  Backtrack a little,
@@ -1277,7 +1281,7 @@ etags-tags-completion-table
                              (buffer-substring (point) (match-beginning 0))
                            (goto-char (match-end 0))))
 		  (progress-reporter-update progress-reporter (point)))
-		table)))
+		table))))
     table))

 (defun etags-snarf-tag (&optional use-explicit) ; Doc string?
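The bounded-search idea in this patch can also be shown outside Emacs. The hedged Python sketch below is not etags.el code. It works as follows:

- It uses a Python rendering of the patch's regexp.
- It limits each search to `line_limit` bytes past the current position (500, as in the patch).
- It skips to the next line when nothing matches within that window.

For entries without an explicit name, it takes the token just before the match, which simplifies what etags.el does. `completion_names` is a hypothetical name.

```python
import re

# Python rendering of the emacs-25 regexp from the patch: an optional
# delimiter, the DEL (\x7f) separator, then an optional explicit name
# terminated by \x01.
TAG_RE = re.compile(rb"[\f\t\n\r()=,; ]?\x7f(?:([^\n\x01]+)\x01)?")
DELIMS = b"\f\t\n\r()=,; "


def completion_names(data: bytes, line_limit: int = 500) -> list[bytes]:
    """Collect tag names, never searching more than LINE_LIMIT bytes ahead."""
    names = []
    pos = 0
    while pos < len(data):
        m = TAG_RE.search(data, pos, min(pos + line_limit, len(data)))
        if m is None:
            # No entry within the window: skip to the start of the next line.
            nl = data.find(b"\n", pos)
            pos = len(data) if nl == -1 else nl + 1
            continue
        if m.group(1) is not None:
            names.append(m.group(1))  # explicit tag name
        else:
            # Implicit name: back up from the match over non-delimiter bytes.
            start = m.start()
            while start > 0 and data[start - 1] not in DELIMS:
                start -= 1
            names.append(data[start:m.start()])
        pos = m.end()
    return names
```

The trade-off mirrors the patch. An over-long line is skipped after one bounded search instead of being matched as a whole. However, any entry whose DEL lies further than `line_limit` bytes ahead is silently lost.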





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20703; Package emacs. (Tue, 25 Aug 2020 09:14:02 GMT)

Message #35 received at 20703 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: Sam Halliday <sam.halliday <at> gmail.com>, 20703 <at> debbugs.gnu.org,
 help-gnu-emacs <at> gnu.org
Subject: Re: bug#20703: BUG 20703 further evidence
Date: Tue, 25 Aug 2020 11:13:08 +0200
Dmitry Gutov <dgutov <at> yandex.ru> writes:

>> I'm triggering the error in an extremely long line of code (46,000
>> characters!).

[...]

> - re-search-forward with limit, as implemented in the patch below
>   (against emacs-25), that might work against problematic files like
>   that (I haven't tested it).
>
> I don't really know if we should install it, though, because it adds a
> performance overhead of ~10%. And I don't know if this problem is
> common enough.

I think this is a use case (46K-character lines) that's really obscure, and a
10% performance hit for it wouldn't be appropriate.

> Because another way to combat it is at the source: through judicious
> application of --exclude argument. As a bonus, the generation phase
> will become faster as well (sometimes dramatically).
>
> Should we add a validation phase to visit-tags-table instead? Like,
> one that would say "your TAGS file contains obviously malformed
> entries from file XXX.min.js, go back and ignore it"?

If that can be done efficiently, then that sounds like a good idea.
Otherwise, perhaps we should just say that etags just doesn't support
46K long line source files and close this report as a wontfix?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20703; Package emacs. (Tue, 25 Aug 2020 14:56:02 GMT)

Message #38 received at 20703 <at> debbugs.gnu.org:

From: Drew Adams <drew.adams <at> oracle.com>
To: Lars Ingebrigtsen <larsi <at> gnus.org>, Dmitry Gutov <dgutov <at> yandex.ru>
Cc: Sam Halliday <sam.halliday <at> gmail.com>, 20703 <at> debbugs.gnu.org
Subject: RE: bug#20703: BUG 20703 further evidence
Date: Tue, 25 Aug 2020 07:54:52 -0700 (PDT)
Is there really a need to cc help-gnu-emacs <at> gnu.org?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20703; Package emacs. (Sun, 11 Oct 2020 03:10:02 GMT)

Message #41 received at 20703 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: Sam Halliday <sam.halliday <at> gmail.com>, 20703 <at> debbugs.gnu.org,
 help-gnu-emacs <at> gnu.org
Subject: Re: bug#20703: BUG 20703 further evidence
Date: Sun, 11 Oct 2020 05:08:52 +0200
Lars Ingebrigtsen <larsi <at> gnus.org> writes:

> If that can be done efficiently, then that sounds like a good idea.
> Otherwise, perhaps we should just say that etags just doesn't support
> 46K long line source files and close this report as a wontfix?

No comments in six weeks, so I'm closing this as a wontfix.





Added tag(s) wontfix. Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Sun, 11 Oct 2020 03:10:03 GMT)

bug closed, send any further explanations to 20703 <at> debbugs.gnu.org and lee <at> yagibdah.de. Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Sun, 11 Oct 2020 03:10:03 GMT)

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 08 Nov 2020 12:24:04 GMT)

This bug report was last modified 3 years and 167 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.