GNU bug report logs - #8528
24.0.50; 32-bit Emacs with apparent 128M buffer size limit

Previous Next

Package: emacs;

Reported by: Evans Winner <ego111 <at> gmail.com>

Date: Wed, 20 Apr 2011 21:05:01 UTC

Severity: normal

Found in version 24.0.50

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 8528 in the body.
You can then email your comments to 8528 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8528; Package emacs. (Wed, 20 Apr 2011 21:05:01 GMT) Full text and rfc822 format available.

Acknowledgement sent to Evans Winner <ego111 <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Wed, 20 Apr 2011 21:05:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Evans Winner <ego111 <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: 24.0.50; 32-bit Emacs with apparent 128M buffer size limit
Date: Wed, 20 Apr 2011 15:04:06 -0600
My understanding is that a 32-bit GNU Emacs should be able
to open files up to 512 M.  If I am wrong about that, please
let me know.  I have compiled Emacs trunk from source
several times in the last couple of months and somewhere in
the last month or so it seems that the limit on my machine
has become 128 M.  My math could be off, but on the
assumption that 128 Mebibytes = 2^27 bytes = 1024 * 131072
bytes, and starting with emacs -Q I tried:

    $ dd if=/dev/zero of=testfile bs=1024 count=131072

and tried to open the file, and got: "Maximum buffer size
exceeded".  Then I tried one K less:

    $ dd if=/dev/zero of=testfile bs=1024 count=131071

and the buffer opened.

I have verified using the `top' command that there is
sufficient free memory for the files.

Also, for what it's worth:

    ELISP> most-positive-fixnum ==> 536870911

I discovered this as a result of not being able to open a
large (~160Mb) .pdf file that I had earlier been able to
open.  Please let me know if there is any other information
I can provide, or if there is something simple I am doing
wrong.


In GNU Emacs 24.0.50.1 (i686-pc-linux-gnu, GTK+ Version 3.0.8)
 of 2011-04-19 on braintron
Windowing system distributor `The X.Org Foundation', version 11.0.11001000
configured using `configure  '--with-x-toolkit=gtk3''

Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: en_US.UTF-8
  value of $XMODIFIERS: nil
  locale-coding-system: utf-8-unix
  default enable-multibyte-characters: t




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8528; Package emacs. (Thu, 21 Apr 2011 05:54:01 GMT) Full text and rfc822 format available.

Message #8 received at 8528 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Evans Winner <ego111 <at> gmail.com>, Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 8528 <at> debbugs.gnu.org
Subject: Re: bug#8528: 24.0.50;
	32-bit Emacs with apparent 128M buffer size limit
Date: Thu, 21 Apr 2011 08:52:34 +0300
> From: Evans Winner <ego111 <at> gmail.com>
> Date: Wed, 20 Apr 2011 15:04:06 -0600
> 
> My understanding is that a 32-bit GNU Emacs should be able
> to open files up to 512 M.  If I am wrong about that, please
> let me know.  I have compiled Emacs trunk from source
> several times in the last couple of months and somewhere in
> the last month or so it seems that the limit on my machine
> has become 128 M.  My math could be off, but on the
> assumption that 128 Mebibytes = 2^27 bytes = 1024 * 131072
> bytes, and starting with emacs -Q I tried:
> 
>     $ dd if=/dev/zero of=testfile bs=1024 count=131072
> 
> and tried to open the file, and got: "Maximum buffer size
> exceeded".

This happens because of the following test in insert-file-contents:

  /* Arithmetic overflow can occur if an Emacs integer cannot represent the
     file size, or if the calculations below overflow.  The calculations below
     double the file size twice, so check that it can be multiplied by 4
     safely.

     Also check whether the size is negative, which can happen on a platform
     that allows file sizes greater than the maximum off_t value.  */
  if (! not_regular
      && ! (0 <= st.st_size && st.st_size <= MOST_POSITIVE_FIXNUM / 4))
    error ("Maximum buffer size exceeded");

This test was commented out for the last 2 years, but lately it was
uncommented by Paul Eggert in revision 103841 on the trunk.

Paul, could you please tell where do you see twice doubling of the
file size in insert-file-contents?  Back in 1999, when this test was
first introduced, there was indeed such doubling.  But even then it
was only when the REPLACE argument was non-nil (according to my
reading of the code).  In any case, that part of code was completely
rewritten since then, and I don't believe we double the file size even
once.

By disabling that test, I was able to visit a 260-MB file on a 32-bit
machine.  So it seems like this test could be removed, if I'm not
missing anything.




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8528; Package emacs. (Thu, 21 Apr 2011 06:20:02 GMT) Full text and rfc822 format available.

Message #11 received at 8528 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 8528 <at> debbugs.gnu.org, Evans Winner <ego111 <at> gmail.com>
Subject: Re: bug#8528: 24.0.50; 32-bit Emacs with apparent 128M buffer size
	limit
Date: Wed, 20 Apr 2011 23:18:55 -0700
On 04/20/11 22:52, Eli Zaretskii wrote:
> Paul, could you please tell where do you see twice doubling of the
> file size in insert-file-contents?

I assumed that it was because the internal buffers contain an
Emacs-encoded version of the file, which could be as long as four
times the actual file size, because a single byte in the file
might expand to 4 bytes inside Emacs in some cases.

That would explain the behavior that you saw: if your file's
internal encoding was the same as the external, you wouldn't observe any
problem.  The problem would be exhibited only with files containing
many characters that bloat when read into memory.  However,
I didn't investigate the matter thoroughly; perhaps someone
who's more expert on how Emacs encodes things internally could
speak up.




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8528; Package emacs. (Thu, 21 Apr 2011 06:41:02 GMT) Full text and rfc822 format available.

Message #14 received at 8528 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 8528 <at> debbugs.gnu.org, ego111 <at> gmail.com
Subject: Re: bug#8528: 24.0.50;
	32-bit Emacs with apparent 128M buffer size limit
Date: Thu, 21 Apr 2011 09:40:26 +0300
> Date: Wed, 20 Apr 2011 23:18:55 -0700
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> CC: Evans Winner <ego111 <at> gmail.com>, 8528 <at> debbugs.gnu.org
> 
> On 04/20/11 22:52, Eli Zaretskii wrote:
> > Paul, could you please tell where do you see twice doubling of the
> > file size in insert-file-contents?
> 
> I assumed that it was because the internal buffers contain an
> Emacs-encoded version of the file, which could be as long as four
> times the actual file size, because a single byte in the file
> might expand to 4 bytes inside Emacs in some cases.

Actually, it could potentially expand even 5-fold (because Emacs
extends UTF-8 to codepoints as large as 0x3FFFFF).  But we test the
buffer size and avoid overflowing it in many other places, both
further down in insert-file-contents and in insdel.c.  If those are
not enough, we could add more such tests, particularly after decoding
the file's contents, where we know the full buffer size in bytes.

So I think artificially limiting the maximum size of a file that can
be visited in that particular place in insert-file-contents is too
harsh.

> That would explain the behavior that you saw: if your file's
> internal encoding was the same as the external, you wouldn't observe any
> problem.  The problem would be exhibited only with files containing
> many characters that bloat when read into memory.

Right, but wouldn't you agree that such a limitation is too stringent?
E.g., I should be able to use find-file-literally to visit a 512MB
file, but currently I cannot.




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8528; Package emacs. (Thu, 21 Apr 2011 06:59:01 GMT) Full text and rfc822 format available.

Message #17 received at 8528 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 8528 <at> debbugs.gnu.org, ego111 <at> gmail.com
Subject: Re: bug#8528: 24.0.50; 32-bit Emacs with apparent 128M buffer size
	limit
Date: Wed, 20 Apr 2011 23:58:36 -0700
On 04/20/11 23:40, Eli Zaretskii wrote:
> Right, but wouldn't you agree that such a limitation is too stringent?

Yes, absolutely, the limit should be removed if possible.

In a brief look at the code, it appeared to me that there were
places where the it does not check for integer overflow
in size calculations when converting external to internal form.  So
it could well be that this preliminary check may be needed to
avoid catastrophe later.  I have not checked this out carefully,
though, and I could be wrong.  (One way to find out would be
to test it with a worst-case-bloat file, but I haven't had time to
do that.)

> E.g., I should be able to use find-file-literally to visit a 512MB
> file, but currently I cannot.

If we know that byte bloat cannot occur, which is the case with
find-file-literally, then the divide-by-4 limit should not be needed.
That case should be easy, in that it shouldn't require a lot
of analysis to fix that case safely.




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8528; Package emacs. (Thu, 21 Apr 2011 13:21:01 GMT) Full text and rfc822 format available.

Message #20 received at 8528 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 8528 <at> debbugs.gnu.org, ego111 <at> gmail.com
Subject: Re: bug#8528: 24.0.50;
	32-bit Emacs with apparent 128M buffer size limit
Date: Thu, 21 Apr 2011 16:20:54 +0300
> Date: Wed, 20 Apr 2011 23:58:36 -0700
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> CC: ego111 <at> gmail.com, 8528 <at> debbugs.gnu.org
> 
> On 04/20/11 23:40, Eli Zaretskii wrote:
> > Right, but wouldn't you agree that such a limitation is too stringent?
> 
> Yes, absolutely, the limit should be removed if possible.
> 
> In a brief look at the code, it appeared to me that there were
> places where the it does not check for integer overflow
> in size calculations when converting external to internal form.  So
> it could well be that this preliminary check may be needed to
> avoid catastrophe later.  I have not checked this out carefully,
> though, and I could be wrong.

I have now reviewed the code involved in this, and I think the limit
can be lifted.

In general, there could be two ways for us to insert text into the
buffer as result of calling insert-file-contents: (a) directly, by
reading from the file into its buffer (or some temporary buffer used
as part of processing); or (b) indirectly, by decoding inserted text
through various functions in coding.c, which write the decoded text
into the destination buffer.

I found that both of these ways include tests for potential overflows
of the buffer size.  Inserting text directly is protected because it
enlarges the buffer's gap before inserting text, and make_gap which
does that errors out if the new size will overflow (actually, it
errors out 2000 bytes too early, because it wants some extra space).

Insertion by decoding text is also protected because it makes sure the
destination buffer has enough space before it writes another chunk of
decoded text into it.  It assures that by enlarging the gap, which
again goes through make_gap.

I found only one place where we were not protected from overflowing
MOST_POSITIVE_FIXNUM (not sure if it is relevant to
insert-file-contents), and one other place where I wasn't sure we were
protected, so I added a suitable protection in both those places.  See
the proposed patch below.

If no one objects, I will commit these changes in a week or so.

> > E.g., I should be able to use find-file-literally to visit a 512MB
> > file, but currently I cannot.
> 
> If we know that byte bloat cannot occur, which is the case with
> find-file-literally, then the divide-by-4 limit should not be needed.
> That case should be easy, in that it shouldn't require a lot
> of analysis to fix that case safely.

Yes, definitely.  But I think the patch below solves this problem as
well, so there's no need for special treatment for unibyte or pure
ASCII files.

Here's the proposed patch.  Evans, I'd appreciate if you could try it
and see if it solves the original problem for you.

=== modified file 'src/ChangeLog'
--- src/ChangeLog	2011-04-19 10:48:30 +0000
+++ src/ChangeLog	2011-04-21 12:35:30 +0000
@@ -1,3 +1,16 @@
+2011-04-21  Eli Zaretskii  <eliz <at> gnu.org>
+
+	* coding.c (coding_alloc_by_realloc): Error out if destination
+	will grow beyond MOST_POSITIVE_FIXNUM.
+	(decode_coding_emacs_mule): Abort if there isn't enough place in
+	charbuf for the composition carryover bytes.  Reserve an extra
+	space for up to 2 characters produced in a loop.
+	(decode_coding_iso_2022): Abort if there isn't enough place in
+	charbuf for the composition carryover bytes.
+
+	* fileio.c (Finsert_file_contents): Don't limit file size to 1/4
+	of MOST_POSITIVE_FIXNUM.
+
 2011-04-19  Eli Zaretskii  <eliz <at> gnu.org>
 
 	* syntax.h (SETUP_SYNTAX_TABLE_FOR_OBJECT): Fix setting of

=== modified file 'src/coding.c'
--- src/coding.c	2011-04-14 05:04:02 +0000
+++ src/coding.c	2011-04-21 12:35:33 +0000
@@ -1071,6 +1071,8 @@ coding_set_destination (struct coding_sy
 static void
 coding_alloc_by_realloc (struct coding_system *coding, EMACS_INT bytes)
 {
+  if (coding->dst_bytes > MOST_POSITIVE_FIXNUM - bytes)
+    error ("Maximum size of buffer or string exceeded");
   coding->destination = (unsigned char *) xrealloc (coding->destination,
 						    coding->dst_bytes + bytes);
   coding->dst_bytes += bytes;
@@ -2333,7 +2335,9 @@ decode_coding_emacs_mule (struct coding_
   /* We may produce two annotations (charset and composition) in one
      loop and one more charset annotation at the end.  */
   int *charbuf_end
-    = coding->charbuf + coding->charbuf_size - (MAX_ANNOTATION_LENGTH * 3);
+    = coding->charbuf + coding->charbuf_size - (MAX_ANNOTATION_LENGTH * 3)
+      /* We can produce up to 2 characters in a loop.  */
+      - 1;
   EMACS_INT consumed_chars = 0, consumed_chars_base;
   int multibytep = coding->src_multibyte;
   EMACS_INT char_offset = coding->produced_char;
@@ -2348,6 +2352,8 @@ decode_coding_emacs_mule (struct coding_
     {
       int i;
 
+      if (charbuf_end - charbuf < cmp_status->length)
+	abort ();
       for (i = 0; i < cmp_status->length; i++)
 	*charbuf++ = cmp_status->carryover[i];
       coding->annotated = 1;
@@ -3479,6 +3485,8 @@ decode_coding_iso_2022 (struct coding_sy
 
   if (cmp_status->state != COMPOSING_NO)
     {
+      if (charbuf_end - charbuf < cmp_status->length)
+	abort ();
       for (i = 0; i < cmp_status->length; i++)
 	*charbuf++ = cmp_status->carryover[i];
       coding->annotated = 1;

=== modified file 'src/fileio.c'
--- src/fileio.c	2011-04-14 20:20:17 +0000
+++ src/fileio.c	2011-04-21 12:07:44 +0000
@@ -3245,15 +3245,10 @@ variable `last-coding-system-used' to th
   record_unwind_protect (close_file_unwind, make_number (fd));
 
 
-  /* Arithmetic overflow can occur if an Emacs integer cannot represent the
-     file size, or if the calculations below overflow.  The calculations below
-     double the file size twice, so check that it can be multiplied by 4
-     safely.
-
-     Also check whether the size is negative, which can happen on a platform
-     that allows file sizes greater than the maximum off_t value.  */
+  /* Check whether the size is too large or negative, which can happen on a
+     platform that allows file sizes greater than the maximum off_t value.  */
   if (! not_regular
-      && ! (0 <= st.st_size && st.st_size <= MOST_POSITIVE_FIXNUM / 4))
+      && ! (0 <= st.st_size && st.st_size <= MOST_POSITIVE_FIXNUM))
     error ("Maximum buffer size exceeded");
 
   /* Prevent redisplay optimizations.  */





Reply sent to Eli Zaretskii <eliz <at> gnu.org>:
You have taken responsibility. (Fri, 29 Apr 2011 19:50:02 GMT) Full text and rfc822 format available.

Notification sent to Evans Winner <ego111 <at> gmail.com>:
bug acknowledged by developer. (Fri, 29 Apr 2011 19:50:03 GMT) Full text and rfc822 format available.

Message #25 received at 8528-done <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: eggert <at> cs.ucla.edu, ego111 <at> gmail.com
Cc: 8528-done <at> debbugs.gnu.org
Subject: Re: bug#8528: 24.0.50;
	32-bit Emacs with apparent 128M buffer size limit
Date: Fri, 29 Apr 2011 22:49:16 +0300
> Date: Thu, 21 Apr 2011 16:20:54 +0300
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: 8528 <at> debbugs.gnu.org, ego111 <at> gmail.com
> 
> If no one objects, I will commit these changes in a week or so.

No one objected, so I installed this.




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8528; Package emacs. (Mon, 02 May 2011 14:54:02 GMT) Full text and rfc822 format available.

Message #28 received at 8528 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: 8528 <at> debbugs.gnu.org
Cc: eliz <at> gnu.org
Subject: Re: bug#8528: 24.0.50;
	32-bit Emacs with apparent 128M buffer size limit
Date: Mon, 02 May 2011 11:53:24 -0300
> No one objected, so I installed this.

Thanks,


        Stefan




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 31 May 2011 11:24:05 GMT) Full text and rfc822 format available.

This bug report was last modified 12 years and 341 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.