GNU bug report logs - #8545
issues with recent doprnt-related changes

Previous Next

Package: emacs;

Reported by: Paul Eggert <eggert <at> cs.ucla.edu>

Date: Mon, 25 Apr 2011 05:48:01 UTC

Severity: normal

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 8545 in the body.
You can then email your comments to 8545 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8545; Package emacs. (Mon, 25 Apr 2011 05:48:01 GMT) Full text and rfc822 format available.

Acknowledgement sent to Paul Eggert <eggert <at> cs.ucla.edu>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Mon, 25 Apr 2011 05:48:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: bug-gnu-emacs <at> gnu.org
Cc: Eli Zaretskii <eliz <at> gnu.org>
Subject: issues with recent doprnt-related changes
Date: Sun, 24 Apr 2011 22:46:33 -0700
This is a followup to Bug#8435.  Eli invited me to review the recent
doprnt-related changes, so here's a quick review:

* doprnt returns size_t.  But Stefan wrote that he prefers sizes to be
  signed values, and doprnt always returns a value that can fit in
  EMACS_INT.  So shouldn't doprnt return EMACS_INT, as it did before?

* doprnt supports only a small subset of the standard printf formats,
  but this subset is not documented.  It's unclear what the subset is.
  Or it's a superset of the subset, with %S and %l?  Anyway, this
  should be documented clearly in the lead comment.

* I suggest that every feature in doprnt be a feature that is actually
  needed and used; this will simplify maintainance.

* Format strings never include embedded null bytes, so there's
  no need for doprnt to support that.

* If the format string is too long, the alloca inside doprnt will
  crash Emacs on some hosts.  I suggest removing the alloca,
  instituting a fixed size limit on format specifiers, and documenting
  that limit.  Since user code cannot ever supply one of these
  formats, that should be good enough.

* The width features of doprnt (e.g., %25s) are never used.  That part
  of the code is still buggy; please see some comments below.  I
  suggest removing it entirely; this will simplify things.  But if not:

  - doprnt mishandles format specifications such as %0.0.0d.
    It passes them off to printf, and this results in undefined
    behavior, near as I can tell.

  - doprnt uses atoi (&fmtcpy[1]), but surely this isn't right if
    there are flags such as '-'.

  - Quite possibly there are other problems in this area, but I
    didn't want to spend further time reviewing a never-used feature.

* In this code, in verror:

      used = doprnt (buffer, size, m, m + mlen, ap);

      /* Note: the -1 below is because `doprnt' returns the number of bytes     
         excluding the terminating null byte, and it always terminates with a   
         null byte, even when producing a truncated message.  */
      if (used < size - 1)
        break;

  I don't see the reason for the "- 1".  If you replace this with:

      used = doprnt (buffer, size, m, m + mlen, ap);

      if (used < size)
        break;

  the code should still work, because, when used < size, the buffer
  should be properly null-terminated.  If it isn't then there's something
  wrong with doprnt, no?

* In this code, in verror:

      else if (size < size_max - 1)
        size = size_max - 1;

  there's no need for the "- 1"s.  Just use this:

      else if (size < size_max)
        size = size_max;

* This code in verror:

      if (buffer == buf)
        buffer = (char *) xmalloc (size);
      else
        buffer = (char *) xrealloc (buffer, size);

  uses xrealloc, which is unnecessarily expensive, as it may copy the
  buffer's contents even though they are entirely garbage here.  Use
  this instead, to avoid the useless copy:

      if (buffer != buf)
        xfree (buffer);
      buffer = (char *) xmalloc (size);




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8545; Package emacs. (Mon, 25 Apr 2011 09:01:03 GMT) Full text and rfc822 format available.

Message #8 received at 8545 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 8545 <at> debbugs.gnu.org
Subject: Re: issues with recent doprnt-related changes
Date: Mon, 25 Apr 2011 12:00:44 +0300
> Date: Sun, 24 Apr 2011 22:46:33 -0700
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> CC: Eli Zaretskii <eliz <at> gnu.org>
> 
> This is a followup to Bug#8435.  Eli invited me to review the recent
> doprnt-related changes, so here's a quick review:

Thanks for the review and the comments.

> * doprnt returns size_t.  But Stefan wrote that he prefers sizes to be
>   signed values, and doprnt always returns a value that can fit in
>   EMACS_INT.  So shouldn't doprnt return EMACS_INT, as it did before?

I made it return size_t because all the related variables in verror
are size_t, and I didn't want to mix signed with unsigned.  AFAIU, the
preference to use signed is for those values that come from Lisp or go
back to the Lisp level, which is not the case here.

But I will let Stefan comment on this.  Changing doprnt to return a
signed value, and making the respective changes in verror, would be
trivial, and I won't mind doing that, if that's the verdict.

> * doprnt supports only a small subset of the standard printf formats,
>   but this subset is not documented.  It's unclear what the subset is.
>   Or it's a superset of the subset, with %S and %l?  Anyway, this
>   should be documented clearly in the lead comment.

I added such a documentation.

> * I suggest that every feature in doprnt be a feature that is actually
>   needed and used; this will simplify maintainance.

I agree, but I didn't add any features, except the support for %ld,
which is surely needed for error messages that show EMACS_INT values.
All the rest was already there in the original code of doprnt.  I see
no reason to remove that code just because no error message currently
uses some of those features.  According to "bzr annotate", most of the
related code in doprnt survived largely untouched since the 1990s,
except some cleanup.  This is no guarantee of being bug-free, of
course, but it does have _some_ weight in my eyes.

> * Format strings never include embedded null bytes, so there's
>   no need for doprnt to support that.

Potentially, someone could call `error' with its first argument taken
from a Lisp string, which could include null characters.  But again,
this feature was there to begin with, and I see no particular need to
remove it.

> * If the format string is too long, the alloca inside doprnt will
>   crash Emacs on some hosts.

You are right.  I modified doprnt to use SAFE_ALLOCA instead.

>                               I suggest removing the alloca,
>   instituting a fixed size limit on format specifiers, and documenting
>   that limit.  Since user code cannot ever supply one of these
>   formats, that should be good enough.

GNU coding standards frown on arbitrary limits, so I didn't want to
take that route, what with SAFE_ALLOCA readily available and easy to
use.

> * The width features of doprnt (e.g., %25s) are never used.

Again, an old feature that I see no reasons to remove.  And, since
doprnt produces error messages meant to be displayed, I find that
this feature actually makes sense.

>   - doprnt mishandles format specifications such as %0.0.0d.
>     It passes them off to printf, and this results in undefined
>     behavior, near as I can tell.

Since both error and verror are now marked as ATTRIBUTE_FORMAT_PRINTF,
the compiler will detect such invalid formats and flag them.  If the
warning is disregarded, the result of such a format is just a somewhat
illegible message.  In any case, vsnprintf would do the same, right?

>   - doprnt uses atoi (&fmtcpy[1]), but surely this isn't right if
>     there are flags such as '-'.

Why not?  In that case, atoi will produce a negative value for
`width', which is already handled by the code.  If I'm missing
something, please point out the specific problems with that.

>   - Quite possibly there are other problems in this area, but I
>     didn't want to spend further time reviewing a never-used feature.

I did read that code.  It looked solid to me, but if you or someone
else see specific problems, please point them out.

> * In this code, in verror:
> 
>       used = doprnt (buffer, size, m, m + mlen, ap);
> 
>       /* Note: the -1 below is because `doprnt' returns the number of bytes     
>          excluding the terminating null byte, and it always terminates with a   
>          null byte, even when producing a truncated message.  */
>       if (used < size - 1)
>         break;
> 
>   I don't see the reason for the "- 1".  If you replace this with:
> 
>       used = doprnt (buffer, size, m, m + mlen, ap);
> 
>       if (used < size)
>         break;
> 
>   the code should still work, because, when used < size, the buffer
>   should be properly null-terminated.  If it isn't then there's something
>   wrong with doprnt, no?

As the comment says, doprnt always null-terminates the result, even if
the result is truncated, and it never returns a value larger than the
buffer size it was given.  (In that, it differs from vsnprintf, which
can return larger values.)  When doprnt does truncate the output
string, it returns `size - 1'; if we compare against `size', we will
happily bail out of the loop, and never try to enlarge the buffer.

I saw no reason to enhance doprnt to continue processing the format
string and the arguments once the buffer is exhausted.  So I modified
verror instead to DTRT.

> * In this code, in verror:
> 
>       else if (size < size_max - 1)
>         size = size_max - 1;
> 
>   there's no need for the "- 1"s.  Just use this:
> 
>       else if (size < size_max)
>         size = size_max;

I made that change, thanks.

The reason I originally limited to `size_max - 1' is that the games
you play with the maximum size, viz.:

  size_t size_max =
    min (MOST_POSITIVE_FIXNUM, min (INT_MAX, SIZE_MAX - 1)) + 1;

are neither clear nor commented.  E.g., why the second `min'? could
INT_MAX be ever larger than SIZE_MAX-1? if so, what does that mean in
terms of relation between `int' and `size_t' on such a platform?

I'm only familiar with a very small number of architectures, where all
these tricks are unnecessary.  When I see such code, it makes me dizzy
and unsure of what I may be missing.  So I opted for a safer way out
(since error messages as long as SIZE_MAX are only theoretically
possible), that would not risk overflowing signed values into the sign
bit.  Perhaps in the future you could comment such obscure code to
make it understandable by mere mortals such as myself.

> * This code in verror:
> 
>       if (buffer == buf)
>         buffer = (char *) xmalloc (size);
>       else
>         buffer = (char *) xrealloc (buffer, size);
> 
>   uses xrealloc, which is unnecessarily expensive, as it may copy the
>   buffer's contents even though they are entirely garbage here.  Use
>   this instead, to avoid the useless copy:
> 
>       if (buffer != buf)
>         xfree (buffer);
>       buffer = (char *) xmalloc (size);

You are right, I made that change.

I believe this takes care of all the imminent problems you discovered
in your review.  Nevertheless, I will leave this bug report open for a
while, to allow you and others to come up with more problems and
suggestions for improvements.

Thanks again for taking the time to review and comment.




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8545; Package emacs. (Mon, 25 Apr 2011 13:38:02 GMT) Full text and rfc822 format available.

Message #11 received at 8545 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 8545 <at> debbugs.gnu.org, Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#8545: issues with recent doprnt-related changes
Date: Mon, 25 Apr 2011 10:37:44 -0300
>> * doprnt returns size_t.  But Stefan wrote that he prefers sizes to be
>> signed values, and doprnt always returns a value that can fit in
>> EMACS_INT.  So shouldn't doprnt return EMACS_INT, as it did before?
> I made it return size_t because all the related variables in verror
> are size_t, and I didn't want to mix signed with unsigned.  AFAIU, the
> preference to use signed is for those values that come from Lisp or go
> back to the Lisp level, which is not the case here.

Mixing the two is what I find problematic, so if it's size_t all the
way, that's OK.


        Stefan




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8545; Package emacs. (Tue, 26 Apr 2011 06:03:01 GMT) Full text and rfc822 format available.

Message #14 received at 8545 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 8545 <at> debbugs.gnu.org
Subject: Re: issues with recent doprnt-related changes
Date: Mon, 25 Apr 2011 23:02:25 -0700
On 04/25/11 02:00, Eli Zaretskii wrote:

>> * Format strings never include embedded null bytes, so there's
>>   no need for doprnt to support that.
> 
> Potentially, someone could call `error' with its first argument taken
> from a Lisp string, which could include null characters.  But again,
> this feature was there to begin with, and I see no particular need to
> remove it.

The feature is buggy, because the code does not check
fmt versus fmt_end every time it increases fmt; it checks
only sometimes.  Hence one can construct examples where
doprnt will overrun the format, e.g., by having '%l' at the
end of the format.

If doprnt were written to expect a null-terminated string, which
is what all its callers pass anyway, it would be simpler and
easier to maintain and would not have this problem.

"%l" is a strange case anyway, since one cannot reliably use
"%l" as an alias for "%d".  For example, the format "%dx" prints
an integer followed by an 'x', but if you try to use "%lx" instead,
it doesn't work.  At least, we should remove "%l" as a format
specifier, as it's a rightly-unused feature and it's just asking
for trouble to try to support it.  This should also fix the
format-overrun bug mentioned earlier.

>> * If the format string is too long, the alloca inside doprnt will
>>   crash Emacs on some hosts.
> 
> You are right.  I modified doprnt to use SAFE_ALLOCA instead.

There's no need for alloca or SAFE_ALLOCA or xmalloc or any
dynamic allocator.  Instead, convert any width and precision
values to integers, and use "*".  For example, if the caller
specifies this:

	"%012345.6789g", 3.14

pass this to sprintf:

	"%0*.*g", 12345, 6789, 3.14

That way, the format string itself has easily-bounded size and
the code never needs to use alloca or xmalloc or whatever.

> Since both error and verror are now marked as ATTRIBUTE_FORMAT_PRINTF,
> the compiler will detect such invalid formats and flag them.

Sure, but in that case why maintain code to implement the two
formats (%S and %l) that are flagged invalid and are never used and
should never be used?

>>   - doprnt uses atoi (&fmtcpy[1]), but surely this isn't right if
>>     there are flags such as '-'.
> 
> Why not?  In that case, atoi will produce a negative value for
> `width', which is already handled by the code.  If I'm missing
> something, please point out the specific problems with that.

I don't see how the negative value is handled correctly.
%-10s means to print a string right-justified, but the code
surely treats it as if it were %0s.  And other flags
are possible, e.g., atoi will parse "%0-3d" as if the
width were zero, but the width is 3 (the "0" is a flag).

A quick second scan found a minor bug in size parsing: the
expression "n >= SIZE_MAX / 10" should be "n > SIZE_MAX / 10".

> The reason I originally limited to `size_max - 1' is that the games
> you play with the maximum size, viz.:
> 
>   size_t size_max =
>     min (MOST_POSITIVE_FIXNUM, min (INT_MAX, SIZE_MAX - 1)) + 1;
> 
> are neither clear nor commented.  E.g., why the second `min'? could
> INT_MAX be ever larger than SIZE_MAX-1? if so, what does that mean in
> terms of relation between `int' and `size_t' on such a platform?

The C Standard allows INT_MAX to be larger than SIZE_MAX - 1, yes.
I don't know of any current targets with that property, but it didn't
hurt to be safe.  The second 'min' was needed because vsnprintf
can't create a string longer than INT_MAX bytes.  Since doprnt doesn't
have that silly limit, the above line should be changed to something
like the following (this time with a comment :-):

  /* Limit the string to sizes that both Emacs and size_t can represent.  */
  size_t size_max = min (MOST_POSITIVE_FIXNUM + 1, SIZE_MAX);


> You are right, I made that change.

Thanks, can you make a similar change inside doprint?  It
also uses xrealloc where xfree+xmalloc would be better.

One other thing, the documentation says that lower-case l
is a flag, but it's a length modifer and not a flag.  It
must be specified after the precision (if given) and
immediately before the conversion specifier character,
and it cannot be intermixed with flags like 0 and - and +.




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8545; Package emacs. (Tue, 26 Apr 2011 20:26:02 GMT) Full text and rfc822 format available.

Message #17 received at 8545 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 8545 <at> debbugs.gnu.org
Subject: Re: bug#8545: issues with recent doprnt-related changes
Date: Tue, 26 Apr 2011 13:25:21 -0700
On 04/25/11 06:37, Stefan Monnier wrote:
>> AFAIU, the
>> > preference to use signed is for those values that come from Lisp or go
>> > back to the Lisp level, which is not the case here.
> Mixing the two is what I find problematic, so if it's size_t all the
> way, that's OK.

Sorry, but I don't see the general principle.  Earlier, it was
thought that emacs_write should return a signed value, because there's
code like (emacs_write (...) != n) in fileio.c, where 'n' is
signed, and signed-versus-unsigned comparison is problematic.
I can certainly understand this point of view.

With doprnt returning size_t, though, we still have this problem.
In eval.c's verror we see this:

  size_t size_max =
    min (MOST_POSITIVE_FIXNUM, min (INT_MAX, SIZE_MAX - 1)) + 1;
  size_t used = ..., size = ...;
  ...
  while (1)
    {
      ...
      if (used < size - 1)
	break;
      if (size <= size_max / 2)
        size *= 2;
      else if (size < size_max)
        size = size_max;
      else
        break;  /* and leave the message truncated */
      ...
    }

Here, the code is carefully comparing a signed value
MOST_POSITIVE_FIXNUM to a possibly-different-width
unsigned value SIZE_MAX - 1, storing the result into an
unsigned variable, and using that unsigned variable.
This comparison happens to be safe, but one has to stare
at it a bit to make sure that the
unsigned-versus-signed comparison isn't bogus.  Why is
this unsigned-versus-signed comparison OK, but the one
with emacs_write problematic?

I'm not saying this to be difficult: I'm just trying to
understand the general principle here.

I thought the point of preferring signed was so that we
didn't have to worry about stuff like the above.  Also I assumed
the idea is that one should be able to compile GCC with -ftrapv
and catch overflow errors.  But if the above code is OK as-is,
then clearly I'm misunderstanding the overall goal here.




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8545; Package emacs. (Wed, 27 Apr 2011 01:16:02 GMT) Full text and rfc822 format available.

Message #20 received at 8545 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 8545 <at> debbugs.gnu.org
Subject: Re: bug#8545: issues with recent doprnt-related changes
Date: Tue, 26 Apr 2011 22:14:54 -0300
> Sorry, but I don't see the general principle.  Earlier, it was
> thought that emacs_write should return a signed value, because there's
> code like (emacs_write (...) != n) in fileio.c, where 'n' is
> signed, and signed-versus-unsigned comparison is problematic.
> I can certainly understand this point of view.

That's the point of view, yes.

> With doprnt returning size_t, though, we still have this problem.

I haven't looked at the code.  I only commented based on Eli's
description, who said that all the relevant code used size_t.
Personally, I'd use `int' for such things, since any message larger than
2GB should be a sign that something went very wrong long before.


        Stefan




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8545; Package emacs. (Wed, 27 Apr 2011 20:12:02 GMT) Full text and rfc822 format available.

Message #23 received at 8545 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 8545 <at> debbugs.gnu.org
Subject: Re: issues with recent doprnt-related changes
Date: Wed, 27 Apr 2011 22:34:45 +0300
> Date: Mon, 25 Apr 2011 23:02:25 -0700
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> CC: 8545 <at> debbugs.gnu.org
> 
> On 04/25/11 02:00, Eli Zaretskii wrote:
> 
> >> * Format strings never include embedded null bytes, so there's
> >>   no need for doprnt to support that.
> > 
> > Potentially, someone could call `error' with its first argument taken
> > from a Lisp string, which could include null characters.  But again,
> > this feature was there to begin with, and I see no particular need to
> > remove it.
> 
> The feature is buggy, because the code does not check
> fmt versus fmt_end every time it increases fmt; it checks
> only sometimes.

I added more checks, thanks for pointing this out.

> "%l" is a strange case anyway, since one cannot reliably use
> "%l" as an alias for "%d".  For example, the format "%dx" prints
> an integer followed by an 'x', but if you try to use "%lx" instead,
> it doesn't work.  At least, we should remove "%l" as a format
> specifier, as it's a rightly-unused feature and it's just asking
> for trouble to try to support it.

You convinced me, so I removed %l.

> >> * If the format string is too long, the alloca inside doprnt will
> >>   crash Emacs on some hosts.
> > 
> > You are right.  I modified doprnt to use SAFE_ALLOCA instead.
> 
> There's no need for alloca or SAFE_ALLOCA or xmalloc or any
> dynamic allocator.  Instead, convert any width and precision
> values to integers, and use "*".  For example, if the caller
> specifies this:
> 
> 	"%012345.6789g", 3.14
> 
> pass this to sprintf:
> 
> 	"%0*.*g", 12345, 6789, 3.14

I see no reason for such complexity, just to avoid SAFE_ALLOCA.  But
feel free to make this change, if you think it's important enough.

> >>   - doprnt uses atoi (&fmtcpy[1]), but surely this isn't right if
> >>     there are flags such as '-'.
> > 
> > Why not?  In that case, atoi will produce a negative value for
> > `width', which is already handled by the code.  If I'm missing
> > something, please point out the specific problems with that.
> 
> I don't see how the negative value is handled correctly.
> %-10s means to print a string right-justified, but the code
> surely treats it as if it were %0s.

??? %-10s means to print a string LEFT-justified, and the code handles
that in this loop (which runs after the string was copied to its
destination):

	      if (minlen < 0)
		{
		  while (minlen < - width && bufsize > 0)
		    {
		      *bufptr++ = ' ';
		      bufsize--;
		      minlen++;
		    }
		  minlen = 0;
		}

I actually tried using %-30s, and it did work correctly (as did %30s).

>                                       And other flags
> are possible, e.g., atoi will parse "%0-3d" as if the
> width were zero, but the width is 3 (the "0" is a flag).

The code doesn't call atoi for numeric arguments.  It delegates that
case to sprintf, which will handle the likes of %0-3d correctly.  And
for %s and %c the "0" flag is not supported anyway (as stated in the
comments) and GCC flags that with a warning.  So I see no problem
here.

> A quick second scan found a minor bug in size parsing: the
> expression "n >= SIZE_MAX / 10" should be "n > SIZE_MAX / 10".

When they get to messages as long as SIZE_MAX, let them sue me for
taking away one byte.  verror will reject SIZE_MAX-long messages
anyway, so I see no reason to squeeze one more byte here just to throw
it away there.

>   /* Limit the string to sizes that both Emacs and size_t can represent.  */
>   size_t size_max = min (MOST_POSITIVE_FIXNUM + 1, SIZE_MAX);

"MOST_POSITIVE_FIXNUM + 1" is too much, since MOST_POSITIVE_FIXNUM
should be able to cover the terminating null character in Emacs.  So I
used this:

   size_t size_max = min (MOST_POSITIVE_FIXNUM, SIZE_MAX);

> Thanks, can you make a similar change inside doprint?  It
> also uses xrealloc where xfree+xmalloc would be better.

Done.

> One other thing, the documentation says that lower-case l
> is a flag, but it's a length modifer and not a flag.

I fixed the doc on that account.




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8545; Package emacs. (Wed, 27 Apr 2011 23:52:01 GMT) Full text and rfc822 format available.

Message #26 received at 8545 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 8545 <at> debbugs.gnu.org
Subject: Re: issues with recent doprnt-related changes
Date: Wed, 27 Apr 2011 16:51:06 -0700
On 04/27/11 12:34, Eli Zaretskii wrote:

> I added more checks, thanks for pointing this out.

Thanks, but I don't see the need for this newly-added check:

          if (fmt > format_end)                                                 
            fmt = format_end;                                                   

If fmt is actually greater than format_end, it's pointing past the end
of an object, so the C code is relying on undefined behavior and the
check therefore isn't portable.  But how can fmt ever be greater than
format_end here?  I suggest removing the check.

> ??? %-10s means to print a string LEFT-justified, and the code handles
> that in this loop...
> for %s and %c the "0" flag is not supported anyway (as stated in the
> comments) and GCC flags that with a warning.

Good points both; thanks.

>> A quick second scan found a minor bug in size parsing: the
>> expression "n >= SIZE_MAX / 10" should be "n > SIZE_MAX / 10".
> 
> When they get to messages as long as SIZE_MAX, let them sue me for
> taking away one byte.

It's not a question of saving space at run-time.  It's a question of
helping the reader.  The reader is left wondering: why is that ">="
there?  And why is there another test "n * 10 > SIZE_MAX - (fmt[1] -
'0')" that always returns 0, no matter what?  Code like that will
puzzle future maintainers, who may well mess it up later because they
don't know what's going on.  Also, most printf implementations will
mess up if given a width or precision greater than INT_MAX, so I
suggest not allowing widths or precisions greater than that.  In summary,
I suggest replacing this:

  if (n >= SIZE_MAX / 10
       || n * 10 > SIZE_MAX - (fmt[1] - '0'))
     error ("Format width or precision too large");
 
with this:

  /* Avoid int overflow, because many sprintfs seriously mess up
     with widths or precisions greater than INT_MAX.  Avoid size_t
     overflow, since our counters use size_t.  This test is slightly
     conservative, for speed and simplicity.  */
  if (n >= min (INT_MAX, SIZE_MAX) / 10)
     error ("Format width or precision too large");
 
> "MOST_POSITIVE_FIXNUM + 1" is too much, since MOST_POSITIVE_FIXNUM
> should be able to cover the terminating null character in Emacs.

Why?  Emacs size fields count the bytes in the string, and does not
count the terminating null byte (which is not part of the string).


In briefly reviewing the new version, I found that doprnt
doesn't support the "ll" modifier for "long long", and thus wouldn't
port to platforms that use long long for EMACS_INT.  This is easy to
fix, so I installed a fix for that.  I also found three more problems:

* doprnt invokes strlen to find the length of the format.  The
  vsnprintf code didn't need to do that: it traversed the format once.
  Surely it shouldn't be hard to change doprnt so that it traverses
  the format once rather than twice.

* Sometimes verror will incorrectly truncate a string, even when there
  is plenty of memory.  verror might call doprnt (buffer, SIZE, m, m +
  mlen, ap), and doprnt might discover that a multibyte character is
  chopped in half at the end of the output buffer, and might return
  (say) SIZE - 2.  verror will incorrectly conclude that the output
  was just fine, but it wasn't complete.

* verror might invoke doprnt two or more times, which means that
  doprnt will traverse ap twice.  This does not work in general; the C
  standard is quite clear that the behavior is undefined in this case.
  One way to fix this would be to modify doprnt so that it always
  walks through the format completely, and generates all the output
  that it is going to generate, and that it reallocates the output
  buffer as needed as it goes.  This would require an API change.





Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8545; Package emacs. (Thu, 28 Apr 2011 01:34:02 GMT) Full text and rfc822 format available.

Message #29 received at 8545 <at> debbugs.gnu.org (full text, mbox):

From: Juanma Barranquero <lekktu <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 8545 <at> debbugs.gnu.org
Subject: Re: bug#8545: issues with recent doprnt-related changes
Date: Thu, 28 Apr 2011 03:32:23 +0200
On Thu, Apr 28, 2011 at 01:51, Paul Eggert <eggert <at> cs.ucla.edu> wrote:

> If fmt is actually greater than format_end, it's pointing past the end
> of an object, so the C code is relying on undefined behavior and the
> check therefore isn't portable.

I'm no expert on the C standard, but would it be undefined behavior,
as long as the pointer has not been dereferenced? A cursory look
suggests that fmt == format_end + 1 is possible, but fmt is not
dereferenced in that case.

    Juanma




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8545; Package emacs. (Thu, 28 Apr 2011 03:13:02 GMT) Full text and rfc822 format available.

Message #32 received at 8545 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Juanma Barranquero <lekktu <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 8545 <at> debbugs.gnu.org
Subject: Re: bug#8545: issues with recent doprnt-related changes
Date: Wed, 27 Apr 2011 20:11:52 -0700
On 04/27/11 18:32, Juanma Barranquero wrote:

> A cursory look suggests that fmt == format_end + 1 is possible

Thanks, I had missed that possibility.  (Evidently your cursory looks
are better than mine. :-)  A possible patch is below.

> would it be undefined behavior,
> as long as the pointer has not been dereferenced?

Yes.  A portable C program is not allowed to create a pointer that
doesn't point to an object, with the two exceptions of a null pointer
and a pointer to the address immediately after an object.  On
some architectures, attempting to point to random addresses can cause
exceptions or other undefined behavior.

=== modified file 'src/doprnt.c'
--- src/doprnt.c	2011-04-27 23:04:20 +0000
+++ src/doprnt.c	2011-04-28 03:00:59 +0000
@@ -194,22 +194,21 @@ doprnt (char *buffer, register size_t bu
 		     This might be a field width or a precision; e.g.
 		     %1.1000f and %1000.1f both might need 1000+ bytes.
 		     Parse the width or precision, checking for overflow.  */
-		  size_t n = *fmt - '0';
-		  while (fmt < format_end
-			 && '0' <= fmt[1] && fmt[1] <= '9')
+		  size_t n = *fmt++ - '0';
+		  while (fmt < format_end && '0' <= *fmt && *fmt <= '9')
 		    {
 		      if (n >= SIZE_MAX / 10
 			  || n * 10 > SIZE_MAX - (fmt[1] - '0'))
 			error ("Format width or precision too large");
-		      n = n * 10 + fmt[1] - '0';
-		      *string++ = *++fmt;
+		      n = n * 10 + *fmt - '0';
+		      *string++ = *fmt++;
 		    }
 
 		  if (size_bound < n)
 		    size_bound = n;
 		}
 	      else if (*fmt == '-' || *fmt == ' ' || *fmt == '.' || *fmt == '+')
-		;
+		fmt++;
 	      else if (*fmt == 'l')
 		{
 		  long_flag = 1 + (fmt + 1 < format_end && fmt[1] == 'l');
@@ -218,10 +217,7 @@ doprnt (char *buffer, register size_t bu
 		}
 	      else
 		break;
-	      fmt++;
 	    }
-	  if (fmt > format_end)
-	    fmt = format_end;
 	  *string = 0;
 
 	  /* Make the size bound large enough to handle floating point formats





Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8545; Package emacs. (Thu, 28 Apr 2011 03:44:01 GMT) Full text and rfc822 format available.

Message #35 received at 8545 <at> debbugs.gnu.org (full text, mbox):

From: Juanma Barranquero <lekktu <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 8545 <at> debbugs.gnu.org
Subject: Re: bug#8545: issues with recent doprnt-related changes
Date: Thu, 28 Apr 2011 05:42:32 +0200
On Thu, Apr 28, 2011 at 05:11, Paul Eggert <eggert <at> cs.ucla.edu> wrote:

>> would it be undefined behavior,
>> as long as the pointer has not been dereferenced?
>
> Yes.  A portable C program is not allowed to create a pointer that
> doesn't point to an object, with the two exceptions of a null pointer
> and a pointer to the address immediately after an object.

That's weird, because it would mean that every pointer variable must
be initialized (either explicitly to some value, or implicitly to the
null pointer), or else the program will have undefined behavior.

Anyway, in this case fmt == format_end + 1 would point to the address
immediately after an object, wouldn't it?

> On
> some architectures, attempting to point to random addresses can cause
> exceptions or other undefined behavior.

On dereferencing, sure. But just on assignment to the pointer variable?

    Juanma




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8545; Package emacs. (Thu, 28 Apr 2011 05:03:01 GMT) Full text and rfc822 format available.

Message #38 received at 8545 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Juanma Barranquero <lekktu <at> gmail.com>
Cc: 8545 <at> debbugs.gnu.org, eggert <at> cs.ucla.edu
Subject: Re: bug#8545: issues with recent doprnt-related changes
Date: Thu, 28 Apr 2011 01:02:13 -0400
> From: Juanma Barranquero <lekktu <at> gmail.com>
> Date: Thu, 28 Apr 2011 03:32:23 +0200
> Cc: Eli Zaretskii <eliz <at> gnu.org>, 8545 <at> debbugs.gnu.org
> 
> On Thu, Apr 28, 2011 at 01:51, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> 
> > If fmt is actually greater than format_end, it's pointing past the end
> > of an object, so the C code is relying on undefined behavior and the
> > check therefore isn't portable.
> 
> I'm no expert on the C standard, but would it be undefined behavior,
> as long as the pointer has not been dereferenced? A cursory look
> suggests that fmt == format_end + 1 is possible, but fmt is not
> dereferenced in that case.

My (not-so cursory) look at the code suggests that we do dereference
it, in this fragment:

	  switch (*fmt++)
	    {
	    default:
	      error ("Invalid format operation %%%s%c",
		     long_flag ? "l" : "", fmt[-1]);

If fmt > format_end, this will dereference the address beyond
format_end.  I thought showing the last character of the format string
itself is a better idea.  It is also exactly equivalent to what the
code will do and display when *format_end == '\0'.




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8545; Package emacs. (Thu, 28 Apr 2011 05:07:02 GMT) Full text and rfc822 format available.

Message #41 received at 8545 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Juanma Barranquero <lekktu <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 8545 <at> debbugs.gnu.org
Subject: Re: bug#8545: issues with recent doprnt-related changes
Date: Wed, 27 Apr 2011 22:06:23 -0700
On 04/27/11 20:42, Juanma Barranquero wrote:

> in this case fmt == format_end + 1 would point to the address
> immediately after an object, wouldn't it?

No, format_end is already pointing after the object;
the object's size is format_end - format.  So
format_end + 1 might not be a valid pointer.

> That's weird, because it would mean that every pointer variable must
> be initialized (either explicitly to some value, or implicitly to the
> null pointer), or else the program will have undefined behavior.

No, undefined behavior occurs only when an (invalid)
pointer value is created (e.g., by casting from integer, or by
adding to another pointer variable), or copied.  It doesn't occur
merely because storage is allocated for a pointer variable.

In this respect, it's like creating an (invalid) integer value.
If you assign i = INT_MAX + 1, the resulting behavior is undefined.
It's the same if you assign p = &x + 2.  That doesn't mean
"char *p;" has undefined behavior, any more than "int i;" does.

> On dereferencing, sure. But just on assignment to the pointer variable?

Yes.  To take an extreme example, some architectures can compute
a pointer only by using a special pointer register, and the register's
contents are always checked for validity, even if you don't dereference the
pointer.  I don't know whether Emacs has been ported to these machines,
but there are also problems with pointers wrapping around even on
more-conventional architectures.

This issue is covered by one of the questions in the C FAQ; see
<http://www.c-faq.com/aryptr/non0based.html>.




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8545; Package emacs. (Thu, 28 Apr 2011 05:16:02 GMT) Full text and rfc822 format available.

Message #44 received at 8545 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: lekktu <at> gmail.com, 8545 <at> debbugs.gnu.org
Subject: Re: bug#8545: issues with recent doprnt-related changes
Date: Thu, 28 Apr 2011 01:15:28 -0400
> Date: Wed, 27 Apr 2011 20:11:52 -0700
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> CC: Eli Zaretskii <eliz <at> gnu.org>, 8545 <at> debbugs.gnu.org
> 
> On 04/27/11 18:32, Juanma Barranquero wrote:
> 
> > A cursory look suggests that fmt == format_end + 1 is possible
> 
> Thanks, I had missed that possibility.  (Evidently your cursory looks
> are better than mine. :-)  A possible patch is below.

I strenuously object to that patch, see below.  Please don't install
it.

> > would it be undefined behavior,
> > as long as the pointer has not been dereferenced?
> 
> Yes.  A portable C program is not allowed to create a pointer that
> doesn't point to an object, with the two exceptions of a null pointer
> and a pointer to the address immediately after an object.  On
> some architectures, attempting to point to random addresses can cause
> exceptions or other undefined behavior.

As I explain in another message, we _can_ dereference this invalid
pointer.  Which is why that test was added in the first place.

> -		  size_t n = *fmt - '0';
> -		  while (fmt < format_end
> -			 && '0' <= fmt[1] && fmt[1] <= '9')
> +		  size_t n = *fmt++ - '0';
> +		  while (fmt < format_end && '0' <= *fmt && *fmt <= '9')
>  		    {
>  		      if (n >= SIZE_MAX / 10
>  			  || n * 10 > SIZE_MAX - (fmt[1] - '0'))
>  			error ("Format width or precision too large");
> -		      n = n * 10 + fmt[1] - '0';
> -		      *string++ = *++fmt;
> +		      n = n * 10 + *fmt - '0';
> +		      *string++ = *fmt++;
>  		    }
>  
>  		  if (size_bound < n)
>  		    size_bound = n;
>  		}
>  	      else if (*fmt == '-' || *fmt == ' ' || *fmt == '.' || *fmt == '+')
> -		;
> +		fmt++;
>  	      else if (*fmt == 'l')
>  		{
>  		  long_flag = 1 + (fmt + 1 < format_end && fmt[1] == 'l');
> @@ -218,10 +217,7 @@ doprnt (char *buffer, register size_t bu
>  		}
>  	      else
>  		break;
> -	      fmt++;
>  	    }
> -	  if (fmt > format_end)
> -	    fmt = format_end;

I don't see how this is a better idea.  Instead of a simple two-liner,
which could be commented if its intent isn't clear enough, and which
makes the code 100% verifiable to not dereference anything beyond
format_end, how is it better to sprinkle weird post-increments in
several places?  This totally obfuscates the intent, does NOT allow to
comment it in any reasonable way (because the increments are no longer
in a single place, and serve more than one purpose), makes the code
much harder to grasp and analyze, and makes it almost impossible to
convince ourselves that it will never get past format_end without
unduly complicated analysis.  All that for getting rid of a simple and
clearly correct line??  No, thanks!




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8545; Package emacs. (Thu, 28 Apr 2011 05:30:03 GMT) Full text and rfc822 format available.

Message #47 received at 8545 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: lekktu <at> gmail.com, 8545 <at> debbugs.gnu.org
Subject: Re: bug#8545: issues with recent doprnt-related changes
Date: Wed, 27 Apr 2011 22:29:25 -0700
On 04/27/11 22:15, Eli Zaretskii wrote:
> As I explain in another message, we _can_ dereference this invalid
> pointer.

Sorry, I'm not quite following, since I'm not sure what
the "another message" refers to.

Hmm, perhaps you're talking about this pattern in the code?

        while (fmt < format_end)
	  { ... fmt++ ... }
        switch (*fmt++)

Here, the code is dereferencing *format_end,
which means it's dereferencing one past the end of the
format string that is passed to it.  This is normally
not how buffers are used in C: normally, the pointer to
the end of a buffer is intended to point "one past" the
last byte of the buffer, and is not intended to be dereferenced.

If the intent here is that one should call doprnt with
the pattern (doprnt (A, ASIZE, B, B + BSIZE - 1, AP)) then
I suggest that the point be made clearly in doprnt's comment,
as part of doprnt's API, to prevent future confusion in
this area.




Reply sent to Eli Zaretskii <eliz <at> gnu.org>:
You have taken responsibility. (Thu, 28 Apr 2011 05:51:03 GMT) Full text and rfc822 format available.

Notification sent to Paul Eggert <eggert <at> cs.ucla.edu>:
bug acknowledged by developer. (Thu, 28 Apr 2011 05:51:03 GMT) Full text and rfc822 format available.

Message #52 received at 8545-done <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 8545-done <at> debbugs.gnu.org
Subject: Re: issues with recent doprnt-related changes
Date: Thu, 28 Apr 2011 01:50:53 -0400
> Date: Wed, 27 Apr 2011 16:51:06 -0700
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> CC: 8545 <at> debbugs.gnu.org
> 
> >> A quick second scan found a minor bug in size parsing: the
> >> expression "n >= SIZE_MAX / 10" should be "n > SIZE_MAX / 10".
> > 
> > When they get to messages as long as SIZE_MAX, let them sue me for
> > taking away one byte.
> 
> It's not a question of saving space at run-time.  It's a question of
> helping the reader.  The reader is left wondering: why is that ">="
> there?

The reader will be wondering with ">" as well.  There's a comment
about checking for overflow which should be a good hint, especially
since SIZE_MAX is compared against.

>  And why is there another test "n * 10 > SIZE_MAX - (fmt[1] -
> '0')" that always returns 0, no matter what?

??? What happens if n*10 is SIZE_MAX-1 and fmt[1] is '2'?  Is the
result still zero?

>   /* Avoid int overflow, because many sprintfs seriously mess up
>      with widths or precisions greater than INT_MAX.  Avoid size_t
>      overflow, since our counters use size_t.  This test is slightly
>      conservative, for speed and simplicity.  */
>   if (n >= min (INT_MAX, SIZE_MAX) / 10)
>      error ("Format width or precision too large");

Sorry, I don't see how this is clearer.  The current code after the
test is built out of the same building blocks as the test, and
therefore the intent and the details of the test are easier to
understand than with your variant, which perhaps is mathematically and
numerically equivalent, but makes the code reading _harder_ because it
severs the syntactical connection between the two.

> > "MOST_POSITIVE_FIXNUM + 1" is too much, since MOST_POSITIVE_FIXNUM
> > should be able to cover the terminating null character in Emacs.
> 
> Why?  Emacs size fields count the bytes in the string, and does not
> count the terminating null byte (which is not part of the string).

That's not what I know.  String positions are zero-based and extend to
include the terminating null character.  See the relevant parts of the
display engine code.

> * doprnt invokes strlen to find the length of the format.  The
>   vsnprintf code didn't need to do that: it traversed the format once.
>   Surely it shouldn't be hard to change doprnt so that it traverses
>   the format once rather than twice.

doprnt is invoked in the context of displaying an error message that
throws to top level, and so it doesn't need to be optimized (which
will surely make the code more complex and error-prone, and its
maintenance harder).

> * Sometimes verror will incorrectly truncate a string, even when there
>   is plenty of memory.  verror might call doprnt (buffer, SIZE, m, m +
>   mlen, ap), and doprnt might discover that a multibyte character is
>   chopped in half at the end of the output buffer, and might return
>   (say) SIZE - 2.  verror will incorrectly conclude that the output
>   was just fine, but it wasn't complete.

Not an issue, what with the initial buffer size you enlarged to 4000.
I needed to artificially lower it to just 2 dozen bytes, just to see
the recovery code in action.  If someone wants to display a 4001-byte
message that ends with a multibyte non-ASCII character, let them be
punished for not knowing how to write concisely.

> * verror might invoke doprnt two or more times, which means that
>   doprnt will traverse ap twice.  This does not work in general; the C
>   standard is quite clear that the behavior is undefined in this case.

Are there any platforms supported by Emacs where this actually
happens?  If not, let's forget about this issue until it hits us.

I'm closing this bug.  We are already well past any real problems, and
invested too much energy and efforts of two busy people on this tiny
function, all because of your stubborn insistence on using a library
function where it doesn't fit the bill.  I hope you now have more
respect for views and code of others in general, and mine in
particular, so we won't need to go through this painful experience
again in the future.

Let's move on; I still need to work on the bidirectional display of
overlay strings and display properties, a job that was already delayed
by several precious days due to these disputes and the gratuitous work
on the code that should have been left alone in the first place.




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8545; Package emacs. (Thu, 28 Apr 2011 06:12:01 GMT) Full text and rfc822 format available.

Message #55 received at 8545 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: lekktu <at> gmail.com, 8545 <at> debbugs.gnu.org
Subject: Re: bug#8545: issues with recent doprnt-related changes
Date: Thu, 28 Apr 2011 02:10:55 -0400
> Date: Wed, 27 Apr 2011 22:29:25 -0700
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> CC: lekktu <at> gmail.com, 8545 <at> debbugs.gnu.org
> 
> On 04/27/11 22:15, Eli Zaretskii wrote:
> > As I explain in another message, we _can_ dereference this invalid
> > pointer.
> 
> Sorry, I'm not quite following, since I'm not sure what
> the "another message" refers to.

If you didn't receive it, you will find it filed in the bug tracker.

> Hmm, perhaps you're talking about this pattern in the code?
> 
>         while (fmt < format_end)
> 	  { ... fmt++ ... }
>         switch (*fmt++)

Yes, the loop (which increments the pointer more than once), the
reference with postincrement in the switch statement, and the
following dereference in fmt[-1] in the call to `error'.

> Here, the code is dereferencing *format_end,
> which means it's dereferencing one past the end of the
> format string that is passed to it.

No, it can dereference *(format_end+1).

> If the intent here is that one should call doprnt with
> the pattern (doprnt (A, ASIZE, B, B + BSIZE - 1, AP)) then
> I suggest that the point be made clearly in doprnt's comment,
> as part of doprnt's API, to prevent future confusion in
> this area.

No, it should be called as B+BSIZE.




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8545; Package emacs. (Thu, 28 Apr 2011 06:44:02 GMT) Full text and rfc822 format available.

Message #58 received at 8545 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: lekktu <at> gmail.com, 8545 <at> debbugs.gnu.org
Subject: Re: bug#8545: issues with recent doprnt-related changes
Date: Wed, 27 Apr 2011 23:42:57 -0700
On 04/27/11 23:10, Eli Zaretskii wrote:

> No, it can dereference *(format_end+1).
> 
>> If the intent here is that one should call doprnt with
>> the pattern (doprnt (A, ASIZE, B, B + BSIZE - 1, AP)) then
>> I suggest that the point be made clearly in doprnt's comment,
>> as part of doprnt's API, to prevent future confusion in
>> this area.
> 
> No, it should be called as B+BSIZE.

OK, but format_end == B + BSIZE.
So if doprnt (A, ASIZE, B, B + BSIZE, AP) can dereference format_end + 1,
this means doprnt can access B[BSIZE + 1], which means that
B should point to a char array of at least BSIZE + 2 bytes.

Normally, B is a C-language string literal such as "abc%d",
and BSIZE is the length of the string, which means
there is potential trouble because normally code
should not try to read the byte that follows the null
byte at the end of the string.

I expect that the cases where doprnt actually accesses B[BSIZE + 1]
are rare, and don't currently happen in practice; still, this is a confusing
area and whatever constraints are actually placed on doprnt's caller
should be made clear in the doprnt documentation, so that others
are warned about the situation and don't make the mistake
of passing formats that could cause problems.




Message #59 received at 8545-done <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 8545-done <at> debbugs.gnu.org
Subject: Re: issues with recent doprnt-related changes
Date: Thu, 28 Apr 2011 00:17:01 -0700
On 04/27/11 22:50, Eli Zaretskii wrote:

>> * verror might invoke doprnt two or more times, which means that
>>   doprnt will traverse ap twice.  This does not work in general; the C
>>   standard is quite clear that the behavior is undefined in this case.
> 
> Are there any platforms supported by Emacs where this actually
> happens? 

Yes, absolutely.  The bug happens on the platform I normally develop
Emacs on, namely RHEL 5.6 (x86-64).

> If someone wants to display a 4001-byte
> message that ends with a multibyte non-ASCII character, let them be
> punished for not knowing how to write concisely.

It's not merely a question of the format string itself.
It could be that there's an error message that formats long file names
that contain non-ASCII characters, or something else like that;
the point is that the file names themselves are not under the Emacs
programmer's control.

And anyway, verror should not have arbitrary limits like 4000 bytes;
that's too small.

> I'm closing this bug.

That doesn't sound right.  doprnt has real bugs, which will bite
users when they process large but plausible strings.  These bugs
should be fixed.  Could you please reopen the bug report?

> all because of your stubborn insistence on using a library
> function where it doesn't fit the bill.

This misrepresents my position.  I already agreed with
you that it's OK for Emacs to use its own doprnt function
<http://debbugs.gnu.org/cgi/bugreport.cgi?bug=8435#58>,
and I have not insisted on using any particular library function.
Please don't personalize this bug report with phrases like
"your stubborn insistence", and please do not ascribe to me
motivations that I do not have.

>>  And why is there another test "n * 10 > SIZE_MAX - (fmt[1] -
>> '0')" that always returns 0, no matter what?
> 
> ??? What happens if n*10 is SIZE_MAX-1 and fmt[1] is '2'?

n * 10 cannot possibly be SIZE_MAX - 1, because at that
point n < SIZE_MAX / 10, which means n * 10 is at most
SIZE_MAX - SIZE_MAX % 10 - 10.  fmt[1] is an ASCII digit,
so n * 10 therefore cannot possibly be greater than
SIZE_MAX - (fmt[1] - '0').  Hence the expression
n * 10 > SIZE_MAX - (fmt[1] - '0') always returns zero.

>>   /* Avoid int overflow, because many sprintfs seriously mess up
>>      with widths or precisions greater than INT_MAX.  Avoid size_t
>>      overflow, since our counters use size_t.  This test is slightly
>>      conservative, for speed and simplicity.  */
>>   if (n >= min (INT_MAX, SIZE_MAX) / 10)
>>      error ("Format width or precision too large");
> 
> Sorry, I don't see how this is clearer.

It's clearer because it says that it's intended that the test is
conservative, and that the test might sometimes err on the side of
rejecting widths or precisions even though they are not too large
for the code to represent.  Currently, this is not clear from the
code and its comments.

>> Emacs size fields count the bytes in the string, and does not
>> count the terminating null byte (which is not part of the string).
> 
> String positions are zero-based and extend to
> include the terminating null character.  See the relevant parts of the
> display engine code.

Yes, a position can point to the terminating null character, but if that
position is N, then the string contains N bytes (not counting the
terminating null).  So I still don't see the problem here.  Could you
please identify a part of the display engine code that is going past
the terminating null character, so that I can understand the situation
better?




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8545; Package emacs. (Thu, 28 Apr 2011 07:27:02 GMT) Full text and rfc822 format available.

Message #62 received at 8545 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: lekktu <at> gmail.com, 8545 <at> debbugs.gnu.org
Subject: Re: bug#8545: issues with recent doprnt-related changes
Date: Thu, 28 Apr 2011 03:26:28 -0400
> Date: Wed, 27 Apr 2011 23:42:57 -0700
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> CC: lekktu <at> gmail.com, 8545 <at> debbugs.gnu.org
> 
> OK, but format_end == B + BSIZE.
> So if doprnt (A, ASIZE, B, B + BSIZE, AP) can dereference format_end + 1,
> this means doprnt can access B[BSIZE + 1], which means that
> B should point to a char array of at least BSIZE + 2 bytes.

With the original code, that was the case, yes.  But that is why I
forcibly reset fmt to point to format_end: to avoid dereferencing past
the end of the array.

If you are saying that such invalid dereferencing can still happen,
please show how is that possible, with the code that is now in the
repository.

> Normally, B is a C-language string literal such as "abc%d",
> and BSIZE is the length of the string, which means
> there is potential trouble because normally code
> should not try to read the byte that follows the null
> byte at the end of the string.

That trouble shouldn't happen with the code in the repository.




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8545; Package emacs. (Thu, 28 Apr 2011 07:55:02 GMT) Full text and rfc822 format available.

Message #65 received at 8545 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: lekktu <at> gmail.com, 8545 <at> debbugs.gnu.org
Subject: Re: bug#8545: issues with recent doprnt-related changes
Date: Thu, 28 Apr 2011 00:54:21 -0700
On 04/28/11 00:26, Eli Zaretskii wrote:
> If you are saying that such invalid dereferencing can still happen,
> please show how is that possible, with the code that is now in the
> repository.

Sorry, I misunderstood your earlier comment to mean that
doprnt can now compute *(format_end+1).

If all that doprnt does is compute *format_end (or format_end+1
without dereferencing format_end+1), and if the documentation
notes that format_end must point to a character
(format_end cannot point to one-past-the-buffer-end,
which is what I expected), then that part's OK.




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8545; Package emacs. (Thu, 28 Apr 2011 11:15:02 GMT) Full text and rfc822 format available.

Message #68 received at 8545 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: lekktu <at> gmail.com, 8545 <at> debbugs.gnu.org
Subject: Re: bug#8545: issues with recent doprnt-related changes
Date: Thu, 28 Apr 2011 07:14:14 -0400
> Date: Thu, 28 Apr 2011 00:54:21 -0700
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> CC: lekktu <at> gmail.com, 8545 <at> debbugs.gnu.org
> 
> If all that doprnt does is compute *format_end (or format_end+1
> without dereferencing format_end+1), and if the documentation
> notes that format_end must point to a character
> (format_end cannot point to one-past-the-buffer-end,
> which is what I expected), then that part's OK.

Yes, that's how things are.  format_end points to the last byte of the
buffer passed to doprnt.




Message #69 received at 8545-done <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 8545-done <at> debbugs.gnu.org
Subject: Re: issues with recent doprnt-related changes
Date: Thu, 28 Apr 2011 08:39:38 -0400
> Date: Thu, 28 Apr 2011 00:17:01 -0700
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> CC: 8545-done <at> debbugs.gnu.org
> 
> On 04/27/11 22:50, Eli Zaretskii wrote:
> 
> >> * verror might invoke doprnt two or more times, which means that
> >>   doprnt will traverse ap twice.  This does not work in general; the C
> >>   standard is quite clear that the behavior is undefined in this case.
> > 
> > Are there any platforms supported by Emacs where this actually
> > happens? 
> 
> Yes, absolutely.  The bug happens on the platform I normally develop
> Emacs on, namely RHEL 5.6 (x86-64).

What is the implementation of va_start on that platform?  Is stashing
away the ap argument all that's required here?

> > If someone wants to display a 4001-byte
> > message that ends with a multibyte non-ASCII character, let them be
> > punished for not knowing how to write concisely.
> 
> It's not merely a question of the format string itself.
> It could be that there's an error message that formats long file names
> that contain non-ASCII characters, or something else like that;
> the point is that the file names themselves are not under the Emacs
> programmer's control.

If that's what you mean, it's easy to fix.  Done.

> > I'm closing this bug.
> 
> That doesn't sound right.  doprnt has real bugs, which will bite
> users when they process large but plausible strings.  These bugs
> should be fixed.  Could you please reopen the bug report?

You don't need me to reopen a bug, and you don't need an open bug
report to work on some code, if you feel like it.

> > all because of your stubborn insistence on using a library
> > function where it doesn't fit the bill.
> 
> This misrepresents my position.  I already agreed with
> you that it's OK for Emacs to use its own doprnt function
> <http://debbugs.gnu.org/cgi/bugreport.cgi?bug=8435#58>,
> and I have not insisted on using any particular library function.
> Please don't personalize this bug report with phrases like
> "your stubborn insistence", and please do not ascribe to me
> motivations that I do not have.

I have no idea what are your motivations.  All I know is what you
write and I read, and how you act.  I see that and I make my
conclusions.  If you don't want your positions misrepresented, try
making your words and deeds speak unequivocally about them.  E.g., try
insisting less that you are always right and others are always wrong,
when judging merits and demerits of alternative solutions to problems.

IOW, this is primarily a social activity.  Good spirit around here can
easily be destroyed by lack of respect to dissenting opinions.  Trying
to reach some kind of consensus, and if that's not possible, some
compromise, is a much better way of having your positions and your
motivations respected.

And that is the last time I'm talking about this in this regard.

> >>  And why is there another test "n * 10 > SIZE_MAX - (fmt[1] -
> >> '0')" that always returns 0, no matter what?
> > 
> > ??? What happens if n*10 is SIZE_MAX-1 and fmt[1] is '2'?
> 
> n * 10 cannot possibly be SIZE_MAX - 1, because at that
> point n < SIZE_MAX / 10, which means n * 10 is at most
> SIZE_MAX - SIZE_MAX % 10 - 10.  fmt[1] is an ASCII digit,
> so n * 10 therefore cannot possibly be greater than
> SIZE_MAX - (fmt[1] - '0').  Hence the expression
> n * 10 > SIZE_MAX - (fmt[1] - '0') always returns zero.

So now you suddenly realize that the ">=" in the first part of that if
clause was a good idea after all, yes?

> >>   /* Avoid int overflow, because many sprintfs seriously mess up
> >>      with widths or precisions greater than INT_MAX.  Avoid size_t
> >>      overflow, since our counters use size_t.  This test is slightly
> >>      conservative, for speed and simplicity.  */
> >>   if (n >= min (INT_MAX, SIZE_MAX) / 10)
> >>      error ("Format width or precision too large");
> > 
> > Sorry, I don't see how this is clearer.
> 
> It's clearer because it says that it's intended that the test is
> conservative

The intent of the test is clear, but its relation to what is done if
the test succeeds is not.

> and that the test might sometimes err on the side of rejecting
> widths or precisions even though they are not too large for the code
> to represent.

Precisely my motivation to include a possibly redundant test which you
so eloquently argue against above.  Can we have some consistency here,
please?  Can you please judge your own arguments like you judge those
of others?

> >> Emacs size fields count the bytes in the string, and does not
> >> count the terminating null byte (which is not part of the string).
> > 
> > String positions are zero-based and extend to
> > include the terminating null character.  See the relevant parts of the
> > display engine code.
> 
> Yes, a position can point to the terminating null character, but if that
> position is N, then the string contains N bytes (not counting the
> terminating null).

The issue is not whether it includes the null or not.  The issue is
the range of values that an EMACS_INT data type must be able to
represent, and the relation of that range to the maximum number of
bytes in a string and to the largest possible string byte position
that Emacs is able to handle.

> Could you please identify a part of the display engine code that is
> going past the terminating null character, so that I can understand
> the situation better?

Start with reseat_to_string and then read (the relevant parts of)
set_iterator_to_next and next_element_from_c_string, for example.




Message #70 received at 8545-done <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 8545-done <at> debbugs.gnu.org
Subject: Re: issues with recent doprnt-related changes
Date: Thu, 28 Apr 2011 15:11:11 -0700
> If that's what you mean, it's easy to fix.  Done.

If I understand things correctly, that fix (in bzr 104036) handles the
case where the format itself has a non-ASCII character that is
truncated, but it doesn't handle the case where the format is
something like "file name = %s", and %s expands to a long file name
that is truncated.  If so, surely that case still needs to be fixed.


>>>> * verror might invoke doprnt two or more times, which means that
>>>>   doprnt will traverse ap twice.  This does not work in general; the C
>>>>   standard is quite clear that the behavior is undefined in this case.
>>>
>>> Are there any platforms supported by Emacs where this actually
>>> happens?
>>
>> Yes, absolutely.  The bug happens on the platform I normally develop
>> Emacs on, namely RHEL 5.6 (x86-64).
>
> What is the implementation of va_start on that platform?

Sorry, I don't know.

> Is stashing away the ap argument all that's required here?

Yes, portable code is supposed to use va_copy.  Code that traverses
through an argument list N times can call va_start once, va_copy N - 1
times, and va_end N times (once on the original, once on each copy).

va_copy is a C99-ism, but we can use it as-is in Emacs source code,
and use the relevant gnulib module for obsolete platforms that lack it.
Do the DOS and NT ports have va_copy?  If not, it should be simple
to supply a substitute.


> You don't need me to reopen a bug, and you don't need an open bug
> report to work on some code, if you feel like it.

OK, thanks, I'll see if I can reopen the bug and install a fix or two.


>>>>  And why is there another test "n * 10 > SIZE_MAX - (fmt[1] -
>>>> '0')" that always returns 0, no matter what?
>>>
>>> ??? What happens if n*10 is SIZE_MAX-1 and fmt[1] is '2'?
>>
>> n * 10 cannot possibly be SIZE_MAX - 1, because at that
>> point n < SIZE_MAX / 10, which means n * 10 is at most
>> SIZE_MAX - SIZE_MAX % 10 - 10.  fmt[1] is an ASCII digit,
>> so n * 10 therefore cannot possibly be greater than
>> SIZE_MAX - (fmt[1] - '0').  Hence the expression
>> n * 10 > SIZE_MAX - (fmt[1] - '0') always returns zero.
>
> So now you suddenly realize that the ">=" in the first part of that if
> clause was a good idea after all, yes?

I already indicated that I understand how that first part works, in
<http://debbugs.gnu.org/cgi/bugreport.cgi?bug=8545#26>.  But with the
">=" in place, the second part of that 'if' clause always returns 0:
that's unnecessarily confusing, since it misleads the reader into
thinking that the second part is needed, when it's not.

In other words, if "digit" is in the range 0 through 9, then this:

   if (n > SIZE_MAX / 10 || n * 10 > SIZE_MAX - digit) error (...);

is exact but complicated, and this:

   if (n >= SIZE_MAX / 10) error (...);

is simple but slightly conservative (it reports overflow for a few
very large but not overflowed numbers, but that's good enough here).
Finally, this:

   if (n >= SIZE_MAX / 10 || n * 10 > SIZE_MAX - digit) error (...);

is is neither exact nor simple, and is more confusing than either of
the above; there's nothing to recommend it over the other two.

Also, as I mentioned earlier, the code should also check for 'int'
overflow, as most sprintf implementations mess up with widths or
precisions greater than INT_MAX.  I'll check in a fix for that, with a
comment that tries to explain the situation and responds to your other
remarks about clarity here; please feel free to improve it.

Another possibility is to remove the 'if' test entirely, making it the
caller's responsibility to not specify outlandish widths in format
strings.  That would make the code simpler yet.  The other overflow
checks in doprnt would have to stay, since they depend on user data,
but this overflow check is superfluous.


> The issue is not whether it includes the null or not.  The issue is
> the range of values that an EMACS_INT data type must be able to
> represent, and the relation of that range to the maximum number of
> bytes in a string and to the largest possible string byte position
> that Emacs is able to handle.

OK, thanks.  I read the code, and if I understand it correctly, since
'point' is 1-origin, a buffer with MOST_POSITIVE_FIXNUM characters
will have values of 'point' ranging from 1 through
MOST_POSITIVE_FIXNUM + 1, but that "+ 1" would mean Fpoint wouldn't
work: so we should limit buffers to contain at most
MOST_POSITIVE_FIXNUM - 1 bytes.

Is it also the case that Emacs should limit strings to at most
MOST_POSITIVE_FIXNUM - 1 bytes?  Sorry, I couldn't tell this from the
functions you mentioned; there's a lot of code there, and this stuff
isn't immediately obvious.

If so, then should this doprnt code:

              tem = strlen (string);
              if (tem > MOST_POSITIVE_FIXNUM)
                error ("Format width or precision too large");
 
use ">=" rather than ">"?  There are two instances of this
sort of thing.




Did not alter fixed versions and reopened. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 28 Apr 2011 22:14:02 GMT) Full text and rfc822 format available.

Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8545; Package emacs. (Fri, 29 Apr 2011 11:18:02 GMT) Full text and rfc822 format available.

Message #75 received at 8545 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 8545 <at> debbugs.gnu.org
Subject: Re: issues with recent doprnt-related changes
Date: Fri, 29 Apr 2011 14:16:57 +0300
> Date: Thu, 28 Apr 2011 15:11:11 -0700
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> CC: 8545-done <at> debbugs.gnu.org
> 
> > If that's what you mean, it's easy to fix.  Done.
> 
> If I understand things correctly, that fix (in bzr 104036) handles the
> case where the format itself has a non-ASCII character that is
> truncated, but it doesn't handle the case where the format is
> something like "file name = %s", and %s expands to a long file name
> that is truncated.  If so, surely that case still needs to be fixed.

Ah, yes.  Missed one more place where this truncation could happen.
Should be fixed now (including an older bug with that code).

> Yes, portable code is supposed to use va_copy.  Code that traverses
> through an argument list N times can call va_start once, va_copy N - 1
> times, and va_end N times (once on the original, once on each copy).
> 
> va_copy is a C99-ism, but we can use it as-is in Emacs source code,
> and use the relevant gnulib module for obsolete platforms that lack it.
> Do the DOS and NT ports have va_copy?  If not, it should be simple
> to supply a substitute.

The MS-DOS and MinGW builds use GCC, so they have va_copy by
definition.  MSVC doesn't, but we can provide a trivial definition
which will work for x86.  If we still support MSVC by the time Emacs
can be built as a 64-bit executable on Windows, and if MSVC still
doesn't have va_copy by that time, we can handle this better at that
time.

> Another possibility is to remove the 'if' test entirely, making it the
> caller's responsibility to not specify outlandish widths in format
> strings.

I don't think this is a good idea.  verror is in many cases the last
line of defense, so it should IMO be rock-stable and try very hard to
emit something useful even in the most improbable situations.

For that reason, I also don't like the calls to `abort' you
introduced.  I understand the motivation (detection of invalid Emacs
code), but why not make it call `error' instead, like we do here:

	  switch (*fmt++)
	    {
	    default:
	      error ("Invalid format operation %%%s%c",
		     "ll" + 2 - long_flag, fmt[-1]);

After all, using %ll when the long long data type isn't supported is
not different from using %a or some other unsupported format letter,
right?

> OK, thanks.  I read the code, and if I understand it correctly, since
> 'point' is 1-origin, a buffer with MOST_POSITIVE_FIXNUM characters
> will have values of 'point' ranging from 1 through
> MOST_POSITIVE_FIXNUM + 1, but that "+ 1" would mean Fpoint wouldn't
> work: so we should limit buffers to contain at most
> MOST_POSITIVE_FIXNUM - 1 bytes.

I guess so, yes.  I would like to have other opinions, though, so I
will start a new thread on emacs-devel about that.

> Is it also the case that Emacs should limit strings to at most
> MOST_POSITIVE_FIXNUM - 1 bytes?

Only if we are thinking about copying a MOST_POSITIVE_FIXNUM-long
string into a buffer.  Otherwise, since string positions are
zero-based, I think strings are safe with MOST_POSITIVE_FIXNUM.

> Sorry, I couldn't tell this from the functions you mentioned;
> there's a lot of code there, and this stuff isn't immediately
> obvious.

Yeah, tell me about that.  I've been hacking that code extensively for
the last year and a half, and I still don't always know who's who ;-)




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8545; Package emacs. (Fri, 29 Apr 2011 12:29:01 GMT) Full text and rfc822 format available.

Message #78 received at 8545 <at> debbugs.gnu.org (full text, mbox):

From: Richard Stallman <rms <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: lekktu <at> gmail.com, 8545 <at> debbugs.gnu.org
Subject: Re: bug#8545: issues with recent doprnt-related changes
Date: Fri, 29 Apr 2011 08:28:35 -0400
    No, format_end is already pointing after the object;
    the object's size is format_end - format.  So
    format_end + 1 might not be a valid pointer.

20 years ago,, Emacs allocated an extra byte with a null character
after the end of every string or buffer.  If that is still true
then this pointer actually is valid.

But it doesn't matter anyway, for reasons described below.

    Yes.  A portable C program is not allowed to create a pointer that
    doesn't point to an object,

Our C programs are allowed to create any sort of pointer we want them
to.  ISO has no authority over us.  We are concerned with standards
insofar as they matter in practice for the convenience and reliability
of our software.

Thus, the question that matters to us is whether a construct is going
to cause a problem on the plausibke platforms we will want to support.
That does not include all theoretical ISO C implementations.

    Yes.  To take an extreme example, some architectures can compute
    a pointer only by using a special pointer register, and the register's
    contents are always checked for validity, even if you don't dereference the
    pointer.

They are outside GNU's design range of targets.

    If you assign i = INT_MAX + 1, the resulting behavior is undefined.

The result is INT_MIN.  We don't try to support any theoretical machine
where this would not be so.

-- 
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org, www.gnu.org
Skype: No way! That's nonfree (freedom-denying) software.
  Use free telephony http://directory.fsf.org/category/tel/




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8545; Package emacs. (Fri, 29 Apr 2011 14:42:01 GMT) Full text and rfc822 format available.

Message #81 received at 8545 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 8545 <at> debbugs.gnu.org
Subject: Re: issues with recent doprnt-related changes
Date: Fri, 29 Apr 2011 07:41:40 -0700
On 04/29/11 04:16, Eli Zaretskii wrote:
> I also don't like the calls to `abort' you
> introduced.  I understand the motivation (detection of invalid Emacs
> code), but why not make it call `error' instead

Yes, that would be fine.  I hadn't thought of that possibility.




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8545; Package emacs. (Fri, 29 Apr 2011 19:36:02 GMT) Full text and rfc822 format available.

Message #84 received at 8545 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 8545 <at> debbugs.gnu.org
Subject: Re: issues with recent doprnt-related changes
Date: Fri, 29 Apr 2011 22:35:48 +0300
> Date: Fri, 29 Apr 2011 07:41:40 -0700
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> CC: 8545 <at> debbugs.gnu.org
> 
> On 04/29/11 04:16, Eli Zaretskii wrote:
> > I also don't like the calls to `abort' you
> > introduced.  I understand the motivation (detection of invalid Emacs
> > code), but why not make it call `error' instead
> 
> Yes, that would be fine.

I installed such a change.

Btw, none of the platforms currently defines HAVE_LONG_LONG_INT or
HAVE_UNSIGNED_LONG_LONG_INT, AFAICS.




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8545; Package emacs. (Fri, 29 Apr 2011 19:57:02 GMT) Full text and rfc822 format available.

Message #87 received at 8545 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: rms <at> gnu.org
Cc: lekktu <at> gmail.com, 8545 <at> debbugs.gnu.org, eggert <at> cs.ucla.edu
Subject: Re: bug#8545: issues with recent doprnt-related changes
Date: Fri, 29 Apr 2011 22:56:39 +0300
> Date: Fri, 29 Apr 2011 08:28:35 -0400
> From: Richard Stallman <rms <at> gnu.org>
> Cc: lekktu <at> gmail.com, 8545 <at> debbugs.gnu.org
> 
>     No, format_end is already pointing after the object;
>     the object's size is format_end - format.  So
>     format_end + 1 might not be a valid pointer.
> 
> 20 years ago, Emacs allocated an extra byte with a null character
> after the end of every string or buffer.

20 years later, we still do that for strings, but (AFAICS) not for
buffers.




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8545; Package emacs. (Fri, 29 Apr 2011 20:33:02 GMT) Full text and rfc822 format available.

Message #90 received at 8545 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 8545 <at> debbugs.gnu.org
Subject: Re: issues with recent doprnt-related changes
Date: Fri, 29 Apr 2011 13:32:29 -0700
On 04/29/11 12:35, Eli Zaretskii wrote:
> none of the platforms currently defines HAVE_LONG_LONG_INT or
> HAVE_UNSIGNED_LONG_LONG_INT, AFAICS.

It's done automatically, by 'configure'.  HAVE_LONG_LONG_INT is
1 on all the platforms I regularly use, e.g., x86 GNU/Linux.




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8545; Package emacs. (Fri, 29 Apr 2011 23:50:02 GMT) Full text and rfc822 format available.

Message #93 received at 8545 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: rms <at> gnu.org
Cc: lekktu <at> gmail.com, 8545 <at> debbugs.gnu.org
Subject: Re: bug#8545: issues with recent doprnt-related changes
Date: Fri, 29 Apr 2011 16:49:20 -0700
On 04/29/11 05:28, Richard Stallman wrote:

> We are concerned with standards insofar as they matter in practice
> for the convenience and reliability of our software.

Yes, of course, I should have made that clearer.  Standards are our
tools, not our masters.

>> If you assign i = INT_MAX + 1, the resulting behavior is undefined.
>
> The result is INT_MIN.  We don't try to support any theoretical machine
> where this would not be so.

Those machines used to be theoretical, but they're in common
use now.  Practical C code can no longer assume that integers
always wrap around when doing integer arithmetic.  For example:

   long
   foo (char *p, int i)
   {
     return &p[i + 1] - &p[i];
   }

On typical hosts where int is 32 bits, and long and char * are
both 64 bits, most compilers optimize that "return" statement
to "return 1;", even when I is INT_MAX and I + 1 therefore
overflows.  These compilers are therefore rejecting the notion
that INT_MAX + 1 must always equal INT_MIN.

Although FOO is contrived, a lot of complicated code in Emacs
mixes int and long and pointer arithmetic, and it's inevitable
that compilers are doing optimizations like the above when they
compile Emacs.  We cannot simply declare that INT_MAX + 1 must
always be INT_MIN, because that's not how compilers actually
work these days.

What we need is good advice for programmers in this area, so
that they can write C code that is portable in practice.  This
advice shouldn't be too conservative, because that would
discourage useful programs.  Nor should it be too
loosey-goosey, because that would encourage buggy programs.
Nor should it be complicated, because that would confuse
people and waste their time.  It is not easy to come up with
advice that satisfies all three constraints.





Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8545; Package emacs. (Sat, 30 Apr 2011 09:00:04 GMT) Full text and rfc822 format available.

Message #96 received at 8545 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 8545 <at> debbugs.gnu.org
Subject: Re: issues with recent doprnt-related changes
Date: Sat, 30 Apr 2011 11:59:18 +0300
> Date: Fri, 29 Apr 2011 13:32:29 -0700
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> CC: 8545 <at> debbugs.gnu.org
> 
> On 04/29/11 12:35, Eli Zaretskii wrote:
> > none of the platforms currently defines HAVE_LONG_LONG_INT or
> > HAVE_UNSIGNED_LONG_LONG_INT, AFAICS.
> 
> It's done automatically, by 'configure'.

You are right, I didn't look in some of the possible places.




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8545; Package emacs. (Sat, 30 Apr 2011 21:05:02 GMT) Full text and rfc822 format available.

Message #99 received at 8545 <at> debbugs.gnu.org (full text, mbox):

From: Richard Stallman <rms <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: lekktu <at> gmail.com, 8545 <at> debbugs.gnu.org
Subject: Re: bug#8545: issues with recent doprnt-related changes
Date: Sat, 30 Apr 2011 17:03:47 -0400
    >> If you assign i = INT_MAX + 1, the resulting behavior is undefined.
    >
    > The result is INT_MIN.  We don't try to support any theoretical machine
    > where this would not be so.

    Those machines used to be theoretical, but they're in common
    use now.

I assumed we were talking about type `int', but you did not explicitly
say so.  Touché -- but that just means we are talking at cross
purposes.  What I said about addition on type int is still valid.

     printf ("%d", INT_MAX+1);

will output INT_MIN.

      Practical C code can no longer assume that integers
    always wrap around when doing integer arithmetic.

I think that is the wrong interpretation of the facts.

       long
       foo (char *p, int i)
       {
	 return &p[i + 1] - &p[i];
       }
    On typical hosts where int is 32 bits, and long and char * are
    both 64 bits, most compilers optimize that "return" statement
    to "return 1;", even when I is INT_MAX and I + 1 therefore
    overflows.  These compilers are therefore rejecting the notion
    that INT_MAX + 1 must always equal INT_MIN.

i+1 is computed as an integer, but then it gets converted to a long.
What happens here seems to be an issue about type conversion combined
with addition -- not addition itself.

These compilers are taking a strange liberty.
Why isn't that a bug?

-- 
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org, www.gnu.org
Skype: No way! That's nonfree (freedom-denying) software.
  Use free telephony http://directory.fsf.org/category/tel/




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8545; Package emacs. (Sun, 01 May 2011 04:26:01 GMT) Full text and rfc822 format available.

Message #102 received at 8545 <at> debbugs.gnu.org (full text, mbox):

From: Jason Rumney <jasonr <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: lekktu <at> gmail.com, 8545 <at> debbugs.gnu.org, rms <at> gnu.org
Subject: Re: bug#8545: issues with recent doprnt-related changes
Date: Sun, 01 May 2011 12:25:45 +0800
Paul Eggert <eggert <at> cs.ucla.edu> writes:

> On 04/29/11 05:28, Richard Stallman wrote:
>
>> We are concerned with standards insofar as they matter in practice
>> for the convenience and reliability of our software.
>
> Yes, of course, I should have made that clearer.  Standards are our
> tools, not our masters.
>
>>> If you assign i = INT_MAX + 1, the resulting behavior is undefined.
>>
>> The result is INT_MIN.  We don't try to support any theoretical machine
>> where this would not be so.
>
>    long
>    foo (char *p, int i)
>    {
>      return &p[i + 1] - &p[i];
>    }
>
> On typical hosts where int is 32 bits, and long and char * are
> both 64 bits, most compilers optimize that "return" statement
> to "return 1;", even when I is INT_MAX and I + 1 therefore
> overflows.

That does not mean that INT_MAX + 1 is undefined.  You are also
involving implicit casts here. Exactly where those casts happen is what
is undefined.




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8545; Package emacs. (Sun, 01 May 2011 05:42:02 GMT) Full text and rfc822 format available.

Message #105 received at 8545 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: rms <at> gnu.org
Cc: lekktu <at> gmail.com, 8545 <at> debbugs.gnu.org
Subject: Re: bug#8545: issues with recent doprnt-related changes
Date: Sat, 30 Apr 2011 22:41:38 -0700
On 04/30/11 14:03, Richard Stallman wrote:
>        long
>        foo (char *p, int i)
>        {
>          return &p[i + 1] - &p[i];
>        }
> ...
> i+1 is computed as an integer, but then it gets converted to a long.

Unfortunately that doesn't explain FOO's behavior.

If i+1 is computed as an int and if wraparound is required, then
when i is INT_MAX, i+1 must be INT_MIN.  (FOO does not convert
i+1 to long; but even if it did, INT_MIN's value would be unchanged by
that conversion.)  If p is pointing into a large array, &p[INT_MIN]
and &p[INT_MAX] can both be valid addresses, and in that case
foo (p, INT_MAX) would have to yield -2**32 + 1 on a typical 64-bit host
where signed integer arithmetic wrapped around.

But in my experience no compiler does it that way.  FOO always returns 1.


> What happens here seems to be an issue about type conversion combined
> with addition -- not addition itself.

I'm afraid not.  First, FOO doesn't have any type conversions.
If I is an int, A[I] doesn't convert I to any other type; it
simply uses I's value without conversion.

Second, it's easy to construct an example that involves only "int":

   int
   bar (int i)
   {
     return i < i + 1;
   }

With many compilers, BAR always returns 1, even when i == INT_MAX.


> These compilers are taking a strange liberty.
> Why isn't that a bug?

Well, for starters, most programmers *expect* FOO to return 1,
and similarly for BAR.  Why would a programmer file a bug report
when the program is behaving as expected?


>     printf ("%d", INT_MAX+1);
> will output INT_MIN.

That's true for all systems I have ready access to, yes.  And I
expect there are other cases where int arithmetic wraps around
reliably.  But there are many practical cases where it doesn't.
We cannot simply advise programmers to assume that adding 1
to INT_MAX always results in INT_MIN, as that assumption is
often incorrect nowadays.




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8545; Package emacs. (Sun, 01 May 2011 05:57:01 GMT) Full text and rfc822 format available.

Message #108 received at 8545 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Jason Rumney <jasonr <at> gnu.org>
Cc: lekktu <at> gmail.com, 8545 <at> debbugs.gnu.org, rms <at> gnu.org
Subject: Re: bug#8545: issues with recent doprnt-related changes
Date: Sat, 30 Apr 2011 22:56:45 -0700
On 04/30/11 21:25, Jason Rumney wrote:
> You are also
> involving implicit casts here. Exactly where those casts happen is what
> is undefined.

I'm not sure what is meant here, as the program in question
doesn't have any implicit arithmetic conversions (or at least,
it doesn't on the typical 64-bit host I was talking about).

> That does not mean that INT_MAX + 1 is undefined.

It could well be that the exact expression "INT_MAX + 1"
has well-defined behavior on all platforms we care about,
and that it always returns INT_MIN.  But I'm concerned
about the more general issue, which is whether Emacs code
can always assume that signed integer arithmetic
wraps around.  Unfortunately, there are many practical
Emacs targets where it's incorrect to assume that.




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8545; Package emacs. (Sun, 01 May 2011 08:13:02 GMT) Full text and rfc822 format available.

Message #111 received at 8545 <at> debbugs.gnu.org (full text, mbox):

From: Jason Rumney <jasonr <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: lekktu <at> gmail.com, 8545 <at> debbugs.gnu.org, rms <at> gnu.org
Subject: Re: bug#8545: issues with recent doprnt-related changes
Date: Sun, 01 May 2011 16:12:38 +0800
Paul Eggert <eggert <at> cs.ucla.edu> writes:

> I'm not sure what is meant here, as the program in question
> doesn't have any implicit arithmetic conversions (or at least,
> it doesn't on the typical 64-bit host I was talking about).

It has implicit type conversion from int to char* and then to long.
Whether those types are different from each other depends on the host
machine, OS and compiler. That is what is undeterministic about your
example, not the fact that in int arithmetic INT_MAX + 1 == INT_MIN.




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8545; Package emacs. (Sun, 01 May 2011 11:03:02 GMT) Full text and rfc822 format available.

Message #114 received at 8545 <at> debbugs.gnu.org (full text, mbox):

From: Andreas Schwab <schwab <at> linux-m68k.org>
To: Jason Rumney <jasonr <at> gnu.org>
Cc: lekktu <at> gmail.com, 8545 <at> debbugs.gnu.org, Paul Eggert <eggert <at> cs.ucla.edu>,
	rms <at> gnu.org
Subject: Re: bug#8545: issues with recent doprnt-related changes
Date: Sun, 01 May 2011 13:02:13 +0200
Jason Rumney <jasonr <at> gnu.org> writes:

> It has implicit type conversion from int to char* and then to long.

The only implicit conversion in this function is from ptrdiff_t to long.

Andreas.

-- 
Andreas Schwab, schwab <at> linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8545; Package emacs. (Mon, 02 May 2011 00:00:04 GMT) Full text and rfc822 format available.

Message #117 received at 8545 <at> debbugs.gnu.org (full text, mbox):

From: Richard Stallman <rms <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: lekktu <at> gmail.com, 8545 <at> debbugs.gnu.org
Subject: Re: bug#8545: issues with recent doprnt-related changes
Date: Sun, 01 May 2011 19:59:06 -0400
    Second, it's easy to construct an example that involves only "int":

       int
       bar (int i)
       {
	 return i < i + 1;
       }
    With many compilers, BAR always returns 1, even when i == INT_MAX.

I see.

This would break many constructions intended to test whether an
operation did overflow.  Is there any reliable way to test for that?

-- 
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org, www.gnu.org
Skype: No way! That's nonfree (freedom-denying) software.
  Use free telephony http://directory.fsf.org/category/tel/




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8545; Package emacs. (Mon, 02 May 2011 00:24:01 GMT) Full text and rfc822 format available.

Message #120 received at 8545 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: rms <at> gnu.org
Cc: lekktu <at> gmail.com, 8545 <at> debbugs.gnu.org
Subject: Re: bug#8545: issues with recent doprnt-related changes
Date: Sun, 01 May 2011 17:23:37 -0700
On 05/01/11 16:59, Richard Stallman wrote:
> This would break many constructions intended to test whether an
> operation did overflow.  Is there any reliable way to test for that?

Yes.  For example:

   int
   add_overflow (int a, int b)
   {
     if (b < 0)
       return a < INT_MIN - b;
     else
       return INT_MAX - b < a;
   }

add_overflow (a, b) returns 1 if a + b would overflow.
There are similar reliable tests for the other arithmetic operations.




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8545; Package emacs. (Tue, 03 May 2011 20:25:01 GMT) Full text and rfc822 format available.

Message #123 received at 8545 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: rms <at> gnu.org
Cc: 8545 <at> debbugs.gnu.org
Subject: Re: bug#8545: issues with recent doprnt-related changes
Date: Tue, 03 May 2011 13:24:16 -0700
>>     There are similar reliable tests for the other arithmetic operations.
> 
> Is this documented somewhere?  Is there a list of the standard ways?

CERT has something, here:

https://www.securecoding.cert.org/confluence/display/seccode/INT32-C.+Ensure+that+operations+on+signed+integers+do+not+result+in+overflow

Although the principles in that memo are OK, the actual code is
hard to read and its multiplication overflow checking is buggy.

Here's something better, which I just now wrote.  Also, please see
Emacs Bug#8611 <http://debbugs.gnu.org/cgi/bugreport.cgi?bug=8611>;
its patch uses code like the following.


#include <limits.h>

int
add_overflow (int a, int b)
{
  return (b < 0
	  ? a < INT_MIN - b
	  : INT_MAX - b < a);
}

int
subtract_overflow (int a, int b)
{
  return (b < 0
	  ? INT_MAX + b < a
	  : a < INT_MIN + b);
}

int
unary_minus_overflow (int a)
{
  return a < -INT_MAX;
}

int
multiply_overflow (int a, int b)
{
  return (b < 0
	  ? (a < 0
	     ? a < INT_MAX / b
	     : b != -1 && INT_MIN / b < a)
	  : (b != 0
	     && (a < 0
		 ? a < INT_MIN / b
		 : INT_MAX / b < a)));
}

int
quotient_overflow (int a, int b)
{
  /* This does not check for division by zero.  Add that if you like.  */
  return a < -INT_MAX && b == -1;
}

int
remainder_overflow (int a, int b)
{
  /* Mathematically the remainder should never overflow, but on x86-like
     hosts INT_MIN % -1 traps, and the C standard permits this.  */
  return quotient_overflow (a, b);
}




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8545; Package emacs. (Wed, 04 May 2011 07:29:02 GMT) Full text and rfc822 format available.

Message #126 received at 8545 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 8545 <at> debbugs.gnu.org
Subject: Re: issues with recent doprnt-related changes
Date: Wed, 04 May 2011 00:28:18 -0700
On 04/29/11 04:16, Eli Zaretskii wrote:
> I guess so, yes.  I would like to have other opinions, though, so I
> will start a new thread on emacs-devel about that.

It seems from that discussion that strings can contain MOST_POSITIVE_FIXNUM bytes.
Also, that va_arg bug really needs fixing.  So I plan to install the following
patch after some more testing.  This assumes va_copy exists, which may affect
the Windows port, but a one-line macro should suffice if it doesn't have
va_copy already.

=== modified file 'ChangeLog'
--- ChangeLog	2011-05-04 06:11:49 +0000
+++ ChangeLog	2011-05-04 07:19:21 +0000
@@ -1,5 +1,9 @@
 2011-05-04  Paul Eggert  <eggert <at> cs.ucla.edu>
 
+	Use C99's va_copy to avoid undefined behavior on x86-64 GNU/Linux.
+	* Makefile.in (GNULIB_MODULES): Add stdarg, for va_copy.
+	* lib/stdarg.in.h, m4/stdarg.m4: New files, from gnulib.
+
 	* Makefile.in (GNULIB_TOOL_FLAG): Add --conditional-dependencies.
 	This new gnulib-tool option saves 'configure' the trouble of
 	checking for strtoull when strtoumax exists.

=== modified file 'Makefile.in'
--- Makefile.in	2011-05-04 06:11:49 +0000
+++ Makefile.in	2011-05-04 07:19:21 +0000
@@ -333,7 +333,7 @@
 GNULIB_MODULES = \
   careadlinkat crypto/md5 dtoastr filemode getloadavg getopt-gnu \
   ignore-value intprops lstat mktime readlink \
-  socklen stdio strftime strtoumax symlink sys_stat
+  socklen stdarg stdio strftime strtoumax symlink sys_stat
 GNULIB_TOOL_FLAGS = \
  --conditional-dependencies --import --no-changelog --no-vc-files \
  --makefile-name=gnulib.mk

=== modified file 'src/ChangeLog'
--- src/ChangeLog	2011-05-04 06:13:23 +0000
+++ src/ChangeLog	2011-05-04 07:20:46 +0000
@@ -1,5 +1,16 @@
 2011-05-04  Paul Eggert  <eggert <at> cs.ucla.edu>
 
+	* term.c (vfatal): Remove stray call to va_end.
+	It's not needed and the C Standard doesn't allow it here anyway.
+
+	Use C99's va_copy to avoid undefined behavior on x86-64 GNU/Linux.
+	* eval.c (verror): doprnt a copy of ap, not the original.  (Bug#8545)
+
+	* eval.c (verror): OK to create a string of up to MOST_POSITIVE_FIXNUM
+	bytes.
+
+	* term.c: Don't include <stdarg.h>, as <lisp.h> does that.
+
 	Arithmetic overflows now return float rather than wrapping around.
 	(Bug#8611).
 	* data.c: Include <intprops.h>.

=== modified file 'src/eval.c'
--- src/eval.c	2011-04-30 19:00:39 +0000
+++ src/eval.c	2011-05-04 07:19:21 +0000
@@ -1994,7 +1994,7 @@
 {
   char buf[4000];
   size_t size = sizeof buf;
-  size_t size_max = min (MOST_POSITIVE_FIXNUM, SIZE_MAX);
+  size_t size_max = min (MOST_POSITIVE_FIXNUM + 1, SIZE_MAX);
   size_t mlen = strlen (m);
   char *buffer = buf;
   size_t used;
@@ -2002,7 +2002,10 @@
 
   while (1)
     {
-      used = doprnt (buffer, size, m, m + mlen, ap);
+      va_list ap_copy;
+      va_copy (ap_copy, ap);
+      used = doprnt (buffer, size, m, m + mlen, ap_copy);
+      va_end (ap_copy);
 
       /* Note: the -1 below is because `doprnt' returns the number of bytes
 	 excluding the terminating null byte, and it always terminates with a

=== modified file 'src/term.c'
--- src/term.c	2011-04-24 09:00:03 +0000
+++ src/term.c	2011-05-04 07:20:46 +0000
@@ -26,7 +26,6 @@
 #include <sys/file.h>
 #include <unistd.h>
 #include <signal.h>
-#include <stdarg.h>
 #include <setjmp.h>
 
 #include "lisp.h"
@@ -3619,7 +3618,6 @@
   vfprintf (stderr, str, ap);
   if (!(strlen (str) > 0 && str[strlen (str) - 1] == '\n'))
     fprintf (stderr, "\n");
-  va_end (ap);
   fflush (stderr);
   exit (1);
 }





Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8545; Package emacs. (Wed, 04 May 2011 09:53:02 GMT) Full text and rfc822 format available.

Message #129 received at 8545 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 8545 <at> debbugs.gnu.org
Subject: Re: issues with recent doprnt-related changes
Date: Wed, 04 May 2011 05:52:10 -0400
> Date: Wed, 04 May 2011 00:28:18 -0700
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> CC: 8545 <at> debbugs.gnu.org
> 
> On 04/29/11 04:16, Eli Zaretskii wrote:
> > I guess so, yes.  I would like to have other opinions, though, so I
> > will start a new thread on emacs-devel about that.
> 
> It seems from that discussion that strings can contain MOST_POSITIVE_FIXNUM bytes.

I think the conclusion was that it can contain MOST_POSITIVE_FIXNUM
_including_the_terminating_null_.

> Also, that va_arg bug really needs fixing.  So I plan to install the following
> patch after some more testing.  This assumes va_copy exists, which may affect
> the Windows port, but a one-line macro should suffice if it doesn't have
> va_copy already.

Right, I will take care of that when this is installed.

Thanks.




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8545; Package emacs. (Wed, 04 May 2011 14:57:02 GMT) Full text and rfc822 format available.

Message #132 received at 8545 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 8545 <at> debbugs.gnu.org, Emacs Development <emacs-devel <at> gnu.org>
Subject: Re: issues with recent doprnt-related changes
Date: Wed, 04 May 2011 07:56:43 -0700
On 05/04/11 02:52, Eli Zaretskii wrote:
>> It seems from that discussion that strings can contain MOST_POSITIVE_FIXNUM bytes.
> I think the conclusion was that it can contain MOST_POSITIVE_FIXNUM
> _including_the_terminating_null_.

Hmm, that's not how I read
<http://lists.gnu.org/archive/html/emacs-devel/2011-04/msg00923.html>.

I understood the argument to be that a buffer must contain at most
MOST_POSITIVE_FIXNUM - 1 bytes due to other reasons, but it's OK
to have "a string whose length is MOST_POSITIVE_FIXNUM", i.e.,
(= (length STRING) most-positive-fixnum), because we already check
buffer sizes before inserting strings.  If you
count the trailing byte, the length of the underlying C character
array would be MOST_POSITIVE_FIXNUM + 1.

I'll CC: this to emacs-devel just in case I misinterpreted that.




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8545; Package emacs. (Thu, 05 May 2011 20:37:02 GMT) Full text and rfc822 format available.

Message #135 received at 8545 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>, Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 8545 <at> debbugs.gnu.org, emacs-devel <at> gnu.org
Subject: Re: issues with recent doprnt-related changes
Date: Thu, 05 May 2011 23:36:49 +0300
> Date: Wed, 04 May 2011 07:56:43 -0700
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> CC: 8545 <at> debbugs.gnu.org, Emacs Development <emacs-devel <at> gnu.org>
> 
> On 05/04/11 02:52, Eli Zaretskii wrote:
> >> It seems from that discussion that strings can contain MOST_POSITIVE_FIXNUM bytes.
> > I think the conclusion was that it can contain MOST_POSITIVE_FIXNUM
> > _including_the_terminating_null_.
> 
> Hmm, that's not how I read
> <http://lists.gnu.org/archive/html/emacs-devel/2011-04/msg00923.html>.
> 
> I understood the argument to be that a buffer must contain at most
> MOST_POSITIVE_FIXNUM - 1 bytes due to other reasons, but it's OK
> to have "a string whose length is MOST_POSITIVE_FIXNUM", i.e.,
> (= (length STRING) most-positive-fixnum), because we already check
> buffer sizes before inserting strings.  If you
> count the trailing byte, the length of the underlying C character
> array would be MOST_POSITIVE_FIXNUM + 1.

Stefan, could you please chime in?




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8545; Package emacs. (Fri, 06 May 2011 07:31:01 GMT) Full text and rfc822 format available.

Message #138 received at 8545 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: 8545 <at> debbugs.gnu.org, 8600-done <at> debbugs.gnu.org, 
	8601-done <at> debbugs.gnu.org, 8602-done <at> debbugs.gnu.org
Subject: Merged fixes for 8600, 8601, 8602, and (partially) for 8545
Date: Fri, 06 May 2011 00:29:56 -0700
I committed to the Emacs trunk a merge (bzr 104134) that has fixes for
the following bugs:

* Bug#8600 - The fix removes the garbage element of code_space.

* Bug#8601 - Here I assumed that the "* 2" is a typo.

* Bug#8602 - This fixes some large-int-to-float screwups in
             the Lisp reader.

* Bug#8545 - This fixes the bug where the code should have called
             va_copy, but didn't.  Also, I changed a limit so that
	     the MOST_POSITIVE_FIXNUM limit for strings applies to
	     their length, i.e., does not include the null termination
	     byte.  Stefan hasn't had time to chime in, but if this
             last change turns out to be incorrect I will back it out.

This merge doesn't entirely fix Bug#8545, so I'll leave that bug open;
the others I'll close.




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8545; Package emacs. (Fri, 06 May 2011 13:34:01 GMT) Full text and rfc822 format available.

Message #141 received at 8545 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 8545 <at> debbugs.gnu.org, Paul Eggert <eggert <at> cs.ucla.edu>, emacs-devel <at> gnu.org
Subject: Re: issues with recent doprnt-related changes
Date: Fri, 06 May 2011 10:33:21 -0300
>> >> It seems from that discussion that strings can contain
>> >> MOST_POSITIVE_FIXNUM bytes.
>> > I think the conclusion was that it can contain MOST_POSITIVE_FIXNUM
>> > _including_the_terminating_null_.
>> 
>> Hmm, that's not how I read
>> <http://lists.gnu.org/archive/html/emacs-devel/2011-04/msg00923.html>.
>> 
>> I understood the argument to be that a buffer must contain at most
>> MOST_POSITIVE_FIXNUM - 1 bytes due to other reasons, but it's OK
>> to have "a string whose length is MOST_POSITIVE_FIXNUM", i.e.,
>> (= (length STRING) most-positive-fixnum), because we already check
>> buffer sizes before inserting strings.  If you
>> count the trailing byte, the length of the underlying C character
>> array would be MOST_POSITIVE_FIXNUM + 1.

> Stefan, could you please chime in?

I'm not sure I understand the details more than you do.  AFAIK the
MOST_POSITIVE_FIXNUM limit only comes into play when the number goes
through Lisp_Object somewhere.  And we never show "length, including
terminating nul-byte" to Elisp, so the nul-byte shouldn't matter: while
the C array will have size MOST_POSITIVE_FIXNUM + 1, this will only be
represented in EMACS_INT which can accommodate it just fine.


        Stefan




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8545; Package emacs. (Fri, 06 May 2011 14:42:01 GMT) Full text and rfc822 format available.

Message #144 received at 8545 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 8545 <at> debbugs.gnu.org, emacs-devel <at> gnu.org
Subject: Re: issues with recent doprnt-related changes
Date: Fri, 06 May 2011 07:41:42 -0700
On 05/06/11 06:33, Stefan Monnier wrote:
> I'm not sure I understand the details more than you do.  AFAIK the
> MOST_POSITIVE_FIXNUM limit only comes into play when the number goes
> through Lisp_Object somewhere.  And we never show "length, including
> terminating nul-byte" to Elisp, so the nul-byte shouldn't matter: while
> the C array will have size MOST_POSITIVE_FIXNUM + 1, this will only be
> represented in EMACS_INT which can accommodate it just fine.

First, you say you're not sure you understand the details;
then, you explain the situation perfectly!  :-)

Thanks.




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8545; Package emacs. (Fri, 06 May 2011 15:05:01 GMT) Full text and rfc822 format available.

Message #147 received at 8545 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 8545 <at> debbugs.gnu.org, eggert <at> cs.ucla.edu, emacs-devel <at> gnu.org
Subject: Re: issues with recent doprnt-related changes
Date: Fri, 06 May 2011 18:03:46 +0300
> From: Stefan Monnier <monnier <at> iro.umontreal.ca>
> Cc: Paul Eggert <eggert <at> cs.ucla.edu>,  8545 <at> debbugs.gnu.org,  emacs-devel <at> gnu.org
> Date: Fri, 06 May 2011 10:33:21 -0300
> 
> And we never show "length, including
> terminating nul-byte" to Elisp, so the nul-byte shouldn't matter

If you are sure of that, then I agree it's fine to use
MOST_POSITIVE_FIXNUM+1.  I wasn't sure.  I do see in C variables that
explicitly use the string position of the null byte; I was afraid that
they could somehow leak to Lisp.




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8545; Package emacs. (Fri, 06 May 2011 17:14:01 GMT) Full text and rfc822 format available.

Message #150 received at 8545 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 8545 <at> debbugs.gnu.org, eggert <at> cs.ucla.edu, emacs-devel <at> gnu.org
Subject: Re: issues with recent doprnt-related changes
Date: Fri, 06 May 2011 14:13:06 -0300
>> And we never show "length, including
>> terminating nul-byte" to Elisp, so the nul-byte shouldn't matter

> If you are sure of that, then I agree it's fine to use
> MOST_POSITIVE_FIXNUM+1.  I wasn't sure.  I do see in C variables that
> explicitly use the string position of the null byte; I was afraid that
> they could somehow leak to Lisp.

I don't know enough to guarantee you that they can never leak to Lisp,
but I can't think of any reason why they should.  And note that "the
position of the nul byte" is the same as "the length of the list", so
it's still <= MOST_POSITIVE_FIXNUM.  It's only the position after the
nul byte that would overflow.


        Stefan




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8545; Package emacs. (Fri, 06 May 2011 19:58:02 GMT) Full text and rfc822 format available.

Message #153 received at 8545 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 8545 <at> debbugs.gnu.org, eggert <at> cs.ucla.edu, emacs-devel <at> gnu.org
Subject: Re: issues with recent doprnt-related changes
Date: Fri, 06 May 2011 22:57:30 +0300
> From: Stefan Monnier <monnier <at> iro.umontreal.ca>
> Cc: eggert <at> cs.ucla.edu,  8545 <at> debbugs.gnu.org,  emacs-devel <at> gnu.org
> Date: Fri, 06 May 2011 14:13:06 -0300
> 
> note that "the position of the nul byte" is the same as "the length
> of the list", so it's still <= MOST_POSITIVE_FIXNUM.  It's only the
> position after the nul byte that would overflow.

But what about this code and its commentary (from
next_element_from_c_string):

  /* IT's position can be greater IT->string_nchars in case a field
     width or precision has been specified when the iterator was
     initialized.  */
  if (IT_CHARPOS (*it) >= it->end_charpos)
    {
      /* End of the game.  */
      ...

This happens when the iterator is initialized by reseat_to_string.
Admittedly, it's not very practical to have such huge strings that are
padded to an even larger width...




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8545; Package emacs. (Sat, 07 May 2011 03:19:02 GMT) Full text and rfc822 format available.

Message #156 received at 8545 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 8545 <at> debbugs.gnu.org, eggert <at> cs.ucla.edu, emacs-devel <at> gnu.org
Subject: Re: issues with recent doprnt-related changes
Date: Sat, 07 May 2011 00:18:47 -0300
>> note that "the position of the nul byte" is the same as "the length
>> of the list", so it's still <= MOST_POSITIVE_FIXNUM.  It's only the
>> position after the nul byte that would overflow.

> But what about this code and its commentary (from
> next_element_from_c_string):

>   /* IT's position can be greater IT->string_nchars in case a field
>      width or precision has been specified when the iterator was
>      initialized.  */
>   if (IT_CHARPOS (*it) >= it->end_charpos)
>     {
>       /* End of the game.  */
>       ...

Do these ever make it into a Lisp_Object?


        Stefan




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#8545; Package emacs. (Sat, 07 May 2011 07:57:02 GMT) Full text and rfc822 format available.

Message #159 received at 8545 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 8545 <at> debbugs.gnu.org, eggert <at> cs.ucla.edu, emacs-devel <at> gnu.org
Subject: Re: issues with recent doprnt-related changes
Date: Sat, 07 May 2011 10:55:52 +0300
> From: Stefan Monnier <monnier <at> iro.umontreal.ca>
> Cc: eggert <at> cs.ucla.edu,  8545 <at> debbugs.gnu.org,  emacs-devel <at> gnu.org
> Date: Sat, 07 May 2011 00:18:47 -0300
> 
> >> note that "the position of the nul byte" is the same as "the length
> >> of the list", so it's still <= MOST_POSITIVE_FIXNUM.  It's only the
> >> position after the nul byte that would overflow.
> 
> > But what about this code and its commentary (from
> > next_element_from_c_string):
> 
> >   /* IT's position can be greater IT->string_nchars in case a field
> >      width or precision has been specified when the iterator was
> >      initialized.  */
> >   if (IT_CHARPOS (*it) >= it->end_charpos)
> >     {
> >       /* End of the game.  */
> >       ...
> 
> Do these ever make it into a Lisp_Object?

Well, the resulting string can be returned by format-mode-line, for
example.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#8545; Package emacs. (Mon, 14 Sep 2020 12:38:01 GMT) Full text and rfc822 format available.

Message #162 received at 8545 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 8545 <at> debbugs.gnu.org
Subject: Re: bug#8545: Merged fixes for 8600, 8601, 8602, and (partially)
 for 8545
Date: Mon, 14 Sep 2020 14:37:44 +0200
Paul Eggert <eggert <at> cs.ucla.edu> writes:

> * Bug#8545 - This fixes the bug where the code should have called
>              va_copy, but didn't.  Also, I changed a limit so that
> 	     the MOST_POSITIVE_FIXNUM limit for strings applies to
> 	     their length, i.e., does not include the null termination
> 	     byte.  Stefan hasn't had time to chime in, but if this
>              last change turns out to be incorrect I will back it out.
>
> This merge doesn't entirely fix Bug#8545, so I'll leave that bug open;
> the others I'll close.

This was a quite long thread, and had a bunch of related bugs discussed.
Skimming this thread, I'm not sure exactly what the remaining problem in
this bug report is -- the discussion kinda petered out...

This is nine years old -- is there anything more to be done here?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Reply sent to Eli Zaretskii <eliz <at> gnu.org>:
You have taken responsibility. (Mon, 14 Sep 2020 18:42:02 GMT) Full text and rfc822 format available.

Notification sent to Paul Eggert <eggert <at> cs.ucla.edu>:
bug acknowledged by developer. (Mon, 14 Sep 2020 18:42:02 GMT) Full text and rfc822 format available.

Message #167 received at 8545-done <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: eggert <at> cs.ucla.edu, 8545-done <at> debbugs.gnu.org
Subject: Re: bug#8545: Merged fixes for 8600, 8601, 8602,
 and (partially) for 8545
Date: Mon, 14 Sep 2020 21:41:56 +0300
> From: Lars Ingebrigtsen <larsi <at> gnus.org>
> Date: Mon, 14 Sep 2020 14:37:44 +0200
> Cc: 8545 <at> debbugs.gnu.org
> 
> This was a quite long thread, and had a bunch of related bugs discussed.
> Skimming this thread, I'm not sure exactly what the remaining problem in
> this bug report is -- the discussion kinda petered out...
> 
> This is nine years old -- is there anything more to be done here?

I don't think so, so I'm closing this bug.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#8545; Package emacs. (Wed, 16 Sep 2020 02:02:01 GMT) Full text and rfc822 format available.

Message #170 received at 8545 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 8545 <at> debbugs.gnu.org
Subject: Re: bug#8545: Merged fixes for 8600, 8601, 8602, and (partially) for
 8545
Date: Tue, 15 Sep 2020 19:01:39 -0700
Several issues from Bug#8545 still remain but its commentary had gotten so long 
(and our collective memory so hazy :-) that they were hard to see, so I started 
a new bug report (Bug#43439) with a proposed patch that I hope clarifies and 
addresses most of the issues.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 14 Oct 2020 11:24:07 GMT) Full text and rfc822 format available.

This bug report was last modified 3 years and 205 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.