GNU bug report logs - #43439
[PATCH] doprnt improvements

Package: emacs;

Reported by: Paul Eggert <eggert <at> cs.ucla.edu>

Date: Wed, 16 Sep 2020 01:52:01 UTC

Severity: normal

Tags: patch

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 43439 in the body.
You can then email your comments to 43439 AT debbugs.gnu.org in the normal way.

Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#43439; Package emacs. (Wed, 16 Sep 2020 01:52:02 GMT)

Acknowledgement sent to Paul Eggert <eggert <at> cs.ucla.edu>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Wed, 16 Sep 2020 01:52:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: bug-gnu-emacs <at> gnu.org
Cc: Paul Eggert <eggert <at> cs.ucla.edu>
Subject: [PATCH] doprnt improvements
Date: Tue, 15 Sep 2020 18:50:51 -0700

Improve doprnt performance, internal checking, and internal
documentation.  On my platform (Ubuntu 18.04.5 x86-64), this
improved CPU speed of ‘make -C lisp compile-always’ by 6%.
This patch implements some of my suggestions in Bug#8545,
with further changes suggested by Eli Zaretskii.
* src/doprnt.c: Improve comments.
(SIZE_BOUND_EXTRA): Now at top level, for parse_format_integer.
(parse_format_integer): New static function, containing some of
the old doprnt.  Fix a bug that caused doprnt to infloop on
formats like "%10s" that Emacs does not use.  We could simplify
doprnt further if we dropped support for these never-used formats.
(doprnt): Omit FORMAT_END argument, since it’s always NULL,
which means doprnt must call strlen on FORMAT; doing this means
doprnt needs just one pass over FORMAT, not two.  All callers changed.
Assume C99 to make code clearer.  Do not use malloc or alloca
to allocate a copy of the format FMTCPY; instead, use a small
fixed-size array FMTSTAR, and use '*' in that array to represent
width and precision, passing them as separate int arguments.
Use eassume to pacify GCC in switch statements.  Drop support for
"%S" which is never used and which would cause GCC to warn anyway.
* src/lisp.h (doprnt): Give it ATTRIBUTE_FORMAT_PRINTF (3, 0),
since GCC can grok doprnt's new API.
---
 src/doprnt.c | 223 ++++++++++++++++++++++++++-------------------------
 src/lisp.h   |   4 +-
 src/sysdep.c |   2 +-
 src/xdisp.c  |   5 +-
 4 files changed, 117 insertions(+), 117 deletions(-)

diff --git a/src/doprnt.c b/src/doprnt.c
index b0ba12552b..f154578c0d 100644
--- a/src/doprnt.c
+++ b/src/doprnt.c
@@ -28,6 +28,7 @@
    . For %s and %c, when field width is specified (e.g., %25s), it accounts for
      the display width of each character, according to char-width-table.  That
      is, it does not assume that each character takes one column on display.
+     Nor does it assume that each character is a single byte.
 
    . If the size of the buffer is not enough to produce the formatted string in
      its entirety, it makes sure that truncation does not chop the last
@@ -42,38 +43,41 @@
      Emacs can handle.
 
    OTOH, this function supports only a small subset of the standard C formatted
-   output facilities.  E.g., %u and %ll are not supported, and precision is
-   ignored %s and %c conversions.  (See below for the detailed documentation of
-   what is supported.)  However, this is okay, as this function is supposed to
-   be called from `error' and similar functions, and thus does not need to
-   support features beyond those in `Fformat_message', which is used
-   by `error' on the Lisp level.  */
+   output facilities.  E.g., %u is not supported, precision is ignored
+   in %s and %c conversions, and although %lld often works it is not
+   supported and code should use something like %"pM"d with intmax_t instead.
+   (See below for the detailed documentation of what is supported.)
+   However, this is okay, as this function is supposed to be called
+   from 'error' and similar C functions, and thus does not need to
+   support all the features of 'Fformat_message', which is used by the
+   Lisp 'error' function.  */
 
 /* In the FORMAT argument this function supports ` and ' as directives
    that output left and right quotes as per ‘text-quoting style’.  It
    also supports the following %-sequences:
 
    %s means print a string argument.
-   %S is treated as %s, for loose compatibility with `Fformat_message'.
    %d means print a `signed int' argument in decimal.
    %o means print an `unsigned int' argument in octal.
    %x means print an `unsigned int' argument in hex.
    %e means print a `double' argument in exponential notation.
    %f means print a `double' argument in decimal-point notation.
    %g means print a `double' argument in exponential notation
-      or in decimal-point notation, whichever uses fewer characters.
+      or in decimal-point notation, depending on the value;
+      this is often (though not always) the shorter of the two notations
    %c means print a `signed int' argument as a single character.
    %% means produce a literal % character.
 
-   A %-sequence may contain optional flag, width, and precision specifiers, and
-   a length modifier, as follows:
+   A %-sequence other than %% may contain optional flags, width, precision,
+   and length, as follows:
 
      %<flags><width><precision><length>character
 
    where flags is [+ -0], width is [0-9]+, precision is .[0-9]+, and length
    is empty or l or the value of the pD or pI or PRIdMAX (sans "d") macros.
-   Also, %% in a format stands for a single % in the output.  A % that
-   does not introduce a valid %-sequence causes undefined behavior.
+   A % that does not introduce a valid %-sequence causes undefined behavior.
+   ASCII bytes in FORMAT other than % are copied through as-is;
+   non-ASCII bytes should not appear in FORMAT.
 
    The + flag character inserts a + before any positive number, while a space
    inserts a space before any positive number; these flags only affect %d, %o,
@@ -99,7 +103,9 @@
 
    For %e, %f, and %g sequences, the number after the "." in the precision
    specifier says how many decimal places to show; if zero, the decimal point
-   itself is omitted.  For %s and %S, the precision specifier is ignored.  */
+   itself is omitted.  For %d, %o, and %x sequences, the precision specifies
+   the minimum number of digits to appear.  Precision specifiers are
+   not supported for other %-sequences.  */
 
 #include <config.h>
 #include <stdio.h>
@@ -115,9 +121,29 @@
    another macro.  */
 #include "character.h"
 
-/* Generate output from a format-spec FORMAT,
-   terminated at position FORMAT_END.
-   (*FORMAT_END is not part of the format, but must exist and be readable.)
+/* Enough to handle floating point formats with large numbers.  */
+enum { SIZE_BOUND_EXTRA = DBL_MAX_10_EXP + 50 };
+
+/* Parse FMT as an unsigned decimal integer, putting its value into *VALUE.
+   Return the address of the first byte after the integer.
+   If FMT is not an integer, return FMT and store zero into *VALUE.  */
+static char const *
+parse_format_integer (char const *fmt, int *value)
+{
+  int n = 0;
+  bool overflow = false;
+  for (; '0' <= *fmt && *fmt <= '9'; fmt++)
+    {
+      overflow |= INT_MULTIPLY_WRAPV (n, 10, &n);
+      overflow |= INT_ADD_WRAPV (n, *fmt - '0', &n);
+    }
+  if (overflow || min (PTRDIFF_MAX, SIZE_MAX) - SIZE_BOUND_EXTRA < n)
+    error ("Format width or precision too large");
+  *value = n;
+  return fmt;
+}
+
+/* Generate output from a format-spec FORMAT.
    Output goes in BUFFER, which has room for BUFSIZE chars.
    BUFSIZE must be positive.  If the output does not fit, truncate it
    to fit and return BUFSIZE - 1; if this truncates a multibyte
@@ -128,15 +154,11 @@
    Integers are passed as C integers.  */
 
 ptrdiff_t
-doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
-	const char *format_end, va_list ap)
+doprnt (char *buffer, ptrdiff_t bufsize, const char *format, va_list ap)
 {
   const char *fmt = format;	/* Pointer into format string.  */
   char *bufptr = buffer;	/* Pointer into output buffer.  */
 
-  /* Enough to handle floating point formats with large numbers.  */
-  enum { SIZE_BOUND_EXTRA = DBL_MAX_10_EXP + 50 };
-
   /* Use this for sprintf unless we need something really big.  */
   char tembuf[SIZE_BOUND_EXTRA + 50];
 
@@ -150,103 +172,91 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
   char *big_buffer = NULL;
 
   enum text_quoting_style quoting_style = text_quoting_style ();
-  ptrdiff_t tem = -1;
-  char *string;
-  char fixed_buffer[20];	/* Default buffer for small formatting. */
-  char *fmtcpy;
-  int minlen;
-  char charbuf[MAX_MULTIBYTE_LENGTH + 1];	/* Used for %c.  */
-  USE_SAFE_ALLOCA;
-
-  if (format_end == 0)
-    format_end = format + strlen (format);
-
-  fmtcpy = (format_end - format < sizeof (fixed_buffer) - 1
-	    ? fixed_buffer
-	    : SAFE_ALLOCA (format_end - format + 1));
 
   bufsize--;
 
   /* Loop until end of format string or buffer full. */
-  while (fmt < format_end && bufsize > 0)
+  while (*fmt && bufsize > 0)
     {
       char const *fmt0 = fmt;
       char fmtchar = *fmt++;
       if (fmtchar == '%')
 	{
-	  ptrdiff_t size_bound = 0;
 	  ptrdiff_t width;  /* Columns occupied by STRING on display.  */
 	  enum {
 	    pDlen = sizeof pD - 1,
 	    pIlen = sizeof pI - 1,
-	    pMlen = sizeof PRIdMAX - 2
+	    pMlen = sizeof PRIdMAX - 2,
+	    maxmlen = max (max (1, pDlen), max (pIlen, pMlen))
 	  };
 	  enum {
 	    no_modifier, long_modifier, pD_modifier, pI_modifier, pM_modifier
 	  } length_modifier = no_modifier;
 	  static char const modifier_len[] = { 0, 1, pDlen, pIlen, pMlen };
-	  int maxmlen = max (max (1, pDlen), max (pIlen, pMlen));
 	  int mlen;
+	  char charbuf[MAX_MULTIBYTE_LENGTH + 1];	/* Used for %c.  */
 
-	  /* Copy this one %-spec into fmtcpy.  */
-	  string = fmtcpy;
+	  /* Width and precision specified by this %-sequence.  */
+	  int wid = 0, prec = -1;
+
+	  /* FMTSTAR will be a "%*.*X"-like version of this %-sequence.
+	     Start by putting '%' into FMTSTAR.  */
+	  char fmtstar[sizeof "%-+ 0*.*d" + maxmlen];
+	  char *string = fmtstar;
 	  *string++ = '%';
-	  while (fmt < format_end)
+
+	  /* Copy at most one instance of each flag into FMTSTAR.  */
+	  bool minusflag = false, plusflag = false, zeroflag = false,
+	    spaceflag = false;
+	  for (;; fmt++)
 	    {
-	      *string++ = *fmt;
-	      if ('0' <= *fmt && *fmt <= '9')
+	      *string = *fmt;
+	      switch (*fmt)
 		{
-		  /* Get an idea of how much space we might need.
-		     This might be a field width or a precision; e.g.
-		     %1.1000f and %1000.1f both might need 1000+ bytes.
-		     Parse the width or precision, checking for overflow.  */
-		  int n = *fmt - '0';
-		  bool overflow = false;
-		  while (fmt + 1 < format_end
-			 && '0' <= fmt[1] && fmt[1] <= '9')
-		    {
-		      overflow |= INT_MULTIPLY_WRAPV (n, 10, &n);
-		      overflow |= INT_ADD_WRAPV (n, fmt[1] - '0', &n);
-		      *string++ = *++fmt;
-		    }
-
-		  if (overflow
-		      || min (PTRDIFF_MAX, SIZE_MAX) - SIZE_BOUND_EXTRA < n)
-		    error ("Format width or precision too large");
-		  if (size_bound < n)
-		    size_bound = n;
+		case '-': string += !minusflag; minusflag = true; continue;
+		case '+': string += !plusflag; plusflag = true; continue;
+		case ' ': string += !spaceflag; spaceflag = true; continue;
+		case '0': string += !zeroflag; zeroflag = true; continue;
 		}
-	      else if (! (*fmt == '-' || *fmt == ' ' || *fmt == '.'
-			  || *fmt == '+'))
-		break;
-	      fmt++;
+	      break;
 	    }
 
+	  /* Parse width and precision, putting "*.*" into FMTSTAR.  */
+	  if ('1' <= *fmt && *fmt <= '9')
+	    fmt = parse_format_integer (fmt, &wid);
+	  if (*fmt == '.')
+	    fmt = parse_format_integer (fmt + 1, &prec);
+	  *string++ = '*';
+	  *string++ = '.';
+	  *string++ = '*';
+
 	  /* Check for the length modifiers in textual length order, so
 	     that longer modifiers override shorter ones.  */
 	  for (mlen = 1; mlen <= maxmlen; mlen++)
 	    {
-	      if (format_end - fmt < mlen)
-		break;
 	      if (mlen == 1 && *fmt == 'l')
 		length_modifier = long_modifier;
-	      if (mlen == pDlen && memcmp (fmt, pD, pDlen) == 0)
+	      if (mlen == pDlen && strncmp (fmt, pD, pDlen) == 0)
 		length_modifier = pD_modifier;
-	      if (mlen == pIlen && memcmp (fmt, pI, pIlen) == 0)
+	      if (mlen == pIlen && strncmp (fmt, pI, pIlen) == 0)
 		length_modifier = pI_modifier;
-	      if (mlen == pMlen && memcmp (fmt, PRIdMAX, pMlen) == 0)
+	      if (mlen == pMlen && strncmp (fmt, PRIdMAX, pMlen) == 0)
 		length_modifier = pM_modifier;
 	    }
 
+	  /* Copy optional length modifier and conversion specifier
+	     character into FMTSTAR, and append a NUL.  */
 	  mlen = modifier_len[length_modifier];
-	  memcpy (string, fmt + 1, mlen);
-	  string += mlen;
+	  string = mempcpy (string, fmt, mlen + 1);
 	  fmt += mlen;
 	  *string = 0;
 
-	  /* Make the size bound large enough to handle floating point formats
+	  /* An idea of how much space we might need.
+	     This might be a field width or a precision; e.g.
+	     %1.1000f and %1000.1f both might need 1000+ bytes.
+	     Make it large enough to handle floating point formats
 	     with large numbers.  */
-	  size_bound += SIZE_BOUND_EXTRA;
+	  ptrdiff_t size_bound = max (wid, prec) + SIZE_BOUND_EXTRA;
 
 	  /* Make sure we have that much.  */
 	  if (size_bound > size_allocated)
@@ -257,48 +267,49 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
 	      sprintf_buffer = big_buffer;
 	      size_allocated = size_bound;
 	    }
-	  minlen = 0;
+	  int minlen = 0;
+	  ptrdiff_t tem;
 	  switch (*fmt++)
 	    {
 	    default:
-	      error ("Invalid format operation %s", fmtcpy);
+	      error ("Invalid format operation %s", fmt0);
 
-/*	    case 'b': */
-	    case 'l':
 	    case 'd':
 	      switch (length_modifier)
 		{
 		case no_modifier:
 		  {
 		    int v = va_arg (ap, int);
-		    tem = sprintf (sprintf_buffer, fmtcpy, v);
+		    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
 		  }
 		  break;
 		case long_modifier:
 		  {
 		    long v = va_arg (ap, long);
-		    tem = sprintf (sprintf_buffer, fmtcpy, v);
+		    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
 		  }
 		  break;
 		case pD_modifier:
 		signed_pD_modifier:
 		  {
 		    ptrdiff_t v = va_arg (ap, ptrdiff_t);
-		    tem = sprintf (sprintf_buffer, fmtcpy, v);
+		    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
 		  }
 		  break;
 		case pI_modifier:
 		  {
 		    EMACS_INT v = va_arg (ap, EMACS_INT);
-		    tem = sprintf (sprintf_buffer, fmtcpy, v);
+		    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
 		  }
 		  break;
 		case pM_modifier:
 		  {
 		    intmax_t v = va_arg (ap, intmax_t);
-		    tem = sprintf (sprintf_buffer, fmtcpy, v);
+		    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
 		  }
 		  break;
+		default:
+		  eassume (false);
 		}
 	      /* Now copy into final output, truncating as necessary.  */
 	      string = sprintf_buffer;
@@ -311,13 +322,13 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
 		case no_modifier:
 		  {
 		    unsigned v = va_arg (ap, unsigned);
-		    tem = sprintf (sprintf_buffer, fmtcpy, v);
+		    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
 		  }
 		  break;
 		case long_modifier:
 		  {
 		    unsigned long v = va_arg (ap, unsigned long);
-		    tem = sprintf (sprintf_buffer, fmtcpy, v);
+		    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
 		  }
 		  break;
 		case pD_modifier:
@@ -325,15 +336,17 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
 		case pI_modifier:
 		  {
 		    EMACS_UINT v = va_arg (ap, EMACS_UINT);
-		    tem = sprintf (sprintf_buffer, fmtcpy, v);
+		    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
 		  }
 		  break;
 		case pM_modifier:
 		  {
 		    uintmax_t v = va_arg (ap, uintmax_t);
-		    tem = sprintf (sprintf_buffer, fmtcpy, v);
+		    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
 		  }
 		  break;
+		default:
+		  eassume (false);
 		}
 	      /* Now copy into final output, truncating as necessary.  */
 	      string = sprintf_buffer;
@@ -344,22 +357,18 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
 	    case 'g':
 	      {
 		double d = va_arg (ap, double);
-		tem = sprintf (sprintf_buffer, fmtcpy, d);
+		tem = sprintf (sprintf_buffer, fmtstar, wid, prec, d);
 		/* Now copy into final output, truncating as necessary.  */
 		string = sprintf_buffer;
 		goto doit;
 	      }
 
-	    case 'S':
-	      string[-1] = 's';
-	      FALLTHROUGH;
 	    case 's':
-	      if (fmtcpy[1] != 's')
-		minlen = atoi (&fmtcpy[1]);
+	      minlen = minusflag ? -wid : wid;
 	      string = va_arg (ap, char *);
 	      tem = strnlen (string, STRING_BYTES_BOUND + 1);
 	      if (tem == STRING_BYTES_BOUND + 1)
-		error ("String for %%s or %%S format is too long");
+		error ("String for %%s format is too long");
 	      width = strwidth (string, tem);
 	      goto doit1;
 
@@ -432,14 +441,12 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
 		string = charbuf;
 		string[tem] = 0;
 		width = strwidth (string, tem);
-		if (fmtcpy[1] != 'c')
-		  minlen = atoi (&fmtcpy[1]);
+		minlen = minusflag ? -wid : wid;
 		goto doit1;
 	      }
 
 	    case '%':
 	      /* Treat this '%' as normal.  */
-	      fmt0 = fmt - 1;
 	      break;
 	    }
 	}
@@ -450,13 +457,13 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
 	src = uLSQM, srclen = sizeof uLSQM - 1;
       else if (quoting_style == CURVE_QUOTING_STYLE && fmtchar == '\'')
 	src = uRSQM, srclen = sizeof uRSQM - 1;
-      else if (quoting_style == STRAIGHT_QUOTING_STYLE && fmtchar == '`')
-	src = "'", srclen = 1;
       else
 	{
-	  while (fmt < format_end && !CHAR_HEAD_P (*fmt))
-	    fmt++;
-	  src = fmt0, srclen = fmt - fmt0;
+	  if (quoting_style == STRAIGHT_QUOTING_STYLE && fmtchar == '`')
+	    fmtchar = '\'';
+	  eassert (ASCII_CHAR_P (fmtchar));
+	  *bufptr++ = fmtchar;
+	  continue;
 	}
 
       if (bufsize < srclen)
@@ -479,8 +486,6 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
   xfree (big_buffer);
 
   *bufptr = 0;		/* Make sure our string ends with a '\0' */
-
-  SAFE_FREE ();
   return bufptr - buffer;
 }
 
@@ -495,10 +500,9 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
 ptrdiff_t
 esprintf (char *buf, char const *format, ...)
 {
-  ptrdiff_t nbytes;
   va_list ap;
   va_start (ap, format);
-  nbytes = doprnt (buf, TYPE_MAXIMUM (ptrdiff_t), format, 0, ap);
+  ptrdiff_t nbytes = doprnt (buf, TYPE_MAXIMUM (ptrdiff_t), format, ap);
   va_end (ap);
   return nbytes;
 }
@@ -534,10 +538,9 @@ evxprintf (char **buf, ptrdiff_t *bufsize,
 {
   for (;;)
     {
-      ptrdiff_t nbytes;
       va_list ap_copy;
       va_copy (ap_copy, ap);
-      nbytes = doprnt (*buf, *bufsize, format, 0, ap_copy);
+      ptrdiff_t nbytes = doprnt (*buf, *bufsize, format, ap_copy);
       va_end (ap_copy);
       if (nbytes < *bufsize - 1)
 	return nbytes;
diff --git a/src/lisp.h b/src/lisp.h
index a24898004d..957ca41702 100644
--- a/src/lisp.h
+++ b/src/lisp.h
@@ -4034,8 +4034,8 @@ #define FLOAT_TO_STRING_BUFSIZE 350
 extern void syms_of_print (void);
 
 /* Defined in doprnt.c.  */
-extern ptrdiff_t doprnt (char *, ptrdiff_t, const char *, const char *,
-			 va_list);
+extern ptrdiff_t doprnt (char *, ptrdiff_t, const char *, va_list)
+  ATTRIBUTE_FORMAT_PRINTF (3, 0);
 extern ptrdiff_t esprintf (char *, char const *, ...)
   ATTRIBUTE_FORMAT_PRINTF (2, 3);
 extern ptrdiff_t exprintf (char **, ptrdiff_t *, char const *, ptrdiff_t,
diff --git a/src/sysdep.c b/src/sysdep.c
index e161172a79..790ae084d3 100644
--- a/src/sysdep.c
+++ b/src/sysdep.c
@@ -2192,7 +2192,7 @@ snprintf (char *buf, size_t bufsize, char const *format, ...)
   if (size)
     {
       va_start (ap, format);
-      nbytes = doprnt (buf, size, format, 0, ap);
+      nbytes = doprnt (buf, size, format, ap);
       va_end (ap);
     }
 
diff --git a/src/xdisp.c b/src/xdisp.c
index 615f0ca7cf..213c2a464a 100644
--- a/src/xdisp.c
+++ b/src/xdisp.c
@@ -11269,13 +11269,10 @@ vmessage (const char *m, va_list ap)
 	{
 	  if (m)
 	    {
-	      ptrdiff_t len;
 	      ptrdiff_t maxsize = FRAME_MESSAGE_BUF_SIZE (f);
 	      USE_SAFE_ALLOCA;
 	      char *message_buf = SAFE_ALLOCA (maxsize + 1);
-
-	      len = doprnt (message_buf, maxsize, m, 0, ap);
-
+	      ptrdiff_t len = doprnt (message_buf, maxsize, m, ap);
 	      message3 (make_string (message_buf, len));
 	      SAFE_FREE ();
 	    }
-- 
2.17.1
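The heart of the patch is that each %-sequence is rewritten into a small fixed-size "%*.*X"-style buffer (FMTSTAR), with width and precision passed to sprintf as separate int arguments instead of being left inside a malloc'd or alloca'd copy of the format. Below is a minimal standalone sketch of that idea, restricted to a lone %d with an int argument. The names parse_int and format_int are hypothetical; unlike the patch's parse_format_integer, this sketch reports overflow by returning NULL rather than calling Emacs's error function, and it omits the SIZE_BOUND_EXTRA check.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

/* Parse an unsigned decimal integer at FMT, storing it in *VALUE.
   Return the address just past the digits, or NULL if the value
   would exceed INT_MAX.  (Hypothetical stand-in for the patch's
   parse_format_integer, which calls error () instead.)  */
static char const *
parse_int (char const *fmt, int *value)
{
  int n = 0;
  for (; '0' <= *fmt && *fmt <= '9'; fmt++)
    {
      int digit = *fmt - '0';
      if (n > (INT_MAX - digit) / 10)
        return NULL;                    /* n * 10 + digit would overflow.  */
      n = n * 10 + digit;
    }
  *value = n;
  return fmt;
}

/* Format V according to SPEC, which must start with '%' and look like
   "%<flags><width>.<precision>d", by rewriting SPEC into a
   "%<flags>*.*d" string and passing width and precision as ints.
   Return snprintf's result, or -1 if SPEC is unsupported or too large.  */
static int
format_int (char *buf, size_t size, char const *spec, int v)
{
  char fmtstar[sizeof "%-+ 0*.*d"];
  char *s = fmtstar;
  char const *fmt = spec + 1;           /* Skip the leading '%'.  */
  *s++ = '%';

  /* Copy at most one instance of each flag, as the patch does.
     A repeated flag overwrites itself because S is not advanced.  */
  bool minus = false, plus = false, space = false, zero = false;
  for (;; fmt++)
    {
      *s = *fmt;
      switch (*fmt)
        {
        case '-': s += !minus; minus = true; continue;
        case '+': s += !plus; plus = true; continue;
        case ' ': s += !space; space = true; continue;
        case '0': s += !zero; zero = true; continue;
        }
      break;
    }

  /* Width defaults to 0 (no padding); precision defaults to -1,
     which printf treats as if no precision had been given.  */
  int wid = 0, prec = -1;
  if ('1' <= *fmt && *fmt <= '9' && ! (fmt = parse_int (fmt, &wid)))
    return -1;
  if (*fmt == '.' && ! (fmt = parse_int (fmt + 1, &prec)))
    return -1;
  if (fmt[0] != 'd' || fmt[1] != '\0')
    return -1;                          /* Only a lone %d spec is handled.  */

  *s++ = '*';
  *s++ = '.';
  *s++ = '*';
  *s++ = 'd';
  *s = '\0';
  return snprintf (buf, size, fmtstar, wid, prec, v);
}
```

For example, "%5.3d" becomes "%*.*d" with wid = 5 and prec = 3, producing "  007", and "%--+d" collapses to "%-+*.*d". Because the rewritten string never holds more than one of each flag, "*.*", and a conversion character, a fixed-size array is always big enough.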

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#43439; Package emacs. (Wed, 16 Sep 2020 14:59:01 GMT)

Message #8 received at 43439 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 43439 <at> debbugs.gnu.org
Subject: Re: bug#43439: [PATCH] doprnt improvements
Date: Wed, 16 Sep 2020 17:58:24 +0300

> From: Paul Eggert <eggert <at> cs.ucla.edu>
> Date: Tue, 15 Sep 2020 18:50:51 -0700
> Cc: Paul Eggert <eggert <at> cs.ucla.edu>
> 
> (doprnt): Omit FORMAT_END argument, since it’s always NULL,
> which means doprnt must call strlen on FORMAT; doing this means
> doprnt needs just one pass over FORMAT, not two.  All callers changed.

This loses a feature.  Emacs traditionally supports strings with
embedded null characters, and this feature is in line with that.  It
is true that it is currently unused, but why is it a good idea to
remove it?

If the problem is the slight inefficiency caused by the call to
strlen, we could instead solve it in the callers: all the formats I've
seen are const strings, so the value of FORMAT_END can be computed at
compile time, and used instead of passing NULL.

> Drop support for
> "%S" which is never used and which would cause GCC to warn anyway.

This is an old compatibility feature, I'd rather not drop it.  Who
knows what code relies on the fact that 'message' and 'format-message'
support it?

Thanks.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#43439; Package emacs. (Wed, 16 Sep 2020 22:10:01 GMT)

Message #11 received at 43439 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 43439 <at> debbugs.gnu.org
Subject: Re: bug#43439: [PATCH] doprnt improvements
Date: Wed, 16 Sep 2020 15:09:50 -0700

On 9/16/20 7:58 AM, Eli Zaretskii wrote:

> Emacs traditionally supports strings with
> embedded null characters, and this feature is in line with that.  It
> is true that it is currently unused, but why is it a good idea to
> remove it?

It's a good idea not only because the feature is currently unused and its 
support complicates and adds bugs to the code, but also because it would be a 
bad idea to ever use the feature.

The Emacs feature is for Lisp strings. Emacs does not (and for API reasons, it 
cannot practically) rely on embedded NULs in C strings. Among other things, if 
we tried to use C-style printf formats with embedded NULs, GCC's warnings about 
formats not matching their arguments would stop working. These GCC warnings are 
quite useful for preventing bugs in Emacs's C code and have helped to catch many 
such bugs, and we should not give them up.
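The GCC checking Paul refers to comes from GCC's format function attribute. Below is a minimal sketch, assuming GCC or Clang, of how a va_list-taking formatter can be declared so that -Wformat checks its callers. In format (printf, 3, 0), the 3 names the format parameter and the 0 says the arguments arrive as a va_list; the variadic wrapper uses 4, its first variadic argument. Emacs's ATTRIBUTE_FORMAT_PRINTF (3, 0) presumably expands to something similar. The names my_vformat and my_format are hypothetical.

```c
#include <stdarg.h>
#include <stdio.h>

#if defined __GNUC__
# define FORMAT_PRINTF(fmt, first) \
    __attribute__ ((format (printf, fmt, first)))
#else
# define FORMAT_PRINTF(fmt, first)
#endif

/* Hypothetical doprnt-like function taking a va_list, so FIRST is 0.  */
static int my_vformat (char *buf, size_t size, char const *format,
                       va_list ap) FORMAT_PRINTF (3, 0);

/* Hypothetical variadic wrapper; arguments start at position 4.  */
static int my_format (char *buf, size_t size, char const *format, ...)
  FORMAT_PRINTF (3, 4);

static int
my_vformat (char *buf, size_t size, char const *format, va_list ap)
{
  return vsnprintf (buf, size, format, ap);
}

static int
my_format (char *buf, size_t size, char const *format, ...)
{
  va_list ap;
  va_start (ap, format);
  int n = my_vformat (buf, size, format, ap);
  va_end (ap);
  return n;
}

/* With -Wformat, GCC would now diagnose a mismatched call such as
     my_format (buf, sizeof buf, "%d", "not an int");  */
```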

More generally, the vestigial support for NULs and %S in doprnt's C formats 
dates back to long ago, before GCC warned about these features. It may have made 
sense back then but it does not make sense now. Any C-level formatting facility 
that supports NULs and %S should not attempt to use the longstanding printf API 
that is incompatible with such support - it should be a separate facility. And 
Emacs C code already has a non-printf facility for formatting with NULs and %S - 
e.g., Fformat and Fformat_message - so it doesn't need yet another one.

> If the problem is the slight inefficiency caused by the call to
> strlen, we could instead solve it in the callers: all the formats I've
> seen are const strings, so the value of FORMAT_END can be computed at
> compile time, and used instead of passing NULL.

This would require unnecessary complication of the code and runtime overhead. 
doprnt is called directly in only a few places: esprintf, evxprintf, vmessage, and
the (rarely-if-ever used) snprintf replacement. If we modified callers to do 
what you're suggesting, we'd need to modify the callers of these functions, and 
their callers too, all the way back to the original ancestor call that specifies 
the format. This would clutter and bloat the code and would add runtime cost to 
all the callers. For example, we'd have to change all calls to the 'error' 
function from something like this:

  if (ret < GNUTLS_E_SUCCESS)
    error ("GnuTLS AEAD cipher %s/%s initialization failed: %s",
	   gnutls_cipher_get_name (gca), desc, emacs_gnutls_strerror (ret));

to something like this:

  if (ret < GNUTLS_E_SUCCESS)
    {
      char const msg[] = "GnuTLS AEAD cipher %s/%s initialization failed: %s";
      error (msg, sizeof msg - 1,
	     gnutls_cipher_get_name (gca), desc, emacs_gnutls_strerror (ret));
    }

Of course we could invent a new macro ERROR to package this up, but such a macro 
would still be less efficient than what we have, and worse it would not always 
work, for cases like this:

  if (NILP (tem) || (XBUFFER (tem) != current_buffer))
    error (for_region ? "The mark is not set now, so there is no region"
	   : "The mark is not set now");

or like this:

      error (format, string); // in x_check_errors

And of course this could be gotten around as well, but we're now talking about a 
reasonably large amount of code surgery that will hurt code readability and 
runtime performance, all to support a low-level C feature that Emacs does not 
use and won't ever reasonably use.
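Why a sizeof-based macro "would not always work" can be seen in a small sketch. Assume a hypothetical FORMAT_LEN macro computing the length the way the msg[] rewrite above does. It gives the right answer for an array such as a string literal, but is silently wrong for the conditional-expression call and for a format held in a pointer (such as the format parameter in x_check_errors). In both of those cases the operand has pointer type, so sizeof measures a char pointer rather than the string.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: byte length of a format via sizeof.
   Correct only when FMT has array type (e.g., a string literal).  */
#define FORMAT_LEN(fmt) (sizeof (fmt) - 1)

static char const literal_fmt[] = "The mark is not set now";

static size_t
len_of_literal (void)
{
  return FORMAT_LEN (literal_fmt);       /* Array: correct length.  */
}

static size_t
len_of_conditional (bool for_region)
{
  /* The ?: operands decay to char *, so sizeof yields a pointer size.  */
  return FORMAT_LEN (for_region
                     ? "The mark is not set now, so there is no region"
                     : "The mark is not set now");
}

static size_t
len_of_pointer (char const *format)
{
  return FORMAT_LEN (format);            /* Pointer: also the wrong size.  */
}
```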

Instead, we should leave most of the C code alone and just adjust 'doprnt' and 
its very few callers.

>> Drop support for
>> "%S" which is never used and which would cause GCC to warn anyway.
> 
> This is an old compatibility feature, I'd rather not drop it.  Who
> knows what code relies on the fact that 'message' and 'format-message'
> support it?

I know because I checked all the code. No Emacs C code uses %S. And none is 
likely to use it in the future because GCC warns about it if you try. (To be 
specific: GCC warns unless you use %S compatibly with its standard C meaning, 
which differs from that of Emacs doprnt - which is yet another compatibility 
minefield if we insist on keeping doprnt's unused %S feature.)
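To make the %S conflict concrete: the old doprnt treated %S like %s, taking a char * argument, whereas POSIX (XSI) printf defines %S as a synonym for %ls, taking a wchar_t * argument, and GCC's format checker follows that meaning. Below is a minimal sketch using the portable %ls spelling; format_wide is a hypothetical name.

```c
#include <stdio.h>
#include <wchar.h>

/* Print the wide string WS inside brackets.  %ls is the ISO C spelling;
   POSIX (XSI) also accepts %S with the same meaning, i.e., it expects a
   wchar_t *, not the char * that the old Emacs doprnt expected for %S.  */
static int
format_wide (char *buf, size_t size, wchar_t const *ws)
{
  return snprintf (buf, size, "[%ls]", ws);
}
```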

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#43439; Package emacs. (Fri, 18 Sep 2020 07:31:02 GMT)

Message #14 received at 43439 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 43439 <at> debbugs.gnu.org
Subject: Re: bug#43439: [PATCH] doprnt improvements
Date: Fri, 18 Sep 2020 10:30:44 +0300

> Cc: 43439 <at> debbugs.gnu.org
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> Date: Wed, 16 Sep 2020 15:09:50 -0700
> 
> On 9/16/20 7:58 AM, Eli Zaretskii wrote:
> 
> > Emacs traditionally supports strings with
> > embedded null characters, and this feature is in line with that.  It
> > is true that it is currently unused, but why is it a good idea to
> > remove it?
> 
> It's a good idea not only because the feature is currently unused and its 
> support complicates and adds bugs to the code, but also because it would be a 
> bad idea to ever use the feature.
> 
> The Emacs feature is for Lisp strings. Emacs does not (and for API reasons, it 
> cannot practically) rely on embedded NULs in C strings. Among other things, if 
> we tried to use C-style printf formats with embedded NULs, GCC's warnings about 
> formats not matching their arguments would stop working. These GCC warnings are 
> quite useful for preventing bugs in Emacs's C code and have helped to catch many 
> such bugs, and we should not give them up.

How about a compromise: we modify doprnt to exit when either it finds
NUL or reaches the character specified by FORMAT_END?  This will allow
us to keep some of the feature, and I think the amount of changes will
be smaller.  It should also not be much slower than what you propose.
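Below is a minimal sketch of the loop condition this compromise implies: scan until either FORMAT_END (when non-null) or a NUL byte, whichever comes first. The function count_directives is hypothetical and only counts %-directives, so that the stopping rule is easy to test. It treats a null FORMAT_END as "no limit", matching how the existing callers pass 0.

```c
#include <stddef.h>

/* Count %-directives (excluding "%%") in FORMAT, stopping at
   FORMAT_END if it is non-null, or at the first NUL byte.  */
static ptrdiff_t
count_directives (char const *format, char const *format_end)
{
  ptrdiff_t count = 0;
  /* Use != rather than <: a relational comparison against a null
     FORMAT_END would be undefined, but != is well defined and
     simply never matches.  */
  for (char const *fmt = format; fmt != format_end && *fmt; fmt++)
    if (*fmt == '%')
      {
        if (fmt + 1 == format_end || !fmt[1])
          break;                        /* Trailing lone '%'.  */
        if (fmt[1] != '%')
          count++;
        fmt++;                          /* Skip the character after '%'.  */
      }
  return count;
}
```

With FORMAT_END pointing past an embedded NUL, as in "%d\0%s", the loop still stops at the NUL, so only the first directive is seen.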

> More generally, the vestigial support for NULs and %S in doprnt's C formats 
> dates back to long ago, before GCC warned about these features.

I understand your desire to have GCC warnings about this, but GCC is a
tool; it shouldn't dictate what features we keep and which ones we
drop.  We should do it the other way around.  doprnt callers don't
change much, so the GCC diagnostic features are not very important in
this case.

Thanks.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#43439; Package emacs. (Thu, 15 Oct 2020 18:00:02 GMT)

Message #17 received at 43439 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 43439 <at> debbugs.gnu.org
Subject: Re: bug#43439: [PATCH] doprnt improvements
Date: Thu, 15 Oct 2020 10:58:49 -0700

On 9/18/20 12:30 AM, Eli Zaretskii wrote:

> How about ... we modify doprnt to exit when either it finds
> NUL or reaches the character specified by FORMAT_END?  This will allow
> us to keep some of the feature, and I think the amount of changes will
> be smaller.  It should also not be much slower than what you propose.

Better yet, let's leave doprnt's API unchanged, and add a function evsnprintf 
(named by analogy with esprintf) whose API is like C vsnprintf but which does
formatting the Emacs way. We can avoid duplication of code by implementing 
doprnt in terms of evsnprintf. This fixes the performance issue with current 
Emacs, and avoids the need for evsnprintf to check for both NULs and
FORMAT_END etc. Updated patch attached.

[0001-New-function-evsnprintf-to-speed-clean-up-doprnt.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#43439; Package emacs. (Thu, 15 Oct 2020 18:13:01 GMT)

Message #20 received at 43439 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 43439 <at> debbugs.gnu.org
Subject: Re: bug#43439: [PATCH] doprnt improvements
Date: Thu, 15 Oct 2020 21:12:05 +0300

> Cc: 43439 <at> debbugs.gnu.org
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> Date: Thu, 15 Oct 2020 10:58:49 -0700
> 
> On 9/18/20 12:30 AM, Eli Zaretskii wrote:
> 
> > How about ... we modify doprnt to exit when either it finds
> > NUL or reaches the character specified by FORMAT_END?  This will allow
> > us to keep some of the feature, and I think the amount of changes will
> > be smaller.  It should also not be much slower than what you propose.
> 
> Better yet, let's leave doprnt's API unchanged, and add a function evsnprintf 
> (named by analogy with esprintf) whose API is like C vsnprintf but which does
> formatting the Emacs way.

No, let's not, please.  I didn't agree to modifying doprnt in
significant ways, so you are now suggesting to do an even more radical
modification, just under another name?  This is moving away from a
potential compromise point, not towards it.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#43439; Package emacs. (Thu, 15 Oct 2020 18:51:01 GMT)

Message #23 received at 43439 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 43439 <at> debbugs.gnu.org
Subject: Re: bug#43439: [PATCH] doprnt improvements
Date: Thu, 15 Oct 2020 11:50:48 -0700

On 10/15/20 11:12 AM, Eli Zaretskii wrote:
> I didn't agree to modifying doprnt in
> significant ways, so you are now suggesting to do an even more radical
> modification, just under another name?

If you'd rather have the patch keep doprnt entirely as-is (i.e., not change 
doprnt's implementation at all), I can easily modify the patch to do that.

All current Emacs code that calls doprnt would benefit from switching to the 
proposed evsnprintf function, an API that is simpler and faster and that has 
better static checking with GCC.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#43439; Package emacs. (Thu, 15 Oct 2020 19:06:01 GMT)

Message #26 received at 43439 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 43439 <at> debbugs.gnu.org
Subject: Re: bug#43439: [PATCH] doprnt improvements
Date: Thu, 15 Oct 2020 22:05:16 +0300

> Cc: 43439 <at> debbugs.gnu.org
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> Date: Thu, 15 Oct 2020 11:50:48 -0700
> 
> On 10/15/20 11:12 AM, Eli Zaretskii wrote:
> > I didn't agree to modifying doprnt in
> > significant ways, so you are now suggesting to do an even more radical
> > modification, just under another name?
> 
> If you'd rather have the patch keep doprnt entirely as-is (i.e., not change 
> doprnt's implementation at all), I can easily modify the patch to do that.

No, I want to continue calling doprnt directly, not replace all its
calls with a call to another function.  doprnt by itself is useless
unless it is used by the relevant primitives.  I see no need to
replace it with another function, because doprnt works and works well.

> All current Emacs code that calls doprnt would benefit from switching to the 
> proposed evsnprintf function, an API that is simpler and faster and that has 
> better static checking with GCC.

Yes, that's exactly where we disagree.  I made my proposal to find
some kind of middle ground, and was disappointed to see you suggest
moving even farther from a potential agreement.

In general, I'm against messing with code that has been stable for
ages, for ephemeral benefits or minor stylistic reasons.  If nothing
else, it gets in the way of maintaining Emacs because code I've known
for years and could find with my eyes closed constantly shifts and
changes under my feet.  Another example of this is that src/lisp.h
macros look nowadays completely different from what they were several
years ago.  This need to constantly unlearn that which was burned into
my muscle memory is not pleasant at all.  Changes that take us forward
because they are needed for new and improved features are welcome and
justified, but there are no new features in all those changes,
including in the doprnt patch.  I wish this fever would stop.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#43439; Package emacs. (Thu, 15 Oct 2020 20:07:01 GMT)

Message #29 received at 43439 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 43439 <at> debbugs.gnu.org
Subject: Re: bug#43439: [PATCH] doprnt improvements
Date: Thu, 15 Oct 2020 13:06:26 -0700

On 10/15/20 12:05 PM, Eli Zaretskii wrote:
> I want to continue calling doprnt directly, not replace all its
> calls with a call to another function.

OK, then attached is a patch that does things that way. This patch affects only 
the implementation of doprnt; no doprnt callers are affected. The code should 
have most of the proposed performance benefits of the earlier patches I proposed.

[0001-Improve-doprnt-performance.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#43439; Package emacs. (Sat, 17 Oct 2020 18:33:01 GMT)

Message #32 received at 43439 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 43439 <at> debbugs.gnu.org
Subject: Re: bug#43439: [PATCH] doprnt improvements
Date: Sat, 17 Oct 2020 21:32:44 +0300

> Cc: 43439 <at> debbugs.gnu.org
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> Date: Thu, 15 Oct 2020 13:06:26 -0700
> 
> On 10/15/20 12:05 PM, Eli Zaretskii wrote:
> > I want to continue calling doprnt directly, not replace all its
> > calls with a call to another function.
> 
> OK, then attached is a patch that does things that way. This patch affects only 
> the implementation of doprnt; no doprnt callers are affected. The code should 
> have most of the proposed performance benefits of the earlier patches I proposed.

I'm sorry, but this is still nowhere near the compromise I suggested.
And it loses %S, something I didn't agree to.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#43439; Package emacs. (Sun, 18 Oct 2020 02:25:02 GMT)

Message #35 received at 43439 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 43439 <at> debbugs.gnu.org
Subject: Re: bug#43439: [PATCH] doprnt improvements
Date: Sat, 17 Oct 2020 19:24:04 -0700

On 10/17/20 11:32 AM, Eli Zaretskii wrote:
> this is still nowhere near the compromise I suggested.
> And it loses %S, something I didn't agree to.

Attached is a revised patch that implements what you suggested, and it does not 
lose %S.

Here is what you suggested in 
<https://debbugs.gnu.org/cgi/bugreport.cgi?bug=43439#14>:

> How about a compromise: we modify doprnt to exit when either it finds
> NUL or reaches the character specified by FORMAT_END?

and the attached patch implements this proposed API change.
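The compromise API can be illustrated with a single scanning loop whose end test accepts either terminator. In this sketch (hypothetical names `scan_doprnt`, `scan_doprnt_call` and `at_format_end`; not the code in the attached patch), a NULL FORMAT_END makes the pointer comparison never succeed, so a NUL-terminated format is processed in one pass without an up-front strlen. Only %s, %S, %d and %% are handled, and treating %S like %s is an assumption made here for brevity; the authoritative conversion rules are in the comments of src/doprnt.c.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* True when F has reached the end of the format: either F equals a
   non-NULL FORMAT_END or F points at a NUL byte.  If FORMAT_END is
   NULL the pointer comparison never succeeds, so only the NUL counts
   and no strlen is needed beforehand.  */
static int
at_format_end (const char *f, const char *format_end)
{
  return f == format_end || *f == '\0';
}

/* Hypothetical doprnt-shaped formatter illustrating the compromise.
   Handles only %s, %S (treated like %s here), %d and %%; any other
   conversion is dropped without consuming an argument.  Output is
   truncated to BUFSIZE - 1 bytes and NUL-terminated; the return value
   is the number of bytes stored, excluding the NUL.  */
static ptrdiff_t
scan_doprnt (char *buf, ptrdiff_t bufsize, const char *format,
             const char *format_end, va_list ap)
{
  assert (bufsize > 0);
  ptrdiff_t n = 0;
  const char *f = format;

  while (!at_format_end (f, format_end) && n < bufsize - 1)
    {
      char tmp[32] = "";          /* zero-filled scratch for one piece */
      const char *piece = tmp;

      if (*f != '%')
        tmp[0] = *f++;            /* literal byte; tmp[1] is already NUL */
      else
        {
          f++;                    /* skip the '%' */
          /* Re-check the bound so a spec cut off by FORMAT_END is not
             read past it.  */
          char conv = at_format_end (f, format_end) ? '\0' : *f++;
          switch (conv)
            {
            case 's':
            case 'S':
              piece = va_arg (ap, char *);
              break;
            case 'd':
              snprintf (tmp, sizeof tmp, "%d", va_arg (ap, int));
              break;
            case '%':
              tmp[0] = '%';
              break;
            default:
              break;              /* unsupported or truncated: emit nothing */
            }
        }

      /* Copy as much of this piece as still fits.  */
      while (*piece && n < bufsize - 1)
        buf[n++] = *piece++;
    }

  buf[n] = '\0';
  return n;
}

/* Variadic convenience wrapper so callers need not build a va_list.  */
static ptrdiff_t
scan_doprnt_call (char *buf, ptrdiff_t bufsize, const char *format,
                  const char *format_end, ...)
{
  va_list ap;
  va_start (ap, format_end);
  ptrdiff_t n = scan_doprnt (buf, bufsize, format, format_end, ap);
  va_end (ap);
  return n;
}
```

Because the end test also runs on the byte after each '%', a conversion cut off by FORMAT_END emits nothing instead of reading past the bound.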

[0001-Improve-doprnt-performance.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#43439; Package emacs. (Sat, 24 Oct 2020 10:40:01 GMT)

Message #38 received at 43439 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 43439 <at> debbugs.gnu.org
Subject: Re: bug#43439: [PATCH] doprnt improvements
Date: Sat, 24 Oct 2020 13:39:37 +0300

> Cc: 43439 <at> debbugs.gnu.org
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> Date: Sat, 17 Oct 2020 19:24:04 -0700
> 
> On 10/17/20 11:32 AM, Eli Zaretskii wrote:
> > this is still nowhere near the compromise I suggested.
> > And it loses %S, something I didn't agree to.
> 
> Attached is a revised patch that implements what you suggested, and it does not 
> lose %S.
> 
> Here is what you suggested in 
> <https://debbugs.gnu.org/cgi/bugreport.cgi?bug=43439#14>:
> 
> > How about a compromise: we modify doprnt to exit when either it finds
> > NUL or reaches the character specified by FORMAT_END?
> 
> and the attached patch implements this proposed API change.

Thanks, I'm okay with installing this on master.

Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>:
You have taken responsibility. (Sat, 24 Oct 2020 21:03:01 GMT)

Notification sent to Paul Eggert <eggert <at> cs.ucla.edu>:
bug acknowledged by developer. (Sat, 24 Oct 2020 21:03:02 GMT)

Message #43 received at 43439-done <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 43439-done <at> debbugs.gnu.org
Subject: Re: bug#43439: [PATCH] doprnt improvements
Date: Sat, 24 Oct 2020 14:02:25 -0700

OK, I installed the patch, followed by the attached minor cleanups. Closing the 
bug report.

[0001-Rename-doprnt_nul-to-doprnt_non_null_end.patch (text/x-patch, attachment)]

[0002-Minor-doprnt-cleanup-remove-memchr-call.patch (text/x-patch, attachment)]

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 22 Nov 2020 12:24:05 GMT)