GNU bug report logs - #65659
RFC: changing printf(1) behavior on %b

Package: coreutils;

Reported by: Eric Blake <eblake <at> redhat.com>

Date: Thu, 31 Aug 2023 15:37:02 UTC

Severity: normal

To reply to this bug, email your comments to 65659 AT debbugs.gnu.org.


Report forwarded to bug-coreutils <at> gnu.org:
bug#65659; Package coreutils. (Thu, 31 Aug 2023 15:37:02 GMT)

Acknowledgement sent to Eric Blake <eblake <at> redhat.com>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Thu, 31 Aug 2023 15:37:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Eric Blake <eblake <at> redhat.com>
To: bug-coreutils <at> gnu.org, bug-bash <at> gnu.org
Cc: austin-group-l <at> opengroup.org
Subject: RFC: changing printf(1) behavior on %b
Date: Thu, 31 Aug 2023 10:35:59 -0500
In today's Austin Group call, we discussed the fact that printf(1) has
mandated behavior for %b (escape sequence processing similar to XSI
echo) that will eventually conflict with C2x's desire to introduce %b
to printf(3) (to produce 0b000... binary literals).
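
As a rough illustration of the behavior at stake (a minimal sketch,
assuming a POSIX-style shell whose printf supports %b; the variable
names are made up for this example), %b expands backslash escapes in
its argument while %s prints the argument verbatim:

```shell
# Hypothetical variable names; assumes a printf that implements POSIX %b.
arg='one\ttwo'
with_b=$(printf '%b' "$arg")    # \t in the ARGUMENT becomes a real tab
with_s=$(printf '%s' "$arg")    # the argument is printed verbatim
# \c inside a %b argument stops all further output, as with XSI echo.
stops=$(printf '%b|' 'abc\cdef')
printf '%s\n' "$with_b" "$with_s" "$stops"
```

This is the escape handling that would need another spelling if %b
were repurposed for binary output.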

For POSIX Issue 8, we plan to mark the current semantics of %b in
printf(1) as obsolescent (it would continue to work, because Issue 8
targets C17 where there is no conflict with C2x), but with a Future
Directions note that for Issue 9, we could remove %b entirely, or
(more likely) make %b output binary literals just like C.  But that
raises the question of whether the escape-sequence processing
semantics of %b should still remain available under the standard,
under some other spelling, since relying on XSI echo is still not
portable.

One of the observations made in the meeting was that currently, both
the POSIX spec for printf(1) as seen at [1], and the POSIX and C
standard (including the upcoming C2x standard) for printf(3) as seen
at [3] state that both the ' and # flag modifiers are currently
undefined when applied to %s.

[1] https://pubs.opengroup.org/onlinepubs/9699919799/utilities/printf.html
"The format operand shall be used as the format string described in
XBD File Format Notation[2] with the following exceptions:..."

[2] https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap05.html#tag_05
"The flag characters and their meanings are: ...
# The value shall be converted to an alternative form. For c, d, i, u,
  and s conversion specifiers, the behavior is undefined.
[and no mention of ']"

[3] https://pubs.opengroup.org/onlinepubs/9699919799/functions/printf.html
"The flag characters and their meanings are:
' [CX] [Option Start] (The <apostrophe>.) The integer portion of the
  result of a decimal conversion ( %i, %d, %u, %f, %F, %g, or %G )
  shall be formatted with thousands' grouping characters. For other
  conversions the behavior is undefined. The non-monetary grouping
  character is used. [Option End]
...
# Specifies that the value is to be converted to an alternative
  form. For o conversion, it shall increase the precision, if and only
  if necessary, to force the first digit of the result to be a zero
  (if the value and precision are both 0, a single 0 is printed). For
  x or X conversion specifiers, a non-zero result shall have 0x (or
  0X) prefixed to it. For a, A, e, E, f, F, g, and G conversion
  specifiers, the result shall always contain a radix character, even
  if no digits follow the radix character. Without this flag, a radix
  character appears in the result of these conversions only if a digit
  follows it. For g and G conversion specifiers, trailing zeros shall
  not be removed from the result as they normally are. For other
  conversion specifiers, the behavior is undefined."

Thus, it appears that both %#s and %'s are available for use for
future standardization.  Typing-wise, %#s as a synonym for %b is
probably going to be easier (less shell escaping needed).  Is there
any interest in a patch to coreutils or bash that would add such a
synonym, to make it easier to leave that functionality in place for
POSIX Issue 9 even when %b is repurposed to align with C2x?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org





Information forwarded to bug-coreutils <at> gnu.org:
bug#65659; Package coreutils. (Thu, 31 Aug 2023 18:33:02 GMT)

Message #8 received at 65659 <at> debbugs.gnu.org:

From: Eric Blake <eblake <at> redhat.com>
To: coreutils <at> gnu.org
Cc: 65659 <at> debbugs.gnu.org
Subject: [PATCH] printf: add %#s alias to %b
Date: Thu, 31 Aug 2023 13:31:57 -0500
POSIX Issue 8 will be obsoleting %b (escape sequence interpolation) so
that future Issue 9 can change to having %b (binary literal output)
that aligns with C2x.  But since escape interpolation may still remain
useful, POSIX suggested %#s (which is undefined in all versions of C)
as a possible alias for the older %b behavior.

* src/printf.c (print_formatted, usage): Support %#s as an alias
for %b, in order to open doors to future repurposing of %b to
binary output while still allowing access to its old behavior.
* doc/coreutils.texi (printf invocation): Document it.
* NEWS: Likewise.
* tests/printf/printf-quote.sh: Add unit test coverage.
Fixes: https://bugs.gnu.org/65659
---
 NEWS                         | 9 +++++++++
 doc/coreutils.texi           | 3 ++-
 src/printf.c                 | 9 ++++++---
 tests/printf/printf-quote.sh | 4 +++-
 4 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/NEWS b/NEWS
index c00ff0cf2..2e2acc7cd 100644
--- a/NEWS
+++ b/NEWS
@@ -7,6 +7,15 @@ GNU coreutils NEWS                                    -*- outline -*-
   numfmt options like --suffix no longer have an arbitrary 127-byte limit.
   [bug introduced with numfmt in coreutils-8.21]

+** Changes in behavior
+
+  'printf' now treats the format sequence '%#s' as an alias for the
+  older '%b' meaning escape sequence interpolation.  It is anticipated
+  that a future version of coreutils will change the meaning of '%b'
+  to output binary literals, comparable to the meaning being added to
+  printf(3) by C23, which will leave '%#s' (which C does not specify)
+  as the only way to access the older behavior of escape
+  interpolation.

 * Noteworthy changes in release 9.4 (2023-08-29) [stable]

diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index 373a407ed..81a8ca4ff 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -13331,7 +13331,8 @@ printf invocation

 @item
 @kindex %b
-An additional directive @samp{%b}, prints its
+@kindex %#s
+An additional directive @samp{%#s}, or its older alias @samp{%b}, prints its
 argument string with @samp{\} escapes interpreted in the same way as in
 the @var{format} string, except that octal escapes are of the form
 @samp{\0@var{ooo}} where @var{ooo} is 0 to 3 octal digits.  If
diff --git a/src/printf.c b/src/printf.c
index 9670d8043..e8866d50e 100644
--- a/src/printf.c
+++ b/src/printf.c
@@ -38,7 +38,7 @@

    Additional directive:

-   %b = print an argument string, interpreting backslash escapes,
+   %b, %#s = print an argument string, interpreting backslash escapes,
      except that octal escapes are of the form \0 or \0ooo.

    %q = print an argument string in a format that can be
@@ -126,8 +126,9 @@ FORMAT controls the output as in C printf.  Interpreted sequences are:\n\
 "), stdout);
       fputs (_("\
   %%      a single %\n\
-  %b      ARGUMENT as a string with '\\' escapes interpreted,\n\
+  %#s     ARGUMENT as a string with '\\' escapes interpreted,\n\
           except that octal escapes are of the form \\0 or \\0NNN\n\
+  %b      obsolescent alias for %#s\n\
   %q      ARGUMENT is printed in a format that can be reused as shell input,\n\
           escaping non-printable characters with the proposed POSIX $'' syntax.\
 \n\n\
@@ -509,7 +510,7 @@ print_formatted (char const *format, int argc, char **argv)
               putchar ('%');
               break;
             }
-          if (*f == 'b')
+          if (*f == 'b' || (*f == '#' && f[1] == 's'))
             {
               /* FIXME: Field width and precision are not supported
                  for %b, even though POSIX requires it.  */
@@ -519,6 +520,8 @@ print_formatted (char const *format, int argc, char **argv)
                   ++argv;
                   --argc;
                 }
+              if (*f == '#')
+                f++;
               break;
             }

diff --git a/tests/printf/printf-quote.sh b/tests/printf/printf-quote.sh
index d1671bd9d..33ce3018e 100755
--- a/tests/printf/printf-quote.sh
+++ b/tests/printf/printf-quote.sh
@@ -22,7 +22,8 @@ print_ver_ printf
 prog='env printf'

 # Equivalent output to ls --quoting=shell-escape
-$prog '%q\n' '' "'" a 'a b' '~a' 'a~' "$($prog %b 'a\r')" > out
+$prog '%q\n' '' "'" a 'a b' '~a' 'a~' "$($prog %b 'a\r')" \
+      "$($prog %#s 'a\r')" > out
 cat <<\EOF > exp || framework_failure_
 ''
 "'"
@@ -31,6 +32,7 @@ a
 '~a'
 a~
 'a'$'\r'
+'a'$'\r'
 EOF
 compare exp out || fail=1

-- 
2.41.0





Information forwarded to bug-coreutils <at> gnu.org:
bug#65659; Package coreutils. (Thu, 31 Aug 2023 19:12:02 GMT)

Message #11 received at submit <at> debbugs.gnu.org:

From: Chet Ramey <chet.ramey <at> case.edu>
To: Eric Blake <eblake <at> redhat.com>, bug-coreutils <at> gnu.org, bug-bash <at> gnu.org
Cc: austin-group-l <at> opengroup.org, chet.ramey <at> case.edu
Subject: Re: RFC: changing printf(1) behavior on %b
Date: Thu, 31 Aug 2023 15:10:58 -0400
On 8/31/23 11:35 AM, Eric Blake wrote:
> In today's Austin Group call, we discussed the fact that printf(1) has
> mandated behavior for %b (escape sequence processing similar to XSI
> echo) that will eventually conflict with C2x's desire to introduce %b
> to printf(3) (to produce 0b000... binary literals).
> 
> For POSIX Issue 8, we plan to mark the current semantics of %b in
> printf(1) as obsolescent (it would continue to work, because Issue 8
> targets C17 where there is no conflict with C2x), but with a Future
> Directions note that for Issue 9, we could remove %b entirely, or
> (more likely) make %b output binary literals just like C.

I doubt I'd ever remove %b, even in posix mode -- it's already been there
for 25 years.

> But that
> raises the question of whether the escape-sequence processing
> semantics of %b should still remain available under the standard,
> under some other spelling, since relying on XSI echo is still not
> portable.
> 
> One of the observations made in the meeting was that currently, both
> the POSIX spec for printf(1) as seen at [1], and the POSIX and C
> standard (including the upcoming C2x standard) for printf(3) as seen
> at [3] state that both the ' and # flag modifiers are currently
> undefined when applied to %s.

Neither one is a very good choice, but `#' is the better one. It at least
has a passing resemblance to the desired functionality.

Why not standardize another character, like %B? I suppose I'll have to look
at the etherpad for the discussion. I think that came up on the mailing
list, but I can't remember the details.

> Is there
> any interest in a patch to coreutils or bash that would add such a
> synonym, to make it easier to leave that functionality in place for
> POSIX Issue 9 even when %b is repurposed to align with C2x?

It's maybe a two or three line change at most.

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
		 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    chet <at> case.edu    http://tiswww.cwru.edu/~chet/





Information forwarded to bug-coreutils <at> gnu.org:
bug#65659; Package coreutils. (Thu, 31 Aug 2023 19:36:01 GMT)

Message #14 received at 65659 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eric Blake <eblake <at> redhat.com>, 65659 <at> debbugs.gnu.org, bug-bash <at> gnu.org
Cc: austin-group-l <at> opengroup.org
Subject: Re: bug#65659: RFC: changing printf(1) behavior on %b
Date: Thu, 31 Aug 2023 12:34:48 -0700
On 2023-08-31 08:35, Eric Blake wrote:
> Typing-wise, %#s as a synonym for %b is
> probably going to be easier (less shell escaping needed).  Is there
> any interest in a patch to coreutils or bash that would add such a
> synonym, to make it easier to leave that functionality in place for
> POSIX Issue 9 even when %b is repurposed to align with C2x?

Sounds good to me for coreutils.




Information forwarded to bug-coreutils <at> gnu.org:
bug#65659; Package coreutils. (Thu, 31 Aug 2023 20:03:02 GMT)

Message #17 received at 65659 <at> debbugs.gnu.org:

From: Eric Blake <eblake <at> redhat.com>
To: Chet Ramey <chet.ramey <at> case.edu>
Cc: 65659 <at> debbugs.gnu.org, bug-bash <at> gnu.org, austin-group-l <at> opengroup.org
Subject: Re: bug#65659: RFC: changing printf(1) behavior on %b
Date: Thu, 31 Aug 2023 15:02:22 -0500
On Thu, Aug 31, 2023 at 03:10:58PM -0400, Chet Ramey wrote:
> On 8/31/23 11:35 AM, Eric Blake wrote:
> > In today's Austin Group call, we discussed the fact that printf(1) has
> > mandated behavior for %b (escape sequence processing similar to XSI
> > echo) that will eventually conflict with C2x's desire to introduce %b
> > to printf(3) (to produce 0b000... binary literals).
> > 
> > For POSIX Issue 8, we plan to mark the current semantics of %b in
> > printf(1) as obsolescent (it would continue to work, because Issue 8
> > targets C17 where there is no conflict with C2x), but with a Future
> > Directions note that for Issue 9, we could remove %b entirely, or
> > (more likely) make %b output binary literals just like C.
> 
> I doubt I'd ever remove %b, even in posix mode -- it's already been there
> for 25 years.

But the longer that printf(3) supports "%b" to output binary values,
the more surprised new shell coders will be that printf(1) %b does not
behave the same.  What's more, other languages have already started
using %b for binary output (python, for example), so it is definitely
gaining in mindshare.

That said, I also agree with your desire to keep the functionality in
place.  The current POSIX says that %b was added so that on a non-XSI
system, you could do:

my_echo() {
  printf %b\\n "$*"
}

and then call my_echo everywhere that a script used to depend on XSI
echo (perhaps by 'alias echo=my_echo' with aliases enabled), for a
much quicker portability hack than a tedious search-and-replace of
every echo call that requires manual inspection of its arguments for
translation of any XSI escape sequences into printf format
specifications.  In particular, code like [var='...\c'; echo "$var"]
cannot be changed to use printf by a mere s/echo/printf %s\\n/.  Thus,
when printf was invented and standardized for the shell, the solution
at the time was to create [printf %b\\n "$var"] as a drop-in
replacement for XSI [echo "$var"], even for platforms without XSI
echo.

Nowadays, I personally have not seen very many scripts like this in
the wild (for example, autoconf scripts prefer to directly use printf,
rather than trying to shoe-horn behavior into echo).  But assuming
such legacy scripts still exist, it is still much easier to rewrite
just the my_echo wrapper to now use %#s\\n instead of %b\\n, than it
would be to find every callsite of my_echo.
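
One way to keep that rewrite to a single line (a sketch only;
my_echo_fmt is a hypothetical variable name, and '%#s' is the spelling
proposed in this thread, not something every printf accepts today) is
to hold the format in one place:

```shell
# Hypothetical helper: the conversion used by the wrapper lives in one variable.
# '%b\n' works today; switch to '%#s\n' only where printf implements the alias.
my_echo_fmt='%b\n'

my_echo() (
  IFS=' '                       # make "$*" join the arguments with spaces
  printf "$my_echo_fmt" "$*"
)

my_echo 'tab:\there' 'and more'
```

Callers of my_echo stay untouched whichever spelling ends up in the
variable.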

Bash already has shopt -s xpg_echo; I could easily see this being a
case where you toggle between the old or new behavior of %b (while
keeping %#s always at the old behavior) by either this or some other
shopt in bash, so that newer script writers that want binary output
for %b can do so with one setting, while scripts that must continue to
run under old semantics can likewise do so.

> 
> > But that
> > raises the question of whether the escape-sequence processing
> > semantics of %b should still remain available under the standard,
> > under some other spelling, since relying on XSI echo is still not
> > portable.
> > 
> > One of the observations made in the meeting was that currently, both
> > the POSIX spec for printf(1) as seen at [1], and the POSIX and C
> > standard (including the upcoming C2x standard) for printf(3) as seen
> > at [3] state that both the ' and # flag modifiers are currently
> > undefined when applied to %s.
> 
> > Neither one is a very good choice, but `#' is the better one. It at least
> > has a passing resemblance to the desired functionality.

Indeed, that's what the Austin Group settled on today after I first
wrote my initial email, and what I wrote up in a patch to GNU
Coreutils (https://debbugs.gnu.org/65659)

> 
> Why not standardize another character, like %B? I suppose I'll have to look
> at the etherpad for the discussion. I think that came up on the mailing
> list, but I can't remember the details.

Yes, https://austingroupbugs.net/view.php?id=1771 has a good
discussion of the various ideas.

%B is out for the same reason as %b: although the current C2x draft
wording says that %<capital> is reserved for implementation use, other
than [AEFGX] which already have a history of use by C (as it was, when
C99 added %A, that caused problems for some folks), it goes on to
_highly_ encourage any implementation that adds %b for "0b0" binary
output also add %B for "0B0" binary output (to match the x/X
dichotomy).  Burning %B to retain the old behavior while repurposing
%b to output lower-case binary values is thus a non-starter, while
burning %#s (which C says is undefined) felt nicer.

The Austin Group also felt that standardizing bash's behavior of %q/%Q
for outputting quoted text, while too late for Issue 8, has a good
chance of success, even though C says %q is reserved for
standardization by C. Our reasoning there is that lots of libc over
the years have used %qi as a synonym for %lli, and C would be foolish
to burn %q for anything that does not match those semantics at the C
language level; which means it will likely never be claimed by C and
thus free for use by shell in the way that bash has already done.

> 
> > Is there
> > any interest in a patch to coreutils or bash that would add such a
> > synonym, to make it easier to leave that functionality in place for
> > POSIX Issue 9 even when %b is repurposed to align with C2x?
> 
> It's maybe a two or three line change at most.

Yeah, creating an alias proved to be pretty simple in coreutils; I
spent more time documenting it than I did writing the code changes.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org





Information forwarded to bug-coreutils <at> gnu.org:
bug#65659; Package coreutils. (Thu, 31 Aug 2023 20:49:02 GMT)

Message #20 received at 65659 <at> debbugs.gnu.org:

From: Pádraig Brady <P <at> draigBrady.com>
To: Eric Blake <eblake <at> redhat.com>, coreutils <at> gnu.org
Cc: 65659 <at> debbugs.gnu.org
Subject: Re: [PATCH] printf: add %#s alias to %b
Date: Thu, 31 Aug 2023 21:48:09 +0100
On 31/08/2023 19:31, Eric Blake wrote:
> POSIX Issue 8 will be obsoleting %b (escape sequence interpolation) so
> that future Issue 9 can change to having %b (binary literal output)
> that aligns with C2x.  But since escape interpolation may still remain
> useful, POSIX suggested %#s (which is undefined in all versions of C)
> as a possible alias for the older %b behavior.
> 
> * src/printf.c (print_formatted, usage): Support %#s as an alias
> for %b, in order to open doors to future repurposing of %b to
> binary output while still allowing access to its old behavior.
> * doc/coreutils.texi (printf invocation): Document it.
> * NEWS: Likewise.
> * tests/printf/printf-quote.sh: Add unit test coverage.

Patch looks good thanks.
I'd add in the attached test addition.

As for compat, I notice that existing coreutils will reject %#s,
while bash 5.2.15, ksh 1.0.4, dash 0.5.12 will treat it as %s.
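
A quick way to see which of these behaviors a given printf has (a
hedged sketch; the labels are invented for this example, and the probe
string relies on %b-style \0ooo octal escapes):

```shell
# Classify how the printf found by this shell handles %#s.
# The labels "alias", "plain-s", "rejected" and "unknown" are made up here.
if out=$(printf '%#s' '\0101' 2>/dev/null); then
  case $out in
    A)       kind=alias ;;      # escapes interpreted, as with %b
    '\0101') kind=plain-s ;;    # '#' ignored, argument printed verbatim
    *)       kind=unknown ;;
  esac
else
  kind=rejected                 # printf reported an error for %#s
fi
printf '%s\n' "$kind"
```

This probes whatever printf the shell resolves first, usually a
builtin; prefixing the command with env should exercise the standalone
utility instead.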

cheers,
Pádraig
[printf-esc-test.patch (text/x-patch, attachment)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#65659; Package coreutils. (Thu, 31 Aug 2023 21:12:01 GMT)

Message #23 received at 65659 <at> debbugs.gnu.org:

From: Emanuele Torre <torreemanuele6 <at> gmail.com>
To: Eric Blake <eblake <at> redhat.com>
Cc: 65659 <at> debbugs.gnu.org, bug-bash <at> gnu.org, austin-group-l <at> opengroup.org,
 Chet Ramey <chet.ramey <at> case.edu>
Subject: Re: bug#65659: RFC: changing printf(1) behavior on %b
Date: Thu, 31 Aug 2023 23:11:39 +0200
On Thu, Aug 31, 2023 at 03:02:22PM -0500, Eric Blake wrote:
> On Thu, Aug 31, 2023 at 03:10:58PM -0400, Chet Ramey wrote:
> > Why not standardize another character, like %B? I suppose I'll have to look
> > at the etherpad for the discussion. I think that came up on the mailing
> > list, but I can't remember the details.
> 
> Yes, https://austingroupbugs.net/view.php?id=1771 has a good
> discussion of the various ideas.
> 
> %B is out for the same reason as %b: although the current C2x draft
> wording says that %<capital> is reserved for implementation use, other
> than [AEFGX] which already have a history of use by C (as it was, when
> C99 added %A, that caused problems for some folks), it goes on to
> _highly_ encourage any implementation that adds %b for "0b0" binary
> output also add %B for "0B0" binary output (to match the x/X
> dichotomy).  Burning %B to retain the old behavior while repurposing
> %b to output lower-case binary values is thus a non-starter, while
> burning %#s (which C says is undefined) felt nicer.

Also note that, in ksh93, %B is already used for something else.
It interprets its argument as a variable name, and dereferences it:
`printf %B PWD' is similar to `printf %s "$PWD"' (assuming PWD is a
string variable).

o/
 emanuele6




Information forwarded to bug-coreutils <at> gnu.org:
bug#65659; Package coreutils. (Fri, 01 Sep 2023 06:15:02 GMT)

Message #26 received at submit <at> debbugs.gnu.org:

From: Stephane Chazelas <stephane <at> chazelas.org>
To: Eric Blake <eblake <at> redhat.com>
Cc: bug-coreutils <at> gnu.org, bug-bash <at> gnu.org, austin-group-l <at> opengroup.org
Subject: Re: RFC: changing printf(1) behavior on %b
Date: Fri, 1 Sep 2023 07:13:36 +0100
2023-08-31 10:35:59 -0500, Eric Blake via austin-group-l at The Open Group:
> In today's Austin Group call, we discussed the fact that printf(1) has
> mandated behavior for %b (escape sequence processing similar to XSI
> echo) that will eventually conflict with C2x's desire to introduce %b
> to printf(3) (to produce 0b000... binary literals).
[...]

Is C2x's %b already set in stone?

ksh93's printf (and I'd expect ast's standalone printf) has
%[<width>][.<precision>[.<base>]]d to output a number in an
arbitrary base which IMO seems like a better approach than
introducing a new specifier for every base.

$ printf '%..2d\n' 63
111111
$ printf '0b%.8.2d\n' 63
0b00111111
$ printf '%#.8.2d\n' 63
2#00111111

The one thing it can't do though is left-space-padding of 0b1111.

printf %b is used in countless scripts especially the more
correct/portable ones that use it to work around the portability
fiasco that is echo's escape sequence expansion. I can't imagine
it going away. Hard to imagine the C folks overlooked it, I'd
expect printf %b to be known by any shell scripter.

-- 
Stephane




Information forwarded to bug-coreutils <at> gnu.org:
bug#65659; Package coreutils. (Fri, 01 Sep 2023 06:31:04 GMT)

Message #29 received at submit <at> debbugs.gnu.org:

From: Phi Debian <phi.debian <at> gmail.com>
To: chet.ramey <at> case.edu
Cc: bug-coreutils <at> gnu.org, Eric Blake <eblake <at> redhat.com>, bug-bash <at> gnu.org,
 austin-group-l <at> opengroup.org
Subject: Re: RFC: changing printf(1) behavior on %b
Date: Fri, 1 Sep 2023 06:40:54 +0200
On Thu, Aug 31, 2023 at 9:11 PM Chet Ramey <chet.ramey <at> case.edu> wrote:

>
> I doubt I'd ever remove %b, even in posix mode -- it's already been there
> for 25 years.
>
>
> Neither one is a very good choice, but `#' is the better one. It at least
> has a passing resemblance to the desired functionality.
>
> Why not standardize another character, like %B? I suppose I'll have to look
> at the etherpad for the discussion. I think that came up on the mailing
> list, but I can't remember the details.
>
>
Glad I read this thread before replying to the other one dealing with the
same issue.

I once worked on an issue in ksh93 regarding a printf discrepancy vs libc
printf, and was told that "ksh is not C". I think we have to admit that
the shell's printf departed from libc long ago, and now if a feature in
libc appears and collides with printf(1) then we have to pick yet another %
exception char. In the bash docs I see %b, %q and %(datefmt...), so for a
new feature we should pick something that we think libc has little chance
to target.

My vote is for posix_printf %B mapping to libc_printf %b, with the idea
that libc has little chance to have %B meaning UPPERCASE BINARY :-), as %x
%X do.

And yet one more line in the docs explaining this divergence.

Information forwarded to bug-coreutils <at> gnu.org:
bug#65659; Package coreutils. (Fri, 01 Sep 2023 06:31:04 GMT)

Message #32 received at submit <at> debbugs.gnu.org:

From: Phi Debian <phi.debian <at> gmail.com>
To: chet.ramey <at> case.edu
Cc: bug-coreutils <at> gnu.org, Eric Blake <eblake <at> redhat.com>, bug-bash <at> gnu.org,
 austin-group-l <at> opengroup.org
Subject: Re: RFC: changing printf(1) behavior on %b
Date: Fri, 1 Sep 2023 07:19:13 +0200
Well, after reading yet another thread regarding libc_printf(), I have to
admit that even %B is crossed out (yet already chosen by ksh93).

The other thread also speaks about libc_printf() documenting %# as
undefined for things other than a, A, e, E, f, F, g, and G, yet the same
thread also mentions a and A arriving late (citing C99) to the dance,
meaning that what is undefined today may become defined tomorrow, so %#b
is no safer.

My guess is that printf(1) is now doomed to follow its own route and keep
its old format exceptions, and then maybe implement something like a
c_printf that works like printf but whose format string follows libc
semantics, or maybe a -C option to printf(1)...

Well, in any case %b cannot change semantics in bash scripts, since it has
been there for so long. Even if it departs from python, perl and libc, it
is unfortunate but that's the way it is: nobody wants a semantic change
and then, on the next router update, to see the whole internet falling
apart :-)

Information forwarded to bug-coreutils <at> gnu.org:
bug#65659; Package coreutils. (Fri, 01 Sep 2023 06:44:02 GMT)

Message #35 received at submit <at> debbugs.gnu.org:

From: Stephane Chazelas <stephane <at> chazelas.org>
To: Eric Blake <eblake <at> redhat.com>, bug-coreutils <at> gnu.org, bug-bash <at> gnu.org,
 austin-group-l <at> opengroup.org
Subject: Re: RFC: changing printf(1) behavior on %b
Date: Fri, 1 Sep 2023 07:42:45 +0100
2023-09-01 07:13:36 +0100, Stephane Chazelas via austin-group-l at The Open Group:
> 2023-08-31 10:35:59 -0500, Eric Blake via austin-group-l at The Open Group:
> > In today's Austin Group call, we discussed the fact that printf(1) has
> > mandated behavior for %b (escape sequence processing similar to XSI
> > echo) that will eventually conflict with C2x's desire to introduce %b
> > to printf(3) (to produce 0b000... binary literals).
> [...]
> 
> Is C2x's %b already set in stone?
> 
> ksh93's printf (and I'd expect ast's standalone printf) has
> %[<width>][.<precision>[.<base>]]d to output a number in an
> arbitrary base which IMO seems like a better approach than
> introducing a new specifier for every base.
[...]

For completeness, several shells also support expanding integers
in arbitrary bases.

Like ksh's

typeset -i2 binary=123

already there in ksh85, possibly earlier, also available in
pdksh and derivatives and zsh.

Originally, with the base number not specified, the output base
was derived from the first assignment: typeset -i var;
var='2#111' would get you a $var that expands in binary. Looks
like that was discontinued in ksh93, but it's still there in
mksh or zsh.

And there's also:

$ echo $(( [#2] 16 )) $(( [##2] 16 ))
2#10000 10000

In zsh (note that you don't get 0b10000 upon $(( [#2] 16 ))
after set -o cbases).

If bash added:

printf -v var %..2d 16

à la ksh93, that would bridge that gap.

How to output/expand numbers in bases other than 8, 10, 16 is a
recurring question for bash, with people generally surprised
that it can *input* numbers in any base, but not *output* in any
base.

See
https://unix.stackexchange.com/questions/415077/how-to-add-two-hexadecimal-numbers-in-a-bash-script/415107#415107
https://unix.stackexchange.com/questions/616215/bash-arithmetic-outputs-result-in-decimal
https://unix.stackexchange.com/questions/749988/arbitrary-base-conversion-from-base-10-using-only-builtins-in-bash
to list only a few.
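
For shells without such a feature, the conversion can be done by hand
with shell arithmetic. A minimal sketch (to_base is a hypothetical
helper name; it assumes a non-negative integer and a base from 2 to
16, and it is not taken from any of the answers linked above):

```shell
# to_base BASE N: print non-negative integer N in BASE (2..16).
# Hypothetical helper for illustration; uses only POSIX sh features.
to_base() {
  _base=$1 _n=$2 _out=
  if [ "$_n" -eq 0 ]; then
    printf '0\n'
    return
  fi
  while [ "$_n" -gt 0 ]; do
    # %x turns a remainder 0..15 into a single digit 0..f
    _out=$(printf '%x' "$((_n % _base))")$_out
    _n=$((_n / _base))
  done
  printf '%s\n' "$_out"
}

to_base 2 16     # prints 10000
to_base 16 255   # prints ff
```

Each digit costs a command substitution, which is fine for occasional
use but slow in a tight loop.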

-- 
Stephane




Information forwarded to bug-coreutils <at> gnu.org:
bug#65659; Package coreutils. (Fri, 01 Sep 2023 06:45:02 GMT)

Message #38 received at submit <at> debbugs.gnu.org:

From: Oğuz <oguzismailuysal <at> gmail.com>
To: Phi Debian <phi.debian <at> gmail.com>
Cc: bug-coreutils <at> gnu.org, Eric Blake <eblake <at> redhat.com>, bug-bash <at> gnu.org,
 austin-group-l <at> opengroup.org, chet.ramey <at> case.edu
Subject: Re: RFC: changing printf(1) behavior on %b
Date: Fri, 1 Sep 2023 09:44:08 +0300
On Fri, Sep 1, 2023 at 7:41 AM Phi Debian <phi.debian <at> gmail.com> wrote:
> My vote is for posix_printf %B mapping to libc_printf %b

In the shell we already have bc for base conversion. Does POSIX really
have to support C2x %b in the first place?




Information forwarded to bug-coreutils <at> gnu.org:
bug#65659; Package coreutils. (Fri, 01 Sep 2023 07:13:01 GMT)

Message #41 received at submit <at> debbugs.gnu.org:

From: Stephane Chazelas <stephane <at> chazelas.org>
To: Oğuz <oguzismailuysal <at> gmail.com>
Cc: chet.ramey <at> case.edu, bug-coreutils <at> gnu.org, austin-group-l <at> opengroup.org,
 Phi Debian <phi.debian <at> gmail.com>, bug-bash <at> gnu.org,
 Eric Blake <eblake <at> redhat.com>
Subject: Re: RFC: changing printf(1) behavior on %b
Date: Fri, 1 Sep 2023 08:12:33 +0100
2023-09-01 09:44:08 +0300, Oğuz via austin-group-l at The Open Group:
> On Fri, Sep 1, 2023 at 7:41 AM Phi Debian <phi.debian <at> gmail.com> wrote:
> > My vote is for posix_printf %B mapping to libc_printf %b
> 
> In the shell we already have bc for base conversion. Does POSIX really
> have to support C2x %b in the first place?

Yes, though note:

- that implies forking a process and loading an external
  executable and its libraries
- bc is not always available. It's not installed by default on
  Debian for instance.
- for bases over 16, it uses some unusual representation that
  can't be used anywhere.
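
For reference, the bc route looks like this (a sketch that assumes bc
may be missing, as noted above, and falls back to a placeholder
message invented for this example):

```shell
# Convert 63 to binary with bc, guarding against bc not being installed.
if command -v bc >/dev/null 2>&1; then
  bin=$(echo 'obase=2; 63' | bc)
else
  bin='bc not installed'        # placeholder text, made up for this sketch
fi
printf '%s\n' "$bin"
```

The pipeline is where the extra process mentioned in the first point
comes from.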

A summary of some options for some common POSIX-like shells at
https://unix.stackexchange.com/questions/191205/bash-base-conversion-from-decimal-to-hex/191209#191209

-- 
Stephane




Information forwarded to bug-coreutils <at> gnu.org:
bug#65659; Package coreutils. (Fri, 01 Sep 2023 08:00:02 GMT)

Message #44 received at 65659 <at> debbugs.gnu.org:

From: Stephane Chazelas <stephane <at> chazelas.org>
To: Eric Blake <eblake <at> redhat.com>
Cc: 65659 <at> debbugs.gnu.org, bug-bash <at> gnu.org, austin-group-l <at> opengroup.org,
 Chet Ramey <chet.ramey <at> case.edu>
Subject: Re: bug#65659: RFC: changing printf(1) behavior on %b
Date: Fri, 1 Sep 2023 08:59:19 +0100
2023-08-31 15:02:22 -0500, Eric Blake via austin-group-l at The Open Group:
[...]
> The current POSIX says that %b was added so that on a non-XSI
> system, you could do:
> 
> my_echo() {
>   printf %b\\n "$*"
> }

That is dependent on the current value of $IFS. You'd need:

xsi_echo() (
  IFS=' '
  printf '%b\n' "$*"
)

Or the other alternatives listed at
https://unix.stackexchange.com/questions/65803/why-is-printf-better-than-echo/65819#65819
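
The IFS dependency comes from "$*", which joins the positional parameters with the first character of IFS. A small sketch that makes this visible (join_args is a hypothetical helper name used only here):

```shell
# join_args SEP ARG...: show how "$*" joins arguments using the first
# character of IFS. The function body is a subshell, so the caller's
# IFS is left untouched.
join_args() (
  IFS=$1
  shift
  printf '%s\n' "$*"
)

join_args ':' a b c    # prints a:b:c
join_args ' ' a b c    # prints a b c
```

With IFS set to ':' an echo emulation based on "$*" would print colons between its arguments, which is why the function above pins IFS to a space.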

[...]
> Bash already has shopt -s xpg_echo

Note that in bash, you need both

shopt -s xpg_echo
set -o posix

To get an XSI echo. Without the latter, options are still
recognised. You can get a XSI echo without those options with:

xsi_echo() {
  local IFS=' ' -
  set +o posix
  echo -e "$*\n\c"
}

The addition of those \n\c (noop) avoids arguments being treated as
options if they start with -.


[...]
> The Austin Group also felt that standardizing bash's behavior of %q/%Q
> for outputting quoted text, while too late for Issue 8, has a good
> chance of success, even though C says %q is reserved for
> standardization by C. Our reasoning there is that lots of libc over
> the years have used %qi as a synonym for %lli, and C would be foolish
> to burn %q for anything that does not match those semantics at the C
> language level; which means it will likely never be claimed by C and
> thus free for use by shell in the way that bash has already done.
[...]

Note that %q is from ksh93, not bash, and is not portable across
implementations. With most of them, including bash's, it gives output
that is not safe for reinput in arbitrary locales (as it uses
$'...' in some cases). I'm not sure it's a good idea to add it to
the standard; at the least it should come with fat warnings about
the risk in using it.

See also:

https://unix.stackexchange.com/questions/379181/escape-a-variable-for-use-as-content-of-another-script/600214#600214

-- 
Stephane




Information forwarded to bug-coreutils <at> gnu.org:
bug#65659; Package coreutils. (Fri, 01 Sep 2023 12:16:02 GMT) Full text and rfc822 format available.

Message #47 received at 65659 <at> debbugs.gnu.org (full text, mbox):

From: Eric Blake <eblake <at> redhat.com>
To: Stephane Chazelas <stephane <at> chazelas.org>
Cc: 65659 <at> debbugs.gnu.org, bug-bash <at> gnu.org, austin-group-l <at> opengroup.org,
 Chet Ramey <chet.ramey <at> case.edu>
Subject: Re: bug#65659: RFC: changing printf(1) behavior on %b
Date: Fri, 1 Sep 2023 07:15:14 -0500
On Fri, Sep 01, 2023 at 08:59:19AM +0100, Stephane Chazelas wrote:
> 2023-08-31 15:02:22 -0500, Eric Blake via austin-group-l at The Open Group:
> [...]
> > The current POSIX says that %b was added so that on a non-XSI
> > system, you could do:
> > 
> > my_echo() {
> >   printf %b\\n "$*"
> > }
> 
> That is dependent on the current value of $IFS. You'd need:
> 
> xsi_echo() (
>   IFS=' '
>   printf '%b\n' "$*"
> )

Let's read the standard in context (Issue 8 draft 3 page 2793 line 92595):

"
The printf utility can be used portably to emulate any of the traditional behaviors of the echo
utility as follows (assuming that IFS has its standard value or is unset):
• The historic System V echo and the requirements on XSI implementations in this volume of
  POSIX.1-202x are equivalent to:
    printf "%b\n" "$*"
"

So yes, the standard does mention the requirement to have a sane IFS,
and I failed to include that in my one-off implementation of
my_echo().  Thank you for pointing out a more robust version.

> 
> Or the other alternatives listed at
> https://unix.stackexchange.com/questions/65803/why-is-printf-better-than-echo/65819#65819
> 
> [...]
> > Bash already has shopt -s xpg_echo
> 
> Note that in bash, you need both
> 
> shopt -s xpg_echo
> set -o posix
> 
> To get a XSI echo. Without the latter, options are still
> recognised. You can get a XSI echo without those options with:
> 
> xsi_echo() {
>   local IFS=' ' -
>   set +o posix
>   echo -e "$*\n\c"
> }
> 
> The addition of those \n\c (noop) avoids arguments being treated as
> options if they start with -.

As an extension, Bash (and Coreutils) happen to honor \c always, and
not just for %b.  But POSIX only requires \c handling for %b.
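
For reference, a short sketch of the \c handling that POSIX does require for %b arguments: output stops at \c, and the rest of the argument and of the format is ignored.

```shell
# \c inside a %b argument ends all output from this printf call.
printf '%b' 'first\csecond'    # writes "first" only
printf '\n'
printf '%b|tail\n' 'x\cy'      # writes "x"; "|tail" and the newline are dropped
printf '\n'
```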

And while Issue 8 has taken steps to allow implementations to support
'echo -e', it is still not standardized behavior; so your xsi_echo()
is bash-specific (which is not necessarily a problem, as long as you
are aware it is not portable).

> [...]
> > The Austin Group also felt that standardizing bash's behavior of %q/%Q
> > for outputting quoted text, while too late for Issue 8, has a good
> > chance of success, even though C says %q is reserved for
> > standardization by C. Our reasoning there is that lots of libc over
> > the years have used %qi as a synonym for %lli, and C would be foolish
> > to burn %q for anything that does not match those semantics at the C
> > language level; which means it will likely never be claimed by C and
> > thus free for use by shell in the way that bash has already done.
> [...]
> 
> Note that %q is from ksh93, not bash and is not portable across
> implementations and with most including bash's gives an output
> that is not safe for reinput in arbitrary locales (as it uses
> $'...' in some cases), not sure  it's a good idea to add it to
> the standard, or at least it should come with fat warnings about
> the risk in using it.

%q is NOT being added to Issue 8, but $'...' is.  Bug 1771 asked if %q
could be added to Issue 8, but it came in past the deadline for
feature requests, so the best we could do is add a FUTURE DIRECTIONS
blurb that mentions the idea.  But since FUTURE DIRECTIONS is
non-normative, we can always change our mind in Issue 9 and delete
that text if it turns out we can't get consensus to standardize some
form of %q/%Q after all.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org





Information forwarded to bug-coreutils <at> gnu.org:
bug#65659; Package coreutils. (Fri, 01 Sep 2023 12:55:01 GMT) Full text and rfc822 format available.

Message #50 received at 65659 <at> debbugs.gnu.org (full text, mbox):

From: Eric Blake <eblake <at> redhat.com>
To: Phi Debian <phi.debian <at> gmail.com>
Cc: 65659 <at> debbugs.gnu.org, bug-bash <at> gnu.org, austin-group-l <at> opengroup.org,
 chet.ramey <at> case.edu
Subject: Re: bug#65659: RFC: changing printf(1) behavior on %b
Date: Fri, 1 Sep 2023 07:54:02 -0500
On Fri, Sep 01, 2023 at 07:19:13AM +0200, Phi Debian wrote:
> Well after reading yet another thread regarding libc_printf() I got to
> admit that even %B is crossed out, (Yet already chosen by ksh93)
>
> The other thread also speaks about libc_printf() documenting %# as
> undefined for things other than a, A, e, E, f, F, g, and G, yet the same
> thread also talks about a and A coming late (citing C99) to the dance, meaning
> what is undefined today becomes defined tomorrow, so %#b is no safer.
>

Caution: The proposal here is for %#s (an alternative string), not %#b
(which C2x wants to be similar to %#x, in that it outputs a '0b'
prefix for all values except bare '0').
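
The existing %#x and %#o conversions show the "alternative form" behavior that C2x's %#b is modeled on; a quick sketch with today's printf(1):

```shell
# '#' adds a base prefix to non-zero values; a bare 0 stays "0".
printf '%#x %#x\n' 255 0    # prints 0xff 0
printf '%#o\n' 8            # prints 010
```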

Yes, there is a slight risk that C may decide to define %#s.  But as
the Austin Group includes a member of WG14, we are able to advise the
C committee that such an addition is not wise.

> My guess is that printf(1) is now doomed to follow its route, keep its old
> format exception, and then maybe implement something like c_printf, like
> printf but where the format string follows libc semantics, or maybe a -C option to
> printf(1)...

Adding an option to printf is also a possibility, if there is
wide-spread implementation practice to standardize.  If someone wants
to implement 'printf -C' right now, that could help feed such a future
standardization.  But it is somewhat orthogonal to the request in this
thread, which is how to allow users to still access the old %b
behavior even if %b gets repurposed in the future; if we can get
multiple implementations to add a %#s alias now, it makes the future
decisions easier (even if it is too late for Issue 8 to add any new
features, or for that matter, to make any normative changes other than
marking %b obsolescent as a way to be able to revisit it in the future
for Issue 9).


> 
> Well in any case %b cannot change semantics in bash scripts, since it has
> been there for so long. Even if it departs from python, perl, libc, it is
> unfortunate but that's the way it is; nobody wants a semantic change, and on
> the next router update, to see the whole internet falling apart :-)

How many scripts in the wild actually use %b, though?  And if there
are such scripts, anything we can do to make it easy to do a drop-in
replacement that still preserves the old behavior (such as changing %b
to %#s) is going to be easier to audit than the only other
currently-portable alternative of actually analyzing the string to see
if it uses any octal or \c escapes that have to be re-written to
portably function as a printf format argument.
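
To make the audit problem concrete, here is a small sketch of how octal escapes are spelled differently in a %b argument and in the format operand. The sample strings are made up for illustration.

```shell
# In a %b argument an octal escape is written \0ddd (a zero, then up to
# three octal digits); in the format operand it is written \ddd.
printf '%b\n' 'A is \0101'    # prints A is A
printf 'A is \101\n'          # prints A is A
```

Moving a %b argument verbatim into the format would therefore change its meaning, since \0101 in a format is read as \010 followed by a literal 1. A plain rename of %b to %#s would avoid that rewrite.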

POSIX is not mandating %#s at this time, so much as suggesting that if
implementations are willing to implement it now, it will make Issue 9
easier to reason about.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org





Information forwarded to bug-coreutils <at> gnu.org:
bug#65659; Package coreutils. (Fri, 01 Sep 2023 15:33:01 GMT) Full text and rfc822 format available.

Message #53 received at 65659 <at> debbugs.gnu.org (full text, mbox):

From: Stephane Chazelas <stephane <at> chazelas.org>
To: Eric Blake <eblake <at> redhat.com>
Cc: 65659 <at> debbugs.gnu.org, bug-bash <at> gnu.org, austin-group-l <at> opengroup.org,
 Chet Ramey <chet.ramey <at> case.edu>
Subject: Re: bug#65659: RFC: changing printf(1) behavior on %b
Date: Fri, 1 Sep 2023 16:31:55 +0100
2023-09-01 07:15:14 -0500, Eric Blake:
[...]
> > Note that in bash, you need both
> > 
> > shopt -s xpg_echo
> > set -o posix
> > 
> > To get a XSI echo. Without the latter, options are still
> > recognised. You can get a XSI echo without those options with:
> > 
> > xsi_echo() {
> >   local IFS=' ' -
> >   set +o posix
> >   echo -e "$*\n\c"
> > }
> > 
> > The addition of those \n\c (noop) avoids arguments being treated as
> > options if they start with -.
> 
> As an extension, Bash (and Coreutils) happen to honor \c always, and
> not just for %b.  But POSIX only requires \c handling for %b.
> 
> And while Issue 8 has taken steps to allow implementations to support
> 'echo -e', it is still not standardized behavior; so your xsi_echo()
> is bash-specific (which is not necessarily a problem, as long as you
> are aware it is not portable).
[...]

Yes, none of local (from ash I believe), the posix option
(several shells have an option called posix, all used to improve
POSIX conformance; bash may have been the first) nor -e (from
Research Unix v8) is standard. That part was about bash
specifically (as the thread is also posted on gnu.bash.bug).

BTW, that xsi_echo is not strictly equivalent to a XSI echo in
the case where the last character of the last argument is an unescaped
backslash or a character whose encoding ends in the same byte as
the encoding of backslash.

-- 
Stephane




Information forwarded to bug-coreutils <at> gnu.org:
bug#65659; Package coreutils. (Fri, 01 Sep 2023 17:36:02 GMT) Full text and rfc822 format available.

Message #56 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Martin D Kealey <martin <at> kurahaupo.gen.nz>
To: Eric Blake <eblake <at> redhat.com>
Cc: bug-coreutils <at> gnu.org, bug-bash <bug-bash <at> gnu.org>,
 austin-group-l <at> opengroup.org
Subject: Re: RFC: changing printf(1) behavior on %b
Date: Sat, 2 Sep 2023 01:28:55 +1000
<devils_advocate>If compatibility with C is really that important,
shouldn't we be fixing %c? Its current behaviour as a synonym for %.1s
doesn't provide significant utility, and it arguably differs from C, where
%c means "take an int and output the corresponding single byte", not "take
the first byte of a string and output that".
</devils_advocate>

Whilst I wouldn't object to adding %#s (or %#b for that matter), I'm
uncomfortable about changing existing behaviour, especially when it's just
for the sake of linguistic simplicity in the standard.

Plenty of projects have functions that accept a format string and pass it
through to printf (sometimes with names like warnf, errorf, panicf); it
would be non trivial to locate indirect format string parameters. An
estimate of "a few years" is WAY short of the timeframe needed to weed out
old usage; embedded devices typically run the same version of bash from the
time they leave the factory until they reach the scrap disassembly plant
(or landfill) a decade or more later.

One of the benefits of printf over echo is that there aren't two mutually
incompatible ways of interpreting the data; this would take us back to the
bad old days of having to dynamically select the format string depending on
which version of the Shell the script is running under.
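
The kind of run-time selection being warned about might look like the following sketch. It assumes a future printf where %#s behaves like today's %b; the variable name esc_fmt and the probe string are hypothetical.

```shell
# Probe: if this printf expands escapes for %#s (the proposed alias),
# use it; otherwise fall back to the traditional %b.
if [ "$(printf '%#s' '\0101' 2>/dev/null)" = A ]; then
  esc_fmt='%#s'
else
  esc_fmt='%b'
fi

printf "$esc_fmt\n" 'A is \0101'    # escape processing either way
```

Every script that needs escape processing across old and new shells would have to carry a probe like this.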

Please no.

-Martin

On Fri, 1 Sept 2023 at 01:35, Eric Blake <eblake <at> redhat.com> wrote:

> In today's Austin Group call, we discussed the fact that printf(1) has
> mandated behavior for %b (escape sequence processing similar to XSI
> echo) that will eventually conflict with C2x's desire to introduce %b
> to printf(3) (to produce 0b000... binary literals).
>
> For POSIX Issue 8, we plan to mark the current semantics of %b in
> printf(1) as obsolescent (it would continue to work, because Issue 8
> targets C17 where there is no conflict with C2x), but with a Future
> Directions note that for Issue 9, we could remove %b entirely, or
> (more likely) make %b output binary literals just like C.  But that
> raises the question of whether the escape-sequence processing
> semantics of %b should still remain available under the standard,
> under some other spelling, since relying on XSI echo is still not
> portable.
>
> One of the observations made in the meeting was that currently, both
> the POSIX spec for printf(1) as seen at [1], and the POSIX and C
> standard (including the upcoming C2x standard) for printf(3) as seen
> at [3] state that both the ' and # flag modifiers are currently
> undefined when applied to %s.
>
> [1] https://pubs.opengroup.org/onlinepubs/9699919799/utilities/printf.html
> "The format operand shall be used as the format string described in
> XBD File Format Notation[2] with the following exceptions:..."
>
> [2]
> https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap05.html#tag_05
> "The flag characters and their meanings are: ...
> # The value shall be converted to an alternative form. For c, d, i, u,
>   and s conversion specifiers, the behavior is undefined.
> [and no mention of ']"
>
> [3] https://pubs.opengroup.org/onlinepubs/9699919799/functions/printf.html
> "The flag characters and their meanings are:
> ' [CX] [Option Start] (The <apostrophe>.) The integer portion of the
>   result of a decimal conversion ( %i, %d, %u, %f, %F, %g, or %G )
>   shall be formatted with thousands' grouping characters. For other
>   conversions the behavior is undefined. The non-monetary grouping
>   character is used. [Option End]
> ...
> # Specifies that the value is to be converted to an alternative
>   form. For o conversion, it shall increase the precision, if and only
>   if necessary, to force the first digit of the result to be a zero
>   (if the value and precision are both 0, a single 0 is printed). For
>   x or X conversion specifiers, a non-zero result shall have 0x (or
>   0X) prefixed to it. For a, A, e, E, f, F, g, and G conversion
>   specifiers, the result shall always contain a radix character, even
>   if no digits follow the radix character. Without this flag, a radix
>   character appears in the result of these conversions only if a digit
>   follows it. For g and G conversion specifiers, trailing zeros shall
>   not be removed from the result as they normally are. For other
>   conversion specifiers, the behavior is undefined."
>
> Thus, it appears that both %#s and %'s are available for use for
> future standardization.  Typing-wise, %#s as a synonym for %b is
> probably going to be easier (less shell escaping needed).  Is there
> any interest in a patch to coreutils or bash that would add such a
> synonym, to make it easier to leave that functionality in place for
> POSIX Issue 9 even when %b is repurposed to align with C2x?
>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.
> Virtualization:  qemu.org | libguestfs.org
>
>
>

Information forwarded to bug-coreutils <at> gnu.org:
bug#65659; Package coreutils. (Fri, 01 Sep 2023 18:11:01 GMT) Full text and rfc822 format available.

Message #59 received at 65659 <at> debbugs.gnu.org (full text, mbox):

From: Stephane Chazelas <stephane <at> chazelas.org>
To: Eric Blake <eblake <at> redhat.com>
Cc: Phi Debian <phi.debian <at> gmail.com>, 65659 <at> debbugs.gnu.org, bug-bash <at> gnu.org,
 austin-group-l <at> opengroup.org, chet.ramey <at> case.edu
Subject: Re: bug#65659: RFC: changing printf(1) behavior on %b
Date: Fri, 1 Sep 2023 19:10:24 +0100
2023-09-01 07:54:02 -0500, Eric Blake via austin-group-l at The Open Group:
[...]
> > Well in any case %b cannot change semantics in bash scripts, since it has
> > been there for so long. Even if it departs from python, perl, libc, it is
> > unfortunate but that's the way it is; nobody wants a semantic change, and on
> > the next router update, to see the whole internet falling apart :-)
> 
> How many scripts in the wild actually use %b, though?  And if there
> are such scripts, anything we can do to make it easy to do a drop-in
> replacement that still preserves the old behavior (such as changing %b
> to %#s) is going to be easier to audit than the only other
> currently-portable alternative of actually analyzing the string to see
> if it uses any octal or \c escapes that have to be re-written to
> portably function as a printf format argument.
[...]

FWIW, a "printf %b" github shell code search returns ~ 29k
entries
(https://github.com/search?q=printf+%25b+language%3AShell&type=code&l=Shell)

That likely returns only a small subset of the code that uses
printf with %b inside the format and probably a few false
positives, but that gives many examples of how printf %b is used
in practice.

printf %b is also what all serious literature about shell
scripting has been recommending to use in place of the
unportable echo -e (or XSI echo, or print without -r). That
includes the POSIX standard which has been recommending using
printf instead of the non-portable echo for 30 years.

So that change will also invalidate all those. It will take a
while before %#s is supported widely enough that %b can be
safely replaced with %#s.

-- 
Stephane




Information forwarded to bug-coreutils <at> gnu.org:
bug#65659; Package coreutils. (Sat, 02 Sep 2023 08:25:03 GMT) Full text and rfc822 format available.

Message #62 received at 65659 <at> debbugs.gnu.org (full text, mbox):

From: Steffen Nurpmeso <steffen <at> sdaoden.eu>
To: "Stephane Chazelas via austin-group-l at The Open Group"
 <austin-group-l <at> opengroup.org>
Cc: chet.ramey <at> case.edu, Phi Debian <phi.debian <at> gmail.com>, bug-bash <at> gnu.org,
 65659 <at> debbugs.gnu.org, Eric Blake <eblake <at> redhat.com>,
 Steffen Nurpmeso <steffen <at> sdaoden.eu>
Subject: Re: bug#65659: RFC: changing printf(1) behavior on %b
Date: Fri, 01 Sep 2023 23:28:50 +0200
Stephane Chazelas via austin-group-l at The Open Group wrote in
 <20230901181024.pwx4plwclz7ijv5a <at> chazelas.org>:
 |2023-09-01 07:54:02 -0500, Eric Blake via austin-group-l at The Open Group:
 ...
 |> How many scripts in the wild actually use %b, though?  And if there
 |> are such scripts, anything we can do to make it easy to do a drop-in
 |> replacement that still preserves the old behavior (such as changing %b
 |> to %#s) is going to be easier to audit than the only other
 |> currently-portable alternative of actually analyzing the string to see
 |> if it uses any octal or \c escapes that have to be re-written to
 |> portably function as a printf format argument.
 |[...]
 |
 |FWIW, a "printf %b" github shell code search returns ~ 29k
 |entries
 |(https://github.com/search?q=printf+%25b+language%3AShell&type=code&l=Shell)
 |
 |That likely returns only a small subset of the code that uses
 |printf with %b inside the format and probably a few false
 |positives, but that gives many examples of how printf %b is used
 |in practice.

Actually this returns a huge number of false positives where
printf(1) and %b are not on the same line, let alone in the same
command. If you just scroll down a bit it starts with matches like
this one from neovim:

 pr_title="${pr_title// /,}" # Replace spaces with commas.
 pr_title="$(printf 'vim-patch:%s' "${pr_title#,}")"

(bash only btw).
Furthermore it shows a huge amount of false use cases like

 printf >&2 "%b\n" "The following warnings and non-fatal errors were encountered during the installation process:"

This is only the first result page.
It seems people mostly think you need this to get colours, which
then, it has to be said, is also misled in practice.  (To the
best of *my* knowledge, that is.)

Ah it is a copy&paste world, and for one Stephane at stackoverflow
there are 99 that fool and mislead you, or do not know for sure
themselves, but also copy and paste!

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)




Information forwarded to bug-coreutils <at> gnu.org:
bug#65659; Package coreutils. (Sat, 02 Sep 2023 08:25:03 GMT) Full text and rfc822 format available.

Message #65 received at 65659 <at> debbugs.gnu.org (full text, mbox):

From: Phi Debian <phi.debian <at> gmail.com>
To: Stephane Chazelas <stephane <at> chazelas.org>
Cc: 65659 <at> debbugs.gnu.org, Eric Blake <eblake <at> redhat.com>, bug-bash <at> gnu.org,
 austin-group-l <at> opengroup.org, chet.ramey <at> case.edu
Subject: Re: bug#65659: RFC: changing printf(1) behavior on %b
Date: Sat, 2 Sep 2023 07:46:25 +0200
On Fri, Sep 1, 2023 at 8:10 PM Stephane Chazelas <stephane <at> chazelas.org>
wrote:

> 2023-09-01 07:54:02 -0500, Eric Blake via austin-group-l at The Open Group:
>
>
> FWIW, a "printf %b" github shell code search returns ~ 29k
> entries
> (https://github.com/search?q=printf+%25b+language%3AShell&type=code&l=Shell)
>
>
Ha super, at least some numbers :-), I didn't know we could make this kind
of request... thanks for that.

Information forwarded to bug-coreutils <at> gnu.org:
bug#65659; Package coreutils. (Sat, 02 Sep 2023 08:50:02 GMT) Full text and rfc822 format available.

Message #68 received at 65659 <at> debbugs.gnu.org (full text, mbox):

From: Stephane Chazelas <stephane <at> chazelas.org>
To: Stephane Chazelas via austin-group-l at The Open Group
 <austin-group-l <at> opengroup.org>, 
 Eric Blake <eblake <at> redhat.com>, Phi Debian <phi.debian <at> gmail.com>,
 chet.ramey <at> case.edu, 65659 <at> debbugs.gnu.org, bug-bash <at> gnu.org,
 Steffen Nurpmeso <steffen <at> sdaoden.eu>
Subject: Re: bug#65659: RFC: changing printf(1) behavior on %b
Date: Sat, 2 Sep 2023 09:49:12 +0100
2023-09-01 23:28:50 +0200, Steffen Nurpmeso via austin-group-l at The Open Group:
[...]
>  |FWIW, a "printf %b" github shell code search returns ~ 29k
>  |entries
>  |(https://github.com/search?q=printf+%25b+language%3AShell&type=code&l=Shell)
>  |
>  |That likely returns only a small subset of the code that uses
>  |printf with %b inside the format and probably a few false
>  |positives, but that gives many examples of how printf %b is used
>  |in practice.
> 
> Actually this returns a huge amount of false positives where
> printf(1) and %b are not on the same line, let alone the same
> command, if you just scroll down a bit it starts like neovim match
[...]

You're right, I only looked at the first few results and saw
that already gave interesting ones.

Apparently, we can also search with regexps and searching for
printf.*%b
(https://github.com/search?q=%2Fprintf.*%25b%2F+language%3AShell&type=code)
It's probably a lot more accurate. It returns ~ 19k.

(still FWIW, that's still just a sample of random code on the
internet)
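
The same regexp can be applied to a local tree to audit one's own scripts. A minimal sketch (count_b_uses is a hypothetical helper name; like the GitHub search, it only catches printf and %b on the same line):

```shell
# count_b_uses FILE...: count lines where "printf" is followed by "%b"
# on the same line, using the same pattern as the regexp search.
count_b_uses() {
  cat -- "$@" | grep -c 'printf.*%b'
}

# Example: a throwaway file with one matching line out of three.
tmp=$(mktemp)
printf '%s\n' "printf '%b\\n' x" "echo hi" "printf '%s' y" > "$tmp"
count_b_uses "$tmp"    # prints 1
rm -f "$tmp"
```

It misses format strings passed indirectly through variables or wrapper functions, so the count is a lower bound at best.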

[...]
> Furthermore it shows a huge amount of false use cases like
> 
>  printf >&2 "%b\n" "The following warnings and non-fatal errors were encountered during the installation process:"
[...]

Yes, I also see a lot of echo -e stuff that should have been
echo -E stuff (or echo alone in those (many) implementations
that don't expand by default, or better, the more reliable printf
with %s (not %b)).

> It seems people think you need this to get colours mostly, which
> then, it has to be said, is also practically mislead.  (To the
> best of *my* knowledge that is.)
[...]

Incidentally, ANSI terminal colour escape sequences are somewhat
connecting those two %b's as they are RGB (well BGR) in binary
(white is 7 = 0b111, red 0b001, green 0b010, blue 0b100), with:

R=0 G=1 B=1
printf '%bcyan%b\n' "\033[3$(( 2#$B$G$R ))m" '\033[m'

(with Korn-like shells, also $(( 0b$B$G$R )) in zsh though zsh
has builtin colour output support including RGB-based).
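
The 2#... base notation is not available in every POSIX shell, so here is a sketch of the same BGR bit arithmetic in plain POSIX arithmetic. The function name ansi_index is an assumption for this example.

```shell
# ansi_index R G B: combine three 0/1 flags into the ANSI colour index,
# with blue as the high bit and red as the low bit (BGR order).
ansi_index() {
  printf '%s\n' "$(( $3 * 4 + $2 * 2 + $1 ))"
}

ansi_index 0 1 1    # prints 6 (cyan)
ansi_index 1 0 0    # prints 1 (red)

# Same coloured output as above, still relying on %b for \033:
printf '%bcyan%b\n' "\033[3$(ansi_index 0 1 1)m" '\033[m'
```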

Speaking of stackexchange, on the June data dump of
unix.stackexchange.com:

stackexchange/unix.stackexchange.com$ xml2 < Posts.xml | grep -c 'printf.*%b'
494

(FWIW)

Compared with %d (though that will have entries for printf(3) as well):

stackexchange/unix.stackexchange.com$ xml2 < Posts.xml | grep -c 'printf.*%d'
3444

-- 
Stephane




Information forwarded to bug-coreutils <at> gnu.org:
bug#65659; Package coreutils. (Sat, 02 Sep 2023 18:15:01 GMT) Full text and rfc822 format available.

Message #71 received at 65659 <at> debbugs.gnu.org (full text, mbox):

From: Steffen Nurpmeso <steffen <at> sdaoden.eu>
To: Stephane Chazelas <stephane <at> chazelas.org>
Cc: chet.ramey <at> case.edu, Stephane Chazelas via austin-group-l at The Open Group
 <austin-group-l <at> opengroup.org>, Phi Debian <phi.debian <at> gmail.com>,
 bug-bash <at> gnu.org, 65659 <at> debbugs.gnu.org, Eric Blake <eblake <at> redhat.com>,
 Steffen Nurpmeso <steffen <at> sdaoden.eu>
Subject: Re: bug#65659: RFC: changing printf(1) behavior on %b
Date: Sat, 02 Sep 2023 19:57:11 +0200
Stephane Chazelas wrote in
 <20230902084912.vdfedsgbnat2w25n <at> chazelas.org>:
 |2023-09-01 23:28:50 +0200, Steffen Nurpmeso via austin-group-l at The Open Group:
 ...
 |>|FWIW, a "printf %b" github shell code search returns ~ 29k
 |>|entries
 |>|(https://github.com/search?q=printf+%25b+language%3AShell&type=code&l=Shell)
 ...
 |> Actually this returns a huge amount of false positives where
 |> printf(1) and %b are not on the same line, let alone the same
 ...
 |Apparently, we can also search with regexps and searching for
 |printf.*%b
 |(https://github.com/search?q=%2Fprintf.*%25b%2F+language%3AShell&type=code)
 |It's probably a lot more accurate. It returns ~ 19k.
 ...
 |> Furthermore it shows a huge amount of false use cases like
 ...
 |Yes, I also see a lot of echo -e stuff that should have been
 |echo -E stuff (or echo alone in those (many) implementations
 |that don't expand by default or use the more reliable printf
 |with %s (not %b)).
 |
 |> It seems people think you need this to get colours mostly, which
 ...
 |Incidentally, ANSI terminal colour escape sequences are somewhat
 |connecting those two %b's as they are RGB (well BGR) in binary
 |(white is 7 = 0b111, red 0b001, green 0b010, blue 0b100), with:
 |
 |R=0 G=1 B=1
 |printf '%bcyan%b\n' "\033[3$(( 2#$B$G$R ))m" '\033[m'
 |
 |(with Korn-like shells, also $(( 0b$B$G$R )) in zsh though zsh
 |has builtin colour output support including RGB-based).

..and, off-topic, but in my opinion that is also false usage, one
should use tput(1) instead, and then simply printf(1) (or echo(1)
(or cat(1))) the output, something like, fwiw :),

  color_init() {
          [ -n "${NO_COLOUR}" ] && return
          # We do not want color for "make test > .LOG"!
          if [ -t 1 ] && command -v tput >/dev/null 2>&1; then
                  { sgr0=$(tput sgr0); } 2>/dev/null
                  [ $? -eq 0 ] || return
                  { saf1=$(tput setaf 1); } 2>/dev/null
                  [ $? -eq 0 ] || return
                  { saf2=$(tput setaf 2); } 2>/dev/null
                  [ $? -eq 0 ] || return
                  { saf3=$(tput setaf 3); } 2>/dev/null
                  [ $? -eq 0 ] || return
                  { saf5=$(tput setaf 5); } 2>/dev/null
                  [ $? -eq 0 ] || return
                  { b=$(tput bold); } 2>/dev/null
                  [ $? -eq 0 ] || return

                  COLOR_ERR_ON=${saf1}${b} COLOR_ERR_OFF=${sgr0}
                  COLOR_DBGERR_ON=${saf5} COLOR_DBGERR_OFF=${sgr0}
                  COLOR_WARN_ON=${saf3}${b} COLOR_WARN_OFF=${sgr0}
                  COLOR_OK_ON=${saf2} COLOR_OK_OFF=${sgr0}
                  unset sgr0 saf1 saf2 saf3 saf5 b
          fi
  }

  ...

  printf '%s%s%s' "${COLOR_WARN_ON}" "$SOME_MSG" "${COLOR_WARN_OFF}"

Of course this is also only ANSI via sgr0 (:-|

 |Speaking of stackexchange, on the June data dump of
 |unix.stackexchange.com:
 |
 |stackexchange/unix.stackexchange.com$ xml2 < Posts.xml | grep -c 'printf.*%b'
 |494
 |
 |(FWIW)
 |
 |Compared with %d (though that will have entries for printf(3) as well):
 |
 |stackexchange/unix.stackexchange.com$ xml2 < Posts.xml | grep -c 'printf.*%d'
 |3444

I am totally stunned by the ratio.  I myself have never used %b
(like this, aka for printf).

 --End of <20230902084912.vdfedsgbnat2w25n <at> chazelas.org>

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)




This bug report was last modified 250 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.