GNU bug report logs - #63398
28.2; Doc or behavior of replacement commands (e.g. `replace-string')

Package: emacs;

Reported by: Drew Adams <drew.adams <at> oracle.com>

Date: Tue, 9 May 2023 20:14:01 UTC

Severity: normal

Found in version 28.2

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 63398 in the body.
You can then email your comments to 63398 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#63398; Package emacs. (Tue, 09 May 2023 20:14:01 GMT)

Acknowledgement sent to Drew Adams <drew.adams <at> oracle.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Tue, 09 May 2023 20:14:01 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Drew Adams <drew.adams <at> oracle.com>
To: "bug-gnu-emacs <at> gnu.org" <bug-gnu-emacs <at> gnu.org>
Subject: 28.2; Doc or behavior of replacement commands (e.g. `replace-string')
Date: Tue, 9 May 2023 20:13:04 +0000
emacs -Q

;; So `search-upper-case' is `not-yanks', and `case-fold-search' and
;; `case-replace' are both `t'.

In *scratch* enter this text:

Test 0
test 0

At bob, use `M-x replace-string RET test 0 RET test 1 RET'

(emacs) `Replacement and Lax Matches' seems to say that the result
should be this:

Test 1
test 1

But the result is this:

test 1
test 1

That doc says this:

  If the first argument of a replace command is all lower case, the
  command ignores case while searching for occurrences to replace-provided
  'case-fold-search' is non-'nil' and 'search-upper-case' is also
  non-'nil'.

OK, that's respected; both lines are found during the search.  Good.
___

BTW, it's unfortunate that we use an em dash char here, with no
preceding or following space chars.  Why?  Because it reads as if it
were a hyphen, producing adjective "replace-provided" modifying noun
`case-fold-search'.  Since we use fixed-width fonts by default, this is
all the more apparent.  Please reword or surround the em dash with space
chars.
___

The doc also says this, however, regarding replacement:

  In addition, when the NEWSTRING argument is all or partly lower case,
  replacement commands try to preserve the case pattern of each
  occurrence.  Thus, the command

     M-x replace-string <RET> foo <RET> bar <RET>

  replaces a lower case 'foo' with a lower case 'bar', an all-caps 'FOO'
  with 'BAR', and a capitalized 'Foo' with 'Bar'.  (These three
  alternatives-lower case, all caps, and capitalized, are the only ones
  that 'replace-string' can distinguish.)

My reading of this is that, since "test 1" is lower-case, the
replacement should "try" (meaning what, exactly? under what
circumstances does such a trial "fail"?) to preserve the case pattern of
the first occurrence, changing "Test 0" to "Test 1".  That doesn't
happen.

Is the doc wrong?  Is my reading of it wrong?  If my reading and the doc
are right, is the behavior wrong (bugged)?
___

[It's also not very good to refer to argument NEWSTRING in a topic/node
that doesn't define it. Users have to look backward through the doc to
see if they can find out which argument this is talking about.]
___


In GNU Emacs 28.2 (build 2, x86_64-w64-mingw32)
 of 2022-09-13 built on AVALON
Windowing system distributor 'Microsoft Corp.', version 10.0.19045
System Description: Microsoft Windows 10 Pro (v10.0.2009.19045.2846)

Configured using:
 'configure --with-modules --without-dbus --with-native-compilation
 --without-compress-install CFLAGS=-O2'

Configured features:
ACL GIF GMP GNUTLS HARFBUZZ JPEG JSON LCMS2 LIBXML2 MODULES NATIVE_COMP
NOTIFY W32NOTIFY PDUMPER PNG RSVG SOUND THREADS TIFF TOOLKIT_SCROLL_BARS
XPM ZLIB

(NATIVE_COMP present but libgccjit not available)





Reply sent to Eli Zaretskii <eliz <at> gnu.org>:
You have taken responsibility. (Wed, 10 May 2023 13:37:02 GMT)

Notification sent to Drew Adams <drew.adams <at> oracle.com>:
bug acknowledged by developer. (Wed, 10 May 2023 13:37:02 GMT)

Message #10 received at 63398-done <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Drew Adams <drew.adams <at> oracle.com>
Cc: 63398-done <at> debbugs.gnu.org
Subject: Re: bug#63398: 28.2;
 Doc or behavior of replacement commands (e.g. `replace-string')
Date: Wed, 10 May 2023 16:37:09 +0300
> From: Drew Adams <drew.adams <at> oracle.com>
> Date: Tue, 9 May 2023 20:13:04 +0000
> 
> The doc also says this, however, regarding replacement:
> 
>   In addition, when the NEWSTRING argument is all or partly lower case,
>   replacement commands try to preserve the case pattern of each
>   occurrence.  Thus, the command
> 
>      M-x replace-string <RET> foo <RET> bar <RET>
> 
>   replaces a lower case 'foo' with a lower case 'bar', an all-caps 'FOO'
>   with 'BAR', and a capitalized 'Foo' with 'Bar'.  (These three
>   alternatives-lower case, all caps, and capitalized, are the only ones
>   that 'replace-string' can distinguish.)
> 
> My reading of this is that, since "test 1" is lower-case, the
> replacement should "try" (meaning what, exactly? under what
> circumstances does such a trial "fail"?) to preserve the case pattern of
> the first occurrence, changing "Test 0" to "Test 1".  That doesn't
> happen.
> 
> Is the doc wrong?  Is my reading of it wrong?  If my reading and the doc
> are right, is the behavior wrong (bugged)?

The manual says "try", and for a good reason.  There's a heuristic
involved that tries to DTRT.  The "when the NEWSTRING argument is all
or partly lower case" part is relevant.  What you expect will happen
if the original text doesn't include digits, as in

  Testing
  testing

  M-x replace-string RET testing RET foobar RET

> [It's also not very good to refer to argument NEWSTRING in a topic/node
> that doesn't define it. Users have to look backward through the doc to
> see if they can find out which argument this is talking about.]

Fixed.

> BTW, it's unfortunate that we use an em dash char here, with no
> preceding or following space chars.  Why?  Because it reads as if it
> were a hyphen, producing adjective "replace-provided" modifying noun
> `case-fold-search'.  Since we use fixed-width fonts by default, this is
> all the more apparent.  Please reword or surround the em dash with space
> chars.

In your post the em dash was the ASCII character '-', but on my system
it is an actual em dash -- a much longer character, thus the confusion
is unlikely.  As for why there are no spaces -- that's our style.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63398; Package emacs. (Wed, 10 May 2023 14:21:02 GMT)

Message #13 received at 63398-done <at> debbugs.gnu.org:

From: Drew Adams <drew.adams <at> oracle.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: "63398-done <at> debbugs.gnu.org" <63398-done <at> debbugs.gnu.org>
Subject: RE: [External] : Re: bug#63398: 28.2; Doc or behavior of replacement
 commands (e.g. `replace-string')
Date: Wed, 10 May 2023 14:20:10 +0000
> The manual says "try", and for a good reason.  There's a heuristic
> involved that tries to DTRT. The "when the NEWSTRING argument is all
> or partly lower case" part is relevant.

Yes, I assumed that.  But seeing "try" still can make
a reader wonder.

> What you expect will happen
> if the original text doesn't include digits, as in
> 
>   Testing
>   testing
> 
>   M-x replace-string RET testing RET foobar RET

Yes, I know.  That's why I included the digits - it's
this case that seems not to follow what the doc says.

Are you perhaps connecting this with your previous
sentence, about success of a "trial" depending on
NEWSTRING being partly or all lower case?  Are you
saying that if there are non-letter chars then what
the doc says might not happen because trying doesn't
succeed?

I guess it's not clear to me whether you're saying
that the behavior isn't what it should be (per the
doc) in this case, but that's unavoidable or OK, or
you're saying that the behavior does follow the doc,
and the doc is trying to say that the behavior
follows what it says only if there are no digits?

There are several variables that can affect the
behavior, which makes trying to describe (doc) and
trying to understand (reader) the behavior not so
easy.

FWIW, this was raised by a user question on reddit:

https://www.reddit.com/r/emacs/comments/13d1a5x/replacestring_keeping_case_of_the_matched_string

You closed this as fixed, but I still find the doc
- or the behavior - unclear wrt this example.
Could you maybe (e.g. here) explain a bit more how
the behavior fits the description?

> > [It's also not very good to refer to argument NEWSTRING in a topic/node
> > that doesn't define it. Users have to look backward through the doc to
> > see if they can find out which argument this is talking about.]
> 
> Fixed.

Thx.

> In your post the em dash was the ASCII character '-',

Dunno how that happened; sorry.  It's an em dash in
the Emacs text.  And with a fixed-width font (default,
emacs -Q) the problem I cited is real.

> but on my system it is an actual em dash -- a much
> longer character, thus the confusion
> is unlikely.

How can it be a longer char, if the font is fixed width?

> As for why there are no spaces -- that's our style.

That's fine, provided the em dash is actually longer
than the fixed width.

(Typographic practice varies, but a thin space is often
or typically used on each side of an em dash.)




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63398; Package emacs. (Wed, 10 May 2023 15:27:01 GMT)

Message #16 received at 63398-done <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Drew Adams <drew.adams <at> oracle.com>
Cc: 63398-done <at> debbugs.gnu.org
Subject: Re: [External] : Re: bug#63398: 28.2; Doc or behavior of replacement
 commands (e.g. `replace-string')
Date: Wed, 10 May 2023 18:27:03 +0300
> From: Drew Adams <drew.adams <at> oracle.com>
> CC: "63398-done <at> debbugs.gnu.org" <63398-done <at> debbugs.gnu.org>
> Date: Wed, 10 May 2023 14:20:10 +0000
> 
> > What you expect will happen
> > if the original text doesn't include digits, as in
> > 
> >   Testing
> >   testing
> > 
> >   M-x replace-string RET testing RET foobar RET
> 
> Yes, I know.  That's why I included the digits - it's
> this case that seems not to follow what the doc says.

It's too bad you kept silent about that, because it took me some time
to discover the reason.  Why post riddles if you already know part
of the answer?

> Are you perhaps connecting this with your previous
> sentence, about success of a "trial" depending on
> NEWSTRING being partly or all lower case?  Are you
> saying that if there are non-letter chars then what
> the doc says might not happen because trying doesn't
> succeed?

Yes.

> I guess it's not clear to me whether you're saying
> that the behavior isn't what it should be (per the
> doc) in this case, but that's unavoidable or OK, or
> you're saying that the behavior does follow the doc,
> and the doc is trying to say that the behavior
> follows what it says only if there are no digits?

The latter.

> You closed this as fixed, but I still find the doc
> - or the behavior - unclear wrt this example.
> Could you maybe (e.g. here) explain a bit more how
> the behavior fits the description?

I don't know what exactly happens and when, and thus cannot say more.
Feel free to study the code and find out.  Or maybe someone else will
be able to describe the behavior in more detail.

> > but on my system it is an actual em dash -- a much
> > longer character, thus the confusion
> > is unlikely.
> 
> How can it be a longer char, if the font is fixed width?

The ASCII dash has whitespace around it, which em dash lacks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63398; Package emacs. (Wed, 10 May 2023 15:51:01 GMT)

Message #19 received at 63398-done <at> debbugs.gnu.org:

From: Drew Adams <drew.adams <at> oracle.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: "63398-done <at> debbugs.gnu.org" <63398-done <at> debbugs.gnu.org>
Subject: RE: [External] : Re: bug#63398: 28.2; Doc or behavior of replacement
 commands (e.g. `replace-string')
Date: Wed, 10 May 2023 15:50:47 +0000
> > > but on my system it is an actual em dash -- a much
> > > longer character, thus the confusion
> > > is unlikely.
> >
> > How can it be a longer char, if the font is fixed width?
> 
> The ASCII dash has whitespace around it, which em dash lacks.

In any case, what I see with emacs -Q (in Emacs 28.2)
is that the em dash, with no surrounding space chars,
seems to have the same width as all of the other
fixed-width chars.  Hence it _appears_ as if it were
adjective "replace-provided".  See attached screenshot.
___

Also, "lower case" is better as "lowercase".

https://english.stackexchange.com/a/59413
[throw-em-space-in-Info.png (image/png, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63398; Package emacs. (Wed, 10 May 2023 16:58:02 GMT)

Message #22 received at 63398 <at> debbugs.gnu.org:

From: Juri Linkov <juri <at> linkov.net>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 63398 <at> debbugs.gnu.org, Drew Adams <drew.adams <at> oracle.com>
Subject: Re: bug#63398: 28.2; Doc or behavior of replacement commands (e.g.
 `replace-string')
Date: Wed, 10 May 2023 19:46:35 +0300
>> You closed this as fixed, but I still find the doc
>> - or the behavior - unclear wrt this example.
>> Could you maybe (e.g. here) explain a bit more how
>> the behavior fits the description?
>
> I don't know what exactly happens and when, and thus cannot say more.
> Feel free to study the code and find out.  Or maybe someone else will
> be able to describe the behavior in more detail.

The rules of replacement case-folding are more complex than documented.
`replace-match' checks if the initial is a caseless word constituent
like "0", and treats that like a lowercase initial.

So  "test a → test b" replaces "Test A" with "Test B",
but "test 0 → test 1" replaces "Test 0" with "test 1".
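
The two cases above can be modelled roughly as follows.  This is only an
illustrative sketch in Python, not the Emacs implementation (the real
logic is C code behind `replace-match' and may differ in details).  The
function names `case_action' and `replace_preserving_case', the flag
names, and the use of `str.isalnum' as a stand-in for "word constituent"
are all assumptions made for this example.

```python
def case_action(matched: str) -> str:
    """Classify matched text as 'all_caps', 'cap_initial', or 'nochange'.

    Hypothetical model: a character counts as a word constituent when
    str.isalnum() is true (an assumption, not Emacs's syntax table).
    """
    some_lowercase = False
    some_uppercase = False
    some_multiletter_word = False
    some_nonuppercase_initial = False
    prev = "\n"  # treat the match as preceded by a non-word character

    for ch in matched:
        starts_word = not prev.isalnum()
        if ch.islower():
            some_lowercase = True
            if starts_word:
                some_nonuppercase_initial = True
            else:
                some_multiletter_word = True
        elif ch.isupper():
            some_uppercase = True
            if not starts_word:
                some_multiletter_word = True
        elif starts_word:
            # A caseless character such as "0" at the start of a word
            # is treated like a lowercase initial.
            some_nonuppercase_initial = True
        prev = ch

    if not some_lowercase and some_multiletter_word:
        return "all_caps"
    if not some_nonuppercase_initial and some_multiletter_word:
        return "cap_initial"
    if not some_nonuppercase_initial and some_uppercase:
        return "all_caps"
    return "nochange"


def replace_preserving_case(matched: str, replacement: str) -> str:
    """Apply the case pattern detected in MATCHED to REPLACEMENT."""
    action = case_action(matched)
    if action == "all_caps":
        return replacement.upper()
    if action == "cap_initial":
        out = []
        prev = "\n"
        for ch in replacement:
            # Upcase only the first character of each word.
            out.append(ch.upper() if not prev.isalnum() else ch)
            prev = ch
        return "".join(out)
    return replacement


print(replace_preserving_case("Test A", "test b"))  # Test B
print(replace_preserving_case("Test 0", "test 1"))  # test 1
```

In this model a word that begins with a caseless character such as "0"
rules out the "capitalized" pattern for the whole match, which is why
"Test 0" is replaced with an unchanged "test 1".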




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63398; Package emacs. (Wed, 10 May 2023 17:05:02 GMT)

Message #25 received at 63398 <at> debbugs.gnu.org:

From: Drew Adams <drew.adams <at> oracle.com>
To: Juri Linkov <juri <at> linkov.net>, Eli Zaretskii <eliz <at> gnu.org>
Cc: "63398 <at> debbugs.gnu.org" <63398 <at> debbugs.gnu.org>
Subject: RE: [External] : Re: bug#63398: 28.2; Doc or behavior of replacement
 commands (e.g. `replace-string')
Date: Wed, 10 May 2023 17:03:57 +0000
> The rules of replacement case-folding are more complex than documented.
> `replace-match' checks if the initial is a caseless word constituent
> like "0", and treats that like a lowercase initial.
> 
> So  "test a → test b" replaces "Test A" with "Test B",
> but "test 0 → test 1" replaces "Test 0" with "test 1".

Can we fix this?  Should the behavior be changed?
Should the behavior remain like this and the doc
be changed?
	

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63398; Package emacs. (Thu, 11 May 2023 06:31:01 GMT)

Message #28 received at 63398 <at> debbugs.gnu.org:

From: Juri Linkov <juri <at> linkov.net>
To: Drew Adams <drew.adams <at> oracle.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>,
 "63398 <at> debbugs.gnu.org" <63398 <at> debbugs.gnu.org>
Subject: Re: [External] : Re: bug#63398: 28.2; Doc or behavior of
 replacement commands (e.g. `replace-string')
Date: Thu, 11 May 2023 09:23:00 +0300
>> The rules of replacement case-folding are more complex than documented.
>> `replace-match' checks if the initial is a caseless word constituent
>> like "0", and treats that like a lowercase initial.
>>
>> So  "test a → test b" replaces "Test A" with "Test B",
>> but "test 0 → test 1" replaces "Test 0" with "test 1".
>
> Can we fix this?  Should the behavior be changed?

I guess the default should never change.
But maybe the rules could be customized.

> Should the behavior remain like this and the doc
> be changed?

The current implementation of rules is quite complex.
Not sure if all details can be documented succinctly.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 08 Jun 2023 11:24:10 GMT)


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.