GNU bug report logs - #63398
28.2; Doc or behavior of replacement commands (e.g. `replace-string')
Reported by: Drew Adams <drew.adams <at> oracle.com>
Date: Tue, 9 May 2023 20:14:01 UTC
Severity: normal
Found in version 28.2
Done: Eli Zaretskii <eliz <at> gnu.org>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 63398 in the body.
You can then email your comments to 63398 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-gnu-emacs <at> gnu.org: bug#63398; Package emacs. (Tue, 09 May 2023 20:14:01 GMT)
Acknowledgement sent to Drew Adams <drew.adams <at> oracle.com>: New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Tue, 09 May 2023 20:14:01 GMT)
Message #5 received at submit <at> debbugs.gnu.org:
emacs -Q
;; So `search-upper-case' is `not-yanks', and `case-fold-search' and
;; `case-replace' are both `t'.
In *scratch* enter this text:
  Test 0
  test 0
At bob, use `M-x replace-string RET test 0 RET test 1 RET'
(emacs) `Replacement and Lax Matches' seems to say that the result
should be this:
  Test 1
  test 1
But the result is this:
  test 1
  test 1
That doc says this:
If the first argument of a replace command is all lower case, the
command ignores case while searching for occurrences to replace-provided
'case-fold-search' is non-'nil' and 'search-upper-case' is also
non-'nil'.
OK, that's respected; both lines are found during the search. Good.
___
BTW, it's unfortunate that we use an em dash char here, with no
preceding or following space chars. Why? Because it reads as if it
were a hyphen, producing adjective "replace-provided" modifying noun
`case-fold-search'. Since we use fixed-width fonts by default, this is
all the more apparent. Please reword or surround the em dash with space
chars.
___
The doc also says this, however, regarding replacement:
In addition, when the NEWSTRING argument is all or partly lower case,
replacement commands try to preserve the case pattern of each
occurrence. Thus, the command
M-x replace-string <RET> foo <RET> bar <RET>
replaces a lower case 'foo' with a lower case 'bar', an all-caps 'FOO'
with 'BAR', and a capitalized 'Foo' with 'Bar'. (These three
alternatives-lower case, all caps, and capitalized, are the only ones
that 'replace-string' can distinguish.)
My reading of this is that, since "test 1" is lower-case, the
replacement should "try" (meaning what, exactly? under what
circumstances does such a trial "fail"?) to preserve the case pattern of
the first occurrence, changing "Test 0" to "Test 1". That doesn't
happen.
Is the doc wrong? Is my reading of it wrong? If my reading and the doc
are right, is the behavior wrong (bugged)?
___
[It's also not very good to refer to argument NEWSTRING in a topic/node
that doesn't define it. Users have to look backward through the doc to
see if they can find out which argument this is talking about.]
___
In GNU Emacs 28.2 (build 2, x86_64-w64-mingw32)
of 2022-09-13 built on AVALON
Windowing system distributor 'Microsoft Corp.', version 10.0.19045
System Description: Microsoft Windows 10 Pro (v10.0.2009.19045.2846)
Configured using:
'configure --with-modules --without-dbus --with-native-compilation
--without-compress-install CFLAGS=-O2'
Configured features:
ACL GIF GMP GNUTLS HARFBUZZ JPEG JSON LCMS2 LIBXML2 MODULES NATIVE_COMP
NOTIFY W32NOTIFY PDUMPER PNG RSVG SOUND THREADS TIFF TOOLKIT_SCROLL_BARS
XPM ZLIB
(NATIVE_COMP present but libgccjit not available)
Reply sent to Eli Zaretskii <eliz <at> gnu.org>: You have taken responsibility. (Wed, 10 May 2023 13:37:02 GMT)
Notification sent to Drew Adams <drew.adams <at> oracle.com>: bug acknowledged by developer. (Wed, 10 May 2023 13:37:02 GMT)
Message #10 received at 63398-done <at> debbugs.gnu.org:
> From: Drew Adams <drew.adams <at> oracle.com>
> Date: Tue, 9 May 2023 20:13:04 +0000
>
> The doc also says this, however, regarding replacement:
>
> In addition, when the NEWSTRING argument is all or partly lower case,
> replacement commands try to preserve the case pattern of each
> occurrence. Thus, the command
>
> M-x replace-string <RET> foo <RET> bar <RET>
>
> replaces a lower case 'foo' with a lower case 'bar', an all-caps 'FOO'
> with 'BAR', and a capitalized 'Foo' with 'Bar'. (These three
> alternatives-lower case, all caps, and capitalized, are the only ones
> that 'replace-string' can distinguish.)
>
> My reading of this is that, since "test 1" is lower-case, the
> replacement should "try" (meaning what, exactly? under what
> circumstances does such a trial "fail"?) to preserve the case pattern of
> the first occurrence, changing "Test 0" to "Test 1". That doesn't
> happen.
>
> Is the doc wrong? Is my reading of it wrong? If my reading and the doc
> are right, is the behavior wrong (bugged)?
The manual says "try", and for a good reason. There's a heuristic
involved that tries to DTRT. The "when the NEWSTRING argument is all
or partly lower case" part is relevant. What you expect will happen
if the original text doesn't include digits, as in
  Testing
  testing
  M-x replace-string RET testing RET foobar RET
> [It's also not very good to refer to argument NEWSTRING in a topic/node
> that doesn't define it. Users have to look backward through the doc to
> see if they can find out which argument this is talking about.]
Fixed.
> BTW, it's unfortunate that we use an em dash char here, with no
> preceding or following space chars. Why? Because it reads as if it
> were a hyphen, producing adjective "replace-provided" modifying noun
> `case-fold-search'. Since we use fixed-width fonts by default, this is
> all the more apparent. Please reword or surround the em dash with space
> chars.
In your post the em dash was the ASCII character '-', but on my system
it is an actual em dash -- a much longer character, thus the confusion
is unlikely. As for why there are no spaces -- that's our style.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#63398; Package emacs. (Wed, 10 May 2023 14:21:02 GMT)
Message #13 received at 63398-done <at> debbugs.gnu.org:
> The manual says "try", and for a good reason. There's a heuristic
> involved that tries to DTRT. The "when the NEWSTRING argument is all
> or partly lower case" part is relevant.
Yes, I assumed that. But seeing "try" still can make
a reader wonder.
> What you expect will happen
> if the original text doesn't include digits, as in
>
> Testing
> testing
>
> M-x replace-string RET testing RET foobar RET
Yes, I know. That's why I included the digits - it's
this case that seems not to follow what the doc says.
Are you perhaps connecting this with your previous
sentence, about success of a "trial" depending on
NEWSTRING being partly or all lower case? Are you
saying that if there are non-letter chars then what
the doc says might not happen because trying doesn't
succeed?
I guess it's not clear to me whether you're saying
that the behavior isn't what it should be (per the
doc) in this case, but that's unavoidable or OK, or
you're saying that the behavior does follow the doc,
and the doc is trying to say that the behavior
follows what it says only if there are no digits?
There are several variables that can affect the
behavior, which makes trying to describe (doc) and
trying to understand (reader) the behavior not so
easy.
FWIW, this was raised by a user question on reddit:
https://www.reddit.com/r/emacs/comments/13d1a5x/replacestring_keeping_case_of_the_matched_string
You closed this as fixed, but I still find the doc
- or the behavior - unclear wrt this example.
Could you maybe (e.g. here) explain a bit more how
the behavior fits the description?
> > [It's also not very good to refer to argument NEWSTRING in a topic/node
> > that doesn't define it. Users have to look backward through the doc to
> > see if they can find out which argument this is talking about.]
>
> Fixed.
Thx.
> In your post the em dash was the ASCII character '-',
Dunno how that happened; sorry. It's an em dash in
the Emacs text. And with a fixed-width font (default,
emacs -Q) the problem I cited is real.
> but on my system it is an actual em dash -- a much
> longer character, thus the confusion
> is unlikely.
How can it be a longer char, if the font is fixed width?
> As for why there are no spaces -- that's our style.
That's fine, provided the em dash is actually longer
than the fixed width.
(Typographic practice varies, but a thin space is often
or typically used on each side of an em dash.)
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#63398; Package emacs. (Wed, 10 May 2023 15:27:01 GMT)
Message #16 received at 63398-done <at> debbugs.gnu.org:
> From: Drew Adams <drew.adams <at> oracle.com>
> CC: "63398-done <at> debbugs.gnu.org" <63398-done <at> debbugs.gnu.org>
> Date: Wed, 10 May 2023 14:20:10 +0000
>
> > What you expect will happen
> > if the original text doesn't include digits, as in
> >
> > Testing
> > testing
> >
> > M-x replace-string RET testing RET foobar RET
>
> Yes, I know. That's why I included the digits - it's
> this case that seems not to follow what the doc says.
It's too bad you kept silent about that, because it took me some time
to discover the reason. Why post riddles if you already know part
of the answer?
> Are you perhaps connecting this with your previous
> sentence, about success of a "trial" depending on
> NEWSTRING being partly or all lower case? Are you
> saying that if there are non-letter chars then what
> the doc says might not happen because trying doesn't
> succeed?
Yes.
> I guess it's not clear to me whether you're saying
> that the behavior isn't what it should be (per the
> doc) in this case, but that's unavoidable or OK, or
> you're saying that the behavior does follow the doc,
> and the doc is trying to say that the behavior
> follows what it says only if there are no digits?
The latter.
> You closed this as fixed, but I still find the doc
> - or the behavior - unclear wrt this example.
> Could you maybe (e.g. here) explain a bit more how
> the behavior fits the description?
I don't know what exactly happens and when, and thus cannot say more.
Feel free to study the code and find out. Or maybe someone else will
be able to describe the behavior in more detail.
> > but on my system it is an actual em dash -- a much
> > longer character, thus the confusion
> > is unlikely.
>
> How can it be a longer char, if the font is fixed width?
The ASCII dash has whitespace around it, which em dash lacks.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#63398; Package emacs. (Wed, 10 May 2023 15:51:01 GMT)
Message #19 received at 63398-done <at> debbugs.gnu.org:
> > > but on my system it is an actual em dash -- a much
> > > longer character, thus the confusion
> > > is unlikely.
> >
> > How can it be a longer char, if the font is fixed width?
>
> The ASCII dash has whitespace around it, which em dash lacks.
In any case, what I see with emacs -Q (in Emacs 28.2)
is that the em dash, with no surrounding space chars,
seems to have the same width as all of the other
fixed-width chars. Hence it _appears_ as if it were
adjective "replace-provided". See attached screenshot.
___
Also, "lower case" is better as "lowercase".
https://english.stackexchange.com/a/59413
[throw-em-space-in-Info.png (image/png, attachment)]
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#63398; Package emacs. (Wed, 10 May 2023 16:58:02 GMT)
Message #22 received at 63398 <at> debbugs.gnu.org:
>> You closed this as fixed, but I still find the doc
>> - or the behavior - unclear wrt this example.
>> Could you maybe (e.g. here) explain a bit more how
>> the behavior fits the description?
>
> I don't know what exactly happens and when, and thus cannot say more.
> Feel free to study the code and find out. Or maybe someone else will
> be able to describe the behavior in more detail.
The rules of replacement case-folding are more complex than documented.
`replace-match' checks if the initial is a caseless word constituent
like "0", and treats that like a lowercase initial.
So "test a → test b" replaces "Test A" with "Test B",
but "test 0 → test 1" replaces "Test 0" with "test 1".
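The two cases above can be made concrete with a simplified Python model
of the heuristic as it is described in this message. This is only a
sketch under stated assumptions, not the Emacs implementation: the real
logic lives in C inside `replace-match' and may differ in detail. The
function names (case_action, upcase_initials, replace_preserving_case)
are hypothetical, "word constituent" is approximated with
str.isalnum(), the search is assumed to have matched already, and
single-letter matches as well as the `case-replace' and
`case-fold-search' settings are ignored.

```python
def case_action(matched: str) -> str:
    """Classify matched text as 'all_caps', 'cap_initial', or 'nochange'."""
    prev = "\n"  # treat the match as preceded by a non-word character
    some_lowercase = False
    some_multiletter_word = False
    some_nonuppercase_initial = False
    for c in matched:
        # Assumption: alphanumeric characters stand in for word syntax.
        at_word_start = not prev.isalnum()
        if c.islower():
            some_lowercase = True
            if at_word_start:
                some_nonuppercase_initial = True
            else:
                some_multiletter_word = True
        elif c.isupper():
            if not at_word_start:
                some_multiletter_word = True
        elif at_word_start and c.isalnum():
            # A caseless word constituent such as "0" that starts a word
            # is treated like a lowercase initial.
            some_nonuppercase_initial = True
        prev = c
    if some_multiletter_word and not some_lowercase:
        return "all_caps"
    if some_multiletter_word and not some_nonuppercase_initial:
        return "cap_initial"
    return "nochange"


def upcase_initials(text: str) -> str:
    """Uppercase the first character of every word, leave the rest alone."""
    out = []
    prev = "\n"
    for c in text:
        out.append(c.upper() if not prev.isalnum() else c)
        prev = c
    return "".join(out)


def replace_preserving_case(matched: str, new: str) -> str:
    """Return NEW adjusted to the case pattern detected in MATCHED."""
    action = case_action(matched)
    if action == "all_caps":
        return new.upper()
    if action == "cap_initial":
        return upcase_initials(new)
    return new


if __name__ == "__main__":
    for old, new in [("Test A", "test b"), ("Test 0", "test 1"),
                     ("Testing", "foobar"), ("FOO", "bar")]:
        print(repr(old), "->", repr(replace_preserving_case(old, new)))
```

In this model "Test 0" fails the "every word starts with an uppercase
letter" check because of the word "0", so the replacement is inserted
unchanged, while "Test A" passes the check and the replacement is
capitalized word by word.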
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#63398; Package emacs. (Wed, 10 May 2023 17:05:02 GMT)
Message #25 received at 63398 <at> debbugs.gnu.org:
> The rules of replacement case-folding are more complex than documented.
> `replace-match' checks if the initial is a caseless word constituent
> like "0", and treats that like a lowercase initial.
>
> So "test a → test b" replaces "Test A" with "Test B",
> but "test 0 → test 1" replaces "Test 0" with "test 1".
Can we fix this? Should the behavior be changed?
Should the behavior remain like this and the doc
be changed?
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#63398; Package emacs. (Thu, 11 May 2023 06:31:01 GMT)
Message #28 received at 63398 <at> debbugs.gnu.org:
>> The rules of replacement case-folding are more complex than documented.
>> `replace-match' checks if the initial is a caseless word constituent
>> like "0", and treats that like a lowercase initial.
>>
>> So "test a → test b" replaces "Test A" with "Test B",
>> but "test 0 → test 1" replaces "Test 0" with "test 1".
>
> Can we fix this? Should the behavior be changed?
I guess the default should never change.
But maybe the rules could be customized.
> Should the behavior remain like this and the doc
> be changed?
The current implementation of rules is quite complex.
Not sure if all details can be documented succinctly.
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 08 Jun 2023 11:24:10 GMT)