GNU bug report logs - #81169
32.0.50; [PATCH] speed up replace-region-contents

Please note: This is a static page, with minimal formatting, updated once a day.
Click here to see this page with the latest information and nicer formatting.

Package: emacs; Reported by: Pip Cet <pipcet@HIDDEN>; Keywords: patch; dated Tue, 2 Jun 2026 15:13:02 UTC; Maintainer for emacs is bug-gnu-emacs@HIDDEN.

Message received at 81169 <at> debbugs.gnu.org:


Received: (at 81169) by debbugs.gnu.org; 6 Jun 2026 11:23:42 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Sat Jun 06 07:23:42 2026
Received: from localhost ([127.0.0.1]:37867 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1wVp7t-00029D-GG
	for submit <at> debbugs.gnu.org; Sat, 06 Jun 2026 07:23:42 -0400
Received: from mail-4316.protonmail.ch ([185.70.43.16]:17647)
 by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.84_2) (envelope-from <pipcet@HIDDEN>)
 id 1wVp7q-00028j-1s
 for 81169 <at> debbugs.gnu.org; Sat, 06 Jun 2026 07:23:39 -0400
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=protonmail.com;
 s=protonmail3; t=1780745008; x=1781004208;
 bh=BPZ1fPBlptu8PWpKuBPtGXIjmDhXY7jkXZHuMzTWo54=;
 h=Date:To:From:Cc:Subject:Message-ID:In-Reply-To:References:
 Feedback-ID:From:To:Cc:Date:Subject:Reply-To:Feedback-ID:
 Message-ID:BIMI-Selector;
 b=MaC4Uzvp3lnwMeexV7F0al72DukrJNEVsSAO2OtSW8C90DRVwCF18cz8Fjx7UGOUQ
 hOP5kAlYMhiNeThU7cmSCFcKK9Eyb8IACxjJePJWc2mtbHZ24qYtnSlN+jmtVeDSbZ
 yb2GjcnPyxPm+9f0l9HKUkiQ0z1scJfXpq1TpztorO9E4sQJWbmfPS8uwzuwEX4Z/k
 Jd7IZqVDhkpsEVwN/AhHV7w6R6APRhYNZSOH003seEhSthlwz+QwWgiTzHJd+qPl+J
 mqX2XVYFd0iLhDSlQSbhzYDXH/3U2kwCXs3E0oi7WVuJMhqZQb23YLj5k8Yeq2PuBK
 GgIKA4a5eaM/g==
Date: Sat, 06 Jun 2026 11:23:25 +0000
To: Eli Zaretskii <eliz@HIDDEN>
From: Pip Cet <pipcet@HIDDEN>
Subject: Re: bug#81169: 32.0.50; [PATCH] speed up replace-region-contents
Message-ID: <87y0grx675.fsf@HIDDEN>
In-Reply-To: <86y0gwvo94.fsf@HIDDEN>
References: <87h5nlnffj.fsf@HIDDEN> <8633z5vtsu.fsf@HIDDEN>
 <875x40oq9y.fsf@HIDDEN> <86y0gwvo94.fsf@HIDDEN>
Feedback-ID: 112775352:user:proton
X-Pm-Message-ID: e3d85b86241a7baff2c0a92f5261150da54803b3
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Score: -0.7 (/)
X-Debbugs-Envelope-To: 81169
Cc: eggert@HIDDEN, 81169 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -1.7 (-)

"Eli Zaretskii" <eliz@HIDDEN> writes:

>> Date: Tue, 02 Jun 2026 16:33:04 +0000
>> From: Pip Cet <pipcet@HIDDEN>
>> Cc: 81169 <at> debbugs.gnu.org, Paul Eggert <eggert@HIDDEN>
>>
>> "Eli Zaretskii" <eliz@HIDDEN> writes:
>>
>> > Thanks, but please also show the relevant benchmarks, including
>> > replacing regions of different sizes.
>>
>> That's a little difficult: it's a minutes-to-seconds change, and I'm
>> currently on a system where the "seconds" part may well have happened at
>> a different (higher) clock speed than the "minutes" part, which would
>> bias the results.  So I decided not to send any numbers rather than
>> send invalid ones.
>
> Where there are minutes we can probably have benchmarks that run
> seconds instead with smaller regions, no?

If a benchmark runs for a few seconds only, we cannot sensibly compare
it to a benchmark which runs for minutes.  Of course, the solution is
already implemented: passing a float argument to benchmark-run will keep
repeating the benchmark for a reasonable amount of time, which is better
than not having a number at all.

>> I think it may be best to triple-check correctness first, by adding more
>> tests.
>
> Sure.  But eventually to judge speedup we'd like numbers nevertheless.

I have convinced myself that the extremely significant performance
improvements I've observed are not flukes. Numbers below.

>> >=C2=A0One problem with allocating additional memory is that the functio=
n
>> >could run out of available memory where the original slow algorithm
>> >would not, so we IMO also need to think about fallbacks.
>>
>> Do you mean OOM, or running out of stack space?
>
> The former.

I agree it would be nice to avoid allocating so much memory for large
regions in Freplace_region_contents, but the problem is made only mildly
worse by increasing the allocation from 129 bits per character to 161
bits: the problem is clearly the two ptrdiff_t values which are
allocated for each character.

In a way, Paul's work is simply too good here: IIUC, it's only because
the low-density heuristic optimization works so well that memory
allocation can become a problem again.

>> In either case, a 50% increase in memory used (on 32-bit systems)
>> doesn't justify a fallback to a version that is so much slower, and the
>> current version isn't particularly stingy with its memory usage
>> anyway...
>
> Yeah, and =E2=80=9C640K ought to be enough for anybody.=E2=80=9D

I don't see the analogy, sorry.

> Please don't ignore this aspect.  Allocating memory that can be a
> significant percentage of the buffer size should never be taken
> lightly in Emacs, even in 64-bit builds.

I agree, but I just don't see anyone taking the time to rewrite
compareseq to reduce the constant factor of memory consumption to
something (much) closer to 1.

Here are the numbers I promised.  Note that this benchmark includes a
call to 'garbage-collect' before 'replace-region-contents', but no
further calls.  This means the charpos<->bytepos cache should work but
might be slightly slower than in real sessions which might have left
markers in the buffer.

The left column is the (natural) logarithm of the execution speed
improvement when applying the patch.  The right column contains the
argument lists, of the form (N K): we create a multibyte buffer with N
random characters in the 0..255 range, and perform K single-character
modifications (each of which deletes one character and adds another at a
different position).

-0.004766 (2 0)
0.023365 (16 0)
2.016212 (256 0)
2.808499 (256 2)
2.997201 (256 12)
3.082077 (1024 1)
3.358036 (1024 0)
4.405364 (1024 10)
4.440131 (1024 51)
4.817975 (65536 0)
5.113652 (262144 0)
6.025153 (65536 6)

The general pattern is that larger buffers (up to the point where the
bytepos<->charpos cache starts working, at least) and more modifications
both significantly increase the effects of the patch.  The (262144 0)
case corresponds to a factor of 165, which sufficiently dramatic to
justify using the new code in such cases.

IMHO, we should apply the patch now but have a look at other diffing
mechanisms as discussed in the Fmarkers_in thread; for large regions, an
external program which identifies sparse differences might simply be
faster or avoid running out of memory.

Maybe there is a better one already available?

Pip

Here's the benchmark code I used:

;; -*- lexical-binding: t; -*-

(require 'benchmark)

(defun fill-with-bytes (n)
  "Insert N random ASCII/ISO-8859-1 characters into the current buffer."
  (random "seed")
  (dotimes (_ n)
    (insert (random 256))))

(defun randomize (k)
  "Edit the current buffer a little, deleting a character here and adding
one there."
  (dotimes (_ k)
    (goto-char (+ (point-min) (random (- (point-max) (point-min)))))
    (delete-char 1)
    (unless (=3D (point-min) (point-max))
      (goto-char (+ (point-min) (random (- (point-max) (point-min))))))
    (insert (random 256))))

(defun run-benchmark-test (f argument-lists initial-timeout)
  "Run a benchmark test by attempting to apply F to each element of
ARGUMENT-LISTS, slowly increasing a timeout which starts as
INITIAL-TIMEOUT."
  (cl-assert noninteractive)
  (let ((timeout initial-timeout)
        ;; double timeout every 15 minutes
        (increment (/ (log 2.0) 900))
        (ht (make-hash-table :test #'equal))
        (warmup 10))
    (while argument-lists
      (random t)
      (let* ((i (random (length argument-lists)))
             (argument-list (elt argument-lists i))
             proc
             (result
              (condition-case _
                  (unwind-protect
                      (let ((kill-emacs-on-sigint nil))
                        (setq proc (start-process "background killer" nil
                                                  "sh" "-c" (format "sleep =
%f && kill -INT %s"
                                                                    timeout
                                                                    (shell-=
command-to-string
                                                                     "echo =
$PPID"))))
                        (apply f argument-list))
                    (when proc
                      (kill-process proc)
                      (while (process-live-p proc)
                        (accept-process-output))))
                (quit (message "timed out after %f seconds, running functio=
n with arguments %S"
                               timeout argument-list)
                      (setq timeout (* timeout (exp (* timeout increment)))=
)
                      nil))))
        (cond
         ((not result))
         ((> warmup 0)
          (decf warmup))
         (t
          (puthash argument-list result ht)
          (setq argument-lists (delete argument-list argument-lists))
          (with-temp-buffer
            (insert (format "%S" ht))
            (write-file (format "tmp-hash-%S.el" (hash-table-count ht)))
            (write-file (format "tmp-hash-latest.el"))
            )))))
    ht))

(defun rrc-test ()
  (let ((argument-lists nil))
    (dolist (kprop (list 0.0 0.0001 0.001 0.01 0.05))
      (dolist (n (list 2 16 256 1024 65536 (ash 1 18)
                       (ash 1 20) (ash 1 22) (ash 1 24)
                       (ash 1 26) (ash 1 28) (ash 1 30)))
        (push (list n (floor (* kprop n)))
              argument-lists)))
    (run-benchmark-test
     (lambda (n k)
       (let (source dest results)
         (with-temp-buffer
           (setq source (current-buffer))
           (fill-with-bytes n)
           (with-temp-buffer
             (setq dest (current-buffer))
             (fill-with-bytes n)
             (randomize k)
             (with-temp-buffer
               (insert (with-current-buffer source (buffer-string)))
               (garbage-collect)
               (setq
                results
                (benchmark-call
                 (lambda ()
                   (replace-region-contents (point-min) (point-max) dest)
                   (replace-region-contents (point-min) (point-max) source)
                   )
                 1.0))
               (message "%S %S %e" n k (/ (cadr results) (car results)))
               (/ (cadr results) (car results)))))))
     argument-lists
     2.0)))

(byte-compile #'rrc-test)
(byte-compile #'run-benchmark-test)

(rrc-test)





Information forwarded to bug-gnu-emacs@HIDDEN:
bug#81169; Package emacs. Full text available.

Message received at 81169 <at> debbugs.gnu.org:


Received: (at 81169) by debbugs.gnu.org; 2 Jun 2026 17:36:12 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Tue Jun 02 13:36:11 2026
Received: from localhost ([127.0.0.1]:59213 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1wUT27-0001uo-9y
	for submit <at> debbugs.gnu.org; Tue, 02 Jun 2026 13:36:11 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10]:37978)
 by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.84_2) (envelope-from <eliz@HIDDEN>) id 1wUT20-0001tm-NN
 for 81169 <at> debbugs.gnu.org; Tue, 02 Jun 2026 13:36:04 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@HIDDEN>)
 id 1wUT1u-0006B3-0x; Tue, 02 Jun 2026 13:35:54 -0400
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org;
 s=fencepost-gnu-org; h=MIME-version:References:Subject:In-Reply-To:To:From:
 Date; bh=v5mfscAnfSPZ6i3UmHEF+fDjoR8ll0M8Clo0dE9D8oI=; b=EFfsmHHYeIy+FQDdtN55
 Rp0FwcTlFyy0sv14466m02WIYaOwaQpQIopsNlzSw/JeA+wtnZ4XabS5hwbisJipCtaCE63loMrhv
 0OlgeaY9ujEm1vgsPkjkz5uSG5d6WQ7N+fiyp5jEVMkt+gMlCp0z/Z1qi7tZatapPt44FQfxxaD1R
 /cPHm+VXV73xfm5rMYMvCmRkPnA24Jz1yiF+tZsO2hfBZI2h7JBaC82gGqyAe+YHeucCYLYwkcK3p
 lGjHrckfp1DsHmNr3QIZ4HYjxUgJHb4ilZphdhs9okiX+TNdMNjzmreCt6w8EkUUgljzxm7W3Wa3y
 ZXIxDOUYOSN4zA==;
Date: Tue, 02 Jun 2026 20:35:51 +0300
Message-Id: <86y0gwvo94.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
To: Pip Cet <pipcet@HIDDEN>
In-Reply-To: <875x40oq9y.fsf@HIDDEN> (message from Pip Cet on Tue, 02
 Jun 2026 16:33:04 +0000)
Subject: Re: bug#81169: 32.0.50; [PATCH] speed up replace-region-contents
References: <87h5nlnffj.fsf@HIDDEN> <8633z5vtsu.fsf@HIDDEN>
 <875x40oq9y.fsf@HIDDEN>
MIME-version: 1.0
Content-type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 81169
Cc: eggert@HIDDEN, 81169 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

> Date: Tue, 02 Jun 2026 16:33:04 +0000
> From: Pip Cet <pipcet@HIDDEN>
> Cc: 81169 <at> debbugs.gnu.org, Paul Eggert <eggert@HIDDEN>
> 
> "Eli Zaretskii" <eliz@HIDDEN> writes:
> 
> > Thanks, but please also show the relevant benchmarks, including
> > replacing regions of different sizes.
> 
> That's a little difficult: it's a minutes-to-seconds change, and I'm
> currently on a system where the "seconds" part may well have happened at
> a different (higher) clock speed than the "minutes" part, which would
> bias the results.  So I decided not to send any numbers rather than
> send invalid ones.

Where there are minutes we can probably have benchmarks that run
seconds instead with smaller regions, no?

> I think it may be best to triple-check correctness first, by adding more
> tests.

Sure.  But eventually to judge speedup we'd like numbers nevertheless.

> > One problem with allocating additional memory is that the function
> >could run out of available memory where the original slow algorithm
> >would not, so we IMO also need to think about fallbacks.
> 
> Do you mean OOM, or running out of stack space?

The former.

> In either case, a 50% increase in memory used (on 32-bit systems)
> doesn't justify a fallback to a version that is so much slower, and the
> current version isn't particularly stingy with its memory usage
> anyway...

Yeah, and “640K ought to be enough for anybody.”

Please don't ignore this aspect.  Allocating memory that can be a
significant percentage of the buffer size should never be taken
lightly in Emacs, even in 64-bit builds.




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#81169; Package emacs. Full text available.

Message received at 81169 <at> debbugs.gnu.org:


Received: (at 81169) by debbugs.gnu.org; 2 Jun 2026 16:33:22 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Tue Jun 02 12:33:22 2026
Received: from localhost ([127.0.0.1]:58672 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1wUS3L-0004O4-Cu
	for submit <at> debbugs.gnu.org; Tue, 02 Jun 2026 12:33:21 -0400
Received: from mail-24417.protonmail.ch ([109.224.244.17]:17261)
 by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.84_2) (envelope-from <pipcet@HIDDEN>)
 id 1wUS3H-0004MX-B2
 for 81169 <at> debbugs.gnu.org; Tue, 02 Jun 2026 12:33:17 -0400
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=protonmail.com;
 s=protonmail3; t=1780417988; x=1780677188;
 bh=+nVdtZ1nrFBw9Rvy3JZ/KxmqtyFhg+MjBn/cpegfzkE=;
 h=Date:To:From:Cc:Subject:Message-ID:In-Reply-To:References:
 Feedback-ID:From:To:Cc:Date:Subject:Reply-To:Feedback-ID:
 Message-ID:BIMI-Selector;
 b=vQGI0Ceu/z7O0i7RX/YE/lgbCJjkkNr7KFfR3BFYf6n2YMDyAViml3yTyeh+jA7Fz
 uLtZnijRi66AAF0J+H6mYBLe6aTp1PwlE8K5sJm/GyrHXgNYenYVB9jJFtCGxlnPCS
 WPePOJg06aXC3kn286LPfbsc8ylPfgQqZHThSpQSCOmAG7UHS2tpWqYSt7fVm3tAiR
 Rb1vx4rQhokMMsMRPReQj8mzzMLd1dhx7yPT3WDqOon4lWRQxXjtbpC5n98J4Qk2al
 1vQrZocHH9AIpRRp2t4UUh7crv5NIKhO75Y+9goB96VeUbbDHxkoR5TC5+uBLVRovR
 UVAg4/rUcVenQ==
Date: Tue, 02 Jun 2026 16:33:04 +0000
To: Eli Zaretskii <eliz@HIDDEN>
From: Pip Cet <pipcet@HIDDEN>
Subject: Re: bug#81169: 32.0.50; [PATCH] speed up replace-region-contents
Message-ID: <875x40oq9y.fsf@HIDDEN>
In-Reply-To: <8633z5vtsu.fsf@HIDDEN>
References: <87h5nlnffj.fsf@HIDDEN> <8633z5vtsu.fsf@HIDDEN>
Feedback-ID: 112775352:user:proton
X-Pm-Message-ID: c868f953f6ef421589d77cf31606a5d6d89f9598
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Score: -0.7 (/)
X-Debbugs-Envelope-To: 81169
Cc: Paul Eggert <eggert@HIDDEN>, 81169 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -1.7 (-)

"Eli Zaretskii" <eliz@HIDDEN> writes:

>> Date: Tue, 02 Jun 2026 15:12:36 +0000
>> From: Pip Cet via "Bug reports for GNU Emacs,
>>  the Swiss army knife of text editors" <bug-gnu-emacs@HIDDEN>
>>
>> In multibyte buffers, replace-region-contents is unbearably slow. The
>> problem is fixable: we call buf_charpos_to_bytepos in the innermost
>> loop, in buffer_chars_equal.
>>
>> We should avoid doing that: keeping our characters in an array of C ints
>> (essentially converting to UTF-32 in the machine's native endianness)
>> simplifies the code and avoids the performance problem.
>>
>> The allocation size grows by 25% on 64-bit systems (we already store two
>> ptrdiff_t integers for each character), which seems acceptable given the
>> speedup.  If compareseq is changed to use less memory for very long
>> regions, that tradeoff will have to be revisited.
>
> Thanks, but please also show the relevant benchmarks, including
> replacing regions of different sizes.

That's a little difficult: it's a minutes-to-seconds change, and I'm
currently on a system where the "seconds" part may well have happened at
a different (higher) clock speed than the "minutes" part, which would
bias the results.  So I decided not to send any numbers rather than
send invalid ones.

I think it may be best to triple-check correctness first, by adding more
tests.

>=C2=A0One problem with allocating additional memory is that the function
>could run out of available memory where the original slow algorithm
>would not, so we IMO also need to think about fallbacks.

Do you mean OOM, or running out of stack space?

In either case, a 50% increase in memory used (on 32-bit systems)
doesn't justify a fallback to a version that is so much slower, and the
current version isn't particularly stingy with its memory usage
anyway...

Pip





Information forwarded to bug-gnu-emacs@HIDDEN:
bug#81169; Package emacs. Full text available.

Message received at 81169 <at> debbugs.gnu.org:


Received: (at 81169) by debbugs.gnu.org; 2 Jun 2026 15:36:12 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Tue Jun 02 11:36:12 2026
Received: from localhost ([127.0.0.1]:58153 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1wURA4-0008WL-1i
	for submit <at> debbugs.gnu.org; Tue, 02 Jun 2026 11:36:12 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10]:33348)
 by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.84_2) (envelope-from <eliz@HIDDEN>) id 1wURA2-0008W4-As
 for 81169 <at> debbugs.gnu.org; Tue, 02 Jun 2026 11:36:10 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@HIDDEN>)
 id 1wUR9w-000080-UH; Tue, 02 Jun 2026 11:36:04 -0400
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org;
 s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date:
 mime-version; bh=Y6O1MyCTBQylGrADCtv7wyPxYltcYS7nfN/vqWIYWrA=; b=qSARE0JVSOH5
 H5T6XP1Vd/bfwK6HKMWRj+8m65wBl/LUhzPtVU8Vpcdm4QQ/qySU5c7tqLYIbqWiJ58aL7Den6UA5
 7sivo8a3Zko4PUrGeUdlF2I1YuGV7zXhB5G0zqRYhs6rhKf/D8331T+HMbY4qVyJbEQm3EwWseKo+
 igBlibCqwklyVwAptHTS6bqH5MXTT0xwYlYt4RIvYt/hi9cC5zVXdJMS3MJN8EfFcEWJXTTd88Uk6
 lpt4fSQj+l6pgmR66wf0EZLcmzG/GdGb7oH6hZxfLiC7G5rzUf+aDII14b/ERCnA2/MV1WK/CYQHZ
 HFtNUlTpw6TuTWCO+l7uuA==;
Date: Tue, 02 Jun 2026 18:36:01 +0300
Message-Id: <8633z5vtsu.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
To: Pip Cet <pipcet@HIDDEN>
In-Reply-To: <87h5nlnffj.fsf@HIDDEN> (bug-gnu-emacs@HIDDEN)
Subject: Re: bug#81169: 32.0.50; [PATCH] speed up replace-region-contents
References: <87h5nlnffj.fsf@HIDDEN>
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 81169
Cc: 81169 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

> Date: Tue, 02 Jun 2026 15:12:36 +0000
> From: Pip Cet via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors" <bug-gnu-emacs@HIDDEN>
> 
> In multibyte buffers, replace-region-contents is unbearably slow. The
> problem is fixable: we call buf_charpos_to_bytepos in the innermost
> loop, in buffer_chars_equal.
> 
> We should avoid doing that: keeping our characters in an array of C ints
> (essentially converting to UTF-32 in the machine's native endianness)
> simplifies the code and avoids the performance problem.
> 
> The allocation size grows by 25% on 64-bit systems (we already store two
> ptrdiff_t integers for each character), which seems acceptable given the
> speedup.  If compareseq is changed to use less memory for very long
> regions, that tradeoff will have to be revisited.

Thanks, but please also show the relevant benchmarks, including
replacing regions of different sizes.  One problem with allocating
additional memory is that the function could run out of available
memory where the original slow algorithm would not, so we IMO also
need to think about fallbacks.




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#81169; Package emacs. Full text available.

Message received at submit <at> debbugs.gnu.org:


Received: (at submit) by debbugs.gnu.org; 2 Jun 2026 15:12:57 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Tue Jun 02 11:12:57 2026
Received: from localhost ([127.0.0.1]:57928 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1wUQnY-0006mM-DL
	for submit <at> debbugs.gnu.org; Tue, 02 Jun 2026 11:12:56 -0400
Received: from lists1p.gnu.org ([2001:470:142::17]:42034)
 by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.84_2) (envelope-from <pipcet@HIDDEN>)
 id 1wUQnW-0006lz-02
 for submit <at> debbugs.gnu.org; Tue, 02 Jun 2026 11:12:55 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10])
 by lists1p.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <pipcet@HIDDEN>)
 id 1wUQnQ-0002Z5-9r
 for bug-gnu-emacs@HIDDEN; Tue, 02 Jun 2026 11:12:48 -0400
Received: from mail-4316.protonmail.ch ([185.70.43.16])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <pipcet@HIDDEN>)
 id 1wUQnO-0000Uu-50
 for bug-gnu-emacs@HIDDEN; Tue, 02 Jun 2026 11:12:48 -0400
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=protonmail.com;
 s=protonmail3; t=1780413162; x=1780672362;
 bh=V/CLwKlSfOPxutTNFp3Ji8SiMv8nOg6I4EV+enJGYHs=;
 h=Date:To:From:Subject:Message-ID:Feedback-ID:From:To:Cc:Date:
 Subject:Reply-To:Feedback-ID:Message-ID:BIMI-Selector;
 b=OBJbwh4SG8PFQBpGMsT2ani7bnjICMEFJ+/mulGLiH8xcrrgBBWDtcF4Hby6oyA2t
 U6J/3yMCEKp4+n7ncT7T9IMpIvdcsdjEGtzhADFkB2d9Hm2fkrpg7vZvjCbVo8+bqF
 eXJPStP16jKtx5NUvgRUvSuphhPFI5HHAlJU9R4gUbXFV+/NyplUWdPWvH7jbG3otb
 HBn/5Zulrx3BBgTevWQn9P5uB3sPbKI2Uv/O61NCu+UCJL7UxjKnz5UfIL/dhVEXhY
 i5srt0ZfNxLV+KDUycetzRmPKNlstT3EWfTvtoLUttjDvvcDT1750EOpNRayaUNJT5
 z5z7zI78NafCg==
Date: Tue, 02 Jun 2026 15:12:36 +0000
To: bug-gnu-emacs@HIDDEN
From: Pip Cet <pipcet@HIDDEN>
Subject: 32.0.50; [PATCH] speed up replace-region-contents
Message-ID: <87h5nlnffj.fsf@HIDDEN>
Feedback-ID: 112775352:user:proton
X-Pm-Message-ID: 58c960b93c12cfa6e261e6d88c5081dd0fb31e49
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Received-SPF: pass client-ip=185.70.43.16; envelope-from=pipcet@HIDDEN;
 helo=mail-4316.protonmail.ch
X-Spam_score_int: -27
X-Spam_score: -2.8
X-Spam_bar: --
X-Spam_report: (-2.8 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1,
 DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001,
 RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H5=0.001, RCVD_IN_MSPIKE_WL=0.001,
 SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no
X-Spam_action: no action
X-Spam-Score: 1.7 (+)
X-Spam-Report: Spam detection software, running on the system "debbugs.gnu.org",
 has NOT identified this incoming email as spam.  The original
 message has been attached to this so you can view it or label
 similar future email.  If you have any questions, see
 the administrator of that system for details.
 Content preview:  In multibyte buffers, replace-region-contents is unbearably
 slow. The problem is fixable: we call buf_charpos_to_bytepos in the innermost
 loop, in buffer_chars_equal. We should avoid doing that: keeping our characters
 in an array of C ints (essentially converting to UTF-32 in the machine's
 native endianness) simplifies the code and avoids the performance problem.
 Content analysis details:   (1.7 points, 10.0 required)
 pts rule name              description
 ---- ---------------------- --------------------------------------------------
 -0.0 RCVD_IN_DNSWL_NONE     RBL: Sender listed at https://www.dnswl.org/,
 no trust [2001:470:142:0:0:0:0:17 listed in] [list.dnswl.org]
 1.0 SPF_SOFTFAIL           SPF: sender does not match SPF record (softfail)
 -0.0 SPF_HELO_PASS          SPF: HELO matches SPF record
 0.0 FREEMAIL_FROM          Sender email is commonly abused enduser mail
 provider (pipcet[at]protonmail.com)
 0.7 SPOOFED_FREEMAIL       No description available.
X-Debbugs-Envelope-To: submit
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -0.0 (/)

In multibyte buffers, replace-region-contents is unbearably slow. The
problem is fixable: we call buf_charpos_to_bytepos in the innermost
loop, in buffer_chars_equal.

We should avoid doing that: keeping our characters in an array of C ints
(essentially converting to UTF-32 in the machine's native endianness)
simplifies the code and avoids the performance problem.

The allocation size grows by 25% on 64-bit systems (we already store two
ptrdiff_t integers for each character), which seems acceptable given the
speedup.  If compareseq is changed to use less memory for very long
regions, that tradeoff will have to be revisited.

From 7637a1357b5f85a85ce04c9d34d75b0ff0b17c95 Mon Sep 17 00:00:00 2001
From: Pip Cet <pipcet@HIDDEN>
Date: Tue, 2 Jun 2026 14:58:04 +0000
Subject: [PATCH] Simplify character comparisons in Freplace_region_contents

We expect 'compareseq' to call 'buffer_chars_equal' very often, so
it's worth it to use some memory and make it as fast as possible.

* src/editfns.c (EXTRA_CONTEXT_FIELDS): Drop buffer, beg, and unibyte
fields; add arrays of character codes stored as C ints.
(Freplace_region_contents): Generate such arrays from the two regions
to be compared.
(buffer_chars_equal): Compare array elements instead of buffer
characters.
---
 src/editfns.c | 56 +++++++++++++++++----------------------------------
 1 file changed, 18 insertions(+), 38 deletions(-)

diff --git a/src/editfns.c b/src/editfns.c
index 84f1e5cef03..28fb7adbb3e 100644
--- a/src/editfns.c
+++ b/src/editfns.c
@@ -1895,15 +1895,9 @@ #define XVECREF_YVECREF_EQUAL(ctx, xoff, yoff)  \
 #define OFFSET ptrdiff_t
=20
 #define EXTRA_CONTEXT_FIELDS                    \
-  /* Buffers to compare.  */                    \
-  struct buffer *buffer_a;                      \
-  struct buffer *buffer_b;                      \
-  /* BEGV of each buffer */=09=09=09\
-  ptrdiff_t beg_a;=09=09=09=09\
-  ptrdiff_t beg_b;=09=09=09=09\
-  /* Whether each buffer is unibyte/plain-ASCII or not.  */ \
-  bool a_unibyte;=09=09=09=09\
-  bool b_unibyte;=09=09=09=09\
+  /* Arrays of character codes to compare.  */=09\
+  int *chars_a;=09=09=09=09=09\
+  int *chars_b;=09=09=09=09=09\
   /* Bit vectors recording for each character whether it was deleted
      or inserted.  */                           \
   unsigned char *deletions;                     \
@@ -2098,17 +2092,25 @@ DEFUN ("replace-region-contents", Freplace_region_c=
ontents,
     }
   Lisp_Object source_buffer =3D make_lisp_ptr (b, Lisp_Vectorlike);
=20
+  /* Copy the characters to arrays of C integers.  This speeds up
+     comparison dramatically in multibyte buffers.  */
+  int *chars_a =3D SAFE_ALLOCA (sizeof (chars_a[0]) * size_a);
+  for (ptrdiff_t p =3D min_a; p < min_a + size_a; p++)
+    chars_a[p - min_a]
+      =3D BUF_FETCH_CHAR_AS_MULTIBYTE (a, buf_charpos_to_bytepos (a, p));
+
+  int *chars_b =3D SAFE_ALLOCA (sizeof (chars_b[0]) * size_b);
+  for (ptrdiff_t p =3D min_b; p < min_b + size_b; p++)
+    chars_b[p - min_b]
+      =3D BUF_FETCH_CHAR_AS_MULTIBYTE (b, buf_charpos_to_bytepos (b, p));
+
   /* FIXME: It is not documented how to initialize the contents of the
      context structure.  This code cargo-cults from the existing
      caller in src/analyze.c of GNU Diffutils, which appears to
      work.  */
   struct context ctx =3D {
-    .buffer_a =3D a,
-    .buffer_b =3D b,
-    .beg_a =3D min_a,
-    .beg_b =3D min_b,
-    .a_unibyte =3D BUF_ZV (a) =3D=3D BUF_ZV_BYTE (a),
-    .b_unibyte =3D BUF_ZV (b) =3D=3D BUF_ZV_BYTE (b),
+    .chars_a =3D chars_a,
+    .chars_b =3D chars_b,
     .deletions =3D deletions_insertions,
     .insertions =3D deletions_insertions + del_bytes,
     .fdiag =3D buffer + size_b + 1,
@@ -2242,29 +2244,7 @@ buffer_chars_equal (struct context *ctx,
 =09sys_longjmp (ctx->jmp, 1);
     }
=20
-  pos_a +=3D ctx->beg_a;
-  pos_b +=3D ctx->beg_b;
-
-  ptrdiff_t bpos_a =3D
-    ctx->a_unibyte ? pos_a : buf_charpos_to_bytepos (ctx->buffer_a, pos_a)=
;
-  ptrdiff_t bpos_b =3D
-    ctx->b_unibyte ? pos_b : buf_charpos_to_bytepos (ctx->buffer_b, pos_b)=
;
-
-  /* We make the below a series of specific test to avoid using
-     BUF_FETCH_CHAR_AS_MULTIBYTE, which references Lisp symbols, and
-     is therefore significantly slower (see the note in the commentary
-     to this function).  */
-  if (ctx->a_unibyte && ctx->b_unibyte)
-    return BUF_FETCH_BYTE (ctx->buffer_a, bpos_a)
-      =3D=3D BUF_FETCH_BYTE (ctx->buffer_b, bpos_b);
-  if (ctx->a_unibyte && !ctx->b_unibyte)
-    return UNIBYTE_TO_CHAR (BUF_FETCH_BYTE (ctx->buffer_a, bpos_a))
-      =3D=3D BUF_FETCH_MULTIBYTE_CHAR (ctx->buffer_b, bpos_b);
-  if (!ctx->a_unibyte && ctx->b_unibyte)
-    return BUF_FETCH_MULTIBYTE_CHAR (ctx->buffer_a, bpos_a)
-      =3D=3D UNIBYTE_TO_CHAR (BUF_FETCH_BYTE (ctx->buffer_b, bpos_b));
-  return BUF_FETCH_MULTIBYTE_CHAR (ctx->buffer_a, bpos_a)
-    =3D=3D BUF_FETCH_MULTIBYTE_CHAR (ctx->buffer_b, bpos_b);
+  return ctx->chars_a[pos_a] =3D=3D ctx->chars_b[pos_b];
 }
=20
 static bool
--=20
2.54.0





Acknowledgement sent to Pip Cet <pipcet@HIDDEN>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs@HIDDEN. Full text available.
Report forwarded to bug-gnu-emacs@HIDDEN:
bug#81169; Package emacs. Full text available.
Please note: This is a static page, with minimal formatting, updated once a day.
Click here to see this page with the latest information and nicer formatting.
Last modified: Sat, 6 Jun 2026 11:30:03 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.