GNU logs - #63040, boring messages


Message sent to bug-gnu-emacs@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#63040: 30.0.50; Performance of buf_bytepos_to_charpos when a buffer has large number of markers
Resent-From: Ihor Radchenko <yantar92@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-gnu-emacs@HIDDEN
Resent-Date: Sun, 23 Apr 2023 19:40:01 +0000
Resent-Message-ID: <handler.63040.B.168227874729375 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: report 63040
X-GNU-PR-Package: emacs
X-GNU-PR-Keywords: 
To: 63040 <at> debbugs.gnu.org
X-Debbugs-Original-To: bug-gnu-emacs@HIDDEN
Received: via spool by submit <at> debbugs.gnu.org id=B.168227874729375
          (code B ref -1); Sun, 23 Apr 2023 19:40:01 +0000
Received: (at submit) by debbugs.gnu.org; 23 Apr 2023 19:39:07 +0000
Received: from localhost ([127.0.0.1]:47009 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1pqfYA-0007di-Vw
	for submit <at> debbugs.gnu.org; Sun, 23 Apr 2023 15:39:07 -0400
Received: from lists.gnu.org ([209.51.188.17]:43514)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <yantar92@HIDDEN>) id 1pqfY8-0007dZ-TI
 for submit <at> debbugs.gnu.org; Sun, 23 Apr 2023 15:39:05 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10])
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <yantar92@HIDDEN>)
 id 1pqfY8-0005du-Ir
 for bug-gnu-emacs@HIDDEN; Sun, 23 Apr 2023 15:39:04 -0400
Received: from mout02.posteo.de ([185.67.36.66])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <yantar92@HIDDEN>)
 id 1pqfY4-0001lE-GG
 for bug-gnu-emacs@HIDDEN; Sun, 23 Apr 2023 15:39:03 -0400
Received: from submission (posteo.de [185.67.36.169]) 
 by mout02.posteo.de (Postfix) with ESMTPS id A4540240170
 for <bug-gnu-emacs@HIDDEN>; Sun, 23 Apr 2023 21:38:57 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=posteo.net; s=2017;
 t=1682278737; bh=WDvJvoFOGMe+96eBGGhd7uvnO1J4m45uPwnG/vE/I0U=;
 h=From:To:Subject:Date:From;
 b=HbeythQTKRxu18ay9A6R+d5uUZ7V0PSL/Dn3Mv/C/C/J3nklrlKNt7JArjRp9VKed
 XP4WZwEkXrdn7R/gf+ZGk1XHNRA9TfLuP/3axKyFazSy3A880VPp7ZCNsJhqH6s3i+
 OegGEBugvheoDmcIQQ+xF2PCvUkfNRHv9g7UCSKSKgciRWFKZ9lJLwI+laaboCOr7q
 s/cFmYns1C9vTzVWDD8HUzb2SPDqSLLixty5PF8h5EAzosJkRQcwOw6zE3bvxPpGAt
 Q099LGJz6ufPlp+tJ1z7xzD5btUgkllMg0y3yhfRbV8W8uVPqy+K2t5DUHmQQsvGN5
 jJlUOoRDb0JJw==
Received: from customer (localhost [127.0.0.1])
 by submission (posteo.de) with ESMTPSA id 4Q4JV139Lyz9rxH
 for <bug-gnu-emacs@HIDDEN>; Sun, 23 Apr 2023 21:38:49 +0200 (CEST)
From: Ihor Radchenko <yantar92@HIDDEN>
Date: Sun, 23 Apr 2023 19:41:40 +0000
Message-ID: <878reiwfm3.fsf@localhost>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
Received-SPF: pass client-ip=185.67.36.66; envelope-from=yantar92@HIDDEN;
 helo=mout02.posteo.de
X-Spam_score_int: -43
X-Spam_score: -4.4
X-Spam_bar: ----
X-Spam_report: (-4.4 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1,
 DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1,
 RCVD_IN_DNSWL_MED=-2.3, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001,
 SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no
X-Spam_action: no action
X-Spam-Score: -1.3 (-)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -2.3 (--)

--=-=-=
Content-Type: text/plain

Hi,

When investigating `re-search-forward' performance in
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=58558 (bug#58558), I
noticed that buf_bytepos_to_charpos is taking most of the CPU time,
according to perf stats.

This was partially caused by `parse-sexp-lookup-properties', but even
after working around the text property issue, buf_bytepos_to_charpos
still shows up on top of the perf profile.

Since one of the apparent bottlenecks in buf_bytepos_to_charpos is

for (tail = BUF_MARKERS (b); tail; tail = tail->next)

which obviously scales with the number of markers in buffer, I decided
to add a cut-off parameter, as in the attached patch (number 50 has no
particular motivation underneath).

Surprisingly, this simple change reduced my Org agenda generation times
from 20 seconds down to 3-4 seconds!

I am sure that my dumb approach is not the best way to improve the
performance, but this place in buf_bytepos_to_charpos is clearly
something that can be optimized.


--=-=-=
Content-Type: text/x-patch
Content-Disposition: inline;
 filename=0001-src-marker.c-buf_bytepos_to_charpos-Limit-marker-sea.patch

From a6ff6268bdc42a7dfedc6729d4232a2ae149da56 Mon Sep 17 00:00:00 2001
Message-Id: <a6ff6268bdc42a7dfedc6729d4232a2ae149da56.1682278830.git.yantar92@HIDDEN>
From: Ihor Radchenko <yantar92@HIDDEN>
Date: Sun, 23 Apr 2023 21:31:46 +0200
Subject: [PATCH] * src/marker.c (buf_bytepos_to_charpos): Limit marker search

Limit searching across buffer markers to first 50 markers and thus
avoid performance scaling with the number of markers.

I got 5x `re-search-forward' speed improvement in my setup with this
dumb change.
---
 src/marker.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/marker.c b/src/marker.c
index e42c49a5434..008a76c49e6 100644
--- a/src/marker.c
+++ b/src/marker.c
@@ -348,8 +348,10 @@ buf_bytepos_to_charpos (struct buffer *b, ptrdiff_t bytepos)
   if (b == cached_buffer && BUF_MODIFF (b) == cached_modiff)
     CONSIDER (cached_bytepos, cached_charpos);
 
-  for (tail = BUF_MARKERS (b); tail; tail = tail->next)
+  int i = 0;
+  for (tail = BUF_MARKERS (b); tail && i < 50; tail = tail->next)
     {
+      i++;
       CONSIDER (tail->bytepos, tail->charpos);
 
       /* If we are down to a range of 50 chars,
-- 
2.40.0


--=-=-=
Content-Type: text/plain


In GNU Emacs 30.0.50 (build 2, x86_64-pc-linux-gnu, GTK+ Version
 3.24.37, cairo version 1.17.8) of 2023-04-23 built on localhost
Repository revision: ca875e3947e29d222554a05583068c49a56ed8ca
Repository branch: master
Windowing system distributor 'The X.Org Foundation', version 11.0.12101008
System Description: Gentoo Linux

Configured using:
 'configure --with-native-compilation'


-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>

--=-=-=--




Message sent:


Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-Mailer: MIME-tools 5.505 (Entity 5.505)
Content-Type: text/plain; charset=utf-8
X-Loop: help-debbugs@HIDDEN
From: help-debbugs@HIDDEN (GNU bug Tracking System)
To: Ihor Radchenko <yantar92@HIDDEN>
Subject: bug#63040: Acknowledgement (30.0.50; Performance of
 buf_bytepos_to_charpos when a buffer has large number of markers)
Message-ID: <handler.63040.B.168227874729375.ack <at> debbugs.gnu.org>
References: <878reiwfm3.fsf@localhost>
X-Gnu-PR-Message: ack 63040
X-Gnu-PR-Package: emacs
Reply-To: 63040 <at> debbugs.gnu.org
Date: Sun, 23 Apr 2023 19:40:02 +0000

Thank you for filing a new bug report with debbugs.gnu.org.

This is an automatically generated reply to let you know your message
has been received.

Your message is being forwarded to the package maintainers and other
interested parties for their attention; they will reply in due course.

Your message has been sent to the package maintainer(s):
 bug-gnu-emacs@HIDDEN

If you wish to submit further information on this problem, please
send it to 63040 <at> debbugs.gnu.org.

Please do not send mail to help-debbugs@HIDDEN unless you wish
to report a problem with the Bug-tracking system.

--=20
63040: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=3D63040
GNU Bug Tracking System
Contact help-debbugs@HIDDEN with problems


Message sent to bug-gnu-emacs@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#63040: 30.0.50; Performance of buf_bytepos_to_charpos when a buffer has large number of markers
Resent-From: Eli Zaretskii <eliz@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-gnu-emacs@HIDDEN
Resent-Date: Mon, 24 Apr 2023 02:25:01 +0000
Resent-Message-ID: <handler.63040.B63040.16823030548622 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 63040
X-GNU-PR-Package: emacs
X-GNU-PR-Keywords: 
To: Ihor Radchenko <yantar92@HIDDEN>
Cc: 63040 <at> debbugs.gnu.org
Received: via spool by 63040-submit <at> debbugs.gnu.org id=B63040.16823030548622
          (code B ref 63040); Mon, 24 Apr 2023 02:25:01 +0000
Received: (at 63040) by debbugs.gnu.org; 24 Apr 2023 02:24:14 +0000
Received: from localhost ([127.0.0.1]:47243 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1pqlsD-0002Ez-MG
	for submit <at> debbugs.gnu.org; Sun, 23 Apr 2023 22:24:13 -0400
Received: from eggs.gnu.org ([209.51.188.92]:52892)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <eliz@HIDDEN>) id 1pqlsA-0002Ei-K7
 for 63040 <at> debbugs.gnu.org; Sun, 23 Apr 2023 22:24:12 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@HIDDEN>)
 id 1pqls4-0008MO-6O; Sun, 23 Apr 2023 22:24:04 -0400
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org;
 s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date:
 mime-version; bh=uCMuouM3gMsSktLW5BgIsVMewcAuvrz0IBEpWuc6mr0=; b=iQ1a7NXkY3UM
 8JDMmU80R2QqP5uQOAL5//ZUJte88hBx8hAAe3728OmyDD7CbZnMhQuckRddrSIHSYJ8zS8B9o+Bb
 CFxA0xNIWo+U1IRtPkXQocFwGoqUQwSW8lHbp8s+Dpb5thu7DIUdaruok3XT5ETIz99d5HIBA52AU
 ebIRrXZyM6HejzgKq2v7DtCuiqHgZHlVAOM5kaGyQe0pj8sUeW01E292dXOCAmd1BOFtZ1/hkfmMC
 xKtY7x1g3O6zqu/fHKqH97B+LNvI21b7e5RJfVuhcUs2nZo5AnDLKcepx6tcI2ROP3ODyK2QtaM6P
 s0UXkFCccn4Tqb2e5mdqbw==;
Received: from [87.69.77.57] (helo=home-c4e4a596f7)
 by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@HIDDEN>)
 id 1pqls3-0006vg-HO; Sun, 23 Apr 2023 22:24:03 -0400
Date: Mon, 24 Apr 2023 05:24:26 +0300
Message-Id: <83wn22xbj9.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
In-Reply-To: <878reiwfm3.fsf@localhost> (message from Ihor Radchenko on Sun,
 23 Apr 2023 19:41:40 +0000)
References: <878reiwfm3.fsf@localhost>
X-Spam-Score: -2.3 (--)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

> From: Ihor Radchenko <yantar92@HIDDEN>
> Date: Sun, 23 Apr 2023 19:41:40 +0000
> 
> When investigating `re-search-forward' performance in
> https://debbugs.gnu.org/cgi/bugreport.cgi?bug=58558 (bug#58558), I
> noticed that buf_bytepos_to_charpos is taking most of the CPU time,
> according to perf stats.
> 
> This was partially caused by `parse-sexp-lookup-properties', but even
> after working around the text property issue, buf_bytepos_to_charpos
> still shows up on top of the perf profile.
> 
> Since one of the apparent bottlenecks in buf_bytepos_to_charpos is
> 
> for (tail = BUF_MARKERS (b); tail; tail = tail->next)
> 
> which obviously scales with the number of markers in buffer, I decided
> to add a cut-off parameter, as in the attached patch (number 50 has no
> particular motivation underneath).
> 
> Surprisingly, this simple change reduced my Org agenda generation times
> from 20 seconds down to 3-4 seconds!
> 
> I am sure that my dumb approach is not the best way to improve the
> performance, but this place in buf_bytepos_to_charpos is clearly
> something that can be optimized.

Interesting.  Would it be possible to show the effect of different
values of the cut-off on the performance, so we could decide which
value to use?

Thanks.




Message sent to bug-gnu-emacs@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#63040: 30.0.50; Performance of buf_bytepos_to_charpos when a buffer has large number of markers
Resent-From: Ihor Radchenko <yantar92@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-gnu-emacs@HIDDEN
Resent-Date: Mon, 24 Apr 2023 06:34:02 +0000
Resent-Message-ID: <handler.63040.B63040.16823180044340 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 63040
X-GNU-PR-Package: emacs
X-GNU-PR-Keywords: 
To: Eli Zaretskii <eliz@HIDDEN>
Cc: 63040 <at> debbugs.gnu.org
Received: via spool by 63040-submit <at> debbugs.gnu.org id=B63040.16823180044340
          (code B ref 63040); Mon, 24 Apr 2023 06:34:02 +0000
Received: (at 63040) by debbugs.gnu.org; 24 Apr 2023 06:33:24 +0000
Received: from localhost ([127.0.0.1]:47341 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1pqplM-00017w-A1
	for submit <at> debbugs.gnu.org; Mon, 24 Apr 2023 02:33:24 -0400
Received: from mout01.posteo.de ([185.67.36.65]:35165)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <yantar92@HIDDEN>) id 1pqplK-00017g-Cq
 for 63040 <at> debbugs.gnu.org; Mon, 24 Apr 2023 02:33:23 -0400
Received: from submission (posteo.de [185.67.36.169]) 
 by mout01.posteo.de (Postfix) with ESMTPS id 28C46240139
 for <63040 <at> debbugs.gnu.org>; Mon, 24 Apr 2023 08:33:16 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=posteo.net; s=2017;
 t=1682317996; bh=+FZ8PE6bjUdMApN9r/BX9qrXrQO9ZmGzwVb1V9lRiVo=;
 h=From:To:Cc:Subject:Date:From;
 b=VpVnP32kbeH4PQK/NYoDMA4VAbx3EKTovrHiXMHOLprySXVx3KMKkBeRIpbA+f7ud
 O3vqOfRtqHcm5S1pNseuVZgsEjNZJOaFWZVTq1vYyi4ITV46FXK9HEoB8BDZ6Utz1X
 yB0Hx6FP1FdtJL1QEYnmlJ593F0cPmKgjsVB8m7+wr0q47BimD6/o4qAdQ2mc0XRcJ
 3BwyG8kb8sXMaBZgDXoQS9VjtAEkaOGhOL3xQJ8rRcahE0hAY8zvZWSmS/Yt+0QZkq
 WxKZ9FAId5MQcwPwA8WDxxRH0gT2wx6LqbPSWIXpsFLsKhON4FQFXHKv4wjlpnOBFs
 5Rmyyk8DgLOow==
Received: from customer (localhost [127.0.0.1])
 by submission (posteo.de) with ESMTPSA id 4Q4b175Htnz6tyb;
 Mon, 24 Apr 2023 08:33:15 +0200 (CEST)
From: Ihor Radchenko <yantar92@HIDDEN>
In-Reply-To: <83wn22xbj9.fsf@HIDDEN>
References: <878reiwfm3.fsf@localhost> <83wn22xbj9.fsf@HIDDEN>
Date: Mon, 24 Apr 2023 06:36:07 +0000
Message-ID: <87bkjdzt0o.fsf@localhost>
MIME-Version: 1.0
Content-Type: text/plain
X-Spam-Score: -2.3 (--)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

Eli Zaretskii <eliz@HIDDEN> writes:

>> I am sure that my dumb approach is not the best way to improve the
>> performance, but this place in buf_bytepos_to_charpos is clearly
>> something that can be optimized.
>
> Interesting.  Would it be possible to show the effect of different
> values of the cut-off on the performance, so we could decide which
> value to use?

I can do such test, but I do not think that playing with cut off is the
best approach here.

The full code in question is below and there is already existing
condition to cut the marker loop early based on the distance from
best_above to the requested bytepos. So, another approach could be
playing with BYTECHAR_DISTANCE_INCREMENT. Now, it is clearly not
efficient enough for my large file. (Note that I expressed similar idea
in another discussion and was pointed exactly to this existing condition)

Further, the later code creates markers to cache recent results and
cutting too early may waste this cache. Another idea could be moving the
cache markers into a separate array, so that we can examine them without
mixing with all other buffer markers. I am not sure if it is feasible
though.

  int i = 0;
  for (tail = BUF_MARKERS (b); tail && i < 50; tail = tail->next)
    {
      i++;
      CONSIDER (tail->bytepos, tail->charpos);

      /* If we are down to a range of 50 chars,
	 don't bother checking any other markers;
	 scan the intervening chars directly now.  */
      if (best_above - bytepos < distance
          || bytepos - best_below < distance)
	break;
      else
        distance += BYTECHAR_DISTANCE_INCREMENT;
    }

  /* We get here if we did not exactly hit one of the known places.
     We have one known above and one known below.
     Scan, counting characters, from whichever one is closer.  */

  if (bytepos - best_below_byte < best_above_byte - bytepos)
    {
...
      /* If this position is quite far from the nearest known position,
	 cache the correspondence by creating a marker here.
	 It will last until the next GC.
	 But don't do it if BUF_MARKERS is nil;
	 that is a signal from Fset_buffer_multibyte.  */
      if (record && BUF_MARKERS (b))
	build_marker (b, best_below, best_below_byte);
...
      return best_below;
    }
  else
    {
...
      if (record && BUF_MARKERS (b))
	build_marker (b, best_above, best_above_byte);
...
      return best_above;
    }
}


-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>




Message sent to bug-gnu-emacs@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#63040: 30.0.50; Performance of buf_bytepos_to_charpos when a buffer has large number of markers
Resent-From: Eli Zaretskii <eliz@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-gnu-emacs@HIDDEN
Resent-Date: Mon, 24 Apr 2023 11:03:01 +0000
Resent-Message-ID: <handler.63040.B63040.168233416120942 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 63040
X-GNU-PR-Package: emacs
X-GNU-PR-Keywords: 
To: Ihor Radchenko <yantar92@HIDDEN>
Cc: 63040 <at> debbugs.gnu.org
Received: via spool by 63040-submit <at> debbugs.gnu.org id=B63040.168233416120942
          (code B ref 63040); Mon, 24 Apr 2023 11:03:01 +0000
Received: (at 63040) by debbugs.gnu.org; 24 Apr 2023 11:02:41 +0000
Received: from localhost ([127.0.0.1]:47666 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1pqtxw-0005Rh-V1
	for submit <at> debbugs.gnu.org; Mon, 24 Apr 2023 07:02:41 -0400
Received: from eggs.gnu.org ([209.51.188.92]:51598)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <eliz@HIDDEN>) id 1pqtxi-0005RD-J6
 for 63040 <at> debbugs.gnu.org; Mon, 24 Apr 2023 07:02:39 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@HIDDEN>)
 id 1pqtxc-0000Ag-41; Mon, 24 Apr 2023 07:02:20 -0400
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org;
 s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date:
 mime-version; bh=QzwZFK3AB0F6/ZbByAoPBVJSfCHTEW+7PrN0ySIU40k=; b=KRy/vYiIzHt+
 atpBNTF3MZ42ZASVtlm21X4lRPRGk71rSZvr9kNVOmWTJXPcm3fpvcSQCGRDRO61DL6pllGbpyEt/
 syA4vSrMxigqrSQqlep8MUsqZqrQ7uFzZW3wrjz+fo9fCOkEm3IkE/zAZq+CRCkU+kSrlUToI7izB
 okug3DkSpW7WFxqzanfJBZY19AH62He0+fmOp8Xx3xVOoRBmTIpxmTM2fDT+NM6sHDXOWIvx6Us3D
 OQz2DIT6RjcNoo04bVkPrCrmp0s2+13YLSOpVbdF5PzqJ2L6c8AJxpKZVPOcKPnzMl4LJeWZDp8/T
 gfrqvUrRV7HXpPPQ2OwkGA==;
Received: from [87.69.77.57] (helo=home-c4e4a596f7)
 by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@HIDDEN>)
 id 1pqtxa-000103-Tp; Mon, 24 Apr 2023 07:02:19 -0400
Date: Mon, 24 Apr 2023 14:02:42 +0300
Message-Id: <83sfcpy23x.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
In-Reply-To: <87bkjdzt0o.fsf@localhost> (message from Ihor Radchenko on Mon,
 24 Apr 2023 06:36:07 +0000)
References: <878reiwfm3.fsf@localhost> <83wn22xbj9.fsf@HIDDEN>
 <87bkjdzt0o.fsf@localhost>
X-Spam-Score: -0.0 (/)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

> From: Ihor Radchenko <yantar92@HIDDEN>
> Cc: 63040 <at> debbugs.gnu.org
> Date: Mon, 24 Apr 2023 06:36:07 +0000
> 
> > Interesting.  Would it be possible to show the effect of different
> > values of the cut-off on the performance, so we could decide which
> > value to use?
> 
> I can do such test, but I do not think that playing with cut off is the
> best approach here.
> 
> The full code in question is below and there is already existing
> condition to cut the marker loop early based on the distance from
> best_above to the requested bytepos. So, another approach could be
> playing with BYTECHAR_DISTANCE_INCREMENT.

Yes, that would be an even better idea, IMO.

> Now, it is clearly not efficient enough for my large file.

Why do you say that?  Did you try something and the results were
unsatisfactory?  And what is not efficient enough -- the cutoff based
on the number of markers tested or based on the distance?

> Further, the later code creates markers to cache recent results and
> cutting too early may waste this cache.

And the technique that you tried doesn't waste the cache?

> Another idea could be moving the cache markers into a separate
> array, so that we can examine them without mixing with all other
> buffer markers.

Why would that separation be useful?

Thanks.




Message sent to bug-gnu-emacs@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#63040: 30.0.50; Performance of buf_bytepos_to_charpos when a buffer has large number of markers
Resent-From: Eli Zaretskii <eliz@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-gnu-emacs@HIDDEN
Resent-Date: Mon, 24 Apr 2023 11:04:02 +0000
Resent-Message-ID: <handler.63040.B63040.168233419521013 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 63040
X-GNU-PR-Package: emacs
X-GNU-PR-Keywords: 
To: Ihor Radchenko <yantar92@HIDDEN>, Stefan Monnier <monnier@HIDDEN>
Cc: 63040 <at> debbugs.gnu.org
Received: via spool by 63040-submit <at> debbugs.gnu.org id=B63040.168233419521013
          (code B ref 63040); Mon, 24 Apr 2023 11:04:02 +0000
Received: (at 63040) by debbugs.gnu.org; 24 Apr 2023 11:03:15 +0000
Received: from localhost ([127.0.0.1]:47671 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1pqtyU-0005Sr-Fv
	for submit <at> debbugs.gnu.org; Mon, 24 Apr 2023 07:03:15 -0400
Received: from eggs.gnu.org ([209.51.188.92]:49856)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <eliz@HIDDEN>) id 1pqtyF-0005S2-FI
 for 63040 <at> debbugs.gnu.org; Mon, 24 Apr 2023 07:03:14 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@HIDDEN>)
 id 1pqty9-0000Fx-Bm; Mon, 24 Apr 2023 07:02:53 -0400
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org;
 s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date:
 mime-version; bh=bC5ZIbE/4+//Gw9dWOGAe+LhO5tVMRDoscD5hWxmj9I=; b=hjud5E/49lSK
 XsHGcVN0declldmqy95XLNjwBrK/0cUMe2u5T+5krpMNfPCO78cJ8vGTkDRJHOUqhbjogTDvcPTAi
 2nEHy6xAM/As40c+JAgMqeonWsReXy/Mtu+bZuyEhzZJutdnkfAMQpkWHGfaHvyHpFLBiKu2eyUba
 6alKUDY96lxrqS/N+wzdcA+pMmRhN7+o841nZGVWCN0dngf6upzLPmk+HHMm3WwKr4cQoK+6VeN/B
 OZr9DfHDvAqTE8CarKyecVLwL3qeBPU4urHiyfPp6PRIhbLUaINqG3BZpymrmPyjiIz4sH3Weh3Dv
 xkarHgFtu+zwsbVGXAWB9A==;
Received: from [87.69.77.57] (helo=home-c4e4a596f7)
 by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@HIDDEN>)
 id 1pqty8-00039a-Ll; Mon, 24 Apr 2023 07:02:52 -0400
Date: Mon, 24 Apr 2023 14:03:17 +0300
Message-Id: <83r0s9y22y.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
In-Reply-To: <87bkjdzt0o.fsf@localhost> (message from Ihor Radchenko on Mon,
 24 Apr 2023 06:36:07 +0000)
References: <878reiwfm3.fsf@localhost> <83wn22xbj9.fsf@HIDDEN>
 <87bkjdzt0o.fsf@localhost>
X-Spam-Score: 0.0 (/)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

[Resending with Stefan added.]

> From: Ihor Radchenko <yantar92@HIDDEN>
> Cc: 63040 <at> debbugs.gnu.org
> Date: Mon, 24 Apr 2023 06:36:07 +0000
> 
> > Interesting.  Would it be possible to show the effect of different
> > values of the cut-off on the performance, so we could decide which
> > value to use?
> 
> I can do such test, but I do not think that playing with cut off is the
> best approach here.
> 
> The full code in question is below and there is already existing
> condition to cut the marker loop early based on the distance from
> best_above to the requested bytepos. So, another approach could be
> playing with BYTECHAR_DISTANCE_INCREMENT.

Yes, that would be an even better idea, IMO.

> Now, it is clearly not efficient enough for my large file.

Why do you say that?  Did you try something and the results were
unsatisfactory?  And what is not efficient enough -- the cutoff based
on the number of markers tested or based on the distance?

> Further, the later code creates markers to cache recent results and
> cutting too early may waste this cache.

And the technique that you tried doesn't waste the cache?

> Another idea could be moving the cache markers into a separate
> array, so that we can examine them without mixing with all other
> buffer markers.

Why would that separation be useful?

Thanks.




Message sent to bug-gnu-emacs@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#63040: 30.0.50; Performance of buf_bytepos_to_charpos when a buffer has large number of markers
Resent-From: Ihor Radchenko <yantar92@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-gnu-emacs@HIDDEN
Resent-Date: Mon, 24 Apr 2023 11:16:02 +0000
Resent-Message-ID: <handler.63040.B63040.168233492222331 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 63040
X-GNU-PR-Package: emacs
X-GNU-PR-Keywords: 
To: Eli Zaretskii <eliz@HIDDEN>
Cc: 63040 <at> debbugs.gnu.org, Stefan Monnier <monnier@HIDDEN>
Received: via spool by 63040-submit <at> debbugs.gnu.org id=B63040.168233492222331
          (code B ref 63040); Mon, 24 Apr 2023 11:16:02 +0000
Received: (at 63040) by debbugs.gnu.org; 24 Apr 2023 11:15:22 +0000
Received: from localhost ([127.0.0.1]:47682 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1pquAE-0005o6-20
	for submit <at> debbugs.gnu.org; Mon, 24 Apr 2023 07:15:22 -0400
Received: from mout02.posteo.de ([185.67.36.66]:35967)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <yantar92@HIDDEN>) id 1pquAC-0005nt-0C
 for 63040 <at> debbugs.gnu.org; Mon, 24 Apr 2023 07:15:20 -0400
Received: from submission (posteo.de [185.67.36.169]) 
 by mout02.posteo.de (Postfix) with ESMTPS id C9F38240228
 for <63040 <at> debbugs.gnu.org>; Mon, 24 Apr 2023 13:15:13 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=posteo.net; s=2017;
 t=1682334913; bh=dgctnNN88xqCvIDk1fUULel8YCKpMB48bXRnTtWQH2g=;
 h=From:To:Cc:Subject:Date:From;
 b=NcFmOJgiiJ5DRg2YWiQydOqJ/ZFx8SyEliFcT9Ijcv7nqg9pppZWUVSqkqvel0n70
 zd0AsZUebzJE0h04pOay+m0vEjrSNnBXnU73ILpUWqLUb4ipuYrIK18D0q1NAVWw/r
 5yfW4WNzFtD8eK37v61u581zLLccqnl8mhDpSsBKUq3if6wLGJyJuggdc1iAEuSYiV
 AYK9/rwjJIk/e5HMREvCbOlr1XZE/aUAPyWMSVZ7Elo8YADkyUtps5ENs8HvoIlZXc
 szXYBhdyHoxkJ1BD9IjXPBUmqo8ref8Wasy0B3K0BsXxmXcWWWh5EZ865fxJVqCQeP
 34uINqyB4SUBQ==
Received: from customer (localhost [127.0.0.1])
 by submission (posteo.de) with ESMTPSA id 4Q4jGS51Yqz9rxT;
 Mon, 24 Apr 2023 13:15:12 +0200 (CEST)
From: Ihor Radchenko <yantar92@HIDDEN>
In-Reply-To: <83r0s9y22y.fsf@HIDDEN>
References: <878reiwfm3.fsf@localhost> <83wn22xbj9.fsf@HIDDEN>
 <87bkjdzt0o.fsf@localhost> <83r0s9y22y.fsf@HIDDEN>
Date: Mon, 24 Apr 2023 11:17:59 +0000
Message-ID: <87ildlttp4.fsf@localhost>
MIME-Version: 1.0
Content-Type: text/plain
X-Spam-Score: -2.3 (--)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

Eli Zaretskii <eliz@HIDDEN> writes:

>> Now, it is clearly not efficient enough for my large file.
>
> Why do you say that?  Did you try something and the results were
> unsatisfactory?  And what is not efficient enough -- the cutoff based
> on the number of markers tested or based on the distance?

Sorry for not being clear.

I was referring to the existing code.
"BYTECHAR_DISTANCE_INCREMENT" alone is clearly not efficient in my use
case because introducing an addition 50 cut-off improved the performance
significantly. Hence, there is some room for improvement in this area.

>> Further, the later code creates markers to cache recent results and
>> cutting too early may waste this cache.
>
> And the technique that you tried doesn't waste the cache?

I was talking about my technique. It is wasting the cache, and it is the
reason why I think that we should find a better approach; not the one I
used in the patch.

>> Another idea could be moving the cache markers into a separate
>> array, so that we can examine them without mixing with all other
>> buffer markers.
>
> Why would that separation be useful?

Because the markers created by buf_bytepos_to_charpos are at least 5000
bytes apart. There is no such guarantee for other buffer markers.

The while loops "while (best_below_byte < bytepos)" used as fallback
(when no nearby marker is found) traverse the buffer char-by-char
"("best_below_byte += buf_next_char_len (b, best_below_byte);" and
should be strictly inferior compared to well-spaced marker list.

However, when the marker list is not well-spaced, looping over all the
buffer markers can be a waste. And it looks like I hit exactly such
scenario in my setup.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>





Last modified: Mon, 24 Apr 2023 11:30:02 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.