GNU bug report logs - #60953
The :match predicate with large regexp in tree-sitter font-lock seems inefficient

Package: emacs; Reported by: Dmitry Gutov <dgutov@HIDDEN>; dated Fri, 20 Jan 2023 03:54:02 UTC; Maintainer for emacs is bug-gnu-emacs@HIDDEN.

Message received at 60953 <at> debbugs.gnu.org:


Received: (at 60953) by debbugs.gnu.org; 2 Feb 2023 12:13:05 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Thu Feb 02 07:13:04 2023
Content-Type: multipart/mixed; boundary="------------1UISxF3jfS6GiC00qsvGlSvW"
Message-ID: <c6677ced-9538-2a1b-0025-4d90888f1246@HIDDEN>
Date: Thu, 2 Feb 2023 14:12:54 +0200
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.4.2
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter
 font-lock seems inefficient
Content-Language: en-US
To: Eli Zaretskii <eliz@HIDDEN>
References: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@HIDDEN>
 <838rhpg57n.fsf@HIDDEN> <5026D975-983F-4D18-8690-BE139C92825D@HIDDEN>
 <83pmb1emxi.fsf@HIDDEN> <6f318afc-ca71-8b7e-c822-52e6635b5718@HIDDEN>
 <83sffxcfxw.fsf@HIDDEN> <d8cf01fa-d829-fe83-0bbf-4a5f1965a107@HIDDEN>
 <83pmb1cbg5.fsf@HIDDEN> <2da844d3-ea31-289e-2821-aa174e365ffd@HIDDEN>
 <6784f9e7-844b-374d-2a1e-8a61cebe0d7e@HIDDEN> <83zga03yne.fsf@HIDDEN>
 <373a575f-c683-1581-c3e6-502e9897fb04@HIDDEN> <83mt603vrc.fsf@HIDDEN>
 <83e58a1b-2e4a-356a-36d8-c756ff105b62@HIDDEN> <83bkmf52va.fsf@HIDDEN>
 <33cad4c1-4af5-37bb-05bc-79a4d9c1a700@HIDDEN> <83wn533lut.fsf@HIDDEN>
 <19892f55-498e-1109-2229-2ddd984849a4@HIDDEN> <83tu073ksp.fsf@HIDDEN>
 <def799c9-c2d0-81b7-117b-40c35c1417fa@HIDDEN> <83sffr2xq2.fsf@HIDDEN>
 <c6cd0fe6-ee70-cb2e-229d-1515b11436a9@HIDDEN>
 <30b047a9-dfe3-948a-a123-2e15221c239d@HIDDEN> <83tu05zeqe.fsf@HIDDEN>
 <957c7e3e-830e-0a03-562f-fdddc6bd1c06@HIDDEN> <835yckzic7.fsf@HIDDEN>
From: Dmitry Gutov <dgutov@HIDDEN>
In-Reply-To: <835yckzic7.fsf@HIDDEN>
X-Spam-Score: 0.1 (/)
X-Debbugs-Envelope-To: 60953
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -1.9 (-)

This is a multi-part message in MIME format.
--------------1UISxF3jfS6GiC00qsvGlSvW
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

On 02/02/2023 08:34, Eli Zaretskii wrote:
>> Date: Wed, 1 Feb 2023 23:20:50 +0200
>> Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
>> From: Dmitry Gutov <dgutov@HIDDEN>
>>
>> On 01/02/2023 15:39, Eli Zaretskii wrote:
>>>> Please see the attachment.
>>>>
>>>> To note the numbers: the first patch does quite well to improve the
>>>> performance of modes which use :match in queries which match a lot of
>>>> nodes (e.g. rust-ts-mode or ruby-ts-mode). The boost in those was on
>>>> the order of 25%.
>>>>
>>>> The second one is longer, and the boost (on top of the first one) is
>>>> around 5-6%, stable. Not as impressive, but at least it brings :match's
>>>> performance a little above :pred's in my example.
>>> Fine by me, if Yuan also approves.
>> For emacs-29, right?
> Yes.

Please take a look at this additional patch on top of the other one.
Is it OK?

I just remembered that if we don't want to auto-anchor the regexp, then 
'looking-at' is not the appropriate semantic. So this switches to 
'search_buffer'.
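
As a rough illustration of the distinction, here is a minimal sketch in standalone C. It uses POSIX regex.h rather than Emacs's own regexp engine, and the helper names match_anywhere and match_at_start are made up for this example. The first helper has search semantics (a match may begin anywhere in the text); the second has 'looking-at' semantics (a match must begin at the start position, so the regexp is effectively auto-anchored).

```c
#include <regex.h>
#include <stdbool.h>

/* Search semantics: true if PATTERN matches anywhere in TEXT. */
bool
match_anywhere (const char *pattern, const char *text)
{
  regex_t re;
  if (regcomp (&re, pattern, REG_EXTENDED) != 0)
    return false;		/* treat an invalid pattern as "no match" */
  regmatch_t m;
  bool found = regexec (&re, text, 1, &m, 0) == 0;
  regfree (&re);
  return found;
}

/* looking-at semantics: true only if a match begins at offset 0.
   regexec reports the leftmost match, so checking rm_so is enough.  */
bool
match_at_start (const char *pattern, const char *text)
{
  regex_t re;
  if (regcomp (&re, pattern, REG_EXTENDED) != 0)
    return false;
  regmatch_t m;
  bool found = regexec (&re, text, 1, &m, 0) == 0 && m.rm_so == 0;
  regfree (&re);
  return found;
}
```

With the text "foobar", the pattern "bar" is found by match_anywhere but rejected by match_at_start. A query author who wants the anchored behaviour under search semantics has to write the anchor explicitly.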
--------------1UISxF3jfS6GiC00qsvGlSvW
Content-Type: text/x-patch; charset=UTF-8;
 name="treesit_predicate_match_use_search_buffer.diff"
Content-Disposition: attachment;
 filename="treesit_predicate_match_use_search_buffer.diff"
Content-Transfer-Encoding: 7bit

diff --git a/src/lisp.h b/src/lisp.h
index 70555b3ce91..1276285e2f2 100644
--- a/src/lisp.h
+++ b/src/lisp.h
@@ -4802,6 +4802,9 @@ fast_c_string_match_ignore_case (Lisp_Object regexp,
 				       ptrdiff_t, ptrdiff_t *);
 extern ptrdiff_t find_before_next_newline (ptrdiff_t, ptrdiff_t,
 					   ptrdiff_t, ptrdiff_t *);
+extern EMACS_INT search_buffer (Lisp_Object, ptrdiff_t, ptrdiff_t,
+				ptrdiff_t, ptrdiff_t, EMACS_INT,
+				int, Lisp_Object, Lisp_Object, bool);
 extern void syms_of_search (void);
 extern void clear_regexp_cache (void);
 
diff --git a/src/search.c b/src/search.c
index dbc5a83946f..0bb52c03eef 100644
--- a/src/search.c
+++ b/src/search.c
@@ -68,9 +68,6 @@ #define REGEXP_CACHE_SIZE 20
 static EMACS_INT boyer_moore (EMACS_INT, unsigned char *, ptrdiff_t,
                               Lisp_Object, Lisp_Object, ptrdiff_t,
                               ptrdiff_t, int);
-static EMACS_INT search_buffer (Lisp_Object, ptrdiff_t, ptrdiff_t,
-                                ptrdiff_t, ptrdiff_t, EMACS_INT, int,
-                                Lisp_Object, Lisp_Object, bool);
 
 Lisp_Object re_match_object;
 
@@ -1510,7 +1507,7 @@ search_buffer_non_re (Lisp_Object string, ptrdiff_t pos,
   return result;
 }
 
-static EMACS_INT
+EMACS_INT
 search_buffer (Lisp_Object string, ptrdiff_t pos, ptrdiff_t pos_byte,
 	       ptrdiff_t lim, ptrdiff_t lim_byte, EMACS_INT n,
 	       int RE, Lisp_Object trt, Lisp_Object inverse_trt, bool posix)
diff --git a/src/treesit.c b/src/treesit.c
index 510170ca640..069fa3608bd 100644
--- a/src/treesit.c
+++ b/src/treesit.c
@@ -2491,7 +2491,8 @@ treesit_predicate_match (Lisp_Object args, struct capture_range captures)
   ZV = end_pos;
   ZV_BYTE = end_byte;
 
-  ptrdiff_t val = fast_looking_at (regexp, start_pos, start_byte, end_pos, end_byte, Qnil);
+  ptrdiff_t val = search_buffer (regexp, start_pos, start_byte, end_pos, end_byte,
+				 1, 1, Qnil, Qnil, false);
 
   BEGV = old_begv;
   BEGV_BYTE = old_begv_byte;

--------------1UISxF3jfS6GiC00qsvGlSvW--




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#60953; Package emacs. Full text available.

Message received at 60953 <at> debbugs.gnu.org:


Received: (at 60953) by debbugs.gnu.org; 2 Feb 2023 06:34:26 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Thu Feb 02 01:34:26 2023
Date: Thu, 02 Feb 2023 08:34:16 +0200
Message-Id: <835yckzic7.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
To: Dmitry Gutov <dgutov@HIDDEN>
In-Reply-To: <957c7e3e-830e-0a03-562f-fdddc6bd1c06@HIDDEN> (message from
 Dmitry Gutov on Wed, 1 Feb 2023 23:20:50 +0200)
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter
 font-lock seems inefficient
References: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@HIDDEN>
 <83wn5ag4nc.fsf@HIDDEN> <01b5d074-fb12-6b1f-cbfb-5e759833b854@HIDDEN>
 <838rhpg57n.fsf@HIDDEN> <5026D975-983F-4D18-8690-BE139C92825D@HIDDEN>
 <83pmb1emxi.fsf@HIDDEN> <6f318afc-ca71-8b7e-c822-52e6635b5718@HIDDEN>
 <83sffxcfxw.fsf@HIDDEN> <d8cf01fa-d829-fe83-0bbf-4a5f1965a107@HIDDEN>
 <83pmb1cbg5.fsf@HIDDEN> <2da844d3-ea31-289e-2821-aa174e365ffd@HIDDEN>
 <6784f9e7-844b-374d-2a1e-8a61cebe0d7e@HIDDEN> <83zga03yne.fsf@HIDDEN>
 <373a575f-c683-1581-c3e6-502e9897fb04@HIDDEN> <83mt603vrc.fsf@HIDDEN>
 <83e58a1b-2e4a-356a-36d8-c756ff105b62@HIDDEN> <83bkmf52va.fsf@HIDDEN>
 <33cad4c1-4af5-37bb-05bc-79a4d9c1a700@HIDDEN> <83wn533lut.fsf@HIDDEN>
 <19892f55-498e-1109-2229-2ddd984849a4@HIDDEN> <83tu073ksp.fsf@HIDDEN>
 <def799c9-c2d0-81b7-117b-40c35c1417fa@HIDDEN> <83sffr2xq2.fsf@HIDDEN>
 <c6cd0fe6-ee70-cb2e-229d-1515b11436a9@HIDDEN>
 <30b047a9-dfe3-948a-a123-2e15221c239d@HIDDEN> <83tu05zeqe.fsf@HIDDEN>
 <957c7e3e-830e-0a03-562f-fdddc6bd1c06@HIDDEN>
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 60953
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org

> Date: Wed, 1 Feb 2023 23:20:50 +0200
> Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
> From: Dmitry Gutov <dgutov@HIDDEN>
> 
> On 01/02/2023 15:39, Eli Zaretskii wrote:
> >> Please see the attachment.
> >>
> >> To note the numbers: the first patch does quite well to improve the
> >> performance of modes which use :match in queries which match a lot of
> >> nodes (e.g. rust-ts-mode or ruby-ts-mode). The boost in those was on
> >> the order of 25%.
> >>
> >> The second one is longer, and the boost (on top of the first one) is
> >> around 5-6%, stable. Not as impressive, but at least it brings :match's
> >> performance a little above :pred's in my example.
> > Fine by me, if Yuan also approves.
> 
> For emacs-29, right?

Yes.





Message received at 60953 <at> debbugs.gnu.org:


Received: (at 60953) by debbugs.gnu.org; 2 Feb 2023 02:16:24 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Wed Feb 01 21:16:24 2023
Content-Type: text/plain;
	charset=us-ascii
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3731.300.101.1.3\))
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter
 font-lock seems inefficient
From: Yuan Fu <casouri@HIDDEN>
In-Reply-To: <957c7e3e-830e-0a03-562f-fdddc6bd1c06@HIDDEN>
Date: Wed, 1 Feb 2023 18:16:05 -0800
Content-Transfer-Encoding: 7bit
Message-Id: <644EE883-9DC1-44B6-BD6C-2082319324F2@HIDDEN>
References: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@HIDDEN>
 <83wn5ag4nc.fsf@HIDDEN> <01b5d074-fb12-6b1f-cbfb-5e759833b854@HIDDEN>
 <838rhpg57n.fsf@HIDDEN> <5026D975-983F-4D18-8690-BE139C92825D@HIDDEN>
 <83pmb1emxi.fsf@HIDDEN> <6f318afc-ca71-8b7e-c822-52e6635b5718@HIDDEN>
 <83sffxcfxw.fsf@HIDDEN> <d8cf01fa-d829-fe83-0bbf-4a5f1965a107@HIDDEN>
 <83pmb1cbg5.fsf@HIDDEN> <2da844d3-ea31-289e-2821-aa174e365ffd@HIDDEN>
 <6784f9e7-844b-374d-2a1e-8a61cebe0d7e@HIDDEN> <83zga03yne.fsf@HIDDEN>
 <373a575f-c683-1581-c3e6-502e9897fb04@HIDDEN> <83mt603vrc.fsf@HIDDEN>
 <83e58a1b-2e4a-356a-36d8-c756ff105b62@HIDDEN> <83bkmf52va.fsf@HIDDEN>
 <33cad4c1-4af5-37bb-05bc-79a4d9c1a700@HIDDEN> <83wn533lut.fsf@HIDDEN>
 <19892f55-498e-1109-2229-2ddd984849a4@HIDDEN> <83tu073ksp.fsf@HIDDEN>
 <def799c9-c2d0-81b7-117b-40c35c1417fa@HIDDEN> <83sffr2xq2.fsf@HIDDEN>
 <c6cd0fe6-ee70-cb2e-229d-1515b11436a9@HIDDEN>
 <30b047a9-dfe3-948a-a123-2e15221c239d@HIDDEN> <83tu05zeqe.fsf@HIDDEN>
 <957c7e3e-830e-0a03-562f-fdddc6bd1c06@HIDDEN>
To: Dmitry Gutov <dgutov@HIDDEN>
X-Mailer: Apple Mail (2.3731.300.101.1.3)
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 60953
Cc: Eli Zaretskii <eliz@HIDDEN>, 60953 <at> debbugs.gnu.org



> On Feb 1, 2023, at 1:20 PM, Dmitry Gutov <dgutov@HIDDEN> wrote:
> 
> On 01/02/2023 15:39, Eli Zaretskii wrote:
>>> Please see the attachment.
>>> 
>>> To note the numbers: the first patch does quite well to improve the
>>> performance of modes which use :match in queries which match a lot of
>>> nodes (e.g. rust-ts-mode or ruby-ts-mode). The boost in those was on
>>> the order of 25%.
>>> 
>>> The second one is longer, and the boost (on top of the first one) is
>>> around 5-6%, stable. Not as impressive, but at least it brings :match's
>>> performance a little above :pred's in my example.
>> Fine by me, if Yuan also approves.
> 
> For emacs-29, right?
> 
> Waiting for Yuan's confirmation.

Yeah please go ahead :-)

Yuan





Message received at 60953 <at> debbugs.gnu.org:


Received: (at 60953) by debbugs.gnu.org; 1 Feb 2023 21:21:01 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Wed Feb 01 16:21:00 2023
Message-ID: <957c7e3e-830e-0a03-562f-fdddc6bd1c06@HIDDEN>
Date: Wed, 1 Feb 2023 23:20:50 +0200
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.4.2
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter
 font-lock seems inefficient
Content-Language: en-US
To: Eli Zaretskii <eliz@HIDDEN>
References: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@HIDDEN>
 <83wn5ag4nc.fsf@HIDDEN> <01b5d074-fb12-6b1f-cbfb-5e759833b854@HIDDEN>
 <838rhpg57n.fsf@HIDDEN> <5026D975-983F-4D18-8690-BE139C92825D@HIDDEN>
 <83pmb1emxi.fsf@HIDDEN> <6f318afc-ca71-8b7e-c822-52e6635b5718@HIDDEN>
 <83sffxcfxw.fsf@HIDDEN> <d8cf01fa-d829-fe83-0bbf-4a5f1965a107@HIDDEN>
 <83pmb1cbg5.fsf@HIDDEN> <2da844d3-ea31-289e-2821-aa174e365ffd@HIDDEN>
 <6784f9e7-844b-374d-2a1e-8a61cebe0d7e@HIDDEN> <83zga03yne.fsf@HIDDEN>
 <373a575f-c683-1581-c3e6-502e9897fb04@HIDDEN> <83mt603vrc.fsf@HIDDEN>
 <83e58a1b-2e4a-356a-36d8-c756ff105b62@HIDDEN> <83bkmf52va.fsf@HIDDEN>
 <33cad4c1-4af5-37bb-05bc-79a4d9c1a700@HIDDEN> <83wn533lut.fsf@HIDDEN>
 <19892f55-498e-1109-2229-2ddd984849a4@HIDDEN> <83tu073ksp.fsf@HIDDEN>
 <def799c9-c2d0-81b7-117b-40c35c1417fa@HIDDEN> <83sffr2xq2.fsf@HIDDEN>
 <c6cd0fe6-ee70-cb2e-229d-1515b11436a9@HIDDEN>
 <30b047a9-dfe3-948a-a123-2e15221c239d@HIDDEN> <83tu05zeqe.fsf@HIDDEN>
From: Dmitry Gutov <dgutov@HIDDEN>
In-Reply-To: <83tu05zeqe.fsf@HIDDEN>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Score: -0.9 (/)
X-Debbugs-Envelope-To: 60953
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org

On 01/02/2023 15:39, Eli Zaretskii wrote:
>> Please see the attachment.
>>
>> To note the numbers: the first patch does quite well to improve the
>> performance of modes which use :match in queries which match a lot of
>> nodes (e.g. rust-ts-mode or ruby-ts-mode). The boost in those was on
>> the order of 25%.
>>
>> The second one is longer, and the boost (on top of the first one) is
>> around 5-6%, stable. Not as impressive, but at least it brings :match's
>> performance a little above :pred's in my example.
> Fine by me, if Yuan also approves.

For emacs-29, right?

Waiting for Yuan's confirmation.





Message received at 60953 <at> debbugs.gnu.org:


Received: (at 60953) by debbugs.gnu.org; 1 Feb 2023 15:15:54 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Wed Feb 01 10:15:54 2023
Message-ID: <85293540-cc52-1535-eb8b-85adf646c016@HIDDEN>
Date: Wed, 1 Feb 2023 17:15:43 +0200
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.4.2
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter
 font-lock seems inefficient
Content-Language: en-US
To: Eli Zaretskii <eliz@HIDDEN>
References: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@HIDDEN>
 <b1dc100e-1ee9-dc89-00b6-25bb7864e90b@HIDDEN> <83wn5ag4nc.fsf@HIDDEN>
 <01b5d074-fb12-6b1f-cbfb-5e759833b854@HIDDEN> <838rhpg57n.fsf@HIDDEN>
 <5026D975-983F-4D18-8690-BE139C92825D@HIDDEN> <83pmb1emxi.fsf@HIDDEN>
 <6f318afc-ca71-8b7e-c822-52e6635b5718@HIDDEN> <83sffxcfxw.fsf@HIDDEN>
 <d8cf01fa-d829-fe83-0bbf-4a5f1965a107@HIDDEN> <83pmb1cbg5.fsf@HIDDEN>
 <2da844d3-ea31-289e-2821-aa174e365ffd@HIDDEN>
 <6784f9e7-844b-374d-2a1e-8a61cebe0d7e@HIDDEN> <83zga03yne.fsf@HIDDEN>
 <373a575f-c683-1581-c3e6-502e9897fb04@HIDDEN> <83mt603vrc.fsf@HIDDEN>
 <83e58a1b-2e4a-356a-36d8-c756ff105b62@HIDDEN> <83bkmf52va.fsf@HIDDEN>
 <33cad4c1-4af5-37bb-05bc-79a4d9c1a700@HIDDEN> <83wn533lut.fsf@HIDDEN>
 <19892f55-498e-1109-2229-2ddd984849a4@HIDDEN> <83tu073ksp.fsf@HIDDEN>
 <def799c9-c2d0-81b7-117b-40c35c1417fa@HIDDEN> <83sffr2xq2.fsf@HIDDEN>
 <c6cd0fe6-ee70-cb2e-229d-1515b11436a9@HIDDEN> <83zg9xzg35.fsf@HIDDEN>
From: Dmitry Gutov <dgutov@HIDDEN>
In-Reply-To: <83zg9xzg35.fsf@HIDDEN>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Score: 0.1 (/)
X-Debbugs-Envelope-To: 60953
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org

On 01/02/2023 15:10, Eli Zaretskii wrote:
>> Date: Tue, 31 Jan 2023 20:16:01 +0200
>> Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
>> From: Dmitry Gutov <dgutov@HIDDEN>
>>
>>> Can you describe what that function should do?  I don't think I have a
>>> clear idea of that.
>>
>> In Lisp that function could be implemented as
>>
>>     (defun buffer-substring-match (regexp &optional start end
>>                                    inhibit-modify)
>>       (string-match regexp
>>                     (buffer-substring (or start (point-min))
>>                                       (or end (point-max)))
>>                     nil inhibit-modify))
>>
>> Meaning, it matches the regexp against the buffer substring, with the
>> string-start and string-end anchors working.
>>
>> But it would be implemented in C, meaning we could avoid the extra
>> consing and funcall overhead.
> 
> Now I'm confused, because I thought the C functions we were
> considering all fit the above description.

Except they don't match \` and \' at START and END. Only at actual 
point-min and point-max.

I suppose the new function could be implemented with narrowing as well. 
If we decide it's the best method.
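
For illustration, the anchor behavior described above can be reproduced from Lisp. This is only a minimal sketch, not code from the patch: the helper name `my/anchored-match-p` and the sample text are made up for the example.

```elisp
;; Hypothetical helper, for illustration only.
(defun my/anchored-match-p (regexp start end narrow)
  "Return non-nil if REGEXP matches the buffer text at START.
If NARROW is non-nil, narrow the buffer to START..END first."
  (save-excursion
    (save-restriction
      (when narrow
        (narrow-to-region start end))
      (goto-char start)
      (looking-at regexp))))

(with-temp-buffer
  (insert "foo bar baz")
  ;; "bar" occupies positions 5..7, so the region is 5..8.
  (list (my/anchored-match-p "\\`bar\\'" 5 8 nil)
        (my/anchored-match-p "\\`bar\\'" 5 8 t)))
;; => (nil t)
```

Without narrowing, the anchors only match at the real point-min and point-max, so the first call fails. With narrowing, START and END become the accessible bounds and the same regexp matches.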

>> Anyway, it seems like it might be too late as an addition to Emacs 29.
>> And we can implement the match predicate using narrowing for this
>> release, to be updated later.
> 
> Right, this is not a catastrophe, IMO.  The work on TS-based modes is
> just beginning.

I also don't see any particular slowdown from altering BEGV and ZV in C. 
So the current method might be the fastest possible anyway.




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#60953; Package emacs. Full text available.

Message received at 60953 <at> debbugs.gnu.org:


Received: (at 60953) by debbugs.gnu.org; 1 Feb 2023 15:13:24 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Wed Feb 01 10:13:24 2023
Received: from localhost ([127.0.0.1]:59619 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1pNEnb-00018M-Pz
	for submit <at> debbugs.gnu.org; Wed, 01 Feb 2023 10:13:24 -0500
Received: from mail-ed1-f43.google.com ([209.85.208.43]:36527)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <raaahh@HIDDEN>) id 1pNEnZ-000189-EV
 for 60953 <at> debbugs.gnu.org; Wed, 01 Feb 2023 10:13:22 -0500
Received: by mail-ed1-f43.google.com with SMTP id u21so18010411edv.3
 for <60953 <at> debbugs.gnu.org>; Wed, 01 Feb 2023 07:13:21 -0800 (PST)
X-Received: by 2002:aa7:dac6:0:b0:46f:a6ea:202 with SMTP id
 x6-20020aa7dac6000000b0046fa6ea0202mr2711837eds.37.1675264395899; 
 Wed, 01 Feb 2023 07:13:15 -0800 (PST)
Received: from [192.168.0.2] ([46.251.119.176])
 by smtp.googlemail.com with ESMTPSA id
 dc19-20020a170906c7d300b00887a28ac01asm5384769ejb.31.2023.02.01.07.13.14
 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128);
 Wed, 01 Feb 2023 07:13:15 -0800 (PST)
Message-ID: <1103a065-ecd5-45f0-c5e7-d357d7674aaa@HIDDEN>
Date: Wed, 1 Feb 2023 17:13:13 +0200
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.4.2
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter
 font-lock seems inefficient
Content-Language: en-US
To: Eli Zaretskii <eliz@HIDDEN>
References: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@HIDDEN>
 <83wn5ag4nc.fsf@HIDDEN> <01b5d074-fb12-6b1f-cbfb-5e759833b854@HIDDEN>
 <838rhpg57n.fsf@HIDDEN> <5026D975-983F-4D18-8690-BE139C92825D@HIDDEN>
 <83pmb1emxi.fsf@HIDDEN> <6f318afc-ca71-8b7e-c822-52e6635b5718@HIDDEN>
 <83sffxcfxw.fsf@HIDDEN> <d8cf01fa-d829-fe83-0bbf-4a5f1965a107@HIDDEN>
 <83pmb1cbg5.fsf@HIDDEN> <2da844d3-ea31-289e-2821-aa174e365ffd@HIDDEN>
 <6784f9e7-844b-374d-2a1e-8a61cebe0d7e@HIDDEN> <83zga03yne.fsf@HIDDEN>
 <373a575f-c683-1581-c3e6-502e9897fb04@HIDDEN> <83mt603vrc.fsf@HIDDEN>
 <83e58a1b-2e4a-356a-36d8-c756ff105b62@HIDDEN> <83bkmf52va.fsf@HIDDEN>
 <33cad4c1-4af5-37bb-05bc-79a4d9c1a700@HIDDEN> <83wn533lut.fsf@HIDDEN>
 <19892f55-498e-1109-2229-2ddd984849a4@HIDDEN> <83tu073ksp.fsf@HIDDEN>
 <def799c9-c2d0-81b7-117b-40c35c1417fa@HIDDEN> <83sffr2xq2.fsf@HIDDEN>
 <c6cd0fe6-ee70-cb2e-229d-1515b11436a9@HIDDEN>
 <30b047a9-dfe3-948a-a123-2e15221c239d@HIDDEN> <83tu05zeqe.fsf@HIDDEN>
From: Dmitry Gutov <dgutov@HIDDEN>
In-Reply-To: <83tu05zeqe.fsf@HIDDEN>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Score: 0.1 (/)
X-Debbugs-Envelope-To: 60953
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
X-Spam-Score: -1.9 (-)

On 01/02/2023 15:39, Eli Zaretskii wrote:
>> Date: Wed, 1 Feb 2023 04:39:29 +0200
>> From: Dmitry Gutov <dgutov@HIDDEN>
>> Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
>>
>> So here. I've installed the first patch, which didn't raise too many
>> concerns previously, and here's the new iteration on the second patch.
> Thanks, but please in the future when you make changes which call
> functions from external libraries that were not called previously, be
> sure to do the DEF_DLL_FN/LOAD_DLL_FN and the #undef/#define dance
> needed to avoid breaking the MS-Windows build.

Will do, sorry about that.





Information forwarded to bug-gnu-emacs@HIDDEN:
bug#60953; Package emacs. Full text available.

Message received at 60953 <at> debbugs.gnu.org:


Received: (at 60953) by debbugs.gnu.org; 1 Feb 2023 13:40:07 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Wed Feb 01 08:40:07 2023
Received: from localhost ([127.0.0.1]:56866 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1pNDLK-0006UD-TZ
	for submit <at> debbugs.gnu.org; Wed, 01 Feb 2023 08:40:07 -0500
Received: from eggs.gnu.org ([209.51.188.92]:46670)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <eliz@HIDDEN>) id 1pNDLF-0006TY-KN
 for 60953 <at> debbugs.gnu.org; Wed, 01 Feb 2023 08:40:05 -0500
Received: from fencepost.gnu.org ([2001:470:142:3::e])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@HIDDEN>)
 id 1pNDLA-0007QJ-8k; Wed, 01 Feb 2023 08:39:56 -0500
Received: from [87.69.77.57] (helo=home-c4e4a596f7)
 by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@HIDDEN>)
 id 1pNDL9-0002oz-G8; Wed, 01 Feb 2023 08:39:56 -0500
Date: Wed, 01 Feb 2023 15:39:53 +0200
Message-Id: <83tu05zeqe.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
To: Dmitry Gutov <dgutov@HIDDEN>
In-Reply-To: <30b047a9-dfe3-948a-a123-2e15221c239d@HIDDEN> (message from
 Dmitry Gutov on Wed, 1 Feb 2023 04:39:29 +0200)
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter
 font-lock seems inefficient
References: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@HIDDEN>
 <b1dc100e-1ee9-dc89-00b6-25bb7864e90b@HIDDEN> <83wn5ag4nc.fsf@HIDDEN>
 <01b5d074-fb12-6b1f-cbfb-5e759833b854@HIDDEN> <838rhpg57n.fsf@HIDDEN>
 <5026D975-983F-4D18-8690-BE139C92825D@HIDDEN> <83pmb1emxi.fsf@HIDDEN>
 <6f318afc-ca71-8b7e-c822-52e6635b5718@HIDDEN> <83sffxcfxw.fsf@HIDDEN>
 <d8cf01fa-d829-fe83-0bbf-4a5f1965a107@HIDDEN> <83pmb1cbg5.fsf@HIDDEN>
 <2da844d3-ea31-289e-2821-aa174e365ffd@HIDDEN>
 <6784f9e7-844b-374d-2a1e-8a61cebe0d7e@HIDDEN> <83zga03yne.fsf@HIDDEN>
 <373a575f-c683-1581-c3e6-502e9897fb04@HIDDEN> <83mt603vrc.fsf@HIDDEN>
 <83e58a1b-2e4a-356a-36d8-c756ff105b62@HIDDEN> <83bkmf52va.fsf@HIDDEN>
 <33cad4c1-4af5-37bb-05bc-79a4d9c1a700@HIDDEN> <83wn533lut.fsf@HIDDEN>
 <19892f55-498e-1109-2229-2ddd984849a4@HIDDEN> <83tu073ksp.fsf@HIDDEN>
 <def799c9-c2d0-81b7-117b-40c35c1417fa@HIDDEN> <83sffr2xq2.fsf@HIDDEN>
 <c6cd0fe6-ee70-cb2e-229d-1515b11436a9@HIDDEN>
 <30b047a9-dfe3-948a-a123-2e15221c239d@HIDDEN>
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 60953
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
X-Spam-Score: -3.3 (---)

> Date: Wed, 1 Feb 2023 04:39:29 +0200
> From: Dmitry Gutov <dgutov@HIDDEN>
> Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
> 
> So here. I've installed the first patch, which didn't raise too many 
> concerns previously, and here's the new iteration on the second patch. 

Thanks, but please in the future when you make changes which call
functions from external libraries that were not called previously, be
sure to do the DEF_DLL_FN/LOAD_DLL_FN and the #undef/#define dance
needed to avoid breaking the MS-Windows build.

> Please see the attachment.
> 
> To note the numbers: the first patch does quite well to improve the 
> performance of modes which use :match in queries which match a lot of 
> nodes (e.g. rust-ts-mode or ruby-ts-mode). The boost in those was on 
> the order of 25%.
> 
> The second one is longer, and the boost (on top of the first one) is 
> around 5-6%, stable. Not as impressive, but at least it brings :match's 
> performance a little above :pred's in my example.

Fine by me, if Yuan also approves.




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#60953; Package emacs. Full text available.

Message received at 60953 <at> debbugs.gnu.org:


Received: (at 60953) by debbugs.gnu.org; 1 Feb 2023 13:10:53 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Wed Feb 01 08:10:53 2023
Received: from localhost ([127.0.0.1]:56823 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1pNCt2-0005jX-RA
	for submit <at> debbugs.gnu.org; Wed, 01 Feb 2023 08:10:53 -0500
Received: from eggs.gnu.org ([209.51.188.92]:48980)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <eliz@HIDDEN>) id 1pNCsx-0005jC-N0
 for 60953 <at> debbugs.gnu.org; Wed, 01 Feb 2023 08:10:50 -0500
Received: from fencepost.gnu.org ([2001:470:142:3::e])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@HIDDEN>)
 id 1pNCsr-0001V8-3k; Wed, 01 Feb 2023 08:10:41 -0500
Received: from [87.69.77.57] (helo=home-c4e4a596f7)
 by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@HIDDEN>)
 id 1pNCsq-00042c-8T; Wed, 01 Feb 2023 08:10:40 -0500
Date: Wed, 01 Feb 2023 15:10:38 +0200
Message-Id: <83zg9xzg35.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
To: Dmitry Gutov <dgutov@HIDDEN>
In-Reply-To: <c6cd0fe6-ee70-cb2e-229d-1515b11436a9@HIDDEN> (message from
 Dmitry Gutov on Tue, 31 Jan 2023 20:16:01 +0200)
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter
 font-lock seems inefficient
References: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@HIDDEN>
 <AB9CD94C-CB3C-4D84-B4AA-22EDC206EB12@HIDDEN>
 <b1dc100e-1ee9-dc89-00b6-25bb7864e90b@HIDDEN> <83wn5ag4nc.fsf@HIDDEN>
 <01b5d074-fb12-6b1f-cbfb-5e759833b854@HIDDEN> <838rhpg57n.fsf@HIDDEN>
 <5026D975-983F-4D18-8690-BE139C92825D@HIDDEN> <83pmb1emxi.fsf@HIDDEN>
 <6f318afc-ca71-8b7e-c822-52e6635b5718@HIDDEN> <83sffxcfxw.fsf@HIDDEN>
 <d8cf01fa-d829-fe83-0bbf-4a5f1965a107@HIDDEN> <83pmb1cbg5.fsf@HIDDEN>
 <2da844d3-ea31-289e-2821-aa174e365ffd@HIDDEN>
 <6784f9e7-844b-374d-2a1e-8a61cebe0d7e@HIDDEN> <83zga03yne.fsf@HIDDEN>
 <373a575f-c683-1581-c3e6-502e9897fb04@HIDDEN> <83mt603vrc.fsf@HIDDEN>
 <83e58a1b-2e4a-356a-36d8-c756ff105b62@HIDDEN> <83bkmf52va.fsf@HIDDEN>
 <33cad4c1-4af5-37bb-05bc-79a4d9c1a700@HIDDEN> <83wn533lut.fsf@HIDDEN>
 <19892f55-498e-1109-2229-2ddd984849a4@HIDDEN> <83tu073ksp.fsf@HIDDEN>
 <def799c9-c2d0-81b7-117b-40c35c1417fa@HIDDEN> <83sffr2xq2.fsf@HIDDEN>
 <c6cd0fe6-ee70-cb2e-229d-1515b11436a9@HIDDEN>
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 60953
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
X-Spam-Score: -3.3 (---)

> Date: Tue, 31 Jan 2023 20:16:01 +0200
> Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
> From: Dmitry Gutov <dgutov@HIDDEN>
> 
> > Can you describe what that function should do?  I don't think I have a
> > clear idea of that.
> 
> In Lisp that function could be implemented as
> 
>    (defun buffer-substring-match (regexp &optional start end
>                                   inhibit-modify)
>      (string-match regexp
>                    (buffer-substring (or start (point-min))
>                                      (or end (point-max)))
>                    nil inhibit-modify))
> 
> Meaning, it matches the regexp against the buffer substring, with the 
> string-start and string-end anchors working.
> 
> But it would be implemented in C, meaning we could avoid the extra 
> consing and funcall overhead.

Now I'm confused, because I thought the C functions we were
considering all fit the above description.

> Anyway, it seems like it might be too late as an addition to Emacs 29. 
> And we can implement the match predicate using narrowing for this 
> release, to be updated later.

Right, this is not a catastrophe, IMO.  The work on TS-based modes is
just beginning.




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#60953; Package emacs. Full text available.

Message received at 60953 <at> debbugs.gnu.org:


Received: (at 60953) by debbugs.gnu.org; 1 Feb 2023 02:39:39 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Tue Jan 31 21:39:39 2023
Received: from localhost ([127.0.0.1]:55456 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1pN32B-0008Bx-DB
	for submit <at> debbugs.gnu.org; Tue, 31 Jan 2023 21:39:39 -0500
Received: from mail-ej1-f46.google.com ([209.85.218.46]:38894)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <raaahh@HIDDEN>) id 1pN329-0008Bj-Hp
 for 60953 <at> debbugs.gnu.org; Tue, 31 Jan 2023 21:39:38 -0500
Received: by mail-ej1-f46.google.com with SMTP id gr7so22721701ejb.5
 for <60953 <at> debbugs.gnu.org>; Tue, 31 Jan 2023 18:39:37 -0800 (PST)
X-Received: by 2002:a17:907:3e82:b0:878:5bce:291a with SMTP id
 hs2-20020a1709073e8200b008785bce291amr777047ejc.36.1675219171529; 
 Tue, 31 Jan 2023 18:39:31 -0800 (PST)
Received: from [192.168.0.2] ([46.251.119.176])
 by smtp.googlemail.com with ESMTPSA id
 u20-20020a50a414000000b004a08c52a2f0sm9242161edb.76.2023.01.31.18.39.30
 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128);
 Tue, 31 Jan 2023 18:39:30 -0800 (PST)
Content-Type: multipart/mixed; boundary="------------Lz0Fo1WiaymPHEGhkJvrYbnr"
Message-ID: <30b047a9-dfe3-948a-a123-2e15221c239d@HIDDEN>
Date: Wed, 1 Feb 2023 04:39:29 +0200
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.4.2
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter
 font-lock seems inefficient
Content-Language: en-US
From: Dmitry Gutov <dgutov@HIDDEN>
To: Eli Zaretskii <eliz@HIDDEN>
References: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@HIDDEN>
 <b1dc100e-1ee9-dc89-00b6-25bb7864e90b@HIDDEN> <83wn5ag4nc.fsf@HIDDEN>
 <01b5d074-fb12-6b1f-cbfb-5e759833b854@HIDDEN> <838rhpg57n.fsf@HIDDEN>
 <5026D975-983F-4D18-8690-BE139C92825D@HIDDEN> <83pmb1emxi.fsf@HIDDEN>
 <6f318afc-ca71-8b7e-c822-52e6635b5718@HIDDEN> <83sffxcfxw.fsf@HIDDEN>
 <d8cf01fa-d829-fe83-0bbf-4a5f1965a107@HIDDEN> <83pmb1cbg5.fsf@HIDDEN>
 <2da844d3-ea31-289e-2821-aa174e365ffd@HIDDEN>
 <6784f9e7-844b-374d-2a1e-8a61cebe0d7e@HIDDEN> <83zga03yne.fsf@HIDDEN>
 <373a575f-c683-1581-c3e6-502e9897fb04@HIDDEN> <83mt603vrc.fsf@HIDDEN>
 <83e58a1b-2e4a-356a-36d8-c756ff105b62@HIDDEN> <83bkmf52va.fsf@HIDDEN>
 <33cad4c1-4af5-37bb-05bc-79a4d9c1a700@HIDDEN> <83wn533lut.fsf@HIDDEN>
 <19892f55-498e-1109-2229-2ddd984849a4@HIDDEN> <83tu073ksp.fsf@HIDDEN>
 <def799c9-c2d0-81b7-117b-40c35c1417fa@HIDDEN> <83sffr2xq2.fsf@HIDDEN>
 <c6cd0fe6-ee70-cb2e-229d-1515b11436a9@HIDDEN>
In-Reply-To: <c6cd0fe6-ee70-cb2e-229d-1515b11436a9@HIDDEN>
X-Spam-Score: -0.9 (/)
X-Debbugs-Envelope-To: 60953
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
X-Spam-Score: -1.9 (-)

This is a multi-part message in MIME format.
--------------Lz0Fo1WiaymPHEGhkJvrYbnr
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

On 31/01/2023 20:16, Dmitry Gutov wrote:
> Anyway, it seems like it might be too late as an addition to Emacs 29. 
> And we can implement the match predicate using narrowing for this 
> release, to be updated later.

So here. I've installed the first patch, which didn't raise too many 
concerns previously, and here's the new iteration on the second patch. 
Please see the attachment.

To note the numbers: the first patch does quite well to improve the 
performance of modes which use :match in queries which match a lot of 
nodes (e.g. rust-ts-mode or ruby-ts-mode). The boost in those was on 
the order of 25%.

The second one is longer, and the boost (on top of the first one) is 
around 5-6%, stable. Not as impressive, but at least it brings :match's 
performance a little above :pred's in my example.
--------------Lz0Fo1WiaymPHEGhkJvrYbnr
Content-Type: text/x-patch; charset=UTF-8; name="treesit_predicate_match.diff"
Content-Disposition: attachment; filename="treesit_predicate_match.diff"
Content-Transfer-Encoding: 7bit

diff --git a/src/treesit.c b/src/treesit.c
index b163685419f..510170ca640 100644
--- a/src/treesit.c
+++ b/src/treesit.c
@@ -2466,10 +2466,41 @@ treesit_predicate_match (Lisp_Object args, struct capture_range captures)
 	      build_string ("The second argument to `match' should "
 		            "be a capture name, not a string"));
 
-  Lisp_Object text = treesit_predicate_capture_name_to_text (capture_name,
+  Lisp_Object node = treesit_predicate_capture_name_to_node (capture_name,
 							     captures);
 
-  if (fast_string_match (regexp, text) >= 0)
+  struct buffer *old_buffer = current_buffer;
+  struct buffer *buffer = XBUFFER (XTS_PARSER (XTS_NODE (node)->parser)->buffer);
+  set_buffer_internal (buffer);
+
+  TSNode treesit_node = XTS_NODE (node)->node;
+  ptrdiff_t visible_beg = XTS_PARSER (XTS_NODE (node)->parser)->visible_beg;
+  uint32_t start_byte_offset = ts_node_start_byte (treesit_node);
+  uint32_t end_byte_offset = ts_node_end_byte (treesit_node);
+  ptrdiff_t start_byte = visible_beg + start_byte_offset;
+  ptrdiff_t end_byte = visible_beg + end_byte_offset;
+  ptrdiff_t start_pos = buf_bytepos_to_charpos (buffer, start_byte);
+  ptrdiff_t end_pos = buf_bytepos_to_charpos (buffer, end_byte);
+  ptrdiff_t old_begv = BEGV;
+  ptrdiff_t old_begv_byte = BEGV_BYTE;
+  ptrdiff_t old_zv = ZV;
+  ptrdiff_t old_zv_byte = ZV_BYTE;
+
+  BEGV = start_pos;
+  BEGV_BYTE = start_byte;
+  ZV = end_pos;
+  ZV_BYTE = end_byte;
+
+  ptrdiff_t val = fast_looking_at (regexp, start_pos, start_byte, end_pos, end_byte, Qnil);
+
+  BEGV = old_begv;
+  BEGV_BYTE = old_begv_byte;
+  ZV = old_zv;
+  ZV_BYTE = old_zv_byte;
+
+  set_buffer_internal (old_buffer);
+
+  if (val > 0)
     return true;
   else
     return false;

--------------Lz0Fo1WiaymPHEGhkJvrYbnr--




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#60953; Package emacs. Full text available.

Message received at 60953 <at> debbugs.gnu.org:


Received: (at 60953) by debbugs.gnu.org; 31 Jan 2023 18:16:15 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Tue Jan 31 13:16:15 2023
Received: from localhost ([127.0.0.1]:54981 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1pMvB0-0007Jy-7L
	for submit <at> debbugs.gnu.org; Tue, 31 Jan 2023 13:16:15 -0500
Received: from mail-wm1-f48.google.com ([209.85.128.48]:52101)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <raaahh@HIDDEN>) id 1pMvAv-0007Jg-Qs
 for 60953 <at> debbugs.gnu.org; Tue, 31 Jan 2023 13:16:13 -0500
Received: by mail-wm1-f48.google.com with SMTP id o36so4957840wms.1
 for <60953 <at> debbugs.gnu.org>; Tue, 31 Jan 2023 10:16:09 -0800 (PST)
X-Received: by 2002:a1c:4c01:0:b0:3db:2c8:d7e1 with SMTP id
 z1-20020a1c4c01000000b003db02c8d7e1mr6113wmf.20.1675188963986; 
 Tue, 31 Jan 2023 10:16:03 -0800 (PST)
Received: from [192.168.0.2] ([46.251.119.176])
 by smtp.googlemail.com with ESMTPSA id
 bh6-20020a05600c3d0600b003daffc2ecdesm20166157wmb.13.2023.01.31.10.16.02
 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128);
 Tue, 31 Jan 2023 10:16:03 -0800 (PST)
Message-ID: <c6cd0fe6-ee70-cb2e-229d-1515b11436a9@HIDDEN>
Date: Tue, 31 Jan 2023 20:16:01 +0200
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.4.2
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter
 font-lock seems inefficient
Content-Language: en-US
To: Eli Zaretskii <eliz@HIDDEN>
References: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@HIDDEN>
 <AB9CD94C-CB3C-4D84-B4AA-22EDC206EB12@HIDDEN>
 <b1dc100e-1ee9-dc89-00b6-25bb7864e90b@HIDDEN> <83wn5ag4nc.fsf@HIDDEN>
 <01b5d074-fb12-6b1f-cbfb-5e759833b854@HIDDEN> <838rhpg57n.fsf@HIDDEN>
 <5026D975-983F-4D18-8690-BE139C92825D@HIDDEN> <83pmb1emxi.fsf@HIDDEN>
 <6f318afc-ca71-8b7e-c822-52e6635b5718@HIDDEN> <83sffxcfxw.fsf@HIDDEN>
 <d8cf01fa-d829-fe83-0bbf-4a5f1965a107@HIDDEN> <83pmb1cbg5.fsf@HIDDEN>
 <2da844d3-ea31-289e-2821-aa174e365ffd@HIDDEN>
 <6784f9e7-844b-374d-2a1e-8a61cebe0d7e@HIDDEN> <83zga03yne.fsf@HIDDEN>
 <373a575f-c683-1581-c3e6-502e9897fb04@HIDDEN> <83mt603vrc.fsf@HIDDEN>
 <83e58a1b-2e4a-356a-36d8-c756ff105b62@HIDDEN> <83bkmf52va.fsf@HIDDEN>
 <33cad4c1-4af5-37bb-05bc-79a4d9c1a700@HIDDEN> <83wn533lut.fsf@HIDDEN>
 <19892f55-498e-1109-2229-2ddd984849a4@HIDDEN> <83tu073ksp.fsf@HIDDEN>
 <def799c9-c2d0-81b7-117b-40c35c1417fa@HIDDEN> <83sffr2xq2.fsf@HIDDEN>
From: Dmitry Gutov <dgutov@HIDDEN>
In-Reply-To: <83sffr2xq2.fsf@HIDDEN>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Score: 0.1 (/)
X-Debbugs-Envelope-To: 60953
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
X-Spam-Score: -1.9 (-)

On 31/01/2023 05:23, Eli Zaretskii wrote:
>> Date: Mon, 30 Jan 2023 21:58:22 +0200
>> Cc:casouri@HIDDEN,60953 <at> debbugs.gnu.org
>> From: Dmitry Gutov<dgutov@HIDDEN>
>>
>> On 30/01/2023 21:05, Eli Zaretskii wrote:
>>>> Date: Mon, 30 Jan 2023 21:01:02 +0200
>>>> Cc:casouri@HIDDEN,60953 <at> debbugs.gnu.org
>>>> From: Dmitry Gutov<dgutov@HIDDEN>
>>>>
>>>> But that doesn't answer the question "Could it?".
>>> I don't understand what you are asking.  "Could" in what sense?
>> Like, would it make sense to try to modify it that way, or extract a
>> function that would do that, without writing it from scratch.
>>
>> Or create a new function which would reuse some common code.
>>
>> We would call the new function something like match_buffer_substring.
>> Optionally, also expose it to Lisp.
> Can you describe what that function should do?  I don't think I have a
> clear idea of that.

In Lisp that function could be implemented as

   (defun buffer-substring-match (regexp &optional start end
                                  inhibit-modify)
     (string-match regexp
                   (buffer-substring (or start (point-min))
                                     (or end (point-max)))
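                   ;; The 3rd argument of `string-match' is START, so pass nil
                   ;; there and INHIBIT-MODIFY as the 4th.
                   nil inhibit-modify))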

Meaning, it matches the regexp against the buffer substring, with the 
string-start and string-end anchors working.

But it would be implemented in C, meaning we could avoid the extra 
consing and funcall overhead.
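
A minimal sketch of what "anchors working" means here, assuming an invented sample buffer (the name my-anchor-demo is hypothetical): in buffer matching, \` only matches at the start of the accessible portion, so it fails in the middle of a buffer, whereas matching the extracted substring succeeds, at the cost of allocating that string.

```elisp
;; Illustration only; the sample text and positions are invented.
(defun my-anchor-demo ()
  "Return (BUFFER-RESULT STRING-RESULT) for an anchored match of \"foo\"."
  (with-temp-buffer
    (insert "xx foo yy")
    (list
     ;; Buffer matching: the start-of-buffer anchor cannot match at
     ;; position 4, where "foo" begins.
     (save-excursion
       (goto-char 4)
       (looking-at "\\`foo"))
     ;; String matching: the anchor applies to the start of the new
     ;; string, but `buffer-substring' has to allocate it first.
     (string-match "\\`foo" (buffer-substring 4 7)))))

(my-anchor-demo) ; => (nil 0)
```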

It might also be handy to use from Lisp in other cases, where we don't 
need the anchors, but it's easier to call (buffer-substring-match "foo") 
rather than

   (save-excursion
     (goto-char (point-min))
     (re-search-forward "foo" nil t)
     (point))

Probably a little faster, too.

Anyway, it seems like it might be too late as an addition to Emacs 29. 
And we can implement the match predicate using narrowing for this 
release, to be updated later.
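
The narrowing approach could look roughly like the following in Lisp. This is only a sketch under assumptions: my-region-match-p is a made-up name, not an existing Emacs function, and the real predicate would live in C. It relies on the fact that, while a buffer is narrowed, the buffer-start and buffer-end anchors match at the edges of the accessible portion.

```elisp
;; Hypothetical helper, not part of Emacs; the name is made up.
;; While the buffer is narrowed to START..END, the buffer-start and
;; buffer-end anchors in REGEXP match at START and END respectively.
(defun my-region-match-p (regexp start end)
  "Return non-nil if REGEXP matches within START..END of the current buffer."
  (save-match-data
    (save-excursion
      (save-restriction
        (narrow-to-region start end)
        (goto-char (point-min))
        (re-search-forward regexp nil t)))))

;; Usage with an invented sample buffer: only the first call matches,
;; because the second region starts with "xx " rather than "foo".
(with-temp-buffer
  (insert "xx foo yy")
  (list (my-region-match-p "\\`foo\\'" 4 7)
        (my-region-match-p "\\`foo\\'" 1 7)))
```

No string is allocated here, but the save/restore forms and the Lisp-level search call still carry some overhead.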




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#60953; Package emacs. Full text available.

Message received at 60953 <at> debbugs.gnu.org:


Received: (at 60953) by debbugs.gnu.org; 31 Jan 2023 03:24:04 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Mon Jan 30 22:24:04 2023
Received: from localhost ([127.0.0.1]:51067 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1pMhFb-0008P1-L5
	for submit <at> debbugs.gnu.org; Mon, 30 Jan 2023 22:24:03 -0500
Received: from eggs.gnu.org ([209.51.188.92]:37268)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <eliz@HIDDEN>) id 1pMhFa-0008OU-H5
 for 60953 <at> debbugs.gnu.org; Mon, 30 Jan 2023 22:24:03 -0500
Received: from fencepost.gnu.org ([2001:470:142:3::e])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@HIDDEN>)
 id 1pMhFU-0000mU-3T; Mon, 30 Jan 2023 22:23:56 -0500
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org;
 s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date:
 mime-version; bh=F1NsgQf2/PXar52dCSQ94rJTGuwq4WUgLRn+RX/OVOc=; b=fDYYW5q5vsYm
 wI5Z7mxjAbstXzwcWj0ajkhMq9gH/D/goao6WfnzsPSsR4vLR7NUyPZ5yFVhxAeDJigw39dkZ+2rq
 3ZSF3FUGspeW5he1E4H1aSmo5k4XyawgUYDVUaROWkVbAl4Ik1jVfoh9evlwABKXsfKEYsXmglr6L
 UMxwn83P17lyZ1QnylKkzDRApAgt1PsmHI5PjX9WAYIPf96/h2C8hCCfn3S8zNtoYfuHk+Vs6dL33
 HIufJz1O41XtXPhtGbgNrKvUjacD0fFVHEvp0GZ2j7JHTpLDomdfTw/+lbqGeCzkyCPpjOflc5mgU
 4nBFAV7fF5BTxC/Oumhp9w==;
Received: from [87.69.77.57] (helo=home-c4e4a596f7)
 by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@HIDDEN>)
 id 1pMhFT-0000b9-Jj; Mon, 30 Jan 2023 22:23:55 -0500
Date: Tue, 31 Jan 2023 05:23:49 +0200
Message-Id: <83sffr2xq2.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
To: Dmitry Gutov <dgutov@HIDDEN>
In-Reply-To: <def799c9-c2d0-81b7-117b-40c35c1417fa@HIDDEN> (message from
 Dmitry Gutov on Mon, 30 Jan 2023 21:58:22 +0200)
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter
 font-lock seems inefficient
References: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@HIDDEN>
 <04729838-b7d4-8a08-2b71-12536a28aebb@HIDDEN>
 <AB9CD94C-CB3C-4D84-B4AA-22EDC206EB12@HIDDEN>
 <b1dc100e-1ee9-dc89-00b6-25bb7864e90b@HIDDEN> <83wn5ag4nc.fsf@HIDDEN>
 <01b5d074-fb12-6b1f-cbfb-5e759833b854@HIDDEN> <838rhpg57n.fsf@HIDDEN>
 <5026D975-983F-4D18-8690-BE139C92825D@HIDDEN> <83pmb1emxi.fsf@HIDDEN>
 <6f318afc-ca71-8b7e-c822-52e6635b5718@HIDDEN> <83sffxcfxw.fsf@HIDDEN>
 <d8cf01fa-d829-fe83-0bbf-4a5f1965a107@HIDDEN> <83pmb1cbg5.fsf@HIDDEN>
 <2da844d3-ea31-289e-2821-aa174e365ffd@HIDDEN>
 <6784f9e7-844b-374d-2a1e-8a61cebe0d7e@HIDDEN> <83zga03yne.fsf@HIDDEN>
 <373a575f-c683-1581-c3e6-502e9897fb04@HIDDEN> <83mt603vrc.fsf@HIDDEN>
 <83e58a1b-2e4a-356a-36d8-c756ff105b62@HIDDEN> <83bkmf52va.fsf@HIDDEN>
 <33cad4c1-4af5-37bb-05bc-79a4d9c1a700@HIDDEN> <83wn533lut.fsf@HIDDEN>
 <19892f55-498e-1109-2229-2ddd984849a4@HIDDEN> <83tu073ksp.fsf@HIDDEN>
 <def799c9-c2d0-81b7-117b-40c35c1417fa@HIDDEN>
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 60953
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

> Date: Mon, 30 Jan 2023 21:58:22 +0200
> Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
> From: Dmitry Gutov <dgutov@HIDDEN>
> 
> On 30/01/2023 21:05, Eli Zaretskii wrote:
> >> Date: Mon, 30 Jan 2023 21:01:02 +0200
> >> Cc:casouri@HIDDEN,60953 <at> debbugs.gnu.org
> >> From: Dmitry Gutov<dgutov@HIDDEN>
> >>
> >> But that doesn't answer the question "Could it?".
> > I don't understand what you are asking.  "Could" in what sense?
> 
> Like, would it make sense to try to modify it that way, or extract a 
> function that would do that, without writing it from scratch.
> 
> Or create a new function which would reuse some common code.
> 
> We would call the new function something like match_buffer_substring. 
> Optionally, also expose it to Lisp.

Can you describe what that function should do?  I don't think I have a
clear idea of that.




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#60953; Package emacs. Full text available.

Message received at 60953 <at> debbugs.gnu.org:


Received: (at 60953) by debbugs.gnu.org; 31 Jan 2023 00:45:04 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Mon Jan 30 19:45:03 2023
Received: from localhost ([127.0.0.1]:50938 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1pMelj-0004Gh-BR
	for submit <at> debbugs.gnu.org; Mon, 30 Jan 2023 19:45:03 -0500
Received: from mail-wm1-f45.google.com ([209.85.128.45]:34671)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <raaahh@HIDDEN>) id 1pMelh-0004Fv-GL
 for 60953 <at> debbugs.gnu.org; Mon, 30 Jan 2023 19:45:02 -0500
Received: by mail-wm1-f45.google.com with SMTP id
 q10-20020a1cf30a000000b003db0edfdb74so36669wmq.1
 for <60953 <at> debbugs.gnu.org>; Mon, 30 Jan 2023 16:45:01 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112;
 h=content-transfer-encoding:in-reply-to:from:references:cc:to
 :content-language:subject:user-agent:mime-version:date:message-id
 :sender:from:to:cc:subject:date:message-id:reply-to;
 bh=Uk/JF6SFA8vlxSzOJ+gI01Cx1Low37nl9eKiJxXDVUc=;
 b=LaM/2P8QitDBO5amsRAEO+krnaC2sl3YppYckdtsU+kV3fTyrTRffHyJUXohOnXHpM
 e/E/TiCZsJBNbdKvlvdyIFnNTE8BvO4Dx6smAzw9eSQT+8hC7nQxCl8/GLQoKgnggFiT
 Z0JR4MKhM7FGMREaBd0L/f415eoSmGdNQt3CdDAmBdG1HiNGMdoXile+PzUfCeemJWGS
 hx+H6pQxZtBWHiDdce6U0OS+GFeNQGxWLWYlnATKSVcfkFgHuxgAfohUehGKYpy9UyDy
 6F+g+OHXwMV7IficOUNCkVNGFkopbFNvQLf++9raJDLahmQlJKwcaWcM9WohkJxbu+VR
 DK0g==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=content-transfer-encoding:in-reply-to:from:references:cc:to
 :content-language:subject:user-agent:mime-version:date:message-id
 :sender:x-gm-message-state:from:to:cc:subject:date:message-id
 :reply-to;
 bh=Uk/JF6SFA8vlxSzOJ+gI01Cx1Low37nl9eKiJxXDVUc=;
 b=5jOojnc1MWJnw/ErhNBMZ+uWzJ0cBLKLUR8+7VbosYnpCdryyFfWOxm8Pgc50F0vxs
 W2SlQQzopZaD2ZUshec5AlwG64JJAIJdzF9g/eo2DRxzJtUG5Q0+LHqjhR8JN3m1KjWS
 6Mc2gI4HoEx2P1JL7ZOMVa0UskB0RbAjchgky0EYZu/uyzpWttDfGXhe/QQv/eN/oxTq
 ecB9Q6ZEqfFW4k/gaO9eG50/lNB1GrlCabX44f/eI++gGQYgtpBoFSA9K89xSWsALjX0
 EhBLs9YlTqd1I2YWbw8j3I+TYWb71SN1h884a+h1ECEOy4QKxM27XmlnBpQhsHOU8xCX
 lnrQ==
X-Gm-Message-State: AO0yUKUwDqm/0ZgA4YRfHLD4Nlok8GMEQ+3VUmxdES00an5lz9gI3VWv
 UMIlzy6OSi3qyG+tM3A541w=
X-Google-Smtp-Source: AK7set+gPewOxAq38/g5RAYfmOe9TD1/j1FSlaTw9xqMynj8u2a5YHW5dkCGElV5sh2U7Ci7yiJfIQ==
X-Received: by 2002:a05:600c:1d97:b0:3dc:5009:bc74 with SMTP id
 p23-20020a05600c1d9700b003dc5009bc74mr8611250wms.7.1675125895454; 
 Mon, 30 Jan 2023 16:44:55 -0800 (PST)
Received: from [192.168.0.2] ([46.251.119.176])
 by smtp.googlemail.com with ESMTPSA id
 i27-20020a05600c4b1b00b003dc54d9aeeasm5816107wmp.36.2023.01.30.16.44.54
 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128);
 Mon, 30 Jan 2023 16:44:54 -0800 (PST)
Message-ID: <a8dc0f23-92cc-69d7-c308-2ea970119d8e@HIDDEN>
Date: Tue, 31 Jan 2023 02:44:53 +0200
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.4.2
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter
 font-lock seems inefficient
Content-Language: en-US
To: Yuan Fu <casouri@HIDDEN>
References: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@HIDDEN>
 <AB9CD94C-CB3C-4D84-B4AA-22EDC206EB12@HIDDEN>
 <b1dc100e-1ee9-dc89-00b6-25bb7864e90b@HIDDEN> <83wn5ag4nc.fsf@HIDDEN>
 <01b5d074-fb12-6b1f-cbfb-5e759833b854@HIDDEN> <838rhpg57n.fsf@HIDDEN>
 <5026D975-983F-4D18-8690-BE139C92825D@HIDDEN> <83pmb1emxi.fsf@HIDDEN>
 <6f318afc-ca71-8b7e-c822-52e6635b5718@HIDDEN> <83sffxcfxw.fsf@HIDDEN>
 <d8cf01fa-d829-fe83-0bbf-4a5f1965a107@HIDDEN> <83pmb1cbg5.fsf@HIDDEN>
 <2da844d3-ea31-289e-2821-aa174e365ffd@HIDDEN>
 <6784f9e7-844b-374d-2a1e-8a61cebe0d7e@HIDDEN> <83zga03yne.fsf@HIDDEN>
 <373a575f-c683-1581-c3e6-502e9897fb04@HIDDEN> <83mt603vrc.fsf@HIDDEN>
 <83e58a1b-2e4a-356a-36d8-c756ff105b62@HIDDEN> <83bkmf52va.fsf@HIDDEN>
 <33cad4c1-4af5-37bb-05bc-79a4d9c1a700@HIDDEN> <83wn533lut.fsf@HIDDEN>
 <19892f55-498e-1109-2229-2ddd984849a4@HIDDEN> <83tu073ksp.fsf@HIDDEN>
 <def799c9-c2d0-81b7-117b-40c35c1417fa@HIDDEN>
 <A9D3AD21-2057-4964-801C-B8966326F17F@HIDDEN>
From: Dmitry Gutov <dgutov@HIDDEN>
In-Reply-To: <A9D3AD21-2057-4964-801C-B8966326F17F@HIDDEN>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
X-Spam-Score: 0.1 (/)
X-Debbugs-Envelope-To: 60953
Cc: Eli Zaretskii <eliz@HIDDEN>, 60953 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -1.9 (-)

On 31/01/2023 01:57, Yuan Fu wrote:
> 
> 
>> On Jan 30, 2023, at 11:58 AM, Dmitry Gutov <dgutov@HIDDEN> wrote:
>>
>> On 30/01/2023 21:05, Eli Zaretskii wrote:
>>>> Date: Mon, 30 Jan 2023 21:01:02 +0200
>>>> Cc:casouri@HIDDEN,60953 <at> debbugs.gnu.org
>>>> From: Dmitry Gutov<dgutov@HIDDEN>
>>>>
>>>> But that doesn't answer the question "Could it?".
>>> I don't understand what you are asking.  "Could" in what sense?
>>
>> Like, would it make sense to try to modify it that way, or extract a function that would do that, without writing it from scratch.
>>
>> Or create a new function which would reuse some common code.
>>
>> We would call the new function something like match_buffer_substring. Optionally, also expose it to Lisp.
> 
> Another option is to change user/programmer’s expectation of the anchor: we could say that the regexp must match the entirety of the node text. IOW, \\` \\' are implied.

Huh, I guess that's an option too.

A couple reasons not to do that would be:

- Potential breakage in all existing TS modes, a week (?) before we're 
going to release Emacs 29 pretest. Maybe that's okay, I can't say. But 
the breakage from that kind of change could be subtle.

- Compatibility reasons? People writing TS modes for Emacs might be 
coming from other editors/TS integrations.

While TreeSitter docs say the predicates are not handled by it, it does 
show this example:

   (#match? @constant "^[A-Z][A-Z_]+")

The use of '^' anchor seems to imply that the regexp doesn't have to 
otherwise match the whole node text (OTOH it's not clear why the example 
doesn't just say "^[A-Z]" or "^[A-Z][A-Z_]").

The doc also references the Rust crate and WebAssembly binding which 
support #match?.

IIUC Rust uses "re.is_match", which is documented to use "implicit .*? 
at the beginning and end". Which matches our current semantics. 
WebAssembly uses "regex.test", same effect.
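
To make the difference between the two semantics concrete, here is a small sketch using the regexp from the example above. The function name and the sample node texts are invented, and case-fold-search is bound to nil because otherwise [A-Z] would also match lowercase letters.

```elisp
;; Illustration only; the function name and sample texts are invented.
(defun my-match-semantics-demo (text)
  "Return (UNANCHORED WHOLE) match results for TEXT."
  (let ((case-fold-search nil)) ; otherwise [A-Z] also matches lowercase
    (list
     ;; Current semantics: the regexp may match just a part of the text.
     (string-match-p "^[A-Z][A-Z_]+" text)
     ;; Whole-text semantics: string-start and string-end anchors added.
     (string-match-p "\\`[A-Z][A-Z_]+\\'" text))))

(my-match-semantics-demo "FOO_BAR") ; => (0 0)
(my-match-semantics-demo "FOO_bar") ; => (0 nil)
```

A query that currently accepts a node like the second one would silently stop matching if the anchors became implied, which is the kind of subtle breakage mentioned above.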




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#60953; Package emacs. Full text available.

Message received at 60953 <at> debbugs.gnu.org:


Received: (at 60953) by debbugs.gnu.org; 30 Jan 2023 23:57:30 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Mon Jan 30 18:57:30 2023
Received: from localhost ([127.0.0.1]:50901 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1pMe1i-00035K-8k
	for submit <at> debbugs.gnu.org; Mon, 30 Jan 2023 18:57:30 -0500
Received: from mail-pl1-f172.google.com ([209.85.214.172]:37760)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <casouri@HIDDEN>) id 1pMe1e-000355-Lr
 for 60953 <at> debbugs.gnu.org; Mon, 30 Jan 2023 18:57:28 -0500
Received: by mail-pl1-f172.google.com with SMTP id m2so8377904plg.4
 for <60953 <at> debbugs.gnu.org>; Mon, 30 Jan 2023 15:57:26 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112;
 h=to:references:message-id:content-transfer-encoding:cc:date
 :in-reply-to:from:subject:mime-version:from:to:cc:subject:date
 :message-id:reply-to;
 bh=yjQQFaQfpyr9lplHnNfVgiLjPVrqTiVDWVcaazpeOMk=;
 b=GYVTlTp/FYMXQEuB1yF6TJvmDa6/LJHWkGnmhWqqE8yEG2B+HeABW8R3wBRrTwix7J
 ojvnkq8YdZWhcHQked6pBBgL1Dm4gpdbIkCG7/hzk6Dg8iFowdtmE68JMSnnrXn/D3mc
 QIHMeKg9J/eubUzUuv7jUoShptA2byymcwdrKh3aREJQajbBq8Bco1R4pmq6XUk2I9Y/
 4AVtuqWKA9bfi6ilPAy74XO9m/eRVs72qYXuJCRC+4QxZzCdnp/ISxrjQ9G0F/suaz5C
 7yhE5A2UYLEhOBrsOWCmnKIHlqv32Cqz4l6TDHK/iCnzXpeCq4aohoCRCdzevMp3HUAV
 1HLg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=to:references:message-id:content-transfer-encoding:cc:date
 :in-reply-to:from:subject:mime-version:x-gm-message-state:from:to:cc
 :subject:date:message-id:reply-to;
 bh=yjQQFaQfpyr9lplHnNfVgiLjPVrqTiVDWVcaazpeOMk=;
 b=r1Dki1D8LtRRTnv9Y9G52H6F99jd/I+4SWGfc0WTR2J5Fe1uqv7nIIPG0X6aP8oZdn
 gJOm4B5U82yBs2hzlONznJLGOdf8G4/r74ObawWFkgQiP+Avg5d7Ku266UAicZN/1oa/
 QZMDF5j5ZJ+BauiiFXKgmV8jW/VYq+C9jqw6wTE5y+u1EJ1DWDoSO3HYH5XMCI3PnJp1
 mUNOPBmumIofJzZrP+qPY4SL0aJOXO6TJJuAqN8jEYM/erTCKZzKScgv/csyBnpZ15dU
 RhI0VaEEKRePog0T7yvVPBTYA9zCUIDEUHWym/M7CYA9qbNiV+SB7I69V0yepbJIPytV
 FK6Q==
X-Gm-Message-State: AO0yUKWx6ggyX/LUTBhoY8JlT/jGA9cH0egfnPgnGMPH6n+BMImlyE2i
 McKNgnT/ij28aXUBHMO2vIg=
X-Google-Smtp-Source: AK7set9D3O6ZZ68wllvMhLAjwpiusqiNaOAJhoSyR8Ecq09xud7y/oAKEvJddoknlWB0hdbvcOWyVg==
X-Received: by 2002:a17:902:cf51:b0:196:5540:3972 with SMTP id
 e17-20020a170902cf5100b0019655403972mr8187308plg.3.1675123040587; 
 Mon, 30 Jan 2023 15:57:20 -0800 (PST)
Received: from smtpclient.apple (cpe-172-117-161-177.socal.res.rr.com.
 [172.117.161.177]) by smtp.gmail.com with ESMTPSA id
 f6-20020a17090274c600b001885d15e3c1sm2275808plt.26.2023.01.30.15.57.19
 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128);
 Mon, 30 Jan 2023 15:57:20 -0800 (PST)
Content-Type: text/plain;
	charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3731.300.101.1.3\))
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter
 font-lock seems inefficient
From: Yuan Fu <casouri@HIDDEN>
In-Reply-To: <def799c9-c2d0-81b7-117b-40c35c1417fa@HIDDEN>
Date: Mon, 30 Jan 2023 15:57:05 -0800
Content-Transfer-Encoding: quoted-printable
Message-Id: <A9D3AD21-2057-4964-801C-B8966326F17F@HIDDEN>
References: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@HIDDEN>
 <04729838-b7d4-8a08-2b71-12536a28aebb@HIDDEN>
 <AB9CD94C-CB3C-4D84-B4AA-22EDC206EB12@HIDDEN>
 <b1dc100e-1ee9-dc89-00b6-25bb7864e90b@HIDDEN> <83wn5ag4nc.fsf@HIDDEN>
 <01b5d074-fb12-6b1f-cbfb-5e759833b854@HIDDEN> <838rhpg57n.fsf@HIDDEN>
 <5026D975-983F-4D18-8690-BE139C92825D@HIDDEN> <83pmb1emxi.fsf@HIDDEN>
 <6f318afc-ca71-8b7e-c822-52e6635b5718@HIDDEN> <83sffxcfxw.fsf@HIDDEN>
 <d8cf01fa-d829-fe83-0bbf-4a5f1965a107@HIDDEN> <83pmb1cbg5.fsf@HIDDEN>
 <2da844d3-ea31-289e-2821-aa174e365ffd@HIDDEN>
 <6784f9e7-844b-374d-2a1e-8a61cebe0d7e@HIDDEN> <83zga03yne.fsf@HIDDEN>
 <373a575f-c683-1581-c3e6-502e9897fb04@HIDDEN> <83mt603vrc.fsf@HIDDEN>
 <83e58a1b-2e4a-356a-36d8-c756ff105b62@HIDDEN> <83bkmf52va.fsf@HIDDEN>
 <33cad4c1-4af5-37bb-05bc-79a4d9c1a700@HIDDEN> <83wn533lut.fsf@HIDDEN>
 <19892f55-498e-1109-2229-2ddd984849a4@HIDDEN> <83tu073ksp.fsf@HIDDEN>
 <def799c9-c2d0-81b7-117b-40c35c1417fa@HIDDEN>
To: Dmitry Gutov <dgutov@HIDDEN>
X-Mailer: Apple Mail (2.3731.300.101.1.3)
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 60953
Cc: Eli Zaretskii <eliz@HIDDEN>, 60953 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -1.0 (-)



> On Jan 30, 2023, at 11:58 AM, Dmitry Gutov <dgutov@HIDDEN> wrote:
>
> On 30/01/2023 21:05, Eli Zaretskii wrote:
>>> Date: Mon, 30 Jan 2023 21:01:02 +0200
>>> Cc:casouri@HIDDEN,60953 <at> debbugs.gnu.org
>>> From: Dmitry Gutov<dgutov@HIDDEN>
>>>
>>> But that doesn't answer the question "Could it?".
>> I don't understand what you are asking.  "Could" in what sense?
>
> Like, would it make sense to try to modify it that way, or extract a function that would do that, without writing it from scratch.
>
> Or create a new function which would reuse some common code.
>
> We would call the new function something like match_buffer_substring. Optionally, also expose it to Lisp.

Another option is to change user/programmer’s expectation of the anchor: we could say that the regexp must match the entirety of the node text. IOW, \\` \\' are implied.

Yuan




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#60953; Package emacs. Full text available.

Message received at 60953 <at> debbugs.gnu.org:


Received: (at 60953) by debbugs.gnu.org; 30 Jan 2023 19:58:33 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Mon Jan 30 14:58:33 2023
Received: from localhost ([127.0.0.1]:50629 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1pMaIT-00053P-0i
	for submit <at> debbugs.gnu.org; Mon, 30 Jan 2023 14:58:33 -0500
Received: from mail-wr1-f50.google.com ([209.85.221.50]:33643)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <raaahh@HIDDEN>) id 1pMaIQ-00053B-U0
 for 60953 <at> debbugs.gnu.org; Mon, 30 Jan 2023 14:58:31 -0500
Received: by mail-wr1-f50.google.com with SMTP id q5so12269772wrv.0
 for <60953 <at> debbugs.gnu.org>; Mon, 30 Jan 2023 11:58:30 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112;
 h=content-transfer-encoding:in-reply-to:from:references:cc:to
 :content-language:subject:user-agent:mime-version:date:message-id
 :sender:from:to:cc:subject:date:message-id:reply-to;
 bh=zdyJEOaOarsf3xWTapG3ME3CQkijW8JG0Wz97+FHVlQ=;
 b=UApEza5MD/svMoITz5+7N7YRdf2vXkLKVuTC/sWT7AoA1Pap7A5/OFGrnCrHt81w/8
 8a6s1dlmjctdQkZTfcHmmH0mFz80/S4WudJZWyP2F/CK6c+joC07LQ9dSmrBPskwCQXZ
 pHk0t+US3Pxr4mqanrFSlwo0sQapBd0MTBpJmAdV4sR24yufCzzqCUb6PY0HLc5NwkY6
 3GsVNxhPR469oj9lXRDY9t5lsgtfotpjgGJ0lcJnTC6yGIYuH04U9y66h0QOGBuqGyhV
 e3QflqcAd82ovFH+DneCROCZfqbubhDrFNiuZzrgt2tu+ZacKYQv1HYdwKRNfDLr9j72
 GzAQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=content-transfer-encoding:in-reply-to:from:references:cc:to
 :content-language:subject:user-agent:mime-version:date:message-id
 :sender:x-gm-message-state:from:to:cc:subject:date:message-id
 :reply-to;
 bh=zdyJEOaOarsf3xWTapG3ME3CQkijW8JG0Wz97+FHVlQ=;
 b=0JPC5Ni/Ho9E7VyqfCt/S/FwcEAMMcKU43aUVq4P/AbBsYux/kYeRPuyk9C12Yu+Fs
 W10OfgdlJqzhsoLG2BvPBvxQ+Jr9qoIEqBamE35JWEj+zkYfWoMm7Y9+8lc5JRrZiIp7
 IrUA367FlggAjBnj4vIPb9qt/ms2o82jphpyMJP/veGto5kKhwiuhXw+RPKv1YjhHYRU
 do89XwB1KHZ8LFiK2+z5F/1Nb0rx8Xsslb7inSEF3IqVqpy7C9fWOAcCE/K1lE6PWzeH
 ve9JOR/55E2Wyag+APwPYPJqI7k8lTM+2I2Suy+LslFgOelFOLsAJqwCoBR1Z6LvmzbL
 3T8w==
X-Gm-Message-State: AO0yUKWSf2nS7oTiq/D/bGTe+K0noIPmnolaW68reIQ1H2fVOPUwxm4q
 wiz+F9iC6O641nlM7sYEmCQ=
X-Google-Smtp-Source: AK7set9TFYuRac3a22AVe31Iw7tqy2RRBJG/hkjGLdaiy/Mcy8CJq8D+gJd5kPekcglj9nKhdq/zvg==
X-Received: by 2002:a5d:65d1:0:b0:2bf:cefa:fd99 with SMTP id
 e17-20020a5d65d1000000b002bfcefafd99mr13173906wrw.8.1675108705172; 
 Mon, 30 Jan 2023 11:58:25 -0800 (PST)
Received: from [192.168.0.2] ([46.251.119.176])
 by smtp.googlemail.com with ESMTPSA id
 f28-20020a5d58fc000000b002be5401ef5fsm13197350wrd.39.2023.01.30.11.58.23
 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128);
 Mon, 30 Jan 2023 11:58:24 -0800 (PST)
Message-ID: <def799c9-c2d0-81b7-117b-40c35c1417fa@HIDDEN>
Date: Mon, 30 Jan 2023 21:58:22 +0200
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.4.2
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter
 font-lock seems inefficient
Content-Language: en-US
To: Eli Zaretskii <eliz@HIDDEN>
References: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@HIDDEN>
 <04729838-b7d4-8a08-2b71-12536a28aebb@HIDDEN>
 <AB9CD94C-CB3C-4D84-B4AA-22EDC206EB12@HIDDEN>
 <b1dc100e-1ee9-dc89-00b6-25bb7864e90b@HIDDEN> <83wn5ag4nc.fsf@HIDDEN>
 <01b5d074-fb12-6b1f-cbfb-5e759833b854@HIDDEN> <838rhpg57n.fsf@HIDDEN>
 <5026D975-983F-4D18-8690-BE139C92825D@HIDDEN> <83pmb1emxi.fsf@HIDDEN>
 <6f318afc-ca71-8b7e-c822-52e6635b5718@HIDDEN> <83sffxcfxw.fsf@HIDDEN>
 <d8cf01fa-d829-fe83-0bbf-4a5f1965a107@HIDDEN> <83pmb1cbg5.fsf@HIDDEN>
 <2da844d3-ea31-289e-2821-aa174e365ffd@HIDDEN>
 <6784f9e7-844b-374d-2a1e-8a61cebe0d7e@HIDDEN> <83zga03yne.fsf@HIDDEN>
 <373a575f-c683-1581-c3e6-502e9897fb04@HIDDEN> <83mt603vrc.fsf@HIDDEN>
 <83e58a1b-2e4a-356a-36d8-c756ff105b62@HIDDEN> <83bkmf52va.fsf@HIDDEN>
 <33cad4c1-4af5-37bb-05bc-79a4d9c1a700@HIDDEN> <83wn533lut.fsf@HIDDEN>
 <19892f55-498e-1109-2229-2ddd984849a4@HIDDEN> <83tu073ksp.fsf@HIDDEN>
From: Dmitry Gutov <dgutov@HIDDEN>
In-Reply-To: <83tu073ksp.fsf@HIDDEN>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Score: 0.1 (/)
X-Debbugs-Envelope-To: 60953
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -1.9 (-)

On 30/01/2023 21:05, Eli Zaretskii wrote:
>> Date: Mon, 30 Jan 2023 21:01:02 +0200
>> Cc:casouri@HIDDEN,60953 <at> debbugs.gnu.org
>> From: Dmitry Gutov<dgutov@HIDDEN>
>>
>> But that doesn't answer the question "Could it?".
> I don't understand what you are asking.  "Could" in what sense?

Like, would it make sense to try to modify it that way, or extract a 
function that would do that, without writing it from scratch.

Or create a new function which would reuse some common code.

We would call the new function something like match_buffer_substring. 
Optionally, also expose it to Lisp.




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#60953; Package emacs. Full text available.

Message received at 60953 <at> debbugs.gnu.org:


Received: (at 60953) by debbugs.gnu.org; 30 Jan 2023 19:05:54 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Mon Jan 30 14:05:54 2023
Received: from localhost ([127.0.0.1]:50601 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1pMZTW-0003k4-28
	for submit <at> debbugs.gnu.org; Mon, 30 Jan 2023 14:05:54 -0500
Received: from eggs.gnu.org ([209.51.188.92]:55422)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <eliz@HIDDEN>) id 1pMZTV-0003js-6b
 for 60953 <at> debbugs.gnu.org; Mon, 30 Jan 2023 14:05:53 -0500
Received: from fencepost.gnu.org ([2001:470:142:3::e])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@HIDDEN>)
 id 1pMZTP-0002cb-MD; Mon, 30 Jan 2023 14:05:47 -0500
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org;
 s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date:
 mime-version; bh=+NEbNL0+Y6BQoRuxRI3iqeYeZNzM7z2YtXUMcOB+Kjg=; b=gHs1GKA64lGL
 jh/ec8xIqriNvVa2FoVubTW46PGapdv9aQydRFvV6Gg/0F2fCJgnEOXzDSDak6zeITF5Ph+JasZ9B
 ZuA5ZyHSyaD0GedLyE9LcKAyvKG6lcoke430S8MBEqhERz4UGcgWiOijguWSxM6qQ0UzXHLaCO3Jb
 VH0Zpf8wUZVbPlyj+P6HyuGk1DPSAMVnJwMsI8TLHNpHkdi2kH6ZD4AD34ixc0EsHdaVfPmmOXTeo
 fvnGboffOGXmLIzp5wW1STlEE76h8Zq1oUF6ppH47Glxh3umnLCTyFEgCUR+zdIZRu/zLB4lakofa
 J3aVk650qzvWfDdV5XHUWg==;
Received: from [87.69.77.57] (helo=home-c4e4a596f7)
 by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@HIDDEN>)
 id 1pMZTB-0004TO-06; Mon, 30 Jan 2023 14:05:47 -0500
Date: Mon, 30 Jan 2023 21:05:26 +0200
Message-Id: <83tu073ksp.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
To: Dmitry Gutov <dgutov@HIDDEN>
In-Reply-To: <19892f55-498e-1109-2229-2ddd984849a4@HIDDEN> (message from
 Dmitry Gutov on Mon, 30 Jan 2023 21:01:02 +0200)
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter
 font-lock seems inefficient
References: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@HIDDEN>
 <04729838-b7d4-8a08-2b71-12536a28aebb@HIDDEN>
 <AB9CD94C-CB3C-4D84-B4AA-22EDC206EB12@HIDDEN>
 <b1dc100e-1ee9-dc89-00b6-25bb7864e90b@HIDDEN> <83wn5ag4nc.fsf@HIDDEN>
 <01b5d074-fb12-6b1f-cbfb-5e759833b854@HIDDEN> <838rhpg57n.fsf@HIDDEN>
 <5026D975-983F-4D18-8690-BE139C92825D@HIDDEN> <83pmb1emxi.fsf@HIDDEN>
 <6f318afc-ca71-8b7e-c822-52e6635b5718@HIDDEN> <83sffxcfxw.fsf@HIDDEN>
 <d8cf01fa-d829-fe83-0bbf-4a5f1965a107@HIDDEN> <83pmb1cbg5.fsf@HIDDEN>
 <2da844d3-ea31-289e-2821-aa174e365ffd@HIDDEN>
 <6784f9e7-844b-374d-2a1e-8a61cebe0d7e@HIDDEN> <83zga03yne.fsf@HIDDEN>
 <373a575f-c683-1581-c3e6-502e9897fb04@HIDDEN> <83mt603vrc.fsf@HIDDEN>
 <83e58a1b-2e4a-356a-36d8-c756ff105b62@HIDDEN> <83bkmf52va.fsf@HIDDEN>
 <33cad4c1-4af5-37bb-05bc-79a4d9c1a700@HIDDEN> <83wn533lut.fsf@HIDDEN>
 <19892f55-498e-1109-2229-2ddd984849a4@HIDDEN>
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 60953
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

> Date: Mon, 30 Jan 2023 21:01:02 +0200
> Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
> From: Dmitry Gutov <dgutov@HIDDEN>
> 
> But that doesn't answer the question "Could it?".

I don't understand what you are asking.  "Could" in what sense?




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#60953; Package emacs. Full text available.

Message received at 60953 <at> debbugs.gnu.org:


Received: (at 60953) by debbugs.gnu.org; 30 Jan 2023 19:01:11 +0000
Message-ID: <19892f55-498e-1109-2229-2ddd984849a4@HIDDEN>
Date: Mon, 30 Jan 2023 21:01:02 +0200
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.4.2
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter
 font-lock seems inefficient
Content-Language: en-US
To: Eli Zaretskii <eliz@HIDDEN>
References: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@HIDDEN>
 <04729838-b7d4-8a08-2b71-12536a28aebb@HIDDEN>
 <AB9CD94C-CB3C-4D84-B4AA-22EDC206EB12@HIDDEN>
 <b1dc100e-1ee9-dc89-00b6-25bb7864e90b@HIDDEN> <83wn5ag4nc.fsf@HIDDEN>
 <01b5d074-fb12-6b1f-cbfb-5e759833b854@HIDDEN> <838rhpg57n.fsf@HIDDEN>
 <5026D975-983F-4D18-8690-BE139C92825D@HIDDEN> <83pmb1emxi.fsf@HIDDEN>
 <6f318afc-ca71-8b7e-c822-52e6635b5718@HIDDEN> <83sffxcfxw.fsf@HIDDEN>
 <d8cf01fa-d829-fe83-0bbf-4a5f1965a107@HIDDEN> <83pmb1cbg5.fsf@HIDDEN>
 <2da844d3-ea31-289e-2821-aa174e365ffd@HIDDEN>
 <6784f9e7-844b-374d-2a1e-8a61cebe0d7e@HIDDEN> <83zga03yne.fsf@HIDDEN>
 <373a575f-c683-1581-c3e6-502e9897fb04@HIDDEN> <83mt603vrc.fsf@HIDDEN>
 <83e58a1b-2e4a-356a-36d8-c756ff105b62@HIDDEN> <83bkmf52va.fsf@HIDDEN>
 <33cad4c1-4af5-37bb-05bc-79a4d9c1a700@HIDDEN> <83wn533lut.fsf@HIDDEN>
From: Dmitry Gutov <dgutov@HIDDEN>
In-Reply-To: <83wn533lut.fsf@HIDDEN>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org

On 30/01/2023 20:42, Eli Zaretskii wrote:
>> Date: Mon, 30 Jan 2023 20:20:46 +0200
>> Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
>> From: Dmitry Gutov <dgutov@HIDDEN>
>>
>> On 30/01/2023 19:49, Eli Zaretskii wrote:
>>>> Date: Mon, 30 Jan 2023 19:15:07 +0200
>>>> Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
>>>> From: Dmitry Gutov <dgutov@HIDDEN>
>>>>
>>>>> fast_looking_at already does an anchored match, so I'm not sure I
>>>>> follow.  I don't even understand why you need the \` part, when the
>>>>> match will either always start from the first position or fail.
>>>>
>>>> The regexp might include the anchors, or it might not.
>>>>
>>>> It might also use a different anchor like ^ or $ or \b.
>>>
>>> OK, but it always goes only forward, so narrowing to the beginning
>>> shouldn't be necessary.  Right?
>>
>> Are you saying that fast_looking_at ("\\`", ...) will always succeed?
>>
>> And fast_looking_at ("^", ...), etc.
> 
> For example, for "^", if you hint that it must look back to make sure
> there's a newline there, then your narrowing will also prevent it from
> doing that, right?

fast_looking_at ("^", ...) succeeds inside a narrowing because it always 
succeeds at BOB. Even though there are no physical newlines before BOB.
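
As an illustration of the point above, here is a toy C model, not code from
the Emacs sources. It assumes a buffer whose accessible region is
[begv, zv); the names toy_buffer, toy_at_bol and toy_at_bob are
hypothetical. It only sketches why the same position can fail "^" and "\`"
in the widened buffer yet satisfy both once the buffer is narrowed to start
there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model (an assumption for illustration, not Emacs's buffer type):
   TEXT is the whole buffer and [begv, zv) is the accessible region.  */
struct toy_buffer
{
  const char *text;
  size_t begv;   /* first accessible position */
  size_t zv;     /* one past the last accessible position */
};

/* Model of the "^" anchor: it holds at the start of the accessible
   region, or right after a newline inside that region.  */
bool
toy_at_bol (const struct toy_buffer *b, size_t pos)
{
  assert (b->begv <= pos && pos <= b->zv);
  return pos == b->begv || b->text[pos - 1] == '\n';
}

/* Model of the "\`" anchor: it holds only at the start of the
   accessible region.  */
bool
toy_at_bob (const struct toy_buffer *b, size_t pos)
{
  assert (b->begv <= pos && pos <= b->zv);
  return pos == b->begv;
}
```

With the text "foo bar" and position 4 (the "b"), both predicates are false
when begv is 0 and true when begv is 4, which is the effect the narrowing is
meant to provide.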

>>>> One possible alternative, I suppose, would be to create a raw pointer to
>>>> a part of the buffer text and call re_search directly specifying the
>>>> known length of the node in bytes. If buffer text is one contiguous
>>>> region in memory, that is.
>>>
>>> It isn't, though: there's the gap.  Which is why doing this is not
>>> recommended; instead, use something like search_buffer_re, which
>>> already handles this complication for you.  (Except that
>>> search_buffer_re is a static function, so only code in search.c can
>>> use it.  So you'd need to make it non-static.)
>>
>> Interesting. Does search_buffer_re match the \` anchor at POS and \' at
>> LIM? IOW, does it treat the rest of the buffer as non-existing? Or could it?
> 
> That is the low-level subroutine called by re-search-forward, so you
> know the answers already, I think?  IOW, that function behaves exactly
> like re-search-forward in those situations.

So, I suppose not?

But that doesn't answer the question "Could it?".




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#60953; Package emacs. Full text available.

Message received at 60953 <at> debbugs.gnu.org:


Received: (at 60953) by debbugs.gnu.org; 30 Jan 2023 18:42:50 +0000
Date: Mon, 30 Jan 2023 20:42:34 +0200
Message-Id: <83wn533lut.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
To: Dmitry Gutov <dgutov@HIDDEN>
In-Reply-To: <33cad4c1-4af5-37bb-05bc-79a4d9c1a700@HIDDEN> (message from
 Dmitry Gutov on Mon, 30 Jan 2023 20:20:46 +0200)
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter
 font-lock seems inefficient
References: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@HIDDEN>
 <04729838-b7d4-8a08-2b71-12536a28aebb@HIDDEN>
 <AB9CD94C-CB3C-4D84-B4AA-22EDC206EB12@HIDDEN>
 <b1dc100e-1ee9-dc89-00b6-25bb7864e90b@HIDDEN> <83wn5ag4nc.fsf@HIDDEN>
 <01b5d074-fb12-6b1f-cbfb-5e759833b854@HIDDEN> <838rhpg57n.fsf@HIDDEN>
 <5026D975-983F-4D18-8690-BE139C92825D@HIDDEN> <83pmb1emxi.fsf@HIDDEN>
 <6f318afc-ca71-8b7e-c822-52e6635b5718@HIDDEN> <83sffxcfxw.fsf@HIDDEN>
 <d8cf01fa-d829-fe83-0bbf-4a5f1965a107@HIDDEN> <83pmb1cbg5.fsf@HIDDEN>
 <2da844d3-ea31-289e-2821-aa174e365ffd@HIDDEN>
 <6784f9e7-844b-374d-2a1e-8a61cebe0d7e@HIDDEN> <83zga03yne.fsf@HIDDEN>
 <373a575f-c683-1581-c3e6-502e9897fb04@HIDDEN> <83mt603vrc.fsf@HIDDEN>
 <83e58a1b-2e4a-356a-36d8-c756ff105b62@HIDDEN> <83bkmf52va.fsf@HIDDEN>
 <33cad4c1-4af5-37bb-05bc-79a4d9c1a700@HIDDEN>
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org

> Date: Mon, 30 Jan 2023 20:20:46 +0200
> Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
> From: Dmitry Gutov <dgutov@HIDDEN>
> 
> On 30/01/2023 19:49, Eli Zaretskii wrote:
> >> Date: Mon, 30 Jan 2023 19:15:07 +0200
> >> Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
> >> From: Dmitry Gutov <dgutov@HIDDEN>
> >>
> >>> fast_looking_at already does an anchored match, so I'm not sure I
> >>> follow.  I don't even understand why you need the \` part, when the
> >>> match will either always start from the first position or fail.
> >>
> >> The regexp might include the anchors, or it might not.
> >>
> >> It might also use a different anchor like ^ or $ or \b.
> > 
> > OK, but it always goes only forward, so narrowing to the beginning
> > shouldn't be necessary.  Right? 
> 
> Are you saying that fast_looking_at ("\\`", ...) will always succeed?
> 
> And fast_looking_at ("^", ...), etc.

For example, for "^", if you hint that it must look back to make sure
there's a newline there, then your narrowing will also prevent it from
doing that, right?

> >> One possible alternative, I suppose, would be to create a raw pointer to
> >> a part of the buffer text and call re_search directly specifying the
> >> known length of the node in bytes. If buffer text is one contiguous
> >> region in memory, that is.
> > 
> > It isn't, though: there's the gap.  Which is why doing this is not
> > recommended; instead, use something like search_buffer_re, which
> > already handles this complication for you.  (Except that
> > search_buffer_re is a static function, so only code in search.c can
> > use it.  So you'd need to make it non-static.)
> 
> Interesting. Does search_buffer_re match the \` anchor at POS and \' at 
> > LIM? IOW, does it treat the rest of the buffer as non-existing? Or could it?

That is the low-level subroutine called by re-search-forward, so you
know the answers already, I think?  IOW, that function behaves exactly
like re-search-forward in those situations.




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#60953; Package emacs. Full text available.

Message received at 60953 <at> debbugs.gnu.org:


Received: (at 60953) by debbugs.gnu.org; 30 Jan 2023 18:20:57 +0000
Message-ID: <33cad4c1-4af5-37bb-05bc-79a4d9c1a700@HIDDEN>
Date: Mon, 30 Jan 2023 20:20:46 +0200
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.4.2
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter
 font-lock seems inefficient
Content-Language: en-US
To: Eli Zaretskii <eliz@HIDDEN>
References: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@HIDDEN>
 <04729838-b7d4-8a08-2b71-12536a28aebb@HIDDEN>
 <AB9CD94C-CB3C-4D84-B4AA-22EDC206EB12@HIDDEN>
 <b1dc100e-1ee9-dc89-00b6-25bb7864e90b@HIDDEN> <83wn5ag4nc.fsf@HIDDEN>
 <01b5d074-fb12-6b1f-cbfb-5e759833b854@HIDDEN> <838rhpg57n.fsf@HIDDEN>
 <5026D975-983F-4D18-8690-BE139C92825D@HIDDEN> <83pmb1emxi.fsf@HIDDEN>
 <6f318afc-ca71-8b7e-c822-52e6635b5718@HIDDEN> <83sffxcfxw.fsf@HIDDEN>
 <d8cf01fa-d829-fe83-0bbf-4a5f1965a107@HIDDEN> <83pmb1cbg5.fsf@HIDDEN>
 <2da844d3-ea31-289e-2821-aa174e365ffd@HIDDEN>
 <6784f9e7-844b-374d-2a1e-8a61cebe0d7e@HIDDEN> <83zga03yne.fsf@HIDDEN>
 <373a575f-c683-1581-c3e6-502e9897fb04@HIDDEN> <83mt603vrc.fsf@HIDDEN>
 <83e58a1b-2e4a-356a-36d8-c756ff105b62@HIDDEN> <83bkmf52va.fsf@HIDDEN>
From: Dmitry Gutov <dgutov@HIDDEN>
In-Reply-To: <83bkmf52va.fsf@HIDDEN>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org

On 30/01/2023 19:49, Eli Zaretskii wrote:
>> Date: Mon, 30 Jan 2023 19:15:07 +0200
>> Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
>> From: Dmitry Gutov <dgutov@HIDDEN>
>>
>>> fast_looking_at already does an anchored match, so I'm not sure I
>>> follow.  I don't even understand why you need the \` part, when the
>>> match will either always start from the first position or fail.
>>
>> The regexp might include the anchors, or it might not.
>>
>> It might also use a different anchor like ^ or $ or \b.
> 
> OK, but it always goes only forward, so narrowing to the beginning
> shouldn't be necessary.  Right? 

Are you saying that fast_looking_at ("\\`", ...) will always succeed?

And fast_looking_at ("^", ...), etc.

I would imagine that only fast_looking_at ("\\=", ...) is guaranteed to 
succeed.

> And you can use the LIMIT argument to
> limit how far it goes forward, right?  So once again, why narrow?

I tried to explain that there is a certain expectation (on the part of 
the user/programmer) which anchors are allowed in the :match regexp, and 
what their effects are, and those seem hard to support without narrowing.

>>> And for \', just compare the length of the match returned by
>>> fast_looking_at with the length of the text.
>>
>> This seems to work, i.e. even when before "carpet",
>>
>> (and (looking-at (regexp-opt '("car" "cardigan" "carpet")))
>>        (match-string 0))
>>
>> returns the full match. I was expecting that it could return just "car"
>> -- not sure why it doesn't stop there.
> 
> Because regex search is greedy?

Cool. TIL, thanks. That's not going to help here, but might in other 
situations when my code controls the regexp as well.

>> One possible alternative, I suppose, would be to create a raw pointer to
>> a part of the buffer text and call re_search directly specifying the
>> known length of the node in bytes. If buffer text is one contiguous
>> region in memory, that is.
> 
> It isn't, though: there's the gap.  Which is why doing this is not
> recommended; instead, use something like search_buffer_re, which
> already handles this complication for you.  (Except that
> search_buffer_re is a static function, so only code in search.c can
> use it.  So you'd need to make it non-static.)

Interesting. Does search_buffer_re match the \` anchor at POS and \' at 
LIM? IOW, does it treat the rest of the buffer as non-existing? Or could it?




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#60953; Package emacs. Full text available.

Message received at 60953 <at> debbugs.gnu.org:


Received: (at 60953) by debbugs.gnu.org; 30 Jan 2023 17:50:00 +0000
Date: Mon, 30 Jan 2023 19:49:45 +0200
Message-Id: <83bkmf52va.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
To: Dmitry Gutov <dgutov@HIDDEN>
In-Reply-To: <83e58a1b-2e4a-356a-36d8-c756ff105b62@HIDDEN> (message from
 Dmitry Gutov on Mon, 30 Jan 2023 19:15:07 +0200)
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter
 font-lock seems inefficient
References: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@HIDDEN>
 <04729838-b7d4-8a08-2b71-12536a28aebb@HIDDEN>
 <AB9CD94C-CB3C-4D84-B4AA-22EDC206EB12@HIDDEN>
 <b1dc100e-1ee9-dc89-00b6-25bb7864e90b@HIDDEN> <83wn5ag4nc.fsf@HIDDEN>
 <01b5d074-fb12-6b1f-cbfb-5e759833b854@HIDDEN> <838rhpg57n.fsf@HIDDEN>
 <5026D975-983F-4D18-8690-BE139C92825D@HIDDEN> <83pmb1emxi.fsf@HIDDEN>
 <6f318afc-ca71-8b7e-c822-52e6635b5718@HIDDEN> <83sffxcfxw.fsf@HIDDEN>
 <d8cf01fa-d829-fe83-0bbf-4a5f1965a107@HIDDEN> <83pmb1cbg5.fsf@HIDDEN>
 <2da844d3-ea31-289e-2821-aa174e365ffd@HIDDEN>
 <6784f9e7-844b-374d-2a1e-8a61cebe0d7e@HIDDEN> <83zga03yne.fsf@HIDDEN>
 <373a575f-c683-1581-c3e6-502e9897fb04@HIDDEN> <83mt603vrc.fsf@HIDDEN>
 <83e58a1b-2e4a-356a-36d8-c756ff105b62@HIDDEN>
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org

> Date: Mon, 30 Jan 2023 19:15:07 +0200
> Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
> From: Dmitry Gutov <dgutov@HIDDEN>
> 
> > fast_looking_at already does an anchored match, so I'm not sure I
> > follow.  I don't even understand why you need the \` part, when the
> > match will either always start from the first position or fail.
> 
> The regexp might include the anchors, or it might not.
> 
> It might also use a different anchor like ^ or $ or \b.

OK, but it always goes only forward, so narrowing to the beginning
shouldn't be necessary.  Right?  And you can use the LIMIT argument to
limit how far it goes forward, right?  So once again, why narrow?

> > And for \', just compare the length of the match returned by
> > fast_looking_at with the length of the text.
> 
> This seems to work, i.e. even when before "carpet",
> 
> (and (looking-at (regexp-opt '("car" "cardigan" "carpet")))
>       (match-string 0))
> 
> returns the full match. I was expecting that it could return just "car" 
> -- not sure why it doesn't stop there.

Because regex search is greedy?

> One possible alternative, I suppose, would be to create a raw pointer to 
> a part of the buffer text and call re_search directly specifying the 
> known length of the node in bytes. If buffer text is one contiguous 
> region in memory, that is.

It isn't, though: there's the gap.  Which is why doing this is not
recommended; instead, use something like search_buffer_re, which
already handles this complication for you.  (Except that
search_buffer_re is a static function, so only code in search.c can
use it.  So you'd need to make it non-static.)
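
To make the gap concrete, here is a toy gap buffer in C. It is an
illustrative sketch under stated assumptions, not Emacs's actual buffer
representation: the struct toy_gap_buffer and all toy_* functions are
hypothetical names. It shows that a logical range straddling the gap is not
one contiguous run in memory, so a single raw pointer plus a length cannot
describe it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy gap buffer (hypothetical, for illustration): the logical text is
   data[0 .. gap_start) followed by data[gap_end .. size).  */
struct toy_gap_buffer
{
  const char *data;
  size_t gap_start;
  size_t gap_end;
  size_t size;
};

/* Number of characters of logical text, excluding the gap.  */
size_t
toy_length (const struct toy_gap_buffer *b)
{
  return b->size - (b->gap_end - b->gap_start);
}

/* Fetch the character at logical position POS, skipping the gap.  */
char
toy_char_at (const struct toy_gap_buffer *b, size_t pos)
{
  assert (pos < toy_length (b));
  size_t gap = b->gap_end - b->gap_start;
  return pos < b->gap_start ? b->data[pos] : b->data[pos + gap];
}

/* True if the logical range [from, to) does not straddle the gap and
   can therefore be addressed with one pointer and a length.  */
bool
toy_range_is_contiguous (const struct toy_gap_buffer *b,
                         size_t from, size_t to)
{
  assert (from <= to && to <= toy_length (b));
  return b->gap_start == b->gap_end
         || to <= b->gap_start || from >= b->gap_start;
}

/* Copy the logical range [from, to) into OUT, which must have room for
   to - from + 1 bytes; the result is NUL-terminated.  */
void
toy_copy_range (const struct toy_gap_buffer *b,
                size_t from, size_t to, char *out)
{
  assert (from <= to && to <= toy_length (b));
  for (size_t i = from; i < to; i++)
    out[i - from] = toy_char_at (b, i);
  out[to - from] = '\0';
}
```

In this model a node whose text straddles the gap would have to be copied
(as toy_copy_range does) or matched by code that knows about both halves,
which is the complication the existing search routines are said to handle.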




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#60953; Package emacs. Full text available.

Message received at 60953 <at> debbugs.gnu.org:


Received: (at 60953) by debbugs.gnu.org; 30 Jan 2023 17:15:18 +0000
Message-ID: <83e58a1b-2e4a-356a-36d8-c756ff105b62@HIDDEN>
Date: Mon, 30 Jan 2023 19:15:07 +0200
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.4.2
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter
 font-lock seems inefficient
Content-Language: en-US
To: Eli Zaretskii <eliz@HIDDEN>
References: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@HIDDEN>
 <04729838-b7d4-8a08-2b71-12536a28aebb@HIDDEN>
 <AB9CD94C-CB3C-4D84-B4AA-22EDC206EB12@HIDDEN>
 <b1dc100e-1ee9-dc89-00b6-25bb7864e90b@HIDDEN> <83wn5ag4nc.fsf@HIDDEN>
 <01b5d074-fb12-6b1f-cbfb-5e759833b854@HIDDEN> <838rhpg57n.fsf@HIDDEN>
 <5026D975-983F-4D18-8690-BE139C92825D@HIDDEN> <83pmb1emxi.fsf@HIDDEN>
 <6f318afc-ca71-8b7e-c822-52e6635b5718@HIDDEN> <83sffxcfxw.fsf@HIDDEN>
 <d8cf01fa-d829-fe83-0bbf-4a5f1965a107@HIDDEN> <83pmb1cbg5.fsf@HIDDEN>
 <2da844d3-ea31-289e-2821-aa174e365ffd@HIDDEN>
 <6784f9e7-844b-374d-2a1e-8a61cebe0d7e@HIDDEN> <83zga03yne.fsf@HIDDEN>
 <373a575f-c683-1581-c3e6-502e9897fb04@HIDDEN> <83mt603vrc.fsf@HIDDEN>
From: Dmitry Gutov <dgutov@HIDDEN>
In-Reply-To: <83mt603vrc.fsf@HIDDEN>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org

On 30/01/2023 17:08, Eli Zaretskii wrote:
>> Date: Mon, 30 Jan 2023 16:47:01 +0200
>> Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
>> From: Dmitry Gutov <dgutov@HIDDEN>
>>
>> On 30/01/2023 16:06, Eli Zaretskii wrote:
>>
>>> But why do you need to narrow there?  fast_looking_at will not go
>>> beyond end_pos/end_byte anyway, there's no need to restrict it.
>> The reason for that is to be able to support the \` and \' markers in
>> REGEXP. I haven't found any alternative approach that doesn't call
>> 'substring'.
> fast_looking_at already does an anchored match, so I'm not sure I
> follow.  I don't even understand why you need the \` part, when the
> match will either always start from the first position or fail.

The regexp might include the anchors, or it might not.

It might also use a different anchor like ^ or $ or \b.

See these examples from the documentation:

((_) @bob (#match \"^B.b$\" @bob))

'((
    (compound_expression :anchor (_) @@first (_) :* @@rest)
    (:match "love" @@first)
    ))

> And for \', just compare the length of the match returned by
> fast_looking_at with the length of the text.

This seems to work, i.e. even when before "carpet",

(and (looking-at (regexp-opt '("car" "cardigan" "carpet")))
      (match-string 0))

returns the full match. I was expecting that it could return just "car" 
-- not sure why it doesn't stop there.

But again, to find out whether we need to use the end anchor at all, 
we'd have to parse the regexp, remove the actual anchor before calling 
fast_looking_at, and then add the above check.
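
The length comparison from the quoted suggestion can be sketched in C as
follows. This is only an illustration under loud assumptions: it uses the
POSIX <regex.h> functions regcomp and regexec rather than Emacs's own
regexp engine, and toy_matches_whole is a hypothetical name. POSIX matching
reports the leftmost-longest match, while Emacs's matcher is a backtracking
one, so results for alternations are not guaranteed to agree between the
two.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <string.h>

/* Return true if PATTERN (POSIX extended syntax) matches TEXT starting
   at offset 0 and the match covers all of TEXT.  This emulates an
   implicit end anchor by comparing the match length with the text
   length instead of editing PATTERN.  */
bool
toy_matches_whole (const char *pattern, const char *text)
{
  regex_t re;
  regmatch_t m[1];
  bool ok = false;

  int rc = regcomp (&re, pattern, REG_EXTENDED);
  assert (rc == 0);   /* this sketch only handles valid patterns */
  (void) rc;

  if (regexec (&re, text, 1, m, 0) == 0)
    ok = m[0].rm_so == 0 && (size_t) m[0].rm_eo == strlen (text);

  regfree (&re);
  return ok;
}
```

Note that the sketch applies the check unconditionally; it does not address
the concern above that the check is only wanted when the regexp actually
contains the end anchor.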

One possible alternative, I suppose, would be to create a raw pointer to 
a part of the buffer text and call re_search directly specifying the 
known length of the node in bytes. If buffer text is one contiguous 
region in memory, that is. This way we would run the regexp match against a 
string (not a buffer), but without creating a separate string object.





Information forwarded to bug-gnu-emacs@HIDDEN:
bug#60953; Package emacs. Full text available.

Message received at 60953 <at> debbugs.gnu.org:


Received: (at 60953) by debbugs.gnu.org; 30 Jan 2023 15:09:14 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Mon Jan 30 10:09:13 2023
Received: from localhost ([127.0.0.1]:50098 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1pMVmT-0006j4-FB
	for submit <at> debbugs.gnu.org; Mon, 30 Jan 2023 10:09:13 -0500
Received: from eggs.gnu.org ([209.51.188.92]:53518)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <eliz@HIDDEN>) id 1pMVmS-0006ir-0E
 for 60953 <at> debbugs.gnu.org; Mon, 30 Jan 2023 10:09:12 -0500
Received: from fencepost.gnu.org ([2001:470:142:3::e])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@HIDDEN>)
 id 1pMVmM-0003Ss-Kk; Mon, 30 Jan 2023 10:09:06 -0500
Received: from [87.69.77.57] (helo=home-c4e4a596f7)
 by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@HIDDEN>)
 id 1pMVmA-00089b-GZ; Mon, 30 Jan 2023 10:09:06 -0500
Date: Mon, 30 Jan 2023 17:08:39 +0200
Message-Id: <83mt603vrc.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
To: Dmitry Gutov <dgutov@HIDDEN>
In-Reply-To: <373a575f-c683-1581-c3e6-502e9897fb04@HIDDEN> (message from
 Dmitry Gutov on Mon, 30 Jan 2023 16:47:01 +0200)
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter
 font-lock seems inefficient
References: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@HIDDEN>
 <04729838-b7d4-8a08-2b71-12536a28aebb@HIDDEN>
 <AB9CD94C-CB3C-4D84-B4AA-22EDC206EB12@HIDDEN>
 <b1dc100e-1ee9-dc89-00b6-25bb7864e90b@HIDDEN> <83wn5ag4nc.fsf@HIDDEN>
 <01b5d074-fb12-6b1f-cbfb-5e759833b854@HIDDEN> <838rhpg57n.fsf@HIDDEN>
 <5026D975-983F-4D18-8690-BE139C92825D@HIDDEN> <83pmb1emxi.fsf@HIDDEN>
 <6f318afc-ca71-8b7e-c822-52e6635b5718@HIDDEN> <83sffxcfxw.fsf@HIDDEN>
 <d8cf01fa-d829-fe83-0bbf-4a5f1965a107@HIDDEN> <83pmb1cbg5.fsf@HIDDEN>
 <2da844d3-ea31-289e-2821-aa174e365ffd@HIDDEN>
 <6784f9e7-844b-374d-2a1e-8a61cebe0d7e@HIDDEN> <83zga03yne.fsf@HIDDEN>
 <373a575f-c683-1581-c3e6-502e9897fb04@HIDDEN>
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 60953
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

> Date: Mon, 30 Jan 2023 16:47:01 +0200
> Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
> From: Dmitry Gutov <dgutov@HIDDEN>
> 
> On 30/01/2023 16:06, Eli Zaretskii wrote:
> 
> > But why do you need to narrow there?  fast_looking_at will not go
> > beyond end_pos/end_byte anyway, there's no need to restrict it.
> 
> The reason for that is to be able to support the \` and \' markers in 
> REGEXP. I haven't found any alternative approach that doesn't call 
> 'substring'.

fast_looking_at already does an anchored match, so I'm not sure I
follow.  I don't even understand why you need the \` part, when the
match will either always start from the first position or fail.

And for \', just compare the length of the match returned by
fast_looking_at with the length of the text.

What am I missing?

>  > And here I suggest an additional optimization, since you already know
>  > the byte positions:
> 
> No real objection from me if you're sure, but I tried that, and the 
> benchmarks showed no difference.

Sheer luck: you force SET_BUF_BEGV etc. to call buf_charpos_to_bytepos
for no reason: you already have the byte positions in hand.

> (I suppose we also could save the previous values of BEGV_BYTE and 
> ZV_BYTE to use when restoring.)

Yes.

> Either way, we would use this *instead of* Fnarrow_to_region, right? 

Fnarrow_to_region does little else.  But I need to understand first
why you need to change the restriction.




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#60953; Package emacs. Full text available.

Message received at 60953 <at> debbugs.gnu.org:


Received: (at 60953) by debbugs.gnu.org; 30 Jan 2023 14:47:16 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Mon Jan 30 09:47:16 2023
Received: from localhost ([127.0.0.1]:46816 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1pMVRD-0005H5-Tt
	for submit <at> debbugs.gnu.org; Mon, 30 Jan 2023 09:47:16 -0500
Received: from mail-ed1-f42.google.com ([209.85.208.42]:39479)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <raaahh@HIDDEN>) id 1pMVR8-0005Gh-2r
 for 60953 <at> debbugs.gnu.org; Mon, 30 Jan 2023 09:47:10 -0500
Received: by mail-ed1-f42.google.com with SMTP id y11so11170616edd.6
 for <60953 <at> debbugs.gnu.org>; Mon, 30 Jan 2023 06:47:10 -0800 (PST)
Received: from [192.168.0.2] ([46.251.119.176])
 by smtp.googlemail.com with ESMTPSA id
 z7-20020a05640240c700b0046c4553010fsm6941353edb.1.2023.01.30.06.47.02
 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128);
 Mon, 30 Jan 2023 06:47:03 -0800 (PST)
Message-ID: <373a575f-c683-1581-c3e6-502e9897fb04@HIDDEN>
Date: Mon, 30 Jan 2023 16:47:01 +0200
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.4.2
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter
 font-lock seems inefficient
Content-Language: en-US
To: Eli Zaretskii <eliz@HIDDEN>
References: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@HIDDEN>
 <04729838-b7d4-8a08-2b71-12536a28aebb@HIDDEN>
 <AB9CD94C-CB3C-4D84-B4AA-22EDC206EB12@HIDDEN>
 <b1dc100e-1ee9-dc89-00b6-25bb7864e90b@HIDDEN> <83wn5ag4nc.fsf@HIDDEN>
 <01b5d074-fb12-6b1f-cbfb-5e759833b854@HIDDEN> <838rhpg57n.fsf@HIDDEN>
 <5026D975-983F-4D18-8690-BE139C92825D@HIDDEN> <83pmb1emxi.fsf@HIDDEN>
 <6f318afc-ca71-8b7e-c822-52e6635b5718@HIDDEN> <83sffxcfxw.fsf@HIDDEN>
 <d8cf01fa-d829-fe83-0bbf-4a5f1965a107@HIDDEN> <83pmb1cbg5.fsf@HIDDEN>
 <2da844d3-ea31-289e-2821-aa174e365ffd@HIDDEN>
 <6784f9e7-844b-374d-2a1e-8a61cebe0d7e@HIDDEN> <83zga03yne.fsf@HIDDEN>
From: Dmitry Gutov <dgutov@HIDDEN>
In-Reply-To: <83zga03yne.fsf@HIDDEN>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Score: -0.9 (/)
X-Debbugs-Envelope-To: 60953
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
X-Spam-Score: -1.9 (-)

On 30/01/2023 16:06, Eli Zaretskii wrote:

 > Our style is to leave a blank between ASET and the left parenthesis.

Sure, thanks.

> Mmm... no.  You should use Fnarrow_to_region, I think.

Last time I tried that (from Lisp, with save-restriction), I think the 
result was measurably slower. I can try it again from C, though.

> But why do you need to narrow there?  fast_looking_at will not go
> beyond end_pos/end_byte anyway, there's no need to restrict it.

The reason for that is to be able to support the \` and \' markers in 
REGEXP. I haven't found any alternative approach that doesn't call 
'substring'.

 > And here I suggest an additional optimization, since you already know
 > the byte positions:

No real objection from me if you're sure, but I tried that, and the 
benchmarks showed no difference. So I submitted the shorter version.

(I suppose we also could save the previous values of BEGV_BYTE and 
ZV_BYTE to use when restoring.)

Either way, we would use this *instead of* Fnarrow_to_region, right? 
Because it only accepts two arguments.




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#60953; Package emacs. Full text available.

Message received at 60953 <at> debbugs.gnu.org:


Received: (at 60953) by debbugs.gnu.org; 30 Jan 2023 14:06:30 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Mon Jan 30 09:06:29 2023
Received: from localhost ([127.0.0.1]:46735 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1pMUnl-0001u4-IG
	for submit <at> debbugs.gnu.org; Mon, 30 Jan 2023 09:06:29 -0500
Received: from eggs.gnu.org ([209.51.188.92]:39794)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <eliz@HIDDEN>) id 1pMUni-0001tr-SL
 for 60953 <at> debbugs.gnu.org; Mon, 30 Jan 2023 09:06:28 -0500
Received: from fencepost.gnu.org ([2001:470:142:3::e])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@HIDDEN>)
 id 1pMUnd-0007Sr-Hd; Mon, 30 Jan 2023 09:06:21 -0500
Received: from [87.69.77.57] (helo=home-c4e4a596f7)
 by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@HIDDEN>)
 id 1pMUnc-0006UU-Qv; Mon, 30 Jan 2023 09:06:21 -0500
Date: Mon, 30 Jan 2023 16:06:13 +0200
Message-Id: <83zga03yne.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
To: Dmitry Gutov <dgutov@HIDDEN>
In-Reply-To: <6784f9e7-844b-374d-2a1e-8a61cebe0d7e@HIDDEN> (message from
 Dmitry Gutov on Mon, 30 Jan 2023 02:49:47 +0200)
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter
 font-lock seems inefficient
References: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@HIDDEN>
 <04729838-b7d4-8a08-2b71-12536a28aebb@HIDDEN>
 <AB9CD94C-CB3C-4D84-B4AA-22EDC206EB12@HIDDEN>
 <b1dc100e-1ee9-dc89-00b6-25bb7864e90b@HIDDEN> <83wn5ag4nc.fsf@HIDDEN>
 <01b5d074-fb12-6b1f-cbfb-5e759833b854@HIDDEN> <838rhpg57n.fsf@HIDDEN>
 <5026D975-983F-4D18-8690-BE139C92825D@HIDDEN> <83pmb1emxi.fsf@HIDDEN>
 <6f318afc-ca71-8b7e-c822-52e6635b5718@HIDDEN> <83sffxcfxw.fsf@HIDDEN>
 <d8cf01fa-d829-fe83-0bbf-4a5f1965a107@HIDDEN> <83pmb1cbg5.fsf@HIDDEN>
 <2da844d3-ea31-289e-2821-aa174e365ffd@HIDDEN>
 <6784f9e7-844b-374d-2a1e-8a61cebe0d7e@HIDDEN>
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 60953
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
X-Spam-Score: -3.3 (---)

> Date: Mon, 30 Jan 2023 02:49:47 +0200
> From: Dmitry Gutov <dgutov@HIDDEN>
> Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
> 
> Code review welcome.

See some below.

> Is applying (and undoing) the narrowing this way legal enough? Or should 
> I go through some error handlers, or ensure blocks, etc?

Mmm... no.  You should use Fnarrow_to_region, I think.

But why do you need to narrow there?  fast_looking_at will not go
beyond end_pos/end_byte anyway, there's no need to restrict it.

Or are you thinking about widening a buffer that is already narrowed?
But if so, can we have parser data beyond the restriction?

> +      Lisp_Object predicates = AREF(predicates_table, match.pattern_index);
> +      if (EQ (predicates, Qt))
> +	{
> +	  predicates = treesit_predicates_for_pattern (treesit_query, 0);
> +	  ASET(predicates_table, match.pattern_index, predicates);

Our style is to leave a blank between ASET and the left parenthesis.

> +  set_buffer_internal (buffer);
> +
> +  TSNode treesit_node = XTS_NODE (node)->node;
> +  ptrdiff_t visible_beg = XTS_PARSER (XTS_NODE (node)->parser)->visible_beg;
> +  uint32_t start_byte_offset = ts_node_start_byte (treesit_node);
> +  uint32_t end_byte_offset = ts_node_end_byte (treesit_node);
> +  ptrdiff_t start_byte = visible_beg + start_byte_offset;
> +  ptrdiff_t end_byte = visible_beg + end_byte_offset;
> +  ptrdiff_t start_pos = buf_bytepos_to_charpos (buffer, start_byte);
> +  ptrdiff_t end_pos = buf_bytepos_to_charpos (buffer, end_byte);
> +  ptrdiff_t old_begv = BEGV;
> +  ptrdiff_t old_zv = ZV;

Since you switch to BUFFER, you can use BYTE_TO_CHAR, no need for
buf_bytepos_to_charpos.

> +  SET_BUF_BEGV(buffer, start_pos);
> +  SET_BUF_ZV(buffer, end_pos);

And here I suggest an additional optimization, since you already know
the byte positions:

  BEGV = start_pos;
  BEGV_BYTE = start_byte;
  ZV = end_pos;
  ZV_BYTE = end_byte;
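
The difference between the two ways of narrowing can be sketched with a toy model. Everything below is hypothetical: struct toy_buffer stands in for the four restriction fields, and toy_charpos_to_bytepos stands in for the char-to-byte conversion, pretending every character is two bytes wide. The sketch also saves and restores all four values, and it does not address non-local exits.

```c
#include <stddef.h>

/* Hypothetical stand-in for the restriction fields of a buffer.  */
struct toy_buffer
{
  ptrdiff_t begv, begv_byte;    /* start of accessible region */
  ptrdiff_t zv, zv_byte;        /* end of accessible region */
  int conversions;              /* char->byte conversions performed */
};

/* Pretend conversion; positions start at 1, each char is 2 bytes.  */
static ptrdiff_t
toy_charpos_to_bytepos (struct toy_buffer *b, ptrdiff_t charpos)
{
  b->conversions++;
  return 1 + (charpos - 1) * 2;
}

/* Narrow from character positions only: two conversions needed.  */
static void
narrow_by_charpos (struct toy_buffer *b, ptrdiff_t beg, ptrdiff_t end)
{
  b->begv = beg;
  b->begv_byte = toy_charpos_to_bytepos (b, beg);
  b->zv = end;
  b->zv_byte = toy_charpos_to_bytepos (b, end);
}

/* Example callback: number of bytes currently accessible.  */
static int
visible_bytes (const struct toy_buffer *b)
{
  return (int) (b->zv_byte - b->begv_byte);
}

/* Narrow with known char and byte positions, run FN, then restore
   the four saved values.  No conversion is performed.  */
static int
with_restriction (struct toy_buffer *b,
                  ptrdiff_t beg, ptrdiff_t beg_byte,
                  ptrdiff_t end, ptrdiff_t end_byte,
                  int (*fn) (const struct toy_buffer *))
{
  struct toy_buffer saved = *b;
  b->begv = beg;
  b->begv_byte = beg_byte;
  b->zv = end;
  b->zv_byte = end_byte;
  int result = fn (b);
  b->begv = saved.begv;
  b->begv_byte = saved.begv_byte;
  b->zv = saved.zv;
  b->zv_byte = saved.zv_byte;
  return result;
}
```

In this model narrow_by_charpos costs two conversions per call, while with_restriction costs none in either direction because the saved byte positions are reused when restoring.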




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#60953; Package emacs. Full text available.

Message received at 60953 <at> debbugs.gnu.org:


Received: (at 60953) by debbugs.gnu.org; 30 Jan 2023 00:49:59 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Sun Jan 29 19:49:59 2023
Received: from localhost ([127.0.0.1]:45616 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1pMIMx-0007ru-3i
	for submit <at> debbugs.gnu.org; Sun, 29 Jan 2023 19:49:59 -0500
Received: from mail-wm1-f53.google.com ([209.85.128.53]:51140)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <raaahh@HIDDEN>) id 1pMIMu-0007rg-Rw
 for 60953 <at> debbugs.gnu.org; Sun, 29 Jan 2023 19:49:57 -0500
Received: by mail-wm1-f53.google.com with SMTP id bg26so1142981wmb.0
 for <60953 <at> debbugs.gnu.org>; Sun, 29 Jan 2023 16:49:56 -0800 (PST)
Received: from [192.168.0.2] ([46.251.119.176])
 by smtp.googlemail.com with ESMTPSA id
 h15-20020a05600c2caf00b003d974076f13sm12525596wmc.3.2023.01.29.16.49.49
 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128);
 Sun, 29 Jan 2023 16:49:49 -0800 (PST)
Content-Type: multipart/mixed; boundary="------------FtNCE7uFJJvPd9u0gZl5ikKW"
Message-ID: <6784f9e7-844b-374d-2a1e-8a61cebe0d7e@HIDDEN>
Date: Mon, 30 Jan 2023 02:49:47 +0200
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.4.2
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter
 font-lock seems inefficient
Content-Language: en-US
From: Dmitry Gutov <dgutov@HIDDEN>
To: Eli Zaretskii <eliz@HIDDEN>
References: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@HIDDEN>
 <04729838-b7d4-8a08-2b71-12536a28aebb@HIDDEN>
 <AB9CD94C-CB3C-4D84-B4AA-22EDC206EB12@HIDDEN>
 <b1dc100e-1ee9-dc89-00b6-25bb7864e90b@HIDDEN> <83wn5ag4nc.fsf@HIDDEN>
 <01b5d074-fb12-6b1f-cbfb-5e759833b854@HIDDEN> <838rhpg57n.fsf@HIDDEN>
 <5026D975-983F-4D18-8690-BE139C92825D@HIDDEN> <83pmb1emxi.fsf@HIDDEN>
 <6f318afc-ca71-8b7e-c822-52e6635b5718@HIDDEN> <83sffxcfxw.fsf@HIDDEN>
 <d8cf01fa-d829-fe83-0bbf-4a5f1965a107@HIDDEN> <83pmb1cbg5.fsf@HIDDEN>
 <2da844d3-ea31-289e-2821-aa174e365ffd@HIDDEN>
In-Reply-To: <2da844d3-ea31-289e-2821-aa174e365ffd@HIDDEN>
X-Spam-Score: -0.9 (/)
X-Debbugs-Envelope-To: 60953
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
X-Spam-Score: -1.9 (-)

This is a multi-part message in MIME format.
--------------FtNCE7uFJJvPd9u0gZl5ikKW
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

On 26/01/2023 23:26, Dmitry Gutov wrote:
>>>> (But I thought you concluded that GC alone cannot explain the
>>>> difference in performance?)
>>> I'm inclined to think the difference is related to copying of the regexp
>>> string, but whether the time is spent in actually copying it, or
>>> scanning its copies for garbage later, it was harder to say. Seems like
>>> it's the latter, though.
>> If we can avoid the copying, I think it's desirable in any case.  They
>> are constant regexps, aren't they?
> 
> Yes, but how?
> 
> Memoization is one possible step, but then we only avoid re-creating the 
> predicate structures for each match. We still send a pretty large query 
> and, apparently, get it back..? Might be some copying involved there.
> 
> TBH the moderate success the memoization patch shows has me stumped.

Okay, I have cleaned up both experiments that I had. And when combined, 
they make the :match approach a little faster than the :pred one.

I'm still not sure why the difference is so little, given that the :pred 
one has Lisp funcalls and extra allocation, and :match does not.

Still, if nobody has any better ideas, I suggest we install both of 
these changes now. They are attached in separate patches.

memoize_vector.diff improves the performance of both cases. For :pred, 
it's roughly 10%; for :match, it's more.
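
The memoization can be sketched in standalone C as a table with one slot per pattern, pre-filled with a sentinel and computed on first use. This is a sketch under assumptions, not the patch itself: NOT_COMPUTED plays the role of the Qt marker, compute_predicates is a hypothetical stand-in for the expensive lookup, and the cache is keyed by the matched pattern's index.

```c
#include <stddef.h>

#define MAX_PATTERNS 16
#define NOT_COMPUTED (-1)       /* sentinel for "slot not filled yet" */

struct predicate_cache
{
  int slots[MAX_PATTERNS];
  size_t count;                 /* number of patterns in the query */
  int computations;             /* how often the slow path ran */
};

/* Hypothetical stand-in for the expensive per-pattern lookup.  */
static int
compute_predicates (struct predicate_cache *c, size_t pattern_index)
{
  c->computations++;
  return (int) pattern_index * 10 + 7;
}

/* Mark every slot as not yet computed.  */
static void
cache_init (struct predicate_cache *c, size_t count)
{
  c->count = count <= MAX_PATTERNS ? count : MAX_PATTERNS;
  c->computations = 0;
  for (size_t i = 0; i < c->count; i++)
    c->slots[i] = NOT_COMPUTED;
}

/* Return the predicates for PATTERN_INDEX, computing them at most
   once per cache.  Out-of-range indices yield NOT_COMPUTED.  */
static int
cache_get (struct predicate_cache *c, size_t pattern_index)
{
  if (pattern_index >= c->count)
    return NOT_COMPUTED;
  if (c->slots[pattern_index] == NOT_COMPUTED)
    c->slots[pattern_index] = compute_predicates (c, pattern_index);
  return c->slots[pattern_index];
}
```

Repeated matches of the same pattern then hit the filled slot, so the slow path runs once per distinct pattern rather than once per match.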

treesit_predicate_match.diff improves the performance of the latter, 
though only a little: maybe 3-4%.

Code review welcome.

Is applying (and undoing) the narrowing this way legal enough? Or should 
I go through some error handlers, or ensure blocks, etc?

Speaking of perf, the profile looks like this now (very similar to what 
it was before the added rule):

   17.25%  emacs  libtree-sitter.so.0.0       [.] ts_tree_cursor_current_status
   10.93%  emacs  libtree-sitter.so.0.0       [.] ts_tree_cursor_goto_next_sibling
    9.89%  emacs  libtree-sitter.so.0.0       [.] ts_tree_cursor_goto_first_child
    9.01%  emacs  emacs                       [.] process_mark_stack
    4.80%  emacs  libtree-sitter.so.0.0       [.] ts_node_start_point
    3.84%  emacs  emacs                       [.] re_match_2_internal
    3.82%  emacs  libtree-sitter.so.0.0       [.] ts_tree_cursor_parent_node
    3.06%  emacs  libtree-sitter.so.0.0       [.] ts_language_symbol_metadata
--------------FtNCE7uFJJvPd9u0gZl5ikKW
Content-Type: text/x-patch; charset=UTF-8; name="memoize_vector.diff"
Content-Disposition: attachment; filename="memoize_vector.diff"
Content-Transfer-Encoding: base64

ZGlmZiAtLWdpdCBhL3NyYy90cmVlc2l0LmMgYi9zcmMvdHJlZXNpdC5jCmluZGV4IGIyMTBl
YzA5MjNhLi43MWFmZjMyMDJhZSAxMDA2NDQKLS0tIGEvc3JjL3RyZWVzaXQuYworKysgYi9z
cmMvdHJlZXNpdC5jCkBAIC0yNzIwLDggKzI3NDQsMTAgQEAgREVGVU4gKCJ0cmVlc2l0LXF1
ZXJ5LWNhcHR1cmUiLAogICAgICBldmVyeSBmb3IgbG9vcCBhbmQgbmNvbmMgaXQgdG8gUkVT
VUxUIGV2ZXJ5IHRpbWUuICBUaGF0IGlzIGluZGVlZAogICAgICB0aGUgaW5pdGlhbCBpbXBs
ZW1lbnRhdGlvbiBpbiB3aGljaCBZb2F2IGZvdW5kIG5jb25jIGJlaW5nIHRoZQogICAgICBi
b3R0bGVuZWNrICg5OC40JSBvZiB0aGUgcnVubmluZyB0aW1lIHNwZW50IG9uIG5jb25jKS4g
ICovCisgIHVpbnQzMl90IHBhdHRlcm5zX2NvdW50ID0gdHNfcXVlcnlfcGF0dGVybl9jb3Vu
dCh0cmVlc2l0X3F1ZXJ5KTsKICAgTGlzcF9PYmplY3QgcmVzdWx0ID0gUW5pbDsKICAgTGlz
cF9PYmplY3QgcHJldl9yZXN1bHQgPSByZXN1bHQ7CisgIExpc3BfT2JqZWN0IHByZWRpY2F0
ZXNfdGFibGUgPSBtYWtlX3ZlY3RvcihwYXR0ZXJuc19jb3VudCwgUXQpOwogICB3aGlsZSAo
dHNfcXVlcnlfY3Vyc29yX25leHRfbWF0Y2ggKGN1cnNvciwgJm1hdGNoKSkKICAgICB7CiAg
ICAgICAvKiBSZWNvcmQgdGhlIGNoZWNrcG9pbnQgdGhhdCB3ZSBtYXkgcm9sbCBiYWNrIHRv
LiAgKi8KQEAgLTI3NTAsOSArMjc3NiwxMiBAQCBERUZVTiAoInRyZWVzaXQtcXVlcnktY2Fw
dHVyZSIsCiAJICByZXN1bHQgPSBGY29ucyAoY2FwLCByZXN1bHQpOwogCX0KICAgICAgIC8q
IEdldCBwcmVkaWNhdGVzLiAgKi8KLSAgICAgIExpc3BfT2JqZWN0IHByZWRpY2F0ZXMKLQk9
IHRyZWVzaXRfcHJlZGljYXRlc19mb3JfcGF0dGVybiAodHJlZXNpdF9xdWVyeSwKLQkJCQkJ
ICBtYXRjaC5wYXR0ZXJuX2luZGV4KTsKKyAgICAgIExpc3BfT2JqZWN0IHByZWRpY2F0ZXMg
PSBBUkVGKHByZWRpY2F0ZXNfdGFibGUsIG1hdGNoLnBhdHRlcm5faW5kZXgpOworICAgICAg
aWYgKEVRIChwcmVkaWNhdGVzLCBRdCkpCisJeworCSAgcHJlZGljYXRlcyA9IHRyZWVzaXRf
cHJlZGljYXRlc19mb3JfcGF0dGVybiAodHJlZXNpdF9xdWVyeSwgMCk7CisJICBBU0VUKHBy
ZWRpY2F0ZXNfdGFibGUsIG1hdGNoLnBhdHRlcm5faW5kZXgsIHByZWRpY2F0ZXMpOworCX0K
IAogICAgICAgLyogY2FwdHVyZXNfbGlzcCA9IEZucmV2ZXJzZSAoY2FwdHVyZXNfbGlzcCk7
ICovCiAgICAgICBzdHJ1Y3QgY2FwdHVyZV9yYW5nZSBjYXB0dXJlc19yYW5nZSA9IHsgcmVz
dWx0LCBwcmV2X3Jlc3VsdCB9Owo=
--------------FtNCE7uFJJvPd9u0gZl5ikKW
Content-Type: text/x-patch; charset=UTF-8; name="treesit_predicate_match.diff"
Content-Disposition: attachment; filename="treesit_predicate_match.diff"
Content-Transfer-Encoding: base64

ZGlmZiAtLWdpdCBhL3NyYy90cmVlc2l0LmMgYi9zcmMvdHJlZXNpdC5jCmluZGV4IGIyMTBl
YzA5MjNhLi4zNjMwZGI0MmY1ZSAxMDA2NDQKLS0tIGEvc3JjL3RyZWVzaXQuYworKysgYi9z
cmMvdHJlZXNpdC5jCkBAIC0yNDY2LDEwICsyNDY2LDM0IEBAIHRyZWVzaXRfcHJlZGljYXRl
X21hdGNoIChMaXNwX09iamVjdCBhcmdzLCBzdHJ1Y3QgY2FwdHVyZV9yYW5nZSBjYXB0dXJl
cykKIAkgICAgICBidWlsZF9zdHJpbmcgKCJUaGUgc2Vjb25kIGFyZ3VtZW50IHRvIGBtYXRj
aCcgc2hvdWxkICIKIAkJICAgICAgICAgICAgImJlIGEgY2FwdHVyZSBuYW1lLCBub3QgYSBz
dHJpbmciKSk7CiAKLSAgTGlzcF9PYmplY3QgdGV4dCA9IHRyZWVzaXRfcHJlZGljYXRlX2Nh
cHR1cmVfbmFtZV90b190ZXh0IChjYXB0dXJlX25hbWUsCi0JCQkJCQkJICAgICBjYXB0dXJl
cyk7CisgIExpc3BfT2JqZWN0IG5vZGUgPSB0cmVlc2l0X3ByZWRpY2F0ZV9jYXB0dXJlX25h
bWVfdG9fbm9kZSAoY2FwdHVyZV9uYW1lLCBjYXB0dXJlcyk7CiAKLSAgaWYgKGZhc3Rfc3Ry
aW5nX21hdGNoIChyZWdleHAsIHRleHQpID49IDApCisgIHN0cnVjdCBidWZmZXIgKm9sZF9i
dWZmZXIgPSBjdXJyZW50X2J1ZmZlcjsKKyAgc3RydWN0IGJ1ZmZlciAqYnVmZmVyID0gWEJV
RkZFUiAoWFRTX1BBUlNFUiAoWFRTX05PREUgKG5vZGUpLT5wYXJzZXIpLT5idWZmZXIpOwor
ICBzZXRfYnVmZmVyX2ludGVybmFsIChidWZmZXIpOworCisgIFRTTm9kZSB0cmVlc2l0X25v
ZGUgPSBYVFNfTk9ERSAobm9kZSktPm5vZGU7CisgIHB0cmRpZmZfdCB2aXNpYmxlX2JlZyA9
IFhUU19QQVJTRVIgKFhUU19OT0RFIChub2RlKS0+cGFyc2VyKS0+dmlzaWJsZV9iZWc7Cisg
IHVpbnQzMl90IHN0YXJ0X2J5dGVfb2Zmc2V0ID0gdHNfbm9kZV9zdGFydF9ieXRlICh0cmVl
c2l0X25vZGUpOworICB1aW50MzJfdCBlbmRfYnl0ZV9vZmZzZXQgPSB0c19ub2RlX2VuZF9i
eXRlICh0cmVlc2l0X25vZGUpOworICBwdHJkaWZmX3Qgc3RhcnRfYnl0ZSA9IHZpc2libGVf
YmVnICsgc3RhcnRfYnl0ZV9vZmZzZXQ7CisgIHB0cmRpZmZfdCBlbmRfYnl0ZSA9IHZpc2li
bGVfYmVnICsgZW5kX2J5dGVfb2Zmc2V0OworICBwdHJkaWZmX3Qgc3RhcnRfcG9zID0gYnVm
X2J5dGVwb3NfdG9fY2hhcnBvcyAoYnVmZmVyLCBzdGFydF9ieXRlKTsKKyAgcHRyZGlmZl90
IGVuZF9wb3MgPSBidWZfYnl0ZXBvc190b19jaGFycG9zIChidWZmZXIsIGVuZF9ieXRlKTsK
KyAgcHRyZGlmZl90IG9sZF9iZWd2ID0gQkVHVjsKKyAgcHRyZGlmZl90IG9sZF96diA9IFpW
OworCisgIFNFVF9CVUZfQkVHVihidWZmZXIsIHN0YXJ0X3Bvcyk7CisgIFNFVF9CVUZfWlYo
YnVmZmVyLCBlbmRfcG9zKTsKKworICBwdHJkaWZmX3QgdmFsID0gZmFzdF9sb29raW5nX2F0
IChyZWdleHAsIHN0YXJ0X3Bvcywgc3RhcnRfYnl0ZSwgZW5kX3BvcywgZW5kX2J5dGUsIFFu
aWwpOworCisgIFNFVF9CVUZfQkVHVihidWZmZXIsIG9sZF9iZWd2KTsKKyAgU0VUX0JVRl9a
VihidWZmZXIsIG9sZF96dik7CisKKyAgc2V0X2J1ZmZlcl9pbnRlcm5hbCAob2xkX2J1ZmZl
cik7CisKKyAgaWYgKHZhbCA+IDApCiAgICAgcmV0dXJuIHRydWU7CiAgIGVsc2UKICAgICBy
ZXR1cm4gZmFsc2U7Cg==

--------------FtNCE7uFJJvPd9u0gZl5ikKW--




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#60953; Package emacs. Full text available.

Message received at 60953 <at> debbugs.gnu.org:


Received: (at 60953) by debbugs.gnu.org; 26 Jan 2023 21:27:05 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Thu Jan 26 16:27:05 2023
Received: from localhost ([127.0.0.1]:36340 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1pL9lx-00041Y-1g
	for submit <at> debbugs.gnu.org; Thu, 26 Jan 2023 16:27:05 -0500
Received: from mail-ej1-f43.google.com ([209.85.218.43]:42826)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <raaahh@HIDDEN>) id 1pL9lu-000414-S1
 for 60953 <at> debbugs.gnu.org; Thu, 26 Jan 2023 16:27:03 -0500
Received: by mail-ej1-f43.google.com with SMTP id bk15so8663703ejb.9
 for <60953 <at> debbugs.gnu.org>; Thu, 26 Jan 2023 13:27:02 -0800 (PST)
Received: from [192.168.0.2] ([46.251.119.176])
 by smtp.googlemail.com with ESMTPSA id
 w9-20020a170906184900b007c0f217aadbsm1121822eje.24.2023.01.26.13.26.55
 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128);
 Thu, 26 Jan 2023 13:26:56 -0800 (PST)
Message-ID: <2da844d3-ea31-289e-2821-aa174e365ffd@HIDDEN>
Date: Thu, 26 Jan 2023 23:26:54 +0200
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.4.2
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter
 font-lock seems inefficient
To: Eli Zaretskii <eliz@HIDDEN>
References: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@HIDDEN>
 <04729838-b7d4-8a08-2b71-12536a28aebb@HIDDEN>
 <AB9CD94C-CB3C-4D84-B4AA-22EDC206EB12@HIDDEN>
 <b1dc100e-1ee9-dc89-00b6-25bb7864e90b@HIDDEN> <83wn5ag4nc.fsf@HIDDEN>
 <01b5d074-fb12-6b1f-cbfb-5e759833b854@HIDDEN> <838rhpg57n.fsf@HIDDEN>
 <5026D975-983F-4D18-8690-BE139C92825D@HIDDEN> <83pmb1emxi.fsf@HIDDEN>
 <6f318afc-ca71-8b7e-c822-52e6635b5718@HIDDEN> <83sffxcfxw.fsf@HIDDEN>
 <d8cf01fa-d829-fe83-0bbf-4a5f1965a107@HIDDEN> <83pmb1cbg5.fsf@HIDDEN>
Content-Language: en-US
From: Dmitry Gutov <dgutov@HIDDEN>
In-Reply-To: <83pmb1cbg5.fsf@HIDDEN>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org

On 26/01/2023 22:01, Eli Zaretskii wrote:
>> Date: Thu, 26 Jan 2023 21:35:55 +0200
>> Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
>> From: Dmitry Gutov <dgutov@HIDDEN>
>>
>>> If you are saying that GC is responsible, then running the benchmark
>>> with gc-cons-threshold set to most-positive-fixnum should produce a
>>> more interesting profile and perhaps a more interesting comparison.
>> That really helps:
>>
>> (benchmark-run 1000 (progn (font-lock-mode -1) (font-lock-mode 1) (let
>> (treesit--font-lock-fast-mode) (font-lock-ensure))))
>>
>> => (16.078430587 251 5.784299419999996)
>>
>> (let ((gc-cons-threshold most-positive-fixnum)) (benchmark-run 1000
>> (progn (font-lock-mode -1) (font-lock-mode 1) (let
>> (treesit--font-lock-fast-mode) (font-lock-ensure)))))
>>
>> => (10.369389725 0 0.0)
>>
>> Do you want a perf profile for the latter? It might not be very useful.
> I'd be interested in comparing the profiles of the two techniques, the
> :pred and the :match, with GC disabled like that.

Curiously, :pred is still faster, but the difference is much smaller:

pred:

(9.212951344 0 0.0)

   18.23%  emacs         libtree-sitter.so.0.0       [.] ts_tree_cursor_current_status
   11.61%  emacs         libtree-sitter.so.0.0       [.] ts_tree_cursor_goto_next_sibling
   11.43%  emacs         libtree-sitter.so.0.0       [.] ts_tree_cursor_goto_first_child
    5.00%  emacs         libtree-sitter.so.0.0       [.] ts_node_start_point
    4.02%  emacs         libtree-sitter.so.0.0       [.] ts_tree_cursor_parent_node
    3.97%  emacs         emacs                       [.] re_match_2_internal
    3.36%  emacs         libtree-sitter.so.0.0       [.] ts_language_symbol_metadata
    2.45%  emacs         emacs                       [.] parse_str_as_multibyte
    1.95%  emacs         emacs                       [.] exec_byte_code
    1.66%  emacs         libtree-sitter.so.0.0       [.] ts_tree_cursor_current_node
    1.66%  emacs         libtree-sitter.so.0.0       [.] ts_node_end_point
    1.30%  emacs         emacs                       [.] allocate_vectorlike
    1.24%  emacs         emacs                       [.] find_interval

match:

(10.059083317 0 0.0)

   19.23%  emacs         libtree-sitter.so.0.0  [.] ts_tree_cursor_current_status
   12.41%  emacs         libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_next_sibling
   11.22%  emacs         libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_first_child
    5.21%  emacs         libtree-sitter.so.0.0  [.] ts_node_start_point
    4.22%  emacs         emacs                  [.] re_match_2_internal
    3.97%  emacs         libtree-sitter.so.0.0  [.] ts_tree_cursor_parent_node
    3.64%  emacs         libtree-sitter.so.0.0  [.] ts_language_symbol_metadata
    2.36%  emacs         emacs                  [.] exec_byte_code
    1.66%  emacs         libtree-sitter.so.0.0  [.] ts_node_end_point
    1.62%  emacs         libtree-sitter.so.0.0  [.] ts_tree_cursor_current_node
    1.34%  emacs         libtree-sitter.so.0.0  [.] ts_node_end_byte
    1.28%  emacs         emacs                  [.] allocate_vectorlike
    0.95%  emacs         libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_parent

This is with the current code and disabled GC. No additional changes to 
treesit.c.

>>> (But I thought you concluded that GC alone cannot explain the
>>> difference in performance?)
>> I'm inclined to think the difference is related to copying of the regexp
>> string, but whether the time is spent in actually copying it, or
>> scanning its copies for garbage later, it was harder to say. Seems like
>> it's the latter, though.
> If we can avoid the copying, I think it's desirable in any case.  They
> are constant regexps, aren't they?

Yes, but how?

Memoization is one possible step, but then we only avoid re-creating the 
predicate structures for each match. We still send a pretty large query 
and, apparently, get it back..? Might be some copying involved there.

TBH the moderate success the memoization patch shows has me stumped.




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#60953; Package emacs. Full text available.

Message received at 60953 <at> debbugs.gnu.org:


Received: (at 60953) by debbugs.gnu.org; 26 Jan 2023 20:46:41 +0000
Content-Type: multipart/mixed; boundary="------------sHXGqMYlVXefA1o6sqxFpSea"
Message-ID: <1d7aaf56-6130-c0f0-446f-4bc2c5cafa28@HIDDEN>
Date: Thu, 26 Jan 2023 22:46:30 +0200
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.4.2
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter
 font-lock seems inefficient
Content-Language: en-US
From: Dmitry Gutov <dgutov@HIDDEN>
To: Eli Zaretskii <eliz@HIDDEN>
References: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@HIDDEN>
 <04729838-b7d4-8a08-2b71-12536a28aebb@HIDDEN>
 <AB9CD94C-CB3C-4D84-B4AA-22EDC206EB12@HIDDEN>
 <b1dc100e-1ee9-dc89-00b6-25bb7864e90b@HIDDEN> <83wn5ag4nc.fsf@HIDDEN>
 <01b5d074-fb12-6b1f-cbfb-5e759833b854@HIDDEN> <838rhpg57n.fsf@HIDDEN>
 <31559c1f-1a12-691d-3d03-f566019a0aab@HIDDEN>
In-Reply-To: <31559c1f-1a12-691d-3d03-f566019a0aab@HIDDEN>
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org

This is a multi-part message in MIME format.
--------------sHXGqMYlVXefA1o6sqxFpSea
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

On 26/01/2023 20:07, Dmitry Gutov wrote:
> One could hope to avoid recreating the list of predicates on every 
> match, but that seems to be a limitation of the TS API: 
> ts_query_predicates_for_pattern requires a second argument, 
> match.pattern_index. Maybe we could memoize that, though?

Speaking of memoization, here is a POC patch.

It's a definite improvement: with the attached patch, :match almost reaches the 
performance of :pred. Not sure why it's still not faster, though.

(I also tried a more comprehensive memoization using a hash table, but the 
resulting performance was slightly worse.)
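
For readers following along, the memoization idea can be sketched in isolation. This is a minimal sketch, not the attached patch and not treesit.c code: `build_predicates`, `predicate_memo`, `memo_get` and `simulate_matches` are hypothetical names, and the builder merely stands in for the step that recreates the predicate list on every match. It assumes pattern indices are small consecutive integers, so a plain array can serve as the cache.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a decoded predicate list.  */
struct predicates
{
  uint32_t pattern_index;
};

/* Counts how often the expensive builder runs (demonstration only).  */
static unsigned build_calls;

/* Hypothetical stand-in for the expensive step that rebuilds the
   predicate list (regexp string included) for one pattern.  */
static struct predicates *
build_predicates (uint32_t pattern_index)
{
  struct predicates *p = malloc (sizeof *p);
  if (p == NULL)
    abort ();
  p->pattern_index = pattern_index;
  build_calls++;
  return p;
}

/* Memo table that lives for one query run: one slot per pattern.  */
struct predicate_memo
{
  struct predicates **slots;
  uint32_t pattern_count;
};

static struct predicate_memo
memo_create (uint32_t pattern_count)
{
  struct predicate_memo memo;
  /* calloc zero-fills, so every slot starts out as "not built yet".  */
  memo.slots = calloc (pattern_count ? pattern_count : 1,
                       sizeof *memo.slots);
  if (memo.slots == NULL)
    abort ();
  memo.pattern_count = pattern_count;
  return memo;
}

/* Return the predicates for PATTERN_INDEX, building them on first use.  */
static struct predicates *
memo_get (struct predicate_memo *memo, uint32_t pattern_index)
{
  assert (memo != NULL);
  if (pattern_index >= memo->pattern_count)
    return NULL;
  if (memo->slots[pattern_index] == NULL)
    memo->slots[pattern_index] = build_predicates (pattern_index);
  return memo->slots[pattern_index];
}

static void
memo_free (struct predicate_memo *memo)
{
  for (uint32_t i = 0; i < memo->pattern_count; i++)
    free (memo->slots[i]);
  free (memo->slots);
  memo->slots = NULL;
  memo->pattern_count = 0;
}

/* Simulate MATCH_COUNT matches cycling over PATTERN_COUNT patterns and
   return how many times the builder actually ran.  */
static unsigned
simulate_matches (uint32_t pattern_count, unsigned match_count)
{
  if (pattern_count == 0)
    return 0;
  struct predicate_memo memo = memo_create (pattern_count);
  build_calls = 0;
  for (unsigned i = 0; i < match_count; i++)
    memo_get (&memo, i % pattern_count);
  unsigned calls = build_calls;
  memo_free (&memo);
  return calls;
}
```

With this shape, `simulate_matches (3, 1000)` runs the builder three times instead of a thousand. In Emacs itself the cached values would be Lisp objects rather than malloc'ed structs, so they would additionally have to stay visible to the garbage collector for as long as the cache is in use.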
--------------sHXGqMYlVXefA1o6sqxFpSea
Content-Type: text/x-patch; charset=UTF-8; name="memoize_simple.diff"
Content-Disposition: attachment; filename="memoize_simple.diff"
Content-Transfer-Encoding: base64

ZGlmZiAtLWdpdCBhL3NyYy90cmVlc2l0LmMgYi9zcmMvdHJlZXNpdC5jCmluZGV4IDkxN2Ri
NTgyNjc2Li42OWY1NDk3NjUwOSAxMDA2NDQKLS0tIGEvc3JjL3RyZWVzaXQuYworKysgYi9z
cmMvdHJlZXNpdC5jCkBAIC0yNzIyLDYgKzI3MjIsNyBAQCBERUZVTiAoInRyZWVzaXQtcXVl
cnktY2FwdHVyZSIsCiAgICAgIGJvdHRsZW5lY2sgKDk4LjQlIG9mIHRoZSBydW5uaW5nIHRp
bWUgc3BlbnQgb24gbmNvbmMpLiAgKi8KICAgTGlzcF9PYmplY3QgcmVzdWx0ID0gUW5pbDsK
ICAgTGlzcF9PYmplY3QgcHJldl9yZXN1bHQgPSByZXN1bHQ7CisgIExpc3BfT2JqZWN0IHBy
ZWRpY2F0ZXNfZm9yXzAgPSBOVUxMOwogICB3aGlsZSAodHNfcXVlcnlfY3Vyc29yX25leHRf
bWF0Y2ggKGN1cnNvciwgJm1hdGNoKSkKICAgICB7CiAgICAgICAvKiBSZWNvcmQgdGhlIGNo
ZWNrcG9pbnQgdGhhdCB3ZSBtYXkgcm9sbCBiYWNrIHRvLiAgKi8KQEAgLTI3NTAsOSArMjc1
MSwxOCBAQCBERUZVTiAoInRyZWVzaXQtcXVlcnktY2FwdHVyZSIsCiAJICByZXN1bHQgPSBG
Y29ucyAoY2FwLCByZXN1bHQpOwogCX0KICAgICAgIC8qIEdldCBwcmVkaWNhdGVzLiAgKi8K
LSAgICAgIExpc3BfT2JqZWN0IHByZWRpY2F0ZXMKLQk9IHRyZWVzaXRfcHJlZGljYXRlc19m
b3JfcGF0dGVybiAodHJlZXNpdF9xdWVyeSwKLQkJCQkJICBtYXRjaC5wYXR0ZXJuX2luZGV4
KTsKKyAgICAgIExpc3BfT2JqZWN0IHByZWRpY2F0ZXM7CisgICAgICBpZiAobWF0Y2gucGF0
dGVybl9pbmRleCA9PSAwKQorCXsKKwkgIGlmIChwcmVkaWNhdGVzX2Zvcl8wID09IE5VTEwp
CisJICAgIHByZWRpY2F0ZXNfZm9yXzAgPSB0cmVlc2l0X3ByZWRpY2F0ZXNfZm9yX3BhdHRl
cm4gKHRyZWVzaXRfcXVlcnksIDApOworCisJICBwcmVkaWNhdGVzID0gcHJlZGljYXRlc19m
b3JfMDsKKwl9CisgICAgICBlbHNlCisJeworCSAgcHJlZGljYXRlcyA9IHRyZWVzaXRfcHJl
ZGljYXRlc19mb3JfcGF0dGVybiAodHJlZXNpdF9xdWVyeSwgbWF0Y2gucGF0dGVybl9pbmRl
eCk7CisJfQogCiAgICAgICAvKiBjYXB0dXJlc19saXNwID0gRm5yZXZlcnNlIChjYXB0dXJl
c19saXNwKTsgKi8KICAgICAgIHN0cnVjdCBjYXB0dXJlX3JhbmdlIGNhcHR1cmVzX3Jhbmdl
ID0geyByZXN1bHQsIHByZXZfcmVzdWx0IH07Cg==

--------------sHXGqMYlVXefA1o6sqxFpSea--




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#60953; Package emacs. Full text available.

Message received at 60953 <at> debbugs.gnu.org:


Received: (at 60953) by debbugs.gnu.org; 26 Jan 2023 20:01:37 +0000
Date: Thu, 26 Jan 2023 22:01:14 +0200
Message-Id: <83pmb1cbg5.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
To: Dmitry Gutov <dgutov@HIDDEN>
In-Reply-To: <d8cf01fa-d829-fe83-0bbf-4a5f1965a107@HIDDEN> (message from
 Dmitry Gutov on Thu, 26 Jan 2023 21:35:55 +0200)
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter
 font-lock seems inefficient
References: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@HIDDEN>
 <04729838-b7d4-8a08-2b71-12536a28aebb@HIDDEN>
 <AB9CD94C-CB3C-4D84-B4AA-22EDC206EB12@HIDDEN>
 <b1dc100e-1ee9-dc89-00b6-25bb7864e90b@HIDDEN> <83wn5ag4nc.fsf@HIDDEN>
 <01b5d074-fb12-6b1f-cbfb-5e759833b854@HIDDEN> <838rhpg57n.fsf@HIDDEN>
 <5026D975-983F-4D18-8690-BE139C92825D@HIDDEN> <83pmb1emxi.fsf@HIDDEN>
 <6f318afc-ca71-8b7e-c822-52e6635b5718@HIDDEN> <83sffxcfxw.fsf@HIDDEN>
 <d8cf01fa-d829-fe83-0bbf-4a5f1965a107@HIDDEN>
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org

> Date: Thu, 26 Jan 2023 21:35:55 +0200
> Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
> From: Dmitry Gutov <dgutov@HIDDEN>
> 
> > If you are saying that GC is responsible, then running the benchmark
> > with gc-cons-threshold set to most-positive-fixnum should produce a
> > more interesting profile and perhaps a more interesting comparison.
> 
> That really helps:
> 
> (benchmark-run 1000 (progn (font-lock-mode -1) (font-lock-mode 1) (let 
> (treesit--font-lock-fast-mode) (font-lock-ensure))))
> 
> => (16.078430587 251 5.784299419999996)
> 
> (let ((gc-cons-threshold most-positive-fixnum)) (benchmark-run 1000 
> (progn (font-lock-mode -1) (font-lock-mode 1) (let 
> (treesit--font-lock-fast-mode) (font-lock-ensure)))))
> 
> => (10.369389725 0 0.0)
> 
> Do you want a perf profile for the latter? It might not be very useful.

I'd be interested in comparing the profiles of the two techniques, the
:pred and the :match, with GC disabled like that.

> > (But I thought you concluded that GC alone cannot explain the
> > difference in performance?)
> 
> I'm inclined to think the difference is related to copying of the regexp 
> string, but whether the time is spent in actually copying it, or 
> scanning its copies for garbage later, it was harder to say. Seems like 
> it's the latter, though.

If we can avoid the copying, I think it's desirable in any case.  They
are constant regexps, aren't they?




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#60953; Package emacs. Full text available.

Message received at 60953 <at> debbugs.gnu.org:


Received: (at 60953) by debbugs.gnu.org; 26 Jan 2023 19:36:06 +0000
Message-ID: <d8cf01fa-d829-fe83-0bbf-4a5f1965a107@HIDDEN>
Date: Thu, 26 Jan 2023 21:35:55 +0200
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.4.2
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter
 font-lock seems inefficient
Content-Language: en-US
To: Eli Zaretskii <eliz@HIDDEN>
References: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@HIDDEN>
 <04729838-b7d4-8a08-2b71-12536a28aebb@HIDDEN>
 <AB9CD94C-CB3C-4D84-B4AA-22EDC206EB12@HIDDEN>
 <b1dc100e-1ee9-dc89-00b6-25bb7864e90b@HIDDEN> <83wn5ag4nc.fsf@HIDDEN>
 <01b5d074-fb12-6b1f-cbfb-5e759833b854@HIDDEN> <838rhpg57n.fsf@HIDDEN>
 <5026D975-983F-4D18-8690-BE139C92825D@HIDDEN> <83pmb1emxi.fsf@HIDDEN>
 <6f318afc-ca71-8b7e-c822-52e6635b5718@HIDDEN> <83sffxcfxw.fsf@HIDDEN>
From: Dmitry Gutov <dgutov@HIDDEN>
In-Reply-To: <83sffxcfxw.fsf@HIDDEN>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org

On 26/01/2023 20:24, Eli Zaretskii wrote:
>> Date: Thu, 26 Jan 2023 19:15:51 +0200
>> Cc: 60953 <at> debbugs.gnu.org
>> From: Dmitry Gutov <dgutov@HIDDEN>
>>
>> On 26/01/2023 10:10, Eli Zaretskii wrote:
>>> Perhaps Dmitry could present comparison of profiles from perf which
>>> would allow us to understand the reason(s)?
>> I believe I did that in the second message in this thread:
>> https://debbugs.gnu.org/cgi/bugreport.cgi?bug=60953#8
>>
>> To quote the specific profiles, it's
>>
>>     15.30%  emacs         libtree-sitter.so.0.0       [.] ts_tree_cursor_current_status
>>     14.92%  emacs         emacs                       [.] process_mark_stack
>>      9.75%  emacs         libtree-sitter.so.0.0       [.] ts_tree_cursor_goto_next_sibling
>>      8.90%  emacs         libtree-sitter.so.0.0       [.] ts_tree_cursor_goto_first_child
>>      3.87%  emacs         libtree-sitter.so.0.0       [.] ts_node_start_point
>>
>> for :pred vs.
>>
>>     23.72%  emacs         emacs                    [.] process_mark_stack
>>     12.33%  emacs         libtree-sitter.so.0.0    [.] ts_tree_cursor_current_status
>>      7.96%  emacs         libtree-sitter.so.0.0    [.] ts_tree_cursor_goto_next_sibling
>>      7.38%  emacs         libtree-sitter.so.0.0    [.] ts_tree_cursor_goto_first_child
>>      3.37%  emacs         libtree-sitter.so.0.0    [.] ts_node_start_point
>>
>> for :match.
>>
>> And to continue the quote:
>>
>>     Here's a significant jump in GC time which is almost the same as the
>>     difference in runtime. And all of it is spent marking?
>>
>>     I suppose if the problem is allocation of a large string (many times
>>     over), the GC could be spending a lot of time scanning through the
>>     memory. Could this be avoided by passing some substitute handle to TS,
>>     instead of the full string? E.g. some kind of reference to it in the
>>     regexp cache.
> If you are saying that GC is responsible, then running the benchmark
> with gc-cons-threshold set to most-positive-fixnum should produce a
> more interesting profile and perhaps a more interesting comparison.

That really helps:

(benchmark-run 1000
  (progn (font-lock-mode -1)
         (font-lock-mode 1)
         (let (treesit--font-lock-fast-mode)
           (font-lock-ensure))))

=> (16.078430587 251 5.784299419999996)

(let ((gc-cons-threshold most-positive-fixnum))
  (benchmark-run 1000
    (progn (font-lock-mode -1)
           (font-lock-mode 1)
           (let (treesit--font-lock-fast-mode)
             (font-lock-ensure)))))

=> (10.369389725 0 0.0)

Do you want a perf profile for the latter? It might not be very useful.

> (But I thought you concluded that GC alone cannot explain the
> difference in performance?)

I'm inclined to think the difference is related to copying of the regexp 
string, but whether the time is spent in actually copying it, or 
scanning its copies for garbage later, it was harder to say. Seems like 
it's the latter, though.




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#60953; Package emacs. Full text available.

Message received at 60953 <at> debbugs.gnu.org:


Received: (at 60953) by debbugs.gnu.org; 26 Jan 2023 18:24:41 +0000
Date: Thu, 26 Jan 2023 20:24:11 +0200
Message-Id: <83sffxcfxw.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
To: Dmitry Gutov <dgutov@HIDDEN>
In-Reply-To: <6f318afc-ca71-8b7e-c822-52e6635b5718@HIDDEN> (message from
 Dmitry Gutov on Thu, 26 Jan 2023 19:15:51 +0200)
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter
 font-lock seems inefficient
References: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@HIDDEN>
 <04729838-b7d4-8a08-2b71-12536a28aebb@HIDDEN>
 <AB9CD94C-CB3C-4D84-B4AA-22EDC206EB12@HIDDEN>
 <b1dc100e-1ee9-dc89-00b6-25bb7864e90b@HIDDEN> <83wn5ag4nc.fsf@HIDDEN>
 <01b5d074-fb12-6b1f-cbfb-5e759833b854@HIDDEN> <838rhpg57n.fsf@HIDDEN>
 <5026D975-983F-4D18-8690-BE139C92825D@HIDDEN> <83pmb1emxi.fsf@HIDDEN>
 <6f318afc-ca71-8b7e-c822-52e6635b5718@HIDDEN>
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org

> Date: Thu, 26 Jan 2023 19:15:51 +0200
> Cc: 60953 <at> debbugs.gnu.org
> From: Dmitry Gutov <dgutov@HIDDEN>
> 
> On 26/01/2023 10:10, Eli Zaretskii wrote:
> > Perhaps Dmitry could present comparison of profiles from perf which
> > would allow us to understand the reason(s)?
> 
> I believe I did that in the second message in this thread: 
> https://debbugs.gnu.org/cgi/bugreport.cgi?bug=60953#8
> 
> To quote the specific profiles, it's
> 
>    15.30%  emacs         libtree-sitter.so.0.0       [.] ts_tree_cursor_current_status
>    14.92%  emacs         emacs                       [.] process_mark_stack
>     9.75%  emacs         libtree-sitter.so.0.0       [.] ts_tree_cursor_goto_next_sibling
>     8.90%  emacs         libtree-sitter.so.0.0       [.] ts_tree_cursor_goto_first_child
>     3.87%  emacs         libtree-sitter.so.0.0       [.] ts_node_start_point
> 
> for :pred vs.
> 
>    23.72%  emacs         emacs                    [.] process_mark_stack
>    12.33%  emacs         libtree-sitter.so.0.0    [.] ts_tree_cursor_current_status
>     7.96%  emacs         libtree-sitter.so.0.0    [.] ts_tree_cursor_goto_next_sibling
>     7.38%  emacs         libtree-sitter.so.0.0    [.] ts_tree_cursor_goto_first_child
>     3.37%  emacs         libtree-sitter.so.0.0    [.] ts_node_start_point
> 
> for :match.
> 
> And to continue the quote:
> 
>    Here's a significant jump in GC time which is almost the same as the
>    difference in runtime. And all of it is spent marking?
> 
>    I suppose if the problem is allocation of a large string (many times
>    over), the GC could be spending a lot of time scanning through the
>    memory. Could this be avoided by passing some substitute handle to TS,
>    instead of the full string? E.g. some kind of reference to it in the
>    regexp cache.

If you are saying that GC is responsible, then running the benchmark
with gc-cons-threshold set to most-positive-fixnum should produce a
more interesting profile and perhaps a more interesting comparison.
(But I thought you concluded that GC alone cannot explain the
difference in performance?)

Otherwise, the profiles are too similar to support any conclusions,
and the fact that process_mark_stack is in a prominent place doesn't
help.




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#60953; Package emacs. Full text available.

Message received at 60953 <at> debbugs.gnu.org:


Received: (at 60953) by debbugs.gnu.org; 26 Jan 2023 18:07:40 +0000
Message-ID: <31559c1f-1a12-691d-3d03-f566019a0aab@HIDDEN>
Date: Thu, 26 Jan 2023 20:07:30 +0200
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.4.2
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter
 font-lock seems inefficient
Content-Language: en-US
To: Eli Zaretskii <eliz@HIDDEN>
References: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@HIDDEN>
 <04729838-b7d4-8a08-2b71-12536a28aebb@HIDDEN>
 <AB9CD94C-CB3C-4D84-B4AA-22EDC206EB12@HIDDEN>
 <b1dc100e-1ee9-dc89-00b6-25bb7864e90b@HIDDEN> <83wn5ag4nc.fsf@HIDDEN>
 <01b5d074-fb12-6b1f-cbfb-5e759833b854@HIDDEN> <838rhpg57n.fsf@HIDDEN>
From: Dmitry Gutov <dgutov@HIDDEN>
In-Reply-To: <838rhpg57n.fsf@HIDDEN>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Score: 0.1 (/)
X-Debbugs-Envelope-To: 60953
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org

On 26/01/2023 08:50, Eli Zaretskii wrote:
>> Date: Thu, 26 Jan 2023 01:21:08 +0200
>> Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
>> From: Dmitry Gutov <dgutov@HIDDEN>
>>
>> Thank you. Unfortunately, the performance improvement from this patch is
>> still fairly negligible.
> 
> This is quite strange, since all of the approaches basically use the
> same primitives under the hood.  Perhaps the reason for the slowness
> is that the code which computes the text span of a node is slow?

That code seems to be the same between the two options: 
treesit_predicate_capture_name_to_text basically does the same as 
treesit-node-text (except in C) after iterating through a Lisp list to 
find the node. ruby-ts--builtin-method-p calls treesit-node-text.

And treesit_predicate_pred does the same iteration, so the :pred option 
should just be slower, due to Lisp-related overhead. funcalls and stuff.
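
For reference, the two options being compared can be written roughly as
follows. This is a sketch only: the my- names and the three-word method
list are hypothetical stand-ins, not the actual ruby-ts-mode
definitions. Note that the :match form embeds the whole regexp string
in the query, while the :pred form only names a Lisp function.

```elisp
;; Hypothetical stand-in for the large built-in methods regexp.
(defvar my-builtin-regexp (regexp-opt '("puts" "print" "require"))
  "Regexp matching a few sample method names.")

(defun my-builtin-method-p (node)
  "Return non-nil if the text of NODE matches `my-builtin-regexp'."
  (string-match-p my-builtin-regexp (treesit-node-text node t)))

;; :match variant: the regexp string becomes part of the query.
(defvar my-query-with-match
  `(((identifier) @font-lock-builtin-face
     (:match ,my-builtin-regexp @font-lock-builtin-face))))

;; :pred variant: the query refers to the predicate function by name.
(defvar my-query-with-pred
  '(((identifier) @font-lock-builtin-face
     (:pred my-builtin-method-p @font-lock-builtin-face))))
```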

> Otherwise, I must be missing something here, since the rest of the
> code on the C level is basically the same, give or take some wrappers
> that should not change the overall picture.

The query object is smaller, though. That's basically my only remaining 
hypothesis.

>> Switching to using :pred with function (like I did in commit
>> d94dc606a0934) which still uses buffer-substring inside is significantly
>> faster.
> 
> If the performance issue is fixed, then the only aspect that we should
> perhaps try to improve is consing.

I wouldn't say it's "fixed", just improved. And :match really should be 
able to be made faster than :pred, since it'll probably be used for 
similar cases (where a lot/most of nodes match).

There seems to be a fair amount of consing going on inside 
treesit-query-capture already: we wrap every TS node in our objects, we 
turn the captured nodes into a Lisp alist, and we turn the predicates 
into a list, turning the strings into "our" strings. The 'make_string' 
function creates a new copy in the memory, right?

One could hope to avoid recreating the list of predicates on every 
match, but that seems to be a limitation of the TS API: 
ts_query_predicates_for_pattern requires a second argument, 
match.pattern_index. Maybe we could memoize that, though?

In any case, that seems to explain why adding or avoiding one 
buffer-substring call per match isn't moving the needle very much.
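
The memoization idea could look something like the sketch below. It is
written in Lisp purely for illustration (the real conversion happens in
C), and every name in it is hypothetical: COMPUTE stands in for the
expensive step that turns the TS predicates into a Lisp list.

```elisp
(defvar my-predicate-cache (make-hash-table :test #'eql)
  "Map a pattern index to its already converted predicate list.")

(defun my-predicates-for-pattern (pattern-index compute)
  "Return the predicates for PATTERN-INDEX, calling COMPUTE at most once.
COMPUTE is a function of one argument, the pattern index."
  ;; Use a sentinel so that a cached nil (no predicates) is still a hit.
  (let ((cached (gethash pattern-index my-predicate-cache 'missing)))
    (if (not (eq cached 'missing))
        cached
      (puthash pattern-index (funcall compute pattern-index)
               my-predicate-cache))))
```

Since a pattern index is only meaningful within one compiled query,
such a cache would presumably have to live on the query object rather
than in a global variable as in this sketch.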

> Consing a string each time you
> need to fontify increases the GC pressure, so if there's a good way of
> avoiding that without performance degradation, we should take it.  Is
> it possible to use your :pred technique in a way that doesn't need to
> produce strings from buffer text?

The only version I managed to get some (very minor) performance 
improvement is this:

(defun ruby-ts--builtin-method-p (node)
   (goto-char (treesit-node-start node))
   (let ((inhibit-changing-match-data t))
     (re-search-forward ruby-ts--builtin-methods
                        (treesit-node-end node) t)))

The improvement is like 200-300ms, whereas the difference between :match 
and :pred in this benchmark is several seconds.

And if I try to bring it back to 100% correctness, to ensure that the 
whole of node text is matched, I have to use narrowing (and string-start 
and string-end anchors in regexp):

(defvar ruby-ts--builtin-methods
   (format "\\`%s\\'" (regexp-opt (append ruby-builtin-methods-no-reqs
                                          ruby-builtin-methods-with-reqs)))
   "Ruby built-in methods.")

(defun ruby-ts--builtin-method-p (node)
   (save-restriction
     (goto-char (treesit-node-start node))
     (narrow-to-region (point) (treesit-node-end node))
     (let ((inhibit-changing-match-data t))
       (re-search-forward ruby-ts--builtin-methods nil t))))

And with that, the performance is again no better than the current 
version. If I also add save-excursion, it's worse.
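
To illustrate why the anchors and the narrowing are both needed for a
whole-node match, here is a small self-contained sketch that works on a
temporary buffer instead of a tree-sitter node. The helper
my-whole-region-match-p and the two-word list are hypothetical, and it
uses looking-at-p rather than re-search-forward, so it is a variation
on the code above, not the same implementation.

```elisp
(defun my-whole-region-match-p (regexp start end)
  "Return t if REGEXP matches exactly the text between START and END.
REGEXP is expected to carry string-start and string-end anchors.
Moves point to START, like the version without `save-excursion'."
  (save-restriction
    (narrow-to-region start end)
    (goto-char (point-min))
    ;; With narrowing in effect, the anchors refer to the bounds of
    ;; the accessible portion, i.e. START and END.
    (looking-at-p regexp)))

(with-temp-buffer
  (insert "puts_all")
  (let ((re (format "\\`%s\\'" (regexp-opt '("puts" "print")))))
    (list (my-whole-region-match-p re 1 5)     ; region is "puts"
          (my-whole-region-match-p re 1 9))))  ; region is "puts_all"
;; => (t nil)
```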




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#60953; Package emacs. Full text available.

Message received at 60953 <at> debbugs.gnu.org:


Received: (at 60953) by debbugs.gnu.org; 26 Jan 2023 17:16:02 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Thu Jan 26 12:16:02 2023
Message-ID: <6f318afc-ca71-8b7e-c822-52e6635b5718@HIDDEN>
Date: Thu, 26 Jan 2023 19:15:51 +0200
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.4.2
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter
 font-lock seems inefficient
Content-Language: en-US
To: Eli Zaretskii <eliz@HIDDEN>, Yuan Fu <casouri@HIDDEN>
References: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@HIDDEN>
 <04729838-b7d4-8a08-2b71-12536a28aebb@HIDDEN>
 <AB9CD94C-CB3C-4D84-B4AA-22EDC206EB12@HIDDEN>
 <b1dc100e-1ee9-dc89-00b6-25bb7864e90b@HIDDEN> <83wn5ag4nc.fsf@HIDDEN>
 <01b5d074-fb12-6b1f-cbfb-5e759833b854@HIDDEN> <838rhpg57n.fsf@HIDDEN>
 <5026D975-983F-4D18-8690-BE139C92825D@HIDDEN> <83pmb1emxi.fsf@HIDDEN>
From: Dmitry Gutov <dgutov@HIDDEN>
In-Reply-To: <83pmb1emxi.fsf@HIDDEN>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Score: -0.9 (/)
X-Debbugs-Envelope-To: 60953
Cc: 60953 <at> debbugs.gnu.org

On 26/01/2023 10:10, Eli Zaretskii wrote:
>> From: Yuan Fu<casouri@HIDDEN>
>> Date: Wed, 25 Jan 2023 23:17:25 -0800
>> Cc: Dmitry Gutov<dgutov@HIDDEN>,
>>   60953 <at> debbugs.gnu.org
>>
>>>> Switching to using :pred with function (like I did in commit
>>>> d94dc606a0934) which still uses buffer-substring inside is significantly
>>>> faster.
>>> If the performance issue is fixed, then the only aspect that we should
>>> perhaps try to improve is consing.  Consing a string each time you
>>> need to fontify increases the GC pressure, so if there's a good way of
>>> avoiding that without performance degradation, we should take it.  Is
>>> it possible to use your :pred technique in a way that doesn't need to
>>> produce strings from buffer text?
>> Why is :pred more performant though? They just use string-match-p. If anything, the :pred predicates should be more expensive, since they execute lisp functions and conses tree-sitter nodes into lisp objects.
> Yes, exactly my thoughts.
> 
> Perhaps Dmitry could present comparison of profiles from perf which
> would allow us to understand the reason(s)?

I believe I did that in the second message in this thread: 
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=60953#8

To quote the specific profiles, it's

   15.30%  emacs         libtree-sitter.so.0.0       [.] ts_tree_cursor_current_status
   14.92%  emacs         emacs                       [.] process_mark_stack
    9.75%  emacs         libtree-sitter.so.0.0       [.] ts_tree_cursor_goto_next_sibling
    8.90%  emacs         libtree-sitter.so.0.0       [.] ts_tree_cursor_goto_first_child
    3.87%  emacs         libtree-sitter.so.0.0       [.] ts_node_start_point

for :pred vs.

   23.72%  emacs         emacs                    [.] process_mark_stack
   12.33%  emacs         libtree-sitter.so.0.0    [.] ts_tree_cursor_current_status
    7.96%  emacs         libtree-sitter.so.0.0    [.] ts_tree_cursor_goto_next_sibling
    7.38%  emacs         libtree-sitter.so.0.0    [.] ts_tree_cursor_goto_first_child
    3.37%  emacs         libtree-sitter.so.0.0    [.] ts_node_start_point

for :match.

And to continue the quote:

   Here's a significant jump in GC time which is almost the same as the
   difference in runtime. And all of it is spent marking?

   I suppose if the problem is allocation of a large string (many times
   over), the GC could be spending a lot of time scanning through the
   memory. Could this be avoided by passing some substitute handle to TS,
   instead of the full string? E.g. some kind of reference to it in the
   regexp cache.
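
The GC share can also be estimated from Lisp, without perf, using the
values that benchmark-run reports. A minimal sketch under that
assumption; my-gc-share is a hypothetical helper and the workload shown
is a placeholder for the real fontification benchmark.

```elisp
(require 'benchmark)

(defun my-gc-share (fn)
  "Run FN 10 times and return the fraction of elapsed time spent in GC."
  ;; `benchmark-run' returns (ELAPSED GC-COUNT GC-ELAPSED).
  (pcase-let ((`(,elapsed ,_gc-count ,gc-elapsed)
               (benchmark-run 10 (funcall fn))))
    (if (> elapsed 0)
        (/ gc-elapsed elapsed)
      0.0)))

;; Placeholder workload that allocates many short-lived strings.
(my-gc-share (lambda () (dotimes (_ 1000) (make-string 1000 ?x))))
```

Comparing this fraction for the :pred and :match variants would be a
quick cross-check of the process_mark_stack percentages above.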




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#60953; Package emacs. Full text available.

Message received at 60953 <at> debbugs.gnu.org:


Received: (at 60953) by debbugs.gnu.org; 26 Jan 2023 17:12:14 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Thu Jan 26 12:12:14 2023
Message-ID: <62d8ea72-3cf6-7c1a-4fce-53f5ee435215@HIDDEN>
Date: Thu, 26 Jan 2023 19:12:05 +0200
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.4.2
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter
 font-lock seems inefficient
Content-Language: en-US
To: Yuan Fu <casouri@HIDDEN>, Eli Zaretskii <eliz@HIDDEN>
References: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@HIDDEN>
 <04729838-b7d4-8a08-2b71-12536a28aebb@HIDDEN>
 <AB9CD94C-CB3C-4D84-B4AA-22EDC206EB12@HIDDEN>
 <b1dc100e-1ee9-dc89-00b6-25bb7864e90b@HIDDEN> <83wn5ag4nc.fsf@HIDDEN>
 <01b5d074-fb12-6b1f-cbfb-5e759833b854@HIDDEN> <838rhpg57n.fsf@HIDDEN>
 <5026D975-983F-4D18-8690-BE139C92825D@HIDDEN>
From: Dmitry Gutov <dgutov@HIDDEN>
In-Reply-To: <5026D975-983F-4D18-8690-BE139C92825D@HIDDEN>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Score: -0.9 (/)
X-Debbugs-Envelope-To: 60953
Cc: 60953 <at> debbugs.gnu.org

On 26/01/2023 09:17, Yuan Fu wrote:
> If anything, the :pred predicates should be more expensive, since they execute lisp functions and conses tree-sitter nodes into lisp objects.

Doesn't the :match predicate "cons tree-sitter nodes into lisp objects"?

IIUC the list of captures is produced inside treesit-query-capture 
exactly the same way before the predicates are processed -- whether they 
are :pred, or :match, or a combination.

But indeed, I (and most other users) would expect :match to be faster 
than :pred if the predicate does a regexp check anyway. That's the 
essence of this bug report.




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#60953; Package emacs. Full text available.

Message received at 60953 <at> debbugs.gnu.org:


Received: (at 60953) by debbugs.gnu.org; 26 Jan 2023 08:10:08 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Thu Jan 26 03:10:08 2023
Date: Thu, 26 Jan 2023 10:10:17 +0200
Message-Id: <83pmb1emxi.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
To: Yuan Fu <casouri@HIDDEN>
In-Reply-To: <5026D975-983F-4D18-8690-BE139C92825D@HIDDEN> (message from
 Yuan Fu on Wed, 25 Jan 2023 23:17:25 -0800)
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter
 font-lock seems inefficient
References: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@HIDDEN>
 <04729838-b7d4-8a08-2b71-12536a28aebb@HIDDEN>
 <AB9CD94C-CB3C-4D84-B4AA-22EDC206EB12@HIDDEN>
 <b1dc100e-1ee9-dc89-00b6-25bb7864e90b@HIDDEN> <83wn5ag4nc.fsf@HIDDEN>
 <01b5d074-fb12-6b1f-cbfb-5e759833b854@HIDDEN> <838rhpg57n.fsf@HIDDEN> 
 <5026D975-983F-4D18-8690-BE139C92825D@HIDDEN>
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 60953
Cc: 60953 <at> debbugs.gnu.org, dgutov@HIDDEN

> From: Yuan Fu <casouri@HIDDEN>
> Date: Wed, 25 Jan 2023 23:17:25 -0800
> Cc: Dmitry Gutov <dgutov@HIDDEN>,
>  60953 <at> debbugs.gnu.org
> 
> >> Switching to using :pred with function (like I did in commit 
> >> d94dc606a0934) which still uses buffer-substring inside is significantly 
> >> faster.
> > 
> > If the performance issue is fixed, then the only aspect that we should
> > perhaps try to improve is consing.  Consing a string each time you
> > need to fontify increases the GC pressure, so if there's a good way of
> > avoiding that without performance degradation, we should take it.  Is
> > it possible to use your :pred technique in a way that doesn't need to
> > produce strings from buffer text?
> 
> Why is :pred more performant though? They just use string-match-p. If anything, the :pred predicates should be more expensive, since they execute lisp functions and conses tree-sitter nodes into lisp objects.

Yes, exactly my thoughts.

Perhaps Dmitry could present comparison of profiles from perf which
would allow us to understand the reason(s)?




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#60953; Package emacs. Full text available.

Message received at 60953 <at> debbugs.gnu.org:


Received: (at 60953) by debbugs.gnu.org; 26 Jan 2023 07:17:49 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Thu Jan 26 02:17:49 2023
Content-Type: text/plain;
	charset=us-ascii
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3731.300.101.1.3\))
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter
 font-lock seems inefficient
From: Yuan Fu <casouri@HIDDEN>
In-Reply-To: <838rhpg57n.fsf@HIDDEN>
Date: Wed, 25 Jan 2023 23:17:25 -0800
Content-Transfer-Encoding: quoted-printable
Message-Id: <5026D975-983F-4D18-8690-BE139C92825D@HIDDEN>
References: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@HIDDEN>
 <04729838-b7d4-8a08-2b71-12536a28aebb@HIDDEN>
 <AB9CD94C-CB3C-4D84-B4AA-22EDC206EB12@HIDDEN>
 <b1dc100e-1ee9-dc89-00b6-25bb7864e90b@HIDDEN> <83wn5ag4nc.fsf@HIDDEN>
 <01b5d074-fb12-6b1f-cbfb-5e759833b854@HIDDEN> <838rhpg57n.fsf@HIDDEN>
To: Eli Zaretskii <eliz@HIDDEN>
X-Mailer: Apple Mail (2.3731.300.101.1.3)
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 60953
Cc: 60953 <at> debbugs.gnu.org, Dmitry Gutov <dgutov@HIDDEN>



> On Jan 25, 2023, at 10:50 PM, Eli Zaretskii <eliz@HIDDEN> wrote:
>
>> Date: Thu, 26 Jan 2023 01:21:08 +0200
>> Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
>> From: Dmitry Gutov <dgutov@HIDDEN>
>>
>> Thank you. Unfortunately, the performance improvement from this patch is
>> still fairly negligible.
>
> This is quite strange, since all of the approaches basically use the
> same primitives under the hood.  Perhaps the reason for the slowness
> is that the code which computes the text span of a node is slow?
> Otherwise, I must be missing something here, since the rest of the
> code on the C level is basically the same, give or take some wrappers
> that should not change the overall picture.
>
> Yuan, do you have some insights here?

Sadly, no.

>
>> Switching to using :pred with function (like I did in commit
>> d94dc606a0934) which still uses buffer-substring inside is significantly
>> faster.
>
> If the performance issue is fixed, then the only aspect that we should
> perhaps try to improve is consing.  Consing a string each time you
> need to fontify increases the GC pressure, so if there's a good way of
> avoiding that without performance degradation, we should take it.  Is
> it possible to use your :pred technique in a way that doesn't need to
> produce strings from buffer text?

Why is :pred more performant though? They just use string-match-p. If
anything, the :pred predicates should be more expensive, since they
execute lisp functions and conses tree-sitter nodes into lisp objects.

Yuan




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#60953; Package emacs. Full text available.

Message received at 60953 <at> debbugs.gnu.org:


Received: (at 60953) by debbugs.gnu.org; 26 Jan 2023 06:49:59 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Thu Jan 26 01:49:59 2023
Date: Thu, 26 Jan 2023 08:50:04 +0200
Message-Id: <838rhpg57n.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
To: Dmitry Gutov <dgutov@HIDDEN>
In-Reply-To: <01b5d074-fb12-6b1f-cbfb-5e759833b854@HIDDEN> (message from
 Dmitry Gutov on Thu, 26 Jan 2023 01:21:08 +0200)
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter
 font-lock seems inefficient
References: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@HIDDEN>
 <04729838-b7d4-8a08-2b71-12536a28aebb@HIDDEN>
 <AB9CD94C-CB3C-4D84-B4AA-22EDC206EB12@HIDDEN>
 <b1dc100e-1ee9-dc89-00b6-25bb7864e90b@HIDDEN> <83wn5ag4nc.fsf@HIDDEN>
 <01b5d074-fb12-6b1f-cbfb-5e759833b854@HIDDEN>
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org

> Date: Thu, 26 Jan 2023 01:21:08 +0200
> Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
> From: Dmitry Gutov <dgutov@HIDDEN>
> 
> Thank you. Unfortunately, the performance improvement from this patch is 
> still fairly negligible.

This is quite strange, since all of the approaches basically use the
same primitives under the hood.  Perhaps the reason for the slowness
is that the code which computes the text span of a node is slow?
Otherwise, I must be missing something here, since the rest of the
code on the C level is basically the same, give or take some wrappers
that should not change the overall picture.

Yuan, do you have some insights here?

> Switching to using :pred with function (like I did in commit 
> d94dc606a0934) which still uses buffer-substring inside is significantly 
> faster.

If the performance issue is fixed, then the only aspect that we should
perhaps try to improve is consing.  Consing a string each time you
need to fontify increases the GC pressure, so if there's a good way of
avoiding that without performance degradation, we should take it.  Is
it possible to use your :pred technique in a way that doesn't need to
produce strings from buffer text?
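One way to picture "matching without producing a string" is the following minimal C sketch. It is not Emacs code and it swaps the regexp for a plain table lookup: the name list and the function `range_is_builtin` are hypothetical, chosen only to show a byte range being compared in place, with no copy of the text allocated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a list of builtin method names; this is
   not the contents of ruby-ts--builtin-methods.  */
static const char *const builtin_names[] = {
  "lambda", "print", "puts", "raise", "require"
};

/* Return 1 if the bytes BUF[START..END) spell one of the names in
   builtin_names, else 0.  The range is compared in place, so no
   temporary string is created for the node's text.  */
int
range_is_builtin (const char *buf, size_t start, size_t end)
{
  assert (buf != NULL && start <= end);
  size_t len = end - start;
  size_t count = sizeof builtin_names / sizeof builtin_names[0];

  for (size_t i = 0; i < count; i++)
    if (strlen (builtin_names[i]) == len
        && memcmp (buf + start, builtin_names[i], len) == 0)
      return 1;
  return 0;
}
```

A caller would pass the start and end offsets of a node instead of its extracted text. Whether an equivalent in-place check is practical from a Lisp-level :pred function is exactly the open question in this message.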




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#60953; Package emacs. Full text available.

Message received at 60953 <at> debbugs.gnu.org:


Received: (at 60953) by debbugs.gnu.org; 25 Jan 2023 23:21:19 +0000
Message-ID: <01b5d074-fb12-6b1f-cbfb-5e759833b854@HIDDEN>
Date: Thu, 26 Jan 2023 01:21:08 +0200
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.4.2
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter
 font-lock seems inefficient
Content-Language: en-US
To: Eli Zaretskii <eliz@HIDDEN>
References: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@HIDDEN>
 <04729838-b7d4-8a08-2b71-12536a28aebb@HIDDEN>
 <AB9CD94C-CB3C-4D84-B4AA-22EDC206EB12@HIDDEN>
 <b1dc100e-1ee9-dc89-00b6-25bb7864e90b@HIDDEN> <83wn5ag4nc.fsf@HIDDEN>
From: Dmitry Gutov <dgutov@HIDDEN>
In-Reply-To: <83wn5ag4nc.fsf@HIDDEN>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org

On 25/01/2023 14:49, Eli Zaretskii wrote:
>> Cc: 60953 <at> debbugs.gnu.org
>> Date: Wed, 25 Jan 2023 05:48:13 +0200
>> From: Dmitry Gutov <dgutov@HIDDEN>
>>
>>> We can probably match the regexp in-place, just limit the match to the range of the node.
>>
>> That's what I tried to do in the patch attached to the first message:
>> https://debbugs.gnu.org/cgi/bugreport.cgi?bug=60953#5
>>
>> But the effect on performance was surprisingly hard to notice. It also
>> broke the actual highlighting, but that's probably because the regexp
>> uses anchors \` and \', which don't really work for fast_looking_at
>> calls inside a buffer.
> 
> The condition for a match in that patch is not correct, AFAIU:
> 
>     if (val >= 0)
>       return true;
>     else
>       return false;
> 
> It should be "if (val > 0)" instead, since fast_looking_at returns the
> number of characters that matched (unlike fast_string_match in the
> original code, which returns the _index_ of the match).

Thank you. Unfortunately, the performance improvement from this patch is 
still fairly negligible.

Even though I got the highlighting to work -- by removing the \` and \' 
anchors from ruby-ts--builtin-methods (reducing the precision a little, 
but that's not important for the benchmark).

Switching to using :pred with function (like I did in commit 
d94dc606a0934) which still uses buffer-substring inside is significantly 
faster.

> Also, fast_string_match is capable of succeeding if the match begins
> not at the first character, whereas fast_looking_at does an anchored
> match.  Do we expect the text to match from its beginning in this
> case?  If not, I think the replacement didn't do what the original
> code does, and you should have used search_buffer or maybe
> search_buffer_re instead.

I suppose one could use a non-anchored regexp with :match, but that's 
not the case with the regexp I'm using currently.

Anyway, that's only going to be important if we find something that I 
missed here with this patch. Because otherwise the major bottleneck is 
somewhere else.

If we do end up using it and try to get it to 100% correctness, I 
suppose a combination of narrow-to-region (so that the \` and \' anchors 
work) with re-search-forward can do the trick.

Although I've tried using that combination inside 
ruby-ts--builtin-method-p (to avoid the buffer-substring call), and it 
wasn't much of an improvement in performance either.




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#60953; Package emacs. Full text available.

Message received at 60953 <at> debbugs.gnu.org:


Received: (at 60953) by debbugs.gnu.org; 25 Jan 2023 12:49:56 +0000
Date: Wed, 25 Jan 2023 14:49:59 +0200
Message-Id: <83wn5ag4nc.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
To: Dmitry Gutov <dgutov@HIDDEN>
In-Reply-To: <b1dc100e-1ee9-dc89-00b6-25bb7864e90b@HIDDEN> (message from
 Dmitry Gutov on Wed, 25 Jan 2023 05:48:13 +0200)
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter
 font-lock seems inefficient
References: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@HIDDEN>
 <04729838-b7d4-8a08-2b71-12536a28aebb@HIDDEN>
 <AB9CD94C-CB3C-4D84-B4AA-22EDC206EB12@HIDDEN>
 <b1dc100e-1ee9-dc89-00b6-25bb7864e90b@HIDDEN>
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org

> Cc: 60953 <at> debbugs.gnu.org
> Date: Wed, 25 Jan 2023 05:48:13 +0200
> From: Dmitry Gutov <dgutov@HIDDEN>
> 
> > We can probably match the regexp in-place, just limit the match to the range of the node.
> 
> That's what I tried to do in the patch attached to the first message: 
> https://debbugs.gnu.org/cgi/bugreport.cgi?bug=60953#5
> 
> But the effect on performance was surprisingly hard to notice. It also 
> broke the actual highlighting, but that's probably because the regexp 
> uses anchors \` and \', which don't really work for fast_looking_at 
> calls inside a buffer.

The condition for a match in that patch is not correct, AFAIU:

   if (val >= 0)
     return true;
   else
     return false;

It should be "if (val > 0)" instead, since fast_looking_at returns the
number of characters that matched (unlike fast_string_match in the
original code, which returns the _index_ of the match).

Also, fast_string_match is capable of succeeding if the match begins
not at the first character, whereas fast_looking_at does an anchored
match.  Do we expect the text to match from its beginning in this
case?  If not, I think the replacement didn't do what the original
code does, and you should have used search_buffer or maybe
search_buffer_re instead.
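The two return conventions can be made concrete with a small standalone C sketch. It uses POSIX `regexec`, not Emacs's internal `fast_string_match` and `fast_looking_at` (whose exact behaviour should be checked in search.c); the helper names `search_index` and `anchored_match_length` are hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Search-style helper (hypothetical): return the index at which the
   first match of PATTERN starts in TEXT, or -1 if there is no match.
   A return value of 0 is a successful match at the very beginning.  */
int
search_index (const char *pattern, const char *text)
{
  regex_t re;
  regmatch_t m;
  int result = -1;

  assert (pattern != NULL && text != NULL);
  if (regcomp (&re, pattern, REG_EXTENDED) != 0)
    return -1;
  if (regexec (&re, text, 1, &m, 0) == 0)
    result = (int) m.rm_so;
  regfree (&re);
  return result;
}

/* Looking-at-style helper (hypothetical): return the number of
   characters matched by PATTERN starting exactly at TEXT, or -1 if
   no match starts there.  */
int
anchored_match_length (const char *pattern, const char *text)
{
  regex_t re;
  regmatch_t m;
  int result = -1;

  assert (pattern != NULL && text != NULL);
  if (regcomp (&re, pattern, REG_EXTENDED) != 0)
    return -1;
  /* regexec reports the leftmost match, so a match that can start at
     offset 0 is always the one reported.  */
  if (regexec (&re, text, 1, &m, 0) == 0 && m.rm_so == 0)
    result = (int) (m.rm_eo - m.rm_so);
  regfree (&re);
  return result;
}
```

In this sketch, searching for "foo" in "foobar" yields index 0 even though the match succeeded, so `>= 0` is the right success test for an index-returning function. Searching for "bar" succeeds at index 3, while the anchored helper rejects it, which mirrors the second point above about the replacement not being equivalent for non-anchored regexps.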




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#60953; Package emacs. Full text available.

Message received at 60953 <at> debbugs.gnu.org:


Received: (at 60953) by debbugs.gnu.org; 25 Jan 2023 03:48:24 +0000
Message-ID: <b1dc100e-1ee9-dc89-00b6-25bb7864e90b@HIDDEN>
Date: Wed, 25 Jan 2023 05:48:13 +0200
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.4.2
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter
 font-lock seems inefficient
Content-Language: en-US
To: Yuan Fu <casouri@HIDDEN>
References: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@HIDDEN>
 <04729838-b7d4-8a08-2b71-12536a28aebb@HIDDEN>
 <AB9CD94C-CB3C-4D84-B4AA-22EDC206EB12@HIDDEN>
From: Dmitry Gutov <dgutov@HIDDEN>
In-Reply-To: <AB9CD94C-CB3C-4D84-B4AA-22EDC206EB12@HIDDEN>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Cc: 60953 <at> debbugs.gnu.org

On 25/01/2023 05:13, Yuan Fu wrote:

> FYI the predicates are not processed by tree-sitter, but by us. For example, the #equal predicate is handled by treesit_predicate_equal. For #match, right now we create a string with Fbuffer_substring and pass it to fast_string_match, so it definitely causes a lot of gc’s, as you observed.

Right.

> We can probably match the regexp in-place, just limit the match to the range of the node.

That's what I tried to do in the patch attached to the first message: 
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=60953#5

But the effect on performance was surprisingly hard to notice. It also 
broke the actual highlighting, but that's probably because the regexp 
uses anchors \` and \', which don't really work for fast_looking_at 
calls inside a buffer.

I also experimented with replacing the current 
buffer-substring+string-match-p scheme with looking-at. No difference in 
performance.

Reducing the size of the regexp, however, made a lot of difference. 
ruby-ts--builtin-methods is 721 characters long.

So my current hypothesis is that the extra GC is caused by copying the 
regexp string back and forth. Which seems a bit more difficult to avoid. 
But could be done if we replace that value with some indirection.




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#60953; Package emacs. Full text available.

Message received at 60953 <at> debbugs.gnu.org:


Received: (at 60953) by debbugs.gnu.org; 25 Jan 2023 03:13:50 +0000
Content-Type: text/plain;
	charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3731.300.101.1.3\))
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter
 font-lock seems inefficient
From: Yuan Fu <casouri@HIDDEN>
In-Reply-To: <04729838-b7d4-8a08-2b71-12536a28aebb@HIDDEN>
Date: Tue, 24 Jan 2023 19:13:29 -0800
Content-Transfer-Encoding: quoted-printable
Message-Id: <AB9CD94C-CB3C-4D84-B4AA-22EDC206EB12@HIDDEN>
References: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@HIDDEN>
 <04729838-b7d4-8a08-2b71-12536a28aebb@HIDDEN>
To: Dmitry Gutov <dgutov@HIDDEN>
X-Mailer: Apple Mail (2.3731.300.101.1.3)
Cc: 60953 <at> debbugs.gnu.org



> On Jan 23, 2023, at 8:04 PM, Dmitry Gutov <dgutov@HIDDEN> wrote:
> 
> Cc-ing Yuan, just in case.
> 
> On 20/01/2023 05:53, Dmitry Gutov wrote:
>> In my benchmarking -- using this form in test/lisp/progmodes/ruby-mode-resources/ruby.rb after enabling ruby-ts-mode:
>>   (benchmark-run 1000 (progn (font-lock-mode -1) (font-lock-mode 1) (let (treesit--font-lock-fast-mode) (font-lock-ensure))))
>> the rule added to its font-lock in commit d66ac5285f7
>>    :language language
>>    :feature 'builtin-functions
>>    `((((identifier) @font-lock-builtin-face)
>>       (:match ,ruby-ts--builtin-methods
>>        @font-lock-builtin-face)))
>> ...seems to have made it 50% slower.
>> The profile looked like this:
>>   9454  84%                   - font-lock-fontify-region
>>   9454  84%                    - font-lock-default-fontify-region
>>   8862  79%                     - font-lock-fontify-syntactically-region
>>   8702  78%                      - treesit-font-lock-fontify-region
>>    128   1%                         treesit-fontify-with-override
>>    123   1%                         facep
>>     84   0% treesit--children-covering-range-recurse
>>     60   0%                       + ruby-ts--comment-font-lock
>>      4   0%                       + font-lock-unfontify-region
>>    568   5%                     + font-lock-fontify-keywords-region
>>     16   0%                     + font-lock-unfontify-region
>> So there's nothing on the Lisp level to look at.
> 
> I've done some perf recordings now. It seems most/all of the difference comes down to garbage collection. Or more concretely, time spent inside process_mark_stack.
> 
> Without the added query benchmark reports:
> 
> (10.13723333 49 1.141649534999999)
> 
> And the perf top5 is:
> 
>  17.26%  emacs         libtree-sitter.so.0.0       [.] ts_tree_cursor_current_status
>  10.83%  emacs         libtree-sitter.so.0.0       [.] ts_tree_cursor_goto_next_sibling
>  10.18%  emacs         libtree-sitter.so.0.0       [.] ts_tree_cursor_goto_first_child
>   8.37%  emacs         emacs                       [.] process_mark_stack
>   4.63%  emacs         libtree-sitter.so.0.0       [.] ts_node_start_point
> 
> With this simple query that colors everything:
> 
>   :language language
>   :feature 'builtin-function
>   `((((identifier) @font-lock-builtin-face)))
> 
> I get:
> 
> (11.993968995 82 1.9326509270000045)
> 
> Note the jump in runtime that's larger than the jump in GC.
> 
>  17.26%  emacs         libtree-sitter.so.0.0       [.] ts_tree_cursor_current_status
>  10.83%  emacs         libtree-sitter.so.0.0       [.] ts_tree_cursor_goto_next_sibling
>  10.18%  emacs         libtree-sitter.so.0.0       [.] ts_tree_cursor_goto_first_child
>   8.37%  emacs         emacs                       [.] process_mark_stack
>   4.63%  emacs         libtree-sitter.so.0.0       [.] ts_node_start_point
> 
> The current query looks like this:
> 
>   :language language
>   :feature 'builtin-function
>   `((((identifier) @font-lock-builtin-face)
>      (:pred ruby-ts--builtin-method-p @font-lock-builtin-face)))
> 
> Benchmarking:
> 
> (12.493614359 107 2.558609025999999)
> 
>  15.30%  emacs         libtree-sitter.so.0.0       [.] ts_tree_cursor_current_status
>  14.92%  emacs         emacs                       [.] process_mark_stack
>   9.75%  emacs         libtree-sitter.so.0.0       [.] ts_tree_cursor_goto_next_sibling
>   8.90%  emacs         libtree-sitter.so.0.0       [.] ts_tree_cursor_goto_first_child
>   3.87%  emacs         libtree-sitter.so.0.0       [.] ts_node_start_point
> 
> Here we get the same jump in runtime as in GC. Even though this rule ends up coloring much fewer (almost none) nodes in the current buffer. I interpret the results like this:
> 
> - The jump in runtime of the previous query was probably related to the number of nodes needed to be processed, but not with the resulting highlighting, even though every identifier in the buffer ends up being colored.
> 
> - The GC overhead created by the predicates is non-negligible.
> 
> And the original query that I tried:
> 
>   :language language
>   :feature 'builtin-function
>   `((((identifier) @font-lock-builtin-face)
>      (:match ,ruby-ts--builtin-methods @font-lock-builtin-face)))
> 
> Benchmarking:
> 
> (16.433451865000002 249 5.908674810000001)
> 
>  23.72%  emacs         emacs                    [.] process_mark_stack
>  12.33%  emacs         libtree-sitter.so.0.0    [.] ts_tree_cursor_current_status
>   7.96%  emacs         libtree-sitter.so.0.0    [.] ts_tree_cursor_goto_next_sibling
>   7.38%  emacs         libtree-sitter.so.0.0    [.] ts_tree_cursor_goto_first_child
>   3.37%  emacs         libtree-sitter.so.0.0    [.] ts_node_start_point
> 
> Here's a significant jump in GC time which is almost the same as the difference in runtime. And all of it is spent marking?
> 
> I suppose if the problem is allocation of a large string (many times over), the GC could be spending a lot of time scanning through the memory. Could this be avoided by passing some substitute handle to TS, instead of the full string? E.g. some kind of reference to it in the regexp cache.
> 

FYI the predicates are not processed by tree-sitter, but by us. For example, the #equal predicate is handled by treesit_predicate_equal. For #match, right now we create a string with Fbuffer_substring and pass it to fast_string_match, so it definitely causes a lot of gc’s, as you observed. We can probably match the regexp in-place, just limit the match to the range of the node.

Yuan
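
The benchmark triples quoted above are `benchmark-run` results: total elapsed seconds, number of garbage collections, and seconds spent in garbage collection. As a rough worked example of the "jump in runtime versus jump in GC" comparison, the C sketch below subtracts consecutive measurements and reports how much of the extra runtime is extra GC time. The struct, variable and function names are hypothetical; only the numbers come from the quoted message.

```c
#include <assert.h>

/* One `benchmark-run' result: total seconds, number of GCs,
   seconds spent in GC.  */
struct bench_result
{
  double elapsed;
  int gc_count;
  double gc_elapsed;
};

/* The four measurements quoted in the thread, in the order given.  */
const struct bench_result bench_no_query = { 10.13723333, 49, 1.141649534999999 };
const struct bench_result bench_plain = { 11.993968995, 82, 1.9326509270000045 };
const struct bench_result bench_pred = { 12.493614359, 107, 2.558609025999999 };
const struct bench_result bench_match = { 16.433451865000002, 249, 5.908674810000001 };

/* Return AFTER minus BEFORE, field by field.  */
struct bench_result
bench_delta (struct bench_result before, struct bench_result after)
{
  struct bench_result d;
  d.elapsed = after.elapsed - before.elapsed;
  d.gc_count = after.gc_count - before.gc_count;
  d.gc_elapsed = after.gc_elapsed - before.gc_elapsed;
  return d;
}

/* Return the extra GC time divided by the extra total runtime when
   going from BEFORE to AFTER.  AFTER must be the slower run.  */
double
gc_share_of_delta (struct bench_result before, struct bench_result after)
{
  struct bench_result d = bench_delta (before, after);
  assert (d.elapsed > 0.0);
  return d.gc_elapsed / d.elapsed;
}
```

Comparing each configuration with the one before it gives ratios of roughly 0.43 (plain identifier query versus no query), 1.25 (:pred versus plain) and 0.85 (:match versus :pred). That is consistent with the reading in the quoted message: the first jump is mostly not GC, while the later two are dominated by GC time. A ratio above 1 for the small :pred step presumably reflects measurement noise.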





Information forwarded to bug-gnu-emacs@HIDDEN:
bug#60953; Package emacs. Full text available.

Message received at 60953 <at> debbugs.gnu.org:


Received: (at 60953) by debbugs.gnu.org; 24 Jan 2023 04:04:18 +0000
Message-ID: <04729838-b7d4-8a08-2b71-12536a28aebb@HIDDEN>
Date: Tue, 24 Jan 2023 06:04:07 +0200
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.4.2
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter
 font-lock seems inefficient
Content-Language: en-US
From: Dmitry Gutov <dgutov@HIDDEN>
To: 60953 <at> debbugs.gnu.org, Yuan Fu <casouri@HIDDEN>
References: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@HIDDEN>
In-Reply-To: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@HIDDEN>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

Cc-ing Yuan, just in case.

On 20/01/2023 05:53, Dmitry Gutov wrote:
> In my benchmarking -- using this form in 
> test/lisp/progmodes/ruby-mode-resources/ruby.rb after enabling 
> ruby-ts-mode:
> 
>    (benchmark-run 1000 (progn (font-lock-mode -1) (font-lock-mode 1) 
> (let (treesit--font-lock-fast-mode) (font-lock-ensure))))
> 
> the rule added to its font-lock in commit d66ac5285f7
> 
>     :language language
>     :feature 'builtin-functions
>     `((((identifier) @font-lock-builtin-face)
>        (:match ,ruby-ts--builtin-methods
>         @font-lock-builtin-face)))
> 
> ...seems to have made it 50% slower.
> 
> The profile looked like this:
> 
>    9454  84%                   - font-lock-fontify-region
>    9454  84%                    - font-lock-default-fontify-region
>    8862  79%                     - font-lock-fontify-syntactically-region
>    8702  78%                      - treesit-font-lock-fontify-region
>     128   1%                         treesit-fontify-with-override
>     123   1%                         facep
>      84   0% treesit--children-covering-range-recurse
>      60   0%                       + ruby-ts--comment-font-lock
>       4   0%                       + font-lock-unfontify-region
>     568   5%                     + font-lock-fontify-keywords-region
>      16   0%                     + font-lock-unfontify-region
> 
> So there's nothing on the Lisp level to look at.

I've done some perf recordings now. It seems most/all of the difference 
comes down to garbage collection. Or more concretely, time spent inside 
process_mark_stack.

Without the added query, the benchmark reports:

(10.13723333 49 1.141649534999999)

And the perf top5 is:

   17.26%  emacs         libtree-sitter.so.0.0       [.] ts_tree_cursor_current_status
   10.83%  emacs         libtree-sitter.so.0.0       [.] ts_tree_cursor_goto_next_sibling
   10.18%  emacs         libtree-sitter.so.0.0       [.] ts_tree_cursor_goto_first_child
    8.37%  emacs         emacs                       [.] process_mark_stack
    4.63%  emacs         libtree-sitter.so.0.0       [.] ts_node_start_point

With this simple query that colors everything:

    :language language
    :feature 'builtin-function
    `((((identifier) @font-lock-builtin-face)))

I get:

(11.993968995 82 1.9326509270000045)

Note the jump in runtime that's larger than the jump in GC.

   17.26%  emacs         libtree-sitter.so.0.0       [.] ts_tree_cursor_current_status
   10.83%  emacs         libtree-sitter.so.0.0       [.] ts_tree_cursor_goto_next_sibling
   10.18%  emacs         libtree-sitter.so.0.0       [.] ts_tree_cursor_goto_first_child
    8.37%  emacs         emacs                       [.] process_mark_stack
    4.63%  emacs         libtree-sitter.so.0.0       [.] ts_node_start_point

The current query looks like this:

    :language language
    :feature 'builtin-function
    `((((identifier) @font-lock-builtin-face)
       (:pred ruby-ts--builtin-method-p @font-lock-builtin-face)))

Benchmarking:

(12.493614359 107 2.558609025999999)

   15.30%  emacs         libtree-sitter.so.0.0       [.] ts_tree_cursor_current_status
   14.92%  emacs         emacs                       [.] process_mark_stack
    9.75%  emacs         libtree-sitter.so.0.0       [.] ts_tree_cursor_goto_next_sibling
    8.90%  emacs         libtree-sitter.so.0.0       [.] ts_tree_cursor_goto_first_child
    3.87%  emacs         libtree-sitter.so.0.0       [.] ts_node_start_point

Here the jump in runtime is about the same as the jump in GC time, even 
though this rule ends up coloring far fewer nodes (almost none) in the 
current buffer. I interpret the results like this:

- The jump in runtime with the previous query was probably related to the 
number of nodes that had to be processed rather than to the resulting 
highlighting, even though every identifier in the buffer ends up being 
colored.

- The GC overhead created by the predicates is non-negligible.

And the original query that I tried:

    :language language
    :feature 'builtin-function
    `((((identifier) @font-lock-builtin-face)
       (:match ,ruby-ts--builtin-methods @font-lock-builtin-face)))

Benchmarking:

(16.433451865000002 249 5.908674810000001)

   23.72%  emacs         emacs                    [.] process_mark_stack
   12.33%  emacs         libtree-sitter.so.0.0    [.] ts_tree_cursor_current_status
    7.96%  emacs         libtree-sitter.so.0.0    [.] ts_tree_cursor_goto_next_sibling
    7.38%  emacs         libtree-sitter.so.0.0    [.] ts_tree_cursor_goto_first_child
    3.37%  emacs         libtree-sitter.so.0.0    [.] ts_node_start_point

Here's a significant jump in GC time which is almost the same as the 
difference in runtime. And all of it is spent marking?

I suppose if the problem is allocation of a large string (many times 
over), the GC could be spending a lot of time scanning through the 
memory. Could this be avoided by passing some substitute handle to TS, 
instead of the full string? E.g. some kind of reference to it in the 
regexp cache.
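
The comparisons above can be checked with a little arithmetic. The following is a small sketch in C that only restates the four `benchmark-run' results quoted in this message and subtracts them pairwise; the struct, the labels and the bench_delta helper are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One `benchmark-run' result, as (ELAPSED GC-RUNS GC-ELAPSED).  */
struct bench
{
  const char *label;
  double elapsed;     /* total seconds */
  int gc_runs;        /* number of garbage collections */
  double gc_elapsed;  /* seconds spent in GC */
};

/* The four results quoted in the message, in the order reported.  */
static const struct bench results[] = {
  { "without the added query", 10.13723333,         49, 1.141649534999999 },
  { "plain (identifier)",      11.993968995,        82, 1.9326509270000045 },
  { ":pred predicate",         12.493614359,       107, 2.558609025999999 },
  { ":match with regexp",      16.433451865000002, 249, 5.908674810000001 },
};
enum { N_RESULTS = sizeof results / sizeof results[0] };

/* Return RESULTS[TO] minus RESULTS[FROM]; the label is that of TO.  */
struct bench
bench_delta (size_t from, size_t to)
{
  assert (from < N_RESULTS && to < N_RESULTS);
  struct bench d = {
    results[to].label,
    results[to].elapsed - results[from].elapsed,
    results[to].gc_runs - results[from].gc_runs,
    results[to].gc_elapsed - results[from].gc_elapsed,
  };
  return d;
}
```

With these numbers, bench_delta (0, 1) is about 1.86 s of extra runtime against about 0.79 s of extra GC time, bench_delta (1, 2) is about 0.50 s against about 0.63 s, and bench_delta (2, 3) is about 3.94 s against about 3.35 s, with 142 additional collections.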





Information forwarded to bug-gnu-emacs@HIDDEN:
bug#60953; Package emacs. Full text available.

Message received at submit <at> debbugs.gnu.org:


Received: (at submit) by debbugs.gnu.org; 20 Jan 2023 03:53:21 +0000
Content-Type: multipart/mixed; boundary="------------UQx0D8HGepFSU5bBQqKxNJmr"
Message-ID: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@HIDDEN>
Date: Fri, 20 Jan 2023 05:53:12 +0200
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.4.2
Content-Language: en-US
To: bug-gnu-emacs@HIDDEN
From: Dmitry Gutov <dgutov@HIDDEN>
Subject: The :match predicate with large regexp in tree-sitter font-lock seems
 inefficient

This is a multi-part message in MIME format.
--------------UQx0D8HGepFSU5bBQqKxNJmr
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

In my benchmarking -- using this form in 
test/lisp/progmodes/ruby-mode-resources/ruby.rb after enabling ruby-ts-mode:

   (benchmark-run 1000
     (progn (font-lock-mode -1) (font-lock-mode 1)
            (let (treesit--font-lock-fast-mode) (font-lock-ensure))))

the rule added to its font-lock in commit d66ac5285f7

    :language language
    :feature 'builtin-functions
    `((((identifier) @font-lock-builtin-face)
       (:match ,ruby-ts--builtin-methods
        @font-lock-builtin-face)))

...seems to have made it 50% slower.

The profile looked like this:

   9454  84%                   - font-lock-fontify-region
   9454  84%                    - font-lock-default-fontify-region
   8862  79%                     - font-lock-fontify-syntactically-region
   8702  78%                      - treesit-font-lock-fontify-region
    128   1%                         treesit-fontify-with-override
    123   1%                         facep
     84   0%                         treesit--children-covering-range-recurse
     60   0%                       + ruby-ts--comment-font-lock
      4   0%                       + font-lock-unfontify-region
    568   5%                     + font-lock-fontify-keywords-region
     16   0%                     + font-lock-unfontify-region

So there's nothing on the Lisp level to look at.

Looking at the code, apparently we get a cursor and basically iterate 
through all (identifier) nodes, running our predicate manually.

Without trying something more advanced like perf, I took a stab in the 
dark and tried to reduce string allocation in treesit_predicate_match 
(it currently ends up delegating to buffer-substring for every node), 
which seemed inefficient. But while my patch (attached) compiles and 
doesn't crash, it doesn't actually work (the rule's highlighting is 
missing), and the performance was unchanged.

This message was originally longer, but see commit d94dc606a09: I 
switched to using :pred -- thus avoiding embedding the 720-char long 
regexp in the query -- and the performance drop got reduced to about 20%.

As a baseline, this simplified query, which has no predicates and colors 
every identifier in the buffer using the specified face, is still faster 
(just 10% over the original):

    :language language
    :feature 'builtin-function
    `(((identifier) @font-lock-builtin-face))

The regexp matching itself doesn't seem to be the problem:

(benchmark 354100 '(string-match-p ruby-ts--builtin-methods "gsub"))

=> Elapsed time: 0.141681s

-- whereas the difference between the benchmarks is on the order of seconds.

I think the marshaling of the long regexp string back and forth could be 
the culprit. Would be nice to fix that somehow.

I also think that trying to reduce the string allocation overhead has 
potential, but so far none of my experiments has moved the needle 
noticeably.
--------------UQx0D8HGepFSU5bBQqKxNJmr
Content-Type: text/x-patch; charset=UTF-8; name="treesit_predicate_match.diff"
Content-Disposition: attachment; filename="treesit_predicate_match.diff"
Content-Transfer-Encoding: 7bit

diff --git a/src/treesit.c b/src/treesit.c
index 917db582676..7e294a0a66f 100644
--- a/src/treesit.c
+++ b/src/treesit.c
@@ -2466,10 +2466,26 @@ treesit_predicate_match (Lisp_Object args, struct capture_range captures)
 	      build_string ("The second argument to `match' should "
 		            "be a capture name, not a string"));
 
-  Lisp_Object text = treesit_predicate_capture_name_to_text (capture_name,
-							     captures);
+  Lisp_Object node = treesit_predicate_capture_name_to_node (capture_name, captures);
 
-  if (fast_string_match (regexp, text) >= 0)
+  struct buffer *old_buffer = current_buffer;
+  struct buffer *buffer = XBUFFER (XTS_PARSER (XTS_NODE (node)->parser)->buffer);
+  set_buffer_internal (buffer);
+
+  TSNode treesit_node = XTS_NODE (node)->node;
+  ptrdiff_t visible_beg = XTS_PARSER (XTS_NODE (node)->parser)->visible_beg;
+  uint32_t start_byte_offset = ts_node_start_byte (treesit_node);
+  uint32_t end_byte_offset = ts_node_end_byte (treesit_node);
+  ptrdiff_t start_byte = visible_beg + start_byte_offset;
+  ptrdiff_t end_byte = visible_beg + end_byte_offset;
+  ptrdiff_t start_pos = buf_bytepos_to_charpos (buffer, start_byte);
+  ptrdiff_t end_pos = buf_bytepos_to_charpos (buffer, end_byte);
+
+  ptrdiff_t val = fast_looking_at (regexp, start_pos, start_byte, end_pos, end_byte, Qnil);
+
+  set_buffer_internal (old_buffer);
+
+  if (val >= 0)
     return true;
   else
     return false;

--------------UQx0D8HGepFSU5bBQqKxNJmr--




Acknowledgement sent to Dmitry Gutov <dgutov@HIDDEN>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs@HIDDEN. Full text available.
Report forwarded to bug-gnu-emacs@HIDDEN:
bug#60953; Package emacs. Full text available.
Last modified: Thu, 2 Feb 2023 12:15:02 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.