From: Dmitry Gutov <dgutov@HIDDEN>
To: Eli Zaretskii <eliz@HIDDEN>
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Date: Thu, 2 Feb 2023 14:12:54 +0200
Message-ID: <c6677ced-9538-2a1b-0025-4d90888f1246@HIDDEN>
In-Reply-To: <835yckzic7.fsf@HIDDEN>

On 02/02/2023 08:34, Eli Zaretskii wrote:
>> Date: Wed, 1 Feb 2023 23:20:50 +0200
>> Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
>> From: Dmitry Gutov <dgutov@HIDDEN>
>>
>> On 01/02/2023 15:39, Eli Zaretskii wrote:
>>>> Please see the attachment.
>>>>
>>>> To note the numbers: the first patch does quite well to improve the
>>>> performance of modes which use :match in queries which match a lot of
>>>> nodes (e.g. rust-ts-mode or ruby-ts-mode). The boost in those was on
>>>> the order of 25%.
>>>>
>>>> The second one is longer, and the boost (on top of the first one) is
>>>> around 5-6%, stable. Not as impressive, but at least it brings :match's
>>>> performance a little above :pred's in my example.
>>> Fine by me, if Yuan also approves.
>> For emacs-29, right?
> Yes.

Please take a look at this additional patch on top of the other one. Is
it OK?

I just remembered that if we don't want to auto-anchor the regexp, then
'looking-at' is not the appropriate semantic. So this switches to
'search_buffer'.

[Attachment: treesit_predicate_match_use_search_buffer.diff (text/x-patch), decoded from base64]

diff --git a/src/lisp.h b/src/lisp.h
index 70555b3ce91..1276285e2f2 100644
--- a/src/lisp.h
+++ b/src/lisp.h
@@ -4802,6 +4802,9 @@ fast_c_string_match_ignore_case (Lisp_Object regexp,
 				       ptrdiff_t, ptrdiff_t *);
 extern ptrdiff_t find_before_next_newline (ptrdiff_t, ptrdiff_t,
 					   ptrdiff_t, ptrdiff_t *);
+extern EMACS_INT search_buffer (Lisp_Object, ptrdiff_t, ptrdiff_t,
+				ptrdiff_t, ptrdiff_t, EMACS_INT,
+				int, Lisp_Object, Lisp_Object, bool);
 extern void syms_of_search (void);
 extern void clear_regexp_cache (void);
 
diff --git a/src/search.c b/src/search.c
index dbc5a83946f..0bb52c03eef 100644
--- a/src/search.c
+++ b/src/search.c
@@ -68,9 +68,6 @@ #define REGEXP_CACHE_SIZE 20
 static EMACS_INT boyer_moore (EMACS_INT, unsigned char *, ptrdiff_t,
                               Lisp_Object, Lisp_Object, ptrdiff_t,
                               ptrdiff_t, int);
-static EMACS_INT search_buffer (Lisp_Object, ptrdiff_t, ptrdiff_t,
-                                ptrdiff_t, ptrdiff_t, EMACS_INT, int,
-                                Lisp_Object, Lisp_Object, bool);
 
 Lisp_Object re_match_object;
 
@@ -1510,7 +1507,7 @@ search_buffer_non_re (Lisp_Object string, ptrdiff_t pos,
   return result;
 }
 
-static EMACS_INT
+EMACS_INT
 search_buffer (Lisp_Object string, ptrdiff_t pos, ptrdiff_t pos_byte,
 	       ptrdiff_t lim, ptrdiff_t lim_byte, EMACS_INT n,
 	       int RE, Lisp_Object trt, Lisp_Object inverse_trt, bool posix)
diff --git a/src/treesit.c b/src/treesit.c
index 510170ca640..069fa3608bd 100644
--- a/src/treesit.c
+++ b/src/treesit.c
@@ -2491,7 +2491,8 @@ treesit_predicate_match (Lisp_Object args, struct capture_range captures)
   ZV = end_pos;
   ZV_BYTE = end_byte;
 
-  ptrdiff_t val = fast_looking_at (regexp, start_pos, start_byte, end_pos, end_byte, Qnil);
+  ptrdiff_t val = search_buffer (regexp, start_pos, start_byte, end_pos, end_byte,
+				 1, 1, Qnil, Qnil, false);
 
   BEGV = old_begv;
   BEGV_BYTE = old_begv_byte;

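The difference between the 'looking-at' semantic and the search semantic mentioned in the message above can be illustrated at the Lisp level. This is only a sketch, not the C code from the patch: the helper names my/anchored-match-p and my/search-match-p are hypothetical, and narrow-to-region stands in for the temporary BEGV/ZV adjustment that treesit_predicate_match performs in C.

```elisp
(require 'cl-lib)

;; Hypothetical helper: succeeds only if REGEXP matches text that begins
;; exactly at START (the implicitly anchored, `looking-at' behaviour).
(defun my/anchored-match-p (regexp start end)
  "Return t if REGEXP matches at START, looking no further than END."
  (save-excursion
    (save-restriction
      (narrow-to-region start end)
      (goto-char (point-min))
      (and (looking-at regexp) t))))

;; Hypothetical helper: succeeds if REGEXP matches anywhere between
;; START and END (the unanchored, search behaviour).
(defun my/search-match-p (regexp start end)
  "Return t if REGEXP matches anywhere between START and END."
  (save-excursion
    (save-restriction
      (narrow-to-region start end)
      (goto-char (point-min))
      (and (re-search-forward regexp nil t) t))))

(with-temp-buffer
  (insert "foo_bar")
  (list (my/anchored-match-p "bar" (point-min) (point-max))
        (my/search-match-p "bar" (point-min) (point-max))
        ;; An explicit \` restores the anchoring when it is wanted.
        (my/search-match-p "\\`bar" (point-min) (point-max))))
;; => (nil t nil)
```

With the search semantic, a query author who wants a whole-node match writes the anchors explicitly (for example "\\`bar\\'"), instead of having the start anchor applied implicitly.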
bug-gnu-emacs@HIDDEN: bug#60953; Package emacs.
From: Eli Zaretskii <eliz@HIDDEN>
To: Dmitry Gutov <dgutov@HIDDEN>
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Date: Thu, 02 Feb 2023 08:34:16 +0200
Message-Id: <835yckzic7.fsf@HIDDEN>
In-Reply-To: <957c7e3e-830e-0a03-562f-fdddc6bd1c06@HIDDEN> (message from Dmitry Gutov on Wed, 1 Feb 2023 23:20:50 +0200)

> Date: Wed, 1 Feb 2023 23:20:50 +0200
> Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
> From: Dmitry Gutov <dgutov@HIDDEN>
>
> On 01/02/2023 15:39, Eli Zaretskii wrote:
> >> Please see the attachment.
> >>
> >> To note the numbers: the first patch does quite well to improve the
> >> performance of modes which use :match in queries which match a lot of
> >> nodes (e.g. rust-ts-mode or ruby-ts-mode). The boost in those was on
> >> the order of 25%.
> >>
> >> The second one is longer, and the boost (on top of the first one) is
> >> around 5-6%, stable. Not as impressive, but at least it brings :match's
> >> performance a little above :pred's in my example.
> > Fine by me, if Yuan also approves.
>
> For emacs-29, right?

Yes.

From: Yuan Fu <casouri@HIDDEN>
To: Dmitry Gutov <dgutov@HIDDEN>
Cc: Eli Zaretskii <eliz@HIDDEN>, 60953 <at> debbugs.gnu.org
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Date: Wed, 1 Feb 2023 18:16:05 -0800
Message-Id: <644EE883-9DC1-44B6-BD6C-2082319324F2@HIDDEN>
In-Reply-To: <957c7e3e-830e-0a03-562f-fdddc6bd1c06@HIDDEN>

> On Feb 1, 2023, at 1:20 PM, Dmitry Gutov <dgutov@HIDDEN> wrote:
>
> On 01/02/2023 15:39, Eli Zaretskii wrote:
>>> Please see the attachment.
>>>
>>> To note the numbers: the first patch does quite well to improve the
>>> performance of modes which use :match in queries which match a lot of
>>> nodes (e.g. rust-ts-mode or ruby-ts-mode). The boost in those was on
>>> the order of 25%.
>>>
>>> The second one is longer, and the boost (on top of the first one) is
>>> around 5-6%, stable. Not as impressive, but at least it brings :match's
>>> performance a little above :pred's in my example.
>> Fine by me, if Yuan also approves.
>
> For emacs-29, right?
>
> Waiting for Yuan's confirmation.

Yeah please go ahead :-)

Yuan

From: Dmitry Gutov <dgutov@HIDDEN>
To: Eli Zaretskii <eliz@HIDDEN>
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Date: Wed, 1 Feb 2023 23:20:50 +0200
Message-ID: <957c7e3e-830e-0a03-562f-fdddc6bd1c06@HIDDEN>
In-Reply-To: <83tu05zeqe.fsf@HIDDEN>

On 01/02/2023 15:39, Eli Zaretskii wrote:
>> Please see the attachment.
>>
>> To note the numbers: the first patch does quite well to improve the
>> performance of modes which use :match in queries which match a lot of
>> nodes (e.g. rust-ts-mode or ruby-ts-mode). The boost in those was on
>> the order of 25%.
>>
>> The second one is longer, and the boost (on top of the first one) is
>> around 5-6%, stable. Not as impressive, but at least it brings :match's
>> performance a little above :pred's in my example.
> Fine by me, if Yuan also approves.

For emacs-29, right?

Waiting for Yuan's confirmation.

From: Dmitry Gutov <dgutov@HIDDEN>
To: Eli Zaretskii <eliz@HIDDEN>
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Date: Wed, 1 Feb 2023 17:15:43 +0200
Message-ID: <85293540-cc52-1535-eb8b-85adf646c016@HIDDEN>
In-Reply-To: <83zg9xzg35.fsf@HIDDEN>

On 01/02/2023 15:10, Eli Zaretskii wrote:
>> Date: Tue, 31 Jan 2023 20:16:01 +0200
>> Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
>> From: Dmitry Gutov <dgutov@HIDDEN>
>>
>>> Can you describe what that function should do? I don't think I have a
>>> clear idea of that.
>>
>> In Lisp that function could be implemented as
>>
>> (defun buffer-substring-match (regexp &optional start end inhibit-modify)
>>   (string-match regexp
>>                 (buffer-substring (or start (point-min))
>>                                   (or end (point-max)))
>>                 nil inhibit-modify))
>>
>> Meaning, it matches the regexp against the buffer substring, with the
>> string-start and string-end anchors working.
>>
>> But it would be implemented in C, meaning we could avoid the extra
>> consing and funcall overhead.
>
> Now I'm confused, because I thought the C functions we were
> considering all fit the above description.

Except they don't match \` and \' at START and END. Only at actual
point-min and point-max.

I suppose the new function could be implemented with narrowing as well.
If we decide it's the best method.

>> Anyway, it seems like it might be too late as an addition to Emacs 29.
>> And we can implement the match predicate using narrowing for this
>> release, to be updated later.
>
> Right, this is not a catastrophe, IMO. The work on TS-based modes is
> just beginning.

I also don't see any particular slowdown from altering BEGV and ZV in C.
So the current method might be the fastest possible anyway.

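The narrowing approach discussed in the message above can be sketched in Lisp as follows. This is an illustration under stated assumptions, not the function that was proposed for C: the name my/buffer-substring-match is hypothetical, and narrow-to-region plays the role that changing BEGV and ZV plays in the C code. Inside the narrowed region, \` and \' match at START and END, and no substring is consed.

```elisp
(require 'cl-lib)

;; Hypothetical Lisp-level sketch of matching a regexp against a buffer
;; region without copying it into a string.
(defun my/buffer-substring-match (regexp &optional start end)
  "Search for REGEXP between START and END of the current buffer.
Return the match offset relative to START, or nil if there is no match.
The regexp anchors for buffer start and end apply at START and END."
  (save-excursion
    (save-restriction
      (narrow-to-region (or start (point-min)) (or end (point-max)))
      (goto-char (point-min))
      (when (re-search-forward regexp nil t)
        (- (match-beginning 0) (point-min))))))

(with-temp-buffer
  (insert "let foo = 1")
  (list
   ;; Positions 5..8 cover "foo"; the anchors apply to that region.
   (my/buffer-substring-match "\\`foo\\'" 5 8)
   ;; Without narrowing, \` only matches at the real start of the buffer.
   (progn (goto-char 5) (looking-at "\\`foo\\'"))))
;; => (0 nil)
```

Like string-match, the sketch returns a zero-based offset, but it sets the match data in buffer positions rather than string indices, so callers that inspect match-beginning afterwards would need to account for that.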
bug-gnu-emacs@HIDDEN
:bug#60953
; Package emacs
.

From: Dmitry Gutov <dgutov@HIDDEN>
To: Eli Zaretskii <eliz@HIDDEN>
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Date: Wed, 1 Feb 2023 17:13:13 +0200
Message-ID: <1103a065-ecd5-45f0-c5e7-d357d7674aaa@HIDDEN>
In-Reply-To: <83tu05zeqe.fsf@HIDDEN>

On 01/02/2023 15:39, Eli Zaretskii wrote:
>> Date: Wed, 1 Feb 2023 04:39:29 +0200
>> From: Dmitry Gutov <dgutov@HIDDEN>
>> Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
>>
>> So here. I've installed the first patch, which didn't raise up too many
>> concerns previously, and here's the new iteration on the second patch.
> Thanks, but please in the future when you make changes which call
> functions from external libraries that were not called previously, be
> sure to do the DEF_DLL_FN/LOAD_DLL_FN and the #undef/#define dance
> needed to avoid breaking the MS-Windows build.

Will do, sorry about that.

From: Eli Zaretskii <eliz@HIDDEN>
To: Dmitry Gutov <dgutov@HIDDEN>
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Date: Wed, 01 Feb 2023 15:39:53 +0200
Message-Id: <83tu05zeqe.fsf@HIDDEN>
In-Reply-To: <30b047a9-dfe3-948a-a123-2e15221c239d@HIDDEN> (message from Dmitry Gutov on Wed, 1 Feb 2023 04:39:29 +0200)

> Date: Wed, 1 Feb 2023 04:39:29 +0200
> From: Dmitry Gutov <dgutov@HIDDEN>
> Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
>
> So here. I've installed the first patch, which didn't raise up too many
> concerns previously, and here's the new iteration on the second patch.

Thanks, but please in the future when you make changes which call
functions from external libraries that were not called previously, be
sure to do the DEF_DLL_FN/LOAD_DLL_FN and the #undef/#define dance
needed to avoid breaking the MS-Windows build.

> Please see the attachment.
>
> To note the numbers: the first patch does quite well to improve the
> performance of modes which use :match in queries which match a lot of
> nodes (e.g. rust-ts-mode or ruby-ts-mode). The boost in those was on
> the order of 25%.
>
> The second one is longer, and the boost (on top of the first one) is
> around 5-6%, stable. Not as impressive, but at least it brings :match's
> performance a little above :pred's in my example.

Fine by me, if Yuan also approves.

From: Eli Zaretskii <eliz@HIDDEN>
To: Dmitry Gutov <dgutov@HIDDEN>
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Date: Wed, 01 Feb 2023 15:10:38 +0200
Message-Id: <83zg9xzg35.fsf@HIDDEN>
In-Reply-To: <c6cd0fe6-ee70-cb2e-229d-1515b11436a9@HIDDEN> (message from Dmitry Gutov on Tue, 31 Jan 2023 20:16:01 +0200)

> Date: Tue, 31 Jan 2023 20:16:01 +0200
> Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
> From: Dmitry Gutov <dgutov@HIDDEN>
>
> > Can you describe what that function should do?  I don't think I have a
> > clear idea of that.
>
> In Lisp that function could be implemented as
>
>    (defun buffer-substring-match (regexp &optional start end
>                                   inhibit-modify)
>      (string-match regexp
>                    (buffer-substring (or start (point-min))
>                                      (or end (point-max)))
>                    nil inhibit-modify))
>
> Meaning, it matches the regexp against the buffer substring, with the
> string-start and string-end anchors working.
>
> But it would be implemented in C, meaning we could avoid the extra
> consing and funcall overhead.

Now I'm confused, because I thought the C functions we were
considering all fit the above description.

> Anyway, it seems like it might be too late as an addition to Emacs 29.
> And we can implement the match predicate using narrowing for this
> release, to be updated later.

Right, this is not a catastrophe, IMO.  The work on TS-based modes is
just beginning.

From: Dmitry Gutov <dgutov@HIDDEN>
To: Eli Zaretskii <eliz@HIDDEN>
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Date: Wed, 1 Feb 2023 04:39:29 +0200
Message-ID: <30b047a9-dfe3-948a-a123-2e15221c239d@HIDDEN>
In-Reply-To: <c6cd0fe6-ee70-cb2e-229d-1515b11436a9@HIDDEN>
Content-Type: multipart/mixed; boundary="------------Lz0Fo1WiaymPHEGhkJvrYbnr"

This is a multi-part message in MIME format.
--------------Lz0Fo1WiaymPHEGhkJvrYbnr
Content-Type: text/plain; charset=UTF-8; format=flowed

On 31/01/2023 20:16, Dmitry Gutov wrote:
> Anyway, it seems like it might be too late as an addition to Emacs 29.
> And we can implement the match predicate using narrowing for this
> release, to be updated later.

So here. I've installed the first patch, which didn't raise up too many
concerns previously, and here's the new iteration on the second patch.

Please see the attachment.

To note the numbers: the first patch does quite well to improve the
performance of modes which use :match in queries which match a lot of
nodes (e.g. rust-ts-mode or ruby-ts-mode). The boost in those was on
the order of 25%.

The second one is longer, and the boost (on top of the first one) is
around 5-6%, stable. Not as impressive, but at least it brings :match's
performance a little above :pred's in my example.

--------------Lz0Fo1WiaymPHEGhkJvrYbnr
Content-Type: text/x-patch; charset=UTF-8; name="treesit_predicate_match.diff"
Content-Disposition: attachment; filename="treesit_predicate_match.diff"

diff --git a/src/treesit.c b/src/treesit.c
index b163685419f..510170ca640 100644
--- a/src/treesit.c
+++ b/src/treesit.c
@@ -2466,10 +2466,41 @@ treesit_predicate_match (Lisp_Object args, struct capture_range captures)
 	      build_string ("The second argument to `match' should "
 		            "be a capture name, not a string"));
 
-  Lisp_Object text = treesit_predicate_capture_name_to_text (capture_name,
+  Lisp_Object node = treesit_predicate_capture_name_to_node (capture_name,
 							     captures);
 
-  if (fast_string_match (regexp, text) >= 0)
+  struct buffer *old_buffer = current_buffer;
+  struct buffer *buffer = XBUFFER (XTS_PARSER (XTS_NODE (node)->parser)->buffer);
+  set_buffer_internal (buffer);
+
+  TSNode treesit_node = XTS_NODE (node)->node;
+  ptrdiff_t visible_beg = XTS_PARSER (XTS_NODE (node)->parser)->visible_beg;
+  uint32_t start_byte_offset = ts_node_start_byte (treesit_node);
+  uint32_t end_byte_offset = ts_node_end_byte (treesit_node);
+  ptrdiff_t start_byte = visible_beg + start_byte_offset;
+  ptrdiff_t end_byte = visible_beg + end_byte_offset;
+  ptrdiff_t start_pos = buf_bytepos_to_charpos (buffer, start_byte);
+  ptrdiff_t end_pos = buf_bytepos_to_charpos (buffer, end_byte);
+  ptrdiff_t old_begv = BEGV;
+  ptrdiff_t old_begv_byte = BEGV_BYTE;
+  ptrdiff_t old_zv = ZV;
+  ptrdiff_t old_zv_byte = ZV_BYTE;
+
+  BEGV = start_pos;
+  BEGV_BYTE = start_byte;
+  ZV = end_pos;
+  ZV_BYTE = end_byte;
+
+  ptrdiff_t val = fast_looking_at (regexp, start_pos, start_byte, end_pos, end_byte, Qnil);
+
+  BEGV = old_begv;
+  BEGV_BYTE = old_begv_byte;
+  ZV = old_zv;
+  ZV_BYTE = old_zv_byte;
+
+  set_buffer_internal (old_buffer);
+
+  if (val > 0)
     return true;
   else
     return false;
--------------Lz0Fo1WiaymPHEGhkJvrYbnr--
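The message above contrasts matching against a copied string with matching in place in the buffer. A rough way to compare the two strategies from Lisp could look like the sketch below. All three function names are hypothetical, the sample text and regexp are arbitrary, and nothing here reproduces the C patch or the figures quoted in the thread; it only uses `benchmark-run` to time each approach on whatever machine it runs on.

```elisp
;; Illustrative sketch only; helper names are made up for this example.
(require 'benchmark)

(defun my-match-by-copy (regexp start end)
  "Match REGEXP against a fresh string copied from START..END."
  (and (string-match-p regexp (buffer-substring-no-properties start end))
       t))

(defun my-match-by-narrowing (regexp start end)
  "Match REGEXP against START..END in place, by narrowing the buffer."
  (save-excursion
    (save-restriction
      (narrow-to-region start end)
      (goto-char (point-min))
      (save-match-data
        (and (re-search-forward regexp nil t) t)))))

(defun my-compare-match-strategies (&optional repetitions)
  "Time both strategies; return a plist of `benchmark-run' results.
Each value has the form (ELAPSED-SECONDS GC-RUNS GC-SECONDS)."
  (let ((n (or repetitions 10000))
        (regexp "\\`[a-z_]+\\'"))
    (with-temp-buffer
      (insert "let some_identifier = 1;")
      ;; "some_identifier" spans positions 5 (inclusive) to 20 (exclusive).
      (list :copy (benchmark-run n (my-match-by-copy regexp 5 20))
            :narrowing (benchmark-run n (my-match-by-narrowing regexp 5 20))))))
```

Calling `(my-compare-match-strategies)` returns timings for both variants. The absolute numbers depend on the Emacs build and the text being matched, so they should not be expected to mirror the percentages reported for the C-level change.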

From: Dmitry Gutov <dgutov@HIDDEN>
To: Eli Zaretskii <eliz@HIDDEN>
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Date: Tue, 31 Jan 2023 20:16:01 +0200
Message-ID: <c6cd0fe6-ee70-cb2e-229d-1515b11436a9@HIDDEN>
In-Reply-To: <83sffr2xq2.fsf@HIDDEN>

On 31/01/2023 05:23, Eli Zaretskii wrote:
>> Date: Mon, 30 Jan 2023 21:58:22 +0200
>> Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
>> From: Dmitry Gutov <dgutov@HIDDEN>
>>
>> On 30/01/2023 21:05, Eli Zaretskii wrote:
>>>> Date: Mon, 30 Jan 2023 21:01:02 +0200
>>>> Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
>>>> From: Dmitry Gutov <dgutov@HIDDEN>
>>>>
>>>> But that doesn't answer the question "Could it?".
>>> I don't understand what you are asking.  "Could" in what sense?
>> Like, would it make sense to try to modify it that way, or extract a
>> function that would do that, without writing it from scratch.
>>
>> Or create a new function which would reuse some common code.
>>
>> We would call the new function something like match_buffer_substring.
>> Optionally, also expose it to Lisp.
> Can you describe what that function should do?  I don't think I have a
> clear idea of that.

In Lisp that function could be implemented as

   (defun buffer-substring-match (regexp &optional start end
                                  inhibit-modify)
     (string-match regexp
                   (buffer-substring (or start (point-min))
                                     (or end (point-max)))
                   nil inhibit-modify))

Meaning, it matches the regexp against the buffer substring, with the
string-start and string-end anchors working.

But it would be implemented in C, meaning we could avoid the extra
consing and funcall overhead.

It might also be handy to use from Lisp in other cases, where we don't
need the anchors, but it's easier to call

   (buffer-substring-match "foo")

rather than

   (save-excursion
     (goto-char (point-min))
     (re-search-forward "foo" nil t)
     (point))

Probably a little faster, too.

Anyway, it seems like it might be too late as an addition to Emacs 29.
And we can implement the match predicate using narrowing for this
release, to be updated later.
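The narrowing fallback mentioned at the end of this message can be sketched in Emacs Lisp as follows. This is an assumption-laden illustration, not the function proposed in the thread: the name `my-buffer-substring-match` is hypothetical, and unlike `string-match` it leaves the caller's match data untouched. It returns the zero-based offset of the match from START, which is the value `string-match` would return on the copied substring.

```elisp
;; Illustrative sketch only; `my-buffer-substring-match' is a made-up name.
(defun my-buffer-substring-match (regexp &optional start end)
  "Search for REGEXP between START and END without copying the text.
Return the zero-based offset of the match from START, or nil.
The buffer is narrowed, so \\` and \\' match at START and END."
  (let ((start (or start (point-min)))
        (end (or end (point-max))))
    (save-excursion
      (save-restriction
        (narrow-to-region start end)
        (goto-char (point-min))
        (save-match-data
          (when (re-search-forward regexp nil t)
            (- (match-beginning 0) (point-min))))))))

(defun my-buffer-substring-match-demo ()
  "Return the narrowing-based result next to the copying one."
  (with-temp-buffer
    (insert "fn main() {}")
    ;; "main" spans positions 4 (inclusive) to 8 (exclusive).
    (list (my-buffer-substring-match "\\`main\\'" 4 8)
          (string-match "\\`main\\'" (buffer-substring 4 8)))))
```

The demo function pairs the in-place result with the copying one so the two can be compared directly on the same range.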
From: Eli Zaretskii <eliz@HIDDEN>
To: Dmitry Gutov <dgutov@HIDDEN>
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
Date: Tue, 31 Jan 2023 05:23:49 +0200
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Message-Id: <83sffr2xq2.fsf@HIDDEN>
In-Reply-To: <def799c9-c2d0-81b7-117b-40c35c1417fa@HIDDEN> (message from Dmitry Gutov on Mon, 30 Jan 2023 21:58:22 +0200)

> Date: Mon, 30 Jan 2023 21:58:22 +0200
> Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
> From: Dmitry Gutov <dgutov@HIDDEN>
>
> On 30/01/2023 21:05, Eli Zaretskii wrote:
> >> Date: Mon, 30 Jan 2023 21:01:02 +0200
> >> Cc:casouri@HIDDEN,60953 <at> debbugs.gnu.org
> >> From: Dmitry Gutov<dgutov@HIDDEN>
> >>
> >> But that doesn't answer the question "Could it?".
> > I don't understand what you are asking.  "Could" in what sense?
>
> Like, would it make sense to try to modify it that way, or extract a
> function that would do that, without writing it from scratch.
>
> Or create a new function which would reuse some common code.
>
> We would call the new function something like match_buffer_substring.
> Optionally, also expose it to Lisp.

Can you describe what that function should do?  I don't think I have a
clear idea of that.
From: Dmitry Gutov <dgutov@HIDDEN>
To: Yuan Fu <casouri@HIDDEN>
Cc: Eli Zaretskii <eliz@HIDDEN>, 60953 <at> debbugs.gnu.org
Date: Tue, 31 Jan 2023 02:44:53 +0200
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Message-ID: <a8dc0f23-92cc-69d7-c308-2ea970119d8e@HIDDEN>
In-Reply-To: <A9D3AD21-2057-4964-801C-B8966326F17F@HIDDEN>

On 31/01/2023 01:57, Yuan Fu wrote:
>
>
>> On Jan 30, 2023, at 11:58 AM, Dmitry Gutov <dgutov@HIDDEN> wrote:
>>
>> On 30/01/2023 21:05, Eli Zaretskii wrote:
>>>> Date: Mon, 30 Jan 2023 21:01:02 +0200
>>>> Cc:casouri@HIDDEN,60953 <at> debbugs.gnu.org
>>>> From: Dmitry Gutov<dgutov@HIDDEN>
>>>>
>>>> But that doesn't answer the question "Could it?".
>>> I don't understand what you are asking.  "Could" in what sense?
>>
>> Like, would it make sense to try to modify it that way, or extract a function that would do that, without writing it from scratch.
>>
>> Or create a new function which would reuse some common code.
>>
>> We would call the new function something like match_buffer_substring. Optionally, also expose it to Lisp.
>
> Another option is to change user/programmer’s expectation of the anchor: we could say that the regexp must match the entirety of the node text. IOW, \\` \\' are implied.

Huh, I guess that's an option too.

A couple reasons not to do that would be:

- Potential breakage in all existing TS modes, a week (?) before we're
  going to release Emacs 29 pretest. Maybe that's okay, I can't say. But
  the breakage from that kind of change could be subtle.

- Compatibility reasons? People writing TS modes for Emacs might be
  coming from other editors/TS integrations. While TreeSitter docs say
  the predicates are not handled by it, it does show this example:

    (#match? @constant "^[A-Z][A-Z_]+")

The use of '^' anchor seems to imply that the regexp doesn't have to
otherwise match the whole node text (OTOH it's not clear why the example
doesn't just say "^[A-Z]" or "^[A-Z][A-Z_]").

The doc also references the Rust crate and WebAssembly binding which
support #match?. IIUC Rust uses "re.is_match", which is documented to
use "implicit .*? at the beginning and end". Which matches our current
semantics. WebAssembly uses "regex.test", same effect.
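
[Editorial illustration, not part of the original message.] The two semantics being weighed above can be sketched with Python's re module, used here only as a stand-in: it is not Emacs regexp syntax and not the treesit.c implementation. The helper names match_search and match_whole are hypothetical. The first models the "implicit .*? at the beginning and end" behaviour described for Rust and WebAssembly; the second models the proposal that \` and \' be implied.

```python
import re

# The example predicate regexp quoted from the tree-sitter documentation.
CONSTANT_RE = re.compile(r"^[A-Z][A-Z_]+")


def match_search(node_text):
    """Unanchored semantics: succeed if the regexp matches somewhere in the text."""
    return CONSTANT_RE.search(node_text) is not None


def match_whole(node_text):
    """Proposed semantics: the regexp must match the entire node text."""
    return CONSTANT_RE.fullmatch(node_text) is not None


if __name__ == "__main__":
    for text in ("MAX_SIZE", "MAXsize", "max_SIZE"):
        print(text, match_search(text), match_whole(text))
```

The two helpers agree on "MAX_SIZE" and "max_SIZE" but disagree on "MAXsize", which is the kind of subtle breakage the first bullet above is concerned about.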
From: Yuan Fu <casouri@HIDDEN>
To: Dmitry Gutov <dgutov@HIDDEN>
Cc: Eli Zaretskii <eliz@HIDDEN>, 60953 <at> debbugs.gnu.org
Date: Mon, 30 Jan 2023 15:57:05 -0800
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Message-Id: <A9D3AD21-2057-4964-801C-B8966326F17F@HIDDEN>
In-Reply-To: <def799c9-c2d0-81b7-117b-40c35c1417fa@HIDDEN>

> On Jan 30, 2023, at 11:58 AM, Dmitry Gutov <dgutov@HIDDEN> wrote:
>
> On 30/01/2023 21:05, Eli Zaretskii wrote:
>>> Date: Mon, 30 Jan 2023 21:01:02 +0200
>>> Cc:casouri@HIDDEN,60953 <at> debbugs.gnu.org
>>> From: Dmitry Gutov<dgutov@HIDDEN>
>>>
>>> But that doesn't answer the question "Could it?".
>> I don't understand what you are asking.  "Could" in what sense?
>
> Like, would it make sense to try to modify it that way, or extract a function that would do that, without writing it from scratch.
>
> Or create a new function which would reuse some common code.
>
> We would call the new function something like match_buffer_substring. Optionally, also expose it to Lisp.

Another option is to change user/programmer’s expectation of the anchor: we could say that the regexp must match the entirety of the node text. IOW, \\` \\' are implied.

Yuan
From: Dmitry Gutov <dgutov@HIDDEN>
To: Eli Zaretskii <eliz@HIDDEN>
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
Date: Mon, 30 Jan 2023 21:58:22 +0200
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Message-ID: <def799c9-c2d0-81b7-117b-40c35c1417fa@HIDDEN>
In-Reply-To: <83tu073ksp.fsf@HIDDEN>

On 30/01/2023 21:05, Eli Zaretskii wrote:
>> Date: Mon, 30 Jan 2023 21:01:02 +0200
>> Cc:casouri@HIDDEN,60953 <at> debbugs.gnu.org
>> From: Dmitry Gutov<dgutov@HIDDEN>
>>
>> But that doesn't answer the question "Could it?".
> I don't understand what you are asking.  "Could" in what sense?

Like, would it make sense to try to modify it that way, or extract a
function that would do that, without writing it from scratch.

Or create a new function which would reuse some common code.

We would call the new function something like match_buffer_substring.
Optionally, also expose it to Lisp.
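
[Editorial illustration, not part of the original message.] The thread does not spell out what match_buffer_substring would do, so the following is only one possible reading: match a regexp against a stretch of text as if the rest of the text did not exist, so that start and end anchors apply at the stretch boundaries. The sketch uses Python's re module as an analogy. The name match_in_window is hypothetical, and the Python match_buffer_substring here is an assumption about the intended behaviour, not the proposed C function.

```python
import re

TEXT = "x = BAR + 1"
START, END = 4, 7  # span of "BAR", standing in for a node's start and end


def match_in_window(regexp, text, start, end):
    """Search only within [start, end), but the text before START still exists,
    so a start-of-text anchor cannot match at START."""
    return re.compile(regexp).search(text, start, end) is not None


def match_buffer_substring(regexp, text, start, end):
    """Hypothetical reading of the proposal: match as if text[start:end]
    were the whole text, so anchors apply at both ends of the span."""
    return re.search(regexp, text[start:end]) is not None


if __name__ == "__main__":
    print(match_in_window(r"\ABAR", TEXT, START, END))
    print(match_buffer_substring(r"\ABAR\Z", TEXT, START, END))
```

In this analogy, slicing plays the role that narrowing plays in Emacs: the start anchor succeeds at the beginning of the slice even though more text precedes it in the original string.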
From: Eli Zaretskii <eliz@HIDDEN>
To: Dmitry Gutov <dgutov@HIDDEN>
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
Date: Mon, 30 Jan 2023 21:05:26 +0200
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Message-Id: <83tu073ksp.fsf@HIDDEN>
In-Reply-To: <19892f55-498e-1109-2229-2ddd984849a4@HIDDEN> (message from Dmitry Gutov on Mon, 30 Jan 2023 21:01:02 +0200)

> Date: Mon, 30 Jan 2023 21:01:02 +0200
> Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
> From: Dmitry Gutov <dgutov@HIDDEN>
>
> But that doesn't answer the question "Could it?".

I don't understand what you are asking.  "Could" in what sense?
From: Dmitry Gutov <dgutov@HIDDEN>
To: Eli Zaretskii <eliz@HIDDEN>
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
Date: Mon, 30 Jan 2023 21:01:02 +0200
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Message-ID: <19892f55-498e-1109-2229-2ddd984849a4@HIDDEN>
In-Reply-To: <83wn533lut.fsf@HIDDEN>

On 30/01/2023 20:42, Eli Zaretskii wrote:
>> Date: Mon, 30 Jan 2023 20:20:46 +0200
>> Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
>> From: Dmitry Gutov <dgutov@HIDDEN>
>>
>> On 30/01/2023 19:49, Eli Zaretskii wrote:
>>>> Date: Mon, 30 Jan 2023 19:15:07 +0200
>>>> Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
>>>> From: Dmitry Gutov <dgutov@HIDDEN>
>>>>
>>>>> fast_looking_at already does an anchored match, so I'm not sure I
>>>>> follow.  I don't even understand why you need the \` part, when the
>>>>> match will either always start from the first position or fail.
>>>>
>>>> The regexp might include the anchors, or it might not.
>>>>
>>>> It might also use a different anchor like ^ or $ or \b.
>>>
>>> OK, but it always goes only forward, so narrowing to the beginning
>>> shouldn't be necessary.  Right?
>>
>> Are you saying that fast_looking_at ("\\`", ...) will always succeed?
>>
>> And fast_looking_at ("^", ...), etc.
>
> For example, for "^", if you hint that it must look back to make sure
> there's a newline there, then your narrowing will also prevent it from
> doing that, right?

fast_looking_at ("^", ...) succeeds inside a narrowing because it always
succeeds at BOB. Even though there are no physical newlines before BOB.

>>>> One possible alternative, I suppose, would be to create a raw pointer to
>>>> a part of the buffer text and call re_search directly specifying the
>>>> known length of the node in bytes. If buffer text is one contiguous
>>>> region in memory, that is.
>>>
>>> It isn't, though: there's the gap.  Which is why doing this is not
>>> recommended; instead, use something like search_buffer_re, which
>>> already handles this complication for you.  (Except that
>>> search_buffer_re is a static function, so only code in search.c can
>>> use it.  So you'd need to make it non-static.)
>>
>> Interesting. Does search_buffer_re match the \` anchor at POS and \' at
>> LIM? IOW, does it treat the rest of the buffer as non-existing? Or could it?
>
> That is the low-level subroutine called by re-search-forward, so you
> know the answers already, I think?  IOW, that function behaves exactly
> like re-search-forward in those situations.

So, I suppose not?

But that doesn't answer the question "Could it?".
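
[Editorial illustration, not part of the original message.] The remark "there's the gap" refers to buffer text being stored with a movable hole in the middle, so a range of text that straddles the hole is not one contiguous region in memory. A toy gap buffer in Python shows the idea. The class and method names (GapBuffer, move_gap, is_contiguous, substring) are hypothetical and do not correspond to Emacs's actual data structures or macros.

```python
class GapBuffer:
    """Toy gap buffer: text is stored in one array with a movable hole."""

    def __init__(self, text, gap_size=4):
        # The gap initially sits after the last character.
        self.data = list(text) + [None] * gap_size
        self.gap_start = len(text)
        self.gap_end = len(text) + gap_size

    def move_gap(self, pos):
        """Move the gap so that it begins at logical position POS."""
        while self.gap_start > pos:
            # Shift one character from before the gap to after it.
            self.gap_start -= 1
            self.gap_end -= 1
            self.data[self.gap_end] = self.data[self.gap_start]
            self.data[self.gap_start] = None
        while self.gap_start < pos:
            # Shift one character from after the gap to before it.
            self.data[self.gap_start] = self.data[self.gap_end]
            self.data[self.gap_end] = None
            self.gap_start += 1
            self.gap_end += 1

    def is_contiguous(self, start, end):
        """True if logical range [start, end) does not straddle the gap."""
        return end <= self.gap_start or start >= self.gap_start

    def substring(self, start, end):
        """Return logical range [start, end), skipping over the gap."""
        gap_len = self.gap_end - self.gap_start
        chars = []
        for i in range(start, end):
            phys = i if i < self.gap_start else i + gap_len
            chars.append(self.data[phys])
        return "".join(chars)


if __name__ == "__main__":
    buf = GapBuffer("hello world")
    buf.move_gap(5)  # as if the user had just edited after "hello"
    print(buf.substring(0, 11), buf.is_contiguous(3, 8))
```

A matcher handed a raw pointer plus a length would read into the hole for a range such as 3..8 here, which is the complication a gap-aware search routine has to handle.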
bug-gnu-emacs@HIDDEN
:bug#60953
; Package emacs
.
Full text available.Received: (at 60953) by debbugs.gnu.org; 30 Jan 2023 18:42:50 +0000 From debbugs-submit-bounces <at> debbugs.gnu.org Mon Jan 30 13:42:50 2023 Received: from localhost ([127.0.0.1]:50547 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>) id 1pMZ7B-00035I-Kp for submit <at> debbugs.gnu.org; Mon, 30 Jan 2023 13:42:49 -0500 Received: from eggs.gnu.org ([209.51.188.92]:32958) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <eliz@HIDDEN>) id 1pMZ79-00034e-UP for 60953 <at> debbugs.gnu.org; Mon, 30 Jan 2023 13:42:48 -0500 Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <eliz@HIDDEN>) id 1pMZ74-0004dg-Du; Mon, 30 Jan 2023 13:42:42 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date: mime-version; bh=lYR8aEeZdetOsZPsCuVf+Y150WL2e43FUhhXqz43rm0=; b=d+t5TOLL7gLk rDISKa5OI4/4NKSz7DECrfx8lRCgLRvKQtI8+cWsV93rXGZVe/ikRXMErVYcF4/RxYvjSGA7pRq7p TAx0HHYwIN2c8r4uEz1Vg8Whenb19tbGYExUf+QwVp3FF/QpUArewAn7h3blD19Wxm8lgmzNnoO6L Zd9o2o2JzWn65YDwyraqIbWe2gvSlD82emIP4zIkpHtInsG70uYY9TklHp6YHJlbgQE3xXYXC0Jnr HaQtYsWIkDJQR31FOOAO3Tyn0x/Ltx9ZO3kvMpvQrCQ3TYy0M73w0N9zlCOp/k4XMNNoWRnmQhlJB zsNSG//cVJTuI4+Vtu67XA==; Received: from [87.69.77.57] (helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <eliz@HIDDEN>) id 1pMZ72-0003cX-OK; Mon, 30 Jan 2023 13:42:42 -0500 Date: Mon, 30 Jan 2023 20:42:34 +0200 Message-Id: <83wn533lut.fsf@HIDDEN> From: Eli Zaretskii <eliz@HIDDEN> To: Dmitry Gutov <dgutov@HIDDEN> In-Reply-To: <33cad4c1-4af5-37bb-05bc-79a4d9c1a700@HIDDEN> (message from Dmitry Gutov on Mon, 30 Jan 2023 20:20:46 +0200) Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems 
inefficient References: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@HIDDEN> <04729838-b7d4-8a08-2b71-12536a28aebb@HIDDEN> <AB9CD94C-CB3C-4D84-B4AA-22EDC206EB12@HIDDEN> <b1dc100e-1ee9-dc89-00b6-25bb7864e90b@HIDDEN> <83wn5ag4nc.fsf@HIDDEN> <01b5d074-fb12-6b1f-cbfb-5e759833b854@HIDDEN> <838rhpg57n.fsf@HIDDEN> <5026D975-983F-4D18-8690-BE139C92825D@HIDDEN> <83pmb1emxi.fsf@HIDDEN> <6f318afc-ca71-8b7e-c822-52e6635b5718@HIDDEN> <83sffxcfxw.fsf@HIDDEN> <d8cf01fa-d829-fe83-0bbf-4a5f1965a107@HIDDEN> <83pmb1cbg5.fsf@HIDDEN> <2da844d3-ea31-289e-2821-aa174e365ffd@HIDDEN> <6784f9e7-844b-374d-2a1e-8a61cebe0d7e@HIDDEN> <83zga03yne.fsf@HIDDEN> <373a575f-c683-1581-c3e6-502e9897fb04@HIDDEN> <83mt603vrc.fsf@HIDDEN> <83e58a1b-2e4a-356a-36d8-c756ff105b62@HIDDEN> <83bkmf52va.fsf@HIDDEN> <33cad4c1-4af5-37bb-05bc-79a4d9c1a700@HIDDEN> X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 60953 Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org X-BeenThere: debbugs-submit <at> debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe> List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/> List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help> List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe> Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org> X-Spam-Score: -3.3 (---) > Date: Mon, 30 Jan 2023 20:20:46 +0200 > Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org > From: Dmitry Gutov <dgutov@HIDDEN> > > On 30/01/2023 19:49, Eli Zaretskii wrote: > >> Date: Mon, 30 Jan 2023 19:15:07 +0200 > >> Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org > >> 
From: Dmitry Gutov <dgutov@HIDDEN>
> >>
> >>> fast_looking_at already does an anchored match, so I'm not sure I
> >>> follow. I don't even understand why you need the \` part, when the
> >>> match will either always start from the first position or fail.
> >>
> >> The regexp might include the anchors, or it might not.
> >>
> >> It might also use a different anchor like ^ or $ or \b.
> >
> > OK, but it always goes only forward, so narrowing to the beginning
> > shouldn't be necessary. Right?
>
> Are you saying that fast_looking_at ("\\`", ...) will always succeed?
>
> And fast_looking_at ("^", ...), etc.

For example, for "^", if you hint that it must look back to make sure
there's a newline there, then your narrowing will also prevent it from
doing that, right?

> >> One possible alternative, I suppose, would be to create a raw pointer to
> >> a part of the buffer text and call re_search directly specifying the
> >> known length of the node in bytes. If buffer text is one contiguous
> >> region in memory, that is.
> >
> > It isn't, though: there's the gap. Which is why doing this is not
> > recommended; instead, use something like search_buffer_re, which
> > already handles this complication for you. (Except that
> > search_buffer_re is a static function, so only code in search.c can
> > use it. So you'd need to make it non-static.)
>
> Interesting. Does search_buffer_re match the \` anchor at POS and \' at
> LIM? IOW, does it treat the rest of the buffer as non-existing? Or could it?

That is the low-level subroutine called by re-search-forward, so you
know the answers already, I think? IOW, that function behaves exactly
like re-search-forward in those situations.
Full text available.Received: (at 60953) by debbugs.gnu.org; 30 Jan 2023 18:20:57 +0000 From debbugs-submit-bounces <at> debbugs.gnu.org Mon Jan 30 13:20:57 2023 Received: from localhost ([127.0.0.1]:50478 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>) id 1pMYm1-0002VO-4G for submit <at> debbugs.gnu.org; Mon, 30 Jan 2023 13:20:57 -0500 Received: from mail-ed1-f41.google.com ([209.85.208.41]:47094) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <raaahh@HIDDEN>) id 1pMYly-0002V7-SU for 60953 <at> debbugs.gnu.org; Mon, 30 Jan 2023 13:20:55 -0500 Received: by mail-ed1-f41.google.com with SMTP id cw4so6716990edb.13 for <60953 <at> debbugs.gnu.org>; Mon, 30 Jan 2023 10:20:54 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :sender:from:to:cc:subject:date:message-id:reply-to; bh=EojLxhbQEL/7sn8leakNI7N9j/pX+sTDPguao9F/wLE=; b=KmjNrmslHsMwE7HZHuz8jGVsEWhbSZcZRpx2DB3LbWTpDA/F4zOPXqm5gg1YmWRUK2 8JO1i64jeaSEY5T0t+eWq6TrAzqjRqz8hXR4NWkhn37aIBGd52cn9HP7l513Hj0x4VC1 GvqsHD1A21ZRnVX7ZLyt4uivBztBp8gnoDiWJRosGlCnrjj0oTXfnOT8ZZ4/XphPj3Kg xdhlBxhqmlut3acwD9aEidzm2C3GKGQmouJm8v7hRop3TunlM9rpqFKQ2HEl7kJ8tXuo aC2rypsPQBNOvAnIpU81BpqjthFf12QTbKu6ZQGpwzhZEuzcdSCrfrofjoM+zqV/kr5X PL/w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :sender:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=EojLxhbQEL/7sn8leakNI7N9j/pX+sTDPguao9F/wLE=; b=w8RXMwxCru8iBYS9R6GZQxQewDBNCgAXFCfNA+QMS2ynWobclnu2G8dpFFj0arYtlU aPtA/s4AO35gMNsoOmNGSiVGzJD5Bc01dwQ0jSi/K3zKYS2s3oj6pn+JFL29qjv/UmU+ oJepLLgMOJgrfK261UzYqYs1RBkB9FvCC1TPshyGai18IXu6k3X40NOoyIaIXiUkiaNG 
ebUj/Maj9mdEWxXPyBiiY9fveE1AR41OC6rUVRsdwvXqUqif+AoUolBzLF/58iA8Bte0 lJ+XsWgecL1DuHuFlj1KUo2IuFxvoPx3buMUWjxi9pIZcBxPgg9wuX4gK0bO1wftazuC AK0w== X-Gm-Message-State: AO0yUKXsUXBNT3T/BVM1tR4qfkLNNy0TKC1n46AowtFyfxuSUlT9vnS4 qQ4Oap8cqLRJSbgw9I1gqeY= X-Google-Smtp-Source: AK7set/Un6rvk+QOS8aRR0/o4N8bto5A06VrkmG+T01TI6Bb4w7jXRxYu472yFJ68yJhwmp3oxoJEg== X-Received: by 2002:a50:9f43:0:b0:4a2:2d79:dce2 with SMTP id b61-20020a509f43000000b004a22d79dce2mr9544896edf.10.1675102849018; Mon, 30 Jan 2023 10:20:49 -0800 (PST) Received: from [192.168.0.2] ([46.251.119.176]) by smtp.googlemail.com with ESMTPSA id z3-20020a50eb43000000b0045b4b67156fsm7048912edp.45.2023.01.30.10.20.47 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 30 Jan 2023 10:20:48 -0800 (PST) Message-ID: <33cad4c1-4af5-37bb-05bc-79a4d9c1a700@HIDDEN> Date: Mon, 30 Jan 2023 20:20:46 +0200 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.4.2 Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient Content-Language: en-US To: Eli Zaretskii <eliz@HIDDEN> References: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@HIDDEN> <04729838-b7d4-8a08-2b71-12536a28aebb@HIDDEN> <AB9CD94C-CB3C-4D84-B4AA-22EDC206EB12@HIDDEN> <b1dc100e-1ee9-dc89-00b6-25bb7864e90b@HIDDEN> <83wn5ag4nc.fsf@HIDDEN> <01b5d074-fb12-6b1f-cbfb-5e759833b854@HIDDEN> <838rhpg57n.fsf@HIDDEN> <5026D975-983F-4D18-8690-BE139C92825D@HIDDEN> <83pmb1emxi.fsf@HIDDEN> <6f318afc-ca71-8b7e-c822-52e6635b5718@HIDDEN> <83sffxcfxw.fsf@HIDDEN> <d8cf01fa-d829-fe83-0bbf-4a5f1965a107@HIDDEN> <83pmb1cbg5.fsf@HIDDEN> <2da844d3-ea31-289e-2821-aa174e365ffd@HIDDEN> <6784f9e7-844b-374d-2a1e-8a61cebe0d7e@HIDDEN> <83zga03yne.fsf@HIDDEN> <373a575f-c683-1581-c3e6-502e9897fb04@HIDDEN> <83mt603vrc.fsf@HIDDEN> <83e58a1b-2e4a-356a-36d8-c756ff105b62@HIDDEN> <83bkmf52va.fsf@HIDDEN> From: Dmitry Gutov <dgutov@HIDDEN> In-Reply-To: <83bkmf52va.fsf@HIDDEN> Content-Type: text/plain; 
charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: 0.1 (/) X-Debbugs-Envelope-To: 60953 Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org X-BeenThere: debbugs-submit <at> debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe> List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/> List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help> List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe> Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org> X-Spam-Score: -1.9 (-)

On 30/01/2023 19:49, Eli Zaretskii wrote:
>> Date: Mon, 30 Jan 2023 19:15:07 +0200
>> Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
>> From: Dmitry Gutov <dgutov@HIDDEN>
>>
>>> fast_looking_at already does an anchored match, so I'm not sure I
>>> follow. I don't even understand why you need the \` part, when the
>>> match will either always start from the first position or fail.
>>
>> The regexp might include the anchors, or it might not.
>>
>> It might also use a different anchor like ^ or $ or \b.
>
> OK, but it always goes only forward, so narrowing to the beginning
> shouldn't be necessary. Right?

Are you saying that fast_looking_at ("\\`", ...) will always succeed?

And fast_looking_at ("^", ...), etc.

I would imagine that only fast_looking_at ("\\=", ...) is guaranteed to
succeed.

> And you can use the LIMIT argument to
> limit how far it goes forward, right? So once again, why narrow?

I tried to explain that there is a certain expectation (on the part of
the user/programmer) about which anchors are allowed in the :match
regexp and what their effects are, and those seem hard to support
without narrowing.

>>> And for \', just compare the length of the match returned by
>>> fast_looking_at with the length of the text.
>>
>> This seems to work, i.e. even when before "carpet",
>>
>>    (and (looking-at (regexp-opt '("car" "cardigan" "carpet")))
>>         (match-string 0))
>>
>> returns the full match. I was expecting that it could return just "car"
>> -- not sure why it doesn't stop there.
>
> Because regex search is greedy?

Cool. TIL, thanks. That's not going to help here, but might in other
situations when my code controls the regexp as well.

>> One possible alternative, I suppose, would be to create a raw pointer to
>> a part of the buffer text and call re_search directly specifying the
>> known length of the node in bytes. If buffer text is one contiguous
>> region in memory, that is.
>
> It isn't, though: there's the gap. Which is why doing this is not
> recommended; instead, use something like search_buffer_re, which
> already handles this complication for you. (Except that
> search_buffer_re is a static function, so only code in search.c can
> use it. So you'd need to make it non-static.)

Interesting. Does search_buffer_re match the \` anchor at POS and \' at
LIM? IOW, does it treat the rest of the buffer as non-existing? Or could it?
Full text available.Received: (at 60953) by debbugs.gnu.org; 30 Jan 2023 17:50:00 +0000 From debbugs-submit-bounces <at> debbugs.gnu.org Mon Jan 30 12:50:00 2023 Received: from localhost ([127.0.0.1]:50306 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>) id 1pMYI4-0001UT-Da for submit <at> debbugs.gnu.org; Mon, 30 Jan 2023 12:50:00 -0500 Received: from eggs.gnu.org ([209.51.188.92]:48006) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <eliz@HIDDEN>) id 1pMYI2-0001UE-4L for 60953 <at> debbugs.gnu.org; Mon, 30 Jan 2023 12:49:58 -0500 Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <eliz@HIDDEN>) id 1pMYHw-0004Eu-DE; Mon, 30 Jan 2023 12:49:52 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date: mime-version; bh=knZ8xvmGmarAskBzn1bopQ9SFrdn1i5UVZdsQuJ8imM=; b=kk97wdW9mmpn 8+3SHx0/IHOhtf9DIxxm+XJlmqFjK1F8a/uNJD8ys7slxibZR0OqSWha+vM/DgEtDfR3tc+pDY0GO T2JI++fLQMCHjyGkZeuG5jEZi+J9CE/i9JPDpI2lAqT6DnJorfQuUigktJsoC0WuuMQ6AHZmQ2mTY Uaa5xcZ5kJCfKEuAw2IMUXgVQvdo30BTSXcqZ0B0nF2JhcpXJl6C7az229/t5nuU5SB1znsXL7ZUL E4cjD52hcKUtk2UgHq1xFkiLVuSCG7mqbz0JZwURJcy3ZAgGdI+6sm1TiW6CwVWcy1g93DT1TKlHs hPubm2T+HPpRUYNdqcn40Q==; Received: from [87.69.77.57] (helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <eliz@HIDDEN>) id 1pMYHv-0004dt-Sk; Mon, 30 Jan 2023 12:49:52 -0500 Date: Mon, 30 Jan 2023 19:49:45 +0200 Message-Id: <83bkmf52va.fsf@HIDDEN> From: Eli Zaretskii <eliz@HIDDEN> To: Dmitry Gutov <dgutov@HIDDEN> In-Reply-To: <83e58a1b-2e4a-356a-36d8-c756ff105b62@HIDDEN> (message from Dmitry Gutov on Mon, 30 Jan 2023 19:15:07 +0200) Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems 
inefficient References: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@HIDDEN> <04729838-b7d4-8a08-2b71-12536a28aebb@HIDDEN> <AB9CD94C-CB3C-4D84-B4AA-22EDC206EB12@HIDDEN> <b1dc100e-1ee9-dc89-00b6-25bb7864e90b@HIDDEN> <83wn5ag4nc.fsf@HIDDEN> <01b5d074-fb12-6b1f-cbfb-5e759833b854@HIDDEN> <838rhpg57n.fsf@HIDDEN> <5026D975-983F-4D18-8690-BE139C92825D@HIDDEN> <83pmb1emxi.fsf@HIDDEN> <6f318afc-ca71-8b7e-c822-52e6635b5718@HIDDEN> <83sffxcfxw.fsf@HIDDEN> <d8cf01fa-d829-fe83-0bbf-4a5f1965a107@HIDDEN> <83pmb1cbg5.fsf@HIDDEN> <2da844d3-ea31-289e-2821-aa174e365ffd@HIDDEN> <6784f9e7-844b-374d-2a1e-8a61cebe0d7e@HIDDEN> <83zga03yne.fsf@HIDDEN> <373a575f-c683-1581-c3e6-502e9897fb04@HIDDEN> <83mt603vrc.fsf@HIDDEN> <83e58a1b-2e4a-356a-36d8-c756ff105b62@HIDDEN> X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 60953 Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org X-BeenThere: debbugs-submit <at> debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe> List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/> List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help> List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe> Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org> X-Spam-Score: -3.3 (---) > Date: Mon, 30 Jan 2023 19:15:07 +0200 > Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org > From: Dmitry Gutov <dgutov@HIDDEN> > > > fast_looking_at already does an anchored match, so I'm not sure I > > follow. I don't even understand why you need th \` part, when the > > match will either always start from the first position or fail. 
>
> The regexp might include the anchors, or it might not.
>
> It might also use a different anchor like ^ or $ or \b.

OK, but it always goes only forward, so narrowing to the beginning
shouldn't be necessary. Right? And you can use the LIMIT argument to
limit how far it goes forward, right? So once again, why narrow?

> > And for \', just compare the length of the match returned by
> > fast_looking_at with the length of the text.
>
> This seems to work, i.e. even when before "carpet",
>
>    (and (looking-at (regexp-opt '("car" "cardigan" "carpet")))
>         (match-string 0))
>
> returns the full match. I was expecting that it could return just "car"
> -- not sure why it doesn't stop there.

Because regex search is greedy?

> One possible alternative, I suppose, would be to create a raw pointer to
> a part of the buffer text and call re_search directly specifying the
> known length of the node in bytes. If buffer text is one contiguous
> region in memory, that is.

It isn't, though: there's the gap. Which is why doing this is not
recommended; instead, use something like search_buffer_re, which
already handles this complication for you. (Except that
search_buffer_re is a static function, so only code in search.c can
use it. So you'd need to make it non-static.)
Full text available.Received: (at 60953) by debbugs.gnu.org; 30 Jan 2023 17:15:18 +0000 From debbugs-submit-bounces <at> debbugs.gnu.org Mon Jan 30 12:15:18 2023 Received: from localhost ([127.0.0.1]:50234 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>) id 1pMXkT-0006Ue-SC for submit <at> debbugs.gnu.org; Mon, 30 Jan 2023 12:15:18 -0500 Received: from mail-wm1-f46.google.com ([209.85.128.46]:54845) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <raaahh@HIDDEN>) id 1pMXkS-0006UP-05 for 60953 <at> debbugs.gnu.org; Mon, 30 Jan 2023 12:15:16 -0500 Received: by mail-wm1-f46.google.com with SMTP id n13so1516093wmr.4 for <60953 <at> debbugs.gnu.org>; Mon, 30 Jan 2023 09:15:15 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :sender:from:to:cc:subject:date:message-id:reply-to; bh=c6eoin3lV+OngQFheRXNZDPsJeRwUEn5wHklsXe4LZ4=; b=nmywdIpp4P+pUB8BecJYZYlEuDMVSYikn4RjdZUB+TgKtNV53TBBl0mLaxTcS5ZEIo RuEfDJr2+ua0evUISrWBc3m6nhVOzmdJL6h0gBOQoPp7ud5Ss3nGIrnPNVH3R7dH9eXQ XrQ40Qy6H0WMEYjgRnTIpBux8fTXqhd3Dii0OngNUg81SMtj8yVaw20OTX9ryccSSUmz 5QO1w9e/Ib2zaka2LNJwli9pK8BVvX6rxA7UiU0uZoXIbOTDvK0JwE852fErZfApB4jZ Wv6tmZFQHlw9Ez1ccK8DZmvKCZfuE32oedsyhFcU2Gm1fLULa265RfTFvN0NN7daOul3 Vrhg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :sender:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=c6eoin3lV+OngQFheRXNZDPsJeRwUEn5wHklsXe4LZ4=; b=pIc+QCzMqoRqm+ThtVEMrTA5rLEIJ0iJ81QxWqO76PHCZLekO79yRXDpLafB3d9+0x 6VzAauIeRFHomwlowsMsC1EAEJOmNCgvDxcGMdvm0MixhY8GP3NTqwBTiJfehMRDPuJt 3d77HrmfCSP3fmIRlraDDFx97fCErckbT3AKe7I/1co9+m73M7l/Zg8cqDUQyx5i0sFR 
nAj1FAbTTLM4cfxJLhU396agLrmrU+KG/cL35mUzjn9EhqaWS4fu4RtjcjRn9+cQdYXW G3pZAiClOnt14znpPqmHM9Pemg32FgnjAOq4AeB/poPCky+tZGc6Ykzvufk2rfzeMlSr K2sw== X-Gm-Message-State: AFqh2krHIloKxu24ewTRwZRGHVKwvdcDIQgqq/5ly/zMZtABtRlsFZ0k KErZZLO/KaDBQY/BslaJHxQ= X-Google-Smtp-Source: AMrXdXviseF0baJfN+Rpk9lF5nN5SMeATLkpJIx/37COD4U5YRTxEnHzlz4CqGMrlLP56uZrnLZ1sQ== X-Received: by 2002:a05:600c:4f83:b0:3db:eab:a600 with SMTP id n3-20020a05600c4f8300b003db0eaba600mr45974749wmq.7.1675098909912; Mon, 30 Jan 2023 09:15:09 -0800 (PST) Received: from [192.168.0.2] ([46.251.119.176]) by smtp.googlemail.com with ESMTPSA id fl22-20020a05600c0b9600b003d1e3b1624dsm17650099wmb.2.2023.01.30.09.15.08 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 30 Jan 2023 09:15:09 -0800 (PST) Message-ID: <83e58a1b-2e4a-356a-36d8-c756ff105b62@HIDDEN> Date: Mon, 30 Jan 2023 19:15:07 +0200 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.4.2 Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient Content-Language: en-US To: Eli Zaretskii <eliz@HIDDEN> References: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@HIDDEN> <04729838-b7d4-8a08-2b71-12536a28aebb@HIDDEN> <AB9CD94C-CB3C-4D84-B4AA-22EDC206EB12@HIDDEN> <b1dc100e-1ee9-dc89-00b6-25bb7864e90b@HIDDEN> <83wn5ag4nc.fsf@HIDDEN> <01b5d074-fb12-6b1f-cbfb-5e759833b854@HIDDEN> <838rhpg57n.fsf@HIDDEN> <5026D975-983F-4D18-8690-BE139C92825D@HIDDEN> <83pmb1emxi.fsf@HIDDEN> <6f318afc-ca71-8b7e-c822-52e6635b5718@HIDDEN> <83sffxcfxw.fsf@HIDDEN> <d8cf01fa-d829-fe83-0bbf-4a5f1965a107@HIDDEN> <83pmb1cbg5.fsf@HIDDEN> <2da844d3-ea31-289e-2821-aa174e365ffd@HIDDEN> <6784f9e7-844b-374d-2a1e-8a61cebe0d7e@HIDDEN> <83zga03yne.fsf@HIDDEN> <373a575f-c683-1581-c3e6-502e9897fb04@HIDDEN> <83mt603vrc.fsf@HIDDEN> From: Dmitry Gutov <dgutov@HIDDEN> In-Reply-To: <83mt603vrc.fsf@HIDDEN> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: 
0.1 (/) X-Debbugs-Envelope-To: 60953 Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org X-BeenThere: debbugs-submit <at> debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe> List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/> List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help> List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe> Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org> X-Spam-Score: -1.9 (-)

On 30/01/2023 17:08, Eli Zaretskii wrote:
>> Date: Mon, 30 Jan 2023 16:47:01 +0200
>> Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
>> From: Dmitry Gutov <dgutov@HIDDEN>
>>
>> On 30/01/2023 16:06, Eli Zaretskii wrote:
>>
>>> But why do you need to narrow there? fast_looking_at will not go
>>> beyond end_pos/end_byte anyway, there's no need to restrict it.
>> The reason for that is to be able to support the \` and \' markers in
>> REGEXP. I haven't found any alternative approach that doesn't call
>> 'substring'.
> fast_looking_at already does an anchored match, so I'm not sure I
> follow. I don't even understand why you need the \` part, when the
> match will either always start from the first position or fail.

The regexp might include the anchors, or it might not.

It might also use a different anchor like ^ or $ or \b.

See these examples from the documentation:

   ((_) @bob (#match \"^B.b$\" @bob))

   '((
      (compound_expression :anchor (_) @@first (_) :* @@rest)
      (:match "love" @@first)
      ))

> And for \', just compare the length of the match returned by
> fast_looking_at with the length of the text.

This seems to work, i.e. even when point is before "carpet",

   (and (looking-at (regexp-opt '("car" "cardigan" "carpet")))
        (match-string 0))

returns the full match. I was expecting that it could return just "car"
-- not sure why it doesn't stop there.

But again, to find out whether we need to use the end anchor at all,
we'd have to parse the regexp, remove the actual anchor before calling
fast_looking_at, and then add the above check.

One possible alternative, I suppose, would be to create a raw pointer to
a part of the buffer text and call re_search directly specifying the
known length of the node in bytes. If buffer text is one contiguous
region in memory, that is.

This way we would test the regexp against a string (not a buffer), but
without creating a separate string object.
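The "carpet" result can be contrasted with a hand-written alternation. This is a small sketch: my-match-at-start is a hypothetical helper, and the comparison relies on the regexp-opt docstring's statement that, without KEEP-ORDER, the returned regexp matches the longest string possible.

```elisp
(defun my-match-at-start (regexp text)
  "Return the text REGEXP matches at the start of TEXT, or nil."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (and (looking-at regexp)
         (match-string 0))))

(list
 ;; The optimized regexp itself; its exact form may vary between versions.
 (regexp-opt '("car" "cardigan" "carpet"))
 ;; regexp-opt output: the longest of the listed strings is matched.
 (my-match-at-start (regexp-opt '("car" "cardigan" "carpet")) "carpet")
 ;; Plain alternation: the first alternative that matches wins.
 (my-match-at-start "car\\|cardigan\\|carpet" "carpet"))
```

The second element should be "carpet" and the third "car", which suggests the full match comes from how regexp-opt builds the regexp rather than from alternation as such.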
Full text available.Received: (at 60953) by debbugs.gnu.org; 30 Jan 2023 15:09:14 +0000 From debbugs-submit-bounces <at> debbugs.gnu.org Mon Jan 30 10:09:13 2023 Received: from localhost ([127.0.0.1]:50098 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>) id 1pMVmT-0006j4-FB for submit <at> debbugs.gnu.org; Mon, 30 Jan 2023 10:09:13 -0500 Received: from eggs.gnu.org ([209.51.188.92]:53518) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <eliz@HIDDEN>) id 1pMVmS-0006ir-0E for 60953 <at> debbugs.gnu.org; Mon, 30 Jan 2023 10:09:12 -0500 Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <eliz@HIDDEN>) id 1pMVmM-0003Ss-Kk; Mon, 30 Jan 2023 10:09:06 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date: mime-version; bh=E1PYpnZZSoslwBDvomEqejc+Qjck5tg9K1LIHs0kJ9M=; b=kIy12gUkRhwp ZFAAMDaWa6WRf2nAui/HnAvX+BbwWsxWb6ZLoPyIekshTWM+DRHOD3obWx8NIdOYzvZVh3nYY9wbW i+Yz/5E/9US+o2LDCEnN5KIiN9KoiC1Cc4q141s7/6MgSBW2p24bqnIvgQQu/lvGcWmBnIOK7c5kX dwIFDRjfYXlfQ3knmerzzzaifswNRoafC0mQ8ghft8TK4FXfsI1fA40gmNxzOOPSyav+XB8OxOsk6 wiXKHN8W1CON/7DpSGtr708yOHTR/74HTO5lIv9v8BG18fya45FbOF9nNkvfDc5akRzT0GFbZ9c1o Mea/uLi4VDuu3FC1+pFtxA==; Received: from [87.69.77.57] (helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <eliz@HIDDEN>) id 1pMVmA-00089b-GZ; Mon, 30 Jan 2023 10:09:06 -0500 Date: Mon, 30 Jan 2023 17:08:39 +0200 Message-Id: <83mt603vrc.fsf@HIDDEN> From: Eli Zaretskii <eliz@HIDDEN> To: Dmitry Gutov <dgutov@HIDDEN> In-Reply-To: <373a575f-c683-1581-c3e6-502e9897fb04@HIDDEN> (message from Dmitry Gutov on Mon, 30 Jan 2023 16:47:01 +0200) Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems 
inefficient References: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@HIDDEN> <04729838-b7d4-8a08-2b71-12536a28aebb@HIDDEN> <AB9CD94C-CB3C-4D84-B4AA-22EDC206EB12@HIDDEN> <b1dc100e-1ee9-dc89-00b6-25bb7864e90b@HIDDEN> <83wn5ag4nc.fsf@HIDDEN> <01b5d074-fb12-6b1f-cbfb-5e759833b854@HIDDEN> <838rhpg57n.fsf@HIDDEN> <5026D975-983F-4D18-8690-BE139C92825D@HIDDEN> <83pmb1emxi.fsf@HIDDEN> <6f318afc-ca71-8b7e-c822-52e6635b5718@HIDDEN> <83sffxcfxw.fsf@HIDDEN> <d8cf01fa-d829-fe83-0bbf-4a5f1965a107@HIDDEN> <83pmb1cbg5.fsf@HIDDEN> <2da844d3-ea31-289e-2821-aa174e365ffd@HIDDEN> <6784f9e7-844b-374d-2a1e-8a61cebe0d7e@HIDDEN> <83zga03yne.fsf@HIDDEN> <373a575f-c683-1581-c3e6-502e9897fb04@HIDDEN> X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 60953 Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org X-BeenThere: debbugs-submit <at> debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe> List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/> List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help> List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe> Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org> X-Spam-Score: -3.3 (---) > Date: Mon, 30 Jan 2023 16:47:01 +0200 > Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org > From: Dmitry Gutov <dgutov@HIDDEN> > > On 30/01/2023 16:06, Eli Zaretskii wrote: > > > But why do you need to narrow there? fast_looking_at will not go > > beyond end_pos/end_byte anyway, there's no need to restrict it. > > The reason for that is to be able to support the \` and \' markers in > REGEXP. 
I haven't found any alternative approach that doesn't call
> 'substring'.

fast_looking_at already does an anchored match, so I'm not sure I
follow. I don't even understand why you need the \` part, when the
match will either always start from the first position or fail. And
for \', just compare the length of the match returned by
fast_looking_at with the length of the text. What am I missing?

> > And here I suggest an additional optimization, since you already know
> > the byte positions:
>
> No real objection from me if you're sure, but I tried that, and the
> benchmarks showed no difference.

Sheer luck: you force SET_BUF_BEGV etc. to call buf_charpos_to_bytepos
for no reason: you already have the byte positions in hand.

> (I suppose we also could save the previous values of BEGV_BYTE and
> ZV_BYTE to use when restoring.)

Yes.

> Either way, we would use this *instead of* Fnarrow_to_region, right?

Fnarrow_to_region does little else. But I need to understand first
why you need to change the restriction.
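One case where comparing the match length with the text length might differ from a real \' anchor is a regexp whose first alternative is a prefix of a later one. The following is a sketch at the Lisp level, assuming looking-at behaves like fast_looking_at here; both function names are invented for the illustration.

```elisp
(defun my-whole-span-by-length-p (regexp beg end)
  "Non-nil if REGEXP, matched at BEG without narrowing, ends exactly at END."
  (save-excursion
    (goto-char beg)
    (and (looking-at regexp)
         (= (match-end 0) end))))

(defun my-whole-span-by-narrowing-p (regexp beg end)
  "Non-nil if REGEXP plus an end anchor matches after narrowing to BEG..END."
  (save-excursion
    (save-restriction
      (narrow-to-region beg end)
      (goto-char (point-min))
      ;; The end anchor forces the matcher to backtrack into later
      ;; alternatives until the match reaches the end of the region.
      (looking-at (concat "\\(?:" regexp "\\)\\'")))))

(with-temp-buffer
  (insert "carpet bag")
  ;; "carpet" occupies positions 1..7 (end exclusive).
  (list (my-whole-span-by-length-p "car\\|carpet" 1 7)
        (my-whole-span-by-narrowing-p "car\\|carpet" 1 7)))
```

Here the length comparison yields nil, because the unanchored match stops at "car", while the narrowed, anchored match yields t. Whether :match regexps of this shape occur in practice is a separate question.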
Full text available.

From: Dmitry Gutov <dgutov@HIDDEN>
To: Eli Zaretskii <eliz@HIDDEN>
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Date: Mon, 30 Jan 2023 16:47:01 +0200
Message-ID: <373a575f-c683-1581-c3e6-502e9897fb04@HIDDEN>
In-Reply-To: <83zga03yne.fsf@HIDDEN>

On 30/01/2023 16:06, Eli Zaretskii wrote:
> Our style is to leave a blank between ASET and the left parenthesis.

Sure, thanks.

> Mmm... no.  You should use Fnarrow_to_region, I think.

Last time I tried that (from Lisp, with save-restriction), I think the
result was measurably slower. I can try it again from C, though.

> But why do you need to narrow there?  fast_looking_at will not go
> beyond end_pos/end_byte anyway, there's no need to restrict it.

The reason for that is to be able to support the \` and \' markers in
REGEXP. I haven't found any alternative approach that doesn't call
'substring'.

> And here I suggest an additional optimization, since you already know
> the byte positions:

No real objection from me if you're sure, but I tried that, and the
benchmarks showed no difference. So I submitted the shorter version.

(I suppose we also could save the previous values of BEGV_BYTE and
ZV_BYTE to use when restoring.)

Either way, we would use this *instead of* Fnarrow_to_region, right?
Because it only accepts two arguments.
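The point about \` and \' can be illustrated with a toy model. This is only a sketch, not Emacs code: `struct toy_buffer` and `toy_match_anchored` are hypothetical names, and the function merely imitates what a regexp of the form \`WORD\' would do, under the assumption that both anchors refer to the ends of the accessible region.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy buffer: only text[begv, zv) is accessible, loosely
   mirroring what BEGV and ZV mean for a real buffer.  */
struct toy_buffer
{
  const char *text;
  size_t begv;
  size_t zv;
};

/* Toy stand-in for matching the regexp \`WORD\' at POS: it succeeds
   only when POS is the start of the accessible region, WORD occurs at
   POS, and the match ends exactly at the end of that region.  */
static bool
toy_match_anchored (const struct toy_buffer *b, size_t pos, const char *word)
{
  size_t len = strlen (word);
  assert (b->begv <= b->zv && b->zv <= strlen (b->text));
  return (pos == b->begv
          && pos + len == b->zv
          && memcmp (b->text + pos, word, len) == 0);
}
```

With the text "x = nil;" and a node covering "nil" at offsets 4 to 7, the toy match fails while the whole text is accessible and succeeds once the bounds are narrowed to the node, which is the behaviour the narrowing in the patch is meant to provide.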
bug-gnu-emacs@HIDDEN
:bug#60953
; Package emacs
.
Full text available.

From: Eli Zaretskii <eliz@HIDDEN>
To: Dmitry Gutov <dgutov@HIDDEN>
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Date: Mon, 30 Jan 2023 16:06:13 +0200
Message-Id: <83zga03yne.fsf@HIDDEN>
In-Reply-To: <6784f9e7-844b-374d-2a1e-8a61cebe0d7e@HIDDEN> (message from Dmitry Gutov on Mon, 30 Jan 2023 02:49:47 +0200)

> Date: Mon, 30 Jan 2023 02:49:47 +0200
> From: Dmitry Gutov <dgutov@HIDDEN>
> Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
>
> Code review welcome.

See some below.

> Is applying (and undoing) the narrowing this way legal enough? Or should
> I go through some error handlers, or ensure blocks, etc?

Mmm... no.  You should use Fnarrow_to_region, I think.

But why do you need to narrow there?  fast_looking_at will not go
beyond end_pos/end_byte anyway, there's no need to restrict it.  Or
are you thinking about widening a buffer that is already narrowed?
But if so, can we have parser data beyond the restriction?

> +      Lisp_Object predicates = AREF(predicates_table, match.pattern_index);
> +      if (EQ (predicates, Qt))
> +        {
> +          predicates = treesit_predicates_for_pattern (treesit_query, 0);
> +          ASET(predicates_table, match.pattern_index, predicates);

Our style is to leave a blank between ASET and the left parenthesis.

> +  set_buffer_internal (buffer);
> +
> +  TSNode treesit_node = XTS_NODE (node)->node;
> +  ptrdiff_t visible_beg = XTS_PARSER (XTS_NODE (node)->parser)->visible_beg;
> +  uint32_t start_byte_offset = ts_node_start_byte (treesit_node);
> +  uint32_t end_byte_offset = ts_node_end_byte (treesit_node);
> +  ptrdiff_t start_byte = visible_beg + start_byte_offset;
> +  ptrdiff_t end_byte = visible_beg + end_byte_offset;
> +  ptrdiff_t start_pos = buf_bytepos_to_charpos (buffer, start_byte);
> +  ptrdiff_t end_pos = buf_bytepos_to_charpos (buffer, end_byte);
> +  ptrdiff_t old_begv = BEGV;
> +  ptrdiff_t old_zv = ZV;

Since you switch to BUFFER, you can use BYTE_TO_CHAR, no need for
buf_bytepos_to_charpos.

> +  SET_BUF_BEGV(buffer, start_pos);
> +  SET_BUF_ZV(buffer, end_pos);

And here I suggest an additional optimization, since you already know
the byte positions:

  BEGV = start_pos;
  BEGV_BYTE = start_byte;
  ZV = end_pos;
  ZV_BYTE = end_byte;
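The suggestion to set character and byte bounds together, and to restore them afterwards, can be sketched with a self-contained toy. Nothing here is Emacs API: `struct toy_bounds`, `toy_narrow` and `toy_restore` are hypothetical names, and the four fields only loosely mirror BEGV, BEGV_BYTE, ZV and ZV_BYTE.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy buffer bounds carrying both character and byte
   positions, loosely mirroring BEGV/BEGV_BYTE and ZV/ZV_BYTE.  */
struct toy_bounds
{
  ptrdiff_t begv, begv_byte;
  ptrdiff_t zv, zv_byte;
};

/* Narrow *CUR to the given positions and return the previous bounds,
   so the caller can restore all four fields later.  */
static struct toy_bounds
toy_narrow (struct toy_bounds *cur,
            ptrdiff_t start_pos, ptrdiff_t start_byte,
            ptrdiff_t end_pos, ptrdiff_t end_byte)
{
  struct toy_bounds saved = *cur;
  assert (start_pos <= end_pos && start_byte <= end_byte);
  cur->begv = start_pos;
  cur->begv_byte = start_byte;
  cur->zv = end_pos;
  cur->zv_byte = end_byte;
  return saved;
}

/* Put back the bounds that toy_narrow returned.  */
static void
toy_restore (struct toy_bounds *cur, struct toy_bounds saved)
{
  *cur = saved;
}
```

Saving the whole structure means the byte bounds are restored along with the character bounds, so no byte position has to be recomputed from a character position on the way out.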
bug-gnu-emacs@HIDDEN
:bug#60953
; Package emacs
.
Full text available.

From: Dmitry Gutov <dgutov@HIDDEN>
To: Eli Zaretskii <eliz@HIDDEN>
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Date: Mon, 30 Jan 2023 02:49:47 +0200
Message-ID: <6784f9e7-844b-374d-2a1e-8a61cebe0d7e@HIDDEN>
In-Reply-To: <2da844d3-ea31-289e-2821-aa174e365ffd@HIDDEN>

On 26/01/2023 23:26, Dmitry Gutov wrote:
>>>> (But I thought you concluded that GC alone cannot explain the
>>>> difference in performance?)
>>> I'm inclined to think the difference is related to copying of the regexp
>>> string, but whether the time is spent in actually copying it, or
>>> scanning its copies for garbage later, it was harder to say. Seems like
>>> it's the latter, though.
>> If we can avoid the copying, I think it's desirable in any case.  They
>> are constant regexps, aren't they?
>
> Yes, but how?
>
> Memoization is one possible step, but then we only avoid re-creating the
> predicate structures for each match. We still send a pretty large query
> and, apparently, get it back..? Might be some copying involved there.
>
> TBH the moderate success the memoization patch shows has me stumped.

Okay, I have cleaned up both experiments that I had. And when combined,
they make the :match approach a little faster than the :pred one. I'm
still not sure why the difference is so little, given that the :pred one
has Lisp funcalls and extra allocation, and :match does not.

Still, if nobody has any better ideas, I suggest we install both of
these changes now. They are attached in separate patches.

memoize_vector.diff improves the performance of both cases. For :pred,
it's roughly 10%; for :match, it's more.

treesit_predicate_match.diff improves the performance of the latter,
though only a little: maybe 3-4%.

Code review welcome.

Is applying (and undoing) the narrowing this way legal enough? Or should
I go through some error handlers, or ensure blocks, etc?

Speaking of perf, the profile looks like this now (very similar to what
it was before the added rule):

   17.25%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_current_status
   10.93%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_next_sibling
    9.89%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_first_child
    9.01%  emacs  emacs                  [.] process_mark_stack
    4.80%  emacs  libtree-sitter.so.0.0  [.] ts_node_start_point
    3.84%  emacs  emacs                  [.] re_match_2_internal
    3.82%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_parent_node
    3.06%  emacs  libtree-sitter.so.0.0  [.] ts_language_symbol_metadata

[memoize_vector.diff (text/x-patch, attachment)]

diff --git a/src/treesit.c b/src/treesit.c
index b210ec0923a..71aff3202ae 100644
--- a/src/treesit.c
+++ b/src/treesit.c
@@ -2720,8 +2744,10 @@ DEFUN ("treesit-query-capture",
      every for loop and nconc it to RESULT every time.  That is indeed
      the initial implementation in which Yoav found nconc being the
      bottleneck (98.4% of the running time spent on nconc).  */
+  uint32_t patterns_count = ts_query_pattern_count(treesit_query);
   Lisp_Object result = Qnil;
   Lisp_Object prev_result = result;
+  Lisp_Object predicates_table = make_vector(patterns_count, Qt);
   while (ts_query_cursor_next_match (cursor, &match))
     {
       /* Record the checkpoint that we may roll back to.  */
@@ -2750,9 +2776,12 @@ DEFUN ("treesit-query-capture",
           result = Fcons (cap, result);
         }
       /* Get predicates.  */
-      Lisp_Object predicates
-        = treesit_predicates_for_pattern (treesit_query,
-                                          match.pattern_index);
+      Lisp_Object predicates = AREF(predicates_table, match.pattern_index);
+      if (EQ (predicates, Qt))
+        {
+          predicates = treesit_predicates_for_pattern (treesit_query, 0);
+          ASET(predicates_table, match.pattern_index, predicates);
+        }
 
       /* captures_lisp = Fnreverse (captures_lisp); */
       struct capture_range captures_range = { result, prev_result };

[treesit_predicate_match.diff (text/x-patch, attachment)]

diff --git a/src/treesit.c b/src/treesit.c
index b210ec0923a..3630db42f5e 100644
--- a/src/treesit.c
+++ b/src/treesit.c
@@ -2466,10 +2466,34 @@ treesit_predicate_match (Lisp_Object args, struct capture_range captures)
               build_string ("The second argument to `match' should "
                             "be a capture name, not a string"));
 
-  Lisp_Object text = treesit_predicate_capture_name_to_text (capture_name,
-                                                             captures);
+  Lisp_Object node = treesit_predicate_capture_name_to_node (capture_name, captures);
 
-  if (fast_string_match (regexp, text) >= 0)
+  struct buffer *old_buffer = current_buffer;
+  struct buffer *buffer = XBUFFER (XTS_PARSER (XTS_NODE (node)->parser)->buffer);
+  set_buffer_internal (buffer);
+
+  TSNode treesit_node = XTS_NODE (node)->node;
+  ptrdiff_t visible_beg = XTS_PARSER (XTS_NODE (node)->parser)->visible_beg;
+  uint32_t start_byte_offset = ts_node_start_byte (treesit_node);
+  uint32_t end_byte_offset = ts_node_end_byte (treesit_node);
+  ptrdiff_t start_byte = visible_beg + start_byte_offset;
+  ptrdiff_t end_byte = visible_beg + end_byte_offset;
+  ptrdiff_t start_pos = buf_bytepos_to_charpos (buffer, start_byte);
+  ptrdiff_t end_pos = buf_bytepos_to_charpos (buffer, end_byte);
+  ptrdiff_t old_begv = BEGV;
+  ptrdiff_t old_zv = ZV;
+
+  SET_BUF_BEGV(buffer, start_pos);
+  SET_BUF_ZV(buffer, end_pos);
+
+  ptrdiff_t val = fast_looking_at (regexp, start_pos, start_byte, end_pos, end_byte, Qnil);
+
+  SET_BUF_BEGV(buffer, old_begv);
+  SET_BUF_ZV(buffer, old_zv);
+
+  set_buffer_internal (old_buffer);
+
+  if (val > 0)
     return true;
   else
     return false;
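The idea behind memoize_vector.diff, a table with one slot per pattern that starts out holding a "not yet computed" marker, can be sketched in standalone C. This is an illustration under stated assumptions rather than the patch itself: `struct toy_cache`, `toy_expensive_lookup` and `TOY_NOT_COMPUTED` are hypothetical names, plain integers stand in for the Lisp predicate lists, and the sketch keys both the computation and the slot on the match's own pattern index.

```c
#include <assert.h>
#include <stdlib.h>

/* Counts how often the expensive path runs, for demonstration only.  */
static int toy_compute_calls;

/* Hypothetical stand-in for rebuilding a pattern's predicate list.  */
static int
toy_expensive_lookup (unsigned pattern_index)
{
  toy_compute_calls++;
  return (int) pattern_index * 10;
}

/* Sentinel meaning "slot not filled yet"; the patch uses Qt for this.  */
#define TOY_NOT_COMPUTED (-1)

struct toy_cache
{
  int *slots;
  unsigned count;
};

/* Allocate one slot per pattern, all marked as not computed.  */
static struct toy_cache
toy_cache_create (unsigned count)
{
  struct toy_cache c = { malloc (count * sizeof (int)), count };
  if (count > 0 && c.slots == NULL)
    abort ();
  for (unsigned i = 0; i < count; i++)
    c.slots[i] = TOY_NOT_COMPUTED;
  return c;
}

/* Return the cached value for PATTERN_INDEX, computing it on first use.  */
static int
toy_cache_get (struct toy_cache *c, unsigned pattern_index)
{
  assert (pattern_index < c->count);
  if (c->slots[pattern_index] == TOY_NOT_COMPUTED)
    c->slots[pattern_index] = toy_expensive_lookup (pattern_index);
  return c->slots[pattern_index];
}

static void
toy_cache_destroy (struct toy_cache *c)
{
  free (c->slots);
  c->slots = NULL;
  c->count = 0;
}
```

A fixed-size array indexed by pattern number keeps the lookup to a bounds check and one comparison, which may be why it compared well against the hash-table variant mentioned elsewhere in the thread.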
bug-gnu-emacs@HIDDEN
:bug#60953
; Package emacs
.
Full text available.

From: Dmitry Gutov <dgutov@HIDDEN>
To: Eli Zaretskii <eliz@HIDDEN>
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Date: Thu, 26 Jan 2023 23:26:54 +0200
Message-ID: <2da844d3-ea31-289e-2821-aa174e365ffd@HIDDEN>
In-Reply-To: <83pmb1cbg5.fsf@HIDDEN>

On 26/01/2023 22:01, Eli Zaretskii wrote:
>> Date: Thu, 26 Jan 2023 21:35:55 +0200
>> Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
>> From: Dmitry Gutov <dgutov@HIDDEN>
>>
>>> If you are saying that GC is responsible, then running the benchmark
>>> with gc-cons-threshold set to most-positive-fixnum should produce a
>>> more interesting profile and perhaps a more interesting comparison.
>> That really helps:
>>
>> (benchmark-run 1000 (progn (font-lock-mode -1) (font-lock-mode 1) (let
>> (treesit--font-lock-fast-mode) (font-lock-ensure))))
>>
>> => (16.078430587 251 5.784299419999996)
>>
>> (let ((gc-cons-threshold most-positive-fixnum)) (benchmark-run 1000
>> (progn (font-lock-mode -1) (font-lock-mode 1) (let
>> (treesit--font-lock-fast-mode) (font-lock-ensure)))))
>>
>> => (10.369389725 0 0.0)
>>
>> Do you want a perf profile for the latter? It might not be very useful.
> I'd be interested in comparing the profiles of the two techniques, the
> :pred and the :match, with GC disabled like that.

Curiously, :pred is still faster, but the difference is much smaller:

pred:

(9.212951344 0 0.0)

   18.23%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_current_status
   11.61%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_next_sibling
   11.43%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_first_child
    5.00%  emacs  libtree-sitter.so.0.0  [.] ts_node_start_point
    4.02%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_parent_node
    3.97%  emacs  emacs                  [.] re_match_2_internal
    3.36%  emacs  libtree-sitter.so.0.0  [.] ts_language_symbol_metadata
    2.45%  emacs  emacs                  [.] parse_str_as_multibyte
    1.95%  emacs  emacs                  [.] exec_byte_code
    1.66%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_current_node
    1.66%  emacs  libtree-sitter.so.0.0  [.] ts_node_end_point
    1.30%  emacs  emacs                  [.] allocate_vectorlike
    1.24%  emacs  emacs                  [.] find_interval

match:

(10.059083317 0 0.0)

   19.23%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_current_status
   12.41%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_next_sibling
   11.22%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_first_child
    5.21%  emacs  libtree-sitter.so.0.0  [.] ts_node_start_point
    4.22%  emacs  emacs                  [.] re_match_2_internal
    3.97%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_parent_node
    3.64%  emacs  libtree-sitter.so.0.0  [.] ts_language_symbol_metadata
    2.36%  emacs  emacs                  [.] exec_byte_code
    1.66%  emacs  libtree-sitter.so.0.0  [.] ts_node_end_point
    1.62%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_current_node
    1.34%  emacs  libtree-sitter.so.0.0  [.] ts_node_end_byte
    1.28%  emacs  emacs                  [.] allocate_vectorlike
    0.95%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_parent

This is with the current code and disabled GC. No additional changes to
treesit.c.

>>> (But I thought you concluded that GC alone cannot explain the
>>> difference in performance?)
>> I'm inclined to think the difference is related to copying of the regexp
>> string, but whether the time is spent in actually copying it, or
>> scanning its copies for garbage later, it was harder to say. Seems like
>> it's the latter, though.
> If we can avoid the copying, I think it's desirable in any case.  They
> are constant regexps, aren't they?

Yes, but how?

Memoization is one possible step, but then we only avoid re-creating the
predicate structures for each match. We still send a pretty large query
and, apparently, get it back..? Might be some copying involved there.

TBH the moderate success the memoization patch shows has me stumped.
bug-gnu-emacs@HIDDEN
:bug#60953
; Package emacs
.
From: Dmitry Gutov <dgutov@HIDDEN>
To: Eli Zaretskii <eliz@HIDDEN>
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Date: Thu, 26 Jan 2023 22:46:30 +0200
Message-ID: <1d7aaf56-6130-c0f0-446f-4bc2c5cafa28@HIDDEN>
In-Reply-To: <31559c1f-1a12-691d-3d03-f566019a0aab@HIDDEN>

On 26/01/2023 20:07, Dmitry Gutov wrote:
> One could hope to avoid recreating the list of predicates on every
> match, but that seems to be a limitation of the TS API:
> ts_query_predicates_for_pattern requires a second argument,
> match.pattern_index. Maybe we could memoize that, though?

Speaking of memoization, here is a POC patch. It's a definite
improvement: with the attached :match almost reaches the performance of
:pred. Not sure why it's still not faster, though.

(I also tried a more comprehensive memoization using a hash table, the
resulting performance was slightly worse.)

[Attachment: memoize_simple.diff (text/x-patch)]

diff --git a/src/treesit.c b/src/treesit.c
index 917db582676..69f54976509 100644
--- a/src/treesit.c
+++ b/src/treesit.c
@@ -2722,6 +2722,7 @@ DEFUN ("treesit-query-capture",
      bottleneck (98.4% of the running time spent on nconc).  */
   Lisp_Object result = Qnil;
   Lisp_Object prev_result = result;
+  Lisp_Object predicates_for_0 = NULL;
   while (ts_query_cursor_next_match (cursor, &match))
     {
       /* Record the checkpoint that we may roll back to.  */
@@ -2750,9 +2751,18 @@ DEFUN ("treesit-query-capture",
 	  result = Fcons (cap, result);
 	}
       /* Get predicates.  */
-      Lisp_Object predicates
-	= treesit_predicates_for_pattern (treesit_query,
-					  match.pattern_index);
+      Lisp_Object predicates;
+      if (match.pattern_index == 0)
+	{
+	  if (predicates_for_0 == NULL)
+	    predicates_for_0 = treesit_predicates_for_pattern (treesit_query, 0);
+
+	  predicates = predicates_for_0;
+	}
+      else
+	{
+	  predicates = treesit_predicates_for_pattern (treesit_query, match.pattern_index);
+	}
 
       /* captures_lisp = Fnreverse (captures_lisp); */
       struct capture_range captures_range = { result, prev_result };
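Editorial sketch: the POC above caches the predicate list for pattern index 0 only. One way the same idea might extend to every pattern is a lazily filled array with one slot per pattern index. The code below is a standalone illustration, not Emacs code: `build_predicates` is a hypothetical stand-in for treesit_predicates_for_pattern, the cached values are plain heap integers rather than Lisp objects, and `count_builds` exists only to count how often the expensive step runs. A real version would presumably also have to keep any cached Lisp objects reachable for the garbage collector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Counts how many times the "expensive" builder ran.  */
static int build_count;

/* Hypothetical stand-in for treesit_predicates_for_pattern: allocates
   a dummy "predicate list" for PATTERN_INDEX and records the call.  */
static int *
build_predicates (unsigned pattern_index)
{
  int *list = malloc (sizeof *list);
  if (!list)
    abort ();
  *list = (int) pattern_index;
  build_count++;
  return list;
}

/* Walk a simulated sequence of match.pattern_index values and return
   how many predicate lists had to be built when each pattern's list
   is cached after its first use.  NPATTERNS must be greater than 0.  */
int
count_builds (const unsigned *pattern_indexes, size_t nmatches,
              unsigned npatterns)
{
  /* One slot per pattern; NULL means "not built yet".  */
  int **cache = calloc (npatterns, sizeof *cache);
  if (!cache)
    abort ();

  build_count = 0;
  for (size_t i = 0; i < nmatches; i++)
    {
      unsigned idx = pattern_indexes[i];
      assert (idx < npatterns);
      if (cache[idx] == NULL)
        cache[idx] = build_predicates (idx);
      /* A real loop would now evaluate the predicates in cache[idx].  */
    }

  for (unsigned i = 0; i < npatterns; i++)
    free (cache[i]);
  free (cache);
  return build_count;
}
```

With eight simulated matches spread over three patterns, the builder runs three times instead of eight. Whether this beats the single-slot version in practice is not something the sketch can show; the message above reports that a hash-table variant measured slightly worse.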

From: Eli Zaretskii <eliz@HIDDEN>
To: Dmitry Gutov <dgutov@HIDDEN>
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Date: Thu, 26 Jan 2023 22:01:14 +0200
Message-Id: <83pmb1cbg5.fsf@HIDDEN>
In-Reply-To: <d8cf01fa-d829-fe83-0bbf-4a5f1965a107@HIDDEN> (message from Dmitry Gutov on Thu, 26 Jan 2023 21:35:55 +0200)

> Date: Thu, 26 Jan 2023 21:35:55 +0200
> Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
> From: Dmitry Gutov <dgutov@HIDDEN>
>
> > If you are saying that GC is responsible, then running the benchmark
> > with gc-cons-threshold set to most-positive-fixnum should produce a
> > more interesting profile and perhaps a more interesting comparison.
>
> That really helps:
>
> (benchmark-run 1000 (progn (font-lock-mode -1) (font-lock-mode 1) (let
> (treesit--font-lock-fast-mode) (font-lock-ensure))))
>
> => (16.078430587 251 5.784299419999996)
>
> (let ((gc-cons-threshold most-positive-fixnum)) (benchmark-run 1000
> (progn (font-lock-mode -1) (font-lock-mode 1) (let
> (treesit--font-lock-fast-mode) (font-lock-ensure)))))
>
> => (10.369389725 0 0.0)
>
> Do you want a perf profile for the latter? It might not be very useful.

I'd be interested in comparing the profiles of the two techniques, the
:pred and the :match, with GC disabled like that.

> > (But I thought you concluded that GC alone cannot explain the
> > difference in performance?)
>
> I'm inclined to think the difference is related to copying of the regexp
> string, but whether the time is spent in actually copying it, or
> scanning its copies for garbage later, it was harder to say. Seems like
> it's the latter, though.

If we can avoid the copying, I think it's desirable in any case. They
are constant regexps, aren't they?

From: Dmitry Gutov <dgutov@HIDDEN>
To: Eli Zaretskii <eliz@HIDDEN>
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Date: Thu, 26 Jan 2023 21:35:55 +0200
Message-ID: <d8cf01fa-d829-fe83-0bbf-4a5f1965a107@HIDDEN>
In-Reply-To: <83sffxcfxw.fsf@HIDDEN>

On 26/01/2023 20:24, Eli Zaretskii wrote:
>> Date: Thu, 26 Jan 2023 19:15:51 +0200
>> Cc: 60953 <at> debbugs.gnu.org
>> From: Dmitry Gutov <dgutov@HIDDEN>
>>
>> On 26/01/2023 10:10, Eli Zaretskii wrote:
>>> Perhaps Dmitry could present comparison of profiles from perf which
>>> would allow us to understand the reason(s)?
>> I believe I did that in the second message in this thread:
>> https://debbugs.gnu.org/cgi/bugreport.cgi?bug=60953#8
>>
>> To quote the specific profiles, it's
>>
>>   15.30%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_current_status
>>   14.92%  emacs  emacs                  [.] process_mark_stack
>>    9.75%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_next_sibling
>>    8.90%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_first_child
>>    3.87%  emacs  libtree-sitter.so.0.0  [.] ts_node_start_point
>>
>> for :pred vs.
>>
>>   23.72%  emacs  emacs                  [.] process_mark_stack
>>   12.33%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_current_status
>>    7.96%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_next_sibling
>>    7.38%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_first_child
>>    3.37%  emacs  libtree-sitter.so.0.0  [.] ts_node_start_point
>>
>> for :match.
>>
>> And to continue the quote:
>>
>>    Here's a significant jump in GC time which is almost the same as the
>>    difference in runtime. And all of it is spent marking?
>>
>>    I suppose if the problem is allocation of a large string (many times
>>    over), the GC could be spending a lot of time scanning through the
>>    memory. Could this be avoided by passing some substitute handle to TS,
>>    instead of the full string? E.g. some kind of reference to it in the
>>    regexp cache.
> If you are saying that GC is responsible, then running the benchmark
> with gc-cons-threshold set to most-positive-fixnum should produce a
> more interesting profile and perhaps a more interesting comparison.

That really helps:

  (benchmark-run 1000
    (progn (font-lock-mode -1)
           (font-lock-mode 1)
           (let (treesit--font-lock-fast-mode)
             (font-lock-ensure))))

  => (16.078430587 251 5.784299419999996)

  (let ((gc-cons-threshold most-positive-fixnum))
    (benchmark-run 1000
      (progn (font-lock-mode -1)
             (font-lock-mode 1)
             (let (treesit--font-lock-fast-mode)
               (font-lock-ensure)))))

  => (10.369389725 0 0.0)

Do you want a perf profile for the latter? It might not be very useful.

> (But I thought you concluded that GC alone cannot explain the
> difference in performance?)

I'm inclined to think the difference is related to copying of the regexp
string, but whether the time is spent in actually copying it, or
scanning its copies for garbage later, it was harder to say. Seems like
it's the latter, though.
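Editorial note: as a quick check on the two benchmark results in this message (benchmark-run returns elapsed seconds, number of GCs, and seconds spent in GC), the time saved by raising gc-cons-threshold can be compared with the GC time reported in the first run. The figures are rounded from the output above; the per-collection average is derived here and was not reported in the thread.

```latex
\begin{align*}
  16.078 - 10.369 &= 5.709\ \text{s} && \text{(runtime saved with GC effectively disabled)} \\
  5.709 &\approx 5.784\ \text{s} && \text{(GC time reported in the first run)} \\
  5.784 / 251 &\approx 0.023\ \text{s} && \text{(average per collection in the first run)}
\end{align*}
```

So in this benchmark nearly all of the difference between the two runs is accounted for by garbage collection.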

From: Eli Zaretskii <eliz@HIDDEN>
To: Dmitry Gutov <dgutov@HIDDEN>
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Date: Thu, 26 Jan 2023 20:24:11 +0200
Message-Id: <83sffxcfxw.fsf@HIDDEN>
In-Reply-To: <6f318afc-ca71-8b7e-c822-52e6635b5718@HIDDEN> (message from Dmitry Gutov on Thu, 26 Jan 2023 19:15:51 +0200)

> Date: Thu, 26 Jan 2023 19:15:51 +0200
> Cc: 60953 <at> debbugs.gnu.org
> From: Dmitry Gutov <dgutov@HIDDEN>
>
> On 26/01/2023 10:10, Eli Zaretskii wrote:
> > Perhaps Dmitry could present comparison of profiles from perf which
> > would allow us to understand the reason(s)?
>
> I believe I did that in the second message in this thread:
> https://debbugs.gnu.org/cgi/bugreport.cgi?bug=60953#8
>
> To quote the specific profiles, it's
>
>   15.30%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_current_status
>   14.92%  emacs  emacs                  [.] process_mark_stack
>    9.75%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_next_sibling
>    8.90%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_first_child
>    3.87%  emacs  libtree-sitter.so.0.0  [.] ts_node_start_point
>
> for :pred vs.
>
>   23.72%  emacs  emacs                  [.] process_mark_stack
>   12.33%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_current_status
>    7.96%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_next_sibling
>    7.38%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_first_child
>    3.37%  emacs  libtree-sitter.so.0.0  [.] ts_node_start_point
>
> for :match.
>
> And to continue the quote:
>
>    Here's a significant jump in GC time which is almost the same as the
>    difference in runtime. And all of it is spent marking?
>
>    I suppose if the problem is allocation of a large string (many times
>    over), the GC could be spending a lot of time scanning through the
>    memory. Could this be avoided by passing some substitute handle to TS,
>    instead of the full string? E.g. some kind of reference to it in the
>    regexp cache.

If you are saying that GC is responsible, then running the benchmark
with gc-cons-threshold set to most-positive-fixnum should produce a
more interesting profile and perhaps a more interesting comparison.

(But I thought you concluded that GC alone cannot explain the
difference in performance?)

Otherwise, the profiles are too similar to support any conclusions,
and the fact that process_mark_stack is in a prominent place doesn't
help.

From: Dmitry Gutov <dgutov@HIDDEN>
To: Eli Zaretskii <eliz@HIDDEN>
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Date: Thu, 26 Jan 2023 20:07:30 +0200
Message-ID: <31559c1f-1a12-691d-3d03-f566019a0aab@HIDDEN>
In-Reply-To: <838rhpg57n.fsf@HIDDEN>

On 26/01/2023 08:50, Eli Zaretskii wrote:
>> Date: Thu, 26 Jan 2023 01:21:08 +0200
>> Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
>> From: Dmitry Gutov <dgutov@HIDDEN>
>>
>> Thank you. Unfortunately, the performance improvement from this patch is
>> still fairly negligible.
>
> This is quite strange, since all of the approaches basically use the
> same primitives under the hood. Perhaps the reason for the slowness
> is that the code which computes the text span of a node is slow?

That code seems to be the same between the two options:
treesit_predicate_capture_name_to_text basically does the same as
treesit-node-text (except in C) after iterating through a Lisp list to
find the node. ruby-ts--builtin-method-p calls treesit-node-text.

And treesit_predicate_pred does the same iteration, so the :pred option
should just be slower, due to Lisp-related overhead. funcalls and stuff.

> Otherwise, I must be missing something here, since the rest of the
> code on the C level is basically the same, give or take some wrappers
> that should not change the overall picture.

The query object is smaller, though. That's basically my only remaining
hypothesis.

>> Switching to using :pred with function (like I did in commit
>> d94dc606a0934) which still uses buffer-substring inside is significantly
>> faster.
>
> If the performance issue is fixed, then the only aspect that we should
> perhaps try to improve is consing.

I wouldn't say it's "fixed", just improved. And :match really should be
able to be made faster than :pred, since it'll probably be used for
similar cases (where a lot/most of nodes match).

There seems to be a fair amount of consing going on inside
treesit-query-capture already: we wrap every TS node in our objects, we
turn the captured nodes into a Lisp alist, and we turn the predicates
into a list, turning the strings into "our" strings. The 'make_string'
function creates a new copy in the memory, right?

One could hope to avoid recreating the list of predicates on every
match, but that seems to be a limitation of the TS API:
ts_query_predicates_for_pattern requires a second argument,
match.pattern_index. Maybe we could memoize that, though?

In any case, that seems to explain why adding or avoiding one
buffer-substring call per match isn't moving the needle very much.

> Consing a string each time you
> need to fontify increases the GC pressure, so if there's a good way of
> avoiding that without performance degradation, we should take it. Is
> it possible to use your :pred technique in a way that doesn't need to
> produce strings from buffer text?

The only version I managed to get some (very minor) performance
improvement is this:

  (defun ruby-ts--builtin-method-p (node)
    (goto-char (treesit-node-start node))
    (let ((inhibit-changing-match-data t))
      (re-search-forward ruby-ts--builtin-methods
                         (treesit-node-end node) t)))

The improvement is like 200-300ms, whereas the difference between :match
and :pred in this benchmark is several seconds.

And if I try to bring it back to 100% correctness, to ensure that the
whole of node text is matched, I have to use narrowing (and string-start
and string-end anchors in regexp):

  (defvar ruby-ts--builtin-methods
    (format "\\`%s\\'"
            (regexp-opt (append ruby-builtin-methods-no-reqs
                                ruby-builtin-methods-with-reqs)))
    "Ruby built-in methods.")

  (defun ruby-ts--builtin-method-p (node)
    (save-restriction
      (goto-char (treesit-node-start node))
      (narrow-to-region (point) (treesit-node-end node))
      (let ((inhibit-changing-match-data t))
        (re-search-forward ruby-ts--builtin-methods nil t))))

And with that, the performance is again no better than the current
version. If I also add save-excursion, it's worse.
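Editorial sketch: the correctness point in the message above (an unanchored search can succeed on part of the node text, so the whole text has to be pinned down with anchors) can be illustrated outside Emacs. The example below uses POSIX regcomp/regexec with `^` and `$` as a swapped-in analogue of the Emacs `\`` and `\'` anchors; it is not the Emacs regexp engine, and the two-word alternation is a made-up stand-in for the real regexp-opt output.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if PATTERN (POSIX extended syntax) matches anywhere in
   TEXT.  Returns false if the pattern fails to compile.  */
bool
regex_search (const char *pattern, const char *text)
{
  regex_t re;
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return false;
  bool found = regexec (&re, text, 0, NULL, 0) == 0;
  regfree (&re);
  return found;
}
```

Here `regex_search ("puts|print", "inputs")` succeeds because "puts" occurs inside "inputs", while `regex_search ("^(puts|print)$", "inputs")` fails and `regex_search ("^(puts|print)$", "print")` succeeds. Without REG_NEWLINE, `^` and `$` match only at the start and end of the whole string, which is the property the anchored version relies on.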

From: Dmitry Gutov <dgutov@HIDDEN>
To: Eli Zaretskii <eliz@HIDDEN>, Yuan Fu <casouri@HIDDEN>
Cc: 60953 <at> debbugs.gnu.org
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Date: Thu, 26 Jan 2023 19:15:51 +0200
Message-ID: <6f318afc-ca71-8b7e-c822-52e6635b5718@HIDDEN>
In-Reply-To: <83pmb1emxi.fsf@HIDDEN>

On 26/01/2023 10:10, Eli Zaretskii wrote:
>> From: Yuan Fu <casouri@HIDDEN>
>> Date: Wed, 25 Jan 2023 23:17:25 -0800
>> Cc: Dmitry Gutov <dgutov@HIDDEN>,
>>  60953 <at> debbugs.gnu.org
>>
>>>> Switching to using :pred with function (like I did in commit
>>>> d94dc606a0934) which still uses buffer-substring inside is significantly
>>>> faster.
>>> If the performance issue is fixed, then the only aspect that we should
>>> perhaps try to improve is consing. Consing a string each time you
>>> need to fontify increases the GC pressure, so if there's a good way of
>>> avoiding that without performance degradation, we should take it. Is
>>> it possible to use your :pred technique in a way that doesn't need to
>>> produce strings from buffer text?
>> Why is :pred more performant though? They just use string-match-p. If
>> anything, the :pred predicates should be more expensive, since they
>> execute lisp functions and conses tree-sitter nodes into lisp objects.
> Yes, exactly my thoughts.
>
> Perhaps Dmitry could present comparison of profiles from perf which
> would allow us to understand the reason(s)?

I believe I did that in the second message in this thread:
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=60953#8

To quote the specific profiles, it's

  15.30%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_current_status
  14.92%  emacs  emacs                  [.] process_mark_stack
   9.75%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_next_sibling
   8.90%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_first_child
   3.87%  emacs  libtree-sitter.so.0.0  [.] ts_node_start_point

for :pred vs.

  23.72%  emacs  emacs                  [.] process_mark_stack
  12.33%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_current_status
   7.96%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_next_sibling
   7.38%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_first_child
   3.37%  emacs  libtree-sitter.so.0.0  [.] ts_node_start_point

for :match.

And to continue the quote:

   Here's a significant jump in GC time which is almost the same as the
   difference in runtime. And all of it is spent marking?

   I suppose if the problem is allocation of a large string (many times
   over), the GC could be spending a lot of time scanning through the
   memory. Could this be avoided by passing some substitute handle to TS,
   instead of the full string? E.g. some kind of reference to it in the
   regexp cache.
From: Dmitry Gutov <dgutov@HIDDEN>
To: Yuan Fu <casouri@HIDDEN>, Eli Zaretskii <eliz@HIDDEN>
Cc: 60953 <at> debbugs.gnu.org
Date: Thu, 26 Jan 2023 19:12:05 +0200
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Message-ID: <62d8ea72-3cf6-7c1a-4fce-53f5ee435215@HIDDEN>
In-Reply-To: <5026D975-983F-4D18-8690-BE139C92825D@HIDDEN>

On 26/01/2023 09:17, Yuan Fu wrote:
> If anything, the :pred predicates should be more expensive, since they execute lisp functions and conses tree-sitter nodes into lisp objects.

Doesn't the :match predicate "cons tree-sitter nodes into lisp objects"?
IIUC the list of captures is produced inside treesit-query-capture
exactly the same way before the predicates are processed -- whether they
are :pred, or :match, or a combination.

But indeed, I (and most other users) would expect :match to be faster
than :pred if the predicate does a regexp check anyway. That's the
essence of this bug report.
From: Eli Zaretskii <eliz@HIDDEN>
To: Yuan Fu <casouri@HIDDEN>
Cc: 60953 <at> debbugs.gnu.org, dgutov@HIDDEN
Date: Thu, 26 Jan 2023 10:10:17 +0200
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Message-Id: <83pmb1emxi.fsf@HIDDEN>
In-Reply-To: <5026D975-983F-4D18-8690-BE139C92825D@HIDDEN> (message from Yuan Fu on Wed, 25 Jan 2023 23:17:25 -0800)

> From: Yuan Fu <casouri@HIDDEN>
> Date: Wed, 25 Jan 2023 23:17:25 -0800
> Cc: Dmitry Gutov <dgutov@HIDDEN>,
>  60953 <at> debbugs.gnu.org
>
> >> Switching to using :pred with function (like I did in commit
> >> d94dc606a0934) which still uses buffer-substring inside is significantly
> >> faster.
> >
> > If the performance issue is fixed, then the only aspect that we should
> > perhaps try to improve is consing.  Consing a string each time you
> > need to fontify increases the GC pressure, so if there's a good way of
> > avoiding that without performance degradation, we should take it.  Is
> > it possible to use your :pred technique in a way that doesn't need to
> > produce strings from buffer text?
>
> Why is :pred more performant though? They just use string-match-p. If anything, the :pred predicates should be more expensive, since they execute lisp functions and conses tree-sitter nodes into lisp objects.

Yes, exactly my thoughts.

Perhaps Dmitry could present comparison of profiles from perf which
would allow us to understand the reason(s)?
From: Yuan Fu <casouri@HIDDEN>
To: Eli Zaretskii <eliz@HIDDEN>
Cc: 60953 <at> debbugs.gnu.org, Dmitry Gutov <dgutov@HIDDEN>
Date: Wed, 25 Jan 2023 23:17:25 -0800
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Message-Id: <5026D975-983F-4D18-8690-BE139C92825D@HIDDEN>
In-Reply-To: <838rhpg57n.fsf@HIDDEN>

> On Jan 25, 2023, at 10:50 PM, Eli Zaretskii <eliz@HIDDEN> wrote:
>
>> Date: Thu, 26 Jan 2023 01:21:08 +0200
>> Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
>> From: Dmitry Gutov <dgutov@HIDDEN>
>>
>> Thank you. Unfortunately, the performance improvement from this patch is
>> still fairly negligible.
>
> This is quite strange, since all of the approaches basically use the
> same primitives under the hood.  Perhaps the reason for the slowness
> is that the code which computes the text span of a node is slow?
> Otherwise, I must be missing something here, since the rest of the
> code on the C level is basically the same, give or take some wrappers
> that should not change the overall picture.
>
> Yuan, do you have some insights here?

Sadly, no.

>
>> Switching to using :pred with function (like I did in commit
>> d94dc606a0934) which still uses buffer-substring inside is significantly
>> faster.
>
> If the performance issue is fixed, then the only aspect that we should
> perhaps try to improve is consing.  Consing a string each time you
> need to fontify increases the GC pressure, so if there's a good way of
> avoiding that without performance degradation, we should take it.  Is
> it possible to use your :pred technique in a way that doesn't need to
> produce strings from buffer text?

Why is :pred more performant though? They just use string-match-p. If anything, the :pred predicates should be more expensive, since they execute lisp functions and conses tree-sitter nodes into lisp objects.

Yuan
From: Eli Zaretskii <eliz@HIDDEN>
To: Dmitry Gutov <dgutov@HIDDEN>
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
Date: Thu, 26 Jan 2023 08:50:04 +0200
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Message-Id: <838rhpg57n.fsf@HIDDEN>
In-Reply-To: <01b5d074-fb12-6b1f-cbfb-5e759833b854@HIDDEN> (message from Dmitry Gutov on Thu, 26 Jan 2023 01:21:08 +0200)

> Date: Thu, 26 Jan 2023 01:21:08 +0200
> Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
> From: Dmitry Gutov <dgutov@HIDDEN>
>
> Thank you. Unfortunately, the performance improvement from this patch is
> still fairly negligible.

This is quite strange, since all of the approaches basically use the
same primitives under the hood.  Perhaps the reason for the slowness
is that the code which computes the text span of a node is slow?
Otherwise, I must be missing something here, since the rest of the
code on the C level is basically the same, give or take some wrappers
that should not change the overall picture.

Yuan, do you have some insights here?

> Switching to using :pred with function (like I did in commit
> d94dc606a0934) which still uses buffer-substring inside is significantly
> faster.

If the performance issue is fixed, then the only aspect that we should
perhaps try to improve is consing.  Consing a string each time you
need to fontify increases the GC pressure, so if there's a good way of
avoiding that without performance degradation, we should take it.  Is
it possible to use your :pred technique in a way that doesn't need to
produce strings from buffer text?
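The consing concern raised in this message can be illustrated outside Emacs. The sketch below is not code from treesit.c and does not use the Emacs regexp engine; it compares a span of a text buffer against a fixed word in two ways, once by copying the span into a freshly allocated string (the analogue of producing a string from buffer text) and once in place. The function names span_equals_copying and span_equals_in_place are hypothetical, and plain equality stands in for a regexp match.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Copying approach: allocate a fresh NUL-terminated copy of
   BUF[START, END) and compare it with WORD.  Every call pays for one
   allocation, which is the cost the message calls "consing".  */
static bool
span_equals_copying (const char *buf, size_t start, size_t end,
                     const char *word)
{
  assert (start <= end);
  size_t len = end - start;
  char *copy = malloc (len + 1);
  if (!copy)
    return false;               /* Treat allocation failure as "no match".  */
  memcpy (copy, buf + start, len);
  copy[len] = '\0';
  bool same = strcmp (copy, word) == 0;
  free (copy);
  return same;
}

/* In-place approach: compare WORD against BUF[START, END) directly,
   limiting the comparison to the span.  No allocation is needed.  */
static bool
span_equals_in_place (const char *buf, size_t start, size_t end,
                      const char *word)
{
  assert (start <= end);
  size_t len = end - start;
  return strlen (word) == len && memcmp (buf + start, word, len) == 0;
}
```

Both functions give the same answer for spans that contain no NUL bytes; only the first allocates. Whether allocation is the real bottleneck in the :match case is exactly what the thread is still trying to establish.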
From: Dmitry Gutov <dgutov@HIDDEN>
To: Eli Zaretskii <eliz@HIDDEN>
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
Date: Thu, 26 Jan 2023 01:21:08 +0200
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Message-ID: <01b5d074-fb12-6b1f-cbfb-5e759833b854@HIDDEN>
In-Reply-To: <83wn5ag4nc.fsf@HIDDEN>

On 25/01/2023 14:49, Eli Zaretskii wrote:
>> Cc: 60953 <at> debbugs.gnu.org
>> Date: Wed, 25 Jan 2023 05:48:13 +0200
>> From: Dmitry Gutov <dgutov@HIDDEN>
>>
>>> We can probably match the regexp in-place, just limit the match to the range of the node.
>>
>> That's what I tried to do in the patch attached to the first message:
>> https://debbugs.gnu.org/cgi/bugreport.cgi?bug=60953#5
>>
>> But the effect on performance was surprisingly hard to notice. It also
>> broke the actual highlighting, but that's probably because the regexp
>> uses anchors \` and \', which don't really work for fast_looking_at
>> calls inside a buffer.
>
> The condition for a match in that patch is not correct, AFAIU:
>
>    if (val >= 0)
>      return true;
>    else
>      return false;
>
> It should be "if (val > 0)" instead, since fast_looking_at returns the
> number of characters that matched (unlike fast_string_match in the
> original code, which returns the _index_ of the match).

Thank you. Unfortunately, the performance improvement from this patch is
still fairly negligible. Even though I got the highlighting to work --
by removing the \` and \' anchors from ruby-ts--builtin-methods
(reducing the precision a little, but that's not important for the
benchmark).

Switching to using :pred with function (like I did in commit
d94dc606a0934) which still uses buffer-substring inside is significantly
faster.

> Also, fast_string_match is capable of succeeding if the match begins
> not at the first character, whereas fast_looking_at does an anchored
> match.  Do we expect the text to match from its beginning in this
> case?  If not, I think the replacement didn't do what the original
> code does, and you should have used search_buffer or maybe
> search_buffer_re instead.

I suppose one could use a non-anchored regexp with :match, but that's
not the case with the regexp I'm using currently.

Anyway, that's only going to be important if we find something that I
missed here with this patch. Because otherwise the major bottleneck is
somewhere else.

If we do end up using it and try to get it to 100% correctness, I
suppose a combination of narrow-to-region (so that the \` and \' anchors
work) with re-search-forward can do the trick. Although I've tried using
that combination inside ruby-ts--builtin-method-p (to avoid the
buffer-substring call), and it wasn't much of an improvement in
performance either.
From: Eli Zaretskii <eliz@HIDDEN>
To: Dmitry Gutov <dgutov@HIDDEN>
Cc: casouri@HIDDEN, 60953 <at> debbugs.gnu.org
Date: Wed, 25 Jan 2023 14:49:59 +0200
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Message-Id: <83wn5ag4nc.fsf@HIDDEN>
In-Reply-To: <b1dc100e-1ee9-dc89-00b6-25bb7864e90b@HIDDEN> (message from Dmitry Gutov on Wed, 25 Jan 2023 05:48:13 +0200)

> Cc: 60953 <at> debbugs.gnu.org
> Date: Wed, 25 Jan 2023 05:48:13 +0200
> From: Dmitry Gutov <dgutov@HIDDEN>
>
> > We can probably match the regexp in-place, just limit the match to the range of the node.
>
> That's what I tried to do in the patch attached to the first message:
> https://debbugs.gnu.org/cgi/bugreport.cgi?bug=60953#5
>
> But the effect on performance was surprisingly hard to notice. It also
> broke the actual highlighting, but that's probably because the regexp
> uses anchors \` and \', which don't really work for fast_looking_at
> calls inside a buffer.

The condition for a match in that patch is not correct, AFAIU:

  if (val >= 0)
    return true;
  else
    return false;

It should be "if (val > 0)" instead, since fast_looking_at returns the
number of characters that matched (unlike fast_string_match in the
original code, which returns the _index_ of the match).

Also, fast_string_match is capable of succeeding if the match begins
not at the first character, whereas fast_looking_at does an anchored
match. Do we expect the text to match from its beginning in this
case? If not, I think the replacement didn't do what the original
code does, and you should have used search_buffer or maybe
search_buffer_re instead.
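The distinction drawn above (an index returned by an unanchored string match versus a length returned by a match anchored at a position) can be sketched at the Lisp level. This is only an analogy: it uses string-match-p, looking-at and re-search-forward rather than the C functions named in the message, and the helper name my/match-styles is hypothetical, not part of Emacs.

```elisp
;; Hypothetical helper, for illustration only (not part of Emacs).
(defun my/match-styles (regexp text)
  "Match REGEXP against TEXT in three ways.
Return a list (STRING-INDEX ANCHORED-LENGTH SEARCH-END), where
STRING-INDEX is the start index of an unanchored string match,
ANCHORED-LENGTH is the length of a match anchored at buffer start,
and SEARCH-END is the position after an unanchored buffer search.
Each element is nil when that style of match fails."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (list
     ;; Unanchored: 0 is a valid result (match at the first character).
     (string-match-p regexp text)
     ;; Anchored at point: succeeds only if the match starts right here.
     (and (looking-at regexp) (- (match-end 0) (point)))
     ;; Unanchored search within the buffer.
     (save-excursion (re-search-forward regexp nil t)))))

(my/match-styles "gsub" "gsub")      ; string index 0, anchored length 4
(my/match-styles "gsub" "x.gsub(y)") ; string index 2, anchored match fails
```

In the first call a successful string match yields index 0, so a ">= 0" style test is right for an index but would also accept an empty anchored match when applied to a length.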
bug-gnu-emacs@HIDDEN: bug#60953; Package emacs.
From: Dmitry Gutov <dgutov@HIDDEN>
To: Yuan Fu <casouri@HIDDEN>
Cc: 60953 <at> debbugs.gnu.org
Date: Wed, 25 Jan 2023 05:48:13 +0200
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Message-ID: <b1dc100e-1ee9-dc89-00b6-25bb7864e90b@HIDDEN>
In-Reply-To: <AB9CD94C-CB3C-4D84-B4AA-22EDC206EB12@HIDDEN>

On 25/01/2023 05:13, Yuan Fu wrote:
> FYI the predicates are not processed by tree-sitter, but by us. For example, the #equal predicate is handled by treesit_predicate_equal. For #match, right now we create a string with Fbuffer_substring and pass it to fast_string_match, so it definitely causes a lot of gc’s, as you observed.

Right.

> We can probably match the regexp in-place, just limit the match to the range of the node.

That's what I tried to do in the patch attached to the first message:
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=60953#5

But the effect on performance was surprisingly hard to notice. It also
broke the actual highlighting, but that's probably because the regexp
uses anchors \` and \', which don't really work for fast_looking_at
calls inside a buffer.

I also experimented with replacing the current
buffer-substring+string-match-p scheme with looking-at. No difference in
performance.

Reducing the size of the regexp, however, made a lot of difference.
ruby-ts--builtin-methods is 721 characters long.

So my current hypothesis is that the extra GC is caused by copying the
regexp string back and forth. Which seems a bit more difficult to avoid.
But could be done if we replace that value with some indirection.
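The remark about the \` and \' anchors can be illustrated in Lisp. This is a minimal sketch, assuming that narrowing the buffer to the node's range is an acceptable way to make those anchors refer to the node boundaries; the helper name my/region-matches-p is hypothetical, and nothing here shows that this approach is faster (the message above reports no measurable difference from looking-at).

```elisp
;; Hypothetical helper, for illustration only (not part of Emacs).
(defun my/region-matches-p (regexp beg end)
  "Return non-nil if REGEXP matches at BEG with the buffer narrowed to BEG..END.
While narrowing is in effect, the buffer-boundary anchors apply to
BEG and END, and no substring is copied out of the buffer."
  (save-excursion
    (save-restriction
      (narrow-to-region beg end)
      (goto-char (point-min))
      (looking-at-p regexp))))

(with-temp-buffer
  (insert "foo gsub bar")
  (goto-char 5)                                  ; just before "gsub"
  (list (looking-at-p "\\`gsub\\'")              ; anchors refer to the whole buffer
        (my/region-matches-p "\\`gsub\\'" 5 9))) ; anchors refer to the region
```

Without narrowing, the anchored regexp cannot match an identifier in the middle of the buffer, which is consistent with the missing highlighting described above.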
bug-gnu-emacs@HIDDEN: bug#60953; Package emacs.
From: Yuan Fu <casouri@HIDDEN>
To: Dmitry Gutov <dgutov@HIDDEN>
Cc: 60953 <at> debbugs.gnu.org
Date: Tue, 24 Jan 2023 19:13:29 -0800
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Message-Id: <AB9CD94C-CB3C-4D84-B4AA-22EDC206EB12@HIDDEN>
In-Reply-To: <04729838-b7d4-8a08-2b71-12536a28aebb@HIDDEN>

> On Jan 23, 2023, at 8:04 PM, Dmitry Gutov <dgutov@HIDDEN> wrote:
>
> Cc-ing Yuan, just in case.
>
> On 20/01/2023 05:53, Dmitry Gutov wrote:
>> In my benchmarking -- using this form in test/lisp/progmodes/ruby-mode-resources/ruby.rb after enabling ruby-ts-mode:
>>
>>   (benchmark-run 1000 (progn (font-lock-mode -1) (font-lock-mode 1) (let (treesit--font-lock-fast-mode) (font-lock-ensure))))
>>
>> the rule added to its font-lock in commit d66ac5285f7
>>
>>    :language language
>>    :feature 'builtin-functions
>>    `((((identifier) @font-lock-builtin-face)
>>       (:match ,ruby-ts--builtin-methods
>>        @font-lock-builtin-face)))
>>
>> ...seems to have made it 50% slower.
>>
>> The profile looked like this:
>>
>>   9454  84% - font-lock-fontify-region
>>   9454  84% - font-lock-default-fontify-region
>>   8862  79% - font-lock-fontify-syntactically-region
>>   8702  78% - treesit-font-lock-fontify-region
>>    128   1%   treesit-fontify-with-override
>>    123   1%   facep
>>     84   0%   treesit--children-covering-range-recurse
>>     60   0% + ruby-ts--comment-font-lock
>>      4   0% + font-lock-unfontify-region
>>    568   5% + font-lock-fontify-keywords-region
>>     16   0% + font-lock-unfontify-region
>>
>> So there's nothing on the Lisp level to look at.
>
> I've done some perf recordings now. It seems most/all of the difference comes down to garbage collection. Or more concretely, time spent inside process_mark_stack.
>
> Without the added query benchmark reports:
>
> (10.13723333 49 1.141649534999999)
>
> And the perf top5 is:
>
>   17.26%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_current_status
>   10.83%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_next_sibling
>   10.18%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_first_child
>    8.37%  emacs  emacs                  [.] process_mark_stack
>    4.63%  emacs  libtree-sitter.so.0.0  [.] ts_node_start_point
>
> With this simple query that colors everything:
>
>    :language language
>    :feature 'builtin-function
>    `((((identifier) @font-lock-builtin-face)))
>
> I get:
>
> (11.993968995 82 1.9326509270000045)
>
> Note the jump in runtime that's larger than the jump in GC.
>
>   17.26%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_current_status
>   10.83%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_next_sibling
>   10.18%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_first_child
>    8.37%  emacs  emacs                  [.] process_mark_stack
>    4.63%  emacs  libtree-sitter.so.0.0  [.] ts_node_start_point
>
> The current query looks like this:
>
>    :language language
>    :feature 'builtin-function
>    `((((identifier) @font-lock-builtin-face)
>       (:pred ruby-ts--builtin-method-p @font-lock-builtin-face)))
>
> Benchmarking:
>
> (12.493614359 107 2.558609025999999)
>
>   15.30%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_current_status
>   14.92%  emacs  emacs                  [.] process_mark_stack
>    9.75%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_next_sibling
>    8.90%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_first_child
>    3.87%  emacs  libtree-sitter.so.0.0  [.] ts_node_start_point
>
> Here we get the same jump in runtime as in GC. Even though this rule ends up coloring much fewer (almost none) nodes in the current buffer. I interpret the results like this:
>
> - The jump in runtime of the previous query was probably related to the number of nodes needed to be processed, but not with the resulting highlighting, even though every identifier in the buffer ends up being colored.
>
> - The GC overhead created by the predicates is non-negligible.
>
> And the original query that I tried:
>
>    :language language
>    :feature 'builtin-function
>    `((((identifier) @font-lock-builtin-face)
>       (:match ,ruby-ts--builtin-methods @font-lock-builtin-face)))
>
> Benchmarking:
>
> (16.433451865000002 249 5.908674810000001)
>
>   23.72%  emacs  emacs                  [.] process_mark_stack
>   12.33%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_current_status
>    7.96%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_next_sibling
>    7.38%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_first_child
>    3.37%  emacs  libtree-sitter.so.0.0  [.] ts_node_start_point
>
> Here's a significant jump in GC time which is almost the same as the difference in runtime. And all of it is spent marking?
>
> I suppose if the problem is allocation of a large string (many times over), the GC could be spending a lot of time scanning through the memory. Could this be avoided by passing some substitute handle to TS, instead of the full string? E.g. some kind of reference to it in the regexp cache.
>

FYI the predicates are not processed by tree-sitter, but by us. For example, the #equal predicate is handled by treesit_predicate_equal. For #match, right now we create a string with Fbuffer_substring and pass it to fast_string_match, so it definitely causes a lot of gc’s, as you observed. We can probably match the regexp in-place, just limit the match to the range of the node.

Yuan
bug-gnu-emacs@HIDDEN: bug#60953; Package emacs.
From: Dmitry Gutov <dgutov@HIDDEN>
To: 60953 <at> debbugs.gnu.org, Yuan Fu <casouri@HIDDEN>
Date: Tue, 24 Jan 2023 06:04:07 +0200
Subject: Re: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Message-ID: <04729838-b7d4-8a08-2b71-12536a28aebb@HIDDEN>
In-Reply-To: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@HIDDEN>

Cc-ing Yuan, just in case.

On 20/01/2023 05:53, Dmitry Gutov wrote:
> In my benchmarking -- using this form in
> test/lisp/progmodes/ruby-mode-resources/ruby.rb after enabling
> ruby-ts-mode:
>
>   (benchmark-run 1000 (progn (font-lock-mode -1) (font-lock-mode 1)
>     (let (treesit--font-lock-fast-mode) (font-lock-ensure))))
>
> the rule added to its font-lock in commit d66ac5285f7
>
>    :language language
>    :feature 'builtin-functions
>    `((((identifier) @font-lock-builtin-face)
>       (:match ,ruby-ts--builtin-methods
>        @font-lock-builtin-face)))
>
> ...seems to have made it 50% slower.
>
> The profile looked like this:
>
>   9454  84% - font-lock-fontify-region
>   9454  84% - font-lock-default-fontify-region
>   8862  79% - font-lock-fontify-syntactically-region
>   8702  78% - treesit-font-lock-fontify-region
>    128   1%   treesit-fontify-with-override
>    123   1%   facep
>     84   0%   treesit--children-covering-range-recurse
>     60   0% + ruby-ts--comment-font-lock
>      4   0% + font-lock-unfontify-region
>    568   5% + font-lock-fontify-keywords-region
>     16   0% + font-lock-unfontify-region
>
> So there's nothing on the Lisp level to look at.

I've done some perf recordings now. It seems most/all of the difference
comes down to garbage collection. Or more concretely, time spent inside
process_mark_stack.

Without the added query benchmark reports:

(10.13723333 49 1.141649534999999)

And the perf top5 is:

  17.26%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_current_status
  10.83%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_next_sibling
  10.18%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_first_child
   8.37%  emacs  emacs                  [.] process_mark_stack
   4.63%  emacs  libtree-sitter.so.0.0  [.] ts_node_start_point

With this simple query that colors everything:

   :language language
   :feature 'builtin-function
   `((((identifier) @font-lock-builtin-face)))

I get:

(11.993968995 82 1.9326509270000045)

Note the jump in runtime that's larger than the jump in GC.

  17.26%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_current_status
  10.83%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_next_sibling
  10.18%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_first_child
   8.37%  emacs  emacs                  [.] process_mark_stack
   4.63%  emacs  libtree-sitter.so.0.0  [.] ts_node_start_point

The current query looks like this:

   :language language
   :feature 'builtin-function
   `((((identifier) @font-lock-builtin-face)
      (:pred ruby-ts--builtin-method-p @font-lock-builtin-face)))

Benchmarking:

(12.493614359 107 2.558609025999999)

  15.30%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_current_status
  14.92%  emacs  emacs                  [.] process_mark_stack
   9.75%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_next_sibling
   8.90%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_first_child
   3.87%  emacs  libtree-sitter.so.0.0  [.] ts_node_start_point

Here we get the same jump in runtime as in GC. Even though this rule
ends up coloring much fewer (almost none) nodes in the current buffer. I
interpret the results like this:

- The jump in runtime of the previous query was probably related to the
  number of nodes needed to be processed, but not with the resulting
  highlighting, even though every identifier in the buffer ends up being
  colored.

- The GC overhead created by the predicates is non-negligible.

And the original query that I tried:

   :language language
   :feature 'builtin-function
   `((((identifier) @font-lock-builtin-face)
      (:match ,ruby-ts--builtin-methods @font-lock-builtin-face)))

Benchmarking:

(16.433451865000002 249 5.908674810000001)

  23.72%  emacs  emacs                  [.] process_mark_stack
  12.33%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_current_status
   7.96%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_next_sibling
   7.38%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_first_child
   3.37%  emacs  libtree-sitter.so.0.0  [.] ts_node_start_point

Here's a significant jump in GC time which is almost the same as the
difference in runtime. And all of it is spent marking?

I suppose if the problem is allocation of a large string (many times
over), the GC could be spending a lot of time scanning through the
memory. Could this be avoided by passing some substitute handle to TS,
instead of the full string? E.g. some kind of reference to it in the
regexp cache.
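The comparison of runtime and GC time above can be made explicit by subtracting the GC seconds from the elapsed seconds of each benchmark-run result. The sketch below only reuses the four figures reported in the message; the names my/non-gc-seconds and my/reported-results are hypothetical.

```elisp
;; Hypothetical helper, for illustration only (not part of Emacs).
(defun my/non-gc-seconds (result)
  "Return elapsed seconds minus GC seconds for RESULT.
RESULT is a list (ELAPSED GC-COUNT GC-ELAPSED), the shape returned
by `benchmark-run'."
  (- (nth 0 result) (nth 2 result)))

;; The four results reported in the message above, labelled by query.
(defvar my/reported-results
  '((no-query 10.13723333 49 1.141649534999999)
    (all-ids  11.993968995 82 1.9326509270000045)
    (pred     12.493614359 107 2.558609025999999)
    (match    16.433451865000002 249 5.908674810000001)))

;; Alist of (LABEL . NON-GC-SECONDS).
(mapcar (lambda (entry)
          (cons (car entry) (my/non-gc-seconds (cdr entry))))
        my/reported-results)
;; Roughly 9.0, 10.1, 9.9 and 10.5 seconds respectively.
```

On these figures the time spent outside GC differs by less than 0.6 s between the three queries, while GC time grows from about 1.9 s to about 5.9 s, which fits the reading that the :match slowdown is mostly GC.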
bug-gnu-emacs@HIDDEN: bug#60953; Package emacs.
From: Dmitry Gutov <dgutov@HIDDEN>
To: bug-gnu-emacs@HIDDEN
Date: Fri, 20 Jan 2023 05:53:12 +0200
Subject: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Message-ID: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@HIDDEN>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------UQx0D8HGepFSU5bBQqKxNJmr"

This is a multi-part message in MIME format.
--------------UQx0D8HGepFSU5bBQqKxNJmr
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

In my benchmarking -- using this form in
test/lisp/progmodes/ruby-mode-resources/ruby.rb after enabling
ruby-ts-mode:

  (benchmark-run 1000 (progn (font-lock-mode -1) (font-lock-mode 1)
    (let (treesit--font-lock-fast-mode) (font-lock-ensure))))

the rule added to its font-lock in commit d66ac5285f7

   :language language
   :feature 'builtin-functions
   `((((identifier) @font-lock-builtin-face)
      (:match ,ruby-ts--builtin-methods
       @font-lock-builtin-face)))

...seems to have made it 50% slower.

The profile looked like this:

  9454  84% - font-lock-fontify-region
  9454  84% - font-lock-default-fontify-region
  8862  79% - font-lock-fontify-syntactically-region
  8702  78% - treesit-font-lock-fontify-region
   128   1%   treesit-fontify-with-override
   123   1%   facep
    84   0%   treesit--children-covering-range-recurse
    60   0% + ruby-ts--comment-font-lock
     4   0% + font-lock-unfontify-region
   568   5% + font-lock-fontify-keywords-region
    16   0% + font-lock-unfontify-region

So there's nothing on the Lisp level to look at.

Looking at the code, apparently we get a cursor and basically iterate
through all (identifier) nodes, running our predicate manually. Without
trying something more advanced like perf, I took a stab in the dark and
tried to reduce string allocation in treesit_predicate_match (it
currently ends up delegating to buffer-substring for every node), which
seemed inefficient.

But while my patch (attached) compiles and doesn't crash, it doesn't
actually work (the rule's highlighting is missing), and the performance
was unchanged.

This message was originally longer, but see commit d94dc606a09: I
switched to using :pred -- thus avoiding embedding the 720-char long
regexp in the query -- and the performance drop got reduced to about 20%.

As a baseline, this simplified query, which has no predicates and colors
every identifier in the buffer using the specified face, is still faster
(just 10% over the original):

   :language language
   :feature 'builtin-function
   `(((identifier) @font-lock-builtin-face))

The regexp matching itself doesn't seem to be the problem:

(benchmark 354100 '(string-match-p ruby-ts--builtin-methods "gsub"))

=> Elapsed time: 0.141681s

-- whereas the difference between the benchmarks is on the order of
seconds.

I think the marshaling of the long regexp string back and forth could be
the culprit. Would be nice to fix that somehow.

I also think that trying to reduce the string allocation overhead has
potential, but so far none of my experiments has moved the needle
noticeably.
--------------UQx0D8HGepFSU5bBQqKxNJmr Content-Type: text/x-patch; charset=UTF-8; name="treesit_predicate_match.diff" Content-Disposition: attachment; filename="treesit_predicate_match.diff" Content-Transfer-Encoding: base64 ZGlmZiAtLWdpdCBhL3NyYy90cmVlc2l0LmMgYi9zcmMvdHJlZXNpdC5jCmluZGV4IDkxN2Ri NTgyNjc2Li43ZTI5NGEwYTY2ZiAxMDA2NDQKLS0tIGEvc3JjL3RyZWVzaXQuYworKysgYi9z cmMvdHJlZXNpdC5jCkBAIC0yNDY2LDEwICsyNDY2LDI2IEBAIHRyZWVzaXRfcHJlZGljYXRl X21hdGNoIChMaXNwX09iamVjdCBhcmdzLCBzdHJ1Y3QgY2FwdHVyZV9yYW5nZSBjYXB0dXJl cykKIAkgICAgICBidWlsZF9zdHJpbmcgKCJUaGUgc2Vjb25kIGFyZ3VtZW50IHRvIGBtYXRj aCcgc2hvdWxkICIKIAkJICAgICAgICAgICAgImJlIGEgY2FwdHVyZSBuYW1lLCBub3QgYSBz dHJpbmciKSk7CiAKLSAgTGlzcF9PYmplY3QgdGV4dCA9IHRyZWVzaXRfcHJlZGljYXRlX2Nh cHR1cmVfbmFtZV90b190ZXh0IChjYXB0dXJlX25hbWUsCi0JCQkJCQkJICAgICBjYXB0dXJl cyk7CisgIExpc3BfT2JqZWN0IG5vZGUgPSB0cmVlc2l0X3ByZWRpY2F0ZV9jYXB0dXJlX25h bWVfdG9fbm9kZSAoY2FwdHVyZV9uYW1lLCBjYXB0dXJlcyk7CiAKLSAgaWYgKGZhc3Rfc3Ry aW5nX21hdGNoIChyZWdleHAsIHRleHQpID49IDApCisgIHN0cnVjdCBidWZmZXIgKm9sZF9i dWZmZXIgPSBjdXJyZW50X2J1ZmZlcjsKKyAgc3RydWN0IGJ1ZmZlciAqYnVmZmVyID0gWEJV RkZFUiAoWFRTX1BBUlNFUiAoWFRTX05PREUgKG5vZGUpLT5wYXJzZXIpLT5idWZmZXIpOwor ICBzZXRfYnVmZmVyX2ludGVybmFsIChidWZmZXIpOworCisgIFRTTm9kZSB0cmVlc2l0X25v ZGUgPSBYVFNfTk9ERSAobm9kZSktPm5vZGU7CisgIHB0cmRpZmZfdCB2aXNpYmxlX2JlZyA9 IFhUU19QQVJTRVIgKFhUU19OT0RFIChub2RlKS0+cGFyc2VyKS0+dmlzaWJsZV9iZWc7Cisg IHVpbnQzMl90IHN0YXJ0X2J5dGVfb2Zmc2V0ID0gdHNfbm9kZV9zdGFydF9ieXRlICh0cmVl c2l0X25vZGUpOworICB1aW50MzJfdCBlbmRfYnl0ZV9vZmZzZXQgPSB0c19ub2RlX2VuZF9i eXRlICh0cmVlc2l0X25vZGUpOworICBwdHJkaWZmX3Qgc3RhcnRfYnl0ZSA9IHZpc2libGVf YmVnICsgc3RhcnRfYnl0ZV9vZmZzZXQ7CisgIHB0cmRpZmZfdCBlbmRfYnl0ZSA9IHZpc2li bGVfYmVnICsgZW5kX2J5dGVfb2Zmc2V0OworICBwdHJkaWZmX3Qgc3RhcnRfcG9zID0gYnVm X2J5dGVwb3NfdG9fY2hhcnBvcyAoYnVmZmVyLCBzdGFydF9ieXRlKTsKKyAgcHRyZGlmZl90 IGVuZF9wb3MgPSBidWZfYnl0ZXBvc190b19jaGFycG9zIChidWZmZXIsIGVuZF9ieXRlKTsK KworICBwdHJkaWZmX3QgdmFsID0gZmFzdF9sb29raW5nX2F0IChyZWdleHAsIHN0YXJ0X3Bv 
cywgc3RhcnRfYnl0ZSwgZW5kX3BvcywgZW5kX2J5dGUsIFFuaWwpOworCisgIHNldF9idWZm ZXJfaW50ZXJuYWwgKG9sZF9idWZmZXIpOworCisgIGlmICh2YWwgPj0gMCkKICAgICByZXR1 cm4gdHJ1ZTsKICAgZWxzZQogICAgIHJldHVybiBmYWxzZTsK --------------UQx0D8HGepFSU5bBQqKxNJmr--
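The micro-benchmark in the message body above (timing string-match-p on its own) can be reproduced in a self-contained form. This is a sketch under stated assumptions: the method list and the names my/builtin-methods-regexp and my/time-matches are hypothetical stand-ins, since the real ruby-ts--builtin-methods regexp is not reproduced in this thread; only its approximate length (about 720 characters) and its use of the \` and \' anchors are mentioned.

```elisp
(require 'benchmark)

;; Hypothetical stand-in for the real method list; the regexp discussed
;; in the thread is much longer (about 720 characters).
(defvar my/builtin-methods-regexp
  (concat "\\`"
          (regexp-opt '("gsub" "puts" "print" "raise" "require"))
          "\\'"))

;; Hypothetical helper, for illustration only (not part of Emacs).
(defun my/time-matches (text)
  "Match `my/builtin-methods-regexp' against TEXT 100000 times.
Return the (ELAPSED GC-COUNT GC-ELAPSED) list from `benchmark-run'."
  (benchmark-run 100000
    (string-match-p my/builtin-methods-regexp text)))

(my/time-matches "gsub")
```

If the matching itself is cheap, as the message reports, the elapsed time here should be small and the GC count low, since no buffer text is copied on each iteration.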
Dmitry Gutov <dgutov@HIDDEN>
:bug-gnu-emacs@HIDDEN
.
bug-gnu-emacs@HIDDEN: bug#60953; Package emacs.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997 nCipher Corporation Ltd,
1994-97 Ian Jackson.