Received: (at 62983) by debbugs.gnu.org; 29 Apr 2023 06:55:06 +0000
MIME-Version: 1.0
References: <mseeglsi46hm3qor5pdj6xkejip7lgyqpvata65cakztcgwgoq@hsrhke2bfjgd>
 <c82d3567-5dc9-ec84-f656-90e480bd3987@HIDDEN>
 <zwfll3hke4opx3ueoap3xodaxqf4vqjiy5zsknj4ngouohx63v@nd4npghhit3n>
In-Reply-To: <zwfll3hke4opx3ueoap3xodaxqf4vqjiy5zsknj4ngouohx63v@nd4npghhit3n>
From: Jim Meyering <jim@HIDDEN>
Date: Sat, 29 Apr 2023 08:54:44 +0200
Message-ID: <CA+8g5KEvbw1cdJW+wn8fKf8izcE6oVQ=G2XaCoANzNR6s48=Xg@HIDDEN>
Subject: Re: bug#62983: workaround PCRE2 bug affecting at least \D and \W
To: Carlo Marcelo Arenas Belón <carenas@HIDDEN>
Cc: Paul Eggert <eggert@HIDDEN>, 62983 <at> debbugs.gnu.org
Content-Type: multipart/mixed; boundary="000000000000545b5d05fa74110f"

--000000000000545b5d05fa74110f
Content-Type: text/plain; charset="UTF-8"

On Fri, Apr 21, 2023 at 10:22 PM Carlo Marcelo Arenas Belón
<carenas@HIDDEN> wrote:
> On Fri, Apr 21, 2023 at 11:42:50AM -0700, Paul Eggert wrote:
> > On 2023-04-20 19:04, Carlo Marcelo Arenas Belón wrote:
> > > All versions of PCRE2 that include PCRE2_MATCH_INVALID_UTF had a bug on
> > > its JIT implementation that results in failure to match for the negative
> > > perl classes, and seems to be easier to replicate when the matching
> > > character is a multibyte one.
> >
> > Unfortunately that is a little vague. I expect the issue is not limited to
> > \D and \W, as there are other ways to specify negative Perl classes.
>
> Correct, it should also affect at least \S, but hadn't been able to trigger
> it there.
>
> The bug was that an uninitialized value was being used in the JIT code that
> supports the PCRE2_MATCH_INVALID_UTF mode, which is why I said "randomly" in
> the commit message.
>
> If you want to be strict, how about the attached patch instead?
>
> > And if
> > the bug merely seems to be easier to replicate with multibyte characters, it
> > sounds like we may have issues even when matching ASCII characters in a
> > UTF-8 locale.
>
> Which the current workaround addresses, since you need both PCRE2_JIT and
> PCRE2_MATCH_INVALID_UTF to trigger it, and the subject encoding is irrelevant
> for the logic to decide if PCRE2_MATCH_INVALID_UTF gets enabled or not.
>
> > Furthermore, I'm leery of optimizing for PCRE2 10.42 and earlier. We should
> > focus our optimization efforts on future PCRE2 versions, and not worry about
> > optimizing earlier versions where optimizations complicate maintenance for a
> > declining benefit, and are likely to provoke bugs in older versions that as
> > time passes will be harder to debug.
>
> Not sure I understand your concern here, but if it is about disabling JIT
> instead, then the possibility of introducing bugs is even bigger since it
> affects all versions of PCRE2 (not only 10.34 or newer).
>
> > > Alternatively JIT could be disabled instead, but the option selected has
> > > less of an impact on performance.
> >
> > Disabling JIT sounds better, as correctness trumps performance. Until the
> > bug is fixed (or at least better-understood so that we have a workaround we
> > can trust), how about the attached patch instead?
>
> The bug has been fixed already, and will be included in the next release.
> There might be additional changes as spelled in that discussion, and indeed
> the change to the proposed solution proactively helps with one of those.
>
> It is very unlikely, but some systems might include non 0 values on the
> tables for characters over 127 and that might trigger a similar problem that
> is yet to be fixed.
>
> Carlo
>
> [1] https://github.com/PCRE2Project/pcre2/commit/2c08b619dc973beacc474dcb67cda8cd366200ce

Thanks, Carlo.
I've made some small adjustments and tidied up the ChangeLog in the attached.
Hope to push it by Sunday.

There's enough going on via gnulib that I'll likely make yet another
snapshot with the very latest.
Also, there remain solaris sparc and i386 gnulib test failures:

https://buildfarm.opencsw.org/buildbot/builders/ggrep-solaris10-sparc/builds/336
  FAIL: test-c-stack.sh
  FAIL: test-year2038
https://buildfarm.opencsw.org/buildbot/builders/ggrep-solaris10-i386/builds/334
  FAIL: test-year2038

--000000000000545b5d05fa74110f
Content-Type: application/octet-stream; name="grep-pcre2.diff"
Content-Disposition: attachment; filename="grep-pcre2.diff"
Content-Transfer-Encoding: base64
Content-ID: <f_lh1mnrd10>
X-Attachment-Id: f_lh1mnrd10

RnJvbSA5Mzk3Yzc0ZmNlODhlZWYxN2RkMDBhN2M3Yjg4OWQwNDk1ZjQ1YjUxIE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiA9P1VURi04P3E/Q2FybG89MjBNYXJjZWxvPTIwQXJlbmFzPTIw QmVsPUMzPUIzbj89IDxjYXJlbmFzQGdtYWlsLmNvbT4KRGF0ZTogVGh1LCAyMCBBcHIgMjAyMyAx ODozNzoyMCAtMDcwMApTdWJqZWN0OiBbUEFUQ0hdIHBjcmU6IHdvcmsgYXJvdW5kIGEgUENSRTJf TUFUQ0hfSU5WQUxJRF9VVEYgYnVnCgpQQ1JFMiBoYXMgYSBidWcgd2hlbiB1c2luZyBQQ1JFMl9N QVRDSF9JTlZBTElEX1VURjogaXQgd291bGQKc29tZXRpbWVzIGZhaWwgdG8gbWF0Y2ggcGF0dGVy bnMgdXNpbmcgcGVybCBuZWdhdGl2ZSBjbGFzc2VzCmxpa2UgXFcgYW5kIFxELgoKKiBORVdTIChC dWcgZml4ZXMpOiBNZW50aW9uIGl0LgoqIHNyYy9wY3JlMnNlYXJjaC5jOiByZXN0cmljIGltcGFj dCBvZiB0aGUgYnVnCkRvIG5vdCB1c2UgdGhlIHByb2JsZW1hdGljIGZsYWcgd2l0aCBicm9rZW4g dmVyc2lvbnMgb2YgUENSRTIuCkdlbmVyYXRlIGxvY2FsZSB0YWJsZXMgb25seSBmb3Igc2luZ2xl 
LWJ5dGUgbG9jYWxlcy4KKiB0ZXN0cy9NYWtlZmlsZS5hbSAoVEVTVFMpOiBBZGQgdGhlIGZpbGUg bmFtZQoqIHRlc3RzL3BjcmUtdXRmOC1idWcyMjQ6IE5ldyBmaWxlLCB0byB0ZXN0IGZvciB0aGlz LgotLS0KIE5FV1MgICAgICAgICAgICAgICAgICAgfCAgNSArKysrKwogc3JjL3BjcmVzZWFyY2gu YyAgICAgICB8IDIyICsrKysrKysrKysrKysrLS0tLS0tLS0KIHRlc3RzL01ha2VmaWxlLmFtICAg ICAgfCAgMSArCiB0ZXN0cy9wY3JlLXV0ZjgtYnVnMjI0IHwgMzEgKysrKysrKysrKysrKysrKysr KysrKysrKysrKysrKwogNCBmaWxlcyBjaGFuZ2VkLCA1MSBpbnNlcnRpb25zKCspLCA4IGRlbGV0 aW9ucygtKQogY3JlYXRlIG1vZGUgMTAwNzU1IHRlc3RzL3BjcmUtdXRmOC1idWcyMjQKCmRpZmYg LS1naXQgYS9ORVdTIGIvTkVXUwppbmRleCBjMTU3NjRjLi45N2E5MTNjIDEwMDY0NAotLS0gYS9O RVdTCisrKyBiL05FV1MKQEAgLTE1LDYgKzE1LDExIEBAIEdOVSBncmVwIE5FV1MgICAgICAgICAg ICAgICAgICAgICAgICAgICAgICAgICAgICAtKi0gb3V0bGluZSAtKi0KICAgd2hlbiBydW5uaW5n IG9uIDMyLWJpdCB4ODYgYW5kIEFSTSBob3N0cyB1c2luZyBnbGliYyAyLjM0Ky4KICAgW2J1ZyBp bnRyb2R1Y2VkIGluIGdyZXAgMy45XQoKKyAgZ3JlcCBubyBsb25nZXIgZmFpbHMgdG8gbWF0Y2gg cGF0dGVybnMgdXNpbmcgbmVnYXRlZCBwZXJsCisgIGNsYXNzZXMgbGlrZSBcRCBvciBcVyB3aGVu IGxpbmtlZCB3aXRoIFBDUkUyIDEwLjM0IG9yIG5ld2VyLgorICBbYnVnIGludHJvZHVjZWQgaW4g Z3JlcCAzLjhdCisKKwogKiogQ2hhbmdlcyBpbiBiZWhhdmlvcgoKICAgZ3JlcCAtLXZlcnNpb24g bm93IHByaW50cyBhIGxpbmUgZGVzY3JpYmluZyB0aGUgdmVyc2lvbiBvZiBQQ1JFMiBpdCB1c2Vz LgpkaWZmIC0tZ2l0IGEvc3JjL3BjcmVzZWFyY2guYyBiL3NyYy9wY3Jlc2VhcmNoLmMKaW5kZXgg ZTg2N2Y0OS4uNjhlYzZkZSAxMDA2NDQKLS0tIGEvc3JjL3BjcmVzZWFyY2guYworKysgYi9zcmMv cGNyZXNlYXJjaC5jCkBAIC01OCw2ICs1OCw5IEBAIHN0cnVjdCBwY3JlX2NvbXAKICAgLyogVGFi bGUsIGluZGV4ZWQgYnkgISAoZmxhZyAmIFBDUkUyX05PVEJPTCksIG9mIHdoZXRoZXIgdGhlIGVt cHR5CiAgICAgIHN0cmluZyBtYXRjaGVzIHdoZW4gdGhhdCBmbGFnIGlzIHVzZWQuICAqLwogICBp bnQgZW1wdHlfbWF0Y2hbMl07CisKKyAgLyogRmxhZ3MgKi8KKyAgdW5zaWduZWQgYmluYXJ5X3Nh ZmU6MTsKIH07CgogLyogTWVtb3J5IGFsbG9jYXRpb24gZnVuY3Rpb25zIGZvciBQQ1JFLiAgKi8K QEAgLTEzMCwxNiArMTMzLDExIEBAIGppdF9leGVjIChzdHJ1Y3QgcGNyZV9jb21wICpwYywgY2hh ciBjb25zdCAqc3ViamVjdCwgaWR4X3Qgc2VhcmNoX2J5dGVzLAogICAgIH0KIH0KCi0vKiBSZXR1 
cm4gdHJ1ZSBpZiBFIGlzIGFuIGVycm9yIGNvZGUgZm9yIGJhZCBVVEYtOCwgYW5kIGlmIHBjcmUy X21hdGNoCi0gICBjb3VsZCByZXR1cm4gRSBiZWNhdXNlIFBDUkUgbGFja3MgUENSRTJfTUFUQ0hf SU5WQUxJRF9VVEYuICAqLworLyogUmV0dXJuIHRydWUgaWYgRSBpcyBhbiBlcnJvciBjb2RlIGZv ciBiYWQgVVRGLTggKi8KIHN0YXRpYyBib29sCiBiYWRfdXRmOF9mcm9tX3BjcmUyIChpbnQgZSkK IHsKLSNpZmRlZiBQQ1JFMl9NQVRDSF9JTlZBTElEX1VURgotICByZXR1cm4gZmFsc2U7Ci0jZWxz ZQogICByZXR1cm4gUENSRTJfRVJST1JfVVRGOF9FUlIyMSA8PSBlICYmIGUgPD0gUENSRTJfRVJS T1JfVVRGOF9FUlIxOwotI2VuZGlmCiB9CgogLyogQ29tcGlsZSB0aGUgLVAgc3R5bGUgUEFUVEVS TiwgY29udGFpbmluZyBTSVpFIGJ5dGVzIHRoYXQgYXJlCkBAIC0xNTcsNiArMTU1LDcgQEAgUGNv bXBpbGUgKGNoYXIgKnBhdHRlcm4sIGlkeF90IHNpemUsIHJlZ19zeW50YXhfdCBpZ25vcmVkLCBi b29sIGV4YWN0KQogICAgID0gcGNyZTJfZ2VuZXJhbF9jb250ZXh0X2NyZWF0ZSAocHJpdmF0ZV9t YWxsb2MsIHByaXZhdGVfZnJlZSwgTlVMTCk7CiAgIHBjcmUyX2NvbXBpbGVfY29udGV4dCAqY2Nv bnRleHQgPSBwY3JlMl9jb21waWxlX2NvbnRleHRfY3JlYXRlIChnY29udGV4dCk7CgorICBwYy0+ YmluYXJ5X3NhZmUgPSBmYWxzZTsKICAgaWYgKGxvY2FsZWluZm8ubXVsdGlieXRlKQogICAgIHsK ICAgICAgIHVpbnQzMl90IHVuaWNvZGU7CkBAIC0xODEsOCArMTgwLDEzIEBAIFBjb21waWxlIChj aGFyICpwYXR0ZXJuLCBpZHhfdCBzaXplLCByZWdfc3ludGF4X3QgaWdub3JlZCwgYm9vbCBleGFj dCkKICAgICAgIGZsYWdzIHw9IFBDUkUyX05FVkVSX0JBQ0tTTEFTSF9DOwogI2VuZGlmCiAjaWZk ZWYgUENSRTJfTUFUQ0hfSU5WQUxJRF9VVEYKKyAgICAgIC8qIHdvcmthcm91bmQgUENSRTIgYnVn CisgICAgICAgICBodHRwczovL2dpdGh1Yi5jb20vUENSRTJQcm9qZWN0L3BjcmUyL2lzc3Vlcy8y MjQgKi8KKyNpZiAxMCA8IFBDUkUyX01BSk9SIHx8IChQQ1JFMl9NQUpPUiA9PSAxMCAmJiA0MiA8 IFBDUkUyX01JTk9SKQorICAgICAgcGMtPmJpbmFyeV9zYWZlID0gdHJ1ZTsKICAgICAgIC8qIENv bnNpZGVyIGludmFsaWQgVVRGLTggYXMgYSBiYXJyaWVyLCBpbnN0ZWFkIG9mIGVycm9yLiAgKi8K ICAgICAgIGZsYWdzIHw9IFBDUkUyX01BVENIX0lOVkFMSURfVVRGOworI2VuZGlmCiAjZW5kaWYK ICAgICB9CgpAQCAtMjI2LDcgKzIzMCw5IEBAIFBjb21waWxlIChjaGFyICpwYXR0ZXJuLCBpZHhf dCBzaXplLCByZWdfc3ludGF4X3QgaWdub3JlZCwgYm9vbCBleGFjdCkKICAgICAgIHNpemUgPSBy ZV9zaXplOwogICAgIH0KCi0gIHBjcmUyX3NldF9jaGFyYWN0ZXJfdGFibGVzIChjY29udGV4dCwg 
cGNyZTJfbWFrZXRhYmxlcyAoZ2NvbnRleHQpKTsKKyAgaWYgKCFsb2NhbGVpbmZvLm11bHRpYnl0 ZSkKKyAgICBwY3JlMl9zZXRfY2hhcmFjdGVyX3RhYmxlcyAoY2NvbnRleHQsIHBjcmUyX21ha2V0 YWJsZXMgKGdjb250ZXh0KSk7CisKICAgcGMtPmNyZSA9IHBjcmUyX2NvbXBpbGUgKChQQ1JFMl9T UFRSKSBwYXR0ZXJuLCBzaXplLCBmbGFncywKICAgICAgICAgICAgICAgICAgICAgICAgICAgICZl YywgJmUsIGNjb250ZXh0KTsKICAgaWYgKCFwYy0+Y3JlKQpAQCAtMzEzLDcgKzMxOSw3IEBAIFBl eGVjdXRlICh2b2lkICp2Y3AsIGNoYXIgY29uc3QgKmJ1ZiwgaWR4X3Qgc2l6ZSwgaWR4X3QgKm1h dGNoX3NpemUsCgogICAgICAgICAgIGUgPSBqaXRfZXhlYyAocGMsIHN1YmplY3QsIGxpbmVfZW5k IC0gc3ViamVjdCwKICAgICAgICAgICAgICAgICAgICAgICAgIHNlYXJjaF9vZmZzZXQsIG9wdGlv bnMpOwotICAgICAgICAgIGlmICghYmFkX3V0ZjhfZnJvbV9wY3JlMiAoZSkpCisgICAgICAgICAg aWYgKHBjLT5iaW5hcnlfc2FmZSB8fCAhYmFkX3V0ZjhfZnJvbV9wY3JlMiAoZSkpCiAgICAgICAg ICAgICBicmVhazsKCiAgICAgICAgICAgaWR4X3QgdmFsaWRfYnl0ZXMgPSBwY3JlMl9nZXRfc3Rh cnRjaGFyIChwYy0+ZGF0YSk7CmRpZmYgLS1naXQgYS90ZXN0cy9NYWtlZmlsZS5hbSBiL3Rlc3Rz L01ha2VmaWxlLmFtCmluZGV4IDc3MThmMjQuLjliNDQyMmUgMTAwNjQ0Ci0tLSBhL3Rlc3RzL01h a2VmaWxlLmFtCisrKyBiL3Rlc3RzL01ha2VmaWxlLmFtCkBAIC0xNTUsNiArMTU1LDcgQEAgVEVT VFMgPQkJCQkJCVwKICAgcGNyZS1qaXRzdGFjawkJCQkJXAogICBwY3JlLW8JCQkJCVwKICAgcGNy ZS11dGY4CQkJCQlcCisgIHBjcmUtdXRmOC1idWcyMjQJCQkJXAogICBwY3JlLXV0ZjgtdwkJCQkJ XAogICBwY3JlLXcJCQkJCVwKICAgcGNyZS13eC1iYWNrcmVmCQkJCVwKZGlmZiAtLWdpdCBhL3Rl c3RzL3BjcmUtdXRmOC1idWcyMjQgYi90ZXN0cy9wY3JlLXV0ZjgtYnVnMjI0Cm5ldyBmaWxlIG1v ZGUgMTAwNzU1CmluZGV4IDAwMDAwMDAuLmU3ZTBkY2QKLS0tIC9kZXYvbnVsbAorKysgYi90ZXN0 cy9wY3JlLXV0ZjgtYnVnMjI0CkBAIC0wLDAgKzEsMzEgQEAKKyMhL2Jpbi9zaAorIyBFbnN1cmUg bmVnYXRlZCBwZXJsIGNsYXNzZXMgbWF0Y2ggbXVsdGlieXRlIGNoYXJhY3RlcnMgaW4gVVRGIG1v ZGUKKyMKKyMgQ29weXJpZ2h0IChDKSAyMDIzIEZyZWUgU29mdHdhcmUgRm91bmRhdGlvbiwgSW5j LgorIworIyBDb3B5aW5nIGFuZCBkaXN0cmlidXRpb24gb2YgdGhpcyBmaWxlLCB3aXRoIG9yIHdp dGhvdXQgbW9kaWZpY2F0aW9uLAorIyBhcmUgcGVybWl0dGVkIGluIGFueSBtZWRpdW0gd2l0aG91 dCByb3lhbHR5IHByb3ZpZGVkIHRoZSBjb3B5cmlnaHQKKyMgbm90aWNlIGFuZCB0aGlzIG5vdGlj 
ZSBhcmUgcHJlc2VydmVkLgorCisuICIke3NyY2Rpcj0ufS9pbml0LnNoIjsgcGF0aF9wcmVwZW5k XyAuLi9zcmMKK3JlcXVpcmVfZW5fdXRmOF9sb2NhbGVfCitMQ19BTEw9ZW5fVVMuVVRGLTgKK2V4 cG9ydCBMQ19BTEwKK3JlcXVpcmVfcGNyZV8KKworZWNobyAuIHwgZ3JlcCAtcVAgJygqVVRGKS4n IDI+L2Rldi9udWxsIFwKKyAgfHwgc2tpcF8gJ1BDUkUgdW5pY29kZSBzdXBwb3J0IGlzIGNvbXBp bGVkIG91dCcKKworZmFpbD0wCisKKyMgJ8OxJyAoVSswMEYxKQorcHJpbnRmICdcMzAyXDIyMVxu JyA+IGluIHx8IGZyYW1ld29ya19mYWlsdXJlXworZ3JlcCAtUCAnXEQnIGluID4gb3V0IHx8IGZh aWw9MQorY29tcGFyZSBpbiBvdXQgfHwgZmFpbD0xCisKKyMg4oCc8J2EnuKAnSAoVSsxRDExRSkK K3ByaW50ZiAnXDM2MFwyMzVcMjA0XDIzNlxuJyA+IGluIHx8IGZyYW1ld29ya19mYWlsdXJlXwor Z3JlcCAtUCAnXFcnIGluID4gb3V0IHx8IGZhaWw9MQorY29tcGFyZSBpbiBvdXQgfHwgZmFpbD0x CisKK0V4aXQgJGZhaWwKLS0gCjIuNDAuMC4zNjMuZzljNjk5MGNjYTIKCg== --000000000000545b5d05fa74110f--
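The version discussion quoted above comes down to a gate on the PCRE2 release: per the thread, releases from 10.34 (the first with PCRE2_MATCH_INVALID_UTF) through 10.42 carry the JIT bug, so the flag is only trusted on anything newer. A minimal sketch of such a gate follows. The function name `match_invalid_utf_is_safe` is made up for illustration, and it takes the version as plain arguments (instead of reading `PCRE2_MAJOR` and `PCRE2_MINOR` from `pcre2.h`) so that it can be tried without PCRE2 installed. It also accepts major versions above 10, which a test of the form `PCRE2_MAJOR == 10 && PCRE2_MINOR > 42` would not.

```c
#include <stdbool.h>

/* Hypothetical helper, not part of grep: return true if a PCRE2
   release is newer than 10.42, the last release the thread describes
   as affected by the JIT bug.  */
static bool
match_invalid_utf_is_safe (int major, int minor)
{
  return 10 < major || (major == 10 && 42 < minor);
}
```

In real code the same comparison would presumably live in a preprocessor `#if`, since the flag has to be decided when grep is compiled against a particular `pcre2.h`.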
bug-grep@HIDDEN: bug#62983; Package grep.
Received: (at 62983) by debbugs.gnu.org; 21 Apr 2023 20:21:07 +0000
Date: Fri, 21 Apr 2023 13:20:53 -0700
From: Carlo Marcelo Arenas Belón <carenas@HIDDEN>
To: Paul Eggert <eggert@HIDDEN>
Cc: 62983 <at> debbugs.gnu.org
Subject: Re: bug#62983: workaround PCRE2 bug affecting at least \D and \W
Message-ID: <zwfll3hke4opx3ueoap3xodaxqf4vqjiy5zsknj4ngouohx63v@nd4npghhit3n>
References: <mseeglsi46hm3qor5pdj6xkejip7lgyqpvata65cakztcgwgoq@hsrhke2bfjgd>
 <c82d3567-5dc9-ec84-f656-90e480bd3987@HIDDEN>
In-Reply-To: <c82d3567-5dc9-ec84-f656-90e480bd3987@HIDDEN>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="kp62zerfaxgdsxut"
Content-Disposition: inline

--kp62zerfaxgdsxut
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit

On Fri, Apr 21, 2023 at 11:42:50AM -0700, Paul Eggert wrote:
> On 2023-04-20 19:04, Carlo Marcelo Arenas Belón wrote:
> > All versions of PCRE2 that include PCRE2_MATCH_INVALID_UTF had a bug on
> > its JIT implementation that results in failure to match for the negative
> > perl classes, and seems to be easier to replicate when the matching
> > character is a multibyte one.
>
> Unfortunately that is a little vague. I expect the issue is not limited to
> \D and \W, as there are other ways to specify negative Perl classes.

Correct, it should also affect at least \S, but hadn't been able to trigger
it there.

The bug was that an uninitialized value was being used in the JIT code that
supports the PCRE2_MATCH_INVALID_UTF mode, which is why I said "randomly" in
the commit message.

If you want to be strict, how about the attached patch instead?

> And if
> the bug merely seems to be easier to replicate with multibyte characters, it
> sounds like we may have issues even when matching ASCII characters in a
> UTF-8 locale.

Which the current workaround addresses, since you need both PCRE2_JIT and
PCRE2_MATCH_INVALID_UTF to trigger it, and the subject encoding is irrelevant
for the logic to decide if PCRE2_MATCH_INVALID_UTF gets enabled or not.

> Furthermore, I'm leery of optimizing for PCRE2 10.42 and earlier. We should
> focus our optimization efforts on future PCRE2 versions, and not worry about
> optimizing earlier versions where optimizations complicate maintenance for a
> declining benefit, and are likely to provoke bugs in older versions that as
> time passes will be harder to debug.

Not sure I understand your concern here, but if it is about disabling JIT
instead, then the possibility of introducing bugs is even bigger since it
affects all versions of PCRE2 (not only 10.34 or newer).

> > Alternatively JIT could be disabled instead, but the option selected has
> > less of an impact on performance.
>
> Disabling JIT sounds better, as correctness trumps performance. Until the
> bug is fixed (or at least better-understood so that we have a workaround we
> can trust), how about the attached patch instead?

The bug has been fixed already, and will be included in the next release.
There might be additional changes as spelled in that discussion, and indeed
the change to the proposed solution proactively helps with one of those.

It is very unlikely, but some systems might include non 0 values on the
tables for characters over 127 and that might trigger a similar problem that
is yet to be fixed.

Carlo

[1] https://github.com/PCRE2Project/pcre2/commit/2c08b619dc973beacc474dcb67cda8cd366200ce

--kp62zerfaxgdsxut
Content-Type: text/x-patch; charset=utf-8
Content-Disposition: inline

From 919d4aa016dd979a52b9e5fd3b0ba1d1cf833ac8 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Carlo=20Marcelo=20Arenas=20Bel=C3=B3n?= <carenas@HIDDEN>
Date: Thu, 20 Apr 2023 18:37:20 -0700
Subject: [PATCH v2] pcre: workaround bug affecting PCRE2_MATCH_INVALID_UTF

PCRE2 has a bug when using PCRE2_MATCH_INVALID_UTF that would randomly
fail to match patterns using perl negative classes (like \W or \D).

* NEWS: mention this
* src/pcresearch.c: restrict impact of the bug
  not use the problematic flag in all broken versions of PCRE2
  only generate locale tables for non Unicode
* tests: add new pcre-utf8-bug224 test with replications for \[W|D]
---
 NEWS                   |  5 +++++
 src/pcresearch.c       | 22 ++++++++++++++--------
 tests/Makefile.am      |  1 +
 tests/pcre-utf8-bug224 | 31 +++++++++++++++++++++++++++++++
 4 files changed, 51 insertions(+), 8 deletions(-)
 create mode 100755 tests/pcre-utf8-bug224

diff --git a/NEWS b/NEWS
index f16c576..3552db1 100644
--- a/NEWS
+++ b/NEWS
@@ -15,6 +15,11 @@ GNU grep NEWS                                    -*- outline -*-
   when running on 32-bit x86 and ARM hosts using glibc 2.34+.
   [bug introduced in grep 3.9]
 
+  grep no longer fails to match patterns which relied on negative perl
+  classes like \D or \W when linked with PCRE2 10.34 or newer.
+  [bug introduced in grep 3.8]
+
+
 ** Changes in behavior
 
   grep --version now prints a line describing the version of PCRE2 it uses.
diff --git a/src/pcresearch.c b/src/pcresearch.c
index e867f49..a64b65b 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -58,6 +58,9 @@ struct pcre_comp
   /* Table, indexed by ! (flag & PCRE2_NOTBOL), of whether the empty
      string matches when that flag is used.  */
   int empty_match[2];
+
+  /* Flags */
+  unsigned binary_safe:1;
 };
 
 /* Memory allocation functions for PCRE.  */
@@ -130,16 +133,11 @@ jit_exec (struct pcre_comp *pc, char const *subject, idx_t search_bytes,
     }
 }
 
-/* Return true if E is an error code for bad UTF-8, and if pcre2_match
-   could return E because PCRE lacks PCRE2_MATCH_INVALID_UTF.  */
+/* Return true if E is an error code for bad UTF-8 */
 static bool
 bad_utf8_from_pcre2 (int e)
 {
-#ifdef PCRE2_MATCH_INVALID_UTF
-  return false;
-#else
   return PCRE2_ERROR_UTF8_ERR21 <= e && e <= PCRE2_ERROR_UTF8_ERR1;
-#endif
 }
 
 /* Compile the -P style PATTERN, containing SIZE bytes that are
@@ -157,6 +155,7 @@ Pcompile (char *pattern, idx_t size, reg_syntax_t ignored, bool exact)
     = pcre2_general_context_create (private_malloc, private_free, NULL);
   pcre2_compile_context *ccontext = pcre2_compile_context_create (gcontext);
 
+  pc->binary_safe = false;
   if (localeinfo.multibyte)
     {
       uint32_t unicode;
@@ -181,8 +180,13 @@ Pcompile (char *pattern, idx_t size, reg_syntax_t ignored, bool exact)
       flags |= PCRE2_NEVER_BACKSLASH_C;
 #endif
 #ifdef PCRE2_MATCH_INVALID_UTF
+      /* workaround PCRE2 bug
+         https://github.com/PCRE2Project/pcre2/issues/224 */
+#if PCRE2_MAJOR == 10 && PCRE2_MINOR > 42
+      pc->binary_safe = true;
       /* Consider invalid UTF-8 as a barrier, instead of error.  */
       flags |= PCRE2_MATCH_INVALID_UTF;
+#endif
 #endif
     }
 
@@ -226,7 +230,9 @@ Pcompile (char *pattern, idx_t size, reg_syntax_t ignored, bool exact)
       size = re_size;
     }
 
-  pcre2_set_character_tables (ccontext, pcre2_maketables (gcontext));
+  if (!localeinfo.multibyte)
+    pcre2_set_character_tables (ccontext, pcre2_maketables (gcontext));
+
   pc->cre = pcre2_compile ((PCRE2_SPTR) pattern, size, flags,
                            &ec, &e, ccontext);
   if (!pc->cre)
@@ -313,7 +319,7 @@ Pexecute (void *vcp, char const *buf, idx_t size, idx_t *match_size,
 
           e = jit_exec (pc, subject, line_end - subject,
                         search_offset, options);
-          if (!bad_utf8_from_pcre2 (e))
+          if (pc->binary_safe || !bad_utf8_from_pcre2 (e))
             break;
 
           idx_t valid_bytes = pcre2_get_startchar (pc->data);
diff --git a/tests/Makefile.am b/tests/Makefile.am
index 7718f24..9b4422e 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -155,6 +155,7 @@ TESTS =						\
   pcre-jitstack					\
   pcre-o					\
   pcre-utf8					\
+  pcre-utf8-bug224				\
   pcre-utf8-w					\
   pcre-w					\
   pcre-wx-backref				\
diff --git a/tests/pcre-utf8-bug224 b/tests/pcre-utf8-bug224
new file mode 100755
index 0000000..549cc43
--- /dev/null
+++ b/tests/pcre-utf8-bug224
@@ -0,0 +1,31 @@
+#!/bin/sh
+# Ensure negative perl classes matches multibyte characters in UTF mode
+#
+# Copyright (C) 2023 Free Software Foundation, Inc.
+#
+# Copying and distribution of this file, with or without modification,
+# are permitted in any medium without royalty provided the copyright
+# notice and this notice are preserved.
+
+. "${srcdir=.}/init.sh"; path_prepend_ ../src
+require_en_utf8_locale_
+LC_ALL=en_US.UTF-8
+export LC_ALL
+require_pcre_
+
+echo . | grep -qP '(*UTF).' 2>/dev/null \
+  || skip_ 'PCRE unicode support is compiled out'
+
+fail=0
+
+# 'ñ' (U+00F1)
+printf '\302\221\n' > in || framework_failure_
+grep -P '\D' in > out || fail=1
+compare in out || fail=1
+
+# “𝄞” (U+1D11E)
+printf '\360\235\204\236\n' > in || framework_failure_
+grep -P '\W' in > out || fail=1
+compare in out || fail=1
+
+Exit $fail
-- 
2.39.2 (Apple Git-143)

--kp62zerfaxgdsxut--
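The test in the patch above feeds grep raw UTF-8 bytes written as octal escapes, so it can help to see how a code point maps to those bytes. The sketch below is a plain UTF-8 encoder; `utf8_encode` is an illustrative name, not something from grep or PCRE2, and it assumes its argument is a valid Unicode scalar value. For U+1D11E it yields F0 9D 84 9E, which is the `\360\235\204\236` the test prints. For U+00F1 it yields C3 B1 (octal `\303\261`), whereas the test's first `printf` writes `\302\221`, that is C2 91 or U+0091; that input is still a two-byte non-digit character, so `\D` is exercised either way.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative encoder: write code point CP (assumed to be a valid
   Unicode scalar value) as UTF-8 into OUT, which must have room for
   4 bytes.  Return the number of bytes written.  */
static size_t
utf8_encode (uint32_t cp, unsigned char out[4])
{
  if (cp < 0x80)
    {
      /* One byte: 0xxxxxxx.  */
      out[0] = cp;
      return 1;
    }
  if (cp < 0x800)
    {
      /* Two bytes: 110xxxxx 10xxxxxx.  */
      out[0] = 0xC0 | (cp >> 6);
      out[1] = 0x80 | (cp & 0x3F);
      return 2;
    }
  if (cp < 0x10000)
    {
      /* Three bytes: 1110xxxx 10xxxxxx 10xxxxxx.  */
      out[0] = 0xE0 | (cp >> 12);
      out[1] = 0x80 | ((cp >> 6) & 0x3F);
      out[2] = 0x80 | (cp & 0x3F);
      return 3;
    }
  /* Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx.  */
  out[0] = 0xF0 | (cp >> 18);
  out[1] = 0x80 | ((cp >> 12) & 0x3F);
  out[2] = 0x80 | ((cp >> 6) & 0x3F);
  out[3] = 0x80 | (cp & 0x3F);
  return 4;
}
```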
Received: (at 62983) by debbugs.gnu.org; 21 Apr 2023 18:43:05 +0000
Content-Type: multipart/mixed; boundary="------------LRtUWsM1TxZJ3GWjVmeLEwal"
Message-ID: <c82d3567-5dc9-ec84-f656-90e480bd3987@HIDDEN>
Date: Fri, 21 Apr 2023 11:42:50 -0700
MIME-Version: 1.0
To: Carlo Marcelo Arenas Belón <carenas@HIDDEN>
Cc: 62983 <at> debbugs.gnu.org
References: <mseeglsi46hm3qor5pdj6xkejip7lgyqpvata65cakztcgwgoq@hsrhke2bfjgd>
From: Paul Eggert <eggert@HIDDEN>
Organization: UCLA Computer Science Department
Subject: Re: bug#62983: workaround PCRE2 bug affecting at least \D and \W
In-Reply-To: <mseeglsi46hm3qor5pdj6xkejip7lgyqpvata65cakztcgwgoq@hsrhke2bfjgd>

This is a multi-part message in MIME format.
--------------LRtUWsM1TxZJ3GWjVmeLEwal
Content-Type: text/plain; charset=UTF-8; format=flowed

On 2023-04-20 19:04, Carlo Marcelo Arenas Belón wrote:
> All versions of PCRE2 that include PCRE2_MATCH_INVALID_UTF had a bug on
> its JIT implementation that results in failure to match for the negative
> perl classes, and seems to be easier to replicate when the matching
> character is a multibyte one.

Unfortunately that is a little vague. I expect the issue is not limited
to \D and \W, as there are other ways to specify negative Perl classes.
And if the bug merely seems to be easier to replicate with multibyte
characters, it sounds like we may have issues even when matching ASCII
characters in a UTF-8 locale.

Furthermore, I'm leery of optimizing for PCRE2 10.42 and earlier. We
should focus our optimization efforts on future PCRE2 versions, and not
worry about optimizing earlier versions where optimizations complicate
maintenance for a declining benefit, and are likely to provoke bugs in
older versions that as time passes will be harder to debug.

> Alternatively JIT could be disabled instead, but the option selected has
> less of an impact on performance.

Disabling JIT sounds better, as correctness trumps performance. Until
the bug is fixed (or at least better-understood so that we have a
workaround we can trust), how about the attached patch instead?

--------------LRtUWsM1TxZJ3GWjVmeLEwal Content-Type: text/x-patch; charset=UTF-8; name="0001-grep-use-PCRE2-JIT-only-in-unibyte-locales.patch" Content-Disposition: attachment; filename="0001-grep-use-PCRE2-JIT-only-in-unibyte-locales.patch" Content-Transfer-Encoding: base64 RnJvbSA0ZWM3MWI2M2Y5YWMwYmIyN2I2MGUxYzk4MDJlZGNiYTg2ODA5OWU4IE1vbiBTZXAg MTcgMDA6MDA6MDAgMjAwMQpGcm9tOiBQYXVsIEVnZ2VydCA8ZWdnZXJ0QGNzLnVjbGEuZWR1 PgpEYXRlOiBGcmksIDIxIEFwciAyMDIzIDExOjMxOjEyIC0wNzAwClN1YmplY3Q6IFtQQVRD SF0gZ3JlcDogdXNlIFBDUkUyIEpJVCBvbmx5IGluIHVuaWJ5dGUgbG9jYWxlcwoKKiBzcmMv cGNyZXNlYXJjaC5jIChQY29tcGlsZSk6IENhbGwgcGNyZTJfaml0X2NvbXBpbGUgb25seQpp ZiBpbiBhIG11bHRpYnl0ZSBsb2NhbGUsIHRvIHdvcmsgYXJvdW5kIGEgUENSRTIgSklUIGJ1 Zy4KLS0tCiBORVdTICAgICAgICAgICAgIHwgIDQgKysrKwogc3JjL3BjcmVzZWFyY2guYyB8 IDE3ICsrKysrKysrKysrLS0tLS0tCiAyIGZpbGVzIGNoYW5nZWQsIDE1IGluc2VydGlvbnMo KyksIDYgZGVsZXRpb25zKC0pCgpkaWZmIC0tZ2l0IGEvTkVXUyBiL05FV1MKaW5kZXggZjE2 YzU3Ni4uYjliOGNkYSAxMDA2NDQKLS0tIGEvTkVXUworKysgYi9ORVdTCkBAIC0xMSw2ICsx MSwxMCBAQCBHTlUgZ3JlcCBORVdTICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAg ICAgLSotIG91dGxpbmUgLSotCiAgIFVuaWNvZGUgaW50ZXJwcmV0YXRpb25zLgogICBbYnVn IGludHJvZHVjZWQgaW4gZ3JlcCAzLjEwXQogCisgIFdpdGggLVAsIHBhdHRlcm5zIGxpa2Ug XEQgYW5kIFxXIG5vdyB3b3JrIGFnYWluIGluIGEgVVRGLTggbG9jYWxlLAorICB3aGVuIGxp bmtlZCB0byBQQ1JFMiAxMC4zNCBvciBuZXdlci4KKyAgW2J1ZyBpbnRyb2R1Y2VkIGluIGdy ZXAgMy44XQorCiAgIGdyZXAgbm8gbG9uZ2VyIGZhaWxzIG9uIGZpbGVzIGRhdGVkIGFmdGVy IHRoZSB5ZWFyIDIwMzgsCiAgIHdoZW4gcnVubmluZyBvbiAzMi1iaXQgeDg2IGFuZCBBUk0g aG9zdHMgdXNpbmcgZ2xpYmMgMi4zNCsuCiAgIFtidWcgaW50cm9kdWNlZCBpbiBncmVwIDMu OV0KZGlmZiAtLWdpdCBhL3NyYy9wY3Jlc2VhcmNoLmMgYi9zcmMvcGNyZXNlYXJjaC5jCmlu ZGV4IGU4MmJmODYuLjQwODZiYmMgMTAwNjQ0Ci0tLSBhL3NyYy9wY3Jlc2VhcmNoLmMKKysr IGIvc3JjL3BjcmVzZWFyY2guYwpAQCAtMjQzLDEzICsyNDMsMTggQEAgUGNvbXBpbGUgKGNo YXIgKnBhdHRlcm4sIGlkeF90IHNpemUsIHJlZ19zeW50YXhfdCBpZ25vcmVkLCBib29sIGV4 YWN0KQogICBwYy0+bWNvbnRleHQgPSBOVUxMOwogICBwYy0+ZGF0YSA9IHBjcmUyX21hdGNo 
X2RhdGFfY3JlYXRlX2Zyb21fcGF0dGVybiAocGMtPmNyZSwgZ2NvbnRleHQpOwogCi0gIC8q IElnbm9yZSBhbnkgZmFpbHVyZSByZXR1cm4gZnJvbSBwY3JlMl9qaXRfY29tcGlsZSwgYXMg dGhhdCBtZXJlbHkKLSAgICAgbWVhbnMgSklUIHdvbid0IGJlIHVzZWQgZHVyaW5nIG1hdGNo aW5nLiAgKi8KLSAgcGNyZTJfaml0X2NvbXBpbGUgKHBjLT5jcmUsIFBDUkUyX0pJVF9DT01Q TEVURSk7CisgIC8qIERvIG5vdCB1c2UgUENSRTIgSklUIGluIG11bHRpYnl0ZSBsb2NhbGVz IDxodHRwczovL2J1Z3MuZ251Lm9yZy82Mjk4Mz4uCisgICAgIEZJWE1FOiB3aGVuIHRoZSBQ Q1JFMiBidWcgaXMgZml4ZWQgb3IgYSByZWxpYWJsZSB3b3JrYXJvdW5kIGZvdW5kLiAgKi8K KyAgaWYgKCFsb2NhbGVpbmZvLm11bHRpYnl0ZSkKKyAgICB7CisgICAgICAvKiBJZ25vcmUg YW55IGZhaWx1cmUgcmV0dXJuIGZyb20gcGNyZTJfaml0X2NvbXBpbGUsIGFzIHRoYXQgbWVy ZWx5CisgICAgICAgICBtZWFucyBKSVQgd29uJ3QgYmUgdXNlZCBkdXJpbmcgbWF0Y2hpbmcu ICAqLworICAgICAgcGNyZTJfaml0X2NvbXBpbGUgKHBjLT5jcmUsIFBDUkUyX0pJVF9DT01Q TEVURSk7CiAKLSAgLyogVGhlIFBDUkUgZG9jdW1lbnRhdGlvbiBzYXlzIHRoYXQgYSAzMiBL aUIgc3RhY2sgaXMgdGhlIGRlZmF1bHQuICAqLwotICBwYy0+aml0X3N0YWNrID0gTlVMTDsK LSAgcGMtPmppdF9zdGFja19zaXplID0gMzIgPDwgMTA7CisgICAgICAvKiBUaGUgUENSRSBk b2N1bWVudGF0aW9uIHNheXMgdGhhdCBhIDMyIEtpQiBzdGFjayBpcyB0aGUgZGVmYXVsdC4g ICovCisgICAgICBwYy0+aml0X3N0YWNrID0gTlVMTDsKKyAgICAgIHBjLT5qaXRfc3RhY2tf c2l6ZSA9IDMyIDw8IDEwOworICAgIH0KIAogICBwYy0+ZW1wdHlfbWF0Y2hbZmFsc2VdID0g cGNyZV9leGVjIChwYywgIiIsIDAsIDAsIFBDUkUyX05PVEJPTCk7CiAgIHBjLT5lbXB0eV9t YXRjaFt0cnVlXSA9IHBjcmVfZXhlYyAocGMsICIiLCAwLCAwLCAwKTsKLS0gCjIuMzkuMgoK --------------LRtUWsM1TxZJ3GWjVmeLEwal--
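The attached patch is named "grep: use PCRE2 JIT only in unibyte locales". As a rough illustration of that policy, here is a minimal standalone sketch. It is not code from grep or from the patch: the helper names `locale_is_unibyte` and `jit_allowed` are hypothetical, and the sketch assumes that "unibyte" can be approximated by `MB_CUR_MAX == 1` for the locale currently in effect.

```c
#include <locale.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper, not part of grep: report whether the locale
   currently in effect is unibyte, i.e. no character needs more than
   one byte.  Assumption: MB_CUR_MAX == 1 is an adequate test.  */
static bool
locale_is_unibyte (void)
{
  return MB_CUR_MAX == 1;
}

/* Hypothetical policy function mirroring the idea of the patch:
   permit JIT compilation only when the locale is unibyte, so that
   the JIT code path is never exercised on multibyte (e.g. UTF-8)
   input.  */
static bool
jit_allowed (void)
{
  return locale_is_unibyte ();
}
```

A caller would select the locale first, for example with `setlocale (LC_ALL, "")`, and invoke the JIT compiler only when `jit_allowed ()` returns true. The trade-off is the one discussed in the message: multibyte locales lose the JIT speedup in exchange for avoiding the JIT bug.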
bug-grep@HIDDEN: bug#62983; Package grep.
From: Jim Meyering <jim@HIDDEN>
To: Carlo Marcelo Arenas Belón <carenas@HIDDEN>
Cc: 62983 <at> debbugs.gnu.org
Date: Thu, 20 Apr 2023 19:35:05 -0700
Subject: Re: bug#62983: workaround PCRE2 bug affecting at least \D and \W
Message-ID: <CA+8g5KHuAJw=p74wtKuEhwb=k9761qatotE4Evgj=aACHz675A@HIDDEN>
In-Reply-To: <CA+8g5KF2Sr5RaeQJihvQqgZGVnYUbsfATBfK6FMRed9tyn=9RA@HIDDEN>
References: <mseeglsi46hm3qor5pdj6xkejip7lgyqpvata65cakztcgwgoq@hsrhke2bfjgd> <CA+8g5KF2Sr5RaeQJihvQqgZGVnYUbsfATBfK6FMRed9tyn=9RA@HIDDEN>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"

On Thu, Apr 20, 2023 at 7:33 PM Jim Meyering <jim@HIDDEN> wrote:
>
> On Thu, Apr 20, 2023 at 7:05 PM Carlo Marcelo Arenas Belón
> <carenas@HIDDEN> wrote:
> > All versions of PCRE2 that include PCRE2_MATCH_INVALID_UTF had a bug on
> > its JIT implementation that results in failure to match for the negative
> > perl classes, and seems to be easier to replicate when the matching
> > character is a multibyte one.
> >
> > Disable that flag and use the original fallback instead.
> >
> > Alternatively JIT could be disabled instead, but the option selected has
> > less of an impact on performance.
>
> Thanks for the patch! Is there any PCRE-upstream discussion about this?
> If so, I'd like to reference that from your commit log.

Oh! I see it in the test file:
https://github.com/PCRE2Project/pcre2/issues/224
bug-grep@HIDDEN: bug#62983; Package grep.
From: Jim Meyering <jim@HIDDEN>
To: Carlo Marcelo Arenas Belón <carenas@HIDDEN>
Cc: 62983 <at> debbugs.gnu.org
Date: Thu, 20 Apr 2023 19:33:25 -0700
Subject: Re: bug#62983: workaround PCRE2 bug affecting at least \D and \W
Message-ID: <CA+8g5KF2Sr5RaeQJihvQqgZGVnYUbsfATBfK6FMRed9tyn=9RA@HIDDEN>
In-Reply-To: <mseeglsi46hm3qor5pdj6xkejip7lgyqpvata65cakztcgwgoq@hsrhke2bfjgd>
References: <mseeglsi46hm3qor5pdj6xkejip7lgyqpvata65cakztcgwgoq@hsrhke2bfjgd>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"

On Thu, Apr 20, 2023 at 7:05 PM Carlo Marcelo Arenas Belón
<carenas@HIDDEN> wrote:
> All versions of PCRE2 that include PCRE2_MATCH_INVALID_UTF had a bug on
> its JIT implementation that results in failure to match for the negative
> perl classes, and seems to be easier to replicate when the matching
> character is a multibyte one.
>
> Disable that flag and use the original fallback instead.
>
> Alternatively JIT could be disabled instead, but the option selected has
> less of an impact on performance.

Thanks for the patch! Is there any PCRE-upstream discussion about this?
If so, I'd like to reference that from your commit log.
bug-grep@HIDDEN: bug#62983; Package grep.
Date: Thu, 20 Apr 2023 19:04:18 -0700
From: Carlo Marcelo Arenas Belón <carenas@HIDDEN>
To: bug-grep@HIDDEN
Subject: workaround PCRE2 bug affecting at least \D and \W
Message-ID: <mseeglsi46hm3qor5pdj6xkejip7lgyqpvata65cakztcgwgoq@hsrhke2bfjgd>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="h7otyrncashcyzty"
Content-Disposition: inline
Content-Transfer-Encoding: 8bit

--h7otyrncashcyzty
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

All versions of PCRE2 that include PCRE2_MATCH_INVALID_UTF had a bug on
its JIT implementation that results in failure to match for the negative
perl classes, and seems to be easier to replicate when the matching
character is a multibyte one.

Disable that flag and use the original fallback instead.

Alternatively JIT could be disabled instead, but the option selected has
less of an impact on performance.

Carlo

--h7otyrncashcyzty
Content-Type: text/plain; charset=utf-8
Content-Disposition: attachment; filename="0001-pcre-workaround-bug-affecting-W-or-D.patch"
Content-Transfer-Encoding: 8bit

From 9194c8e9f9ca7315c2e8c25a7986d0690fb31d7c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Carlo=20Marcelo=20Arenas=20Bel=C3=B3n?= <carenas@HIDDEN>
Date: Thu, 20 Apr 2023 18:37:20 -0700
Subject: [PATCH] pcre: workaround bug affecting \W or \D

PCRE2 has a bug when using PCRE2_MATCH_INVALID_UTF that would randomly
fail to match patterns using \W or \D.

* NEWS: mention this
* src/pcre2search.c: not use the problematic flag in all broken
versions of PCRE2
* tests: add new pcre2-utf-bug224 test
---
 NEWS                   |  5 +++++
 src/pcresearch.c       | 23 ++++++++++++++---------
 tests/Makefile.am      |  1 +
 tests/pcre-utf8-bug224 | 31 +++++++++++++++++++++++++++++++
 4 files changed, 51 insertions(+), 9 deletions(-)
 create mode 100755 tests/pcre-utf8-bug224

diff --git a/NEWS b/NEWS
index f16c576..8e371dc 100644
--- a/NEWS
+++ b/NEWS
@@ -15,6 +15,11 @@ GNU grep NEWS                                    -*- outline -*-
   when running on 32-bit x86 and ARM hosts using glibc 2.34+.
   [bug introduced in grep 3.9]
 
+  grep no longer fails to match patterns with \D or \W when linked to
+  PCRE2 10.34 or newer.
+  [bug introduced in grep 3.8]
+
+
 ** Changes in behavior
 
   grep --version now prints a line describing the version of PCRE2 it uses.
diff --git a/src/pcresearch.c b/src/pcresearch.c
index 1f82932..6ef0d2e 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -58,6 +58,9 @@ struct pcre_comp
   /* Table, indexed by ! (flag & PCRE2_NOTBOL), of whether the empty
      string matches when that flag is used.  */
   int empty_match[2];
+
+  /* Flags */
+  unsigned binary_safe:1;
 };
 
 /* Memory allocation functions for PCRE.  */
@@ -130,16 +133,11 @@ jit_exec (struct pcre_comp *pc, char const *subject, idx_t search_bytes,
     }
 }
 
-/* Return true if E is an error code for bad UTF-8, and if pcre2_match
-   could return E because PCRE lacks PCRE2_MATCH_INVALID_UTF.  */
+/* Return true if E is an error code for bad UTF-8 */
 static bool
 bad_utf8_from_pcre2 (int e)
 {
-#ifdef PCRE2_MATCH_INVALID_UTF
-  return false;
-#else
   return PCRE2_ERROR_UTF8_ERR21 <= e && e <= PCRE2_ERROR_UTF8_ERR1;
-#endif
 }
 
 /* Compile the -P style PATTERN, containing SIZE bytes that are
@@ -157,6 +155,7 @@ Pcompile (char *pattern, idx_t size, reg_syntax_t ignored, bool exact)
     = pcre2_general_context_create (private_malloc, private_free, NULL);
   pcre2_compile_context *ccontext = pcre2_compile_context_create (gcontext);
 
+  pc->binary_safe = false;
   if (localeinfo.multibyte)
     {
       uint32_t unicode;
@@ -181,8 +180,14 @@ Pcompile (char *pattern, idx_t size, reg_syntax_t ignored, bool exact)
       flags |= PCRE2_NEVER_BACKSLASH_C;
 #endif
 #ifdef PCRE2_MATCH_INVALID_UTF
-      /* Consider invalid UTF-8 as a barrier, instead of error.  */
-      flags |= PCRE2_MATCH_INVALID_UTF;
+      /* workaround PCRE2 bug
+         https://github.com/PCRE2Project/pcre2/issues/224 */
+#if PCRE2_MAJOR == 10 && PCRE2_MINOR <= 42
+      pc->binary_safe = !strstr (pattern, "\\D") && !strstr (pattern, "\\W");
+      if (pc->binary_safe)
+        /* Consider invalid UTF-8 as a barrier, instead of error.  */
+        flags |= PCRE2_MATCH_INVALID_UTF;
+#endif
 #endif
     }
 
@@ -313,7 +318,7 @@ Pexecute (void *vcp, char const *buf, idx_t size, idx_t *match_size,
 
           e = jit_exec (pc, subject, line_end - subject,
                         search_offset, options);
-          if (!bad_utf8_from_pcre2 (e))
+          if (pc->binary_safe || !bad_utf8_from_pcre2 (e))
             break;
           idx_t valid_bytes = pcre2_get_startchar (pc->data);
 
diff --git a/tests/Makefile.am b/tests/Makefile.am
index 7718f24..9b4422e 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -155,6 +155,7 @@ TESTS = \
 	pcre-jitstack \
 	pcre-o \
 	pcre-utf8 \
+	pcre-utf8-bug224 \
 	pcre-utf8-w \
 	pcre-w \
 	pcre-wx-backref \
diff --git a/tests/pcre-utf8-bug224 b/tests/pcre-utf8-bug224
new file mode 100755
index 0000000..739e7b5
--- /dev/null
+++ b/tests/pcre-utf8-bug224
@@ -0,0 +1,31 @@
+#!/bin/sh
+# Ensure \D and \W matches multibyte characters in UTF mode
+#
+# Copyright (C) 2023 Free Software Foundation, Inc.
+#
+# Copying and distribution of this file, with or without modification,
+# are permitted in any medium without royalty provided the copyright
+# notice and this notice are preserved.
+
+. "${srcdir=.}/init.sh"; path_prepend_ ../src
+require_en_utf8_locale_
+LC_ALL=en_US.UTF-8
+export LC_ALL
+require_pcre_
+
+echo . | grep -qP '(*UTF).' 2>/dev/null \
+  || skip_ 'PCRE unicode support is compiled out'
+
+fail=0
+
+# 'ñ' (U+00F1)
+printf '\302\221\n' > in || framework_failure_
+grep -P '\D' in > out || fail=1
+compare in out || fail=1
+
+# “𝄞” (U+1D11E)
+printf '\360\235\204\236\n' > in || framework_failure_
+grep -P '\W' in > out || fail=1
+compare in out || fail=1
+
+Exit $fail
-- 
2.39.2 (Apple Git-143)

--h7otyrncashcyzty--
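The patch above decides per pattern with a plain substring search for `\D` and `\W`. The following standalone sketch isolates that check so its behavior on a few inputs can be examined. The wrapper name `lacks_backslash_D_and_W` is hypothetical and not part of grep; only the `strstr` expression is taken from the patch.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical wrapper around the substring test used in the patch:
   return true if PATTERN contains neither the two-byte sequence
   backslash-D nor backslash-W.  */
static bool
lacks_backslash_D_and_W (char const *pattern)
{
  return !strstr (pattern, "\\D") && !strstr (pattern, "\\W");
}
```

With this helper, the pattern `\D` (C literal `"\\D"`) is flagged, as intended. The pattern `[^\d]` (C literal `"[^\\d]"`) and the pattern `\S` (C literal `"\\S"`) are not flagged, although both are also negated classes. The pattern `\\D` (C literal `"\\\\D"`), which matches a literal backslash followed by `D`, is flagged even though it uses no class at all. A substring search therefore both misses some negated classes and flags some patterns unnecessarily.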
Carlo Marcelo Arenas Belón <carenas@HIDDEN>: bug-grep@HIDDEN.
bug-grep@HIDDEN: bug#62983; Package grep.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997 nCipher Corporation Ltd,
1994-97 Ian Jackson.