Received: (at 24160) by debbugs.gnu.org; 19 Sep 2016 02:32:32 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Sun Sep 18 22:32:32 2016
Received: from localhost ([127.0.0.1]:34674 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>) id 1bloNj-0001ST-Sz for submit <at> debbugs.gnu.org; Sun, 18 Sep 2016 22:32:32 -0400
Received: from mailgw01.kcn.ne.jp ([61.86.7.208]:57686) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <noritnk@HIDDEN>) id 1bloNi-0001SD-3a for 24160 <at> debbugs.gnu.org; Sun, 18 Sep 2016 22:32:30 -0400
Received: from mxs02-s (mailgw2.kcn.ne.jp [61.86.15.234]) by mailgw01.kcn.ne.jp (Postfix) with ESMTP id 42BDF4A086A for <24160 <at> debbugs.gnu.org>; Mon, 19 Sep 2016 11:32:23 +0900 (JST)
X-matriXscan-loop-detect: c9a5cf15e860450258d2e4a3759089e73446cb61
Received: from mail08.kcn.ne.jp ([61.86.6.187]) by mxs02-s with ESMTP; Mon, 19 Sep 2016 11:32:20 +0900 (JST)
Received: from [10.120.1.60] (i118-21-128-66.s30.a048.ap.plala.or.jp [118.21.128.66]) by mail08.kcn.ne.jp (Postfix) with ESMTPA id 3CCE412B802E for <24160 <at> debbugs.gnu.org>; Mon, 19 Sep 2016 11:32:20 +0900 (JST)
Date: Mon, 19 Sep 2016 11:32:20 +0900
From: Norihiro Tanaka <noritnk@HIDDEN>
To: 24160 <at> debbugs.gnu.org
Subject: Re: bug#24160: [PATCH 1/2] sed: cache results of mbrtowc for speed
In-Reply-To: <20160805225116.64FE.27F6AC2D@HIDDEN>
References: <20160805225116.64FE.27F6AC2D@HIDDEN>
Message-Id: <20160919113219.41D1.27F6AC2D@HIDDEN>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------_57DF4CEB0000000041C6_MULTIPART_MIXED_"
Content-Transfer-Encoding: 7bit
X-Mailer: Becky! ver. 2.65.07 [ja]
X-matriXscan-Sophos-AV: Clean
X-matriXscan-Action: Approve
X-matriXscan: Uncategorized
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 24160
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>

--------_57DF4CEB0000000041C6_MULTIPART_MIXED_
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit

On Fri, 05 Aug 2016 22:51:16 +0900
Norihiro Tanaka <noritnk@HIDDEN> wrote:

> Hi,
>
> We can speed up sed by caching the result of mbrtowc() for single-byte
> characters. This is especially effective in non-UTF-8 multibyte
> locales, where the conversion is expensive.
>
> $ yes $(printf %040d 0) | head -1000000 >k
>
> Before:
>
> $ time -p env LC_ALL=ja_JP.eucjp sed/sed -ne /a.b/p k
> real 1.93
> user 1.61
> sys 0.27
>
> After patching:
>
> $ time -p env LC_ALL=ja_JP.eucjp sed/sed -ne /a.b/p k
> real 0.46
> user 0.42
> sys 0.03
>
> Thanks,
> Norihiro

I rewrote the patch to use localeinfo from gnulib.

--------_57DF4CEB0000000041C6_MULTIPART_MIXED_ Content-Type: text/plain; charset="US-ASCII"; name="0001-sed-use-cache-provided-by-localeinfo-for-mbrtowc-and.patch" Content-Disposition: attachment; filename="0001-sed-use-cache-provided-by-localeinfo-for-mbrtowc-and.patch" Content-Transfer-Encoding: base64 RnJvbSBjMWE5ZDcwOTM2NzU2ODg3YzdjZGY1NWI1YjMyODI2ZGY3MmI5ZDUyIE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBOb3JpaGlybyBUYW5ha2EgPG5vcml0bmtAa2NuLm5lLmpwPgpE YXRlOiBTdW4sIDE4IFNlcCAyMDE2IDE3OjQ2OjU3ICswOTAwClN1YmplY3Q6IFtQQVRDSF0gc2Vk OiB1c2UgY2FjaGUgcHJvdmlkZWQgYnkgbG9jYWxlaW5mbyBmb3IgbWJydG93YyBhbmQgbWJybGVu CgoqIHNlZC9zZWQuaCAoTUJSVE9XQywgTUJSTEVOKTogVXNlIGNhY2hlIHByb3ZpZGVkIGJ5IGxv Y2FsZWluZm8uCihNQlJUT1dDLCBNQlJMRU4pOiBVc2UgdGhlIGNhY2hlLgotLS0KIHNlZC9zZWQu aCB8ICAgIDcgKysrKy0tLQogMSBmaWxlcyBjaGFuZ2VkLCA0IGluc2VydGlvbnMoKyksIDMgZGVs ZXRpb25zKC0pCgpkaWZmIC0tZ2l0IGEvc2VkL3NlZC5oIGIvc2VkL3NlZC5oCmluZGV4IDA4M2Jh YWUuLjllYmM4MTUgMTAwNjQ0Ci0tLSBhL3NlZC9zZWQuaAorKysgYi9zZWQvc2VkLmgKQEAgLTI0 Nyw4ICsyNDcsOCBAQCBleHRlcm4gYm9vbCBpc191dGY4OwogZXh0ZXJuIGJvb2wgc2FuZGJveDsK IAogI2RlZmluZSBNQlJUT1dDKHB3YywgcywgbiwgcHMpIFwKLSAgKG1iX2N1cl9tYXggPT0gMSA/ IFwKLSAgICgqKHB3YykgPSBidG93YyAoKih1bnNpZ25lZCBjaGFyICopIChzKSksIDEpIDogXAor ICAobG9jYWxlaW5mby5zYmNsZW5bKih1bnNpZ25lZCBjaGFyICopIChzKV0gPT0gMSA/IFwKKyAg ICgqKHB3YykgPSBsb2NhbGVpbmZvLnNiY3Rvd2NbKih1bnNpZ25lZCBjaGFyICopIChzKV0sIDEp IDogXAogICAgbWJydG93YyAoKHB3YyksIChzKSwgKG4pLCAocHMpKSkKIAogI2RlZmluZSBXQ1JU T01CKHMsIHdjLCBwcykgXApAQCAtMjYwLDcgKzI2MCw4IEBAIGV4dGVybiBib29sIHNhbmRib3g7 CiAgIChtYl9jdXJfbWF4ID09IDEgPyAxIDogbWJzaW5pdCAoKHMpKSkKIAogI2RlZmluZSBNQlJM RU4ocywgbiwgcHMpIFwKLSAgKG1iX2N1cl9tYXggPT0gMSA/IDEgOiBtYnJ0b3djIChOVUxMLCBz LCBuLCBwcykpCisgIChsb2NhbGVpbmZvLnNiY2xlblsqKHVuc2lnbmVkIGNoYXIgKikgKHMpXSA9 PSAxID8gXAorICAgMSA6IG1icnRvd2MgKE5VTEwsIHMsIG4sIHBzKSkKIAogI2RlZmluZSBJU19N Ql9DSEFSKGNoLCBwcykgICAgICAgICAgICAgICAgXAogICAobWJfY3VyX21heCA9PSAxID8gMCA6 IGlzX21iX2NoYXIgKGNoLCBwcykpCi0tIAoxLjcuMQoK 
--------_57DF4CEB0000000041C6_MULTIPART_MIXED_--
Received: (at 24160) by debbugs.gnu.org; 6 Aug 2016 07:13:35 +0000
Date: Sat, 06 Aug 2016 16:13:27 +0900
From: Norihiro Tanaka <noritnk@HIDDEN>
To: Assaf Gordon <assafgordon@HIDDEN>
Cc: 24160 <at> debbugs.gnu.org
Subject: Re: bug#24160: [PATCH 1/2] sed: cache results of mbrtowc for speed
In-Reply-To: <5fb8c9bf-2233-f782-f9a1-9d55ca33f083@HIDDEN>
References: <20160805225116.64FE.27F6AC2D@HIDDEN> <5fb8c9bf-2233-f782-f9a1-9d55ca33f083@HIDDEN>
Message-Id: <20160806161326.E614.27F6AC2D@HIDDEN>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
X-Mailer: Becky! ver. 2.65.07 [ja]

On Fri, 5 Aug 2016 10:45:59 -0400
Assaf Gordon <assafgordon@HIDDEN> wrote:

> Hello Norihiro,
>
> Thank you for the patch.
>
> By using a cache table, isn't this code ignoring mbstate?
> For example, in Shift-JIS encoding, the byte '[' can either be standalone
> or the second byte of a sequence such as '\x83\x5b'.
> Wouldn't it also prevent detection of invalid sequences?
>
> As a side note, GNU sed's current implementation has a special code path
> for multibyte non-UTF-8 input, so this change is unlikely to affect
> UTF-8 or C locales.
>
> regards,
>  - assaf

Hi Assaf,

Thanks for the review.

When MBRTOWC() or MBRLEN() is called in Shift-JIS, mbstate is always in
the initial state or equivalent to it, except after an invalid or
incomplete sequence has been found, because Shift-JIS is a stateless
encoding. Even when such a sequence is found, mbstate should be reset
to the initial state manually so that the following characters in the
string can be checked.

So I think that we can ignore mbstate in stateless encodings. However,
that assumption is wrong for stateful encodings such as ISO-2022 and
UTF-7.

Does sed support stateful encodings that have shift sequences? At
least, it seems that regex does not support stateful encodings.

Thanks,
Norihiro
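
The reset rule described in the reply above can be sketched in C. This is a minimal illustration and not code from sed: the function name `count_chars` is hypothetical, and counting each invalid or truncated byte as one character is an assumption made only for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: count the characters in s[0..n) under the
   current LC_CTYPE locale.  An invalid or truncated byte counts as
   one character (an assumption of this sketch).  */
static size_t
count_chars (const char *s, size_t n)
{
  assert (s != NULL || n == 0);

  mbstate_t st;
  memset (&st, 0, sizeof st);   /* start in the initial state */

  size_t count = 0;
  size_t i = 0;
  while (i < n)
    {
      size_t len = mbrtowc (NULL, s + i, n - i, &st);
      if (len == (size_t) -1 || len == (size_t) -2)
        {
          /* Invalid or incomplete sequence: put the state back to
             initial by hand, then step over a single byte so the
             rest of the string can still be examined.  */
          memset (&st, 0, sizeof st);
          len = 1;
        }
      else if (len == 0)
        len = 1;                /* a NUL byte still occupies one byte */
      i += len;
      count++;
    }
  return count;
}
```

In a stateless encoding the state is back to initial after every complete character, so the only place it needs attention is the error branch.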
Received: (at 24160) by debbugs.gnu.org; 5 Aug 2016 14:46:16 +0000
Subject: Re: bug#24160: [PATCH 1/2] sed: cache results of mbrtowc for speed
To: Norihiro Tanaka <noritnk@HIDDEN>, 24160 <at> debbugs.gnu.org
References: <20160805225116.64FE.27F6AC2D@HIDDEN>
From: Assaf Gordon <assafgordon@HIDDEN>
Message-ID: <5fb8c9bf-2233-f782-f9a1-9d55ca33f083@HIDDEN>
Date: Fri, 5 Aug 2016 10:45:59 -0400
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0
MIME-Version: 1.0
In-Reply-To: <20160805225116.64FE.27F6AC2D@HIDDEN>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit

Hello Norihiro,

Thank you for the patch.

On 08/05/2016 09:51 AM, Norihiro Tanaka wrote:
> We can speed up sed by caching the result of mbrtowc() for single-byte
> characters. This is especially effective in non-UTF-8 multibyte
> locales, where the conversion is expensive.

Regarding this:
====
 #define MBRTOWC(pwc, s, n, ps) \
-  (mb_cur_max == 1 ? \
-   (*(pwc) = btowc (*(unsigned char *) (s)), 1) : \
+  (mbrlen_cache[*(unsigned char *) (s)] == 1 ? \
+   (*(pwc) = mbrtowc_cache[*(unsigned char *) (s)], 1) : \
    mbrtowc ((pwc), (s), (n), (ps)))

 #define MBRLEN(s, n, ps) \
-  (mb_cur_max == 1 ? 1 : mbrtowc (NULL, s, n, ps))
+  (mbrlen_cache[*(unsigned char *) (s)] == 1 ? \
+   1 : mbrtowc (NULL, s, n, ps))
====

By using a cache table, isn't this code ignoring mbstate?
For example, in Shift-JIS encoding, the byte '[' can either be standalone
or the second byte of a sequence such as '\x83\x5b'.
Wouldn't it also prevent detection of invalid sequences?

As a side note, GNU sed's current implementation has a special code path
for multibyte non-UTF-8 input, so this change is unlikely to affect
UTF-8 or C locales.

regards,
 - assaf
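
The Shift-JIS case raised above can be illustrated without depending on an installed locale. The sketch below is not sed code: the names `sjis_is_lead` and `sjis_find_single` are hypothetical, and it assumes the commonly cited lead-byte ranges 0x81-0x9F and 0xE0-0xFC without validating trail bytes. It shows that the byte 0x5b ('[') is ambiguous only when the input is examined at an arbitrary offset; a scan that starts at a character boundary and advances one character at a time never treats a trail byte as the first byte of a character.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified assumption: a byte in 0x81-0x9F or 0xE0-0xFC starts a
   two-byte Shift-JIS character; every other byte is a single-byte
   character.  Trail bytes are not validated.  */
static bool
sjis_is_lead (unsigned char b)
{
  return (0x81 <= b && b <= 0x9F) || (0xE0 <= b && b <= 0xFC);
}

/* Hypothetical helper: return the offset of the first standalone
   occurrence of TARGET in s[0..n), or -1 if there is none.  The scan
   starts at a character boundary and advances character by
   character.  */
static long
sjis_find_single (const unsigned char *s, size_t n, unsigned char target)
{
  assert (s != NULL || n == 0);

  size_t i = 0;
  while (i < n)
    {
      if (sjis_is_lead (s[i]))
        /* Skip the lead byte and its trail byte; a truncated lead
           byte at the end of the buffer is skipped on its own.  */
        i += (i + 1 < n) ? 2 : 1;
      else
        {
          if (s[i] == target)
            return (long) i;
          i++;
        }
    }
  return -1;
}
```

For the three bytes 0x83 0x5b 0x5b, the helper reports offset 2: the first 0x5b is consumed as the trail byte of '\x83\x5b'. Whether every caller of MBRTOWC() and MBRLEN() in sed starts at a character boundary is not established in this thread.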
Received: (at submit) by debbugs.gnu.org; 5 Aug 2016 13:51:41 +0000
Date: Fri, 05 Aug 2016 22:51:16 +0900
From: Norihiro Tanaka <noritnk@HIDDEN>
To: <bug-sed@HIDDEN>
Subject: [PATCH 1/2] sed: cache results of mbrtowc for speed
Message-Id: <20160805225116.64FE.27F6AC2D@HIDDEN>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------_57A497D20000000064F2_MULTIPART_MIXED_"
Content-Transfer-Encoding: 7bit
X-Mailer: Becky! ver. 2.65.07 [ja]

--------_57A497D20000000064F2_MULTIPART_MIXED_
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit

Hi,

We can speed up sed by caching the result of mbrtowc() for single-byte
characters. This is especially effective in non-UTF-8 multibyte
locales, where the conversion is expensive.

$ yes $(printf %040d 0) | head -1000000 >k

Before:

$ time -p env LC_ALL=ja_JP.eucjp sed/sed -ne /a.b/p k
real 1.93
user 1.61
sys 0.27

After patching:

$ time -p env LC_ALL=ja_JP.eucjp sed/sed -ne /a.b/p k
real 0.46
user 0.42
sys 0.03

Thanks,
Norihiro

--------_57A497D20000000064F2_MULTIPART_MIXED_
Content-Type: text/plain; charset="US-ASCII"; name="0001-sed-cache-results-of-mbrtowc-for-speed.patch"
Content-Disposition: attachment; filename="0001-sed-cache-results-of-mbrtowc-for-speed.patch"
Content-Transfer-Encoding: base64

RnJvbSBkYzI3NzM5NDQxNTRiMzA1Yzg5M2I3NDU5ODI5YmRlMjFjNWE2MTgyIE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBOb3JpaGlybyBUYW5ha2EgPG5vcml0bmtAa2NuLm5lLmpwPgpE YXRlOiBGcmksIDUgQXVnIDIwMTYgMDg6Mjg6MjAgKzA5MDAKU3ViamVjdDogW1BBVENIIDEvMl0g c2VkOiBjYWNoZSByZXN1bHRzIG9mIG1icnRvd2MgZm9yIHNwZWVkCgoqIHNlZC9tYmNzLmMgKG1i cnRvd2NfY2FjaGUsIG1icmxlbl9jYWNoZSk6IE5ldyB2YXJzLgooaW5pdGlhbGl6ZV9tYmNzKTog SW5pdGlhbGl6ZSB0aGUgY2FjaGUuCiogc2VkL3NlZC5oOiBJbmNsdWRlIGxpbWl0cy5oCihNQlJU T1dDLCBNQlJMRU4pOiBVc2UgdGhlIGNhY2hlLgotLS0KIHNlZC9tYmNzLmMgfCAgIDE0ICsrKysr KysrKysrKysrCiBzZWQvc2VkLmggIHwgICAxMSArKysrKysrKy0tLQogMiBmaWxlcyBjaGFuZ2Vk LCAyMiBpbnNlcnRpb25zKCspLCAzIGRlbGV0aW9ucygtKQoKZGlmZiAtLWdpdCBhL3NlZC9tYmNz LmMgYi9zZWQvbWJjcy5jCmluZGV4IGJjZTM5ZmEuLjgxMDVlY2QgMTAwNjQ0Ci0tLSBhL3NlZC9t YmNzLmMKKysrIGIvc2VkL21iY3MuYwpAQCAtMjQsNiArMjQsOSBAQAogaW50IG1iX2N1cl9tYXg7 CiBib29sIGlzX3V0Zjg7CiAKK3NpemVfdCBtYnJsZW5fY2FjaGVbVUNIQVJfTUFYICsgMV07Cit3 aW50X3QgbWJydG93Y19jYWNoZVtVQ0hBUl9NQVggKyAxXTsKKwogLyogUmV0dXJuIG5vbi16ZXJv IGlmIENIIGlzIHBhcnQgb2YgYSB2YWxpZCBtdWx0aWJ5dGUgc2VxdWVuY2U6CiAgICBFaXRoZXIg aW5jb21wbGV0ZSB5ZXQgdmFsaWQgc2VxdWVuY2UgKGluIGNhc2Ugb2YgYSBsZWFkaW5nIGJ5dGUp LAogICAgb3IgdGhlIGxhc3QgYnl0ZSBvZiBhIHZhbGlkIG11bHRpYnl0ZSBzZXF1ZW5jZS4KQEAg LTczLDQgKzc2LDE1IEBAIGluaXRpYWxpemVfbWJjcyAodm9pZCkKICAgaXNfdXRmOCA9IChzdHJj 
bXAgKGNvZGVzZXRfbmFtZSwgIlVURi04IikgPT0gMCk7CiAKICAgbWJfY3VyX21heCA9IE1CX0NV Ul9NQVg7CisKKyAgZm9yIChpbnQgaSA9IENIQVJfTUlOOyBpIDw9IENIQVJfTUFYOyArK2kpCisg ICAgeworICAgICAgY2hhciBjID0gaTsKKyAgICAgIHVuc2lnbmVkIGNoYXIgdWMgPSBpOworICAg ICAgbWJzdGF0ZV90IG1icyA9IHsgMCB9OworICAgICAgd2NoYXJfdCB3YzsKKyAgICAgIHNpemVf dCBsZW4gPSBtYnJ0b3djICgmd2MsICZjLCAxLCAmbWJzKTsKKyAgICAgIG1icmxlbl9jYWNoZVt1 Y10gPSBsZW4gPyBsZW4gOiAxOworICAgICAgbWJydG93Y19jYWNoZVt1Y10gPSBsZW4gPT0gMSA/ IHdjIDogV0VPRjsKKyAgICB9CiB9CmRpZmYgLS1naXQgYS9zZWQvc2VkLmggYi9zZWQvc2VkLmgK aW5kZXggYmJkZGQyNS4uMzcxNmJjYiAxMDA2NDQKLS0tIGEvc2VkL3NlZC5oCisrKyBiL3NlZC9z ZWQuaApAQCAtMTksNiArMTksNyBAQAogI2luY2x1ZGUgImJhc2ljZGVmcy5oIgogI2luY2x1ZGUg InJlZ2V4LmgiCiAjaW5jbHVkZSA8c3RkaW8uaD4KKyNpbmNsdWRlIDxsaW1pdHMuaD4KICNpbmNs dWRlICJ1bmxvY2tlZC1pby5oIgogCiAjaW5jbHVkZSAidXRpbHMuaCIKQEAgLTIzOCw5ICsyMzks MTIgQEAgZXh0ZXJuIGJvb2wgdXNlX2V4dGVuZGVkX3N5bnRheF9wOwogZXh0ZXJuIGludCBtYl9j dXJfbWF4OwogZXh0ZXJuIGJvb2wgaXNfdXRmODsKIAorZXh0ZXJuIHNpemVfdCBtYnJsZW5fY2Fj aGVbVUNIQVJfTUFYICsgMV07CitleHRlcm4gd2ludF90IG1icnRvd2NfY2FjaGVbVUNIQVJfTUFY ICsgMV07CisKICNkZWZpbmUgTUJSVE9XQyhwd2MsIHMsIG4sIHBzKSBcCi0gIChtYl9jdXJfbWF4 ID09IDEgPyBcCi0gICAoKihwd2MpID0gYnRvd2MgKCoodW5zaWduZWQgY2hhciAqKSAocykpLCAx KSA6IFwKKyAgKG1icmxlbl9jYWNoZVsqKHVuc2lnbmVkIGNoYXIgKikgKHMpXSA9PSAxID8gXAor ICAgKCoocHdjKSA9IG1icnRvd2NfY2FjaGVbKih1bnNpZ25lZCBjaGFyICopIChzKV0sIDEpIDog XAogICAgbWJydG93YyAoKHB3YyksIChzKSwgKG4pLCAocHMpKSkKIAogI2RlZmluZSBXQ1JUT01C KHMsIHdjLCBwcykgXApAQCAtMjUyLDcgKzI1Niw4IEBAIGV4dGVybiBib29sIGlzX3V0Zjg7CiAg IChtYl9jdXJfbWF4ID09IDEgPyAxIDogbWJzaW5pdCAoKHMpKSkKIAogI2RlZmluZSBNQlJMRU4o cywgbiwgcHMpIFwKLSAgKG1iX2N1cl9tYXggPT0gMSA/IDEgOiBtYnJ0b3djIChOVUxMLCBzLCBu LCBwcykpCisgIChtYnJsZW5fY2FjaGVbKih1bnNpZ25lZCBjaGFyICopIChzKV0gPT0gMSA/IFwK KyAgIDEgOiBtYnJ0b3djIChOVUxMLCBzLCBuLCBwcykpCiAKICNkZWZpbmUgSVNfTUJfQ0hBUihj aCwgcHMpICAgICAgICAgICAgICAgIFwKICAgKG1iX2N1cl9tYXggPT0gMSA/IDAgOiBpc19tYl9j aGFyIChjaCwgcHMpKQotLSAKMS43LjEKCg== 
--------_57A497D20000000064F2_MULTIPART_MIXED_--
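
The caching idea in the message above can be sketched as a small stand-alone C fragment. It is a minimal illustration under stated assumptions, not the attached patch: the names `sb_len`, `sb_wc`, `init_sb_cache` and `cached_mbrtowc` are hypothetical (the patch quoted elsewhere in this thread uses `mbrlen_cache` and `mbrtowc_cache`), and the lookup is only meaningful when the conversion state is initial, as it always is between characters in a stateless encoding.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical tables for this sketch.  sb_len[b] is 1 when byte B
   alone is a complete character in the current locale, otherwise 0.
   sb_wc[b] holds the converted wide character, or WEOF.  */
static unsigned char sb_len[UCHAR_MAX + 1];
static wint_t sb_wc[UCHAR_MAX + 1];

/* Fill the tables once for the current LC_CTYPE locale.  */
static void
init_sb_cache (void)
{
  for (int i = 0; i <= UCHAR_MAX; i++)
    {
      unsigned char uc = (unsigned char) i;
      wchar_t wc;
      mbstate_t st;
      memset (&st, 0, sizeof st);   /* convert from the initial state */

      size_t len = mbrtowc (&wc, (const char *) &uc, 1, &st);
      if (len <= 1)
        {
          /* 1: complete single-byte character; 0: the NUL byte.  */
          sb_len[i] = 1;
          sb_wc[i] = (wint_t) wc;
        }
      else
        {
          /* (size_t) -1 or (size_t) -2: invalid alone, or a lead byte.  */
          sb_len[i] = 0;
          sb_wc[i] = WEOF;
        }
    }
}

/* Like mbrtowc, but answer from the table when the first byte is a
   complete character.  Unlike mbrtowc, this returns 1 for NUL.  */
static size_t
cached_mbrtowc (wchar_t *pwc, const char *s, size_t n, mbstate_t *ps)
{
  assert (n > 0);
  unsigned char b = (unsigned char) *s;
  if (sb_len[b] == 1)
    {
      *pwc = (wchar_t) sb_wc[b];
      return 1;
    }
  return mbrtowc (pwc, s, n, ps);   /* slow path for everything else */
}
```

A caller would run `init_sb_cache ()` once after `setlocale ()` and then use `cached_mbrtowc ()` wherever it previously called `mbrtowc ()` on the first byte of a character. Bytes that begin a multibyte sequence still take the slow path, so any gain depends on how much of the input is single-byte.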
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997 nCipher Corporation Ltd,
1994-97 Ian Jackson.