GNU bug report logs - #24160
[PATCH 1/2] sed: cache results of mbrtowc for speed

Please note: This is a static page, with minimal formatting, updated once a day.

Package: sed; Reported by: Norihiro Tanaka <noritnk@HIDDEN>; Keywords: patch; dated Fri, 5 Aug 2016 13:52:02 UTC; Maintainer for sed is bug-sed@HIDDEN.

Message received at 24160 <at> debbugs.gnu.org:


Date: Mon, 19 Sep 2016 11:32:20 +0900
From: Norihiro Tanaka <noritnk@HIDDEN>
To: 24160 <at> debbugs.gnu.org
Subject: Re: bug#24160: [PATCH 1/2] sed: cache results of mbrtowc for speed
In-Reply-To: <20160805225116.64FE.27F6AC2D@HIDDEN>
References: <20160805225116.64FE.27F6AC2D@HIDDEN>
Message-Id: <20160919113219.41D1.27F6AC2D@HIDDEN>
MIME-Version: 1.0
Content-Type: multipart/mixed;
 boundary="------_57DF4CEB0000000041C6_MULTIPART_MIXED_"
Content-Transfer-Encoding: 7bit

--------_57DF4CEB0000000041C6_MULTIPART_MIXED_
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit


On Fri, 05 Aug 2016 22:51:16 +0900
Norihiro Tanaka <noritnk@HIDDEN> wrote:

> Hi,
> 
> We can speed up sed by caching the results of mbrtowc() for single-byte
> characters.  This is especially effective in non-UTF-8 multibyte
> locales, where the conversion is expensive.
> 
> $ yes $(printf %040d 0) | head -1000000 >k
> 
> Before:
> 
> $ time -p env LC_ALL=ja_JP.eucjp sed/sed -ne /a.b/p k
> real 1.93
> user 1.61
> sys 0.27
> 
> After patching:
> 
> $ time -p env LC_ALL=ja_JP.eucjp sed/sed -ne /a.b/p k
> real 0.46
> user 0.42
> sys 0.03
> 
> Thanks,
> Norihiro

I rewrote the patch to use localeinfo from gnulib.

--------_57DF4CEB0000000041C6_MULTIPART_MIXED_
Content-Type: text/plain;
 charset="US-ASCII";
 name="0001-sed-use-cache-provided-by-localeinfo-for-mbrtowc-and.patch"
Content-Disposition: attachment;
 filename="0001-sed-use-cache-provided-by-localeinfo-for-mbrtowc-and.patch"
Content-Transfer-Encoding: base64

RnJvbSBjMWE5ZDcwOTM2NzU2ODg3YzdjZGY1NWI1YjMyODI2ZGY3MmI5ZDUyIE1vbiBTZXAgMTcg
MDA6MDA6MDAgMjAwMQpGcm9tOiBOb3JpaGlybyBUYW5ha2EgPG5vcml0bmtAa2NuLm5lLmpwPgpE
YXRlOiBTdW4sIDE4IFNlcCAyMDE2IDE3OjQ2OjU3ICswOTAwClN1YmplY3Q6IFtQQVRDSF0gc2Vk
OiB1c2UgY2FjaGUgcHJvdmlkZWQgYnkgbG9jYWxlaW5mbyBmb3IgbWJydG93YyBhbmQgbWJybGVu
CgoqIHNlZC9zZWQuaCAoTUJSVE9XQywgTUJSTEVOKTogVXNlIGNhY2hlIHByb3ZpZGVkIGJ5IGxv
Y2FsZWluZm8uCihNQlJUT1dDLCBNQlJMRU4pOiBVc2UgdGhlIGNhY2hlLgotLS0KIHNlZC9zZWQu
aCB8ICAgIDcgKysrKy0tLQogMSBmaWxlcyBjaGFuZ2VkLCA0IGluc2VydGlvbnMoKyksIDMgZGVs
ZXRpb25zKC0pCgpkaWZmIC0tZ2l0IGEvc2VkL3NlZC5oIGIvc2VkL3NlZC5oCmluZGV4IDA4M2Jh
YWUuLjllYmM4MTUgMTAwNjQ0Ci0tLSBhL3NlZC9zZWQuaAorKysgYi9zZWQvc2VkLmgKQEAgLTI0
Nyw4ICsyNDcsOCBAQCBleHRlcm4gYm9vbCBpc191dGY4OwogZXh0ZXJuIGJvb2wgc2FuZGJveDsK
IAogI2RlZmluZSBNQlJUT1dDKHB3YywgcywgbiwgcHMpIFwKLSAgKG1iX2N1cl9tYXggPT0gMSA/
IFwKLSAgICgqKHB3YykgPSBidG93YyAoKih1bnNpZ25lZCBjaGFyICopIChzKSksIDEpIDogXAor
ICAobG9jYWxlaW5mby5zYmNsZW5bKih1bnNpZ25lZCBjaGFyICopIChzKV0gPT0gMSA/IFwKKyAg
ICgqKHB3YykgPSBsb2NhbGVpbmZvLnNiY3Rvd2NbKih1bnNpZ25lZCBjaGFyICopIChzKV0sIDEp
IDogXAogICAgbWJydG93YyAoKHB3YyksIChzKSwgKG4pLCAocHMpKSkKIAogI2RlZmluZSBXQ1JU
T01CKHMsIHdjLCBwcykgXApAQCAtMjYwLDcgKzI2MCw4IEBAIGV4dGVybiBib29sIHNhbmRib3g7
CiAgIChtYl9jdXJfbWF4ID09IDEgPyAxIDogbWJzaW5pdCAoKHMpKSkKIAogI2RlZmluZSBNQlJM
RU4ocywgbiwgcHMpIFwKLSAgKG1iX2N1cl9tYXggPT0gMSA/IDEgOiBtYnJ0b3djIChOVUxMLCBz
LCBuLCBwcykpCisgIChsb2NhbGVpbmZvLnNiY2xlblsqKHVuc2lnbmVkIGNoYXIgKikgKHMpXSA9
PSAxID8gXAorICAgMSA6IG1icnRvd2MgKE5VTEwsIHMsIG4sIHBzKSkKIAogI2RlZmluZSBJU19N
Ql9DSEFSKGNoLCBwcykgICAgICAgICAgICAgICAgXAogICAobWJfY3VyX21heCA9PSAxID8gMCA6
IGlzX21iX2NoYXIgKGNoLCBwcykpCi0tIAoxLjcuMQoK
--------_57DF4CEB0000000041C6_MULTIPART_MIXED_--





Information forwarded to bug-sed@HIDDEN:
bug#24160; Package sed. Full text available.

Message received at 24160 <at> debbugs.gnu.org:


Date: Sat, 06 Aug 2016 16:13:27 +0900
From: Norihiro Tanaka <noritnk@HIDDEN>
To: Assaf Gordon <assafgordon@HIDDEN>
Subject: Re: bug#24160: [PATCH 1/2] sed: cache results of mbrtowc for speed
In-Reply-To: <5fb8c9bf-2233-f782-f9a1-9d55ca33f083@HIDDEN>
References: <20160805225116.64FE.27F6AC2D@HIDDEN>
 <5fb8c9bf-2233-f782-f9a1-9d55ca33f083@HIDDEN>
Message-Id: <20160806161326.E614.27F6AC2D@HIDDEN>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
Cc: 24160 <at> debbugs.gnu.org


On Fri, 5 Aug 2016 10:45:59 -0400
Assaf Gordon <assafgordon@HIDDEN> wrote:

> Hello Norihiro,
> 
> Thank you for the patch.
> 
> By using a cache table, isn't this code ignoring mbstate?
> For example, in the Shift-JIS encoding, the byte '[' can be either a
> standalone character or the second byte of a sequence such as '\x83\x5b'.
> Wouldn't it also prevent detection of invalid sequences?
> 
> As a side note, GNU sed's current implementation has a special code path
> for multibyte non-UTF-8 input, so this change is unlikely to affect UTF-8
> or C locales.
> 
> regards,
>   - assaf

Hi Assaf,

Thanks for the review.

When MBRTOWC() or MBRLEN() is called in Shift-JIS, mbstate is always the
initial state, or equivalent to it, unless an invalid or incomplete
sequence has been found, because Shift-JIS is a stateless encoding.

Even when such a sequence is found, mbstate has to be reset to the
initial state manually in order to check the following characters in the
string.  So I think we can ignore mbstate in a stateless encoding.

However, this assumption is wrong for stateful encodings such as ISO-2022
and UTF-7.  Does sed support stateful encodings that have shift
sequences?  At least, it seems that regex does not support stateful
encodings.

Thanks,
Norihiro
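
The manual reset described above can be sketched in C as follows.  This
is only an illustration under stated assumptions, not code from sed: the
function name count_chars is hypothetical, and treating an invalid or
incomplete sequence as a single one-byte character is a choice made for
this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Count the characters in BUF[0..LEN) under the current LC_CTYPE locale.
   An invalid or incomplete sequence is counted as one character of one
   byte, and the conversion state is reset to the initial state before
   scanning continues.  (Hypothetical helper, not part of sed.)  */
size_t
count_chars (const char *buf, size_t len)
{
  assert (buf != NULL || len == 0);

  mbstate_t st;
  memset (&st, 0, sizeof st);   /* initial conversion state */

  size_t count = 0;
  size_t i = 0;
  while (i < len)
    {
      size_t n = mbrtowc (NULL, buf + i, len - i, &st);
      if (n == (size_t) -1 || n == (size_t) -2)
        {
          /* Invalid or incomplete: the state is no longer usable,
             so reset it by hand and skip a single byte.  */
          memset (&st, 0, sizeof st);
          n = 1;
        }
      else if (n == 0)
        n = 1;                  /* a NUL byte is one character */
      i += n;
      count++;
    }
  return count;
}
```

In a stateless encoding the state is back at its initial value after
every complete character, so the only place where it needs attention is
the error branch.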





Information forwarded to bug-sed@HIDDEN:
bug#24160; Package sed. Full text available.

Message received at 24160 <at> debbugs.gnu.org:


Subject: Re: bug#24160: [PATCH 1/2] sed: cache results of mbrtowc for speed
To: Norihiro Tanaka <noritnk@HIDDEN>, 24160 <at> debbugs.gnu.org
References: <20160805225116.64FE.27F6AC2D@HIDDEN>
From: Assaf Gordon <assafgordon@HIDDEN>
Message-ID: <5fb8c9bf-2233-f782-f9a1-9d55ca33f083@HIDDEN>
Date: Fri, 5 Aug 2016 10:45:59 -0400
MIME-Version: 1.0
In-Reply-To: <20160805225116.64FE.27F6AC2D@HIDDEN>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit

Hello Norihiro,

Thank you for the patch.

On 08/05/2016 09:51 AM, Norihiro Tanaka wrote:
> We can speed up sed by caching the results of mbrtowc() for single-byte
> characters.  This is especially effective in non-UTF-8 multibyte
> locales, where the conversion is expensive.

Regarding this:
====
  #define MBRTOWC(pwc, s, n, ps) \
-  (mb_cur_max == 1 ? \
-   (*(pwc) = btowc (*(unsigned char *) (s)), 1) : \
+  (mbrlen_cache[*(unsigned char *) (s)] == 1 ? \
+   (*(pwc) = mbrtowc_cache[*(unsigned char *) (s)], 1) : \
     mbrtowc ((pwc), (s), (n), (ps)))
  
  #define MBRLEN(s, n, ps) \
-  (mb_cur_max == 1 ? 1 : mbrtowc (NULL, s, n, ps))
+  (mbrlen_cache[*(unsigned char *) (s)] == 1 ? \
+   1 : mbrtowc (NULL, s, n, ps))
====

By using a cache table, isn't this code ignoring mbstate?
For example, in the Shift-JIS encoding, the byte '[' can be either a
standalone character or the second byte of a sequence such as '\x83\x5b'.
Wouldn't it also prevent detection of invalid sequences?

As a side note, GNU sed's current implementation has a special code path
for multibyte non-UTF-8 input, so this change is unlikely to affect UTF-8
or C locales.

regards,
  - assaf
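
The '\x83\x5b' case can be made concrete with a small locale-independent
sketch in C.  It assumes the lead-byte ranges used by common Shift_JIS
variants (0x81-0x9F and 0xE0-0xFC), does not validate trail bytes, and
uses hypothetical names (sjis_is_lead, sjis_starts_char) that do not
come from sed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if B is the first byte of a two-byte character, assuming the
   lead-byte ranges 0x81-0x9F and 0xE0-0xFC.  */
bool
sjis_is_lead (unsigned char b)
{
  return (0x81 <= b && b <= 0x9F) || (0xE0 <= b && b <= 0xFC);
}

/* True if the byte at offset POS of BUF[0..LEN) begins a character.
   The buffer is walked from its start, stepping over two bytes after
   each lead byte, because a byte such as 0x5B ('[') cannot be
   classified without knowing what precedes it.  */
bool
sjis_starts_char (const unsigned char *buf, size_t len, size_t pos)
{
  assert (buf != NULL && pos < len);

  size_t i = 0;
  while (i < pos)
    i += (sjis_is_lead (buf[i]) && i + 1 < len) ? 2 : 1;
  return i == pos;
}
```

For the two bytes 0x83 0x5B, offset 1 does not start a character, while
in the ASCII string "a[" it does.  A table indexed by a single byte is
therefore only meaningful when it is consulted at a character boundary.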





Information forwarded to bug-sed@HIDDEN:
bug#24160; Package sed. Full text available.

Message received at submit <at> debbugs.gnu.org:


Date: Fri, 05 Aug 2016 22:51:16 +0900
From: Norihiro Tanaka <noritnk@HIDDEN>
To: <bug-sed@HIDDEN>
Subject: [PATCH 1/2] sed: cache results of mbrtowc for speed
Message-Id: <20160805225116.64FE.27F6AC2D@HIDDEN>
MIME-Version: 1.0
Content-Type: multipart/mixed;
 boundary="------_57A497D20000000064F2_MULTIPART_MIXED_"
Content-Transfer-Encoding: 7bit

--------_57A497D20000000064F2_MULTIPART_MIXED_
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit

Hi,

We can speed up sed by caching the results of mbrtowc() for single-byte
characters.  This is especially effective in non-UTF-8 multibyte
locales, where the conversion is expensive.

$ yes $(printf %040d 0) | head -1000000 >k

Before:

$ time -p env LC_ALL=ja_JP.eucjp sed/sed -ne /a.b/p k
real 1.93
user 1.61
sys 0.27

After patching:

$ time -p env LC_ALL=ja_JP.eucjp sed/sed -ne /a.b/p k
real 0.46
user 0.42
sys 0.03

Thanks,
Norihiro
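
A minimal sketch of the caching idea in C, in the spirit of the approach
described above.  The names sb_cache, sb_cache_init and sb_cache_mbrtowc
are hypothetical and are not the identifiers used in sed.  The table is
filled by decoding each byte value once, from an initial conversion
state, in the current LC_CTYPE locale; storing L'\0' for the NUL byte is
a choice made for this sketch.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Per-locale table of single-byte decoding results (hypothetical).  */
struct sb_cache
{
  size_t len[UCHAR_MAX + 1];  /* mbrtowc result for the lone byte;
                                 a result of 0 (NUL) is stored as 1 */
  wint_t wc[UCHAR_MAX + 1];   /* decoded character if len == 1,
                                 otherwise WEOF */
};

/* Fill CACHE for the current LC_CTYPE locale.  Each byte value is
   decoded on its own, starting from a fresh initial state.  */
void
sb_cache_init (struct sb_cache *cache)
{
  assert (cache != NULL);

  for (int i = 0; i <= UCHAR_MAX; i++)
    {
      char c = (char) i;
      wchar_t wc = L'\0';
      mbstate_t mbs;
      memset (&mbs, 0, sizeof mbs);

      size_t len = mbrtowc (&wc, &c, 1, &mbs);
      if (len == 0)
        len = 1;                /* NUL decodes to L'\0', one byte long */
      cache->len[i] = len;
      cache->wc[i] = len == 1 ? (wint_t) wc : WEOF;
    }
}

/* Decode one character at S, with N bytes available.  When the first
   byte is a complete character by itself, answer from the table;
   otherwise fall back to mbrtowc.  */
size_t
sb_cache_mbrtowc (const struct sb_cache *cache, wchar_t *pwc,
                  const char *s, size_t n, mbstate_t *ps)
{
  unsigned char b = (unsigned char) *s;
  if (cache->len[b] == 1)
    {
      *pwc = (wchar_t) cache->wc[b];
      return 1;
    }
  return mbrtowc (pwc, s, n, ps);
}
```

A caller would run sb_cache_init once after setlocale and then use
sb_cache_mbrtowc at known character boundaries.  The fast path skips the
conversion state entirely, so it is only appropriate for stateless
encodings.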

--------_57A497D20000000064F2_MULTIPART_MIXED_
Content-Type: text/plain;
 charset="US-ASCII";
 name="0001-sed-cache-results-of-mbrtowc-for-speed.patch"
Content-Disposition: attachment;
 filename="0001-sed-cache-results-of-mbrtowc-for-speed.patch"
Content-Transfer-Encoding: base64

RnJvbSBkYzI3NzM5NDQxNTRiMzA1Yzg5M2I3NDU5ODI5YmRlMjFjNWE2MTgyIE1vbiBTZXAgMTcg
MDA6MDA6MDAgMjAwMQpGcm9tOiBOb3JpaGlybyBUYW5ha2EgPG5vcml0bmtAa2NuLm5lLmpwPgpE
YXRlOiBGcmksIDUgQXVnIDIwMTYgMDg6Mjg6MjAgKzA5MDAKU3ViamVjdDogW1BBVENIIDEvMl0g
c2VkOiBjYWNoZSByZXN1bHRzIG9mIG1icnRvd2MgZm9yIHNwZWVkCgoqIHNlZC9tYmNzLmMgKG1i
cnRvd2NfY2FjaGUsIG1icmxlbl9jYWNoZSk6IE5ldyB2YXJzLgooaW5pdGlhbGl6ZV9tYmNzKTog
SW5pdGlhbGl6ZSB0aGUgY2FjaGUuCiogc2VkL3NlZC5oOiBJbmNsdWRlIGxpbWl0cy5oCihNQlJU
T1dDLCBNQlJMRU4pOiBVc2UgdGhlIGNhY2hlLgotLS0KIHNlZC9tYmNzLmMgfCAgIDE0ICsrKysr
KysrKysrKysrCiBzZWQvc2VkLmggIHwgICAxMSArKysrKysrKy0tLQogMiBmaWxlcyBjaGFuZ2Vk
LCAyMiBpbnNlcnRpb25zKCspLCAzIGRlbGV0aW9ucygtKQoKZGlmZiAtLWdpdCBhL3NlZC9tYmNz
LmMgYi9zZWQvbWJjcy5jCmluZGV4IGJjZTM5ZmEuLjgxMDVlY2QgMTAwNjQ0Ci0tLSBhL3NlZC9t
YmNzLmMKKysrIGIvc2VkL21iY3MuYwpAQCAtMjQsNiArMjQsOSBAQAogaW50IG1iX2N1cl9tYXg7
CiBib29sIGlzX3V0Zjg7CiAKK3NpemVfdCBtYnJsZW5fY2FjaGVbVUNIQVJfTUFYICsgMV07Cit3
aW50X3QgbWJydG93Y19jYWNoZVtVQ0hBUl9NQVggKyAxXTsKKwogLyogUmV0dXJuIG5vbi16ZXJv
IGlmIENIIGlzIHBhcnQgb2YgYSB2YWxpZCBtdWx0aWJ5dGUgc2VxdWVuY2U6CiAgICBFaXRoZXIg
aW5jb21wbGV0ZSB5ZXQgdmFsaWQgc2VxdWVuY2UgKGluIGNhc2Ugb2YgYSBsZWFkaW5nIGJ5dGUp
LAogICAgb3IgdGhlIGxhc3QgYnl0ZSBvZiBhIHZhbGlkIG11bHRpYnl0ZSBzZXF1ZW5jZS4KQEAg
LTczLDQgKzc2LDE1IEBAIGluaXRpYWxpemVfbWJjcyAodm9pZCkKICAgaXNfdXRmOCA9IChzdHJj
bXAgKGNvZGVzZXRfbmFtZSwgIlVURi04IikgPT0gMCk7CiAKICAgbWJfY3VyX21heCA9IE1CX0NV
Ul9NQVg7CisKKyAgZm9yIChpbnQgaSA9IENIQVJfTUlOOyBpIDw9IENIQVJfTUFYOyArK2kpCisg
ICAgeworICAgICAgY2hhciBjID0gaTsKKyAgICAgIHVuc2lnbmVkIGNoYXIgdWMgPSBpOworICAg
ICAgbWJzdGF0ZV90IG1icyA9IHsgMCB9OworICAgICAgd2NoYXJfdCB3YzsKKyAgICAgIHNpemVf
dCBsZW4gPSBtYnJ0b3djICgmd2MsICZjLCAxLCAmbWJzKTsKKyAgICAgIG1icmxlbl9jYWNoZVt1
Y10gPSBsZW4gPyBsZW4gOiAxOworICAgICAgbWJydG93Y19jYWNoZVt1Y10gPSBsZW4gPT0gMSA/
IHdjIDogV0VPRjsKKyAgICB9CiB9CmRpZmYgLS1naXQgYS9zZWQvc2VkLmggYi9zZWQvc2VkLmgK
aW5kZXggYmJkZGQyNS4uMzcxNmJjYiAxMDA2NDQKLS0tIGEvc2VkL3NlZC5oCisrKyBiL3NlZC9z
ZWQuaApAQCAtMTksNiArMTksNyBAQAogI2luY2x1ZGUgImJhc2ljZGVmcy5oIgogI2luY2x1ZGUg
InJlZ2V4LmgiCiAjaW5jbHVkZSA8c3RkaW8uaD4KKyNpbmNsdWRlIDxsaW1pdHMuaD4KICNpbmNs
dWRlICJ1bmxvY2tlZC1pby5oIgogCiAjaW5jbHVkZSAidXRpbHMuaCIKQEAgLTIzOCw5ICsyMzks
MTIgQEAgZXh0ZXJuIGJvb2wgdXNlX2V4dGVuZGVkX3N5bnRheF9wOwogZXh0ZXJuIGludCBtYl9j
dXJfbWF4OwogZXh0ZXJuIGJvb2wgaXNfdXRmODsKIAorZXh0ZXJuIHNpemVfdCBtYnJsZW5fY2Fj
aGVbVUNIQVJfTUFYICsgMV07CitleHRlcm4gd2ludF90IG1icnRvd2NfY2FjaGVbVUNIQVJfTUFY
ICsgMV07CisKICNkZWZpbmUgTUJSVE9XQyhwd2MsIHMsIG4sIHBzKSBcCi0gIChtYl9jdXJfbWF4
ID09IDEgPyBcCi0gICAoKihwd2MpID0gYnRvd2MgKCoodW5zaWduZWQgY2hhciAqKSAocykpLCAx
KSA6IFwKKyAgKG1icmxlbl9jYWNoZVsqKHVuc2lnbmVkIGNoYXIgKikgKHMpXSA9PSAxID8gXAor
ICAgKCoocHdjKSA9IG1icnRvd2NfY2FjaGVbKih1bnNpZ25lZCBjaGFyICopIChzKV0sIDEpIDog
XAogICAgbWJydG93YyAoKHB3YyksIChzKSwgKG4pLCAocHMpKSkKIAogI2RlZmluZSBXQ1JUT01C
KHMsIHdjLCBwcykgXApAQCAtMjUyLDcgKzI1Niw4IEBAIGV4dGVybiBib29sIGlzX3V0Zjg7CiAg
IChtYl9jdXJfbWF4ID09IDEgPyAxIDogbWJzaW5pdCAoKHMpKSkKIAogI2RlZmluZSBNQlJMRU4o
cywgbiwgcHMpIFwKLSAgKG1iX2N1cl9tYXggPT0gMSA/IDEgOiBtYnJ0b3djIChOVUxMLCBzLCBu
LCBwcykpCisgIChtYnJsZW5fY2FjaGVbKih1bnNpZ25lZCBjaGFyICopIChzKV0gPT0gMSA/IFwK
KyAgIDEgOiBtYnJ0b3djIChOVUxMLCBzLCBuLCBwcykpCiAKICNkZWZpbmUgSVNfTUJfQ0hBUihj
aCwgcHMpICAgICAgICAgICAgICAgIFwKICAgKG1iX2N1cl9tYXggPT0gMSA/IDAgOiBpc19tYl9j
aGFyIChjaCwgcHMpKQotLSAKMS43LjEKCg==
--------_57A497D20000000064F2_MULTIPART_MIXED_--





Acknowledgement sent to Norihiro Tanaka <noritnk@HIDDEN>:
New bug report received and forwarded. Copy sent to bug-sed@HIDDEN. Full text available.
Report forwarded to bug-sed@HIDDEN:
bug#24160; Package sed. Full text available.
Last modified: Mon, 25 Nov 2019 12:00:02 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.