GNU logs - #24160, boring messages


Message sent to bug-sed@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#24160: [PATCH 1/2] sed: cache results of mbrtowc for speed
Resent-From: Norihiro Tanaka <noritnk@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-sed@HIDDEN
Resent-Date: Fri, 05 Aug 2016 13:52:02 +0000
Resent-Message-ID: <handler.24160.B.14704051016730 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: report 24160
X-GNU-PR-Package: sed
X-GNU-PR-Keywords: patch
To: 24160 <at> debbugs.gnu.org
X-Debbugs-Original-To: <bug-sed@HIDDEN>
Received: via spool by submit <at> debbugs.gnu.org id=B.14704051016730
          (code B ref -1); Fri, 05 Aug 2016 13:52:02 +0000
Received: (at submit) by debbugs.gnu.org; 5 Aug 2016 13:51:41 +0000
Received: from localhost ([127.0.0.1]:56243 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1bVfXJ-0001kU-I6
	for submit <at> debbugs.gnu.org; Fri, 05 Aug 2016 09:51:41 -0400
Received: from eggs.gnu.org ([208.118.235.92]:55877)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <noritnk@HIDDEN>) id 1bVfXI-0001kH-CD
 for submit <at> debbugs.gnu.org; Fri, 05 Aug 2016 09:51:40 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from <noritnk@HIDDEN>) id 1bVfXC-0005c3-7q
 for submit <at> debbugs.gnu.org; Fri, 05 Aug 2016 09:51:35 -0400
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org
X-Spam-Level: 
X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50 autolearn=disabled
 version=3.3.2
Received: from lists.gnu.org ([2001:4830:134:3::11]:41193)
 by eggs.gnu.org with esmtp (Exim 4.71)
 (envelope-from <noritnk@HIDDEN>) id 1bVfXC-0005bi-4P
 for submit <at> debbugs.gnu.org; Fri, 05 Aug 2016 09:51:34 -0400
Received: from eggs.gnu.org ([2001:4830:134:3::10]:46844)
 by lists.gnu.org with esmtp (Exim 4.71)
 (envelope-from <noritnk@HIDDEN>) id 1bVfX9-0001FN-Hi
 for bug-sed@HIDDEN; Fri, 05 Aug 2016 09:51:32 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from <noritnk@HIDDEN>) id 1bVfX5-0005Zc-T2
 for bug-sed@HIDDEN; Fri, 05 Aug 2016 09:51:31 -0400
Received: from mailgw01.kcn.ne.jp ([61.86.7.208]:44459)
 by eggs.gnu.org with esmtp (Exim 4.71)
 (envelope-from <noritnk@HIDDEN>) id 1bVfX5-0005X2-Cl
 for bug-sed@HIDDEN; Fri, 05 Aug 2016 09:51:27 -0400
Received: from mxs01-s (mailgw1.kcn.ne.jp [61.86.15.233])
 by mailgw01.kcn.ne.jp (Postfix) with ESMTP id CB45A4A0830
 for <bug-sed@HIDDEN>; Fri,  5 Aug 2016 22:51:15 +0900 (JST)
X-matriXscan-loop-detect: 330c21a9014acb14325ca5e417a99c9413b42fa3
Received: from mail02.kcn.ne.jp ([61.86.6.181]) by mxs01-s with ESMTP;
 Fri, 05 Aug 2016 22:51:14 +0900 (JST)
Received: from [10.120.1.35] (i118-21-128-66.s30.a048.ap.plala.or.jp
 [118.21.128.66])
 by mail02.kcn.ne.jp (Postfix) with ESMTPA id 30370F1001F
 for <bug-sed@HIDDEN>; Fri,  5 Aug 2016 22:51:14 +0900 (JST)
Date: Fri, 05 Aug 2016 22:51:16 +0900
From: Norihiro Tanaka <noritnk@HIDDEN>
Message-Id: <20160805225116.64FE.27F6AC2D@HIDDEN>
MIME-Version: 1.0
Content-Type: multipart/mixed;
 boundary="------_57A497D20000000064F2_MULTIPART_MIXED_"
Content-Transfer-Encoding: 7bit
X-Mailer: Becky! ver. 2.65.07 [ja]
X-matriXscan-Sophos-AV: Clean
X-matriXscan-Action: Approve
X-matriXscan: Uncategorized
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x
X-Received-From: 2001:4830:134:3::11
X-Spam-Score: -4.0 (----)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -4.0 (----)

--------_57A497D20000000064F2_MULTIPART_MIXED_
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit

Hi,

We can speed up sed by caching the result of mbrtowc() for single-byte
characters.  This is especially effective in non-UTF-8 multibyte locales,
where mbrtowc() is an expensive calculation.
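
A minimal sketch of the caching idea, not the attached patch itself: each
possible byte value is decoded once with mbrtowc() from the initial state,
and later lookups skip the library call when the byte is a complete
single-byte character.  The names sb_len, sb_wc, init_sb_cache and
cached_mbrtowc are hypothetical, and the shortcut assumes the caller is
positioned at a character boundary in a stateless encoding.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical tables, indexed by byte value.  */
static size_t sb_len[UCHAR_MAX + 1];  /* mbrtowc() result for a lone byte */
static wint_t sb_wc[UCHAR_MAX + 1];   /* decoded character, or WEOF */

/* Fill the tables once, after the locale has been selected.  */
static void
init_sb_cache (void)
{
  for (int i = 0; i <= UCHAR_MAX; i++)
    {
      char c = (char) i;
      wchar_t wc = 0;
      mbstate_t mbs;
      memset (&mbs, 0, sizeof mbs);     /* always start from the initial state */
      size_t len = mbrtowc (&wc, &c, 1, &mbs);
      sb_len[i] = len ? len : 1;        /* NUL decodes with length 0; store 1 */
      sb_wc[i] = (len <= 1) ? (wint_t) wc : WEOF;
    }
}

/* Decode one character at S with N > 0 bytes available.  Unlike mbrtowc(),
   this returns 1 (not 0) for a NUL byte.  */
static size_t
cached_mbrtowc (wchar_t *pwc, const char *s, size_t n, mbstate_t *ps)
{
  assert (s != NULL && n > 0);
  unsigned char uc = (unsigned char) *s;
  if (sb_len[uc] == 1)
    {
      *pwc = (wchar_t) sb_wc[uc];       /* fast path: no library call */
      return 1;
    }
  return mbrtowc (pwc, s, n, ps);       /* lead byte or invalid byte */
}
```

Bytes that start a multibyte character or are invalid on their own have a
cached length of (size_t) -2 or (size_t) -1, so they still go through
mbrtowc().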

$ yes $(printf %040d 0) | head -1000000 >k

Before:

$ time -p env LC_ALL=ja_JP.eucjp sed/sed -ne /a.b/p k
real 1.93
user 1.61
sys 0.27

After patching:

$ time -p env LC_ALL=ja_JP.eucjp sed/sed -ne /a.b/p k
real 0.46
user 0.42
sys 0.03

Thanks,
Norihiro

--------_57A497D20000000064F2_MULTIPART_MIXED_
Content-Type: text/plain;
 charset="US-ASCII";
 name="0001-sed-cache-results-of-mbrtowc-for-speed.patch"
Content-Disposition: attachment;
 filename="0001-sed-cache-results-of-mbrtowc-for-speed.patch"
Content-Transfer-Encoding: base64

RnJvbSBkYzI3NzM5NDQxNTRiMzA1Yzg5M2I3NDU5ODI5YmRlMjFjNWE2MTgyIE1vbiBTZXAgMTcg
MDA6MDA6MDAgMjAwMQpGcm9tOiBOb3JpaGlybyBUYW5ha2EgPG5vcml0bmtAa2NuLm5lLmpwPgpE
YXRlOiBGcmksIDUgQXVnIDIwMTYgMDg6Mjg6MjAgKzA5MDAKU3ViamVjdDogW1BBVENIIDEvMl0g
c2VkOiBjYWNoZSByZXN1bHRzIG9mIG1icnRvd2MgZm9yIHNwZWVkCgoqIHNlZC9tYmNzLmMgKG1i
cnRvd2NfY2FjaGUsIG1icmxlbl9jYWNoZSk6IE5ldyB2YXJzLgooaW5pdGlhbGl6ZV9tYmNzKTog
SW5pdGlhbGl6ZSB0aGUgY2FjaGUuCiogc2VkL3NlZC5oOiBJbmNsdWRlIGxpbWl0cy5oCihNQlJU
T1dDLCBNQlJMRU4pOiBVc2UgdGhlIGNhY2hlLgotLS0KIHNlZC9tYmNzLmMgfCAgIDE0ICsrKysr
KysrKysrKysrCiBzZWQvc2VkLmggIHwgICAxMSArKysrKysrKy0tLQogMiBmaWxlcyBjaGFuZ2Vk
LCAyMiBpbnNlcnRpb25zKCspLCAzIGRlbGV0aW9ucygtKQoKZGlmZiAtLWdpdCBhL3NlZC9tYmNz
LmMgYi9zZWQvbWJjcy5jCmluZGV4IGJjZTM5ZmEuLjgxMDVlY2QgMTAwNjQ0Ci0tLSBhL3NlZC9t
YmNzLmMKKysrIGIvc2VkL21iY3MuYwpAQCAtMjQsNiArMjQsOSBAQAogaW50IG1iX2N1cl9tYXg7
CiBib29sIGlzX3V0Zjg7CiAKK3NpemVfdCBtYnJsZW5fY2FjaGVbVUNIQVJfTUFYICsgMV07Cit3
aW50X3QgbWJydG93Y19jYWNoZVtVQ0hBUl9NQVggKyAxXTsKKwogLyogUmV0dXJuIG5vbi16ZXJv
IGlmIENIIGlzIHBhcnQgb2YgYSB2YWxpZCBtdWx0aWJ5dGUgc2VxdWVuY2U6CiAgICBFaXRoZXIg
aW5jb21wbGV0ZSB5ZXQgdmFsaWQgc2VxdWVuY2UgKGluIGNhc2Ugb2YgYSBsZWFkaW5nIGJ5dGUp
LAogICAgb3IgdGhlIGxhc3QgYnl0ZSBvZiBhIHZhbGlkIG11bHRpYnl0ZSBzZXF1ZW5jZS4KQEAg
LTczLDQgKzc2LDE1IEBAIGluaXRpYWxpemVfbWJjcyAodm9pZCkKICAgaXNfdXRmOCA9IChzdHJj
bXAgKGNvZGVzZXRfbmFtZSwgIlVURi04IikgPT0gMCk7CiAKICAgbWJfY3VyX21heCA9IE1CX0NV
Ul9NQVg7CisKKyAgZm9yIChpbnQgaSA9IENIQVJfTUlOOyBpIDw9IENIQVJfTUFYOyArK2kpCisg
ICAgeworICAgICAgY2hhciBjID0gaTsKKyAgICAgIHVuc2lnbmVkIGNoYXIgdWMgPSBpOworICAg
ICAgbWJzdGF0ZV90IG1icyA9IHsgMCB9OworICAgICAgd2NoYXJfdCB3YzsKKyAgICAgIHNpemVf
dCBsZW4gPSBtYnJ0b3djICgmd2MsICZjLCAxLCAmbWJzKTsKKyAgICAgIG1icmxlbl9jYWNoZVt1
Y10gPSBsZW4gPyBsZW4gOiAxOworICAgICAgbWJydG93Y19jYWNoZVt1Y10gPSBsZW4gPT0gMSA/
IHdjIDogV0VPRjsKKyAgICB9CiB9CmRpZmYgLS1naXQgYS9zZWQvc2VkLmggYi9zZWQvc2VkLmgK
aW5kZXggYmJkZGQyNS4uMzcxNmJjYiAxMDA2NDQKLS0tIGEvc2VkL3NlZC5oCisrKyBiL3NlZC9z
ZWQuaApAQCAtMTksNiArMTksNyBAQAogI2luY2x1ZGUgImJhc2ljZGVmcy5oIgogI2luY2x1ZGUg
InJlZ2V4LmgiCiAjaW5jbHVkZSA8c3RkaW8uaD4KKyNpbmNsdWRlIDxsaW1pdHMuaD4KICNpbmNs
dWRlICJ1bmxvY2tlZC1pby5oIgogCiAjaW5jbHVkZSAidXRpbHMuaCIKQEAgLTIzOCw5ICsyMzks
MTIgQEAgZXh0ZXJuIGJvb2wgdXNlX2V4dGVuZGVkX3N5bnRheF9wOwogZXh0ZXJuIGludCBtYl9j
dXJfbWF4OwogZXh0ZXJuIGJvb2wgaXNfdXRmODsKIAorZXh0ZXJuIHNpemVfdCBtYnJsZW5fY2Fj
aGVbVUNIQVJfTUFYICsgMV07CitleHRlcm4gd2ludF90IG1icnRvd2NfY2FjaGVbVUNIQVJfTUFY
ICsgMV07CisKICNkZWZpbmUgTUJSVE9XQyhwd2MsIHMsIG4sIHBzKSBcCi0gIChtYl9jdXJfbWF4
ID09IDEgPyBcCi0gICAoKihwd2MpID0gYnRvd2MgKCoodW5zaWduZWQgY2hhciAqKSAocykpLCAx
KSA6IFwKKyAgKG1icmxlbl9jYWNoZVsqKHVuc2lnbmVkIGNoYXIgKikgKHMpXSA9PSAxID8gXAor
ICAgKCoocHdjKSA9IG1icnRvd2NfY2FjaGVbKih1bnNpZ25lZCBjaGFyICopIChzKV0sIDEpIDog
XAogICAgbWJydG93YyAoKHB3YyksIChzKSwgKG4pLCAocHMpKSkKIAogI2RlZmluZSBXQ1JUT01C
KHMsIHdjLCBwcykgXApAQCAtMjUyLDcgKzI1Niw4IEBAIGV4dGVybiBib29sIGlzX3V0Zjg7CiAg
IChtYl9jdXJfbWF4ID09IDEgPyAxIDogbWJzaW5pdCAoKHMpKSkKIAogI2RlZmluZSBNQlJMRU4o
cywgbiwgcHMpIFwKLSAgKG1iX2N1cl9tYXggPT0gMSA/IDEgOiBtYnJ0b3djIChOVUxMLCBzLCBu
LCBwcykpCisgIChtYnJsZW5fY2FjaGVbKih1bnNpZ25lZCBjaGFyICopIChzKV0gPT0gMSA/IFwK
KyAgIDEgOiBtYnJ0b3djIChOVUxMLCBzLCBuLCBwcykpCiAKICNkZWZpbmUgSVNfTUJfQ0hBUihj
aCwgcHMpICAgICAgICAgICAgICAgIFwKICAgKG1iX2N1cl9tYXggPT0gMSA/IDAgOiBpc19tYl9j
aGFyIChjaCwgcHMpKQotLSAKMS43LjEKCg==
--------_57A497D20000000064F2_MULTIPART_MIXED_--





Message sent:


Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-Mailer: MIME-tools 5.505 (Entity 5.505)
Content-Type: text/plain; charset=utf-8
X-Loop: help-debbugs@HIDDEN
From: help-debbugs@HIDDEN (GNU bug Tracking System)
To: Norihiro Tanaka <noritnk@HIDDEN>
Subject: bug#24160: Acknowledgement ([PATCH 1/2] sed: cache results of
 mbrtowc for speed)
Message-ID: <handler.24160.B.14704051016730.ack <at> debbugs.gnu.org>
References: <20160805225116.64FE.27F6AC2D@HIDDEN>
X-Gnu-PR-Message: ack 24160
X-Gnu-PR-Package: sed
X-Gnu-PR-Keywords: patch
Reply-To: 24160 <at> debbugs.gnu.org
Date: Fri, 05 Aug 2016 13:52:02 +0000

Thank you for filing a new bug report with debbugs.gnu.org.

This is an automatically generated reply to let you know your message
has been received.

Your message is being forwarded to the package maintainers and other
interested parties for their attention; they will reply in due course.

Your message has been sent to the package maintainer(s):
 bug-sed@HIDDEN

If you wish to submit further information on this problem, please
send it to 24160 <at> debbugs.gnu.org.

Please do not send mail to help-debbugs@HIDDEN unless you wish
to report a problem with the Bug-tracking system.

--=20
24160: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=3D24160
GNU Bug Tracking System
Contact help-debbugs@HIDDEN with problems


Message sent to bug-sed@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#24160: [PATCH 1/2] sed: cache results of mbrtowc for speed
Resent-From: Assaf Gordon <assafgordon@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-sed@HIDDEN
Resent-Date: Fri, 05 Aug 2016 14:47:01 +0000
Resent-Message-ID: <handler.24160.B24160.147040837612103 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 24160
X-GNU-PR-Package: sed
X-GNU-PR-Keywords: patch
To: Norihiro Tanaka <noritnk@HIDDEN>, 24160 <at> debbugs.gnu.org
Received: via spool by 24160-submit <at> debbugs.gnu.org id=B24160.147040837612103
          (code B ref 24160); Fri, 05 Aug 2016 14:47:01 +0000
Received: (at 24160) by debbugs.gnu.org; 5 Aug 2016 14:46:16 +0000
Received: from localhost ([127.0.0.1]:56767 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1bVgO8-000398-KH
	for submit <at> debbugs.gnu.org; Fri, 05 Aug 2016 10:46:16 -0400
Received: from mail-qk0-f194.google.com ([209.85.220.194]:35288)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <assafgordon@HIDDEN>) id 1bVgO6-00038t-Pe
 for 24160 <at> debbugs.gnu.org; Fri, 05 Aug 2016 10:46:15 -0400
Received: by mail-qk0-f194.google.com with SMTP id q62so23633259qkf.2
 for <24160 <at> debbugs.gnu.org>; Fri, 05 Aug 2016 07:46:14 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=subject:to:references:from:message-id:date:user-agent:mime-version
 :in-reply-to:content-transfer-encoding;
 bh=nATZfaaGKIs9QRNeHd8s0TLqYAQ5btF5Z7vp6WeRO3U=;
 b=pp6cpW13g1rWMSGWTSnolBaweH07P+oEef1SgwvNm5GWlsfCQeNyH9AFE+nOKaaSVZ
 /zCPpOCBVXrdT/GF+xTD6Ue/EcvtcH/9Xh1rYJNor4AyxGQR7oV1L+n4eQwg8zh/o3DJ
 z2DX3mqPM0wyuRT+0DiyoZ+QXYsqhYf1WVFvGF2BjI+TyN52+KJLyUcfqWtdjtjl05wk
 i9K6RUzZnZz3g3ZoTDVO+sEF6UBEq2/PSrlDQ2UqmqPS4Fapm1mhso5XujVLzxt7dwF7
 4GKEJPSSYoRk45wuMHAKb9Epo8a8c7XTaoShn+H9PDQFPRPaxReGBEZVPsnMwW4QSGVr
 SeDA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:subject:to:references:from:message-id:date
 :user-agent:mime-version:in-reply-to:content-transfer-encoding;
 bh=nATZfaaGKIs9QRNeHd8s0TLqYAQ5btF5Z7vp6WeRO3U=;
 b=lEO0QKjeuCB1zv+viP991/eXae2sMxQ6g1BZHJ8C3MJwt5xLUCsMvjg9YywLsDjpVc
 7LAiOH/yhQkwj5oDJVUtv/vDVH2JNwdvdAkFutat+V1oglR4tWlIy2hBW6DtJAGFD/Y0
 g8gOvo6xcn3ww7YgdET2c7CNm2A/FrnJ25NCYRV5ntqfWBlwYT8oKVQClkpUdgW4bcX7
 JHaKKqWuWEiXlFeIjnHuAzPdcYtrztIEeDkDaVMnpeq73t65zeUOBWyZ4fxKDAyeYYns
 7N/5wT1mavEIfPVdLH2eJZgIi6gssLpwagrGn90vmyEDV+md0vMmRuuUYjPluo1eNF7m
 QzKw==
X-Gm-Message-State: AEkoout8ysgeRZ8+MTrJTqFRfVm/zzaEZhMWPdwuRbu2ZyccmPAMt9OgUexWVaErUVSKjA==
X-Received: by 10.55.73.145 with SMTP id w139mr13044261qka.114.1470408369316; 
 Fri, 05 Aug 2016 07:46:09 -0700 (PDT)
Received: from disco.erlich.nygenome.org ([69.74.14.178])
 by smtp.googlemail.com with ESMTPSA id 7sm9778104qkd.25.2016.08.05.07.46.03
 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Fri, 05 Aug 2016 07:46:06 -0700 (PDT)
References: <20160805225116.64FE.27F6AC2D@HIDDEN>
From: Assaf Gordon <assafgordon@HIDDEN>
Message-ID: <5fb8c9bf-2233-f782-f9a1-9d55ca33f083@HIDDEN>
Date: Fri, 5 Aug 2016 10:45:59 -0400
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
 Thunderbird/45.2.0
MIME-Version: 1.0
In-Reply-To: <20160805225116.64FE.27F6AC2D@HIDDEN>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Score: -0.7 (/)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -0.7 (/)

Hello Norihiro,

Thank you for the patch.

On 08/05/2016 09:51 AM, Norihiro Tanaka wrote:
> We can speed up sed by caching the result of mbrtowc() for single-byte
> characters.  This is especially effective in non-UTF-8 multibyte locales,
> where mbrtowc() is an expensive calculation.

Regarding this:
====
  #define MBRTOWC(pwc, s, n, ps) \
-  (mb_cur_max == 1 ? \
-   (*(pwc) = btowc (*(unsigned char *) (s)), 1) : \
+  (mbrlen_cache[*(unsigned char *) (s)] == 1 ? \
+   (*(pwc) = mbrtowc_cache[*(unsigned char *) (s)], 1) : \
     mbrtowc ((pwc), (s), (n), (ps)))
  
  #define MBRLEN(s, n, ps) \
-  (mb_cur_max == 1 ? 1 : mbrtowc (NULL, s, n, ps))
+  (mbrlen_cache[*(unsigned char *) (s)] == 1 ? \
+   1 : mbrtowc (NULL, s, n, ps))
====

By using a cache table, isn't this code ignoring mbstate?
For example, in the Shift-JIS encoding, the byte '[' (0x5b) can either stand alone
or be the second byte of a sequence such as '\x83\x5b'.
Wouldn't it also prevent detection of invalid sequences?
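
To make the concern concrete, here is a small sketch that does not depend on
any installed locale.  It uses a simplified, hand-written Shift-JIS
lead-byte test (an assumption: 0x81-0x9F and 0xE0-0xFC start a two-byte
character, and the trail byte is not validated) and compares a
character-wise scan with a naive byte-wise scan.  The function names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified Shift-JIS lead-byte test; real converters also validate
   the trail byte.  */
static int
sjis_is_lead (unsigned char b)
{
  return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Count occurrences of the single-byte character CH in S[0..N-1],
   stepping by whole characters so a trail byte is never examined alone.  */
static size_t
sjis_count_char (const unsigned char *s, size_t n, unsigned char ch)
{
  size_t count = 0;
  size_t i = 0;
  assert (s != NULL);
  while (i < n)
    {
      if (sjis_is_lead (s[i]) && i + 1 < n)
        i += 2;                 /* skip lead and trail byte together */
      else
        {
          if (s[i] == ch)
            count++;
          i++;
        }
    }
  return count;
}

/* Naive count that looks at every byte, including trail bytes.  */
static size_t
byte_count (const unsigned char *s, size_t n, unsigned char ch)
{
  size_t count = 0;
  assert (s != NULL);
  for (size_t i = 0; i < n; i++)
    if (s[i] == ch)
      count++;
  return count;
}
```

For the three bytes 0x83 0x5b 0x5b, the character-wise scan reports one '['
while the byte-wise scan reports two.  A per-byte table is therefore only
safe if it is consulted at character boundaries.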

As a side note, GNU sed's current implementation has a special code path for multibyte non-UTF-8 input,
so this change is unlikely to affect UTF-8 or C locales.

regards,
  - assaf





Message sent to bug-sed@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#24160: [PATCH 1/2] sed: cache results of mbrtowc for speed
Resent-From: Norihiro Tanaka <noritnk@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-sed@HIDDEN
Resent-Date: Sat, 06 Aug 2016 07:14:02 +0000
Resent-Message-ID: <handler.24160.B24160.14704676159693 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 24160
X-GNU-PR-Package: sed
X-GNU-PR-Keywords: patch
To: Assaf Gordon <assafgordon@HIDDEN>
Cc: 24160 <at> debbugs.gnu.org
Received: via spool by 24160-submit <at> debbugs.gnu.org id=B24160.14704676159693
          (code B ref 24160); Sat, 06 Aug 2016 07:14:02 +0000
Received: (at 24160) by debbugs.gnu.org; 6 Aug 2016 07:13:35 +0000
Received: from localhost ([127.0.0.1]:57088 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1bVvnb-0002WH-1J
	for submit <at> debbugs.gnu.org; Sat, 06 Aug 2016 03:13:35 -0400
Received: from mailgw01.kcn.ne.jp ([61.86.7.208]:44664)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <noritnk@HIDDEN>) id 1bVvnZ-0002W3-IB
 for 24160 <at> debbugs.gnu.org; Sat, 06 Aug 2016 03:13:34 -0400
Received: from mxs01-s (mailgw1.kcn.ne.jp [61.86.15.233])
 by mailgw01.kcn.ne.jp (Postfix) with ESMTP id 6C01C4A083A
 for <24160 <at> debbugs.gnu.org>; Sat,  6 Aug 2016 16:13:26 +0900 (JST)
X-matriXscan-loop-detect: 6194e6788300ac8807d6daf6022f9f9dd28bff71
Received: from mail09.kcn.ne.jp ([61.86.6.188]) by mxs01-s with ESMTP;
 Sat, 06 Aug 2016 16:13:25 +0900 (JST)
Received: from [10.120.1.17] (i118-21-128-66.s30.a048.ap.plala.or.jp
 [118.21.128.66])
 by mail09.kcn.ne.jp (Postfix) with ESMTPA id 560F71BD0097;
 Sat,  6 Aug 2016 16:13:25 +0900 (JST)
Date: Sat, 06 Aug 2016 16:13:27 +0900
From: Norihiro Tanaka <noritnk@HIDDEN>
In-Reply-To: <5fb8c9bf-2233-f782-f9a1-9d55ca33f083@HIDDEN>
References: <20160805225116.64FE.27F6AC2D@HIDDEN>
 <5fb8c9bf-2233-f782-f9a1-9d55ca33f083@HIDDEN>
Message-Id: <20160806161326.E614.27F6AC2D@HIDDEN>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
X-Mailer: Becky! ver. 2.65.07 [ja]
X-matriXscan-Sophos-AV: Clean
X-matriXscan-Action: Approve
X-matriXscan: Uncategorized
X-Spam-Score: -1.2 (-)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -1.2 (-)


On Fri, 5 Aug 2016 10:45:59 -0400
Assaf Gordon <assafgordon@HIDDEN> wrote:

> Hello Norihiro,
> 
> Thank you for the patch.
> 
> By using a cache table, isn't this code ignoring mbstate?
> For example, in the Shift-JIS encoding, the byte '[' (0x5b) can either stand alone
> or be the second byte of a sequence such as '\x83\x5b'.
> Wouldn't it also prevent detection of invalid sequences?
> 
> As a side note, GNU sed's current implementation has a special code path for multibyte non-UTF-8 input,
> so this change is unlikely to affect UTF-8 or C locales.
> 
> regards,
>   - assaf

Hi Assaf,

Thanks for the review.

When MBRTOWC() or MBRLEN() is called in Shift-JIS, mbstate is always in
the initial state or a state equivalent to it, except when an invalid or
incomplete sequence has been found, because Shift-JIS is a stateless
encoding.

Even if such a sequence is found, mbstate has to be reset to the initial
state manually in order to check the following characters in the string.
So I think that we can ignore mbstate in stateless encodings.

However, the assumption is wrong for stateful encodings such as ISO-2022
and UTF-7.  Does sed support stateful encodings that have shift sequences?
At least, it seems that regex does not support stateful encodings.

Thanks,
Norihiro





Message sent to bug-sed@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#24160: [PATCH 1/2] sed: cache results of mbrtowc for speed
Resent-From: Norihiro Tanaka <noritnk@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-sed@HIDDEN
Resent-Date: Mon, 19 Sep 2016 02:33:01 +0000
Resent-Message-ID: <handler.24160.B24160.14742523525613 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 24160
X-GNU-PR-Package: sed
X-GNU-PR-Keywords: patch
To: 24160 <at> debbugs.gnu.org
Received: via spool by 24160-submit <at> debbugs.gnu.org id=B24160.14742523525613
          (code B ref 24160); Mon, 19 Sep 2016 02:33:01 +0000
Received: (at 24160) by debbugs.gnu.org; 19 Sep 2016 02:32:32 +0000
Received: from localhost ([127.0.0.1]:34674 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1bloNj-0001ST-Sz
	for submit <at> debbugs.gnu.org; Sun, 18 Sep 2016 22:32:32 -0400
Received: from mailgw01.kcn.ne.jp ([61.86.7.208]:57686)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <noritnk@HIDDEN>) id 1bloNi-0001SD-3a
 for 24160 <at> debbugs.gnu.org; Sun, 18 Sep 2016 22:32:30 -0400
Received: from mxs02-s (mailgw2.kcn.ne.jp [61.86.15.234])
 by mailgw01.kcn.ne.jp (Postfix) with ESMTP id 42BDF4A086A
 for <24160 <at> debbugs.gnu.org>; Mon, 19 Sep 2016 11:32:23 +0900 (JST)
X-matriXscan-loop-detect: c9a5cf15e860450258d2e4a3759089e73446cb61
Received: from mail08.kcn.ne.jp ([61.86.6.187]) by mxs02-s with ESMTP;
 Mon, 19 Sep 2016 11:32:20 +0900 (JST)
Received: from [10.120.1.60] (i118-21-128-66.s30.a048.ap.plala.or.jp
 [118.21.128.66])
 by mail08.kcn.ne.jp (Postfix) with ESMTPA id 3CCE412B802E
 for <24160 <at> debbugs.gnu.org>; Mon, 19 Sep 2016 11:32:20 +0900 (JST)
Date: Mon, 19 Sep 2016 11:32:20 +0900
From: Norihiro Tanaka <noritnk@HIDDEN>
In-Reply-To: <20160805225116.64FE.27F6AC2D@HIDDEN>
References: <20160805225116.64FE.27F6AC2D@HIDDEN>
Message-Id: <20160919113219.41D1.27F6AC2D@HIDDEN>
MIME-Version: 1.0
Content-Type: multipart/mixed;
 boundary="------_57DF4CEB0000000041C6_MULTIPART_MIXED_"
Content-Transfer-Encoding: 7bit
X-Mailer: Becky! ver. 2.65.07 [ja]
X-matriXscan-Sophos-AV: Clean
X-matriXscan-Action: Approve
X-matriXscan: Uncategorized
X-Spam-Score: -2.3 (--)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -2.3 (--)

--------_57DF4CEB0000000041C6_MULTIPART_MIXED_
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit


On Fri, 05 Aug 2016 22:51:16 +0900
Norihiro Tanaka <noritnk@HIDDEN> wrote:

> Hi,
> 
> We can speed up sed by caching the result of mbrtowc() for single-byte
> characters.  This is especially effective in non-UTF-8 multibyte locales,
> where mbrtowc() is an expensive calculation.
> 
> $ yes $(printf %040d 0) | head -1000000 >k
> 
> Before:
> 
> $ time -p env LC_ALL=ja_JP.eucjp sed/sed -ne /a.b/p k
> real 1.93
> user 1.61
> sys 0.27
> 
> After patching
> 
> $ time -p env LC_ALL=ja_JP.eucjp sed/sed -ne /a.b/p k
> real 0.46
> user 0.42
> sys 0.03
> 
> Thanks,
> Norihiro

I rewrote the patch to use localeinfo from gnulib.

--------_57DF4CEB0000000041C6_MULTIPART_MIXED_
Content-Type: text/plain;
 charset="US-ASCII";
 name="0001-sed-use-cache-provided-by-localeinfo-for-mbrtowc-and.patch"
Content-Disposition: attachment;
 filename="0001-sed-use-cache-provided-by-localeinfo-for-mbrtowc-and.patch"
Content-Transfer-Encoding: base64

RnJvbSBjMWE5ZDcwOTM2NzU2ODg3YzdjZGY1NWI1YjMyODI2ZGY3MmI5ZDUyIE1vbiBTZXAgMTcg
MDA6MDA6MDAgMjAwMQpGcm9tOiBOb3JpaGlybyBUYW5ha2EgPG5vcml0bmtAa2NuLm5lLmpwPgpE
YXRlOiBTdW4sIDE4IFNlcCAyMDE2IDE3OjQ2OjU3ICswOTAwClN1YmplY3Q6IFtQQVRDSF0gc2Vk
OiB1c2UgY2FjaGUgcHJvdmlkZWQgYnkgbG9jYWxlaW5mbyBmb3IgbWJydG93YyBhbmQgbWJybGVu
CgoqIHNlZC9zZWQuaCAoTUJSVE9XQywgTUJSTEVOKTogVXNlIGNhY2hlIHByb3ZpZGVkIGJ5IGxv
Y2FsZWluZm8uCihNQlJUT1dDLCBNQlJMRU4pOiBVc2UgdGhlIGNhY2hlLgotLS0KIHNlZC9zZWQu
aCB8ICAgIDcgKysrKy0tLQogMSBmaWxlcyBjaGFuZ2VkLCA0IGluc2VydGlvbnMoKyksIDMgZGVs
ZXRpb25zKC0pCgpkaWZmIC0tZ2l0IGEvc2VkL3NlZC5oIGIvc2VkL3NlZC5oCmluZGV4IDA4M2Jh
YWUuLjllYmM4MTUgMTAwNjQ0Ci0tLSBhL3NlZC9zZWQuaAorKysgYi9zZWQvc2VkLmgKQEAgLTI0
Nyw4ICsyNDcsOCBAQCBleHRlcm4gYm9vbCBpc191dGY4OwogZXh0ZXJuIGJvb2wgc2FuZGJveDsK
IAogI2RlZmluZSBNQlJUT1dDKHB3YywgcywgbiwgcHMpIFwKLSAgKG1iX2N1cl9tYXggPT0gMSA/
IFwKLSAgICgqKHB3YykgPSBidG93YyAoKih1bnNpZ25lZCBjaGFyICopIChzKSksIDEpIDogXAor
ICAobG9jYWxlaW5mby5zYmNsZW5bKih1bnNpZ25lZCBjaGFyICopIChzKV0gPT0gMSA/IFwKKyAg
ICgqKHB3YykgPSBsb2NhbGVpbmZvLnNiY3Rvd2NbKih1bnNpZ25lZCBjaGFyICopIChzKV0sIDEp
IDogXAogICAgbWJydG93YyAoKHB3YyksIChzKSwgKG4pLCAocHMpKSkKIAogI2RlZmluZSBXQ1JU
T01CKHMsIHdjLCBwcykgXApAQCAtMjYwLDcgKzI2MCw4IEBAIGV4dGVybiBib29sIHNhbmRib3g7
CiAgIChtYl9jdXJfbWF4ID09IDEgPyAxIDogbWJzaW5pdCAoKHMpKSkKIAogI2RlZmluZSBNQlJM
RU4ocywgbiwgcHMpIFwKLSAgKG1iX2N1cl9tYXggPT0gMSA/IDEgOiBtYnJ0b3djIChOVUxMLCBz
LCBuLCBwcykpCisgIChsb2NhbGVpbmZvLnNiY2xlblsqKHVuc2lnbmVkIGNoYXIgKikgKHMpXSA9
PSAxID8gXAorICAgMSA6IG1icnRvd2MgKE5VTEwsIHMsIG4sIHBzKSkKIAogI2RlZmluZSBJU19N
Ql9DSEFSKGNoLCBwcykgICAgICAgICAgICAgICAgXAogICAobWJfY3VyX21heCA9PSAxID8gMCA6
IGlzX21iX2NoYXIgKGNoLCBwcykpCi0tIAoxLjcuMQoK
--------_57DF4CEB0000000041C6_MULTIPART_MIXED_--






Last modified: Mon, 25 Nov 2019 12:00:02 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.