GNU bug report logs - #24924
multibyte: pr has no concept of wide characters


Package: coreutils; Severity: wishlist; Reported by: 積丹尼 Dan Jacobson <jidanni@HIDDEN>; dated Fri, 11 Nov 2016 16:12:01 UTC; Maintainer for coreutils is bug-coreutils@HIDDEN.
Severity set to 'wishlist' from 'normal'. Request was from Assaf Gordon <assafgordon@HIDDEN> to control <at> debbugs.gnu.org.
Changed bug title to 'multibyte: pr has no concept of wide characters' from 'pr has no concept of wide characters'. Request was from Assaf Gordon <assafgordon@HIDDEN> to control <at> debbugs.gnu.org.

Message received at 24924 <at> debbugs.gnu.org:


Date: Thu, 1 Dec 2016 08:49:39 +0000
From: Stephane Chazelas <stephane.chazelas@HIDDEN>
To: Paul Eggert <eggert@HIDDEN>
Subject: Re: bug#24924: GNU pr only working with singlebyte 1-width characters
Message-ID: <20161201084939.GA11768@HIDDEN>
References: <8737iyatfd.fsf@HIDDEN> <20161130113034.GA7005@HIDDEN>
 <69288e76-9f5d-c792-9bba-6a984461463b@HIDDEN>
 <20161201070405.GB4922@HIDDEN>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <20161201070405.GB4922@HIDDEN>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-Spam-Score: 0.5 (/)
X-Debbugs-Envelope-To: 24924
Cc: 24924 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: 0.5 (/)

2016-12-01 07:04:05 +0000, Stephane Chazelas:
> 2016-11-30 18:37:05 -0800, Paul Eggert:
> [...]
> > In the meantime if you could submit a patch for the
> > documentation that should fix the immediate documentation
> > problem.
> [...]
> 
> What about:
[...]
> +Please note that @command{pr} currently doesn't support multi-byte characters
> +or non-ASCII characters that have a null or double width. If such characters
> +occur in the input or column separators, column alignment may be off or lines
> +may exceed the page width. There is also no provision to support bidirectional
> +text.
[...]

Actually, it seems pr can also truncate lines in the middle of
a character, though that appears to be confined to multibyte
characters whose encoding includes byte values <= 127, as in:

$ locale charmap
BIG5-HKSCS
$ printf '\ue9\ue9\ue9\n' | pr -w5 -t2 | hd
00000000  88 6d 88 6d 88 0a                                 |.m.m..|
00000006

See how that third é (0x88 0x6d in BIG5-HKSCS) was truncated in
the middle.

It's as if pr treated every byte value >= 128 as having
zero width in multi-byte locales (and only in multi-byte
locales; that doesn't seem to occur in single-byte ones).
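
The hypothesis above can be modelled in a few lines of Python. This is only an illustrative sketch, not pr's actual code: the function name `truncate_zero_width_high_bytes` is made up for this example, and the per-column width of 2 is an assumption (page width 5, two columns, one separator character). Under those assumptions the model reproduces the bytes seen in the hex dump, minus the trailing newline.

```python
def truncate_zero_width_high_bytes(data: bytes, width: int) -> bytes:
    """Model of the suspected behaviour (an assumption, not pr's code):
    bytes >= 0x80 are counted as zero columns, all other bytes as one."""
    out = bytearray()
    used = 0
    for b in data:
        w = 0 if b >= 0x80 else 1
        if used + w > width:
            break  # column is full: stop, even in the middle of a character
        out.append(b)
        used += w
    return bytes(out)


# Three copies of the two-byte sequence 0x88 0x6d quoted in the report.
line = bytes.fromhex("886d886d886d")
result = truncate_zero_width_high_bytes(line, 2)  # assumed column width of 2
print(result.hex(" "))
```

The third 0x88 is still emitted because it costs no columns in this model, while its trailing 0x6d no longer fits, which leaves a dangling lead byte.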

So maybe:

diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index cc85f22..15088ce 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -1838,6 +1838,13 @@ For single
 column output no line truncation occurs by default.  Use @option{-W} option to
 truncate lines in that case.
 
+Please note that @command{pr} currently doesn't support multi-byte characters
+or non-ASCII characters that have a null or double width. If such characters
+occur in the input or column separators, column alignment may be off or lines
+may exceed the page width, or truncation may occur in the middle of some
+characters producing invalid text output. There is also no provision to support
+bidirectional text.
+
 The following changes were made in version 1.22i and apply to later
 versions of @command{pr}:
 @c FIXME: this whole section here sounds very awkward to me. I

-- 
Stephane




Information forwarded to bug-coreutils@HIDDEN:
bug#24924; Package coreutils. Full text available.

Message received at 24924 <at> debbugs.gnu.org:


Date: Thu, 1 Dec 2016 07:04:05 +0000
From: Stephane Chazelas <stephane.chazelas@HIDDEN>
To: Paul Eggert <eggert@HIDDEN>
Subject: Re: bug#24924: GNU pr only working with singlebyte 1-width characters
Message-ID: <20161201070405.GB4922@HIDDEN>
References: <8737iyatfd.fsf@HIDDEN> <20161130113034.GA7005@HIDDEN>
 <69288e76-9f5d-c792-9bba-6a984461463b@HIDDEN>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <69288e76-9f5d-c792-9bba-6a984461463b@HIDDEN>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-Spam-Score: -0.0 (/)
X-Debbugs-Envelope-To: 24924
Cc: 24924 <at> debbugs.gnu.org

2016-11-30 18:37:05 -0800, Paul Eggert:
[...]
> In the meantime if you could submit a patch for the
> documentation that should fix the immediate documentation
> problem.
[...]

What about:

diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index cc85f22..6eb497b 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -1838,6 +1838,12 @@ For single
 column output no line truncation occurs by default.  Use @option{-W} option to
 truncate lines in that case.
 
+Please note that @command{pr} currently doesn't support multi-byte characters
+or non-ASCII characters that have a null or double width. If such characters
+occur in the input or column separators, column alignment may be off or lines
+may exceed the page width. There is also no provision to support bidirectional
+text.
+
 The following changes were made in version 1.22i and apply to later
 versions of @command{pr}:
 @c FIXME: this whole section here sounds very awkward to me. I





Message received at 24924 <at> debbugs.gnu.org:


Date: Thu, 1 Dec 2016 06:32:22 +0000
From: Stephane Chazelas <stephane.chazelas@HIDDEN>
To: Paul Eggert <eggert@HIDDEN>
Subject: Re: bug#24924: GNU pr only working with singlebyte 1-width characters
Message-ID: <20161201063222.GA4922@HIDDEN>
References: <8737iyatfd.fsf@HIDDEN> <20161130113034.GA7005@HIDDEN>
 <69288e76-9f5d-c792-9bba-6a984461463b@HIDDEN>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <69288e76-9f5d-c792-9bba-6a984461463b@HIDDEN>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-Spam-Score: 0.5 (/)
X-Debbugs-Envelope-To: 24924
Cc: 24924 <at> debbugs.gnu.org

2016-11-30 18:37:05 -0800, Paul Eggert:
> On 11/30/2016 03:30 AM, Stephane Chazelas wrote:
> >That can also be seen as a POSIX conformance bug
> 
> Not really, as POSIX does not require support for UTF-8 (except in
> the pax utility, which is not part of coreutils).
[...]

POSIX does not require support for any particular charset. It only
specifies one locale (C/POSIX), and doesn't specify the charset in
that locale other than that it should be a single-byte charset that
covers the portable character set. Examples of such charsets are
ASCII, iso8859-x or EBCDIC. In practice, that tends to be ASCII
(except for some rare EBCDIC based IBM systems) as tha

But it does provide a localisation API and allows systems to
support other locales with other charsets. That API does support
multi-byte encodings, including stateful ones (though how they
are /defined/ is implementation-defined for locking-shift ones, and
in practice those are unworkable, so I'd expect them to
eventually be removed from the standard). It doesn't require
compliant systems to have locales with multi-byte character sets,
but if they do (if they show up in the output of locale -a),
then they have to be supported throughout (as specified, for all
the utilities for instance).

Basically, on systems that have locales with multi-byte
encodings --UTF-8 or other-- (most Unix-like ones including GNU
systems like Debian), GNU pr (and many other GNU utilities) is
not POSIX compliant.

See
http://pubs.opengroup.org/onlinepubs/9699919799.2016edition/basedefs/V1_chap06.html

for details.

-- 
Stephane





Message received at 24924 <at> debbugs.gnu.org:


Subject: Re: bug#24924: GNU pr only working with singlebyte 1-width characters
To: Stephane Chazelas <stephane.chazelas@HIDDEN>, 24924 <at> debbugs.gnu.org
References: <8737iyatfd.fsf@HIDDEN> <20161130113034.GA7005@HIDDEN>
From: Paul Eggert <eggert@HIDDEN>
Organization: UCLA Computer Science Department
Message-ID: <69288e76-9f5d-c792-9bba-6a984461463b@HIDDEN>
Date: Wed, 30 Nov 2016 18:37:05 -0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
 Thunderbird/45.4.0
MIME-Version: 1.0
In-Reply-To: <20161130113034.GA7005@HIDDEN>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Score: -2.9 (--)
X-Debbugs-Envelope-To: 24924

On 11/30/2016 03:30 AM, Stephane Chazelas wrote:
> That can also be seen as a POSIX conformance bug

Not really, as POSIX does not require support for UTF-8 (except in the 
pax utility, which is not part of coreutils).

It'd be nice if pr etc. could be made to work cleanly for UTF-8. In the 
meantime if you could submit a patch for the documentation that should 
fix the immediate documentation problem.






Message received at 24924 <at> debbugs.gnu.org:


Date: Wed, 30 Nov 2016 11:30:34 +0000
From: Stephane Chazelas <stephane.chazelas@HIDDEN>
To: 24924 <at> debbugs.gnu.org
Subject: GNU pr only working with singlebyte 1-width characters
Message-ID: <20161130113034.GA7005@HIDDEN>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
User-Agent: Mutt/1.5.21 (2010-09-15)
X-Spam-Score: -0.0 (/)
X-Debbugs-Envelope-To: 24924

Only arguing on the classification of this bug here.

Let's call a cat a cat. When something doesn't work as
documented, it's a bug, not a wishlist entry.

AFAICT, there's nothing in the GNU coreutils documentation that
states that pr only works on input that consists exclusively of
single-byte characters that are neither zero-width (though it
copes OK with ASCII BS and TAB) nor double-width (or on
ASCII-only input).

Today, UTF-8 is the most commonly used character set, so it
even affects English text (where £ (the British currency symbol)
is encoded on two bytes in UTF-8 for instance), and even
US-English text like for the ‘quoting characters’ (3 bytes each
in UTF-8) now that ASCII ' has been demoted to just an
apostrophe.

That can also be seen as a POSIX conformance bug (though GNU
coreutils doesn't claim POSIX conformance, only "The GNU
utilities documented here are /mostly/ compatible with the
POSIX standard").

$ pr -tm --sep-string='|'  <(du --version) <(truncate --version)
du (GNU coreutils) 8.25            |truncate (GNU coreutils) 8.25
Copyright (C) 2016 Free Software Fo|Copyright (C) 2016 Free Software Fo
License GPLv3+: GNU GPL version 3 o|License GPLv3+: GNU GPL version 3 o
This is free software: you are free|This is free software: you are free
There is NO WARRANTY, to the extent|There is NO WARRANTY, to the extent
                                   |
Written by Torbjörn Granlund, David |Written by Pádraig Brady.
and Jim Meyering.                  |
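
The shifted separator on the "Torbjörn" line is consistent with counting bytes rather than display columns. A small Python sketch makes the difference visible. The helper `display_width` is hypothetical and deliberately simplified: it only accounts for combining marks and East Asian wide/fullwidth characters, treats ambiguous-width characters as one column, and ignores other zero-width code points.

```python
import unicodedata


def display_width(text: str) -> int:
    """Approximate terminal columns (simplified, see the caveats above):
    combining marks count 0, East Asian wide/fullwidth count 2, the rest 1."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


# Compare UTF-8 byte length with approximate display width.
for sample in ("Torbjorn", "Torbjörn", "£", "‘", "積丹尼"):
    print(repr(sample), len(sample.encode("utf-8")), display_width(sample))
```

A tool that pads or truncates by byte count sees "Torbjörn" as nine units although it occupies eight columns, which matches the one-column shift in the output above.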

-- 
Stephane





Message received at 24924 <at> debbugs.gnu.org:


From: =?utf-8?B?56mN5Li55bC8?= Dan Jacobson <jidanni@HIDDEN>
To: Assaf Gordon <assafgordon@HIDDEN>
Subject: Re: bug#24924: pr has no concept of wide characters
References: <8737iyatfd.fsf@HIDDEN>
Date: Sat, 12 Nov 2016 18:12:36 +0800
Message-ID: <878tsp6m7f.fsf@HIDDEN>
MIME-Version: 1.0
Content-Type: text/plain
X-Spam-Score: 0.5 (/)
X-Debbugs-Envelope-To: 24924
Cc: 24924 <at> debbugs.gnu.org

>>>>> "AG" == Assaf Gordon <assafgordon@HIDDEN> writes:

AG> I would very much appreciate if you could help me test it as there
AG> are many edge-cases with multibyte support and wide-characters.

Sure but you need to send me a .deb or
$ which pr|xargs file
/usr/bin/pr: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV),
dynamically linked, interpreter /lib/ld-linux.so.2, for GNU/Linux
2.6.32, BuildID[sha1]=14376d20f6383ec9348da986ecc693c6bb45a0ee, stripped

AG> As a curiosity,
AG> are you using UTF-8 locales exclusively, or do you have experience
AG> with Shift-JIS or EUC-JP locales?

Nope I just use zh_TW.utf8 all the time.





Message received at 24924 <at> debbugs.gnu.org:


Subject: Re: bug#24924: pr has no concept of wide characters
To: =?UTF-8?B?56mN5Li55bC8IERhbiBKYWNvYnNvbg==?= <jidanni@HIDDEN>
References: <8737iyatfd.fsf@HIDDEN>
From: Assaf Gordon <assafgordon@HIDDEN>
Message-ID: <c758a694-ba58-8e9b-7d97-81c0c44bff41@HIDDEN>
Date: Fri, 11 Nov 2016 11:36:20 -0500
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
 Thunderbird/45.3.0
MIME-Version: 1.0
In-Reply-To: <8737iyatfd.fsf@HIDDEN>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
X-Spam-Score: -0.2 (/)
X-Debbugs-Envelope-To: 24924
Cc: 24924 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -0.2 (/)

severity 24924 wishlist
tags 24924 wishlist notabug
thanks

Hello Dan,

On 11/11/2016 11:10 AM, 積丹尼 Dan Jacobson wrote:
> The pr documentation (man, info) doesn't mention how it has no concept
> of wide characters.
> $ pr -m --sep-string='^^^'  file file

Indeed, most of the current coreutils programs do not handle wide or multibyte characters correctly.
The current official implementation of 'pr' does not support them (which is why I marked this item as 'wishlist' and not as a bug).
On Red Hat systems there is the 'i18n' patch, which adds some support but also introduces some issues of its own:
   https://github.com/pixelb/coreutils/tree/i18n

However, there is an active effort to make all of them multibyte-aware.
The latest updates, in reverse chronological order (these are somewhat long threads):
   http://lists.gnu.org/archive/html/coreutils/2016-09/msg00026.html
   http://lists.gnu.org/archive/html/coreutils/2016-09/msg00011.html
   http://lists.gnu.org/archive/html/coreutils/2016-07/msg00013.html

'cut' and 'expand' were the first two programs I worked on.
'pr' is definitely on the list. Once I have a proof of concept working, I would very much appreciate it if you could help me test it, as there are many edge cases with multibyte support and wide characters.

Out of curiosity,
are you using UTF-8 locales exclusively, or do you also have experience with Shift-JIS or EUC-JP locales?
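
For illustration only (this is not from the thread or from any coreutils patch): one reason the locale question matters is that the same text occupies a different number of bytes in each encoding, and in Shift-JIS a trailing byte can coincide with an ASCII character. A minimal Python sketch, using Python's built-in codecs; the helper names byte_lengths and has_ascii_bytes are made up for this example:

```python
def byte_lengths(text, codecs=("utf-8", "shift_jis", "euc_jp")):
    """Return {codec name: number of bytes needed to encode text}."""
    return {codec: len(text.encode(codec)) for codec in codecs}


def has_ascii_bytes(text, codec):
    """True if any byte of the encoded text falls in the ASCII range (< 0x80)."""
    return any(b < 0x80 for b in text.encode(codec))


# Three characters, each normally two columns wide on a terminal.
print(byte_lengths("日本語"))  # {'utf-8': 9, 'shift_jis': 6, 'euc_jp': 6}

# In Shift-JIS the second byte of this character is 0x5C, the ASCII backslash,
# so a tool that scans byte by byte can mistake it for a real backslash.
for codec in ("utf-8", "shift_jis", "euc_jp"):
    print(codec, has_ascii_bytes("表", codec))
```

A byte-oriented tool therefore cannot simply treat every byte below 0x80 as a standalone ASCII character in every locale, which is one of the edge cases a multibyte-aware rewrite has to cover.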


I'm leaving this ticket open, and welcome discussion and comments.
regards,
  - assaf


P.S.
The usual disclaimer applies: there is currently no ETA for multibyte support in coreutils.







Information forwarded to bug-coreutils@HIDDEN:
bug#24924; Package coreutils. Full text available.

Message received at submit <at> debbugs.gnu.org:


Received: (at submit) by debbugs.gnu.org; 11 Nov 2016 16:11:09 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Fri Nov 11 11:11:09 2016
Received: from localhost ([127.0.0.1]:54100 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1c5EQ0-0008Qw-S7
	for submit <at> debbugs.gnu.org; Fri, 11 Nov 2016 11:11:09 -0500
Received: from eggs.gnu.org ([208.118.235.92]:58545)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <jidanni@HIDDEN>) id 1c5EPz-0008Qj-0c
 for submit <at> debbugs.gnu.org; Fri, 11 Nov 2016 11:11:07 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from <jidanni@HIDDEN>) id 1c5EPs-0001kz-M0
 for submit <at> debbugs.gnu.org; Fri, 11 Nov 2016 11:11:01 -0500
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org
X-Spam-Level: *
X-Spam-Status: No, score=1.3 required=5.0 tests=BAYES_50,LOTS_OF_MONEY,
 RCVD_IN_SORBS_SPAM,T_DKIM_INVALID autolearn=disabled version=3.3.2
Received: from lists.gnu.org ([2001:4830:134:3::11]:55460)
 by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32)
 (Exim 4.71) (envelope-from <jidanni@HIDDEN>) id 1c5EPs-0001kt-ID
 for submit <at> debbugs.gnu.org; Fri, 11 Nov 2016 11:11:00 -0500
Received: from eggs.gnu.org ([2001:4830:134:3::10]:49516)
 by lists.gnu.org with esmtp (Exim 4.71)
 (envelope-from <jidanni@HIDDEN>) id 1c5EPr-0000rX-7q
 for bug-coreutils@HIDDEN; Fri, 11 Nov 2016 11:11:00 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from <jidanni@HIDDEN>) id 1c5EPo-0001jz-0w
 for bug-coreutils@HIDDEN; Fri, 11 Nov 2016 11:10:59 -0500
Received: from homie.mail.dreamhost.com ([208.97.132.208]:38808
 helo=homiemail-a14.g.dreamhost.com)
 by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)
 (Exim 4.71) (envelope-from <jidanni@HIDDEN>) id 1c5EPn-0001j4-K8
 for bug-coreutils@HIDDEN; Fri, 11 Nov 2016 11:10:55 -0500
Received: from homiemail-a14.g.dreamhost.com (localhost [127.0.0.1])
 by homiemail-a14.g.dreamhost.com (Postfix) with ESMTP id 5A68739207E
 for <bug-coreutils@HIDDEN>; Fri, 11 Nov 2016 08:10:50 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=jidanni.org; h=from:to
 :subject:date:message-id:mime-version:content-type:
 content-transfer-encoding; s=jidanni.org; bh=9EQ+kaUuGuhlHV0tPS6
 Ir3N3nKg=; b=VQAxtyCISnY1H4UKzZu4+C2wap7YrnJGEDMT6ayQ56emu7ToW34
 eG/S00pUMuOGZc2dHGHW8mYdP5INH+1Jg7+q/p0xXmgHmSgAP5BC5/CUxE9BOFms
 EKzGcUI675WufzD0ZPtetog3Uj1A4E16d36mzuzsf6fJfNLY6J0vW8AQ=
Received: from jidanni.org (111-246-99-93.dynamic.hinet.net [111.246.99.93])
 (using TLSv1 with cipher AES256-SHA (256/256 bits))
 (No client certificate requested)
 (Authenticated sender: jidanni@HIDDEN)
 by homiemail-a14.g.dreamhost.com (Postfix) with ESMTPSA id B0FBC392076
 for <bug-coreutils@HIDDEN>; Fri, 11 Nov 2016 08:10:49 -0800 (PST)
From: =?utf-8?B?56mN5Li55bC8?= Dan Jacobson <jidanni@HIDDEN>
To: bug-coreutils@HIDDEN
Subject: pr has no concept of wide characters
Date: Sat, 12 Nov 2016 00:10:46 +0800
Message-ID: <8737iyatfd.fsf@HIDDEN>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x (no
 timestamps) [generic]
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x
X-Received-From: 2001:4830:134:3::11
X-Spam-Score: -4.5 (----)
X-Debbugs-Envelope-To: submit
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -4.5 (----)

The pr documentation (man, info) doesn't mention that pr has no concept
of wide characters.

$ pr -m --sep-string='^^^'  file file

2016-11-12 00:06                                                  Page 1


<!DOCTYPE HTML PUBLIC "-//W3C//DTD^^^<!DOCTYPE HTML PUBLIC "-//W3C//DTD
"http://www.w3.org/TR/html4/strict^^^"http://www.w3.org/TR/html4/strict
<html lang="zh-tw">               ^^^<html lang="zh-tw">
<head>                            ^^^<head>
 <meta http-equiv="Content-Type" c^^^ <meta http-equiv="Content-Type" c
 "text/html; charset=utf-8">      ^^^ "text/html; charset=utf-8">
 <meta name="viewport" content="wi^^^ <meta name="viewport" content="wi
 <title>My groups ordered by ...</^^^ <title>My groups ordered by ...</
 <base href="https://www.facebook.^^^ <base href="https://www.facebook.
</head>                           ^^^</head>
<body>                            ^^^<body>
 <dl>                             ^^^ <dl>
  <dt>"同志|Queer|Gdi"</dt>               ^^^  <dt>"同志|Queer|Gdi"</dt>
   <dd>  5 o 台灣同志遊行聯盟 Taiwan LGBT Pride Co^^^   <dd>  5 o 台灣同志遊行聯盟 Taiwan LGBT Pride Co
   <dd>  0 o 台灣同志交友聯盟 301797916498866<BR> ^^^   <dd>  0 o 台灣同志交友聯盟 301797916498866<BR>
   <dd> 25 o 我是(直)同志，我很驕傲! 185779952675<BR> ^^^       <dd> 25 o 我是(直)同志，我很驕傲! 185779952675<BR>
   <dd> 25 o 台灣酷兒權益推動聯盟 Taiwan Gender Queer ^^^       <dd> 25 o 台灣酷兒權益推動聯盟 Taiwan Gender Queer
  <dt>"性別|蝶園" BUT NOT "TV"</dt>       ^^^  <dt>"性別|蝶園" BUT NOT "TV"</dt>
   <dd>  0 c 跨性別與女性主義 Transgender&amp;Femi^^^   <dd>  0 c 跨性別與女性主義 Transgender&amp;Femi
   <dd>  2 c 中部性別團體聯盟 293589073985313<BR> ^^^   <dd>  2 c 中部性別團體聯盟 293589073985313<BR>
   <dd>  1 o 台灣TG蝶園 320448571355058<BR^^^   <dd>  1 o 台灣TG蝶園 320448571355058<BR
   <dd>  0 o 中華民國跨性別者生活權益促進合作社訊息發布站 252346365161476<BR> ^^^       <dd>  0 o 中華民國跨性別者生活權益促進合作社訊息發布站 252346365161476<BR>
   <dd>  3 o 性別不明關懷協會(Beyond Gender) 17160^^^   <dd>  3 o 性別不明關懷協會(Beyond Gender) 17160
   <dd>  0 o 偽百合與偽娘、跨性別們的哲學、思想交流社群 810661859077873<BR> ^^^ <dd>  0 o 偽百合與偽娘、跨性別們的哲學、思想交流社群 810661859077873<BR>
$ pr --version
pr (GNU coreutils) 8.25
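
The shifted separators in the output above are consistent with column widths being measured in something other than display columns. That is an inference from the output, not a statement about pr's source code. As a rough sketch of the distinction, the Python below compares bytes, code points and approximate terminal columns for one of the strings in the report. The helpers display_width and pad_to_columns are hypothetical names, and the width rule (East Asian Wide or Fullwidth counts as 2, combining marks as 0, everything else as 1) is a simplification of what a terminal actually does:

```python
import unicodedata


def display_width(text):
    """Approximate terminal columns occupied by text (simplified rule)."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks take no column of their own
        # East Asian Wide ("W") and Fullwidth ("F") characters take two columns.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


def pad_to_columns(text, columns):
    """Right-pad text with spaces so it fills `columns` display columns."""
    return text + " " * max(0, columns - display_width(text))


sample = "台灣TG蝶園"
print(len(sample.encode("utf-8")))  # 14 bytes
print(len(sample))                  # 6 code points
print(display_width(sample))        # 10 columns

# Padding by display width keeps a separator aligned across rows.
for row in ("<head>", sample):
    print(pad_to_columns(row, 12) + "^^^")
```

Padding or truncating by any count other than display columns puts the separator in a different place on rows that contain CJK text, which is the kind of drift visible in the report.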




Acknowledgement sent to 積丹尼 Dan Jacobson <jidanni@HIDDEN>:
New bug report received and forwarded. Copy sent to bug-coreutils@HIDDEN. Full text available.
Report forwarded to bug-coreutils@HIDDEN:
bug#24924; Package coreutils. Full text available.
Last modified: Sun, 28 Oct 2018 07:30:02 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.