GNU bug report logs - #33775
multibyte: fold: multi-byte sequences as separate columns

Please note: This is a static page, with minimal formatting, updated once a day.
Click here to see this page with the latest information and nicer formatting.

Package: coreutils; Severity: wishlist; Reported by: Michael Siegel <msi@HIDDEN>; dated Mon, 17 Dec 2018 02:15:01 UTC; Maintainer for coreutils is bug-coreutils@HIDDEN.

Message received at 33775 <at> debbugs.gnu.org:


Subject: Re: bug#33775: fold: counting multi-byte utf-8 sequences as separate
 columns
To: Michael Siegel <msi@HIDDEN>, 33775 <at> debbugs.gnu.org
References: <cb32cf5c-2f40-ab75-2c03-113bcd19d7ad@HIDDEN>
From: Assaf Gordon <assafgordon@HIDDEN>
Message-ID: <4e9b7e51-4020-133d-0b3f-0cc89076a062@HIDDEN>
Date: Sat, 22 Dec 2018 23:03:50 -0700
In-Reply-To: <cb32cf5c-2f40-ab75-2c03-113bcd19d7ad@HIDDEN>

severity 33775 wishlist
retitle 33775 multibyte: fold: multi-byte sequences as separate columns
stop

Hello,

On 2018-12-16 6:32 p.m., Michael Siegel wrote:
> I've just discovered an odd behavior of `fold' while trying to wrap a
> piece of text containing phonetic characters.
> 
> Take the following line, for example:

Thank you for reporting this issue and
providing clear, reproducible examples.

Adding complete multibyte/UTF-8 support to all coreutils
programs is an ongoing effort.

I'm marking this as a "wishlist" item, which will remain
open until we complete the implementation.

Related multibyte items are listed here (with "multibyte" prefix):
https://debbugs.gnu.org/cgi/pkgreport.cgi?which=pkg&data=coreutils



regards,
  - assaf






Information forwarded to bug-coreutils@HIDDEN:
bug#33775; Package coreutils. Full text available.
Changed bug title to 'multibyte: fold: multi-byte sequences as separate columns' from 'fold: counting multi-byte utf-8 sequences as separate columns' Request was from Assaf Gordon <assafgordon@HIDDEN> to control <at> debbugs.gnu.org. Full text available.
Severity set to 'wishlist' from 'normal' Request was from Assaf Gordon <assafgordon@HIDDEN> to control <at> debbugs.gnu.org. Full text available.

Message received at submit <at> debbugs.gnu.org:


To: bug-coreutils@HIDDEN
From: Michael Siegel <msi@HIDDEN>
Subject: fold: counting multi-byte utf-8 sequences as separate columns
Message-ID: <cb32cf5c-2f40-ab75-2c03-113bcd19d7ad@HIDDEN>
Date: Mon, 17 Dec 2018 02:32:55 +0100

Hello,

I've just discovered an odd behavior of `fold' while trying to wrap a
piece of text containing phonetic characters.

Take the following line, for example:

Tcl (pronounced "tickle" or tee cee ell /ˈtiː siː ɛl/) is a high-level,

It is 71 characters long. Still, running

echo "Tcl (pronounced "tickle" or tee cee ell /ˈtiː siː ɛl/) is a high-level," | fold -w 72 -s

produces

Tcl (pronounced tickle or tee cee ell /ˈtiː siː ɛl/) is a
high-level,

I've had someone test this with FreeBSD's `fold', which didn't behave
that way. Instead, it filled out the line as expected.
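
A quick way to compare the two ways of measuring that line is sketched
below in Python. It assumes the text is UTF-8 encoded and that every code
point occupies one column. It also uses the line as `fold' actually
received it, that is, without the inner double quotes, which the shell
removes (the output above shows them missing). The variable names are
only illustrative.

```python
# The line as passed to fold: the shell has already stripped the inner
# double quotes around "tickle".
line = "Tcl (pronounced tickle or tee cee ell /ˈtiː siː ɛl/) is a high-level,"

chars = len(line)                   # number of Unicode code points
octets = len(line.encode("utf-8"))  # number of bytes in UTF-8

print(chars, octets)    # 69 73
print(octets - chars)   # 4: each of the four phonetic symbols takes 2 bytes
```

With a width of 72, the character count fits on one line while the byte
count does not.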

Further investigation by developers of Adélie Linux revealed that GNU's
`fold' is counting multi-byte utf-8 sequences (in this case, the
phonetic characters) as separate columns:

awilcox on gwyn [pts/11 Sun 16 19:01] ~: cat testing.txt
1234567890 234567890 234567890 234567890 234567890 234567890 234567890
/ˈtiː siː ɛl/ Adélie en français español ¿que? ¡ay! here is 70 chars ^
yep.
awilcox on gwyn [pts/11 Sun 16 19:01] ~: fold -w 72 -s testing.txt
1234567890 234567890 234567890 234567890 234567890 234567890 234567890
/ˈtiː siː ɛl/ Adélie en français español ¿que? ¡ay! here is 70
chars ^
yep.
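
The transcript above can be reproduced with a small model of
`fold -w 72 -s'. This is only a sketch under stated assumptions, not the
coreutils source: `fold_at_blanks' is a hypothetical helper that handles a
single line with no tabs or backspaces, and treats each item of its input
(a byte or a code point) as exactly one column.

```python
def fold_at_blanks(units, width, blank):
    """Split `units` into pieces of at most `width` items, breaking just
    after the last blank that still fits (a rough model of -s)."""
    pieces = []
    start = 0
    while len(units) - start > width:
        # Search for the last blank among the next `width` items.
        pos = units.rfind(blank, start, start + width)
        cut = pos + 1 if pos != -1 else start + width
        pieces.append(units[start:cut])
        start = cut
    pieces.append(units[start:])
    return pieces


line = "/ˈtiː siː ɛl/ Adélie en français español ¿que? ¡ay! here is 70 chars ^"

# One column per byte: the line is 79 bytes long and gets split.
# Decoding each piece is safe here because the cut lands after a blank.
by_bytes = [p.decode("utf-8")
            for p in fold_at_blanks(line.encode("utf-8"), 72, b" ")]

# One column per code point: the line is 70 characters long and fits.
by_chars = fold_at_blanks(line, 72, " ")

for piece in by_bytes:
    print(repr(piece))
print(by_chars == [line])
```

Counting bytes yields the same break before "chars ^" as in the
transcript, while counting code points leaves the line whole.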



msi




Acknowledgement sent to Michael Siegel <msi@HIDDEN>:
New bug report received and forwarded. Copy sent to bug-coreutils@HIDDEN. Full text available.
Report forwarded to bug-coreutils@HIDDEN:
bug#33775; Package coreutils. Full text available.
Last modified: Sun, 23 Dec 2018 06:15:02 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.