From: Assaf Gordon <assafgordon@HIDDEN>
To: Michael Siegel <msi@HIDDEN>, 33775 <at> debbugs.gnu.org
Subject: Re: bug#33775: fold: counting multi-byte utf-8 sequences as separate columns
Date: Sat, 22 Dec 2018 23:03:50 -0700
Message-ID: <4e9b7e51-4020-133d-0b3f-0cc89076a062@HIDDEN>
In-Reply-To: <cb32cf5c-2f40-ab75-2c03-113bcd19d7ad@HIDDEN>

severity 33775 wishlist
retitle 33775 multibyte: fold: multi-byte sequences as separate columns
stop

Hello,

On 2018-12-16 6:32 p.m., Michael Siegel wrote:
> I've just discovered an odd behavior of `fold' while trying to wrap a
> piece of text containing phonetic characters.
>
> Take the following line, for example:

Thank you for reporting this issue and providing clear, reproducible
examples.

Adding complete multibyte/utf8 support to all coreutils programs is an
on-going effort.

I'm marking this as a "wishlist" item, which will remain open until we
complete the implementation.

Related multibyte items are listed here (with "multibyte" prefix):
https://debbugs.gnu.org/cgi/pkgreport.cgi?which=pkg&data=coreutils

regards,
 - assaf
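
Until multibyte support is available in `fold', wrapping by characters
rather than bytes can be approximated outside coreutils. The following is
a minimal sketch, not coreutils code: `fold_chars` is a hypothetical
helper that imitates `fold -s` but counts Unicode code points. It assumes
every code point occupies one column, so it ignores tabs, backspaces,
combining marks and double-width characters.

```python
def fold_chars(text: str, width: int = 80) -> list[str]:
    """Break one line into pieces of at most `width` code points.

    Like `fold -s`, a break is placed after the last blank that fits;
    if no blank fits, the line is cut at exactly `width` code points.
    """
    pieces = []
    while len(text) > width:
        # Index just past the last blank within the first `width` code points.
        cut = text.rfind(" ", 0, width) + 1
        if cut == 0:
            cut = width  # no blank available: hard break
        pieces.append(text[:cut])
        text = text[cut:]
    pieces.append(text)
    return pieces


if __name__ == "__main__":
    sample = "Tcl (pronounced tickle or tee cee ell /ˈtiː siː ɛl/) is a high-level,"
    for piece in fold_chars(sample, 72):
        print(piece)
```

With a width of 72 the sample line from the report is returned as a single
piece, because it is measured in characters rather than bytes.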
From: Michael Siegel <msi@HIDDEN>
To: bug-coreutils@HIDDEN
Subject: fold: counting multi-byte utf-8 sequences as separate columns
Date: Mon, 17 Dec 2018 02:32:55 +0100
Message-ID: <cb32cf5c-2f40-ab75-2c03-113bcd19d7ad@HIDDEN>

Hello,

I've just discovered an odd behavior of `fold' while trying to wrap a
piece of text containing phonetic characters.

Take the following line, for example:

  Tcl (pronounced "tickle" or tee cee ell /ˈtiː siː ɛl/) is a high-level,

It is 71 characters long. Still, running

  echo "Tcl (pronounced "tickle" or tee cee ell /ˈtiː siː ɛl/) is a high-level," | fold -w 72 -s

produces

  Tcl (pronounced tickle or tee cee ell /ˈtiː siː ɛl/) is a
  high-level,

I've had someone test this with FreeBSD's `fold', which didn't behave
that way. Instead, it filled out the line as expected.

Further investigation by developers of Adélie Linux revealed that GNU's
`fold' is counting multi-byte utf-8 sequences (in this case, the
phonetic characters) as separate columns:

  awilcox on gwyn [pts/11 Sun 16 19:01] ~: cat testing.txt
  1234567890 234567890 234567890 234567890 234567890 234567890 234567890
  /ˈtiː siː ɛl/ Adélie en français español ¿que? ¡ay! here is 70 chars ^
  yep.
  awilcox on gwyn [pts/11 Sun 16 19:01] ~: fold -w 72 -s testing.txt
  1234567890 234567890 234567890 234567890 234567890 234567890 234567890
  /ˈtiː siː ɛl/ Adélie en français español ¿que? ¡ay! here is 70
  chars ^
  yep.


msi
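
The discrepancy described in this report can be checked by comparing the
number of characters in the sample line with its length in UTF-8 bytes. The
snippet below is only an illustrative sketch; the variable names are made
up for this example, and the line is given as `fold' receives it, that is,
with the inner double quotes already removed by the shell.

```python
# Sample line as it reaches fold (inner double quotes stripped by the shell).
line = "Tcl (pronounced tickle or tee cee ell /ˈtiː siː ɛl/) is a high-level,"

chars = len(line)                       # number of Unicode code points
utf8_bytes = len(line.encode("utf-8"))  # number of bytes when UTF-8 encoded

# Code points that need more than one byte in UTF-8.
multibyte = sorted({c for c in line if len(c.encode("utf-8")) > 1})

print("characters:", chars)
print("UTF-8 bytes:", utf8_bytes)
print("multi-byte characters:", " ".join(multibyte))
```

The line has 69 characters but 73 bytes, since each of the four phonetic
characters takes two bytes. A limit of 72 is therefore exceeded only when
every byte is counted as a column.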
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997 nCipher Corporation Ltd,
1994-97 Ian Jackson.