From: Ronald Schaten <ronald@HIDDEN>
To: bug-coreutils@HIDDEN
Date: Tue, 4 Apr 2017 16:01:52 +0200
Subject: bug#26362: tr -cd -- Problem with UTF-8?

Hey...

I'm not sure if this is a bug or if I'm using it wrong. As a matter of fact, I tested this on several systems, and on BSD-based systems (Mac) the tr tool gives different results -- the ones I expected.

The simplest way to reproduce this looks like this (sorry, umlaut ahead):

$ echo -ne "\xc3\x82" | tr -cd "ä" | xxd
% 00000000: c3 .

The echo prints a capital A with a circumflex (Â), and I expect the tr command to delete everything except the small umlaut ä. It looks as if tr just deletes the second byte.

When I try without the umlaut it gives me the empty result, as expected:

$ echo -ne "\xc3\x82" | tr -cd "a" | xxd
[empty result]

I tested several systems; the oldest is a Debian with coreutils 8.5, the newest an Ubuntu with coreutils 8.25.

For the moment, I'll try to solve my problem differently, but... is this a bug?

Thanks in advance!

Regards,
Ronald.

-- 
There is no reason for any individual to have a computer in his home. (Ken Olsen, DEC)
From: help-debbugs@HIDDEN (GNU bug Tracking System)
To: Ronald Schaten <ronald@HIDDEN>
Date: Tue, 04 Apr 2017 15:25:02 +0000
Subject: bug#26362: Acknowledgement (tr -cd -- Problem with UTF-8?)

Thank you for filing a new bug report with debbugs.gnu.org.

This is an automatically generated reply to let you know your message has been received.

Your message is being forwarded to the package maintainers and other interested parties for their attention; they will reply in due course.

Your message has been sent to the package maintainer(s):
 bug-coreutils@HIDDEN

If you wish to submit further information on this problem, please send it to 26362 <at> debbugs.gnu.org.

Please do not send mail to help-debbugs@HIDDEN unless you wish to report a problem with the Bug-tracking system.

-- 
26362: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=26362
GNU Bug Tracking System
Contact help-debbugs@HIDDEN with problems
From: Assaf Gordon <assafgordon@HIDDEN>
To: Ronald Schaten <ronald@HIDDEN>
Cc: 26362 <at> debbugs.gnu.org
Date: Tue, 4 Apr 2017 22:19:15 -0400
Subject: Re: bug#26362: tr -cd -- Problem with UTF-8?

tags 26362 notabug wishlist
stop 26362

Hello,

> On Apr 4, 2017, at 10:01, Ronald Schaten <ronald@HIDDEN> wrote:
>
> I'm not sure if this is a bug or if I'm using it wrong.

Neither. It is simply that GNU tr does not yet support multibyte characters.

> The simplest way to reproduce this looks like this (sorry, umlaut
> ahead):
>
> $ echo -ne "\xc3\x82" | tr -cd "ä" | xxd
> % 00000000: c3 .
>
> The echo prints a capital A with a circumflex (Â), and I expect the tr
> command to delete everything except the small umlaut ä. It looks as if
> tr just deletes the second byte.

What happened here is this: 'tr' currently reads the string parameter (SET1) as single-byte octets, and so treats it as if you had given two octets: \xC3 \xA4 (which is the UTF-8 encoding of small A with umlaut). It then reads the input octet by octet, keeps \xC3 (because it appears in SET1) and deletes \x82 (because it does not).

> When I try without the umlaut it gives me the empty result, as expected:
>
> $ echo -ne "\xc3\x82" | tr -cd "a" | xxd

Indeed, because here you're asking to keep only octets whose value is \x61 (the ASCII value of 'a'). Neither \xC3 nor \x82 matches, so both are deleted.

> For the moment, I'll try to solve my problem differently, but... is this
> a bug? Thanks in advance!

Not a bug, but a yet-missing feature.
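The octet-level explanation above can be checked directly. The sketch below assumes a UTF-8 encoded script and a byte-oriented `tr` such as upstream GNU coreutils (BSD `tr`, or a `tr` built with a multibyte patch, may behave differently). It dumps the octets of SET1 and of the input with `od`, then reruns both commands from the report using octal escapes. The `hexbytes` function is a hypothetical helper introduced only for this illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of coreutils): print the octets of its
# argument as lowercase hex, with no separators, followed by a newline.
hexbytes() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
    echo
}

# SET1 as a byte-oriented tr sees it: 'ä' (U+00E4) is two octets in UTF-8.
hexbytes 'ä'                      # c3a4

# The input: 'Â' (U+00C2) is the two octets 0xc3 0x82.
hexbytes "$(printf '\303\202')"   # c382

# With -c -d, every input octet NOT found in SET1 is deleted.
# 0xc3 is in SET1 {0xc3, 0xa4}; 0x82 is not, so only 0xc3 survives.
printf '\303\202' | tr -cd 'ä' | od -An -tx1

# With SET1 = 'a' (0x61), neither 0xc3 nor 0x82 matches, so tr
# outputs zero bytes.
printf '\303\202' | tr -cd 'a' | wc -c
```

Octal escapes are used instead of `\x` because POSIX `printf` only guarantees octal escapes; the `\x` form works in bash and GNU `printf` but not, for example, in dash.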
For relevant discussion see here:
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=24924#8

As a temporary work-around, you can use GNU sed, which is multibyte-aware (in a UTF-8 locale):

$ printf "abc \xc3\xA4\xc3\x82 def\n" | sed 's/[^ä]//g'
ä

'sed' also supports "character equivalence classes" ([=...=]). In the following example, all characters except those that are equivalent to 'a' will be deleted:

$ printf "abc \xc3\xA4\xc3\x82 def\n" | sed 's/[^[=a=]]//g'
aäÂ

Equivalence classes will also work with a future 'tr' once multibyte support is added.

Lastly, "echo -en" is not portable. It is recommended to use "printf" instead. GNU printf (and the printf builtin of bash 4.2 and later) has the added advantage that it supports Unicode code points directly, so you don't need to know the UTF-8 encoding of a Unicode character, e.g.:

printf "\u00c2\n"

will print capital A with circumflex (and will work in other locales that support this character, not just UTF-8).

I'm thus marking this item as "wishlist" and "notabug", but I'll keep it open until it is implemented.

Discussion can continue by replying to this thread.

regards,
 - assaf
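The sed work-around and the printf tip above can be tried as one short script. This is a sketch under assumptions: GNU sed and GNU coreutils `printf` are installed, and a UTF-8 locale named `C.UTF-8` exists (that locale name is an assumption; substitute one listed by `locale -a`). The equivalence-class line has no expected output on purpose, because which characters count as equivalent to 'a' comes from the locale's data, not from sed.

```shell
#!/bin/sh
# Assumption: a UTF-8 locale named C.UTF-8 is installed. If it is not,
# pick one from `locale -a` (e.g. en_US.UTF-8). In the plain C locale,
# sed falls back to matching single octets, just like tr.
LC_ALL=C.UTF-8
export LC_ALL

# Sample input "abc äÂ def", built with octal escapes, which POSIX
# printf guarantees (\x and \u are extensions).
input=$(printf 'abc \303\244\303\202 def')

# GNU sed matches whole characters in a UTF-8 locale, so [^ä] deletes
# every character except 'ä' instead of operating on its octets.
printf '%s\n' "$input" | sed 's/[^ä]//g'

# Keep only characters the locale treats as equivalent to 'a'. The
# members of [=a=] come from the locale's collation data, so the result
# may differ between systems and C library versions.
printf '%s\n' "$input" | sed 's/[^[=a=]]//g'

# GNU printf, run via env to bypass a shell builtin that may lack \u,
# encodes U+00C2 for the current locale (0xc3 0x82 under UTF-8).
env printf '\u00c2\n'
```

Running `env printf` rather than plain `printf` matters on systems where `/bin/sh` is dash, whose builtin `printf` does not understand `\u`.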
From: Assaf Gordon <assafgordon@HIDDEN>
To: control <at> debbugs.gnu.org
Date: Sun, 28 Oct 2018 21:03:52 -0600

severity 26362 wishlist
retitle 26362 multibyte: tr: "tr -cd" -- Problem with UTF-8?
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997 nCipher Corporation Ltd,
1994-97 Ian Jackson.