Assaf Gordon <assafgordon@HIDDEN> to control <at> debbugs.gnu.org.
From: Pádraig Brady <P@HIDDEN>
To: Michael Lee <michaellee213@HIDDEN>, 21395 <at> debbugs.gnu.org
Subject: Re: bug#21395: Bug with cut and Spanish characters from text file with UTF-8 encoding
Date: Wed, 02 Sep 2015 12:03:10 +0100
Message-ID: <55E6D76E.5070009@HIDDEN>
In-Reply-To: <1569154567.83126.1441154469654.JavaMail.yahoo@HIDDEN>

On 02/09/15 01:41, Michael Lee wrote:
> When using cut as, "cut -c 1" with a text file with Spanish characters, it does not display those characters.
> For example, the character ã or á will not display if it is the first character and the file is trimmed using the cut command.

Debian/Ubuntu do not use the i18n patch used in Fedora/RHEL/Suse for example,
and so do not support multi-byte characters.
Now that i18n patch is problematic and incomplete, and there are plans
to bring the functionality upstream at some stage:
http://www.pixelbeat.org/docs/coreutils_i18n/

cheers,
Pádraig
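The reply above attributes the symptom to cut treating each byte as a character when the i18n patch is absent. The following Python sketch only models that difference; it is not the coreutils implementation, and the function names and the sample word are made up for illustration.

```python
def cut_first_byte(line: bytes) -> bytes:
    """Model a byte-oriented `cut -c 1`: keep only the first byte."""
    return line[:1]


def cut_first_char(line: bytes, encoding: str = "utf-8") -> bytes:
    """Model a character-aware `cut -c 1`: decode, keep one character, re-encode."""
    return line.decode(encoding)[:1].encode(encoding)


def is_valid_utf8(data: bytes) -> bool:
    """Return True when the bytes form a complete, valid UTF-8 sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical sample line: "\u00e1rbol" is "arbol" with an acute accent on the a.
# In UTF-8 the accented letter occupies two bytes, 0xC3 0xA1.
line = "\u00e1rbol".encode("utf-8")

first_byte = cut_first_byte(line)   # only the lead byte 0xC3 survives
first_char = cut_first_char(line)   # both bytes of the accented letter survive

print(first_byte, is_valid_utf8(first_byte))
print(first_char, is_valid_utf8(first_char))
```

Under this model the byte-oriented result is an incomplete UTF-8 sequence, which a terminal may render as a blank or a replacement glyph, and that would be consistent with the blank output described in the report.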
bug-coreutils@HIDDEN: bug#21395; Package coreutils.
From: Michael Lee <michaellee213@HIDDEN>
To: "bug-coreutils@HIDDEN" <bug-coreutils@HIDDEN>
Subject: Bug with cut and Spanish characters from text file with UTF-8 encoding
Date: Wed, 2 Sep 2015 00:41:09 +0000 (UTC)
Message-ID: <1569154567.83126.1441154469654.JavaMail.yahoo@HIDDEN>

To whom it may concern:

To preface the explanation of this possible bug, the following was tested:

Encoding(s) was/were determined by opening the Spanish text files with vi and using ":set" to view the encoding type(s).

Text files containing Spanish letters/characters were used in this test. First, the locale in the bash shell was set to UTF-8 (default setting with Ubuntu) and the encoding on the first test file was encoded with Latin1. Under these conditions head and tail were used to try to output several Spanish letters/characters with accents above the letter. Trying to use "head spanish.txt" and "tail spanish.txt" resulted in output with spaces in place of the Spanish letters/characters.

After spanish.txt was converted from Latin1 to UTF-8 with iconv, the test was repeated with the head and tail utilities and then the output was correct. The Spanish letters/characters then displayed correctly instead of what previously appeared to be blank spaces. When the "cut" command was added to this, the behavior of spaces taking the place of letters returned.

For example, "head -n 50 spanish.txt | cut -c 1" or "tail -n 50 spanish.txt | cut -c 1" will result in the first character showing only blank spaces where there are Spanish letters/characters. Letters with accents are displayed as blank spaces. Using only head or tail will show the Spanish letters correctly, but not with the cut command.

When using cut as, "cut -c 1" with a text file with Spanish characters, it does not display those characters.

For example, the character ã or á will not display if it is the first character and the file is trimmed using the cut command.

Converting the file from Latin1 to UTF-8 solved the problem with head and tail, but not cut.

The cut command does not seem to output the special letters/characters correctly.

Is there an environment variable that could fix this or could it possibly be a bug?

Thank you for your time.

Sincerely,
Michael Lee
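The first half of the report above (Latin1 file, UTF-8 locale, then conversion) can be illustrated with a small Python sketch. It is only an illustration under assumptions: the sample text is invented, and the re-encoding step stands in for the kind of conversion the report performed with iconv rather than reproducing the exact command used.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True when the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical sample: "a" with acute accent, "n" with tilde, then "o".
text = "\u00e1\u00f1o"

# Latin1 stores each accented letter as a single byte (0xE1, 0xF1).
latin1_bytes = text.encode("latin-1")

# UTF-8 stores each accented letter as two bytes (0xC3 0xA1, 0xC3 0xB1).
utf8_bytes = text.encode("utf-8")

# Re-encoding step, analogous to converting the file from Latin1 to UTF-8.
converted = latin1_bytes.decode("latin-1").encode("utf-8")

print(latin1_bytes, is_valid_utf8(latin1_bytes))
print(converted, is_valid_utf8(converted))
```

The Latin1 bytes are not valid UTF-8, which would explain why a UTF-8 terminal could not display them even through head and tail. After conversion the data is valid UTF-8, so tools that pass bytes through unchanged display it correctly, while any tool that splits a two-byte letter in the middle does not.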
Michael Lee <michaellee213@HIDDEN>: bug-coreutils@HIDDEN.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997 nCipher Corporation Ltd,
1994-97 Ian Jackson.