From: Paul Eggert <eggert@HIDDEN>
To: Jeremy Hetzler <jeremyhetzler@HIDDEN>
Cc: 64277 <at> debbugs.gnu.org
Date: Sat, 24 Jun 2023 15:17:34 -0700
Subject: Re: bug#64277: [feature request] handle files encoded in utf-16le

On 2023-06-24 14:23, Jeremy Hetzler wrote:
> I would like to request a feature to be added to grep which would enable it
> to transparently decode UTF-16LE files so they can be conveniently searched.
Not sure it's worth the effort as this format is not that common for GNU grep, it'd be a pain to add proper support for it, and anyway 16-bit encodings have been problematic ever since Unicode crossed the 16-bit Rubicon. I'm not saying we'd reject a patch if someone wrote it, but I'd say it should be low priority.
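To make the 16-bit point concrete: UTF-16 stores each character up to U+FFFF as one 16-bit code unit and anything beyond that as a surrogate pair of two units, so it is a variable-width encoding just like UTF-8. The sketch below assumes an `iconv` that knows UTF-16LE (glibc and GNU libiconv both do) plus `od`; `utf16le_hex` is a hypothetical helper name used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read UTF-8 on stdin, print its UTF-16LE bytes in hex.
utf16le_hex () {
  iconv -f UTF-8 -t UTF-16LE | od -An -tx1
}

# "A" (U+0041) fits in one 16-bit code unit, stored low byte first: 41 00.
printf 'A' | utf16le_hex

# U+1F600 lies beyond U+FFFF, so UTF-16 needs the surrogate pair
# D83D DE00, stored low byte first as 3d d8 00 de.
# Its UTF-8 bytes (F0 9F 98 80) are given in octal so any POSIX printf
# can produce them.
printf '\360\237\230\200' | utf16le_hex
```

The first command prints the bytes 41 00, the same letter-then-NUL pattern that runs through the od dump in the original report, which is why a byte-oriented search for `HEADER` finds nothing in text stored as `H\0E\0A\0D\0E\0R\0`. The second prints four bytes for a single character.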

From: Jeremy Hetzler <jeremyhetzler@HIDDEN>
To: bug-grep@HIDDEN
Date: Sat, 24 Jun 2023 17:23:05 -0400
Subject: [feature request] handle files encoded in utf-16le
Maintainers,

I recently was confused as to why GNU grep did not find any matches in certain files, when vim clearly showed that the search string was present.

It turns out the files (log files from a Windows application) are encoded in UTF-16LE.

    $ file '06-21-2023 03-22-46'
    06-21-2023 03-22-46: Unicode text, UTF-16, little-endian text, with CRLF line terminators

    $ /bin/od -Ad -w16 -t cz '06-21-2023 03-22-46' | head -10
    0000000 377 376   [  \0   H  \0   E  \0   A  \0   D  \0   E  \0   R  \0  >..[.H.E.A.D.E.R.<
    0000016   :  \0   ]  \0  \r  \0  \n  \0   [  \0   I  \0   D  \0   r  \0  >:.].....[.I.D.r.<
    0000032   i  \0   v  \0   e  \0      \0   v  \0   e  \0   r  \0   s  \0  >i.v.e. .v.e.r.s.<
    0000048   i  \0   o  \0   n  \0   :  \0      \0   6  \0   .  \0   7  \0  >i.o.n.:. .6...7.<
    0000064   .  \0   4  \0   .  \0   4  \0   6  \0      \0   R  \0   e  \0  >..4...4.6. .R.e.<
    0000080   l  \0   e  \0   a  \0   s  \0   e  \0      \0   D  \0   a  \0  >l.e.a.s.e. .D.a.<
    0000096   t  \0   e  \0   :  \0      \0   0  \0   6  \0   /  \0   1  \0  >t.e.:. .0.6./.1.<
    0000112   6  \0   /  \0   2  \0   0  \0   2  \0   3  \0   ]  \0  \r  \0  >6./.2.0.2.3.]...<
    0000128  \n  \0   [  \0   I  \0   n  \0   t  \0   e  \0   r  \0   a  \0  >..[.I.n.t.e.r.a.<
    0000144   c  \0   t  \0   i  \0   v  \0   e  \0      \0   B  \0   a  \0  >c.t.i.v.e. .B.a.<

    $ grep --version
    grep (GNU grep) 3.11
    Packaged by Cygwin (3.11-1)
    Copyright (C) 2023 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.

There is no easy way to use grep to search these files, even if one knows the encoding in advance.

I would like to request a feature to be added to grep which would enable it to transparently decode UTF-16LE files so they can be conveniently searched.

Thanks,
Jeremy Hetzler
(he/him)
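Until grep itself handles UTF-16, one common workaround when the encoding is known is to convert the file to UTF-8 on the fly and search the result. This is a minimal sketch, assuming `iconv` (glibc or GNU libiconv) and GNU grep for its `--label` option; `grep_utf16` is a hypothetical shell function, not a grep feature. It decodes with `-f UTF-16` so that iconv reads the FF FE byte-order mark visible at offset 0 of the od dump to select little-endian order, and it strips the CR of the CRLF line terminators so that `$` anchors behave as expected. Without a byte-order mark, iconv's UTF-16 decoder typically assumes big-endian, so BOM-less files would need `-f UTF-16LE` instead.

```shell
#!/bin/sh
# Hypothetical helper (not a grep option): grep_utf16 PATTERN FILE...
# Converts each UTF-16 file to UTF-8 and searches the converted text.
grep_utf16 () {
  pattern=$1
  shift
  status=1                      # grep convention: 1 means "no match"
  for f in "$@"; do
    # -f UTF-16: let iconv use the byte-order mark to choose endianness.
    # tr -d '\r': drop the CR of CRLF line endings so '$' anchors work.
    # -H with --label: report matches under the real file name rather
    # than "(standard input)".
    if iconv -f UTF-16 -t UTF-8 "$f" | tr -d '\r' |
       grep -H --label="$f" -e "$pattern"; then
      status=0
    fi
  done
  return "$status"
}
```

For the file in the report this would be invoked as `grep_utf16 'Interactive' '06-21-2023 03-22-46'`. Converting per file keeps the file name in the output; the cost is an extra pass over each file, which is the kind of overhead built-in support would avoid.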
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997 nCipher Corporation Ltd,
1994-97 Ian Jackson.