GNU bug report logs - #31149
27.0.50; (gui-get-selection nil 'text/html) returns mis-decoded text

Package: emacs; Reported by: Stefan Monnier <monnier@HIDDEN>; dated Fri, 13 Apr 2018 20:56:02 UTC; Maintainer for emacs is bug-gnu-emacs@HIDDEN.

Message received at 31149 <at> debbugs.gnu.org:


Received: (at 31149) by debbugs.gnu.org; 29 Sep 2019 08:44:56 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Sun Sep 29 04:44:56 2019
Received: from localhost ([127.0.0.1]:52216 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1iEUp6-0001pt-32
	for submit <at> debbugs.gnu.org; Sun, 29 Sep 2019 04:44:56 -0400
Received: from quimby.gnus.org ([80.91.231.51]:48908)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <larsi@HIDDEN>) id 1iEUp4-0001pj-3z
 for 31149 <at> debbugs.gnu.org; Sun, 29 Sep 2019 04:44:55 -0400
Received: from cm-84.212.202.86.getinternet.no ([84.212.202.86] helo=marnie)
 by quimby.gnus.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.89) (envelope-from <larsi@HIDDEN>)
 id 1iEUoy-0006ej-Em; Sun, 29 Sep 2019 10:44:50 +0200
From: Lars Ingebrigtsen <larsi@HIDDEN>
To: Stefan Monnier <monnier@HIDDEN>
Subject: Re: bug#31149: 27.0.50; (gui-get-selection nil 'text/html) returns
 mis-decoded text
References: <jwv36zyhlmp.fsf@HIDDEN>
Date: Sun, 29 Sep 2019 10:44:48 +0200
In-Reply-To: <jwv36zyhlmp.fsf@HIDDEN> (Stefan Monnier's message of
 "Fri, 13 Apr 2018 16:55:26 -0400")
Message-ID: <87h84vqynz.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 31149
Cc: 31149 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -1.0 (-)

Stefan Monnier <monnier@HIDDEN> writes:

> (gui-get-selection nil 'text/html)
>
> returns utf-16 text when the primary selection is owned by Mozilla, but
> we decode it as latin-1 instead, so it looks like garbage.

This is still the case on the trunk:

#("ÿþM^@e^@r^@g^@e^@d^@" 0 14 (foreign-selection STRING charset iso-8859-1))
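
For illustration, here is a minimal C sketch (not Emacs code) of what a Latin-1 decode does to such data. Every input byte becomes the code point of the same value, so the UTF-16 byte-order mark FF FE turns into the two characters ÿ and þ (C3 BF C3 BE once re-encoded as UTF-8), and the NUL high bytes pass through unchanged. The function name `latin1_to_utf8` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Decode LEN bytes as ISO-8859-1 and re-encode them as UTF-8.
   Each Latin-1 byte maps to the code point of the same value, so
   bytes >= 0x80 become two UTF-8 bytes.  OUT must have room for
   2 * LEN + 1 bytes.  Returns the number of bytes written, not
   counting the terminating NUL.  (Hypothetical helper.)  */
size_t
latin1_to_utf8 (const unsigned char *in, size_t len, unsigned char *out)
{
  assert (in != NULL && out != NULL);
  size_t n = 0;
  for (size_t i = 0; i < len; i++)
    {
      if (in[i] < 0x80)
        out[n++] = in[i];          /* ASCII (including NUL) is unchanged.  */
      else
        {
          out[n++] = (unsigned char) (0xC0 | (in[i] >> 6));
          out[n++] = (unsigned char) (0x80 | (in[i] & 0x3F));
        }
    }
  out[n] = '\0';
  return n;
}
```

Fed the bytes FF FE 4D 00, this yields C3 BF C3 BE 4D 00: the BOM has become visible text rather than being consumed as a byte-order hint.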

[...]

> I can't figure out if/where these kinds of things about the X11
> selection protocol is described, but at least in `xclip` they have
> a hack specifically for this case:
>
>     [...]
>     if (html != None && sel_type == html) {
> 	/* if the buffer contains UCS-2 (UTF-16), convert to
> 	 * UTF-8.  Mozilla-based browsers do this for the
> 	 * text/html target.
> 	 */
>     [...]
>
> and according to the subsequent code it's not even always the
> same endianness.

I think it would make sense for us to do the same here.  It should be
easy enough for us to detect that the string is utf-16, I think?  The
data has a BOM and everything...
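
A minimal sketch of the BOM check suggested here, written in C and independent of the Emacs sources. The names `utf16_sniff_bom`, `utf16_unit` and `utf16_bom_to_utf8` are hypothetical; they are not existing Emacs or xclip functions. The sketch assumes the selection data starts with a BOM, as the sample above does, and gives up otherwise rather than guessing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum utf16_order { UTF16_NONE, UTF16_LE, UTF16_BE };

/* Return the byte order announced by a leading BOM, or UTF16_NONE.  */
enum utf16_order
utf16_sniff_bom (const unsigned char *buf, size_t len)
{
  if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
    return UTF16_LE;
  if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
    return UTF16_BE;
  return UTF16_NONE;
}

/* Read one 16-bit code unit at P in the given byte order.  */
static uint32_t
utf16_unit (const unsigned char *p, enum utf16_order order)
{
  return order == UTF16_LE
         ? (uint32_t) p[0] | ((uint32_t) p[1] << 8)
         : ((uint32_t) p[0] << 8) | (uint32_t) p[1];
}

/* Convert BOM-prefixed UTF-16 in IN to NUL-terminated UTF-8 in OUT.
   OUT must have room for 3 * (LEN / 2) + 1 bytes.  Returns the number
   of bytes written, or (size_t) -1 if there is no BOM, the length is
   odd, or a surrogate is unpaired.  */
size_t
utf16_bom_to_utf8 (const unsigned char *in, size_t len, unsigned char *out)
{
  assert (in != NULL && out != NULL);
  enum utf16_order order = utf16_sniff_bom (in, len);
  if (order == UTF16_NONE || len % 2 != 0)
    return (size_t) -1;

  size_t n = 0;
  for (size_t i = 2; i < len; i += 2)   /* Start after the BOM.  */
    {
      uint32_t cp = utf16_unit (in + i, order);
      if (cp >= 0xD800 && cp <= 0xDBFF)
        {
          /* High surrogate: a low surrogate must follow.  */
          if (i + 3 >= len)
            return (size_t) -1;
          uint32_t lo = utf16_unit (in + i + 2, order);
          if (lo < 0xDC00 || lo > 0xDFFF)
            return (size_t) -1;
          cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
          i += 2;
        }
      else if (cp >= 0xDC00 && cp <= 0xDFFF)
        return (size_t) -1;             /* Stray low surrogate.  */

      if (cp < 0x80)
        out[n++] = (unsigned char) cp;
      else if (cp < 0x800)
        {
          out[n++] = (unsigned char) (0xC0 | (cp >> 6));
          out[n++] = (unsigned char) (0x80 | (cp & 0x3F));
        }
      else if (cp < 0x10000)
        {
          out[n++] = (unsigned char) (0xE0 | (cp >> 12));
          out[n++] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
          out[n++] = (unsigned char) (0x80 | (cp & 0x3F));
        }
      else
        {
          out[n++] = (unsigned char) (0xF0 | (cp >> 18));
          out[n++] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
          out[n++] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
          out[n++] = (unsigned char) (0x80 | (cp & 0x3F));
        }
    }
  out[n] = '\0';
  return n;
}
```

On the 14 sample bytes above (FF FE 4D 00 65 00 72 00 67 00 65 00 64 00), the sketch reports little-endian order and produces the six bytes of "Merged". Handling both BOM orders in one place also covers the case where the endianness differs between selection owners.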

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#31149; Package emacs. Full text available.

Message received at 31149 <at> debbugs.gnu.org:


Received: (at 31149) by debbugs.gnu.org; 19 May 2018 08:51:07 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Sat May 19 04:51:07 2018
Received: from localhost ([127.0.0.1]:40482 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1fJxZy-0007Vc-0c
	for submit <at> debbugs.gnu.org; Sat, 19 May 2018 04:51:07 -0400
Received: from eggs.gnu.org ([208.118.235.92]:50004)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <eliz@HIDDEN>) id 1fJxZt-0007V6-FN
 for 31149 <at> debbugs.gnu.org; Sat, 19 May 2018 04:51:04 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from <eliz@HIDDEN>) id 1fJxZl-0006Bf-0r
 for 31149 <at> debbugs.gnu.org; Sat, 19 May 2018 04:50:56 -0400
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org
X-Spam-Level: 
X-Spam-Status: No, score=-0.5 required=5.0 tests=BAYES_05 autolearn=disabled
 version=3.3.2
Received: from fencepost.gnu.org ([2001:4830:134:3::e]:42936)
 by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from <eliz@HIDDEN>)
 id 1fJxZX-00068b-Bx; Sat, 19 May 2018 04:50:39 -0400
Received: from [176.228.60.248] (port=1950 helo=home-c4e4a596f7)
 by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256)
 (Exim 4.82) (envelope-from <eliz@HIDDEN>)
 id 1fJxZW-0007Yr-MB; Sat, 19 May 2018 04:50:39 -0400
Date: Sat, 19 May 2018 11:50:37 +0300
Message-Id: <83po1sghb6.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
To: Kenichi Handa <handa@HIDDEN>
In-reply-to: <83vabuo8iy.fsf@HIDDEN> (message from Eli Zaretskii on Fri, 11
 May 2018 12:18:13 +0300)
Subject: Re: bug#31149: 27.0.50;
 (gui-get-selection nil 'text/html) returns mis-decoded text
References: <jwv36zyhlmp.fsf@HIDDEN> <83vacu47sm.fsf@HIDDEN>
 <83zi1sv5j5.fsf@HIDDEN> <83h8nmsasr.fsf@HIDDEN> <83vabuo8iy.fsf@HIDDEN>
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]
X-Received-From: 2001:4830:134:3::e
X-Spam-Score: -5.0 (-----)
X-Debbugs-Envelope-To: 31149
Cc: larsi@HIDDEN, 31149 <at> debbugs.gnu.org, monnier@HIDDEN
Reply-To: Eli Zaretskii <eliz@HIDDEN>
X-Spam-Score: -6.0 (------)

Ping! Ping! Ping! Ping!

> Date: Fri, 11 May 2018 12:18:13 +0300
> From: Eli Zaretskii <eliz@HIDDEN>
> Cc: larsi@HIDDEN, 31149 <at> debbugs.gnu.org, monnier@HIDDEN
> 
> Ping! Ping! Ping!
> 
> > Date: Sat, 05 May 2018 12:37:24 +0300
> > From: Eli Zaretskii <eliz@HIDDEN>
> > Cc: larsi@HIDDEN, 31149 <at> debbugs.gnu.org, monnier@HIDDEN
> > 
> > Ping! Ping!
> > 
> > > Date: Tue, 24 Apr 2018 21:11:10 +0300
> > > From: Eli Zaretskii <eliz@HIDDEN>
> > > Cc: larsi@HIDDEN, 31149 <at> debbugs.gnu.org, monnier@HIDDEN
> > > 
> > > Ping!
> > > 
> > > > Date: Sat, 14 Apr 2018 09:32:41 +0300
> > > > From: Eli Zaretskii <eliz@HIDDEN>
> > > > Cc: larsi@HIDDEN, 31149 <at> debbugs.gnu.org
> > > > 
> > > > > From: Stefan Monnier <monnier@HIDDEN>
> > > > > Date: Fri, 13 Apr 2018 16:55:26 -0400
> > > > > Cc: Lars Ingebrigtsen <larsi@HIDDEN>
> > > > > 
> > > > > (gui-get-selection nil 'text/html)
> > > > > 
> > > > > returns utf-16 text when the primary selection is owned by Mozilla, but
> > > > > we decode it as latin-1 instead, so it looks like garbage.
> > > > > 
> > > > > I don't know why we're getting utf-16.  Is that what standards say it
> > > > > should do?  If so, we should adjust our code (which currently knows
> > > > > nothing about the `text/html` target-type).
> > > > > 
> > > > > As for why we decode it as latin-1, it's (under GNU/Linux; Lars may be
> > > > > using something else because he's getting something with a `charset`
> > > > > property which I don't get here) because:
> > > > > - selection_data_to_lisp_data (in xselect.c) makes a unibyte string with
> > > > >   the property `foreign-selection` set to `STRING` when the actual
> > > > >   string type is not known (as opposed to COMPOUND-TEXT and
> > > > >   UTF8-STRING, basically).
> > > > > - in gui-get-selection we then have a mapping from `STRING` to
> > > > >   `iso-8859-1` (which is apparently the right thing for the official
> > > > >   `STRING` target-type in X11).
> > > > > 
> > > > > I can't figure out if/where these kinds of things about the X11
> > > > > selection protocol is described, but at least in `xclip` they have
> > > > > a hack specifically for this case:
> > > > > 
> > > > >     [...]
> > > > >     if (html != None && sel_type == html) {
> > > > > 	/* if the buffer contains UCS-2 (UTF-16), convert to
> > > > > 	 * UTF-8.  Mozilla-based browsers do this for the
> > > > > 	 * text/html target.
> > > > > 	 */
> > > > >     [...]
> > > > > 
> > > > > and according to the subsequent code it's not even always the
> > > > > same endianness.
> > > > > 
> > > > > I don't know what is the difference between the `target-type` passed to
> > > > > x-get-selection-internal and the `foreign-selection` property we get on
> > > > > the returned string (they seem to be the same in my tests, except when
> > > > > the type is not one of the known ones, and where we then force
> > > > > `foreign-selection` to be `STRING`).
> > > > 
> > > > I hope Handa-san (CC'ed) could comment on this.





Message received at 31149 <at> debbugs.gnu.org:


Received: (at 31149) by debbugs.gnu.org; 11 May 2018 09:18:47 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Fri May 11 05:18:47 2018
Received: from localhost ([127.0.0.1]:57130 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1fH4CK-0005pt-2j
	for submit <at> debbugs.gnu.org; Fri, 11 May 2018 05:18:47 -0400
Received: from eggs.gnu.org ([208.118.235.92]:33453)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <eliz@HIDDEN>) id 1fH4CG-0005pe-2O
 for 31149 <at> debbugs.gnu.org; Fri, 11 May 2018 05:18:43 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from <eliz@HIDDEN>) id 1fH4C6-0006Pj-Vg
 for 31149 <at> debbugs.gnu.org; Fri, 11 May 2018 05:18:35 -0400
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org
X-Spam-Level: 
X-Spam-Status: No, score=-0.0 required=5.0 tests=BAYES_40 autolearn=disabled
 version=3.3.2
Received: from fencepost.gnu.org ([2001:4830:134:3::e]:50597)
 by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from <eliz@HIDDEN>)
 id 1fH4Bs-0006Iu-GI; Fri, 11 May 2018 05:18:16 -0400
Received: from [176.228.60.248] (port=2058 helo=home-c4e4a596f7)
 by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256)
 (Exim 4.82) (envelope-from <eliz@HIDDEN>)
 id 1fH4Br-0000OE-TL; Fri, 11 May 2018 05:18:16 -0400
Date: Fri, 11 May 2018 12:18:13 +0300
Message-Id: <83vabuo8iy.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
To: Kenichi Handa <handa@HIDDEN>
In-reply-to: <83h8nmsasr.fsf@HIDDEN> (message from Eli Zaretskii on Sat, 05
 May 2018 12:37:24 +0300)
Subject: Re: bug#31149: 27.0.50;
 (gui-get-selection nil 'text/html) returns mis-decoded text
References: <jwv36zyhlmp.fsf@HIDDEN> <83vacu47sm.fsf@HIDDEN>
 <83zi1sv5j5.fsf@HIDDEN> <83h8nmsasr.fsf@HIDDEN>
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]
X-Received-From: 2001:4830:134:3::e
X-Spam-Score: -5.0 (-----)
X-Debbugs-Envelope-To: 31149
Cc: larsi@HIDDEN, 31149 <at> debbugs.gnu.org, monnier@HIDDEN
Reply-To: Eli Zaretskii <eliz@HIDDEN>
X-Spam-Score: -6.0 (------)

Ping! Ping! Ping!

> Date: Sat, 05 May 2018 12:37:24 +0300
> From: Eli Zaretskii <eliz@HIDDEN>
> Cc: larsi@HIDDEN, 31149 <at> debbugs.gnu.org, monnier@HIDDEN
> 
> Ping! Ping!
> 
> > Date: Tue, 24 Apr 2018 21:11:10 +0300
> > From: Eli Zaretskii <eliz@HIDDEN>
> > Cc: larsi@HIDDEN, 31149 <at> debbugs.gnu.org, monnier@HIDDEN
> > 
> > Ping!
> > 
> > > Date: Sat, 14 Apr 2018 09:32:41 +0300
> > > From: Eli Zaretskii <eliz@HIDDEN>
> > > Cc: larsi@HIDDEN, 31149 <at> debbugs.gnu.org
> > > 
> > > > From: Stefan Monnier <monnier@HIDDEN>
> > > > Date: Fri, 13 Apr 2018 16:55:26 -0400
> > > > Cc: Lars Ingebrigtsen <larsi@HIDDEN>
> > > > 
> > > > (gui-get-selection nil 'text/html)
> > > > 
> > > > returns utf-16 text when the primary selection is owned by Mozilla, but
> > > > we decode it as latin-1 instead, so it looks like garbage.
> > > > 
> > > > I don't know why we're getting utf-16.  Is that what standards say it
> > > > should do?  If so, we should adjust our code (which currently knows
> > > > nothing about the `text/html` target-type).
> > > > 
> > > > As for why we decode it as latin-1, it's (under GNU/Linux; Lars may be
> > > > using something else because he's getting something with a `charset`
> > > > property which I don't get here) because:
> > > > - selection_data_to_lisp_data (in xselect.c) makes a unibyte string with
> > > >   the property `foreign-selection` set to `STRING` when the actual
> > > >   string type is not known (as opposed to COMPOUND-TEXT and
> > > >   UTF8-STRING, basically).
> > > > - in gui-get-selection we then have a mapping from `STRING` to
> > > >   `iso-8859-1` (which is apparently the right thing for the official
> > > >   `STRING` target-type in X11).
> > > > 
> > > > I can't figure out if/where these kinds of things about the X11
> > > > selection protocol is described, but at least in `xclip` they have
> > > > a hack specifically for this case:
> > > > 
> > > >     [...]
> > > >     if (html != None && sel_type == html) {
> > > > 	/* if the buffer contains UCS-2 (UTF-16), convert to
> > > > 	 * UTF-8.  Mozilla-based browsers do this for the
> > > > 	 * text/html target.
> > > > 	 */
> > > >     [...]
> > > > 
> > > > and according to the subsequent code it's not even always the
> > > > same endianness.
> > > > 
> > > > I don't know what is the difference between the `target-type` passed to
> > > > x-get-selection-internal and the `foreign-selection` property we get on
> > > > the returned string (they seem to be the same in my tests, except when
> > > > the type is not one of the known ones, and where we then force
> > > > `foreign-selection` to be `STRING`).
> > > 
> > > I hope Handa-san (CC'ed) could comment on this.
> > 





Message received at 31149 <at> debbugs.gnu.org:


Received: (at 31149) by debbugs.gnu.org; 5 May 2018 09:37:57 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Sat May 05 05:37:57 2018
Received: from localhost ([127.0.0.1]:50241 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1fEtdZ-0000Tt-PY
	for submit <at> debbugs.gnu.org; Sat, 05 May 2018 05:37:57 -0400
Received: from eggs.gnu.org ([208.118.235.92]:52898)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <eliz@HIDDEN>) id 1fEtdV-0000Td-9J
 for 31149 <at> debbugs.gnu.org; Sat, 05 May 2018 05:37:52 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from <eliz@HIDDEN>) id 1fEtdM-0002hQ-93
 for 31149 <at> debbugs.gnu.org; Sat, 05 May 2018 05:37:44 -0400
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org
X-Spam-Level: 
X-Spam-Status: No, score=-0.5 required=5.0 tests=BAYES_05 autolearn=disabled
 version=3.3.2
Received: from fencepost.gnu.org ([2001:4830:134:3::e]:60729)
 by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from <eliz@HIDDEN>)
 id 1fEtd7-0002cB-3g; Sat, 05 May 2018 05:37:25 -0400
Received: from [176.228.60.248] (port=3077 helo=home-c4e4a596f7)
 by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256)
 (Exim 4.82) (envelope-from <eliz@HIDDEN>)
 id 1fEtd6-0007YA-6i; Sat, 05 May 2018 05:37:24 -0400
Date: Sat, 05 May 2018 12:37:24 +0300
Message-Id: <83h8nmsasr.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
To: Kenichi Handa <handa@HIDDEN>
In-reply-to: <83zi1sv5j5.fsf@HIDDEN> (message from Eli Zaretskii on Tue, 24
 Apr 2018 21:11:10 +0300)
Subject: Re: bug#31149: 27.0.50;
 (gui-get-selection nil 'text/html) returns mis-decoded text
References: <jwv36zyhlmp.fsf@HIDDEN> <83vacu47sm.fsf@HIDDEN>
 <83zi1sv5j5.fsf@HIDDEN>
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]
X-Received-From: 2001:4830:134:3::e
X-Spam-Score: -5.0 (-----)
X-Debbugs-Envelope-To: 31149
Cc: larsi@HIDDEN, 31149 <at> debbugs.gnu.org, monnier@HIDDEN
Reply-To: Eli Zaretskii <eliz@HIDDEN>
X-Spam-Score: -6.0 (------)

Ping! Ping!

> Date: Tue, 24 Apr 2018 21:11:10 +0300
> From: Eli Zaretskii <eliz@HIDDEN>
> Cc: larsi@HIDDEN, 31149 <at> debbugs.gnu.org, monnier@HIDDEN
> 
> Ping!
> 
> > Date: Sat, 14 Apr 2018 09:32:41 +0300
> > From: Eli Zaretskii <eliz@HIDDEN>
> > Cc: larsi@HIDDEN, 31149 <at> debbugs.gnu.org
> > 
> > > From: Stefan Monnier <monnier@HIDDEN>
> > > Date: Fri, 13 Apr 2018 16:55:26 -0400
> > > Cc: Lars Ingebrigtsen <larsi@HIDDEN>
> > > 
> > > (gui-get-selection nil 'text/html)
> > > 
> > > returns utf-16 text when the primary selection is owned by Mozilla, but
> > > we decode it as latin-1 instead, so it looks like garbage.
> > > 
> > > I don't know why we're getting utf-16.  Is that what standards say it
> > > should do?  If so, we should adjust our code (which currently knows
> > > nothing about the `text/html` target-type).
> > > 
> > > As for why we decode it as latin-1, it's (under GNU/Linux; Lars may be
> > > using something else because he's getting something with a `charset`
> > > property which I don't get here) because:
> > > - selection_data_to_lisp_data (in xselect.c) makes a unibyte string with
> > >   the property `foreign-selection` set to `STRING` when the actual
> > >   string type is not known (as opposed to COMPOUND-TEXT and
> > >   UTF8-STRING, basically).
> > > - in gui-get-selection we then have a mapping from `STRING` to
> > >   `iso-8859-1` (which is apparently the right thing for the official
> > >   `STRING` target-type in X11).
> > > 
> > > I can't figure out if/where these kinds of things about the X11
> > > selection protocol is described, but at least in `xclip` they have
> > > a hack specifically for this case:
> > > 
> > >     [...]
> > >     if (html != None && sel_type == html) {
> > > 	/* if the buffer contains UCS-2 (UTF-16), convert to
> > > 	 * UTF-8.  Mozilla-based browsers do this for the
> > > 	 * text/html target.
> > > 	 */
> > >     [...]
> > > 
> > > and according to the subsequent code it's not even always the
> > > same endianness.
> > > 
> > > I don't know what is the difference between the `target-type` passed to
> > > x-get-selection-internal and the `foreign-selection` property we get on
> > > the returned string (they seem to be the same in my tests, except when
> > > the type is not one of the known ones, and where we then force
> > > `foreign-selection` to be `STRING`).
> > 
> > I hope Handa-san (CC'ed) could comment on this.





Message received at 31149 <at> debbugs.gnu.org:


Received: (at 31149) by debbugs.gnu.org; 24 Apr 2018 18:11:39 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Tue Apr 24 14:11:39 2018
Received: from localhost ([127.0.0.1]:38410 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1fB2Pg-0006gD-3a
	for submit <at> debbugs.gnu.org; Tue, 24 Apr 2018 14:11:39 -0400
Received: from eggs.gnu.org ([208.118.235.92]:49775)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <eliz@HIDDEN>) id 1fB2Pb-0006fv-6d
 for 31149 <at> debbugs.gnu.org; Tue, 24 Apr 2018 14:11:34 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from <eliz@HIDDEN>) id 1fB2PS-0004pU-Ao
 for 31149 <at> debbugs.gnu.org; Tue, 24 Apr 2018 14:11:26 -0400
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org
X-Spam-Level: 
X-Spam-Status: No, score=-0.0 required=5.0 tests=BAYES_40 autolearn=disabled
 version=3.3.2
Received: from fencepost.gnu.org ([2001:4830:134:3::e]:41972)
 by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from <eliz@HIDDEN>)
 id 1fB2PN-0004nA-HI; Tue, 24 Apr 2018 14:11:17 -0400
Received: from [176.228.60.248] (port=2228 helo=home-c4e4a596f7)
 by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256)
 (Exim 4.82) (envelope-from <eliz@HIDDEN>)
 id 1fB2PM-0008Lk-UD; Tue, 24 Apr 2018 14:11:17 -0400
Date: Tue, 24 Apr 2018 21:11:10 +0300
Message-Id: <83zi1sv5j5.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
To: Kenichi Handa <handa@HIDDEN>
In-reply-to: <83vacu47sm.fsf@HIDDEN> (message from Eli Zaretskii on Sat, 14
 Apr 2018 09:32:41 +0300)
Subject: Re: bug#31149: 27.0.50;
 (gui-get-selection nil 'text/html) returns mis-decoded text
References: <jwv36zyhlmp.fsf@HIDDEN> <83vacu47sm.fsf@HIDDEN>
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]
X-Received-From: 2001:4830:134:3::e
X-Spam-Score: -5.0 (-----)
X-Debbugs-Envelope-To: 31149
Cc: larsi@HIDDEN, 31149 <at> debbugs.gnu.org, monnier@HIDDEN
Reply-To: Eli Zaretskii <eliz@HIDDEN>
X-Spam-Score: -6.0 (------)

Ping!

> Date: Sat, 14 Apr 2018 09:32:41 +0300
> From: Eli Zaretskii <eliz@HIDDEN>
> Cc: larsi@HIDDEN, 31149 <at> debbugs.gnu.org
> 
> > From: Stefan Monnier <monnier@HIDDEN>
> > Date: Fri, 13 Apr 2018 16:55:26 -0400
> > Cc: Lars Ingebrigtsen <larsi@HIDDEN>
> > 
> > (gui-get-selection nil 'text/html)
> > 
> > returns utf-16 text when the primary selection is owned by Mozilla, but
> > we decode it as latin-1 instead, so it looks like garbage.
> > 
> > I don't know why we're getting utf-16.  Is that what standards say it
> > should do?  If so, we should adjust our code (which currently knows
> > nothing about the `text/html` target-type).
> > 
> > As for why we decode it as latin-1, it's (under GNU/Linux; Lars may be
> > using something else because he's getting something with a `charset`
> > property which I don't get here) because:
> > - selection_data_to_lisp_data (in xselect.c) makes a unibyte string with
> >   the property `foreign-selection` set to `STRING` when the actual
> >   string type is not known (as opposed to COMPOUND-TEXT and
> >   UTF8-STRING, basically).
> > - in gui-get-selection we then have a mapping from `STRING` to
> >   `iso-8859-1` (which is apparently the right thing for the official
> >   `STRING` target-type in X11).
> > 
> > I can't figure out if/where these kinds of things about the X11
> > selection protocol is described, but at least in `xclip` they have
> > a hack specifically for this case:
> > 
> >     [...]
> >     if (html != None && sel_type == html) {
> > 	/* if the buffer contains UCS-2 (UTF-16), convert to
> > 	 * UTF-8.  Mozilla-based browsers do this for the
> > 	 * text/html target.
> > 	 */
> >     [...]
> > 
> > and according to the subsequent code it's not even always the
> > same endianness.
> > 
> > I don't know what is the difference between the `target-type` passed to
> > x-get-selection-internal and the `foreign-selection` property we get on
> > the returned string (they seem to be the same in my tests, except when
> > the type is not one of the known ones, and where we then force
> > `foreign-selection` to be `STRING`).
> 
> I hope Handa-san (CC'ed) could comment on this.





Message received at 31149 <at> debbugs.gnu.org:


Received: (at 31149) by debbugs.gnu.org; 14 Apr 2018 06:33:18 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Sat Apr 14 02:33:18 2018
Received: from localhost ([127.0.0.1]:51287 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1f7EkN-00047B-6D
	for submit <at> debbugs.gnu.org; Sat, 14 Apr 2018 02:33:18 -0400
Received: from eggs.gnu.org ([208.118.235.92]:56266)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <eliz@HIDDEN>) id 1f7EkH-00046u-SC
 for 31149 <at> debbugs.gnu.org; Sat, 14 Apr 2018 02:33:13 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from <eliz@HIDDEN>) id 1f7Ek8-0004Qy-RC
 for 31149 <at> debbugs.gnu.org; Sat, 14 Apr 2018 02:33:04 -0400
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org
X-Spam-Level: 
X-Spam-Status: No, score=-0.5 required=5.0 tests=BAYES_05 autolearn=disabled
 version=3.3.2
Received: from fencepost.gnu.org ([2001:4830:134:3::e]:43386)
 by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from <eliz@HIDDEN>)
 id 1f7Ejr-0004M7-M9; Sat, 14 Apr 2018 02:32:43 -0400
Received: from [176.228.60.248] (port=3659 helo=home-c4e4a596f7)
 by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256)
 (Exim 4.82) (envelope-from <eliz@HIDDEN>)
 id 1f7Ejq-0003dN-56; Sat, 14 Apr 2018 02:32:42 -0400
Date: Sat, 14 Apr 2018 09:32:41 +0300
Message-Id: <83vacu47sm.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
To: Stefan Monnier <monnier@HIDDEN>, Kenichi Handa <handa@HIDDEN>
In-reply-to: <jwv36zyhlmp.fsf@HIDDEN> (message from Stefan Monnier
 on Fri, 13 Apr 2018 16:55:26 -0400)
Subject: Re: bug#31149: 27.0.50;
 (gui-get-selection nil 'text/html) returns mis-decoded text
References: <jwv36zyhlmp.fsf@HIDDEN>
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]
X-Received-From: 2001:4830:134:3::e
X-Spam-Score: -5.0 (-----)
X-Debbugs-Envelope-To: 31149
Cc: larsi@HIDDEN, 31149 <at> debbugs.gnu.org
Reply-To: Eli Zaretskii <eliz@HIDDEN>
X-Spam-Score: -6.0 (------)

> From: Stefan Monnier <monnier@HIDDEN>
> Date: Fri, 13 Apr 2018 16:55:26 -0400
> Cc: Lars Ingebrigtsen <larsi@HIDDEN>
> 
> (gui-get-selection nil 'text/html)
> 
> returns utf-16 text when the primary selection is owned by Mozilla, but
> we decode it as latin-1 instead, so it looks like garbage.
> 
> I don't know why we're getting utf-16.  Is that what standards say it
> should do?  If so, we should adjust our code (which currently knows
> nothing about the `text/html` target-type).
> 
> As for why we decode it as latin-1, it's (under GNU/Linux; Lars may be
> using something else because he's getting something with a `charset`
> property which I don't get here) because:
> - selection_data_to_lisp_data (in xselect.c) makes a unibyte string with
>   the property `foreign-selection` set to `STRING` when the actual
>   string type is not known (as opposed to COMPOUND-TEXT and
>   UTF8-STRING, basically).
> - in gui-get-selection we then have a mapping from `STRING` to
>   `iso-8859-1` (which is apparently the right thing for the official
>   `STRING` target-type in X11).
> 
> I can't figure out if/where these kinds of things about the X11
> selection protocol is described, but at least in `xclip` they have
> a hack specifically for this case:
> 
>     [...]
>     if (html != None && sel_type == html) {
> 	/* if the buffer contains UCS-2 (UTF-16), convert to
> 	 * UTF-8.  Mozilla-based browsers do this for the
> 	 * text/html target.
> 	 */
>     [...]
> 
> and according to the subsequent code it's not even always the
> same endianness.
> 
> I don't know what the difference is between the `target-type` passed to
> x-get-selection-internal and the `foreign-selection` property we get on
> the returned string (they seem to be the same in my tests, except when
> the type is not one of the known ones, in which case we force
> `foreign-selection` to be `STRING`).

I hope Handa-san (CC'ed) can comment on this.




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#31149; Package emacs. Full text available.

Message received at 31149 <at> debbugs.gnu.org:


Received: (at 31149) by debbugs.gnu.org; 13 Apr 2018 21:05:51 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Fri Apr 13 17:05:51 2018
Received: from localhost ([127.0.0.1]:51002 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1f75tG-0002Df-Q7
	for submit <at> debbugs.gnu.org; Fri, 13 Apr 2018 17:05:50 -0400
Received: from hermes.netfonds.no ([80.91.224.195]:46133)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <larsi@HIDDEN>) id 1f75tE-0002DU-5s
 for 31149 <at> debbugs.gnu.org; Fri, 13 Apr 2018 17:05:49 -0400
Received: from 46.67.12.60.tmi.telenormobil.no ([46.67.12.60] helo=corrigan)
 by hermes.netfonds.no with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.84_2) (envelope-from <larsi@HIDDEN>)
 id 1f75t7-0004Wl-1L; Fri, 13 Apr 2018 23:05:43 +0200
Received: from larsi by corrigan with local (Exim 4.89)
 (envelope-from <larsi@HIDDEN>)
 id 1f75t1-0002xg-2N; Fri, 13 Apr 2018 23:05:35 +0200
From: Lars Ingebrigtsen <larsi@HIDDEN>
To: Stefan Monnier <monnier@HIDDEN>
Subject: Re: bug#31149: 27.0.50;
 (gui-get-selection nil 'text/html) returns mis-decoded text
References: <jwv36zyhlmp.fsf@HIDDEN>
Date: Fri, 13 Apr 2018 23:05:34 +0200
In-Reply-To: <jwv36zyhlmp.fsf@HIDDEN> (Stefan Monnier's message of
 "Fri, 13 Apr 2018 16:55:26 -0400")
Message-ID: <871sfizujl.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 31149
Cc: 31149 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -1.0 (-)

Stefan Monnier <monnier@HIDDEN> writes:

> As for why we decode it as latin-1, it's (under GNU/Linux; Lars may be
> using something else because he's getting something with a `charset`
> property which I don't get here) because:

I'm also running under GNU/Linux -- it's the latest Debian (9, which
is...  stretch?), but not with Gnome.  Instead I'm using xfce -- I guess
Gnome could get involved with the selection stuff somehow.

Another data point: If I select some HTML in Chromium,
(gui-get-selection nil 'text/html) returns nil.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#31149; Package emacs. Full text available.

Message received at submit <at> debbugs.gnu.org:


Received: (at submit) by debbugs.gnu.org; 13 Apr 2018 20:55:43 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Fri Apr 13 16:55:43 2018
Received: from localhost ([127.0.0.1]:50986 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1f75jT-0001yN-5M
	for submit <at> debbugs.gnu.org; Fri, 13 Apr 2018 16:55:43 -0400
Received: from eggs.gnu.org ([208.118.235.92]:52185)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <monnier@HIDDEN>) id 1f75jQ-0001yA-WA
 for submit <at> debbugs.gnu.org; Fri, 13 Apr 2018 16:55:41 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from <monnier@HIDDEN>) id 1f75jK-0002TK-6d
 for submit <at> debbugs.gnu.org; Fri, 13 Apr 2018 16:55:35 -0400
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org
X-Spam-Level: 
X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50 autolearn=disabled
 version=3.3.2
Received: from lists.gnu.org ([2001:4830:134:3::11]:56963)
 by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32)
 (Exim 4.71) (envelope-from <monnier@HIDDEN>)
 id 1f75jK-0002TD-2k
 for submit <at> debbugs.gnu.org; Fri, 13 Apr 2018 16:55:34 -0400
Received: from eggs.gnu.org ([2001:4830:134:3::10]:43128)
 by lists.gnu.org with esmtp (Exim 4.71)
 (envelope-from <monnier@HIDDEN>) id 1f75jI-0000pC-8d
 for bug-gnu-emacs@HIDDEN; Fri, 13 Apr 2018 16:55:33 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from <monnier@HIDDEN>) id 1f75jE-0002Qq-5V
 for bug-gnu-emacs@HIDDEN; Fri, 13 Apr 2018 16:55:32 -0400
Received: from pruche.dit.umontreal.ca ([132.204.246.22]:42346)
 by eggs.gnu.org with esmtp (Exim 4.71)
 (envelope-from <monnier@HIDDEN>) id 1f75jD-0002Q6-UD
 for bug-gnu-emacs@HIDDEN; Fri, 13 Apr 2018 16:55:28 -0400
Received: from ceviche.home (lechon.iro.umontreal.ca [132.204.27.242])
 by pruche.dit.umontreal.ca (8.14.7/8.14.1) with ESMTP id w3DKtQDG024538
 for <bug-gnu-emacs@HIDDEN>; Fri, 13 Apr 2018 16:55:26 -0400
Received: by ceviche.home (Postfix, from userid 20848)
 id 37EBF6639A; Fri, 13 Apr 2018 16:55:26 -0400 (EDT)
From: Stefan Monnier <monnier@HIDDEN>
To: bug-gnu-emacs@HIDDEN
Subject: 27.0.50; (gui-get-selection nil 'text/html) returns mis-decoded text
X-Debbugs-Cc: Lars Ingebrigtsen <larsi@HIDDEN>
Date: Fri, 13 Apr 2018 16:55:26 -0400
Message-ID: <jwv36zyhlmp.fsf@HIDDEN>
MIME-Version: 1.0
Content-Type: text/plain
X-NAI-Spam-Flag: NO
X-NAI-Spam-Level: 
X-NAI-Spam-Threshold: 5
X-NAI-Spam-Score: 0.9
X-NAI-Spam-Rules: 5 Rules triggered
 BEC_TRC1=0.4, BEC_TRC1_W_GEN_SPAM_FEATRE=0.4, GEN_SPAM_FEATRE=0.1, 
 EDT_SA_DN_PASS=0, RV6264=0
X-NAI-Spam-Version: 2.3.0.9418 : core <6264> : inlines <6560> : streams
 <1783938> : uri <2625014>
X-detected-operating-system: by eggs.gnu.org: Genre and OS details not
 recognized.
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x
X-Received-From: 2001:4830:134:3::11
X-Spam-Score: -4.0 (----)
X-Debbugs-Envelope-To: submit
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -5.0 (-----)

Package: Emacs
Version: 27.0.50


(gui-get-selection nil 'text/html)

returns utf-16 text when the primary selection is owned by Mozilla, but
we decode it as latin-1 instead, so it looks like garbage.

I don't know why we're getting utf-16.  Is that what standards say it
should do?  If so, we should adjust our code (which currently knows
nothing about the `text/html` target-type).

As for why we decode it as latin-1, it's (under GNU/Linux; Lars may be
using something else because he's getting something with a `charset`
property which I don't get here) because:
- selection_data_to_lisp_data (in xselect.c) makes a unibyte string with
  the property `foreign-selection` set to `STRING` when the actual
  string type is not known (as opposed to COMPOUND-TEXT and
  UTF8-STRING, basically).
- in gui-get-selection we then have a mapping from `STRING` to
  `iso-8859-1` (which is apparently the right thing for the official
  `STRING` target-type in X11).
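
To make the failure mode concrete, here is a minimal sketch in C of what
happens when UTF-16 bytes are decoded as iso-8859-1.  This is not the
Emacs code path; `latin1_to_utf8` is a hypothetical helper written only
for illustration.  It assumes the selection owner sent "<p>" as
little-endian UTF-16 with a byte-order mark.  Latin-1 maps every byte to
the code point of the same value, so the BOM turns into "ÿþ" and every
ASCII character is followed by a NUL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of Emacs): decode LEN bytes as
   ISO-8859-1 and write the result to OUT as UTF-8.  OUT must have
   room for 2 * LEN + 1 bytes.  Returns the number of bytes written,
   not counting the terminating NUL. */
size_t latin1_to_utf8(const unsigned char *in, size_t len,
                      unsigned char *out)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];
        if (c < 0x80) {
            /* ASCII range (including NUL) is copied unchanged. */
            out[n++] = c;
        } else {
            /* U+0080..U+00FF need two UTF-8 bytes. */
            out[n++] = (unsigned char)(0xC0 | (c >> 6));
            out[n++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    out[n] = '\0';
    return n;
}
```

Feeding it the eight bytes FF FE 3C 00 70 00 3E 00 produces ten bytes:
"ÿ", "þ", then "<", "p" and ">" each followed by a NUL, which matches
the kind of garbage described above.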

I can't figure out if/where these kinds of things about the X11
selection protocol are described, but at least in `xclip` they have
a hack specifically for this case:

    [...]
    if (html != None && sel_type == html) {
	/* if the buffer contains UCS-2 (UTF-16), convert to
	 * UTF-8.  Mozilla-based browsers do this for the
	 * text/html target.
	 */
    [...]

and according to the subsequent code it's not even always the
same endianness.
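
A workaround along those lines could look roughly like the sketch below.
It is not the xclip code and not a proposed patch for xselect.c; the
names `utf16_detect_bom` and `utf16_to_utf8` are hypothetical.  It
assumes the data starts with a byte-order mark (U+FEFF) and uses that to
pick the endianness; data without a BOM is rejected rather than guessed
at.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum utf16_order { UTF16_UNKNOWN, UTF16_LE, UTF16_BE };

/* Inspect the first two bytes for a byte-order mark. */
enum utf16_order utf16_detect_bom(const unsigned char *buf, size_t len)
{
    if (len >= 2) {
        if (buf[0] == 0xFF && buf[1] == 0xFE) return UTF16_LE;
        if (buf[0] == 0xFE && buf[1] == 0xFF) return UTF16_BE;
    }
    return UTF16_UNKNOWN;
}

/* Read one 16-bit code unit in the given byte order. */
static uint32_t read_unit(const unsigned char *p, enum utf16_order order)
{
    return order == UTF16_LE ? (uint32_t)(p[0] | (p[1] << 8))
                             : (uint32_t)(p[1] | (p[0] << 8));
}

/* Convert BOM-prefixed UTF-16 to NUL-terminated UTF-8.  OUT must have
   room for (LEN / 2) * 3 + 1 bytes.  Returns the number of bytes
   written (without the NUL), or -1 if there is no BOM or the input is
   malformed. */
long utf16_to_utf8(const unsigned char *in, size_t len, unsigned char *out)
{
    enum utf16_order order = utf16_detect_bom(in, len);
    size_t n = 0;

    assert(out != NULL);
    if (order == UTF16_UNKNOWN || len % 2 != 0)
        return -1;
    for (size_t i = 2; i < len; i += 2) {      /* skip the BOM */
        uint32_t cp = read_unit(in + i, order);
        if (cp >= 0xD800 && cp <= 0xDBFF) {    /* high surrogate */
            uint32_t lo;
            if (i + 4 > len) return -1;        /* truncated pair */
            lo = read_unit(in + i + 2, order);
            if (lo < 0xDC00 || lo > 0xDFFF) return -1;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
            i += 2;                            /* consumed the low unit */
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return -1;                         /* stray low surrogate */
        }
        if (cp < 0x80) {
            out[n++] = (unsigned char)cp;
        } else if (cp < 0x800) {
            out[n++] = (unsigned char)(0xC0 | (cp >> 6));
            out[n++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out[n++] = (unsigned char)(0xE0 | (cp >> 12));
            out[n++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[n++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            out[n++] = (unsigned char)(0xF0 | (cp >> 18));
            out[n++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            out[n++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[n++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    out[n] = '\0';
    return (long)n;
}
```

Rejecting BOM-less input is a deliberate choice in this sketch: guessing
the byte order would silently produce garbage in the other half of the
cases.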

I don't know what the difference is between the `target-type` passed to
x-get-selection-internal and the `foreign-selection` property we get on
the returned string (they seem to be the same in my tests, except when
the type is not one of the known ones, in which case we force
`foreign-selection` to be `STRING`).


        Stefan



In GNU Emacs 27.0.50 (build 1, i686-pc-linux-gnu, GTK+ Version 2.24.32)
 of 2018-03-23 built on ceviche
Repository revision: ef4cd3805771e2cccd395d0f0b35f56816940508
Windowing system distributor 'The X.Org Foundation', version 11.0.11906000
System Description: Debian GNU/Linux buster/sid

Recent messages:
Saving file /home/monnier/src/emacs/work/src/xselect.c...
Wrote /home/monnier/src/emacs/work/src/xselect.c
Mark set
user-error: Minibuffer window is not active
Mark set
Mark saved where search started
Mark set
Making completion list... [2 times]
Quit [2 times]
Mark set

Configured using:
 'configure -C --enable-checking --with-modules --enable-check-lisp-object-type
 'CFLAGS=-Wall -g3 -Og -Wno-pointer-sign'
 PKG_CONFIG_PATH=/home/monnier/lib/pkgconfig'

Configured features:
XPM JPEG TIFF GIF PNG RSVG SOUND GPM DBUS GSETTINGS NOTIFY GNUTLS
LIBXML2 FREETYPE M17N_FLT LIBOTF XFT ZLIB TOOLKIT_SCROLL_BARS GTK2 X11
MODULES THREADS

Important settings:
  value of $LANG: fr_CH.UTF-8
  locale-coding-system: utf-8-unix

Major mode: InactiveMinibuffer

Minor modes in effect:
  csv-field-index-mode: t
  shell-dirtrack-mode: t
  diff-auto-refine-mode: t
  electric-pair-mode: t
  global-reveal-mode: t
  reveal-mode: t
  auto-insert-mode: t
  savehist-mode: t
  minibuffer-electric-default-mode: t
  global-compact-docstrings-mode: t
  url-handler-mode: t
  global-eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  global-prettify-symbols-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Load-path shadows:
/home/monnier/src/emacs/elpa/packages/svg/svg hides /home/monnier/src/emacs/work/lisp/svg
/home/monnier/src/emacs/elpa/packages/ada-mode/ada-xref hides /home/monnier/src/emacs/work/lisp/progmodes/ada-xref
/home/monnier/src/emacs/elpa/packages/ada-mode/ada-mode hides /home/monnier/src/emacs/work/lisp/progmodes/ada-mode
/home/monnier/src/emacs/elpa/packages/ada-mode/ada-stmt hides /home/monnier/src/emacs/work/lisp/progmodes/ada-stmt
/home/monnier/src/emacs/elpa/packages/ada-mode/ada-prj hides /home/monnier/src/emacs/work/lisp/progmodes/ada-prj
/home/monnier/src/emacs/elpa/packages/hyperbole/set hides /home/monnier/src/emacs/work/lisp/emacs-lisp/set
/home/monnier/src/emacs/elpa/packages/landmark/landmark hides /home/monnier/src/emacs/work/lisp/obsolete/landmark
/home/monnier/src/emacs/elpa/packages/crisp/crisp hides /home/monnier/src/emacs/work/lisp/obsolete/crisp

Features:
(mule-diag csv-mode mailcap reporter debian-bug debian-el-loaddefs
image-file iimage skeleton html5-schema rng-xsd xsd-regexp rng-cmpct
rng-nxml nxml-mode nxml-outln nxml-rap sgml-mode dom reftex-dcr reftex
reftex-loaddefs reftex-vars latexenc sort mail-extr emacsbug tildify rst
rng-valid refer refer-to-bibtex refbib printing picture nroff-mode
enriched ebnf2ps ps-print ps-print-loaddefs ps-def lpr delim-col
bib-mode view cal-china lunar solar cal-dst cal-bahai cal-islam
cal-hebrew holidays hol-loaddefs cal-french diary-lib diary-loaddefs
cal-move battery log-view srecode/document semantic/doc srecode/semantic
semantic/senator semantic/decorate semantic/ctxt semantic/format
srecode/extract srecode/insert srecode/filters srecode/find srecode/map
srecode/ctxt semantic/tag-ls semantic/find srecode/compile
semantic/util-modes semantic/util semantic semantic/tag semantic/lex
semantic/fw srecode/args ede/speedbar ede/files ede ede/detect ede/base
ede/auto ede/source eieio-speedbar eieio-custom cedet srecode/dictionary
srecode/table eieio-base srecode mode-local informat texinfo tex-mode
vc-dir grep rect gdb-mi bindat gud ffap cl-print ox-odt rng-loc rng-uri
rng-parse rng-match rng-dt rng-util rng-pttrn nxml-parse nxml-ns
nxml-enc xmltok nxml-util ox-latex ox-icalendar ox-html table ox-ascii
ox-publish ox org-protocol org-mouse org-mobile org-agenda org-indent
org-feed org-crypt org-capture org-attach org-id org-rmail org-mhe
org-irc org-info org-gnus nnir gnus-sum gnus-group gnus-undo gnus-start
gnus-cloud nnimap nnmail mail-source tls gnutls utf7 netrc nnoo
parse-time gnus-spec gnus-int gnus-range gnus-win gnus nnheader
org-docview org-bibtex bibtex org-bbdb org-w3m org-element avl-tree
generator org org-macro org-footnote org-pcomplete org-list org-faces
org-entities org-version ob-emacs-lisp ob ob-tangle org-src ob-ref
ob-lob ob-table ob-keys ob-exp ob-comint ob-core ob-eval org-compat
org-macs org-loaddefs cal-menu calendar cal-loaddefs autorevert
filenotify doc-view jka-compr image-mode vc-bzr vc-src vc-sccs vc-svn
vc-cvs vc-rcs dabbrev log-edit message sendmail rmc puny dired
dired-loaddefs format-spec rfc822 mml mml-sec gnus-util rmail
rmail-loaddefs mm-decode mm-bodies mm-encode mail-parse rfc2231 rfc2047
rfc2045 mm-util ietf-drums mail-prsvr mailabbrev mail-utils mailheader
pcvs-util bug-reference add-log sh-script make-mode autoload shell
pcomplete pulse etags xref project epa-file epa derived epg sm-c-mode
smie whitespace misearch multi-isearch eieio-opt speedbar sb-image
ezimage dframe cl-extra help-fns radix-tree executable copyright
lisp-mnt xscheme unsafep trace testcover shadow scheme re-builder
profiler inf-lisp ielm gmm-utils ert pp find-func ewoc debug elp edebug
cl-indent cus-edit cus-start cus-load wid-edit vc vc-dispatcher
smerge-mode vc-git diff-mode filecache server time-date flymake-proc
flymake compile comint ansi-color ring warnings noutline outline
easy-mmode flyspell ispell checkdoc thingatpt help-mode load-dir
elec-pair reveal autoinsert proof-site proof-autoloads cl pg-vars
savehist minibuf-eldef disp-table compact-docstrings cl-seq inline
kotl-autoloads advice info realgud-recursive-autoloads finder-inf
url-auth package easymenu epg-config url-handlers url-parse auth-source
eieio eieio-core cl-macs eieio-loaddefs password-cache json map url-vars
seq byte-opt gv bytecomp byte-compile cconv cl-loaddefs cl-lib mule-util
tooltip eldoc electric uniquify ediff-hook vc-hooks lisp-float-type
mwheel term/x-win x-win term/common-win x-dnd tool-bar dnd fontset image
regexp-opt fringe tabulated-list replace newcomment text-mode elisp-mode
lisp-mode prog-mode register page menu-bar rfn-eshadow isearch timer
select scroll-bar mouse jit-lock font-lock syntax font-core
term/tty-colors frame cl-generic cham georgian utf-8-lang misc-lang
vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms cp51932
hebrew greek romanian slovak czech european ethiopic indian cyrillic
chinese composite charscript charprop case-table epa-hook jka-cmpr-hook
help simple abbrev obarray minibuffer cl-preloaded nadvice loaddefs
button faces cus-face macroexp files text-properties overlay sha1 md5
base64 format env code-pages mule custom widget hashtable-print-readable
backquote dbusbind inotify dynamic-setting system-font-setting
font-render-setting move-toolbar gtk x-toolkit x multi-tty
make-network-process emacs)

Memory information:
((conses 8 904625 146270)
 (symbols 24 56914 156) (miscs 20 15608 1993) (strings 16 269351 14086)
 (string-bytes 1 8339699)
 (vectors 12 109056) (vector-slots 4 3333709 279700) (floats 8 1341 1410)
 (intervals 28 57426 412)
 (buffers 536 153))




Acknowledgement sent to Stefan Monnier <monnier@HIDDEN>:
New bug report received and forwarded. Copy sent to larsi@HIDDEN, bug-gnu-emacs@HIDDEN. Full text available.
Report forwarded to larsi@HIDDEN, bug-gnu-emacs@HIDDEN:
bug#31149; Package emacs. Full text available.
Last modified: Sun, 29 Sep 2019 08:45:02 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.