GNU bug report logs - #50560
28.0.50; 'insert-file-contents-literally' on multibyte buffers

Package: emacs;

Reported by: Augusto Stoffel <arstoffel <at> gmail.com>

Date: Mon, 13 Sep 2021 06:59:02 UTC

Severity: normal

Found in version 28.0.50

Fixed in version 28.1

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 50560 in the body.
You can then email your comments to 50560 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#50560; Package emacs. (Mon, 13 Sep 2021 06:59:02 GMT)

Acknowledgement sent to Augusto Stoffel <arstoffel <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Mon, 13 Sep 2021 06:59:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Augusto Stoffel <arstoffel <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: 28.0.50; 'insert-file-contents-literally' on multibyte buffers
Date: Mon, 13 Sep 2021 08:58:06 +0200

I thought 'insert-file-contents-literally' literally just inserted the
file contents, as bytes, but I noticed that in the following code

    (create-image
     (with-temp-buffer
       (set-buffer-multibyte nil)
       (insert-file-contents-literally "picure.jpg")
       (buffer-substring-no-properties (point-min) (point-max)))
     nil t)

the call to 'set-buffer-multibyte' is really essential.

Is this intended?  If so, I think a note in the docstring is due.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#50560; Package emacs. (Mon, 13 Sep 2021 07:12:02 GMT)

Message #8 received at 50560 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Augusto Stoffel <arstoffel <at> gmail.com>
Cc: 50560 <at> debbugs.gnu.org
Subject: Re: bug#50560: 28.0.50; 'insert-file-contents-literally' on
 multibyte buffers
Date: Mon, 13 Sep 2021 09:10:59 +0200

Augusto Stoffel <arstoffel <at> gmail.com> writes:

> I thought 'insert-file-contents-literally' literally just inserted the
> file contents, as bytes, but I noticed that in the following code
>
>     (create-image
>      (with-temp-buffer
>        (set-buffer-multibyte nil)
>        (insert-file-contents-literally "picure.jpg")
>        (buffer-substring-no-properties (point-min) (point-max)))
>      nil t)
>
> the call to 'set-buffer-multibyte' is really essential.

In what way?  If the first byte in a binary file is #xff, inserting the
file literally in a buffer and saying `(following-char)' on the first
character in the buffer will say #xff.

But, yes, when dealing with octet streams, it's a lot less confusing if
you're using unibyte buffers (and strings).

> Is this intended?  If so, I think a note in the docstring is due.

The doc string doesn't say anything about bytes, so I think that's an
interpretation on your side.

`insert-file-contents-literally' does insert "literally" -- but the byte
contents of the internal buffer structure can't be violated (emacs uses
utf-8 (plus extensions) for multibyte buffers).

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#50560; Package emacs. (Mon, 13 Sep 2021 07:18:01 GMT)

Message #11 received at 50560 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Augusto Stoffel <arstoffel <at> gmail.com>
Cc: 50560 <at> debbugs.gnu.org
Subject: Re: bug#50560: 28.0.50; 'insert-file-contents-literally' on
 multibyte buffers
Date: Mon, 13 Sep 2021 09:16:59 +0200

Lars Ingebrigtsen <larsi <at> gnus.org> writes:

> In what way?  If the first byte in a binary file is #xff, inserting the
> file literally in a buffer and saying `(following-char)' on the first
> character in the buffer will say #xff.

Sorry, I meant `(get-byte (point))'.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#50560; Package emacs. (Mon, 13 Sep 2021 08:14:02 GMT)

Message #14 received at 50560 <at> debbugs.gnu.org:

From: Augusto Stoffel <arstoffel <at> gmail.com>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 50560 <at> debbugs.gnu.org
Subject: Re: bug#50560: 28.0.50; 'insert-file-contents-literally' on
 multibyte buffers
Date: Mon, 13 Sep 2021 10:13:23 +0200

On Mon, 13 Sep 2021 at 09:10, Lars Ingebrigtsen <larsi <at> gnus.org> wrote:

> Augusto Stoffel <arstoffel <at> gmail.com> writes:
>
>> I thought 'insert-file-contents-literally' literally just inserted the
>> file contents, as bytes, but I noticed that in the following code
>>
>>     (create-image
>>      (with-temp-buffer
>>        (set-buffer-multibyte nil)
>>        (insert-file-contents-literally "picure.jpg")
>>        (buffer-substring-no-properties (point-min) (point-max)))
>>      nil t)
>>
>> the call to 'set-buffer-multibyte' is really essential.
>
> In what way?  If the first byte in a binary file is #xff, inserting the
> file literally in a buffer and saying `(following-char)' on the first
> character in the buffer will say #xff.
>
> But, yes, when dealing with octet streams, it's a lot less confusing if
> you're using unibyte buffers (and strings).
>
>> Is this intended?  If so, I think a note in the docstring is due.
>
> The doc string doesn't say anything about bytes, so I think that's an
> interpretation on your side.
>
> `insert-file-contents-literally' does insert "literally" -- but the byte
> contents of the internal buffer structure can't be violated (emacs uses
> utf-8 (plus extensions) for multibyte buffers).

Ah, sure, there is no coding _conversion_, but the bytes are still
interpreted according to the buffer's coding system.

I guess that's obvious in hindsight.  Still, reading the bytes from a
file is slightly trickier than it might seem, so there could be a word
of caution somewhere.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#50560; Package emacs. (Mon, 13 Sep 2021 08:20:01 GMT)

Message #17 received at 50560 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Augusto Stoffel <arstoffel <at> gmail.com>
Cc: 50560 <at> debbugs.gnu.org
Subject: Re: bug#50560: 28.0.50; 'insert-file-contents-literally' on
 multibyte buffers
Date: Mon, 13 Sep 2021 10:19:26 +0200

Augusto Stoffel <arstoffel <at> gmail.com> writes:

>> `insert-file-contents-literally' does insert "literally" -- but the byte
>> contents of the internal buffer structure can't be violated (emacs uses
>> utf-8 (plus extensions) for multibyte buffers).
>
> Ah, sure, there is no coding _conversion_, but the bytes are still
> interpreted according to the buffer's coding system.

No, quite the opposite -- `insert-file-contents-literally' inserts the
octets from the file in a way that makes them not be interpreted as
characters:  You end up with a buffer where each point in the buffer has
something that represents one octet.  (In reality, there's usually more
than one byte "in the background", since it takes several bytes to
represent an octet like #x90 in a multibyte buffer.)
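
A minimal sketch of the representation described above, assuming a scratch file written just for the demonstration. `my/literal-insert-facts' is a hypothetical helper name, not something shipped with Emacs. It inserts the single octet #x90 literally into a multibyte temp buffer and reports what the buffer then holds:

```elisp
(defun my/literal-insert-facts ()
  "Insert the octet #x90 literally into a multibyte buffer.
Return (MULTIBYTE-P CHARS BYTE CHAR INTERNAL-BYTES).
Hypothetical helper, for illustration only."
  (let ((file (make-temp-file "literal-demo")))
    (unwind-protect
        (progn
          ;; Write exactly one octet, #x90, with no encoding applied.
          (let ((coding-system-for-write 'binary))
            (write-region (unibyte-string #x90) nil file nil 'silent))
          (with-temp-buffer
            ;; Temp buffers are multibyte unless made unibyte explicitly.
            (insert-file-contents-literally file)
            (goto-char (point-min))
            (list enable-multibyte-characters
                  ;; One buffer position per octet read from the file.
                  (- (point-max) (point-min))
                  ;; The octet itself.
                  (get-byte (point))
                  ;; The "raw byte" character that stands for the octet.
                  (following-char)
                  ;; Bytes used internally to store that one position.
                  (- (position-bytes (point-max))
                     (position-bytes (point-min))))))
      (delete-file file))))
```

In the returned list, the third element is the octet as read from the file, while the last element shows that more than one byte sits behind that single buffer position.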

> I guess that's obvious in hindsight.  Still, reading the bytes from a
> file is slightly trickier than it might seem, so there could be a word
> of caution somewhere.

I think this is all covered in the lispref manual.  It's a very
complicated and confusing subject, and I don't think this docstring is
the place to get into it.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#50560; Package emacs. (Mon, 13 Sep 2021 08:44:01 GMT)

Message #20 received at 50560 <at> debbugs.gnu.org:

From: Daniel Martín <mardani29 <at> yahoo.es>
To: Augusto Stoffel <arstoffel <at> gmail.com>
Cc: 50560 <at> debbugs.gnu.org
Subject: Re: bug#50560: 28.0.50; 'insert-file-contents-literally' on
 multibyte buffers
Date: Mon, 13 Sep 2021 10:42:51 +0200

Augusto Stoffel <arstoffel <at> gmail.com> writes:

> I thought 'insert-file-contents-literally' literally just inserted the
> file contents, as bytes, but I noticed that in the following code
>
>     (create-image
>      (with-temp-buffer
>        (set-buffer-multibyte nil)
>        (insert-file-contents-literally "picure.jpg")
>        (buffer-substring-no-properties (point-min) (point-max)))
>      nil t)
>
> the call to 'set-buffer-multibyte' is really essential.
>
> Is this intended?  If so, I think a note in the docstring is due.

It is intended, and the source of confusion may be the apparently
symmetric `find-file-literally`, which _does_ make the buffer unibyte
before filling the new buffer with the contents from a file (and
documents this behavior).

But if you think about it, it makes sense that
`insert-file-contents-literally` does not set the buffer as unibyte,
because it's intended for programmatic cases where you insert the
content inside a buffer that may already have other content, so making
the buffer unibyte unconditionally may cause unexpected results.
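
The point about existing content can be sketched as follows. `my/insert-after-text` is a hypothetical helper, and it assumes FILE names a readable file; the sketch only shows that the buffer stays multibyte and that earlier text survives the literal insertion.

```elisp
(defun my/insert-after-text (file)
  "Insert FILE literally into a buffer that already holds text.
Return (MULTIBYTE-P . FIRST-CHAR).  Hypothetical helper."
  (with-temp-buffer
    ;; Existing multibyte content: the character U+00E9.
    (insert "\u00e9")
    ;; The file's octets go in after point; nothing is decoded.
    (insert-file-contents-literally file)
    ;; The buffer is still multibyte and the earlier text is untouched.
    (cons enable-multibyte-characters (char-after (point-min)))))
```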

So yeah, perhaps we can add a small sentence that clarifies this
behavior.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#50560; Package emacs. (Mon, 13 Sep 2021 11:53:01 GMT)

Message #23 received at 50560 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Augusto Stoffel <arstoffel <at> gmail.com>
Cc: 50560 <at> debbugs.gnu.org
Subject: Re: bug#50560: 28.0.50;
 'insert-file-contents-literally' on multibyte buffers
Date: Mon, 13 Sep 2021 14:52:00 +0300

> From: Augusto Stoffel <arstoffel <at> gmail.com>
> Date: Mon, 13 Sep 2021 08:58:06 +0200
> 
> I thought 'insert-file-contents-literally' literally just inserted the
> file contents, as bytes, but I noticed that in the following code
> 
>     (create-image
>      (with-temp-buffer
>        (set-buffer-multibyte nil)
>        (insert-file-contents-literally "picure.jpg")
>        (buffer-substring-no-properties (point-min) (point-max)))
>      nil t)
> 
> the call to 'set-buffer-multibyte' is really essential.

It is only essential for some very specific uses of the resulting
buffer, but definitely not for all.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#50560; Package emacs. (Mon, 13 Sep 2021 12:06:02 GMT)

Message #26 received at 50560 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Daniel Martín <mardani29 <at> yahoo.es>
Cc: arstoffel <at> gmail.com, 50560 <at> debbugs.gnu.org
Subject: Re: bug#50560: 28.0.50;
 'insert-file-contents-literally' on multibyte buffers
Date: Mon, 13 Sep 2021 15:05:20 +0300

> Cc: 50560 <at> debbugs.gnu.org
> Date: Mon, 13 Sep 2021 10:42:51 +0200
> From:  Daniel Martín via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org>
> 
> So yeah, perhaps we can add a small sentence that clarifies this
> behavior.

What kind of sentence would you like to add there?  IME, this stuff
can rarely be explained by small sentences ;-)




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#50560; Package emacs. (Mon, 13 Sep 2021 12:45:01 GMT)

Message #29 received at 50560 <at> debbugs.gnu.org:

From: Augusto Stoffel <arstoffel <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 50560 <at> debbugs.gnu.org
Subject: Re: bug#50560: 28.0.50; 'insert-file-contents-literally' on
 multibyte buffers
Date: Mon, 13 Sep 2021 14:44:24 +0200

On Mon, 13 Sep 2021 at 14:52, Eli Zaretskii <eliz <at> gnu.org> wrote:

>> From: Augusto Stoffel <arstoffel <at> gmail.com>
>> Date: Mon, 13 Sep 2021 08:58:06 +0200
>> 
>> I thought 'insert-file-contents-literally' literally just inserted the
>> file contents, as bytes, but I noticed that in the following code
>> 
>>     (create-image
>>      (with-temp-buffer
>>        (set-buffer-multibyte nil)
>>        (insert-file-contents-literally "picure.jpg")
>>        (buffer-substring-no-properties (point-min) (point-max)))
>>      nil t)
>> 
>> the call to 'set-buffer-multibyte' is really essential.
>
> It is only essential for some very specific uses of the resulting
> buffer, but definitely not for all.

That's a good point.  Maybe the issue is actually with 'create-image',
which seems to only work correctly when the data is passed as a unibyte
string, but gives no warning if you pass a multibyte one.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#50560; Package emacs. (Mon, 13 Sep 2021 13:19:01 GMT)

Message #32 received at 50560 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: arstoffel <at> gmail.com, 50560 <at> debbugs.gnu.org,
 Daniel Martín <mardani29 <at> yahoo.es>
Subject: Re: bug#50560: 28.0.50; 'insert-file-contents-literally' on
 multibyte buffers
Date: Mon, 13 Sep 2021 15:18:00 +0200

Eli Zaretskii <eliz <at> gnu.org> writes:

>> So yeah, perhaps we can add a small sentence that clarifies this
>> behavior.
>
> What kind of sentence would you like to add there?  IME, this stuff
> can rarely be explained by small sentences ;-)

I've added a paragraph to the doc string mentioning that there might be
issues, but referring the user to `(elisp)Character Codes'.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




bug marked as fixed in version 28.1, send any further explanations to 50560 <at> debbugs.gnu.org and Augusto Stoffel <arstoffel <at> gmail.com>
Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Mon, 13 Sep 2021 13:19:02 GMT)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#50560; Package emacs. (Mon, 13 Sep 2021 13:27:02 GMT)

Message #37 received at 50560 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Augusto Stoffel <arstoffel <at> gmail.com>
Cc: 50560 <at> debbugs.gnu.org
Subject: Re: bug#50560: 28.0.50; 'insert-file-contents-literally' on
 multibyte buffers
Date: Mon, 13 Sep 2021 16:26:10 +0300

> From: Augusto Stoffel <arstoffel <at> gmail.com>
> Cc: 50560 <at> debbugs.gnu.org
> Date: Mon, 13 Sep 2021 14:44:24 +0200
> 
> > It is only essential for some very specific uses of the resulting
> > buffer, but definitely not for all.
> 
> That's a good point.  Maybe the issue is actually with 'create-image',
> which seems to only work correctly when the data is passed as a unibyte
> string, but gives no warning if you pass a multibyte one.

Maybe we should have create-image convert the :data string to unibyte
if it isn't already so.
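
On the caller's side, two ways of getting a unibyte string out of a file can be sketched like this. Both helper names are hypothetical. The second relies on `string-to-unibyte', which signals an error if the string holds anything other than ASCII and raw-byte characters, so it is only a sketch for data that was inserted literally.

```elisp
(defun my/file-bytes (file)
  "Return the contents of FILE as a unibyte string.
Hypothetical helper, for illustration only."
  (with-temp-buffer
    ;; A unibyte buffer stores one byte per position.
    (set-buffer-multibyte nil)
    (insert-file-contents-literally file)
    (buffer-string)))

(defun my/file-bytes-from-multibyte (file)
  "Like `my/file-bytes', but convert after reading into a multibyte buffer.
Hypothetical helper, for illustration only."
  (with-temp-buffer
    (insert-file-contents-literally file)
    ;; Turn each raw-byte character back into the byte it stands for.
    (string-to-unibyte (buffer-string))))
```

Either result could then be passed along as in the original report, for example `(create-image (my/file-bytes FILE) nil t)'.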




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#50560; Package emacs. (Mon, 13 Sep 2021 21:39:01 GMT)

Message #40 received at 50560 <at> debbugs.gnu.org:

From: Augusto Stoffel <arstoffel <at> gmail.com>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 50560 <at> debbugs.gnu.org,
 Daniel Martín <mardani29 <at> yahoo.es>
Subject: Re: bug#50560: 28.0.50; 'insert-file-contents-literally' on
 multibyte buffers
Date: Mon, 13 Sep 2021 23:37:51 +0200

On Mon, 13 Sep 2021 at 15:18, Lars Ingebrigtsen <larsi <at> gnus.org> wrote:

> Eli Zaretskii <eliz <at> gnu.org> writes:
>
>>> So yeah, perhaps we can add a small sentence that clarifies this
>>> behavior.
>>
>> What kind of sentence would you like to add there?  IME, this stuff
>> can rarely be explained by small sentences ;-)
>
> I've added a paragraph to the doc string mentioning that there might be
> issues, but referring the user to `(elisp)Character Codes'.

Thanks, I think it's a good clarification.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 12 Oct 2021 11:24:09 GMT)

This bug report was last modified 2 years and 190 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.