GNU bug report logs - #35766
emacs saves utf-16 le xml files as utf-16 be


Package: emacs;

Reported by: J S <jszabo_98 <at> hotmail.com>

Date: Thu, 16 May 2019 17:58:01 UTC

Severity: normal

Merged with 8282, 8283

Fixed in version 27.1

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 35766 in the body.
You can then email your comments to 35766 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#35766; Package emacs. (Thu, 16 May 2019 17:58:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to J S <jszabo_98 <at> hotmail.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Thu, 16 May 2019 17:58:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: J S <jszabo_98 <at> hotmail.com>
To: "bug-gnu-emacs <at> gnu.org" <bug-gnu-emacs <at> gnu.org>
Subject: emacs saves utf-16 le xml files as utf-16 be
Date: Thu, 16 May 2019 17:11:21 +0000
Xml files with this tag are saved as utf-16 be by emacs, even if the file was originally utf-16 le.  Using "UTF-16LE" instead will break the encoding and remove the BOM.

<?xml version="1.0" encoding="UTF-16"?>

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#35766; Package emacs. (Thu, 16 May 2019 18:23:02 GMT) Full text and rfc822 format available.

Message #8 received at 35766 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: J S <jszabo_98 <at> hotmail.com>
Cc: 35766 <at> debbugs.gnu.org
Subject: Re: bug#35766: emacs saves utf-16 le xml files as utf-16 be
Date: Thu, 16 May 2019 21:22:19 +0300
> From: J S <jszabo_98 <at> hotmail.com>
> Date: Thu, 16 May 2019 17:11:21 +0000
> 
> Xml files with this tag are saved as utf-16 be by emacs, even if the file was originally utf-16 le.  Using
> "UTF-16LE" instead will break the encoding and remove the BOM.
> 
> <?xml version="1.0" encoding="UTF-16"?>

Did you try using utf-16le-with-signature?

Or maybe I don't understand the scenario: would you please describe a
full reproduction recipe, starting from "emacs -Q"?

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#35766; Package emacs. (Thu, 16 May 2019 19:24:01 GMT) Full text and rfc822 format available.

Message #11 received at 35766 <at> debbugs.gnu.org (full text, mbox):

From: J S <jszabo_98 <at> hotmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: "35766 <at> debbugs.gnu.org" <35766 <at> debbugs.gnu.org>
Subject: Re: bug#35766: emacs saves utf-16 le xml files as utf-16 be
Date: Thu, 16 May 2019 19:21:34 +0000
Try saving this xml file and opening it again:

<?xml version="1.0" encoding="UTF-16LE"?>


Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#35766; Package emacs. (Thu, 16 May 2019 20:58:01 GMT) Full text and rfc822 format available.

Message #14 received at 35766 <at> debbugs.gnu.org (full text, mbox):

From: J S <jszabo_98 <at> hotmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: "35766 <at> debbugs.gnu.org" <35766 <at> debbugs.gnu.org>
Subject: Re: bug#35766: emacs saves utf-16 le xml files as utf-16 be
Date: Thu, 16 May 2019 20:57:34 +0000
I should say that I'm using emacs for windows.  And it's preferring saving in big endian to little endian when this is the tag:

<?xml version="1.0" encoding="UTF-16"?>


Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#35766; Package emacs. (Fri, 17 May 2019 09:27:01 GMT) Full text and rfc822 format available.

Message #17 received at 35766 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: J S <jszabo_98 <at> hotmail.com>
Cc: 35766 <at> debbugs.gnu.org
Subject: Re: bug#35766: emacs saves utf-16 le xml files as utf-16 be
Date: Fri, 17 May 2019 12:26:34 +0300
> From: J S <jszabo_98 <at> hotmail.com>
> CC: "35766 <at> debbugs.gnu.org" <35766 <at> debbugs.gnu.org>
> Date: Thu, 16 May 2019 20:57:34 +0000
> 
> I should say that I'm using emacs for windows.  And it's preferring saving in big endian to little endian when
> this is the tag:
> 
> <?xml version="1.0" encoding="UTF-16"?>

This is the default, yes.  "C-h C utf-16 RET" says:

  UTF-16 (detect endian on decoding, use big endian on encoding with BOM).
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If you want to encode in UTF-16LE, you need to tell Emacs to do this
explicitly:

  C-x RET c utf-16le-with-signature RET C-x C-s

> Try saving this xml file and opening it again:
> 
> <?xml version="1.0" encoding="UTF-16LE"?>

AFAIU, encoding="UTF-16LE" is invalid in XML.  If you see this
documented somewhere in XML docs, please tell me where it is
described.
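
To make the difference between the coding systems named above concrete, here is a minimal Python sketch (not Emacs code) of the bytes each one would write for a single "<" character. The helper name `encode_utf16` and the mapping of Emacs coding-system names onto it are assumptions made for illustration.

```python
import codecs

def encode_utf16(text: str, little_endian: bool, with_bom: bool = True) -> bytes:
    """Encode text as UTF-16 with an explicit byte order and an optional BOM."""
    codec = "utf-16-le" if little_endian else "utf-16-be"
    bom = codecs.BOM_UTF16_LE if little_endian else codecs.BOM_UTF16_BE
    return (bom if with_bom else b"") + text.encode(codec)

# Assumed correspondence with the Emacs coding systems discussed in this message:
#   utf-16                   -> big endian, with BOM
#   utf-16le-with-signature  -> little endian, with BOM
#   utf-16le                 -> little endian, no BOM
print(encode_utf16("<", little_endian=False).hex(" "))                 # fe ff 00 3c
print(encode_utf16("<", little_endian=True).hex(" "))                  # ff fe 3c 00
print(encode_utf16("<", little_endian=True, with_bom=False).hex(" "))  # 3c 00
```

The third case shows why a file saved as plain UTF-16LE is hard to recognize afterwards: nothing in its first bytes announces the byte order.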




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#35766; Package emacs. (Fri, 17 May 2019 11:27:01 GMT) Full text and rfc822 format available.

Message #20 received at 35766 <at> debbugs.gnu.org (full text, mbox):

From: J S <jszabo_98 <at> hotmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: "35766 <at> debbugs.gnu.org" <35766 <at> debbugs.gnu.org>
Subject: Re: bug#35766: emacs saves utf-16 le xml files as utf-16 be
Date: Fri, 17 May 2019 11:26:14 +0000
It would change color in emacs if encoding="UTF-16LE" were invalid.  It's hard to find the docs for it.  UTF-16LE is listed here:  http://help.eclipse.org/kepler/index.jsp?topic=%2Forg.eclipse.wst.xmleditor.doc.user%2Ftopics%2Fcxmlenc.html



Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#35766; Package emacs. (Fri, 17 May 2019 11:49:01 GMT) Full text and rfc822 format available.

Message #23 received at 35766 <at> debbugs.gnu.org (full text, mbox):

From: Noam Postavsky <npostavs <at> gmail.com>
To: J S <jszabo_98 <at> hotmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>,
 "35766 <at> debbugs.gnu.org" <35766 <at> debbugs.gnu.org>
Subject: Re: bug#35766: emacs saves utf-16 le xml files as utf-16 be
Date: Fri, 17 May 2019 07:48:30 -0400
J S <jszabo_98 <at> hotmail.com> writes:

> It would change color in emacs if encoding="UTF16-LE" were invalid.
> It's hard to find the docs for it.  UTF-16LE is listed here:
> http://help.eclipse.org/kepler/index.jsp?topic=%2Forg.eclipse.wst.xmleditor.doc.user%2Ftopics%2Fcxmlenc.html

A more official reference:

https://www.w3.org/TR/xml/#NT-EncName

    It is RECOMMENDED that character encodings registered (as charsets)
    with the Internet Assigned Numbers Authority [IANA-CHARSETS], other
    than those just listed, be referred to using their registered names;

[IANA-CHARSETS]: http://www.iana.org/assignments/character-sets/character-sets.xhtml

    UTF-16LE    1014    [RFC2781]   [RFC2781]   csUTF16LE
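
As a rough illustration of the EncName production referenced above, the following Python sketch checks only the syntax of an encoding name, using the XML 1.0 grammar `[A-Za-z] ([A-Za-z0-9._] | '-')*`. The function name `is_valid_enc_name` is hypothetical, and passing this check does not mean the name is registered with IANA.

```python
import re

# EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')*   (XML 1.0, production 81)
ENC_NAME = re.compile(r"[A-Za-z][A-Za-z0-9._-]*")

def is_valid_enc_name(name: str) -> bool:
    """Return True if name is syntactically a legal XML encoding name."""
    return ENC_NAME.fullmatch(name) is not None

print(is_valid_enc_name("UTF-16"))    # True
print(is_valid_enc_name("UTF-16LE"))  # True
print(is_valid_enc_name("16-UTF"))    # False: must start with a letter
```

Whether a syntactically legal name is also a recommended one is a separate question, answered by the IANA registry entry quoted above.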





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#35766; Package emacs. (Fri, 17 May 2019 15:36:02 GMT) Full text and rfc822 format available.

Message #26 received at 35766 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Noam Postavsky <npostavs <at> gmail.com>
Cc: jszabo_98 <at> hotmail.com, 35766 <at> debbugs.gnu.org
Subject: Re: bug#35766: emacs saves utf-16 le xml files as utf-16 be
Date: Fri, 17 May 2019 18:34:48 +0300
> From: Noam Postavsky <npostavs <at> gmail.com>
> Cc: Eli Zaretskii <eliz <at> gnu.org>,  "35766\@debbugs.gnu.org" <35766 <at> debbugs.gnu.org>
> Date: Fri, 17 May 2019 07:48:30 -0400
> 
>     UTF-16LE    1014    [RFC2781]   [RFC2781]   csUTF16LE

Ouch, I was looking at the wrong column in that document.

The problem is that our detection of encoding of XML files is based on
the assumption that the header is in ASCII-compatible encoding, which
UTF-16 isn't.  So regexp search for the XML header fails, and the
detection fails with it.

The patch below makes us at least recognize UTF-16 with BOM, and also
stop the encoding from frightening the user when she specifies UTF-16
with BOM at buffer-save time.  But by default, saving a buffer with
UTF-16BE or UTF-16LE still produces a file without BOM, and that
cannot be detected by our encoding-detection machinery, leaving it to
the user to use "C-x RET c" or "C-x RET r".

Perhaps we should by default produce encoding with BOM when XML header
specifies UTF-16?

diff --git a/lisp/international/mule-cmds.el b/lisp/international/mule-cmds.el
index dfa9e4e..a248ef8 100644
--- a/lisp/international/mule-cmds.el
+++ b/lisp/international/mule-cmds.el
@@ -1029,7 +1029,11 @@ select-safe-coding-system
 		 ;; This check perhaps isn't ideal, but is probably
 		 ;; the best thing to do.
 		 (not (auto-coding-alist-lookup (or file buffer-file-name "")))
-		 (not (coding-system-equal coding-system auto-cs)))
+		 (not (coding-system-equal coding-system auto-cs))
+                 (or (equal (coding-system-type auto-cs) 'charset)
+                     (not (coding-system-equal (coding-system-type auto-cs)
+                                               (coding-system-type
+                                                coding-system)))))
 	    (unless (yes-or-no-p
 		     (format "Selected encoding %s disagrees with \
 %s specified by file contents.  Really save (else edit coding cookies \
diff --git a/lisp/international/mule.el b/lisp/international/mule.el
index b5414de..fcdcd3c 100644
--- a/lisp/international/mule.el
+++ b/lisp/international/mule.el
@@ -2587,9 +2587,14 @@ xml-find-file-coding-system
       (let ((detected
              (with-coding-priority '(utf-8)
                (coding-system-base
-                (detect-coding-region (point-min) (point-max) t)))))
-        ;; Pure ASCII always comes back as undecided.
+                (detect-coding-region (point-min) (point-max) t))))
+            (bom (list (char-after 1) (char-after 2))))
         (cond
+         ((equal bom '(#xFE #xFF))
+          'utf-16be-with-signature)
+         ((equal bom '(#xFF #xFE))
+          'utf-16le-with-signature)
+         ;; Pure ASCII always comes back as undecided.
          ((memq detected '(utf-8 undecided))
           'utf-8)
          ((eq detected 'utf-16le-with-signature) 'utf-16le-with-signature)
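
The idea behind the patch can be sketched outside Emacs as follows. This is an illustrative Python version, not the code that was committed: it shows why an ASCII regexp search for the XML header fails on UTF-16 data, and how looking at the first two bytes sidesteps that. The function name `sniff_utf16_bom` is hypothetical; the returned strings reuse the coding-system names from the diff.

```python
import re
from typing import Optional

XML_DECL = re.compile(rb"<\?xml")  # assumes an ASCII-compatible encoding

def sniff_utf16_bom(data: bytes) -> Optional[str]:
    """Classify data by its first two bytes, as the patched cond form does."""
    head = data[:2]
    if head == b"\xfe\xff":
        return "utf-16be-with-signature"
    if head == b"\xff\xfe":
        return "utf-16le-with-signature"
    return None  # no UTF-16 BOM; fall back to other detection

sample = b"\xff\xfe" + '<?xml version="1.0" encoding="UTF-16"?>'.encode("utf-16-le")
print(XML_DECL.search(sample))  # None: every character is followed by a NUL byte
print(sniff_utf16_bom(sample))  # utf-16le-with-signature
```

A file written without a BOM yields `None` here as well, which matches the limitation described in the message: such files still need an explicit coding system from the user.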




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#35766; Package emacs. (Fri, 17 May 2019 16:29:02 GMT) Full text and rfc822 format available.

Message #29 received at 35766 <at> debbugs.gnu.org (full text, mbox):

From: npostavs <at> gmail.com
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: jszabo_98 <at> hotmail.com, 35766 <at> debbugs.gnu.org,
 Noam Postavsky <npostavs <at> gmail.com>
Subject: Re: bug#35766: emacs saves utf-16 le xml files as utf-16 be
Date: Fri, 17 May 2019 12:27:50 -0400
Eli Zaretskii <eliz <at> gnu.org> writes:

> Perhaps we should by default produce encoding with BOM when XML header
> specifies UTF-16?

I think yes, https://www.w3.org/TR/xml/#charencoding says

    Entities encoded in UTF-16 MUST [...] begin with the Byte Order Mark

By the way, is Bug#8282 the same as this one, or just closely related?
It's talking about sgml-html-meta-auto-coding-function (though maybe
sgml-xml-auto-coding-function is more relevant).  I'm getting a little
confused between all the different *-find/auto-coding-* functions.
There is also nxml-set-auto-coding which seems to be mostly unused.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#35766; Package emacs. (Fri, 17 May 2019 16:58:02 GMT) Full text and rfc822 format available.

Message #32 received at 35766 <at> debbugs.gnu.org (full text, mbox):

From: J S <jszabo_98 <at> hotmail.com>
To: "npostavs <at> gmail.com" <npostavs <at> gmail.com>, Eli Zaretskii <eliz <at> gnu.org>
Cc: "35766 <at> debbugs.gnu.org" <35766 <at> debbugs.gnu.org>
Subject: Re: bug#35766: emacs saves utf-16 le xml files as utf-16 be
Date: Fri, 17 May 2019 16:57:23 +0000
When an xml file just says encoding="UTF-16", how does an application pick big endian vs little endian?


Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#35766; Package emacs. (Fri, 17 May 2019 19:48:02 GMT) Full text and rfc822 format available.

Message #35 received at 35766 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: J S <jszabo_98 <at> hotmail.com>
Cc: 35766 <at> debbugs.gnu.org, npostavs <at> gmail.com
Subject: Re: bug#35766: emacs saves utf-16 le xml files as utf-16 be
Date: Fri, 17 May 2019 22:46:59 +0300
> From: J S <jszabo_98 <at> hotmail.com>
> CC: "35766 <at> debbugs.gnu.org" <35766 <at> debbugs.gnu.org>
> Date: Fri, 17 May 2019 16:57:23 +0000
> 
> When an xml file just says encoding="UTF-16", how does an application pick big endian vs little endian?

What is "an application" in this context?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#35766; Package emacs. (Fri, 17 May 2019 20:17:02 GMT) Full text and rfc822 format available.

Message #38 received at 35766 <at> debbugs.gnu.org (full text, mbox):

From: J S <jszabo_98 <at> hotmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: "35766 <at> debbugs.gnu.org" <35766 <at> debbugs.gnu.org>,
 "npostavs <at> gmail.com" <npostavs <at> gmail.com>
Subject: Re: bug#35766: emacs saves utf-16 le xml files as utf-16 be
Date: Fri, 17 May 2019 20:16:41 +0000
For example, if I save this xml file in emacs, it saves it as utf-16 big endian:


<?xml version="1.0" encoding="UTF-16"?>

If I do this in powershell (really a .net method), it saves it as utf-16 little endian (osx or windows):

[xml]$xml = get-content file.xml
$xml.save('file.xml')



Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#35766; Package emacs. (Sat, 18 May 2019 05:34:02 GMT) Full text and rfc822 format available.

Message #41 received at 35766 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: J S <jszabo_98 <at> hotmail.com>
Cc: 35766 <at> debbugs.gnu.org, npostavs <at> gmail.com
Subject: Re: bug#35766: emacs saves utf-16 le xml files as utf-16 be
Date: Sat, 18 May 2019 08:33:17 +0300
> From: J S <jszabo_98 <at> hotmail.com>
> CC: "npostavs <at> gmail.com" <npostavs <at> gmail.com>, "35766 <at> debbugs.gnu.org"
> 	<35766 <at> debbugs.gnu.org>
> Date: Fri, 17 May 2019 20:16:41 +0000
> 
> For example, if I save this xml file in emacs, it saves it as utf-16 big endian:
> 
> <?xml version="1.0" encoding="UTF-16"?>

This is the Emacs default, which is well documented, and is also
according to what the UTF-16 spec (RFC 2781) says.

> If I do this in powershell (really a .net method), it saves it as utf-16 little endian (osx or windows):

Then PowerShell behaves in violation of RFC 2781.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#35766; Package emacs. (Sat, 18 May 2019 07:27:01 GMT) Full text and rfc822 format available.

Message #44 received at 35766 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: npostavs <at> gmail.com
Cc: jszabo_98 <at> hotmail.com, 35766 <at> debbugs.gnu.org
Subject: Re: bug#35766: emacs saves utf-16 le xml files as utf-16 be
Date: Sat, 18 May 2019 10:26:09 +0300
merge 8282 35766
close 36766
thanks

> From: npostavs <at> gmail.com
> Cc: Noam Postavsky <npostavs <at> gmail.com>,  jszabo_98 <at> hotmail.com,  35766 <at> debbugs.gnu.org
> Date: Fri, 17 May 2019 12:27:50 -0400
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> > Perhaps we should by default produce encoding with BOM when XML header
> > specifies UTF-16?
> 
> I think yes, https://www.w3.org/TR/xml/#charencoding says
> 
>     Entities encoded in UTF-16 MUST [...] begin with the Byte Order Mark

OK, I did that as well, and pushed the changes to master.

> By the way, is Bug#8282 the same as this one, or just closely related?

It's the same problem; merged the bugs.

> It's talking about sgml-html-meta-auto-coding-function (though maybe
> sgml-xml-auto-coding-function is more relevant).  I'm getting a little
> confused between all the different *-find/auto-coding-* functions.

The function relevant for the recipe in bug#8282 is
sgml-xml-auto-coding-function, which is where I made the changes.  If
the HTML and/or SGML specs also mandate that we use BOM, then maybe we
need the same changes in sgml-html-meta-auto-coding-function as well.
Note that there's no equivalent for xml-find-file-coding-system for
non-XML files, so recognition of visited UTF-16 HTML files will not
work even if they do have a BOM.

> There is also nxml-set-auto-coding which seems to be mostly unused.

It is supposed to be used by packages that build on top of nXml,
AFAIU.

Thanks.




Merged 8282 8283 35766. Request was from Eli Zaretskii <eliz <at> gnu.org> to control <at> debbugs.gnu.org. (Sat, 18 May 2019 07:27:02 GMT) Full text and rfc822 format available.

bug closed, send any further explanations to 35766 <at> debbugs.gnu.org and J S <jszabo_98 <at> hotmail.com> Request was from Eli Zaretskii <eliz <at> gnu.org> to control <at> debbugs.gnu.org. (Sat, 18 May 2019 07:45:02 GMT) Full text and rfc822 format available.

bug Marked as fixed in versions 27.1. Request was from Noam Postavsky <npostavs <at> gmail.com> to control <at> debbugs.gnu.org. (Sat, 18 May 2019 11:30:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#35766; Package emacs. (Sat, 18 May 2019 20:58:03 GMT) Full text and rfc822 format available.

Message #53 received at 35766 <at> debbugs.gnu.org (full text, mbox):

From: J S <jszabo_98 <at> hotmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: "35766 <at> debbugs.gnu.org" <35766 <at> debbugs.gnu.org>,
 "npostavs <at> gmail.com" <npostavs <at> gmail.com>
Subject: Re: bug#35766: emacs saves utf-16 le xml files as utf-16 be
Date: Sat, 18 May 2019 20:57:51 +0000
RFC 2781, under "4.3 Interpreting text labelled as UTF-16", says that if a document is labelled "UTF-16", the application should check the byte order mark to see whether it is little endian or big endian.  Only if there's no byte order mark should the document be interpreted as big endian.
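
The reading rule summarized here can be written down as a short Python sketch. It is one possible rendering of RFC 2781 section 4.3 for text labelled "UTF-16", with a hypothetical function name, and it deliberately avoids any generic "utf-16" codec so that the fallback order is explicit.

```python
def decode_labelled_utf16(data: bytes) -> str:
    """Decode text labelled "UTF-16": honor a BOM, otherwise assume big endian."""
    if data[:2] == b"\xfe\xff":
        return data[2:].decode("utf-16-be")  # BOM says big endian
    if data[:2] == b"\xff\xfe":
        return data[2:].decode("utf-16-le")  # BOM says little endian
    return data.decode("utf-16-be")          # no BOM: big endian by default

print(decode_labelled_utf16(b"\xff\xfe<\x00"))  # <
print(decode_labelled_utf16(b"\xfe\xff\x00<"))  # <
print(decode_labelled_utf16(b"\x00<"))          # <
```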


Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#35766; Package emacs. (Sun, 19 May 2019 04:59:01 GMT) Full text and rfc822 format available.

Message #56 received at 35766 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: J S <jszabo_98 <at> hotmail.com>
Cc: "35766 <at> debbugs.gnu.org" <35766 <at> debbugs.gnu.org>,
 "npostavs <at> gmail.com" <npostavs <at> gmail.com>
Subject: Re: bug#35766: emacs saves utf-16 le xml files as utf-16 be
Date: Sun, 19 May 2019 07:58:01 +0300
On May 18, 2019 11:57:51 PM GMT+03:00, J S <jszabo_98 <at> hotmail.com> wrote:
> RFC 2781 under "4.3 Interpreting text labelled as UTF-16" says is that
> if a document is labelled "UTF-16", the application should check the
> byte order mark to see if it is little endian or big endian   Only if
> there's no byte order mark, should the document be interpreted as big
> endian.
> 


If you are talking about visiting an existing file, then the change I installed does just that.  I was talking about saving a file, in which case there's no BOM, since it isn't present in the buffer.
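
The distinction between visiting and saving can be illustrated with a small Python model. It is a sketch under stated assumptions, not a description of Emacs internals: the `Buffer` class and the `visit` and `save` functions are hypothetical. The byte order is learned from the BOM when the file is read, the BOM itself is not kept in the text, and the remembered byte order is what lets a later save reproduce the original encoding.

```python
import codecs
from dataclasses import dataclass

BOMS = {"utf-16-le": codecs.BOM_UTF16_LE, "utf-16-be": codecs.BOM_UTF16_BE}

@dataclass
class Buffer:
    text: str   # decoded characters only; the BOM is not stored here
    codec: str  # byte order remembered from the file that was visited

def visit(data: bytes) -> Buffer:
    """Decode a file with a UTF-16 BOM and remember which byte order it used."""
    for codec, bom in BOMS.items():
        if data.startswith(bom):
            return Buffer(data[len(bom):].decode(codec), codec)
    raise ValueError("no UTF-16 BOM found")

def save(buf: Buffer) -> bytes:
    """Re-encode with the remembered byte order and write the BOM back."""
    return BOMS[buf.codec] + buf.text.encode(buf.codec)

original = codecs.BOM_UTF16_LE + '<?xml version="1.0" encoding="UTF-16"?>'.encode("utf-16-le")
assert save(visit(original)) == original  # little endian in, little endian out
```

Without the remembered `codec`, the text alone cannot say which byte order to use on save, which is the situation described in the message above.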




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#35766; Package emacs. (Sun, 19 May 2019 14:13:02 GMT) Full text and rfc822 format available.

Message #59 received at 35766 <at> debbugs.gnu.org (full text, mbox):

From: J S <jszabo_98 <at> hotmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: "35766 <at> debbugs.gnu.org" <35766 <at> debbugs.gnu.org>,
 "npostavs <at> gmail.com" <npostavs <at> gmail.com>
Subject: Re: bug#35766: emacs saves utf-16 le xml files as utf-16 be
Date: Sun, 19 May 2019 14:12:14 +0000
Sounds good.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 17 Jun 2019 11:24:04 GMT) Full text and rfc822 format available.

This bug report was last modified 4 years and 286 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.