GNU bug report logs - #9747
C-x h TAB and M-x untabify

Package: emacs

Reported by: noloader <at> gmail.com

Date: Thu, 13 Oct 2011 23:33:01 UTC

Severity: normal

To reply to this bug, email your comments to 9747 AT debbugs.gnu.org.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#9747; Package emacs. (Thu, 13 Oct 2011 23:33:01 GMT)

Acknowledgement sent to noloader <at> gmail.com:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Thu, 13 Oct 2011 23:33:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Jeffrey Walton <noloader <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: C-x h TAB and M-x untabify
Date: Thu, 13 Oct 2011 19:27:54 -0400
I often use C-x h TAB and M-x untabify to format C, C++, and Java code.

If a document has an errant UTF-8 byte order mark (a UTF-8 BOM is EF
BB BF), Emacs cannot always format the source file.

For example, the attached Java file (JavaEncryptor.java-backup) has
1845 BOMs sprinkled throughout. I'm not sure what editor put them in,
but Emacs does not properly handle some operations with them present.
If I strip the errant BOMs with the attached program
(efbbbf-strip.cpp), Emacs will properly format the file.
[JavaEncryptor.java-backup (application/octet-stream, attachment)]
[efbbbf-strip.cpp (text/x-c++src, attachment)]
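
The attached efbbbf-strip.cpp is not reproduced in this log. As a rough illustration of the stripping step described above, here is a minimal sketch that removes every EF BB BF sequence from a byte string. The function name strip_utf8_boms and its optional counter parameter are hypothetical and are not taken from the attachment. The sketch assumes the input is otherwise valid UTF-8, in which case EF BB BF can only be the encoding of U+FEFF, so removing it at the byte level does not damage neighbouring characters.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not the reporter's efbbbf-strip.cpp.
// Returns a copy of `input` with every EF BB BF byte sequence removed,
// wherever it occurs. If `removed` is non-null, it receives the number
// of sequences that were dropped.
std::string strip_utf8_boms(const std::string& input,
                            std::size_t* removed = nullptr) {
    static const std::string kBom = "\xEF\xBB\xBF";

    std::string output;
    output.reserve(input.size());

    std::size_t count = 0;
    std::size_t pos = 0;
    while (pos < input.size()) {
        // compare() clamps the substring to the end of `input`, so a
        // truncated EF BB at the very end is copied through unchanged.
        if (input.compare(pos, kBom.size(), kBom) == 0) {
            pos += kBom.size();
            ++count;
        } else {
            output.push_back(input[pos]);
            ++pos;
        }
    }

    if (removed != nullptr) {
        *removed = count;
    }
    return output;
}
```

To apply this to a file, one would read the file in binary mode into a std::string, pass it through the function, and write the result back out in binary mode.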

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#9747; Package emacs. (Wed, 19 Oct 2011 23:57:01 GMT)

Message #8 received at 9747 <at> debbugs.gnu.org:

From: Juri Linkov <juri <at> jurta.org>
To: noloader <at> gmail.com
Cc: 9747 <at> debbugs.gnu.org
Subject: Re: bug#9747: C-x h TAB and M-x untabify
Date: Thu, 20 Oct 2011 02:32:14 +0300
> I often use C-x h TAB and M-x untabify to format C, C++, and Java code.
>
> If a document has an errant UTF-8 byte order mark (a UTF-8 BOM is EF
> BB BF), Emacs cannot always format the source file.
>
> For example, the attached Java file (JavaEncryptor.java-backup) has
> 1845 BOMs sprinkled throughout. I'm not sure what editor put them in,
> but Emacs does not properly handle some operations with them present.
> If I strip the errant BOMs with the attached program
> (efbbbf-strip.cpp), Emacs will properly format the file.

"BYTE ORDER MARK" is the old name of the U+FEFF character.
The new name is "ZERO WIDTH NO-BREAK SPACE".

You can add to your .emacs something like:

(eval-after-load "cc-mode"
  '(progn (modify-syntax-entry ?\uFEFF " " java-mode-syntax-table)))

and most of the indentation code will work correctly.

However, in some places in core packages we need to replace code such as

  (skip-chars-forward " \t")

with

  (skip-chars-forward " \t\uFEFF")

to take into account other whitespace characters.




This bug report was last modified 209 days ago.
