GNU bug report logs - #20891
emacs: Back off if .doc is not an Office document

Package: emacs;

Reported by: era+emacs <at> iki.fi

Date: Wed, 24 Jun 2015 11:20:03 UTC

Severity: minor

Tags: fixed

Found in version 24.4+1-4ubuntu5

Fixed in version 27.1

Done: Robert Pluim <rpluim <at> gmail.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 20891 in the body.
You can then email your comments to 20891 AT debbugs.gnu.org in the normal way.


Report forwarded to era+emacs <at> iki.fi, bug-gnu-emacs <at> gnu.org:
bug#20891; Package emacs. (Wed, 24 Jun 2015 11:20:04 GMT)

Acknowledgement sent to era+emacs <at> iki.fi:
New bug report received and forwarded. Copy sent to era+emacs <at> iki.fi, bug-gnu-emacs <at> gnu.org. (Wed, 24 Jun 2015 11:20:04 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: era+emacs <at> iki.fi
To: submit <at> debbugs.gnu.org
Subject: emacs: Back off if .doc is not an Office document
Date: Wed, 24 Jun 2015 14:19:18 +0300
Package: emacs
Severity: normal
Version: 24.4+1-4ubuntu5
X-Debbugs-Cc: era+emacs <at> iki.fi

(I am forwarding the following bug from the Ubuntu Launchpad bug
tracking system.  The original report contains some upset language; the
boiled-down summary at the top is mine.)

https://bugs.launchpad.net/ubuntu/+source/emacs24/+bug/1466139

It is not uncommon for *.doc files to contain plain ASCII text. In this
case, the default behavior of Emacs is less than ideal, as described in
more detail in the problem report below. Perhaps the .doc file name
mapping should contain some additional heuristics, and fall back to
plain text if the file is not an Office document.

Original problem description follows.

-----

Today I downloaded the sources of secure delete. Inspected some files
with vi and some with Emacs 24. Did what I wanted to do, started to
listen to my favourite internet radio station, wanted to quote a passage
from the secure delete docs on Facebook.

I wanted to open the file "secure_delete.doc" (a pure ASCII text file)
in Emacs 24 and: "Whenever you see this buffer I'm going to make a
picture of it and you won't be able to edit anything." Haha, no, this
really reminds me of the monkey face during the Ubuntu installation. But
don't make a monkey out of me, because Emacs 24 is going to be replaced
with svi, an extensible text-based line editor yet to be written.

Emacs' open file is broken:
 - whenever it sees a file with the extension or suffix ".doc", it
 treats it like an Office document.
 - it takes an image of it
 - and shows you the image - which for a pure text file shows you the
 contents of the file as an image in that gone editor

They should use the /file/ utility to check for the file type - but
showing an immutable picture of pure text is like making a monkey out of
the user.

-- 
If this were a real .signature, it would suck less.  Well, maybe not.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20891; Package emacs. (Thu, 01 Aug 2019 20:54:02 GMT)

Message #8 received at 20891 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: era+emacs <at> iki.fi
Cc: 20891 <at> debbugs.gnu.org
Subject: Re: bug#20891: emacs: Back off if .doc is not an Office document
Date: Thu, 01 Aug 2019 22:53:19 +0200
era+emacs <at> iki.fi writes:

> It is not uncommon for *.doc files to contain plain ASCII text. In this
> case, the default behavior of Emacs is less than ideal, as described in
> more detail in the problem report below. Perhaps the .doc file name
> mapping should contain some additional heuristics, and fall back to
> plain text if the file is not an Office document.

(I'm going through old bug reports that have unfortunately not gotten
any responses.)

I think this makes sense.  A fix in Emacs would mean moving the .doc
recognition from `auto-mode-alist' to...  `magic-fallback-mode-alist', I
guess.

According to the interwebs, the magic sequence for Word .doc files is:

D0 CF 11 E0 A1 B1 1A E1

Does anybody have an opinion here?
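A minimal Emacs Lisp sketch of such a check, assuming the eight bytes quoted above are the only signature of interest. The names `my-doc-ole2-magic' and `my-doc-ole2-file-p' are hypothetical and not part of Emacs; the point is that only the first eight bytes need to be read, literally and without any decoding.

```elisp
;; Hypothetical helpers for illustration; not part of Emacs.
(defconst my-doc-ole2-magic
  (unibyte-string #xD0 #xCF #x11 #xE0 #xA1 #xB1 #x1A #xE1)
  "The eight magic bytes quoted above, as a unibyte string.")

(defun my-doc-ole2-file-p (file)
  "Return non-nil if FILE begins with `my-doc-ole2-magic'."
  (with-temp-buffer
    ;; A unibyte buffer keeps the raw bytes directly comparable
    ;; with the unibyte constant above.
    (set-buffer-multibyte nil)
    ;; Read only bytes 0..8 of FILE, with no code conversion.
    (insert-file-contents-literally file nil 0 8)
    (string= (buffer-string) my-doc-ole2-magic)))
```

For example, `(my-doc-ole2-file-p "secure_delete.doc")' should return nil for the plain-text file from the original report, and a file shorter than eight bytes simply fails the comparison.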

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20891; Package emacs. (Wed, 06 Nov 2019 01:54:01 GMT)

Message #11 received at 20891 <at> debbugs.gnu.org:

From: Stefan Kangas <stefan <at> marxist.se>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: era+emacs <at> iki.fi, 20891 <at> debbugs.gnu.org
Subject: Re: bug#20891: emacs: Back off if .doc is not an Office document
Date: Wed, 06 Nov 2019 02:53:21 +0100
Lars Ingebrigtsen <larsi <at> gnus.org> writes:

> era+emacs <at> iki.fi writes:
>
>> It is not uncommon for *.doc files to contain plain ASCII text. In this
>> case, the default behavior of Emacs is less than ideal, as described in
>> more detail in the problem report below. Perhaps the .doc file name
>> mapping should contain some additional heuristics, and fall back to
>> plain text if the file is not an Office document.
>
> (I'm going through old bug reports that have unfortunately not gotten
> any responses.)
>
> I think this makes sense.  A fix in Emacs would mean moving the .doc
> recognition from `auto-mode-alist' to...  `magic-fallback-mode-alist', I
> guess.
>
> According to the interwebs, the magic sequence for Word .doc files is:
>
> D0 CF 11 E0 A1 B1 1A E1
>
> Does anybody have an opinion here?

I wasn't aware of the practice of naming plain text files *.doc; I can't
remember having encountered any file like that.  Perhaps this practice
is rare.

Would implementing this risk make opening *.doc files slower for most
users?  Perhaps that could make the trade-off not worth it.  Other
than that, I see no problem with the proposal.

Best regards,
Stefan Kangas




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20891; Package emacs. (Wed, 06 Nov 2019 13:09:01 GMT)

Message #14 received at 20891 <at> debbugs.gnu.org:

From: era <era+emacs <at> iki.fi>
To: 20891 <at> debbugs.gnu.org
Cc: Lars Ingebrigtsen <larsi <at> gnus.org>, Stefan Kangas <stefan <at> marxist.se>
Subject: Re: bug#20891: emacs: Back off if .doc is not an Office document
Date: Wed, 06 Nov 2019 15:08:14 +0200
On Wed, Nov 6, 2019, at 03:53, Stefan Kangas wrote:
> Lars Ingebrigtsen <larsi <at> gnus.org> writes:
> > era+emacs <at> iki.fi writes:
> >> It is not uncommon for *.doc files to contain plain ASCII text. In this
> >> case, the default behavior of Emacs is less than ideal
> > I think this makes sense.  A fix in Emacs would mean moving the .doc
> > recognition from `auto-mode-alist' to...  `magic-fallback-mode-alist', I
> > guess.
> I wasn't aware of the practice of naming plain text files *.doc; I can't
> remember having encountered any file like that.  Perhaps this practice
> is rare.
> Would implementing this risk make opening *.doc files slower for most
> users?  Perhaps that could make the trade-off not worth it.  Other
> than that, I see no problem with the proposal.

I'd agree that this is probably increasingly rare, but it used to be a practice which wasn't entirely uncommon back when Microsoft was not yet a household brand name and Word wasn't taught in schools.

On the other hand, if the behavior described in the original bug report is still current, that's quirky and unexpected. Really, how many people *expect* Emacs to be able to open a Word document, and are any of them happy when they get a static image to look at in Emacs?





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20891; Package emacs. (Wed, 06 Nov 2019 23:20:01 GMT)

Message #17 received at 20891 <at> debbugs.gnu.org:

From: Stefan Kangas <stefan <at> marxist.se>
To: era <era+emacs <at> iki.fi>
Cc: Lars Ingebrigtsen <larsi <at> gnus.org>, 20891 <at> debbugs.gnu.org
Subject: Re: bug#20891: emacs: Back off if .doc is not an Office document
Date: Thu, 07 Nov 2019 00:19:42 +0100
era <era+emacs <at> iki.fi> writes:

> On the other hand, if the behavior described in the original bug report is still
> current, that's quirky and unexpected. Really, how many people *expect* Emacs to
> be able to open a Word document, and are any of them happy when they get a
> static image to look at in Emacs?

AFAIU, the problem is that we do not have a mode to edit Microsoft
Word documents.  It would obviously be fantastic if someone would be
willing to write such a package, but it's a potentially big task.

So, as long as we lack editing capabilities, showing an image of the
document in Emacs is actually pretty useful.  More useful than getting
garbled text, at any rate.

Best regards,
Stefan Kangas




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20891; Package emacs. (Thu, 07 Nov 2019 04:46:02 GMT)

Message #20 received at 20891 <at> debbugs.gnu.org:

From: Richard Stallman <rms <at> gnu.org>
To: Stefan Kangas <stefan <at> marxist.se>
Cc: era+emacs <at> iki.fi, larsi <at> gnus.org, 20891 <at> debbugs.gnu.org
Subject: Re: bug#20891: emacs: Back off if .doc is not an Office document
Date: Wed, 06 Nov 2019 23:45:42 -0500
[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > So, as long as we lack editing capabilities, showing an image of the
  > document in Emacs is actually pretty useful.

How would Emacs do that?

-- 
Dr Richard Stallman
Founder, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)






Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20891; Package emacs. (Thu, 07 Nov 2019 08:32:01 GMT)

Message #23 received at 20891 <at> debbugs.gnu.org:

From: era <era+emacs <at> iki.fi>
To: "Richard Stallman" <rms <at> gnu.org>, 20891 <at> debbugs.gnu.org
Subject: Re: bug#20891: emacs: Back off if .doc is not an Office document
Date: Thu, 07 Nov 2019 10:29:15 +0200
On Thu, Nov 7, 2019, at 06:45, Richard Stallman wrote:
>   > So, as long as we lack editing capabilities, showing an image of the
>   > document in Emacs is actually pretty useful.
> How would Emacs do that?

The Emacs-side entry point seems to be doc-view-mode-maybe, which is hooked into auto-mode-alist for a number of file name extensions.

As described in https://www.emacswiki.org/emacs/DocViewMode it relies on external utilities to provide the actual image.

I was unable to quickly repro in a fresh Debian or Ubuntu image, but that might be because I didn't have the external utility installed.

Tangentially, googling for doc-view-mode-maybe suggests that lots of people are annoyed by it and want to turn it off, probably often for related but distinct reasons.
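For Emacs versions without the eventual fix, a configuration sketch for readers who want plain text for every *.doc file (genuine Word files included). It relies on `auto-mode-alist' being searched in order, with the first matching entry winning, so the new entry shadows the `doc-view-mode-maybe' one for this extension only:

```elisp
;; Open *.doc (any letter case) in text-mode instead of doc-view.
;; `add-to-list' prepends, so this entry is consulted before the
;; built-in doc-view entry; .docx and other formats are unaffected.
(add-to-list 'auto-mode-alist '("\\.[dD][oO][cC]\\'" . text-mode))
```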





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20891; Package emacs. (Fri, 08 Nov 2019 21:00:02 GMT)

Message #26 received at 20891 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Stefan Kangas <stefan <at> marxist.se>
Cc: era+emacs <at> iki.fi, 20891 <at> debbugs.gnu.org
Subject: Re: bug#20891: emacs: Back off if .doc is not an Office document
Date: Fri, 08 Nov 2019 21:59:54 +0100
Stefan Kangas <stefan <at> marxist.se> writes:

> Would implementing this risk make opening *.doc files slower for most
> users?  Perhaps that could make the trade-off not worth it.  Other
> than that, I see no problem with the proposal.

I don't think it'd be any performance problem -- we'd just have to read
the first 8 bytes of the file to see whether the magic sequence is there.





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20891; Package emacs. (Sat, 09 Nov 2019 06:26:01 GMT)

Message #29 received at 20891 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: era+emacs <at> iki.fi, 20891 <at> debbugs.gnu.org, stefan <at> marxist.se
Subject: Re: bug#20891: emacs: Back off if .doc is not an Office document
Date: Sat, 09 Nov 2019 08:25:38 +0200
> From: Lars Ingebrigtsen <larsi <at> gnus.org>
> Date: Fri, 08 Nov 2019 21:59:54 +0100
> Cc: era+emacs <at> iki.fi, 20891 <at> debbugs.gnu.org
> 
> Stefan Kangas <stefan <at> marxist.se> writes:
> 
> > Would implementing this risk make opening *.doc files slower for most
> > users?  Perhaps that could make the trade-off not worth it.  Other
> > than that, I see no problem with the proposal.
> 
> I don't think it'd be any performance problem -- we'd just have to read
> the first 8 bytes of the file to see whether the magic sequence is there.

*.doc files are rare nowadays.  Do the *.docx files have the same
signature?  I doubt that, since they are actually *.zip files in
disguise.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20891; Package emacs. (Sat, 09 Nov 2019 20:15:01 GMT)

Message #32 received at 20891 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: era+emacs <at> iki.fi, 20891 <at> debbugs.gnu.org, stefan <at> marxist.se
Subject: Re: bug#20891: emacs: Back off if .doc is not an Office document
Date: Sat, 09 Nov 2019 21:14:52 +0100
Eli Zaretskii <eliz <at> gnu.org> writes:

>> I don't think it'd be any performance problem -- we'd just have to read
>> the first 8 bytes of the file to see whether the magic sequence is there.
>
> *.doc files are rare nowadays.  Do the *.docx files have the same
> signature?  I doubt that, since they are actually *.zip files in
> disguise.

Yeah, *.docx have a different signature, so this would be for *.doc files
only (and since the Windows *.doc files are becoming rarer, perhaps that
means that doing doc-view only on files that have the magic bytes is
more important than it used to be).
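To illustrate why a signature check cannot confuse the two formats: a *.docx file is a ZIP archive, which normally begins with the local-file-header bytes "PK\003\004" (50 4B 03 04), while the legacy .doc container starts with D0 CF. A sketch of a classifier over a file's leading bytes; the function name `my-office-container-type' is hypothetical:

```elisp
(defun my-office-container-type (head)
  "Classify HEAD, a unibyte string holding a file's first bytes.
Return `ole2' for the legacy Word .doc container, `zip' for a ZIP
archive such as .docx, or nil for anything else (e.g. plain text)."
  (cond
   ;; Eight-byte OLE2 compound document signature.
   ((string-prefix-p
     (unibyte-string #xD0 #xCF #x11 #xE0 #xA1 #xB1 #x1A #xE1) head)
    'ole2)
   ;; ZIP local file header signature, "PK" followed by 3 and 4.
   ((string-prefix-p "PK\003\004" head) 'zip)
   (t nil)))
```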





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20891; Package emacs. (Thu, 14 Nov 2019 08:55:02 GMT)

Message #35 received at 20891 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: era+emacs <at> iki.fi, 20891 <at> debbugs.gnu.org, stefan <at> marxist.se
Subject: Re: bug#20891: emacs: Back off if .doc is not an Office document
Date: Thu, 14 Nov 2019 10:54:20 +0200
> From: Lars Ingebrigtsen <larsi <at> gnus.org>
> Cc: stefan <at> marxist.se,  era+emacs <at> iki.fi,  20891 <at> debbugs.gnu.org
> Date: Sat, 09 Nov 2019 21:14:52 +0100
> 
> since the Windows *.doc files are becoming rarer, perhaps that
> means that doing doc-view only on files that have the magic bytes is
> more important than it used to be

Sorry, I don't follow that logic.  I'd expect that *.doc MS Word files
becoming rarer would mean plain-text *.doc files become relatively
more important, i.e. the opposite conclusion.  What did I miss?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20891; Package emacs. (Thu, 14 Nov 2019 09:56:02 GMT)

Message #38 received at 20891 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: era+emacs <at> iki.fi, 20891 <at> debbugs.gnu.org, stefan <at> marxist.se
Subject: Re: bug#20891: emacs: Back off if .doc is not an Office document
Date: Thu, 14 Nov 2019 10:55:35 +0100
Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: Lars Ingebrigtsen <larsi <at> gnus.org>
>> Cc: stefan <at> marxist.se,  era+emacs <at> iki.fi,  20891 <at> debbugs.gnu.org
>> Date: Sat, 09 Nov 2019 21:14:52 +0100
>> 
>> since the Windows *.doc files are becoming rarer, perhaps that
>> means that doing doc-view only on files that have the magic bytes is
>> more important than it used to be
>
> Sorry, I don't follow that logic.  I'd expect that *.doc MS Word files
> becoming rarer would mean plain-text *.doc files become relatively
> more important, i.e. the opposite conclusion.  What did I miss?

That's what I'm saying.  :-) Or at least I tried to.  It's more
important to add magic byte recognition to doc-mode for .doc files now
than before.





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20891; Package emacs. (Thu, 14 Nov 2019 14:13:02 GMT)

Message #41 received at 20891 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: era+emacs <at> iki.fi, 20891 <at> debbugs.gnu.org, stefan <at> marxist.se
Subject: Re: bug#20891: emacs: Back off if .doc is not an Office document
Date: Thu, 14 Nov 2019 16:12:22 +0200
> From: Lars Ingebrigtsen <larsi <at> gnus.org>
> Cc: stefan <at> marxist.se,  era+emacs <at> iki.fi,  20891 <at> debbugs.gnu.org
> Date: Thu, 14 Nov 2019 10:55:35 +0100
> 
> > Sorry, I don't follow that logic.  I'd expect that *.doc MS Word files
> > becoming rarer would mean plain-text *.doc files become relatively
> > more important, i.e. the opposite conclusion.  What did I miss?
> 
> That's what I'm saying.  :-) Or at least I tried to.  It's more
> important to add magic byte recognition to doc-mode for .doc files now
> than before.

How would the magic signature recognition help with plain-text files?
They don't have any such signatures?  I'm still missing something,
sorry.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20891; Package emacs. (Thu, 14 Nov 2019 15:07:02 GMT)

Message #44 received at 20891 <at> debbugs.gnu.org:

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: era+emacs <at> iki.fi, Lars Ingebrigtsen <larsi <at> gnus.org>, 20891 <at> debbugs.gnu.org,
 stefan <at> marxist.se
Subject: Re: bug#20891: emacs: Back off if .doc is not an Office document
Date: Thu, 14 Nov 2019 16:06:36 +0100
>>>>> On Thu, 14 Nov 2019 16:12:22 +0200, Eli Zaretskii <eliz <at> gnu.org> said:

    Eli> How would the magic signature recognition help with plain-text files?
    Eli> They don't have any such signatures?  I'm still missing something,
    Eli> sorry.

Today we go: ".doc extension -> show an image of the contents of the
file" which is manifestly the wrong thing to do for a non-doc file. If
we do the signature recognition, those files which are not recognized
end up in (probably) fundamental-mode.

Robert




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20891; Package emacs. (Thu, 14 Nov 2019 16:21:02 GMT)

Message #47 received at 20891 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: era+emacs <at> iki.fi, larsi <at> gnus.org, 20891 <at> debbugs.gnu.org, stefan <at> marxist.se
Subject: Re: bug#20891: emacs: Back off if .doc is not an Office document
Date: Thu, 14 Nov 2019 18:19:43 +0200
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: Lars Ingebrigtsen <larsi <at> gnus.org>,  era+emacs <at> iki.fi,
>   20891 <at> debbugs.gnu.org,  stefan <at> marxist.se
> Date: Thu, 14 Nov 2019 16:06:36 +0100
> 
> Today we go: ".doc extension -> show an image of the contents of the
> file"

Where do we have the code or data which does that?

> which is manifestly the wrong thing to do for a non-doc file. If
> we do the signature recognition, those files which are not recognized
> end up in (probably) fundamental-mode.

That's OK, but I'm still missing the code which makes this happen.
E.g., I just did "C-x C-f foo.doc RET" and got a buffer in Fundamental
mode.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20891; Package emacs. (Thu, 14 Nov 2019 16:34:02 GMT)

Message #50 received at 20891 <at> debbugs.gnu.org:

From: Andreas Schwab <schwab <at> suse.de>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: era+emacs <at> iki.fi, Robert Pluim <rpluim <at> gmail.com>, 20891 <at> debbugs.gnu.org,
 stefan <at> marxist.se, larsi <at> gnus.org
Subject: Re: bug#20891: emacs: Back off if .doc is not an Office document
Date: Thu, 14 Nov 2019 17:33:28 +0100
On Nov 14 2019, Eli Zaretskii wrote:

> Where do we have the code or data which does that?

See auto-mode-alist, and doc-view-mode-maybe.

> That's OK, but I'm still missing the code which makes this happen.
> E.g., I just did "C-x C-f foo.doc RET" and got a buffer in Fundamental
> mode.

It only works if you have a doc-view-odf->pdf-converter-program.
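Following Andreas's explanation, a quick way to check which situation a given machine is in. This assumes the `doc-view-odf->pdf-converter-program' user option from doc-view.el holds a program name (such as "soffice"); the helper name `my-doc-view-office-converter' is hypothetical:

```elisp
(require 'doc-view)

;; Hypothetical helper: report whether the converter program that
;; doc-view uses for Office/ODF files can be found on `exec-path'.
(defun my-doc-view-office-converter ()
  "Return the full path of the Office-to-PDF converter, or nil.
Looks up the program named by `doc-view-odf->pdf-converter-program'."
  (executable-find doc-view-odf->pdf-converter-program))
```

A nil result means doc-view cannot convert Office documents on that machine, which would be consistent with Eli's foo.doc opening in Fundamental mode.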

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab <at> suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20891; Package emacs. (Thu, 14 Nov 2019 16:44:02 GMT)

Message #53 received at 20891 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Andreas Schwab <schwab <at> suse.de>
Cc: era+emacs <at> iki.fi, rpluim <at> gmail.com, 20891 <at> debbugs.gnu.org,
 stefan <at> marxist.se, larsi <at> gnus.org
Subject: Re: bug#20891: emacs: Back off if .doc is not an Office document
Date: Thu, 14 Nov 2019 18:42:40 +0200
> From: Andreas Schwab <schwab <at> suse.de>
> Cc: Robert Pluim <rpluim <at> gmail.com>,  era+emacs <at> iki.fi,  larsi <at> gnus.org,  20891 <at> debbugs.gnu.org,  stefan <at> marxist.se
> Date: Thu, 14 Nov 2019 17:33:28 +0100
> 
> On Nov 14 2019, Eli Zaretskii wrote:
> 
> > Where do we have the code or data which does that?
> 
> See auto-mode-alist, and doc-view-mode-maybe.
> 
> > That's OK, but I'm still missing the code which makes this happen.
> > E.g., I just did "C-x C-f foo.doc RET" and got a buffer in Fundamental
> > mode.
> 
> It only works if you have a doc-view-odf->pdf-converter-program.

Thanks, I was blind.

So we want to remove docx? from auto-mode-alist and instead add the
magic signature to magic-mode-alist?  But then AFAIK MS Word documents
had different signatures for different versions, so we should have
several.  And a literal docx should be left in auto-mode-alist, right?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20891; Package emacs. (Fri, 15 Nov 2019 07:51:02 GMT)

Message #56 received at 20891 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: era+emacs <at> iki.fi, 20891 <at> debbugs.gnu.org, stefan <at> marxist.se
Subject: Re: bug#20891: emacs: Back off if .doc is not an Office document
Date: Fri, 15 Nov 2019 08:50:46 +0100
Eli Zaretskii <eliz <at> gnu.org> writes:

>> That's what I'm saying.  :-) Or at least I tried to.  It's more
>> important to add magic byte recognition to doc-mode for .doc files now
>> than before.
>
> How would the magic signature recognition help with plain-text files?
> They don't have any such signatures?  I'm still missing something,
> sorry.

The magic signature recognition is for the MS .doc files, not the text
files.





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20891; Package emacs. (Fri, 15 Nov 2019 07:52:02 GMT)

Message #59 received at 20891 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: era+emacs <at> iki.fi, Andreas Schwab <schwab <at> suse.de>, rpluim <at> gmail.com,
 20891 <at> debbugs.gnu.org, stefan <at> marxist.se
Subject: Re: bug#20891: emacs: Back off if .doc is not an Office document
Date: Fri, 15 Nov 2019 08:51:40 +0100
Eli Zaretskii <eliz <at> gnu.org> writes:

> But then AFAIK MS Word documents had different signatures for
> different versions, so we should have several.

All .doc files allegedly start with the same eight bytes.





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20891; Package emacs. (Fri, 15 Nov 2019 08:49:01 GMT)

Message #62 received at 20891 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: era+emacs <at> iki.fi, schwab <at> suse.de, rpluim <at> gmail.com, 20891 <at> debbugs.gnu.org,
 stefan <at> marxist.se
Subject: Re: bug#20891: emacs: Back off if .doc is not an Office document
Date: Fri, 15 Nov 2019 10:48:26 +0200
> From: Lars Ingebrigtsen <larsi <at> gnus.org>
> Cc: Andreas Schwab <schwab <at> suse.de>,  rpluim <at> gmail.com,  era+emacs <at> iki.fi,
>   20891 <at> debbugs.gnu.org,  stefan <at> marxist.se
> Date: Fri, 15 Nov 2019 08:51:40 +0100
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> > But then AFAIK MS Word documents had different signatures for
> > different versions, so we should have several.
> 
> All .doc files allegedly start with the same eight bytes.

Maybe my reading of the 'magic' file is wrong, but it seems to say
otherwise.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20891; Package emacs. (Fri, 15 Nov 2019 08:57:01 GMT)

Message #65 received at 20891 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: era+emacs <at> iki.fi, schwab <at> suse.de, rpluim <at> gmail.com, 20891 <at> debbugs.gnu.org,
 stefan <at> marxist.se
Subject: Re: bug#20891: emacs: Back off if .doc is not an Office document
Date: Fri, 15 Nov 2019 09:56:07 +0100
Eli Zaretskii <eliz <at> gnu.org> writes:

> Maybe my reading of the 'magic' file is wrong, but it seems to say
> otherwise.

I just consulted https://en.wikipedia.org/wiki/List_of_file_signatures,
but I have little practical experience with .doc files myself.





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20891; Package emacs. (Fri, 15 Nov 2019 09:15:02 GMT)

Message #68 received at 20891 <at> debbugs.gnu.org:

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: era+emacs <at> iki.fi, Andreas Schwab <schwab <at> suse.de>, larsi <at> gnus.org,
 20891 <at> debbugs.gnu.org, stefan <at> marxist.se
Subject: Re: bug#20891: emacs: Back off if .doc is not an Office document
Date: Fri, 15 Nov 2019 10:14:19 +0100
>>>>> On Thu, 14 Nov 2019 18:42:40 +0200, Eli Zaretskii <eliz <at> gnu.org> said:

    Eli> So we want to remove docx? from auto-mode-alist and instead add the
    Eli> magic signature to magic-mode-alist?  But then AFAIK MS Word documents
    Eli> had different signatures for different versions, so we should have
    Eli> several.  And a literal docx should be left in auto-mode-alist, right?

Yes. The following detects a Word 97 file for me, and a text .doc file
opens in fundamental-mode.

diff --git i/lisp/files.el w/lisp/files.el
index 053583b4cb..ea3d3deb34 100644
--- i/lisp/files.el
+++ w/lisp/files.el
@@ -2798,7 +2798,7 @@ auto-mode-alist
      ("\\.\\(diffs?\\|patch\\|rej\\)\\'" . diff-mode)
      ("\\.\\(dif\\|pat\\)\\'" . diff-mode) ; for MS-DOS
      ("\\.[eE]?[pP][sS]\\'" . ps-mode)
-     ("\\.\\(?:PDF\\|DVI\\|OD[FGPST]\\|DOCX?\\|XLSX?\\|PPTX?\\|pdf\\|djvu\\|dvi\\|od[fgpst]\\|docx?\\|xlsx?\\|pptx?\\)\\'" . doc-view-mode-maybe)
+     ("\\.\\(?:PDF\\|DVI\\|OD[FGPST]\\|DOCX\\|XLSX?\\|PPTX?\\|pdf\\|djvu\\|dvi\\|od[fgpst]\\|docx\\|xlsx?\\|pptx?\\)\\'" . doc-view-mode-maybe)
      ("configure\\.\\(ac\\|in\\)\\'" . autoconf-mode)
      ("\\.s\\(v\\|iv\\|ieve\\)\\'" . sieve-mode)
      ("BROWSE\\'" . ebrowse-tree-mode)
@@ -3062,6 +3062,7 @@ magic-fallback-mode-alist
             (comment-re (concat "\\(?:!--" incomment-re "*-->[ \t\r\n]*<\\)")))
        (concat "[ \t\r\n]*<" comment-re "*!DOCTYPE "))
      . sgml-mode)
+    ("\320\317\021\340\241\261\032\341" . doc-view-mode-maybe)
     ("%!PS" . ps-mode)
     ("# xmcd " . conf-unix-mode)))
   "Like `magic-mode-alist' but has lower priority than `auto-mode-alist'.

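The effect of the first hunk can be checked with `string-match-p'. A sketch with the two regexps copied verbatim from the removed and added lines (the variable and helper names are hypothetical): a bare .doc name no longer selects `doc-view-mode-maybe', so real Word files are left to the new `magic-fallback-mode-alist' entry, while .docx still matches by name.

```elisp
;; Regexps copied from the removed (-) and added (+) lines of the hunk.
(defconst my-old-doc-view-re
  "\\.\\(?:PDF\\|DVI\\|OD[FGPST]\\|DOCX?\\|XLSX?\\|PPTX?\\|pdf\\|djvu\\|dvi\\|od[fgpst]\\|docx?\\|xlsx?\\|pptx?\\)\\'")

(defconst my-new-doc-view-re
  "\\.\\(?:PDF\\|DVI\\|OD[FGPST]\\|DOCX\\|XLSX?\\|PPTX?\\|pdf\\|djvu\\|dvi\\|od[fgpst]\\|docx\\|xlsx?\\|pptx?\\)\\'")

(defun my-matches-p (re name)
  "Return non-nil if RE matches file NAME, case-sensitively."
  (let ((case-fold-search nil))
    (string-match-p re name)))

;; (my-matches-p my-old-doc-view-re "secure_delete.doc") => non-nil
;; (my-matches-p my-new-doc-view-re "secure_delete.doc") => nil
```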



Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20891; Package emacs. (Fri, 15 Nov 2019 09:52:01 GMT)

Message #71 received at 20891 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: era+emacs <at> iki.fi, schwab <at> suse.de, rpluim <at> gmail.com, 20891 <at> debbugs.gnu.org,
 stefan <at> marxist.se
Subject: Re: bug#20891: emacs: Back off if .doc is not an Office document
Date: Fri, 15 Nov 2019 11:51:23 +0200
> From: Lars Ingebrigtsen <larsi <at> gnus.org>
> Cc: schwab <at> suse.de,  rpluim <at> gmail.com,  era+emacs <at> iki.fi,
>   20891 <at> debbugs.gnu.org,  stefan <at> marxist.se
> Date: Fri, 15 Nov 2019 09:56:07 +0100
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> > Maybe my reading of the 'magic' file is wrong, but it seems to say
> > otherwise.
> 
> I just consulted https://en.wikipedia.org/wiki/List_of_file_signatures,
> but I have little practical experience with .doc files myself.

Thanks, I'm good with using just one signature for now.  I don't think
the other signatures, if they exist, are important enough to postpone
fixing this issue.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20891; Package emacs. (Fri, 15 Nov 2019 13:21:01 GMT)

Message #74 received at 20891 <at> debbugs.gnu.org:

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: era+emacs <at> iki.fi, schwab <at> suse.de, Lars Ingebrigtsen <larsi <at> gnus.org>,
 20891 <at> debbugs.gnu.org, stefan <at> marxist.se
Subject: Re: bug#20891: emacs: Back off if .doc is not an Office document
Date: Fri, 15 Nov 2019 14:20:01 +0100
tags 20891 fixed
close 20891 27.1
quit

>>>>> On Fri, 15 Nov 2019 11:51:23 +0200, Eli Zaretskii <eliz <at> gnu.org> said:

    Eli> Thanks, I'm good with using just one signature for now.  I don't think
    Eli> the other signatures, if they exist, are important enough to postpone
    Eli> fixing this issue.

Closing.
Committed as 904146cf79

Robert




Added tag(s) fixed. Request was from Robert Pluim <rpluim <at> gmail.com> to control <at> debbugs.gnu.org. (Fri, 15 Nov 2019 13:21:02 GMT)

bug marked as fixed in version 27.1, send any further explanations to 20891 <at> debbugs.gnu.org and era+emacs <at> iki.fi Request was from Robert Pluim <rpluim <at> gmail.com> to control <at> debbugs.gnu.org. (Fri, 15 Nov 2019 13:21:02 GMT)

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 14 Dec 2019 12:24:07 GMT)

This bug report was last modified 4 years and 127 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.