GNU bug report logs - #7781
23.2.91; ispell problem with hunspell and UTF-8 file


Package: emacs;

Reported by: Reuben Thomas <rrt <at> sc3d.org>

Date: Mon, 3 Jan 2011 23:08:01 UTC

Severity: normal

Tags: notabug

Found in version 23.2.91

Done: Stefan Kangas <stefan <at> marxist.se>

Bug is archived. No further changes may be made.

Forwarded to https://sourceforge.net/tracker/?func=detail&aid=3178449&group_id=143754&atid=756395





Report forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#7781; Package emacs. (Mon, 03 Jan 2011 23:08:02 GMT)

Acknowledgement sent to Reuben Thomas <rrt <at> sc3d.org>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Mon, 03 Jan 2011 23:08:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Reuben Thomas <rrt <at> sc3d.org>
To: bug-gnu-emacs <at> gnu.org
Subject: 23.2.91; ispell problem with hunspell and UTF-8 file
Date: Mon, 03 Jan 2011 23:14:41 +0000
With the following text, and using emacs -Q, I get the errors you can
see in the messages log below when using hunspell to spell-check a UTF-8
buffer with some extended characters in it.

I did test this with emacs -Q, but the current session, in which I
reproduced the problem and am now composing this bug report, was not
started with -Q (this is so submitting the bug report works properly!).

I am running a freshly bzr-pulled build of the emacs-23 branch.

Text follows

----cut here----
---
title: Kindle 3 is a good first attempt
tags: computing, books
format: markdown
date: Mon, 03 Jan 2011 20:53:13 +0000
post-id: 2585181001
---

Giving my girlfriend a Kindle for Christmas was the carrot in a multi-pronged strategy to avoid needing more bookshelves (the stick being “I will start giving away your books” and my contribution being to archive books I’ve read (or return the many that aren’t even mine). This therefore required that I stocked it with books before she got her hands on it, which in turn was all the excuse I needed to play with the thing.

My lazy solution was simply to download all of [Feedbooks](http://www.feedbooks.com); I [wrote some scripts](http://rrt.sc3d.org/Software/Kindle/) to make this actually lazy, rather than brain-numbingly dull. In the process I found that while the Kindle is nice to hold and great to read, it struggles to cope with a large collection of books (even though the nearly 3,000 volumes of Feedbooks only half-filled its 4Gb memory), and is woeful as a research tool. And, of course, Amazon’s first-mover-evil surfaced early.

Here are the problems I had:

1. Amazon’s own store doesn’t seem to contain free books. I think it’s poor form not to give people a straightforward choice of free editions of out-of-copyright works. The Kindle may be a loss leader, but at £109 it’s still not cheap. Feedbooks, rather than integrating easily into the Kindle, like, say, a 3rd-party software provider into Ubuntu’s Software Center, provide a catalogue which itself is in the form of a book, doesn’t automatically update, and offers a list ordered only by title. In other words, it’s useless; one is better off using the built-in web browser to search the online catalogue…

2. …or better, another browser, since the Kindle’s is woefully slow (and I don’t just mean the screen update). It’s just about usable, and hence useful in an emergency, but is no good as, for example, an online research tool to use in parallel with the books you have downloaded, although…

3. …offline search is awful too. With just the few ebooks that come loaded on the device, it was slow; with the thousands of books I loaded, it simply locked up the device, even when trying to search in the manual, presumably already indexed. The Kindle seems to index its contents in the background, but even now, over a week later, search doesn’t work. The only effective navigation is by a book’s table of contents, and, to choose which books to read, the user-definable collections, though…

4. …collections are a pain to set up for many books, as you have to select each book manually; there is no way I have found to select a range. (Fortunately, I was able to define collections programmatically, but this will be beyond most users.)

In summary, it’s a lovely device, but the software is rather toytown. Amazon could improve it (and indeed, the 3.0.3 firmware update, at the experimental stage when I checked, claims, vaguely, “performance improvements”), but given that their main interest is in selling books and Kindles, I’m not hopeful that it will happen before the next hardware iteration; whether it happens at all depends on competition, and there should be plenty of that, to go by the number of other ebook readers.

----cut here----


In GNU Emacs 23.2.91.3 (i686-pc-linux-gnu, GTK+ Version 2.22.0)
 of 2011-01-03 on mord
Windowing system distributor `The X.Org Foundation', version 11.0.10900000
Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: en_GB.UTF-8
  value of $XMODIFIERS: nil
  locale-coding-system: utf-8-unix
  default enable-multibyte-characters: t

Major mode: Text

Minor modes in effect:
  longlines-mode: t
  buffer-face-mode: t
  flyspell-mode: t
  show-paren-mode: t
  savehist-mode: t
  minibuffer-electric-default-mode: t
  iswitchb-mode: t
  icomplete-mode: t
  global-auto-revert-mode: t
  desktop-save-mode: t
  smart-quotes-mode: t
  mouse-wheel-mode: t
  use-hard-newlines: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  column-number-mode: t
  line-number-mode: t
  transient-mark-mode: t

Recent input:
M-x r e p o r t - e m <tab> <return> h u n s p e l 
l SPC <M-backspace> i s p e l l SPC w i t h SPC h u 
n s l e <backspace> <backspace> s p e <backspace> <backspace> 
p e <backspace> <backspace> <backspace> p e l l SPC 
f a i l s C-g <down> <down> <down> <down> <down> <down> 
<down> <up> <up> <up> <up> <up> <up> <up> <up> <up> 
<up> <up> <up> <up> <up> <up> <up> M-x i s p e l l 
<return> SPC SPC SPC M-x i s p e <backspace> <backspace> 
<backspace> <backspace> <up> <up> <return>

Recent messages:
Scanning for "hard" Perl constructions... done
Applying style hooks... done
Scanning for "hard" Perl constructions... done
Scanning for "hard" Perl constructions... done
Scanning for "hard" Perl constructions... done
Scanning for "hard" Perl constructions... done
Lazy desktop load complete
Quit
Spell-checking Kindle 3 is a good first attempt using hunspell with british+accs dictionary...
Spell-checking region using hunspell with british+accs dictionary...done
ispell-process-line: Ispell misalignment: word `Feedbooks' point 1363; probably incompatible versions

Load-path shadows:
/usr/local/share/emacs/23.2.91/site-lisp/auctex/tex-style hides /usr/share/emacs/site-lisp/auctex/tex-style
/usr/local/share/emacs/23.2.91/site-lisp/auctex/tex-buf hides /usr/share/emacs/site-lisp/auctex/tex-buf
/usr/local/share/emacs/23.2.91/site-lisp/auctex/context hides /usr/share/emacs/site-lisp/auctex/context
/usr/local/share/emacs/23.2.91/site-lisp/auctex/bib-cite hides /usr/share/emacs/site-lisp/auctex/bib-cite
/usr/local/share/emacs/23.2.91/site-lisp/auctex/tex-fold hides /usr/share/emacs/site-lisp/auctex/tex-fold
/usr/local/share/emacs/23.2.91/site-lisp/auctex/tex-jp hides /usr/share/emacs/site-lisp/auctex/tex-jp
/usr/local/share/emacs/23.2.91/site-lisp/auctex/context-nl hides /usr/share/emacs/site-lisp/auctex/context-nl
/usr/local/share/emacs/23.2.91/site-lisp/auctex/toolbar-x hides /usr/share/emacs/site-lisp/auctex/toolbar-x
/usr/local/share/emacs/23.2.91/site-lisp/auctex/tex-mik hides /usr/share/emacs/site-lisp/auctex/tex-mik
/usr/local/share/emacs/23.2.91/site-lisp/auctex/context-en hides /usr/share/emacs/site-lisp/auctex/context-en
/usr/local/share/emacs/23.2.91/site-lisp/auctex/texmathp hides /usr/share/emacs/site-lisp/auctex/texmathp
/usr/local/share/emacs/23.2.91/site-lisp/auctex/tex-info hides /usr/share/emacs/site-lisp/auctex/tex-info
/usr/local/share/emacs/23.2.91/site-lisp/auctex/tex-fptex hides /usr/share/emacs/site-lisp/auctex/tex-fptex
/usr/local/share/emacs/23.2.91/site-lisp/auctex/tex-font hides /usr/share/emacs/site-lisp/auctex/tex-font
/usr/local/share/emacs/23.2.91/site-lisp/auctex/latex hides /usr/share/emacs/site-lisp/auctex/latex
/usr/local/share/emacs/23.2.91/site-lisp/auctex/font-latex hides /usr/share/emacs/site-lisp/auctex/font-latex
/usr/local/share/emacs/23.2.91/site-lisp/auctex/tex-bar hides /usr/share/emacs/site-lisp/auctex/tex-bar
/usr/local/share/emacs/23.2.91/site-lisp/auctex/multi-prompt hides /usr/share/emacs/site-lisp/auctex/multi-prompt
/usr/local/share/emacs/23.2.91/site-lisp/auctex/tex hides /usr/share/emacs/site-lisp/auctex/tex

Features:
(shadow sort mail-extr message sendmail ecomplete rfc822 mml mml-sec
password-cache mm-decode mm-bodies mm-encode mailcap mail-parse rfc2231
rfc2047 rfc2045 qp ietf-drums mailabbrev nnheader gnus-util netrc
time-date mm-util mail-prsvr gmm-utils wid-edit mailheader canlock sha1
hex-util hashcash mail-utils emacsbug preview prv-emacs byte-opt
warnings tex-buf noutline outline font-latex bytecomp byte-compile latex
tex-style tex nxml-uchnm rng-xsd xsd-regexp rng-cmpct rng-nxml rng-valid
rng-loc rng-uri rng-parse nxml-parse rng-match rng-dt rng-util rng-pttrn
nxml-ns nxml-mode nxml-outln nxml-rap nxml-util nxml-glyph nxml-enc
xmltok sgml-mode conf-mode newcomment make-mode vc-git cperl-mode
longlines face-remap filladapt flyspell auto-dictionary-autoloads
dictionary-autoloads js2-mode-autoloads package reporter completing-help
ff-paths uniquify paren savehist minibuf-eldef iswitchb icomplete
autorevert time cus-start cus-load desktop server change-mode advice
help-fns advice-preload php-mode derived etags cc-langs cl cl-19 cc-mode
cc-fonts cc-menus cc-cmds cc-styles cc-align cc-engine cc-vars cc-defs
speedbar sb-image ezimage dframe easymenu assoc lua-mode regexp-opt
comint ring whitespace etags-update smart-quotes edmacro kmacro ispell
ffap muse-autoloads emacs-goodies-el emacs-goodies-custom
emacs-goodies-loaddefs easy-mmode devhelp preview-latex tex-site
auto-loads tooltip ediff-hook vc-hooks lisp-float-type mwheel x-win
x-dnd font-setting tool-bar dnd fontset image fringe lisp-mode register
page menu-bar rfn-eshadow timer select scroll-bar mldrag mouse jit-lock
font-lock syntax facemenu font-core frame cham georgian utf-8-lang
misc-lang vietnamese tibetan thai tai-viet lao korean japanese hebrew
greek romanian slovak czech european ethiopic indian cyrillic chinese
case-table epa-hook jka-cmpr-hook help simple abbrev loaddefs button
minibuffer faces cus-face files text-properties overlay md5 base64
format env code-pages mule custom widget hashtable-print-readable
backquote make-network-process dbusbind system-font-setting
font-render-setting gtk x-toolkit x multi-tty emacs)

-- 
http://rrt.sc3d.org/




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#7781; Package emacs. (Fri, 07 Jan 2011 13:07:01 GMT)

Message #8 received at 7781 <at> debbugs.gnu.org:

From: Agustin Martin <agustin.martin <at> hispalinux.es>
To: Reuben Thomas <rrt <at> sc3d.org>
Cc: 7781 <at> debbugs.gnu.org
Subject: Re: bug#7781: 23.2.91; ispell problem with hunspell and UTF-8 file
Date: Fri, 7 Jan 2011 14:14:03 +0100
2011/1/4 Reuben Thomas <rrt <at> sc3d.org>:
> With the following text, and using emacs -Q, I get the errors you can
> see in the messages log below when using hunspell to spell-check a UTF-8
> buffer with some extended characters in it.
>
> I did test this with emacs -Q, but the current session, in which I
> reproduced the problem and am now composing this bug report, was not
> started with -Q (this is so submitting the bug report works properly!).
>
> I am running a freshly bzr-pulled build of the emacs-23 branch.

Hi, Reuben,

I can also reproduce this with Emacs 23.2. After splitting the original
lines, I could locate the problems in two of them:

-- Cut here -- 8< ----- minimal.txt: utf-8
of out-of-copyright works. The Kindle may be a loss leader, but at £109
it’s still not cheap. Feedbooks, rather than integrating easily into
-- Cut here -- 8< ----- End of minimal.txt

In the first line, the pound sign (£) seems to cause conversion errors
when iso-8859-1 is used, although hunspell should have ignored it. I get
tons of

UTF-8 encoding error. Missing continuation byte in 0. character position:

for that line when using

$ cat minimal.txt | hunspell -d en_US -a -i iso-8859-1
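
The message above refers to the structure of UTF-8: a lead byte announces how many continuation bytes (of the form 10xxxxxx) must follow it, and "missing continuation byte" means one of them is absent. The C++ sketch below only illustrates that rule; it is not hunspell's own validation code, and the exact bytes hunspell saw in this test are not shown in the report. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Number of continuation bytes announced by a UTF-8 lead byte, or -1 if the
// byte cannot start a character (a stray 10xxxxxx byte or 0xF8..0xFF).
int utf8_expected_continuations(unsigned char lead) {
    if (lead < 0x80) return 0;            // 0xxxxxxx: ASCII
    if ((lead & 0xE0) == 0xC0) return 1;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 2;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 3;  // 11110xxx
    return -1;
}

// Returns the byte index of the first lead byte that is not followed by all
// the continuation bytes it announces, or std::string::npos if none is found.
// Other errors (stray continuation bytes, overlong forms) are skipped here.
std::size_t find_missing_continuation(const std::string& bytes) {
    std::size_t i = 0;
    while (i < bytes.size()) {
        int need = utf8_expected_continuations(static_cast<unsigned char>(bytes[i]));
        if (need <= 0) {  // ASCII, or a byte that is not a lead byte
            ++i;
            continue;
        }
        for (int k = 1; k <= need; ++k) {
            if (i + k >= bytes.size() ||
                (static_cast<unsigned char>(bytes[i + k]) & 0xC0) != 0x80) {
                return i;
            }
        }
        i += need + 1;
    }
    return std::string::npos;
}
```

For example, a truncated UTF-8 `£` (the lead byte `0xC2` followed directly by `1`) is reported at index 0, while an ISO-8859-1 `£` (the single byte `0xA3`) is a stray continuation byte, a different error that this sketch deliberately ignores.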

In the second line, the unusual apostrophe seems to confuse hunspell
when UTF-8 is used. Comparing what aspell and hunspell give for
similar text, I get

$ cat minimal.txt | aspell --encoding=utf-8 -d en_US -a
& Feedbooks 6 22: Feed books, Feed-books, Feedback's, Feedbags, ...

$ cat minimal.txt | hunspell -d en_US -i utf-8 -a
& Feedbooks 8 24: Feed books, Feed-books, Feedback, Feedbags, ...

Do not worry about the first number; it is the number of suggestions.
However, the position (the second number) differs. It seems that
hunspell is not counting that apostrophe as a single (multibyte)
character, but as three components (its three UTF-8 bytes).
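
The two offsets can be checked by hand. In UTF-8 every byte except a continuation byte (10xxxxxx) starts a new character, so a byte offset becomes a character offset by counting the non-continuation bytes before it. A minimal C++ sketch, assuming valid UTF-8 input; `utf8_char_offset` is a hypothetical name:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Converts a byte offset into a UTF-8 string to a character offset by
// counting the bytes before it that are not continuation bytes (10xxxxxx).
// Assumes the text is valid UTF-8 and byte_offset lies on a character start.
std::size_t utf8_char_offset(const std::string& text, std::size_t byte_offset) {
    assert(byte_offset <= text.size());
    std::size_t chars = 0;
    for (std::size_t i = 0; i < byte_offset; ++i) {
        unsigned char b = static_cast<unsigned char>(text[i]);
        if ((b & 0xC0) != 0x80) {
            ++chars;  // an ASCII byte or a lead byte starts a character
        }
    }
    return chars;
}
```

Applied to the second line of minimal.txt, `Feedbooks` starts at byte 24 but at character 22: the three-byte apostrophe in `it’s` accounts for the difference of two, matching the hunspell and aspell offsets above.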

It looks like a hunspell bug to me. I found no reference to this problem
on the hunspell SourceForge site, but noticed that Hunspell 1.2.14 was
released yesterday. I need to check whether it contains a related fix.

-- 
Agustin




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#7781; Package emacs. (Fri, 07 Jan 2011 14:24:02 GMT)

Message #11 received at 7781 <at> debbugs.gnu.org:

From: Reuben Thomas <rrt <at> sc3d.org>
To: Agustin Martin <agustin.martin <at> hispalinux.es>
Cc: 7781 <at> debbugs.gnu.org
Subject: Re: bug#7781: 23.2.91; ispell problem with hunspell and UTF-8 file
Date: Fri, 7 Jan 2011 14:30:37 +0000
Thanks very much for your investigation, Agustin.

I tried hunspell 1.2.14 and got exactly the same error.




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#7781; Package emacs. (Fri, 11 Feb 2011 16:53:01 GMT)

Message #14 received at 7781 <at> debbugs.gnu.org:

From: Agustin Martin <agustin.martin <at> hispalinux.es>
To: Reuben Thomas <rrt <at> sc3d.org>
Cc: 7781 <at> debbugs.gnu.org
Subject: Re: bug#7781: 23.2.91; ispell problem with hunspell and UTF-8 file
Date: Fri, 11 Feb 2011 18:00:53 +0100
forwarded 7781 https://sourceforge.net/tracker/?func=detail&aid=3178449&group_id=143754&atid=756395
thanks

2011/1/7 Agustin Martin <agustin.martin <at> hispalinux.es>:
> 2011/1/4 Reuben Thomas <rrt <at> sc3d.org>:
>> With the following text, and using emacs -Q, I get the errors you can
>> see in the messages log below when using hunspell to spell-check a UTF-8
>> buffer with some extended characters in it.

> Do not worry about the first number; it is the number of suggestions.
> However, the position (the second number) differs. It seems that
> hunspell is not counting that apostrophe as a single (multibyte)
> character, but as three components (its three UTF-8 bytes).
>
> It looks like a hunspell bug to me. I found no reference to this problem
> on the hunspell SourceForge site, but noticed that Hunspell 1.2.14 was
> released yesterday. I need to check whether it contains a related fix.

Opened a hunspell bug report for the bad count problem:

https://sourceforge.net/tracker/?func=detail&aid=3178449&group_id=143754&atid=756395

It seems I no longer see the other problem.

Cheers,

-- 
Agustin




Set bug forwarded-to-address to 'https://sourceforge.net/tracker/?func=detail&aid=3178449&group_id=143754&atid=756395'. Request was from Agustin Martin <agustin.martin <at> hispalinux.es> to control <at> debbugs.gnu.org. (Fri, 11 Feb 2011 16:53:02 GMT)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#7781; Package emacs. (Sun, 01 Jan 2012 21:45:02 GMT)

Message #19 received at 7781 <at> debbugs.gnu.org:

From: Richard Wordingham <richard.wordingham <at> ntlworld.com>
To: 7781 <at> debbugs.gnu.org
Subject: ispell problem with hunspell and UTF-8 file (and other, related hunspell problems)
Date: Sun, 1 Jan 2012 21:42:30 +0000
Those who want to compile a bug fix in Hunspell for themselves can find
fixes (based on Hunspell 1.2.8 and Emacs 23) for spell-checking
word-separated Thai in UTF-8 from Emacs at
http://homepage.ntlworld.com/richard.wordingham/thai/hunspell-1.2.8-jrw1.1.zip
The byte vs. character count problem was just one of the problems met and resolved. The full list is:

On Hunspell:

Bad UTF-8 char count in pipe mode - ID: 3178449
No Encoding of Word for Suggestions in Piped Mode
(https://sourceforge.net/tracker/?func=detail&aid=3468022&group_id=143754&atid=756395)
Multidictionary guesses dictionary for suggestions
(https://sourceforge.net/tracker/?func=detail&aid=3468039&group_id=143754&atid=756395)
Hunspell 1.2.8 Groups Thai TIS-620 Chars in Lower/Upper Case Pairs
(https://bugs.launchpad.net/ubuntu/+source/hunspell/+bug/910452) (fixed
in Release 1.2.14)

On the Thai dictionary:

th_TH Affix File Inadequate for Hunspell
(https://bugs.launchpad.net/ubuntu/+source/openoffice.org-dictionaries/+bug/910447)

There is also a problem with the size of the window holding corrections
in Thai (probably depending on the choice of font); adding
(fit-window-to-buffer) at the appropriate point in ispell.el (as in the
zip file) fixes that.

Richard.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#7781; Package emacs. (Sat, 13 Apr 2013 23:46:02 GMT)

Message #22 received at 7781 <at> debbugs.gnu.org:

From: Николай Сущенко <sckol <at> yandex.ru>
To: 7781 <at> debbugs.gnu.org
Subject: [PATCH] Fix ispell problem with hunspell and UTF-8 file
Date: Sat, 13 Apr 2013 23:12:38 +0400
As far as I can see, the hunspell team hasn't fixed the bug in more
than 2 years. Maybe for them it is not a bug but a feature.

The problem is that hunspell reports byte positions instead of
character positions for multibyte input, while Emacs expects
character positions. With the attached patch I propose to do the
conversion in the ispell-parse-output function.
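
Why a byte position trips Emacs can be modelled in a few lines. Emacs indexes buffer text by characters, so once a multibyte character precedes a word, the word no longer appears at the byte offset hunspell reports, and ispell.el stops with the "Ispell misalignment" error seen in the original report. The C++ sketch below is a simplified model of that kind of consistency check, not the Lisp code in ispell.el; `decode_utf8` and `aligned` are hypothetical names, and the decoder assumes valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Decodes valid UTF-8 into code points (no error reporting).
std::u32string decode_utf8(const std::string& s) {
    std::u32string out;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        int len = 1;
        char32_t cp = b;
        if ((b & 0xE0) == 0xC0) { len = 2; cp = b & 0x1F; }
        else if ((b & 0xF0) == 0xE0) { len = 3; cp = b & 0x0F; }
        else if ((b & 0xF8) == 0xF0) { len = 4; cp = b & 0x07; }
        // Append the 6 payload bits of each continuation byte.
        for (int k = 1; k < len && i + k < s.size(); ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}

// True if `word` appears at character offset `char_offset` in `line`,
// i.e. the reported position lines up with the text as Emacs sees it.
bool aligned(const std::string& line, const std::string& word, std::size_t char_offset) {
    const std::u32string chars = decode_utf8(line);
    const std::u32string w = decode_utf8(word);
    return char_offset + w.size() <= chars.size() &&
           chars.compare(char_offset, w.size(), w) == 0;
}
```

On the line `it’s still not cheap. Feedbooks, rather than`, `Feedbooks` lines up at character offset 22 but not at 24, the byte offset an unpatched hunspell reports.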

Thanks,
Nikolay Suschenko
[ispell.el.patch (text/plain, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#7781; Package emacs. (Sun, 14 Apr 2013 05:47:02 GMT)

Message #25 received at 7781 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Николай Сущенко <sckol <at> yandex.ru>
Cc: 7781 <at> debbugs.gnu.org
Subject: Re: bug#7781: [PATCH] Fix ispell problem with hunspell and UTF-8 file
Date: Sun, 14 Apr 2013 08:42:11 +0300
> Date: Sat, 13 Apr 2013 23:12:38 +0400
> From: Николай Сущенко <sckol <at> yandex.ru>
> 
> As far as I can see, the hunspell team hasn't fixed the bug in more
> than 2 years. Maybe for them it is not a bug but a feature.

The Hunspell bug resolution process could use some speeding up.

> The problem is that hunspell reports byte positions instead of
> character positions for multibyte input, while Emacs expects
> character positions. With the attached patch I propose to do the
> conversion in the ispell-parse-output function.

Sorry, no.  I tried that initially, but this work-around has problems
(don't remember the details, though).

It is much better to rebuild Hunspell with this bug fixed.  I can give
you a patch for that if you need it (I think there's a patch in the
bug database as well).  I fixed my hunspell long ago, and never looked
back.  Or ask your distribution's maintainers to release a fixed
hunspell package.

Thanks.





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#7781; Package emacs. (Sun, 14 Apr 2013 06:39:01 GMT)

Message #28 received at 7781 <at> debbugs.gnu.org:

From: Николай Сущенко <sckol <at> yandex.ru>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 7781 <at> debbugs.gnu.org
Subject: Re: bug#7781: [PATCH] Fix ispell problem with hunspell and UTF-8 file
Date: Sun, 14 Apr 2013 10:33:39 +0400
Hi, Eli

Please send me this patch; I'll ask the hunspell developers to include it.
Could you also recall which concrete problems this workaround causes?
It works fine for me, but I haven't tested it with different languages and
encodings. If there are problems, I could try to fix them, but as of
now, Emacs doesn't work with hunspell+UTF-8 at all, at least on
Slackware and Arch.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#7781; Package emacs. (Sun, 14 Apr 2013 07:13:02 GMT)

Message #31 received at 7781 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Николай Сущенко <sckol <at> yandex.ru>
Cc: 7781 <at> debbugs.gnu.org
Subject: Re: bug#7781: [PATCH] Fix ispell problem with hunspell and UTF-8 file
Date: Sun, 14 Apr 2013 10:08:42 +0300
> Date: Sun, 14 Apr 2013 10:33:39 +0400
> From: Николай Сущенко <sckol <at> yandex.ru>
> CC: 7781 <at> debbugs.gnu.org
> 
> Please send me this patch; I'll ask the hunspell developers to include it.

Attached.  This is a small part of a much larger patch, most of it for
Windows-specific problems.  If you have problems compiling the patched
hunspell, let me know: it could be that I omitted some hunk that is
needed for this part.

> Could you also recall which concrete problems this workaround causes?
> It works fine for me, but I haven't tested it with different languages and
> encodings.

One problem is that you assume the encoding of the communications with
hunspell is UTF-8, and thus matches the internal representation of
text in Emacs buffers and strings (only then will byte-to-position
give correct results).  But that assumption is false: hunspell
supports any encoding that it can convert to/from UTF-8 (it uses
libiconv internally).  The "usual" choice of the encoding is the one
used by the dictionary.  Not every dictionary out there is in UTF-8.

> If there are problems, I could try to fix them

I don't think you can fix this on the Emacs side, because Emacs cannot
easily and/or quickly convert between bytes and characters in an
arbitrary multibyte encoding.

When I discovered this problem, I also tried fixing it on the Emacs
side first, but then I realized that this kind of solution has too
many problems, and instead fixed it in hunspell.

--- src/tools/hunspell.cxx~0	2011-01-21 19:01:29.000000000 +0200
+++ src/tools/hunspell.cxx	2013-02-07 10:11:54.443610900 +0200
@@ -710,13 +748,22 @@ if (pos >= 0) {
 			fflush(stdout);
 		} else {
 			char ** wlst = NULL;
-			int ns = pMS[d]->suggest(&wlst, token);
+			int byte_offset = parser->get_tokenpos() + pos;
+			int char_offset = 0;
+			if (strcmp(io_enc, "UTF-8") == 0) {
+				for (int i = 0; i < byte_offset; i++) {
+					if ((buf[i] & 0xc0) != 0x80)
+						char_offset++;
+				}
+			} else {
+				char_offset = byte_offset;
+			}
+			int ns = pMS[d]->suggest(&wlst, chenc(token, io_enc, dic_enc[d]));
 			if (ns == 0) {
-		    		fprintf(stdout,"# %s %d", token,
-		    		    parser->get_tokenpos() + pos);
+		    		fprintf(stdout,"# %s %d", token, char_offset);
 			} else {
 				fprintf(stdout,"& %s %d %d: ", token, ns,
-				    parser->get_tokenpos() + pos);
+					char_offset);
 				fprintf(stdout,"%s", chenc(wlst[0], dic_enc[d], io_enc));
 			}
 			for (int j = 1; j < ns; j++) {
@@ -745,13 +792,23 @@ if (pos >= 0) {
 			if (root) free(root);
 		} else {
 			char ** wlst = NULL;
+			int byte_offset = parser->get_tokenpos() + pos;
+			int char_offset = 0;
+			if (strcmp(io_enc, "UTF-8") == 0) {
+				for (int i = 0; i < byte_offset; i++) {
+					if ((buf[i] & 0xc0) != 0x80)
+						char_offset++;
+				}
+			} else {
+				char_offset = byte_offset;
+			}
 			int ns = pMS[d]->suggest(&wlst, chenc(token, io_enc, dic_enc[d]));
 			if (ns == 0) {
 		    		fprintf(stdout,"# %s %d", chenc(token, io_enc, ui_enc),
-		    		    parser->get_tokenpos() + pos);
+		    		    char_offset);
 			} else {
 				fprintf(stdout,"& %s %d %d: ", chenc(token, io_enc, ui_enc), ns,
-				    parser->get_tokenpos() + pos);
+				    char_offset);
 				fprintf(stdout,"%s", chenc(wlst[0], dic_enc[d], ui_enc));
 			}
 			for (int j = 1; j < ns; j++) {
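
The two reply forms printed by this code, `# word offset` when there are no suggestions and `& word count offset: s1, s2, ...` when there are, are what the Emacs side has to read back (Nikolay's message names `ispell-parse-output` as the place where that happens). Below is a hedged C++ sketch of such a parser; `Miss` and `parse_miss` are hypothetical names, and it assumes suggestions are separated by `, ` as in the aspell and hunspell output quoted earlier in the thread. With the patch applied the parsed offset is a character offset; with an unpatched hunspell it is a byte offset.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// One misspelling reported by the speller in pipe mode.
struct Miss {
    std::string word;
    std::size_t offset = 0;
    std::vector<std::string> suggestions;
};

// Parses "# word offset" or "& word count offset: s1, s2, ...".
// Returns false for other reply lines (e.g. "*" for a correct word) and when
// the number of suggestions does not match the announced count.
bool parse_miss(const std::string& line, Miss& out) {
    std::istringstream in(line);
    char tag = 0;
    if (!(in >> tag) || (tag != '#' && tag != '&')) return false;
    out = Miss{};
    std::size_t count = 0;
    if (!(in >> out.word)) return false;
    if (tag == '&' && !(in >> count)) return false;
    if (!(in >> out.offset)) return false;
    if (tag == '#') return true;
    if (in.get() != ':') return false;
    std::string item;
    while (std::getline(in, item, ',')) {
        const std::size_t start = item.find_first_not_of(' ');
        if (start != std::string::npos) {
            out.suggestions.push_back(item.substr(start));
        }
    }
    return out.suggestions.size() == count;
}
```

A reply such as `& Feedbooks 3 24: Feed books, Feed-books, Feedback` (shortened from the hunspell output quoted earlier so that the count matches the list) yields the word `Feedbooks`, offset 24 and three suggestions.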





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#7781; Package emacs. (Sat, 20 Apr 2013 18:49:01 GMT)

Message #34 received at 7781 <at> debbugs.gnu.org:

From: Николай Сущенко <sckol <at> yandex.ru>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 7781 <at> debbugs.gnu.org
Subject: Re: bug#7781: [PATCH] Fix ispell problem with hunspell and UTF-8 file
Date: Sat, 20 Apr 2013 22:43:19 +0400
Thank you; this patch worked well for me. However, somebody has already
proposed another patch:
https://sourceforge.net/tracker/?func=detail&aid=3610147&group_id=143754&atid=756397




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#7781; Package emacs. (Sun, 27 Apr 2014 21:31:02 GMT)

Message #37 received at 7781 <at> debbugs.gnu.org:

From: Peter Münster <pmlists <at> free.fr>
To: 7781 <at> debbugs.gnu.org
Subject: hunspell and latex-mode
Date: Sun, 27 Apr 2014 23:30:25 +0200
Hi,

I'm using a patched hunspell
(http://sourceforge.net/p/hunspell/patches/57/) and it works well with
text-mode and message-mode. But unfortunately it does not work with
context-mode or latex-mode.

Example:

--8<---------------cut here---------------start------------->8---
\documentclass{article}
\begin{document}
bla
\end{document}
--8<---------------cut here---------------end--------------->8---

Running ispell fails with this error:

ispell-process-line: Ispell misalignment: word `bla' point 41; probably incompatible versions

Do you know a solution?

I'm using bzr emacs and git auctex.

TIA for any help,
-- 
           Peter




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#7781; Package emacs. (Mon, 28 Apr 2014 15:38:02 GMT)

Message #40 received at 7781 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Peter Münster <pmlists <at> free.fr>
Cc: 7781 <at> debbugs.gnu.org
Subject: Re: bug#7781: hunspell and latex-mode
Date: Mon, 28 Apr 2014 18:37:01 +0300
> From: Peter Münster <pmlists <at> free.fr>
> Date: Sun, 27 Apr 2014 23:30:25 +0200
> 
> I'm using a patched hunspell
> (http://sourceforge.net/p/hunspell/patches/57/) and it works well with
> text-mode and message-mode. But unfortunately it does not work with
> context-mode or latex-mode.
> 
> Example:
> 
> --8<---------------cut here---------------start------------->8---
> \documentclass{article}
> \begin{document}
> bla
> \end{document}
> --8<---------------cut here---------------end--------------->8---
> 
> Running ispell fails with this error:
> 
> ispell-process-line: Ispell misalignment: word `bla' point 41; probably incompatible versions

I cannot reproduce this.  If I start "emacs -Q" and try spell-checking
your example (with Hunspell being the speller), it works just fine for
me: I get suggestions to replace "bla".  Same thing if I load AUCTeX
into "emacs -Q" (does AUCTeX even change anything about
spell-checking?).

Does this work for you in "emacs -Q"?  If so, I suggest reviewing your
customizations to look for those that somehow cause this.

If "emacs -Q" doesn't work either, please provide a detailed
reproduction recipe starting from "emacs -Q".




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#7781; Package emacs. (Mon, 28 Apr 2014 16:19:02 GMT)

Message #43 received at 7781 <at> debbugs.gnu.org:

From: Peter Münster <pmlists <at> free.fr>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 7781 <at> debbugs.gnu.org
Subject: Re: bug#7781: hunspell and latex-mode
Date: Mon, 28 Apr 2014 18:18:10 +0200
On Mon, Apr 28 2014, Eli Zaretskii wrote:

> I cannot reproduce this.  If I start "emacs -Q" and try spell-checking
> your example (with Hunspell being the speller), it works just fine for
> me: I get suggestions to replace "bla".  Same thing if I load AUCTeX
> into "emacs -Q" (does AUCTeX even change anything about
> spell-checking?).

Hi Eli,

It's not AUCTeX; I've just tested with plain latex-mode.


> If "emacs -Q" doesn't work either, please provide a detailed
> reproduction recipe starting from "emacs -Q".

Here is a reproduction recipe:

- create minimal latex file /tmp/test.tex
- start emacs:
  LANG=C emacs -Q --eval '(setq ispell-program-name "hunspell")' /tmp/test.tex
- M-x ispell

Here are more details about my system:

In GNU Emacs 24.4.50.2 (x86_64-suse-linux-gnu, GTK+ Version 3.10.4)
 of 2014-04-20 on micropit
Repository revision: 116996 dancol <at> dancol.org-20140420144613-8e4t4swlxauwl4w7
Windowing system distributor `The X.Org Foundation', version 11.0.11403901
System Description:	openSUSE 13.1 (Bottle) (x86_64)

Configured using:
 `configure --without-toolkit-scroll-bars'

Configured features:
XPM JPEG TIFF GIF PNG RSVG IMAGEMAGICK SOUND GPM DBUS GCONF GSETTINGS
NOTIFY LIBSELINUX LIBXML2 FREETYPE M17N_FLT LIBOTF XFT ZLIB

Important settings:
  value of $LANG: C
  value of $XMODIFIERS: @im=ibus
  locale-coding-system: nil

Major mode: LaTeX

Minor modes in effect:
  shell-dirtrack-mode: t
  tooltip-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Recent input:
M-x i s p <tab> <return> M-x r e p o r t - e m <tab> 
<return>

Recent messages:
For information about GNU Emacs and the GNU system, type C-h C-a.
Starting new Ispell process hunspell with default dictionary...
Spell-checking test.tex using hunspell with default dictionary...done
ispell-process-line: Ispell misalignment: word `bla' point 41; probably incompatible versions

Load-path shadows:
None found.

Features:
(shadow sort gnus-util mail-extr emacsbug message dired format-spec
rfc822 mml easymenu mml-sec mm-decode mm-bodies mm-encode mail-parse
rfc2231 mailabbrev gmm-utils mailheader sendmail rfc2047 rfc2045
ietf-drums mm-util help-fns mail-prsvr mail-utils ispell tex-mode
compile shell pcomplete comint ansi-color ring latexenc time-date
tooltip electric uniquify ediff-hook vc-hooks lisp-float-type mwheel
x-win x-dnd tool-bar dnd fontset image regexp-opt fringe tabulated-list
newcomment lisp-mode prog-mode register page menu-bar rfn-eshadow timer
select scroll-bar mouse jit-lock font-lock syntax facemenu font-core
frame cham georgian utf-8-lang misc-lang vietnamese tibetan thai
tai-viet lao korean japanese hebrew greek romanian slovak czech european
ethiopic indian cyrillic chinese case-table epa-hook jka-cmpr-hook help
simple abbrev minibuffer nadvice loaddefs button faces cus-face macroexp
files text-properties overlay sha1 md5 base64 format env code-pages mule
custom widget hashtable-print-readable backquote make-network-process
dbusbind gfilenotify dynamic-setting system-font-setting
font-render-setting move-toolbar gtk x-toolkit x multi-tty emacs)

Memory information:
((conses 16 87695 6922)
 (symbols 48 19137 0)
 (miscs 40 44 125)
 (strings 32 14709 4542)
 (string-bytes 1 418678)
 (vectors 16 10601)
 (vector-slots 8 389507 5806)
 (floats 8 67 64)
 (intervals 56 250 165)
 (buffers 960 13)
 (heap 1024 42866 735))

-- 
           Peter




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#7781; Package emacs. (Mon, 28 Apr 2014 16:49:02 GMT)

Message #46 received at 7781 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Peter Münster <pmlists <at> free.fr>
Cc: 7781 <at> debbugs.gnu.org
Subject: Re: bug#7781: hunspell and latex-mode
Date: Mon, 28 Apr 2014 19:48:00 +0300
> From: Peter Münster <pmlists <at> free.fr>
> Cc: 7781 <at> debbugs.gnu.org
> Date: Mon, 28 Apr 2014 18:18:10 +0200
> 
> - create minimal latex file /tmp/test.tex
> - start emacs:
>   LANG=C emacs -Q --eval '(setq ispell-program-name "hunspell")' /tmp/test.tex
> - M-x ispell

Works fine for me, sorry.

Maybe your Hunspell is not patched enough.  Mine has much more patches
than the one you mentioned.  Most of them are Windows-specific or
related to encoding/decoding non-ASCII characters, something that
doesn't sound relevant for your use case.  But who knows? you might
take a look at the file DIFFS in this archive, where you will find all
the changes I made to Hunspell:

  http://sourceforge.net/projects/ezwinports/files/hunspell-1.3.2-3-w32-src.zip/download

Or maybe wait for someone on Unix to try reproducing your recipe.

One other idea is to try spell-checking your sample file outside of
Emacs, maybe you will see something that will give some ideas.

Finally, are you sure the 'hunspell' executable Emacs finds on PATH is
indeed the one you intend?  (Try putting a full absolute file name
into ispell-program-name.)




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#7781; Package emacs. (Mon, 28 Apr 2014 17:18:02 GMT)

Message #49 received at 7781 <at> debbugs.gnu.org:

From: Peter Münster <pmlists <at> free.fr>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 7781 <at> debbugs.gnu.org
Subject: Re: bug#7781: hunspell and latex-mode
Date: Mon, 28 Apr 2014 19:17:36 +0200
On Mon, Apr 28 2014, Eli Zaretskii wrote:

> Maybe your Hunspell is not patched enough.

Perhaps.


> Mine has much more patches than the one you mentioned. Most of them
> are Windows-specific or related to encoding/decoding non-ASCII
> characters, something that doesn't sound relevant for your use case.
> But who knows? you might take a look at the file DIFFS in this
> archive, where you will find all the changes I made to Hunspell:
>
>   http://sourceforge.net/projects/ezwinports/files/hunspell-1.3.2-3-w32-src.zip/download

Indeed. I'll take a look when I have some more time.


> Or maybe wait for someone on Unix to try reproducing your recipe.

Yes, let's see.


> One other idea is to try spell-checking your sample file outside of
> Emacs, maybe you will see something that will give some ideas.

No. Here is the result:

--8<---------------cut here---------------start------------->8---
$ hunspell -a -d en_US -i UTF-8 /tmp/test.tex
@(#) International Ispell Version 3.2.06 (but really Hunspell 1.3.2)
& documentclass 8 1: document class, document-class, documentations, documentation, documents, documentary, underclassmen, underclassman
*

*
*

& bla 15 0: alb, bl, la, blat, bola, blag, blah, blab, lab, baa, bra, boa, Ila, Ala, Ola

*
*
--8<---------------cut here---------------end--------------->8---


> Finally, are you sure the 'hunspell' executable Emacs finds on PATH is
> indeed the one you intend?

Yes. And after switching to "M-x text-mode", there is no more problem.

-- 
           Peter




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#7781; Package emacs. (Mon, 28 Apr 2014 17:33:01 GMT)

Message #52 received at 7781 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Peter Münster <pmlists <at> free.fr>
Cc: 7781 <at> debbugs.gnu.org
Subject: Re: bug#7781: hunspell and latex-mode
Date: Mon, 28 Apr 2014 20:32:11 +0300
> From: Peter Münster <pmlists <at> free.fr>
> Cc: 7781 <at> debbugs.gnu.org
> Date: Mon, 28 Apr 2014 19:17:36 +0200
> 
> after switching to "M-x text-mode", there is no more problem.

Maybe you should activate the debugging code in ispell.el and see what
is being submitted to hunspell and what it returns.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#7781; Package emacs. (Mon, 28 Apr 2014 18:28:01 GMT)

Message #55 received at 7781 <at> debbugs.gnu.org:

From: Peter Münster <pmlists <at> free.fr>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 7781 <at> debbugs.gnu.org
Subject: Re: bug#7781: hunspell and latex-mode
Date: Mon, 28 Apr 2014 20:27:31 +0200
On Mon, Apr 28 2014, Eli Zaretskii wrote:

>> after switching to "M-x text-mode", there is no more problem.
>
> Maybe you should activate the debugging code in ispell.el and see what
> is being submitted to hunspell and what it returns.

Please find attached 2 debug-outputs, one with latex-mode and one with
text-mode. Both are created with `ispell-buffer-with-debug'.

Do you see what is going on there?

-- 
           Peter
[ispell-debug-latex.txt (text/plain, attachment)]
[ispell-debug-text.txt (text/plain, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#7781; Package emacs. (Tue, 29 Apr 2014 10:04:02 GMT)

Message #58 received at 7781 <at> debbugs.gnu.org:

From: Agustin Martin <agustin.martin <at> hispalinux.es>
To: Peter Münster <pmlists <at> free.fr>, 7781 <at> debbugs.gnu.org
Subject: Re: bug#7781: hunspell and latex-mode
Date: Tue, 29 Apr 2014 12:03:25 +0200
On Mon, Apr 28, 2014 at 06:18:10PM +0200, Peter Münster wrote:
> On Mon, Apr 28 2014, Eli Zaretskii wrote:
> 
> > I cannot reproduce this.  If I start "emacs -Q" and try spell-checking
> > your example (with Hunspell being the speller), it works just fine for
> > me: I get suggestions to replace "bla".  Same thing if I load AUCTeX
> > into "emacs -Q" (does AUCTeX even change anything about
> > spell-checking?).
> 
> Hi Eli,
> 
> It's not AUCTeX, I've just tested with normal latex-mode.
> 
> 
> > If "emacs -Q" doesn't work either, please provide a detailed
> > reproduction recipe starting from "emacs -Q".
> 
> Here a reproduction recipe:
> 
> - create minimal latex file /tmp/test.tex
> - start emacs:
>   LANG=C emacs -Q --eval '(setq ispell-program-name "hunspell")' /tmp/test.tex
> - M-x ispell
> 
> Here are more details about my system:
> 
> In GNU Emacs 24.4.50.2 (x86_64-suse-linux-gnu, GTK+ Version 3.10.4)
>  of 2014-04-20 on micropit

Cannot reproduce it here with emacs-snapshot 24.3.50.1 in Debian. What does
'ps -aux' show for hunspell call when run in xterm? 

-- 
Agustin




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#7781; Package emacs. (Tue, 29 Apr 2014 10:14:01 GMT)

Message #61 received at 7781 <at> debbugs.gnu.org:

From: Peter Münster <pmlists <at> free.fr>
To: Agustin Martin <agustin.martin <at> hispalinux.es>
Cc: 7781 <at> debbugs.gnu.org
Subject: Re: bug#7781: hunspell and latex-mode
Date: Tue, 29 Apr 2014 12:13:04 +0200
On Tue, Apr 29 2014, Agustin Martin wrote:

> Cannot reproduce it here with emacs-snapshot 24.3.50.1 in Debian. What does
> 'ps -aux' show for hunspell call when run in xterm? 

hunspell -a -d en_US -i UTF-8

-- 
           Peter




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#7781; Package emacs. (Tue, 29 Apr 2014 10:21:01 GMT)

Message #64 received at 7781 <at> debbugs.gnu.org:

From: Peter Münster <pmlists <at> free.fr>
To: Agustin Martin <agustin.martin <at> hispalinux.es>
Cc: 7781 <at> debbugs.gnu.org
Subject: Re: bug#7781: hunspell and latex-mode
Date: Tue, 29 Apr 2014 12:20:50 +0200
On Tue, Apr 29 2014, Agustin Martin wrote:

> Cannot reproduce it here with emacs-snapshot 24.3.50.1 in Debian.

Could you please send the ispell-debug buffer, created with
`ispell-buffer-with-debug'? Then we could compare it with mine. There
are perhaps differences.

-- 
           Peter




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#7781; Package emacs. (Tue, 29 Apr 2014 10:23:02 GMT)

Message #67 received at 7781 <at> debbugs.gnu.org:

From: Agustin Martin <agustin.martin <at> hispalinux.es>
To: Peter Münster <pmlists <at> free.fr>, 7781 <at> debbugs.gnu.org
Subject: Re: bug#7781: hunspell and latex-mode
Date: Tue, 29 Apr 2014 12:21:57 +0200
On Tue, Apr 29, 2014 at 12:13:04PM +0200, Peter Münster wrote:
> On Tue, Apr 29 2014, Agustin Martin wrote:
> 
> > Cannot reproduce it here with emacs-snapshot 24.3.50.1 in Debian. What does
> > 'ps -aux' show for hunspell call when run in xterm? 
> 
> hunspell -a -d en_US -i UTF-8

That is what is expected. I am clueless about this.

-- 
Agustin




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#7781; Package emacs. (Tue, 29 Apr 2014 10:41:02 GMT)

Message #70 received at 7781 <at> debbugs.gnu.org:

From: Agustin Martin <agustin.martin <at> hispalinux.es>
To: Peter Münster <pmlists <at> free.fr>, 7781 <at> debbugs.gnu.org
Subject: Re: bug#7781: hunspell and latex-mode
Date: Tue, 29 Apr 2014 12:39:49 +0200
On Tue, Apr 29, 2014 at 12:20:50PM +0200, Peter Münster wrote:
> On Tue, Apr 29 2014, Agustin Martin wrote:
> 
> > Cannot reproduce it here with emacs-snapshot 24.3.50.1 in Debian.
> 
> Could you please send the ispell-debug buffer, created with
> `ispell-buffer-with-debug'? Then we could compare it with mine. There
> are perhaps differences.

Please find it attached. Apart from the misalignment problem the only
difference seems to be that I have lots of dicts installed and the
~/.openoffice.org/ path.

-- 
Agustin
[ispell-debug-buffer-amd-7781.txt (text/plain, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#7781; Package emacs. (Tue, 29 Apr 2014 11:56:01 GMT)

Message #73 received at 7781 <at> debbugs.gnu.org:

From: Peter Münster <pmlists <at> free.fr>
To: Agustin Martin <agustin.martin <at> hispalinux.es>
Cc: 7781 <at> debbugs.gnu.org
Subject: Re: bug#7781: hunspell and latex-mode
Date: Tue, 29 Apr 2014 13:54:54 +0200
On Tue, Apr 29 2014, Agustin Martin wrote:

> Please find it attached. Apart from the misalignment problem the only
> difference seems to be that I have lots of dicts installed and the
> ~/.openoffice.org/ path.

There is probably not enough information in the debug buffer.

Could you please try this:

mv /usr/bin/hunspell /usr/bin/hunspell-orig

And create the file /usr/bin/hunspell with the following content:

--8<---------------cut here---------------start------------->8---
#!/bin/bash
tee /tmp/hunspell-input | hunspell-orig "$@" | tee /tmp/hunspell-output
--8<---------------cut here---------------end--------------->8---

This is what I get:

input:
--8<---------------cut here---------------start------------->8---
!
+
^bla
--8<---------------cut here---------------end--------------->8---

output:
--8<---------------cut here---------------start------------->8---
@(#) International Ispell Version 3.2.06 (but really Hunspell 1.3.2)
& bla 15 0: alb, bl, la, blat, bola, blag, blah, blab, lab, baa, bra, boa, Ila, Ala, Ola
--8<---------------cut here---------------end--------------->8---

I guess that you get "bla 15 1" because of the "^" before the "bla".

That would mean that my hunspell needs another patch. Which one,
please?

Thanks for your efforts,
-- 
           Peter




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#7781; Package emacs. (Tue, 29 Apr 2014 12:49:01 GMT)

Message #76 received at 7781 <at> debbugs.gnu.org:

From: Peter Münster <pmlists <at> free.fr>
To: Agustin Martin <agustin.martin <at> hispalinux.es>
Cc: 7781 <at> debbugs.gnu.org
Subject: Re: bug#7781: hunspell and latex-mode
Date: Tue, 29 Apr 2014 14:48:43 +0200
I've just tried unpatched hunspell: no problem with TeX-mode.
It's the patch on sf.net that breaks the TeX-mode, the character
position is always 0:
https://sourceforge.net/p/hunspell/patches/57/#d425

I'll build hunspell with Eli's patch now.

Sorry for the noise...

-- 
           Peter




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#7781; Package emacs. (Tue, 29 Apr 2014 13:58:01 GMT)

Message #79 received at 7781 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Peter Münster <pmlists <at> free.fr>
Cc: agustin.martin <at> hispalinux.es, 7781 <at> debbugs.gnu.org
Subject: Re: bug#7781: hunspell and latex-mode
Date: Tue, 29 Apr 2014 16:57:36 +0300
> From: Peter Münster <pmlists <at> free.fr>
> Date: Tue, 29 Apr 2014 14:48:43 +0200
> Cc: 7781 <at> debbugs.gnu.org
> 
> I've just tried unpatched hunspell: no problem with TeX-mode.
> It's the patch on sf.net that breaks the TeX-mode, the character
> position is always 0:
> https://sourceforge.net/p/hunspell/patches/57/#d425

That's what I thought.  If I invoke Hunspell like ispell.el does for a
LaTeX buffer, i.e.

  hunspell -a -d en_US -i UTF-8

and then type "^bla RET" into Hunspell, I get this as output:

  & bla 15 1: alb, bl, la, bola, blah, blab, lab, baa, ala, bra, boa, Ila, Ala, Ola, Ula

As you see, I get "15 1".  If you get 0 instead of 1, then that's the
cause of the problem, because the part of your debug output marked
below:

  ispell-process-line: Ispell misalignment error:
    [Word from ispell pipe]: [bla], actual (point,line,column): (41,2,16)
                                                                 ^^^^^^^
clearly shows that ispell.el is confused about where the word "bla"
begins in the buffer; the correct data is 42,3,0.  Also note that just
before reading Hunspell's output, ispell.el correctly identified both
the word and its location:

  ispell-region: string pos (42->45), eol: 45, [in-comment]: [nil], [add-comment]: [nil], [string]: [^bla
  ]
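The arithmetic behind the misalignment can be sketched as follows. This is a simplified model, not ispell.el's actual code: it assumes each line is sent to Hunspell with a leading "^", so the buffer position of a word is the position where the checked text starts, plus the offset Hunspell reports, minus one for the "^" that is not in the buffer. The helper name word_point is hypothetical; the values 42, 1 and 0 are taken from the debug output and Hunspell replies quoted in this thread.

```shell
#!/bin/sh
# Simplified model (an assumption, not ispell.el's code): ispell.el prefixes
# each line with "^", so Hunspell's offset counts one character that is not
# in the buffer.  word_point is a hypothetical helper name.
word_point() {
  line_start=$1   # buffer position where the checked text begins
  offset=$2       # offset from Hunspell's "& word count offset:" reply
  echo $((line_start + offset - 1))
}

word_point 42 1   # offset from an unpatched Hunspell: prints 42
word_point 42 0   # offset from the sf.net-patched Hunspell in TeX mode: prints 41
```

With offset 0 the computed position is 41, one character before "bla", which matches the "point 41" in the misalignment message.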

> I'll build hunspell with Eli's patch now.

I think that will solve the problem.

(I have no idea why visiting the same file in Text mode avoids the
problem.  The only difference is that in Text mode, ispell.el does not
skip the first 2 lines, but instead submits them to Hunspell.  Why
this makes the difference, I don't know, but probably the lone "^bla"
somehow triggers the bug in the patch you installed, whatever that bug
is.)




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#7781; Package emacs. (Tue, 29 Apr 2014 14:31:03 GMT)

Message #82 received at 7781 <at> debbugs.gnu.org:

From: Peter Münster <pmlists <at> free.fr>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: agustin.martin <at> hispalinux.es, 7781 <at> debbugs.gnu.org
Subject: Re: bug#7781: hunspell and latex-mode
Date: Tue, 29 Apr 2014 16:30:07 +0200
On Tue, Apr 29 2014, Eli Zaretskii wrote:

> (I have no idea why visiting the same file in Text mode avoids the
> problem.  The only difference is that in Text mode, ispell.el does not
> skip the first 2 lines, but instead submits them to Hunspell.

No. In latex-mode, emacs switches hunspell into TeX-mode with the "+".


> Why this makes the difference, I don't know, but probably the lone
> "^bla" somehow triggers the bug in the patch you installed, whatever
> that bug is.)

No. In normal mode, the "^bla" works fine. The patch on sf.net just
breaks the TeX-mode: every position becomes 0.

Your patch works nicely, thanks!

I should have tested hunspell on the command line before reporting the
problem. Now I know how to do that.
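A command-line check of this kind might look like the sketch below. It replays the three lines that ispell.el sent in the LaTeX buffer, as captured by the tee wrapper earlier in this thread ("!" for terse mode, "+" for TeX mode, then the checked line prefixed with "^"), and compares the offset in Hunspell's reply with the expected value. The helper name check_offset is hypothetical, and the parsing assumes the usual ispell pipe-mode reply formats "& word count offset: suggestions" and "# word offset".

```shell
#!/bin/sh
# Reads ispell pipe-mode output on stdin and checks the offset reported for
# WORD.  Replies are assumed to look like "& word count offset: ..." (with
# suggestions) or "# word offset" (no suggestions).
check_offset() {
  word=$1
  want=$2
  while IFS= read -r line; do
    case $line in
      "& $word "* | "# $word "*)
        head=${line%%:*}        # drop the suggestion list, if any
        set -- $head            # split the remaining fields
        if [ "$1" = "&" ]; then got=$4; else got=$3; fi
        if [ "$got" = "$want" ]; then
          echo ok
          return 0
        fi
        echo "offset $got, expected $want"
        return 1
        ;;
    esac
  done
  echo "no report for $word"
  return 1
}

# Usage (needs Hunspell and an en_US dictionary installed):
#   printf '!\n+\n^bla\n' | hunspell -a -d en_US -i UTF-8 | check_offset bla 1
```

Fed the reply "& bla 15 0: ..." shown earlier in this thread, the check prints "offset 0, expected 1".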

-- 
           Peter




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#7781; Package emacs. (Tue, 29 Apr 2014 15:26:01 GMT)

Message #85 received at 7781 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Peter Münster <pmlists <at> free.fr>
Cc: agustin.martin <at> hispalinux.es, 7781 <at> debbugs.gnu.org
Subject: Re: bug#7781: hunspell and latex-mode
Date: Tue, 29 Apr 2014 18:25:31 +0300
> From: Peter Münster <pmlists <at> free.fr>
> Cc: agustin.martin <at> hispalinux.es,  7781 <at> debbugs.gnu.org
> Date: Tue, 29 Apr 2014 16:30:07 +0200
> 
> On Tue, Apr 29 2014, Eli Zaretskii wrote:
> 
> > (I have no idea why visiting the same file in Text mode avoids the
> > problem.  The only difference is that in Text mode, ispell.el does not
> > skip the first 2 lines, but instead submits them to Hunspell.
> 
> No. In latex-mode, emacs switches hunspell into TeX-mode with the "+".

It does both, evidently.  Compare this part of your debug output (in
LaTeX buffer):

  ispell-region: First skip: \documentclass at (pos,line,column): (1,1,0).
  ispell-region: Continue spell-checking with hunspell and default dictionary...
  ispell-region: string pos (41->41), eol: 45, [in-comment]: [nil], [add-comment]: [nil], [string]: [nil]
  ispell-region: string pos (42->45), eol: 45, [in-comment]: [nil], [add-comment]: [nil], [string]: [^bla
  ]

with this (in Text buffer):

  ispell-region: string pos (1->24), eol: 24, [in-comment]: [nil], [add-comment]: [nil], [string]: [^\documentclass{article}
  ]
  ispell-region: string pos (24->24), eol: 41, [in-comment]: [nil], [add-comment]: [nil], [string]: [nil]
  ispell-region: string pos (25->41), eol: 41, [in-comment]: [nil], [add-comment]: [nil], [string]: [^\begin{document}
  ]
  ispell-region: string pos (41->41), eol: 45, [in-comment]: [nil], [add-comment]: [nil], [string]: [nil]
  ispell-region: string pos (42->45), eol: 45, [in-comment]: [nil], [add-comment]: [nil], [string]: [^bla
  ]
  ispell-region: string pos (45->45), eol: 60, [in-comment]: [nil], [add-comment]: [nil], [string]: [nil]
  ispell-region: string pos (46->60), eol: 60, [in-comment]: [nil], [add-comment]: [nil], [string]: [^\end{document}
  ]

As you see, in the second case, the TeX directives are also sent to
Hunspell for checking, while in the first case they are not.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#7781; Package emacs. (Tue, 29 Apr 2014 16:35:01 GMT)

Message #88 received at 7781 <at> debbugs.gnu.org:

From: Peter Münster <pmlists <at> free.fr>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: agustin.martin <at> hispalinux.es, 7781 <at> debbugs.gnu.org
Subject: Re: bug#7781: hunspell and latex-mode
Date: Tue, 29 Apr 2014 18:34:07 +0200
On Tue, Apr 29 2014, Eli Zaretskii wrote:

>> > The only difference is that in Text mode, ispell.el does not skip
>> > the first 2 lines, but instead submits them to Hunspell.
>> 
>> No. In latex-mode, emacs switches hunspell into TeX-mode with the "+".
>
> It does both, evidently.  Compare this part of your debug output (in
> LaTeX buffer):

Sorry. I just wanted to say: "No, it's not the *only* difference." ... ;)

-- 
           Peter




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#7781; Package emacs. (Thu, 25 Sep 2014 09:55:02 GMT)

Message #91 received at 7781 <at> debbugs.gnu.org:

From: Reuben Thomas <rrt <at> sc3d.org>
To: 7781 <at> debbugs.gnu.org
Subject: Bug still present in hunspell 1.3.3; Eli's patch still works
Date: Thu, 25 Sep 2014 10:54:05 +0100
I have sent a message to the upstream maintainer informing him of the
situation and asking for the patch to be included in the next release.

-- 
http://rrt.sc3d.org

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#7781; Package emacs. (Thu, 16 Oct 2014 13:38:01 GMT)

Message #94 received at 7781 <at> debbugs.gnu.org:

From: Agustin Martin <agustin6martin <at> gmail.com>
To: Reuben Thomas <rrt <at> sc3d.org>, 7781 <at> debbugs.gnu.org
Subject: Re: bug#7781: 23.2.91; ispell problem with hunspell and UTF-8 file
Date: Thu, 16 Oct 2014 15:37:24 +0200
Control: tag 7781 + upstream fixed-upstream

On Fri, Feb 11, 2011 at 06:00:53PM +0100, Agustin Martin wrote:
> forwarded 7781 https://sourceforge.net/tracker/?func=detail&aid=3178449&group_id=143754&atid=756395
> thanks
> 
> 2011/1/7 Agustin Martin <agustin.martin <at> hispalinux.es>:
> > 2011/1/4 Reuben Thomas <rrt <at> sc3d.org>:
> >> With the following text, and using emacs -Q, I get the errors you can
> >> see in the messages log below when using hunspell to spell-check a UTF-8
> >> buffer with some extended characters in it.
> 
> > Do not worry about first number, is the number of suggestions. However
> > position in second number differ. Seems that hunspell is not
> > considering that apostrophe as a single (multibyte) char when
> > counting, but as three components
> >
> > Looks to me an hunspell bug. I found no reference to this problem in
> > hunspell sf site, but noticed that Hunspell 1.2.14 was released
> > yesterday. Need to check if that has some related new.
> 
> Opened an hunspell  bug report for bad count problem
> 
> https://sourceforge.net/tracker/?func=detail&aid=3178449&group_id=143754&atid=756395

Reuben Thomas wrote:
> I have sent a message to the upstream maintainer informing him of the
> situation and asking for the patch to be included in the next release.

Proposed patch has been integrated in hunspell upstream by caolan mcnamara.

Regards,

PS: My old hispalinux.es address is failing silently and I do not know if I will
ever be able to get it fixed. Please use my current gmail address for replies.

-- 
Agustin




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#7781; Package emacs. (Thu, 16 Oct 2014 13:55:01 GMT)

Message #97 received at 7781 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Agustin Martin <agustin6martin <at> gmail.com>
Cc: 7781 <at> debbugs.gnu.org, rrt <at> sc3d.org
Subject: Re: bug#7781: 23.2.91; ispell problem with hunspell and UTF-8 file
Date: Thu, 16 Oct 2014 16:54:16 +0300
> Date: Thu, 16 Oct 2014 15:37:24 +0200
> From: Agustin Martin <agustin6martin <at> gmail.com>
> 
> > Opened an hunspell  bug report for bad count problem
> > 
> > https://sourceforge.net/tracker/?func=detail&aid=3178449&group_id=143754&atid=756395
> 
> Reuben Thomas wrote:
> > I have sent a message to the upstream maintainer informing him of the
> > situation and asking for the patch to be included in the next release.
> 
> Proposed patch has been integrated in hunspell upstream by caolan mcnamara.

Do you mean there's now an official release of Hunspell with this bug
fixed?  If so, where can one find it?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#7781; Package emacs. (Thu, 16 Oct 2014 14:10:02 GMT)

Message #100 received at 7781 <at> debbugs.gnu.org:

From: Agustin Martin <agustin6martin <at> gmail.com>
To: rrt <at> sc3d.org, 7781 <at> debbugs.gnu.org
Subject: Re: bug#7781: 23.2.91; ispell problem with hunspell and UTF-8 file
Date: Thu, 16 Oct 2014 16:08:59 +0200
On Thu, Oct 16, 2014 at 04:54:16PM +0300, Eli Zaretskii wrote:
> > Date: Thu, 16 Oct 2014 15:37:24 +0200
> > From: Agustin Martin <agustin6martin <at> gmail.com>
> > 
> > > Opened an hunspell  bug report for bad count problem
> > > 
> > > https://sourceforge.net/tracker/?func=detail&aid=3178449&group_id=143754&atid=756395
> > 
> > Reuben Thomas wrote:
> > > I have sent a message to the upstream maintainer informing him of the
> > > situation and asking for the patch to be included in the next release.
> > 
> > Proposed patch has been integrated in hunspell upstream by caolan mcnamara.
> 
> Do you mean there's now an official release of Hunspell with this bug
> fixed?  If so, where can one find it?

I am afraid it only means that the fix has been pushed to the upstream VCS.

http://hunspell.cvs.sourceforge.net/viewvc/hunspell/hunspell/src/tools/hunspell.cxx?r1=1.60&r2=1.61

Another piece of good news is that this is not the only bug just handled:

http://sourceforge.net/p/hunspell/bugs/228/
[hunspell:bugs] #228 Some problems with Emacs and init string in pipe mode

has been changed to closed-accepted and pushed to the repo (r1.62).

Regards,

-- 
Agustin




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#7781; Package emacs. (Fri, 28 Aug 2020 12:01:01 GMT)

Message #103 received at 7781 <at> debbugs.gnu.org:

From: Stefan Kangas <stefan <at> marxist.se>
To: Reuben Thomas <rrt <at> sc3d.org>
Cc: 7781 <at> debbugs.gnu.org
Subject: Re: bug#7781: 23.2.91; ispell problem with hunspell and UTF-8 file
Date: Fri, 28 Aug 2020 05:00:11 -0700
Reuben Thomas <rrt <at> sc3d.org> writes:

> With the following text, and using emacs -Q, I get the errors you can
> see in the messages log below when using hunspell to spell-check a UTF-8
> buffer with some extended characters in it.
>
> I did test this with emacs -Q, but the current session, in which I
> reproduced the problem and am now composing this bug report, was not
> started with -Q (this is so submitting the bug report works properly!).
>
> I am running a freshly bzr-pulled build of the emacs-23 branch.
>
> Text follows

I tried this but couldn't reproduce the bug using current master and
Hunspell 1.7.0.  Having read the bug report, IIUC, this was a bug in
Hunspell and not in Emacs?

Are you still able to reproduce this using a recent Emacs and Hunspell?

If I don't hear back from you within a couple of weeks, I'll just
close this bug as unreproducible.

Best regards,
Stefan Kangas




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#7781; Package emacs. (Fri, 28 Aug 2020 12:37:02 GMT)

Message #106 received at 7781 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stefan Kangas <stefan <at> marxist.se>
Cc: 7781 <at> debbugs.gnu.org, rrt <at> sc3d.org
Subject: Re: bug#7781: 23.2.91; ispell problem with hunspell and UTF-8 file
Date: Fri, 28 Aug 2020 15:36:01 +0300
> From: Stefan Kangas <stefan <at> marxist.se>
> Date: Fri, 28 Aug 2020 05:00:11 -0700
> Cc: 7781 <at> debbugs.gnu.org
> 
> Reuben Thomas <rrt <at> sc3d.org> writes:
> 
> > With the following text, and using emacs -Q, I get the errors you can
> > see in the messages log below when using hunspell to spell-check a UTF-8
> > buffer with some extended characters in it.
> >
> > I did test this with emacs -Q, but the current session, in which I
> > reproduced the problem and am now composing this bug report, was not
> > started with -Q (this is so submitting the bug report works properly!).
> >
> > I am running a freshly bzr-pulled build of the emacs-23 branch.
> >
> > Text follows
> 
> I tried this but couldn't reproduce the bug using current master and
> Hunspell 1.7.0.  Having read the bug report, IIUC, this was a bug in
> Hunspell and not in Emacs?
> 
> Are you still able to reproduce this using a recent Emacs and Hunspell?

Some (old) versions of Hunspell had a bug, whereby the mis-spelled
words were reported with offsets in bytes, not in characters.  When
this happens, ispell.el reports "misalignment" errors.

I don't remember when (or even if) Hunspell fixed that problem (in the
version I use I fixed it myself), but if 1.7.0 has that problem fixed,
you will not see the problem.
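The difference between the two ways of counting can be illustrated with a short shell sketch, assuming UTF-8 text. A typographic apostrophe (U+2019) occupies three bytes in UTF-8 but is a single character, which fits Agustin's earlier description of an apostrophe being counted "as three components"; a speller that counts bytes would therefore report every later word on that line two positions too far. The functions byte_len, char_len and offsets and the sample line are hypothetical, and the character count deliberately does not depend on the current locale.

```shell
#!/bin/sh
# Compares the byte offset and the character offset of WORD within LINE.
# Assumes LINE is valid UTF-8.
byte_len() {
  n=$(printf '%s' "$1" | LC_ALL=C wc -c)
  echo $((n))
}
char_len() {
  # Each UTF-8 character has exactly one byte outside 0x80-0xBF, so deleting
  # the continuation bytes leaves one byte per character.
  n=$(printf '%s' "$1" | LC_ALL=C tr -d '\200-\277' | wc -c)
  echo $((n))
}
offsets() {
  line=$1
  word=$2
  prefix=${line%%"$word"*}   # everything before the first occurrence of WORD
  echo "bytes=$(byte_len "$prefix") chars=$(char_len "$prefix")"
}

# Hypothetical line containing U+2019 (bytes 0xE2 0x80 0x99).
apos=$(printf '\342\200\231')
offsets "^It${apos}s bla" bla   # prints bytes=8 chars=6
```

ispell.el expects the character offset (6 here); a Hunspell that counts bytes would report 8, and the word would be looked for two characters too far along the line.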




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#7781; Package emacs. (Fri, 28 Aug 2020 12:57:01 GMT)

Message #109 received at 7781 <at> debbugs.gnu.org:

From: Stefan Kangas <stefan <at> marxist.se>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 7781 <at> debbugs.gnu.org, rrt <at> sc3d.org
Subject: Re: bug#7781: 23.2.91; ispell problem with hunspell and UTF-8 file
Date: Fri, 28 Aug 2020 05:56:09 -0700
tags 7781 + notabug
close 7781
thanks

Eli Zaretskii <eliz <at> gnu.org> writes:

> Some (old) versions of Hunspell had a bug, whereby the mis-spelled
> words were reported with offsets in bytes, not in characters.  When
> this happens, ispell.el reports "misalignment" errors.
>
> I don't remember when (or even if) Hunspell fixed that problem (in the
> version I use I fixed it myself), but if 1.7.0 has that problem fixed,
> you will not see the problem.

Thanks, so this is not a bug in Emacs.  I'm therefore closing this bug report.

If this conclusion is incorrect, please reopen the bug report.

Best regards,
Stefan Kangas




Added tag(s) notabug. Request was from Stefan Kangas <stefan <at> marxist.se> to control <at> debbugs.gnu.org. (Fri, 28 Aug 2020 12:57:01 GMT)

bug closed, send any further explanations to 7781 <at> debbugs.gnu.org and Reuben Thomas <rrt <at> sc3d.org>. Request was from Stefan Kangas <stefan <at> marxist.se> to control <at> debbugs.gnu.org. (Fri, 28 Aug 2020 12:57:02 GMT)

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 26 Sep 2020 11:24:07 GMT)


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.