GNU bug report logs - #17437
24.3; ispell uses typographically correct apostrophe as word boundary

Please note: This is a static page, with minimal formatting, updated once a day.
Click here to see this page with the latest information and nicer formatting.

Package: emacs; Reported by: "Tobias Getzner" <tobias.getzner@HIDDEN>; dated Thu, 8 May 2014 16:04:01 UTC; Maintainer for emacs is bug-gnu-emacs@HIDDEN.
Removed tag(s) moreinfo. Request was from Lars Ingebrigtsen <larsi@HIDDEN> to control <at> debbugs.gnu.org. Full text available.

Message received at 17437 <at> debbugs.gnu.org:


Received: (at 17437) by debbugs.gnu.org; 22 Jul 2014 09:42:40 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Tue Jul 22 05:42:40 2014
Received: from localhost ([127.0.0.1]:34306 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.80)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1X9Wah-0006SN-NM
	for submit <at> debbugs.gnu.org; Tue, 22 Jul 2014 05:42:40 -0400
Received: from mout.gmx.net ([212.227.17.20]:59965)
 by debbugs.gnu.org with esmtp (Exim 4.80)
 (envelope-from <tobias.getzner@HIDDEN>) id 1X9Wab-0006Rr-TY
 for 17437 <at> debbugs.gnu.org; Tue, 22 Jul 2014 05:42:34 -0400
Received: from glenalbyn.linguistics.ruhr-uni-bochum.de ([134.147.14.84]) by
 mail.gmx.com (mrgmx101) with ESMTPSA (Nemesis) id 0MMXVC-1XA4Xr2AjW-008HXi
 for <17437 <at> debbugs.gnu.org>; Tue, 22 Jul 2014 11:42:21 +0200
Message-ID: <1406022141.16949.2.camel@HIDDEN>
Subject: Re: bug#17437: 24.3; ispell uses typographically correct apostrophe
From: Tobias Getzner <tobias.getzner@HIDDEN>
To: 17437 <at> debbugs.gnu.org
Date: Tue, 22 Jul 2014 11:42:21 +0200
Content-Type: text/plain; charset="UTF-8"
X-Mailer: Evolution 3.12.4 
Mime-Version: 1.0
Content-Transfer-Encoding: quoted-printable
X-Provags-ID: V03:K0:c72iR0MjYa4w/AXBL1grpF3wMQz1fJ4eQ16OXHWyadSY5+8T4B+
 +zzX8wnrW2BTFYt691HXyhdA07VO+ToshpD31tJmOxyGo2LuaNHjcTpBzVj0EBJrsTzDi7o
 GGFMLJHzlG50nz/XwvmziJRhU/oStlMdiW3otmFoeFAgvU242Uh6Cd2ARWrrgTWAf+ev7/H
 /sfhirQ1B1R8j0XL9k63g==
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 17437
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <http://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <http://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <http://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>

> More details are needed here, e.g., what language is the reporter using?

I suppose you are referring to the selected ispell dictionary? I am not
explicitly setting the ispell dictionary, so (I presume) ispell.el will
not pass a dictionary to hunspell, which will accordingly use the
default one for my locale, i.e., en_US. While some dictionaries have
issues with U+2019 (and in fact most are still encoded in latin-1 :-/),
I have added this character to WORDCHARS in my hunspell en_US
dictionary; hunspell now correctly recognizes words using this character
when invoked in a terminal. Sadly, it seems ispell.el still
won’t handle these.

The problem seems to be that ispell.el still thinks that U+2019 is a
word boundary and doesn’t pass the whole word on to the spell checker.
Is this likely? Looking at ispell.el, it looks like it is doing word
boundary parsing on its own‽ If so, U+2019 should be treated as a
word character when it appears between two alphabetical
characters (at least for most western languages).
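
The rule proposed above can be illustrated with a minimal sketch. This is
Python written for illustration only; it is not ispell.el's code (which is
Emacs Lisp), and the names split_words and naive_split_words are
hypothetical. It assumes a "word" is a run of letters, optionally joined
by a single apostrophe (U+0027 or U+2019) that has a letter on each side.

```python
import re

# Assumption: both the ASCII apostrophe and U+2019 may appear inside a word,
# but only when a letter stands directly before and after them.
APOSTROPHES = "'\u2019"

# [^\W\d_] matches any Unicode letter (word characters minus digits and "_").
LETTERS = r"[^\W\d_]+"
WORD_RE = re.compile(rf"{LETTERS}(?:[{APOSTROPHES}]{LETTERS})*")


def split_words(text):
    """Return words, keeping apostrophes that sit between two letters."""
    return WORD_RE.findall(text)


def naive_split_words(text):
    """Return runs of letters only, so every apostrophe splits a word."""
    return re.findall(LETTERS, text)


if __name__ == "__main__":
    sample = "It doesn\u2019t work"
    print(naive_split_words(sample))  # the behavior described in this report
    print(split_words(sample))        # the behavior being asked for
```

With the naive splitter, "doesn’t" reaches the spell checker as the two
fragments "doesn" and "t"; with the context-sensitive rule it is passed on
whole, while a closing quotation mark after a word is still left out.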

Best regards,
Tobias






Information forwarded to bug-gnu-emacs@HIDDEN:
bug#17437; Package emacs. Full text available.
Added tag(s) moreinfo. Request was from Paul Eggert <eggert@HIDDEN> to control <at> debbugs.gnu.org. Full text available.

Message received at 17437 <at> debbugs.gnu.org:


Received: (at 17437) by debbugs.gnu.org; 9 May 2014 05:07:21 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Fri May 09 01:07:21 2014
Received: from localhost ([127.0.0.1]:56475 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.80)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1Wid1k-0000VK-Vm
	for submit <at> debbugs.gnu.org; Fri, 09 May 2014 01:07:21 -0400
Received: from mout.gmx.net ([212.227.17.22]:62700)
 by debbugs.gnu.org with esmtp (Exim 4.80)
 (envelope-from <tobias.getzner@HIDDEN>) id 1Wid1h-0000Ux-OO
 for 17437 <at> debbugs.gnu.org; Fri, 09 May 2014 01:07:18 -0400
Received: from [78.48.180.208] by 3capp-gmx-bs61 with HTTP; Fri, 9 May 2014
 07:07:11 +0200
MIME-Version: 1.0
Message-ID: <trinity-1d5c6087-53f7-470c-a3fb-161b6827db78-1399612031405@3capp-gmx-bs61>
From: "Tobias Getzner" <tobias.getzner@HIDDEN>
To: "Agustin Martin" <agustin.martin@HIDDEN>
Subject: Re: bug#17437: 24.3; ispell uses typographically correct apostrophe
 as word boundary
Content-Type: text/plain; charset=UTF-8
Date: Fri, 9 May 2014 07:07:11 +0200
Importance: normal
Sensitivity: Normal
In-Reply-To: <20140508173844.GA3842@HIDDEN>
References: <trinity-69d4191f-ecd1-4780-bbdd-9d12e68b3d52-1399551317647@3capp-gmx-bs52>, 
 <20140508173844.GA3842@HIDDEN>
X-UI-Message-Type: mail
X-Priority: 3
X-Provags-ID: V03:K0:Bt4H+0+tnR/QqBwaxro93nnYJ2lScpcNqin2wEReQ6q
 101LeY6PQjF8QcZ44pHEpmAiOtGY9YqUzjMLZduP/2vSvcv2ag
 hA8/bsbtB4FzPwjVTChptv4BsMc+idNNui6Z3BQ42eosFo0ehv
 H3ItJlkhk+Oov4RngdD8jR/fQeBoyC/nd3vyu4hOwJsuneuQhu
 1MyAmMydYZw7TPQWpgJc1zCLQ5kkXdI2C1vtO51hdpWaut5DNf
 BJdba69C9pI/5NLdeEs+kmTcO2cnzvK+Sz+VpHSH9ci8Ols2tB 3VSLjM=
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 17437
Cc: 17437 <at> debbugs.gnu.org

Hello Agustin,

> Which language are you using? Whether the apostrophe is a wordchar
> or not depends on the language. By the way, "doesn't" is working
> well here with aspell+american.

Please note that the bug is not about the single quote apostrophe,
U+0027, but concerns the typographically correct apostrophe, U+2019.

Both hunspell and aspell support it in recent versions, but Emacs
fails to correctly hand over words containing the typographical
apostrophe.

Regards,
Tobias




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#17437; Package emacs. Full text available.

Message received at 17437 <at> debbugs.gnu.org:


Received: (at 17437) by debbugs.gnu.org; 8 May 2014 17:38:55 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Thu May 08 13:38:55 2014
Received: from localhost ([127.0.0.1]:56239 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.80)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1WiSHW-0003gH-Nu
	for submit <at> debbugs.gnu.org; Thu, 08 May 2014 13:38:54 -0400
Received: from edison.ccupm.upm.es ([138.100.198.71]:49827)
 by debbugs.gnu.org with esmtp (Exim 4.80)
 (envelope-from <agustin.martin@HIDDEN>) id 1WiSHT-0003g1-Fr
 for 17437 <at> debbugs.gnu.org; Thu, 08 May 2014 13:38:52 -0400
Received: from agmartin.aq.upm.es (Agmartin.aq.upm.es [138.100.41.131])
 by smtp.upm.es (8.14.3/8.14.3/edison-001) with ESMTP id s48HciJD003791;
 Thu, 8 May 2014 19:38:44 +0200
Received: by agmartin.aq.upm.es (Postfix, from userid 1000)
 id B0AEF3FF58; Thu,  8 May 2014 19:38:44 +0200 (CEST)
Date: Thu, 8 May 2014 19:38:44 +0200
From: Agustin Martin <agustin.martin@HIDDEN>
To: Tobias Getzner <tobias.getzner@HIDDEN>, 17437 <at> debbugs.gnu.org
Subject: Re: bug#17437: 24.3; ispell uses typographically correct apostrophe
 as word boundary
Message-ID: <20140508173844.GA3842@HIDDEN>
References: <trinity-69d4191f-ecd1-4780-bbdd-9d12e68b3d52-1399551317647@3capp-gmx-bs52>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <trinity-69d4191f-ecd1-4780-bbdd-9d12e68b3d52-1399551317647@3capp-gmx-bs52>
User-Agent: Mutt/1.5.23 (2014-03-12)
X-Spam-Score: -3.0 (---)
X-Debbugs-Envelope-To: 17437

On Thu, May 08, 2014 at 02:15:17PM +0200, Tobias Getzner wrote:
> 
> When using the typographically correct apostrophe (“right single
> quotation mark” U+2019), ispell will mark up parts of words as typos.
> E.g., in “doesn’t”, the part before the apostrophe will be highlighted
> as a typo even if the spell-checker supports the apostrophe.
> 
> This bug occurs irrespective of the spell-checker, so I suppose that
> ispell does its own tokenization and uses the apostrophe as a word
> boundary. Instead, the apostrophe should correctly be treated as
> word-internal punctuation and handed on to the actual spell-checker
> program.

Which language are you using? Whether the apostrophe is a wordchar or not
depends on the language. By the way, "doesn't" is working well here with
aspell+american.

-- 
Agustin




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#17437; Package emacs. Full text available.

Message received at submit <at> debbugs.gnu.org:


Received: (at submit) by debbugs.gnu.org; 8 May 2014 16:03:41 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Thu May 08 12:03:41 2014
Received: from localhost ([127.0.0.1]:56151 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.80)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1WiQnM-0008MV-Ik
	for submit <at> debbugs.gnu.org; Thu, 08 May 2014 12:03:41 -0400
Received: from eggs.gnu.org ([208.118.235.92]:34718)
 by debbugs.gnu.org with esmtp (Exim 4.80)
 (envelope-from <tobias.getzner@HIDDEN>) id 1WiNEv-0001Pk-1F
 for submit <at> debbugs.gnu.org; Thu, 08 May 2014 08:15:53 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from <tobias.getzner@HIDDEN>) id 1WiNEf-0002aG-LL
 for submit <at> debbugs.gnu.org; Thu, 08 May 2014 08:15:47 -0400
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org
X-Spam-Level: 
X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50,FREEMAIL_FROM
 autolearn=disabled version=3.3.2
Received: from lists.gnu.org ([2001:4830:134:3::11]:51943)
 by eggs.gnu.org with esmtp (Exim 4.71)
 (envelope-from <tobias.getzner@HIDDEN>) id 1WiNEf-0002Yf-2V
 for submit <at> debbugs.gnu.org; Thu, 08 May 2014 08:15:37 -0400
Received: from eggs.gnu.org ([2001:4830:134:3::10]:36400)
 by lists.gnu.org with esmtp (Exim 4.71)
 (envelope-from <tobias.getzner@HIDDEN>) id 1WiNEX-0001as-4t
 for bug-gnu-emacs@HIDDEN; Thu, 08 May 2014 08:15:36 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from <tobias.getzner@HIDDEN>) id 1WiNEO-0002JU-Rh
 for bug-gnu-emacs@HIDDEN; Thu, 08 May 2014 08:15:29 -0400
Received: from mout.gmx.net ([212.227.15.19]:49292)
 by eggs.gnu.org with esmtp (Exim 4.71)
 (envelope-from <tobias.getzner@HIDDEN>) id 1WiNEO-0002JB-Iv
 for bug-gnu-emacs@HIDDEN; Thu, 08 May 2014 08:15:20 -0400
Received: from [134.147.14.84] by 3capp-gmx-bs52 with HTTP; Thu, 8 May 2014
 14:15:17 +0200
MIME-Version: 1.0
Message-ID: <trinity-69d4191f-ecd1-4780-bbdd-9d12e68b3d52-1399551317647@3capp-gmx-bs52>
From: "Tobias Getzner" <tobias.getzner@HIDDEN>
To: bug-gnu-emacs@HIDDEN
Subject: 24.3; ispell uses typographically correct apostrophe as word boundary
Content-Type: text/plain; charset=UTF-8
Date: Thu, 8 May 2014 14:15:17 +0200
Importance: normal
Sensitivity: Normal
Content-Transfer-Encoding: quoted-printable
X-Priority: 3
X-Provags-ID: V03:K0:Lf669W9WYsx3CbDyM+0y6EQj2WVenAp5Krj8FzCgKeO
 4AmxyaPtK+7Zf5cyu7F6S7kJZSUlIHra1diqT8yavNKINElsIP
 BAo/d5QECHu30MAKrYHoWHkPmbpBtGxq5oLxOFi+Y7guJ2mS7T
 uulSSf2pClvrObbVFd7NrqwIaFwp7w0CpxFC7RXZDYXqN8SWje
 7ZM3Kq7VOo4sQPXqz+SMCFeM4R5s5YXlGO7Q/UHOHenKX8jPCq
 wfAH0+sWuV8H+S1kai/UpYi8bnnqx9oLDScx5AwCD69tEgO6io rTnfXc=
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.4.x-2.6.x [generic]
X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address
 (bad octet value).
X-Received-From: 2001:4830:134:3::11
X-Spam-Score: -4.1 (----)
X-Debbugs-Envelope-To: submit
X-Mailman-Approved-At: Thu, 08 May 2014 12:03:38 -0400


When using the typographically correct apostrophe (“right single
quotation mark” U+2019), ispell will mark up parts of words as typos.
E.g., in “doesn’t”, the part before the apostrophe will be highlighted
as a typo even if the spell-checker supports the apostrophe.

This bug occurs irrespective of the spell-checker, so I suppose that
ispell does its own tokenization and uses the apostrophe as a word
boundary. Instead, the apostrophe should correctly be treated as
word-internal punctuation and handed on to the actual spell-checker
program.

Best regards,
Tobias




Acknowledgement sent to "Tobias Getzner" <tobias.getzner@HIDDEN>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs@HIDDEN. Full text available.
Report forwarded to bug-gnu-emacs@HIDDEN:
bug#17437; Package emacs. Full text available.
Last modified: Sat, 26 Dec 2015 15:30:02 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.