GNU bug report logs - #47455
27.1; bibtex mode - citation key generation - non-ascii characters


Package: emacs;

Reported by: Brian Elmegaard <be <at> mek.dtu.dk>

Date: Sun, 28 Mar 2021 21:45:01 UTC

Severity: normal

Found in version 27.1

Done: Roland Winkler <winkler <at> gnu.org>

Bug is archived. No further changes may be made.



Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#47455; Package emacs. (Sun, 28 Mar 2021 21:45:02 GMT)

Acknowledgement sent to Brian Elmegaard <be <at> mek.dtu.dk>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Sun, 28 Mar 2021 21:45:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Brian Elmegaard <be <at> mek.dtu.dk>
To: "bug-gnu-emacs <at> gnu.org" <bug-gnu-emacs <at> gnu.org>
Subject: 27.1; bibtex mode - citation key generation - non-ascii characters
Date: Sun, 28 Mar 2021 21:26:36 +0000

Using C-c C-c in a BibTeX buffer cleans the entry and generates a citation key.
If the author name includes non-ASCII characters, these are included in
the key, even though BibTeX does not accept this.

For example:
@Article{äöü21,
  author =          {æøå äöü},
  title =               {foo},
  journal =          {bar},
  year =               2021}



In GNU Emacs 27.1 (build 1, x86_64-w64-mingw32)
of 2020-08-21 built on CIRROCUMULUS
Repository revision: 86d8d76aa36037184db0b2897c434cdaab1a9ae8
Repository branch: HEAD
Windowing system distributor 'Microsoft Corp.', version 10.0.17134
System Description: Microsoft Windows 10 Enterprise (v10.0.1803.17134.2087)

Recent messages:
Checking 71 files in c:/Program Files (x86)/Emacs/emacs-27.1-x86_64/share/emacs/27.1/lisp/erc...
Checking 34 files in c:/Program Files (x86)/Emacs/emacs-27.1-x86_64/share/emacs/27.1/lisp/emulation...
Checking 180 files in c:/Program Files (x86)/Emacs/emacs-27.1-x86_64/share/emacs/27.1/lisp/emacs-lisp...
Checking 24 files in c:/Program Files (x86)/Emacs/emacs-27.1-x86_64/share/emacs/27.1/lisp/cedet...
Checking 59 files in c:/Program Files (x86)/Emacs/emacs-27.1-x86_64/share/emacs/27.1/lisp/calendar...
Checking 87 files in c:/Program Files (x86)/Emacs/emacs-27.1-x86_64/share/emacs/27.1/lisp/calc...
Checking 113 files in c:/Program Files (x86)/Emacs/emacs-27.1-x86_64/share/emacs/27.1/lisp/obsolete...
Checking 1 files in c:/Users/brel/AppData/Roaming/.emacs.d/lisp/dna-mode...
Checking for load-path shadows...done
Making completion list...

Configured using:
'configure --without-dbus --host=x86_64-w64-mingw32
--without-compress-install 'CFLAGS=-O2 -static''

Configured features:
XPM JPEG TIFF GIF PNG RSVG SOUND NOTIFY W32NOTIFY ACL GNUTLS LIBXML2
HARFBUZZ ZLIB TOOLKIT_SCROLL_BARS MODULES THREADS JSON PDUMPER LCMS2 GMP

Important settings:
  value of $LANG: DAN
  locale-coding-system: cp1252

Major mode: Lisp Interaction

Minor modes in effect:
  save-place-mode: t
  delete-selection-mode: t
  show-paren-mode: t
  recentf-mode: t
  global-auto-revert-mode: t
  cua-mode: t
  TeX-PDF-mode: t
  TeX-source-correlate-mode: t
  tooltip-mode: t
  global-eldoc-mode: t
  eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Load-path shadows:
None found.

Features:
(help-mode pp shadow sort mail-extr emacsbug message rmc puny dired
dired-loaddefs format-spec rfc822 mml mml-sec epa derived epg epg-config
gnus-util rmail rmail-loaddefs text-property-search time-date mm-decode
mm-bodies mm-encode mail-parse rfc2231 mailabbrev gmm-utils mailheader
sendmail rfc2047 rfc2045 ietf-drums mm-util mail-prsvr mail-utils
saveplace delsel paren recentf tree-widget wid-edit autorevert
filenotify cua-base cus-start cus-load server tex-mik tex crm advice
texmathp finder-inf info tex-site package easymenu browse-url
url-handlers url-parse auth-source cl-seq eieio eieio-core cl-macs
eieio-loaddefs password-cache json subr-x map url-vars seq byte-opt gv
bytecomp byte-compile cconv cl-loaddefs cl-lib tooltip eldoc electric
uniquify ediff-hook vc-hooks lisp-float-type mwheel dos-w32 ls-lisp
disp-table term/w32-win w32-win w32-vars term/common-win tool-bar dnd
fontset image regexp-opt fringe tabulated-list replace newcomment
text-mode elisp-mode lisp-mode prog-mode register page tab-bar menu-bar
rfn-eshadow isearch timer select scroll-bar mouse jit-lock font-lock
syntax facemenu font-core term/tty-colors frame minibuffer cl-generic
cham georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao
korean japanese eucjp-ms cp51932 hebrew greek romanian slovak czech
european ethiopic indian cyrillic chinese composite charscript charprop
case-table epa-hook jka-cmpr-hook help simple abbrev obarray
cl-preloaded nadvice loaddefs button faces cus-face macroexp files
text-properties overlay sha1 md5 base64 format env code-pages mule
custom widget hashtable-print-readable backquote threads w32notify w32
lcms2 multi-tty make-network-process emacs)

Memory information:
((conses 16 100510 8583)
(symbols 48 10963 1)
(strings 32 33640 2227)
(string-bytes 1 1011136)
(vectors 16 16807)
(vector-slots 8 206002 13918)
(floats 8 44 213)
(intervals 56 279 0)
(buffers 1000 15))

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#47455; Package emacs. (Tue, 18 May 2021 16:06:02 GMT)

Message #8 received at 47455 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Brian Elmegaard <be <at> mek.dtu.dk>
Cc: 47455 <at> debbugs.gnu.org, Roland Winkler <winkler <at> gnu.org>
Subject: Re: bug#47455: 27.1; bibtex mode - citation key generation -
 non-ascii characters
Date: Tue, 18 May 2021 18:05:49 +0200
Brian Elmegaard <be <at> mek.dtu.dk> writes:

> Using C-c C-c in a bibtex cleans the entry and generates a citation key.
> If the author name includes non-ascii characters these are included in
> the key, even though BibTeX does not accept this.

Is this the case for all versions of BibTeX?

> For example:
>
> @Article{äöü21,
>   author =          {æøå äöü},
>   title =               {foo},
>   journal =          {bar},
>   year =               2021}

I guess Emacs could do an asciification of some sort here, but I'm not sure
there's any that's universally accepted?  I've added Roland to the
CCs -- perhaps he has some comments.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#47455; Package emacs. (Tue, 18 May 2021 19:02:02 GMT)

Message #11 received at 47455 <at> debbugs.gnu.org:

From: "Roland Winkler" <winkler <at> gnu.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 47455 <at> debbugs.gnu.org, Brian Elmegaard <be <at> mek.dtu.dk>
Subject: Re: bug#47455: 27.1; bibtex mode - citation key generation -
 non-ascii characters
Date: Tue, 18 May 2021 14:00:56 -0500
On Tue May 18 2021 Lars Ingebrigtsen wrote:
> Brian Elmegaard <be <at> mek.dtu.dk> writes:
> 
> > Using C-c C-c in a bibtex cleans the entry and generates a citation key.
> > If the author name includes non-ascii characters these are included in
> > the key, even though BibTeX does not accept this.
> 
> Is this the case for all versions of BibTeX?

I believe the problem lies in BibTeX itself: BibTeX [like
conventional (La)TeX] does not like non-ASCII characters anywhere,
neither in the key nor anywhere else.

Of course, there is biblatex and also new versions of (La)TeX that
can handle non-ascii characters.  But that's a separate story.

> > For example:
> >
> > @Article{äöü21,
> >   author =          {æøå äöü},
> 
> I guess Emacs could do an asciification of some sort here, but I'm
> not sure there's any that's universally accepted?

The default of the user variable bibtex-autokey-transcriptions
handles "LaTeX non-ascii" characters like \"a.  You can customize
these rules to your liking.

I vaguely remember an old thread that started from the very question
raised here and expanded on how asciification can be encapsulated
in some generic piece of elisp code.  But I cannot find it anymore,
and I do not know whether this would be possible at all.  I
believe everyone agrees on the asciification of German umlauts like

  ä -> ae

But beyond that, I do not know how to do this satisfactorily.
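
The table-driven idea behind bibtex-autokey-transcriptions can be
sketched outside Emacs.  The following is a minimal Python
illustration, not bibtex.el code: the names TRANSCRIPTIONS and asciify
are hypothetical, every mapping except "ä -> ae" is an assumption made
for the example, and characters without an ASCII decomposition are
simply dropped.

```python
import unicodedata

# Hypothetical transcription table. Only "ä -> ae" is taken from the
# thread; the other entries are assumptions for illustration.
TRANSCRIPTIONS = {
    "\u00e4": "ae",  # ä
    "\u00f6": "oe",  # ö
    "\u00fc": "ue",  # ü
    "\u00df": "ss",  # ß
    "\u00e6": "ae",  # æ
    "\u00f8": "oe",  # ø
    "\u00e5": "aa",  # å
}


def asciify(text: str) -> str:
    """Return an ASCII-only approximation of TEXT."""
    out = []
    # Compose first so that "a" + COMBINING DIAERESIS is looked up as "ä".
    for ch in unicodedata.normalize("NFC", text):
        if ch.isascii():
            out.append(ch)
        elif ch in TRANSCRIPTIONS:
            out.append(TRANSCRIPTIONS[ch])
        else:
            # Fallback: decompose and keep only the ASCII base letters.
            # Characters with no ASCII part (e.g. Cyrillic) vanish.
            decomposed = unicodedata.normalize("NFKD", ch)
            out.append("".join(c for c in decomposed if c.isascii()))
    return "".join(out)
```

With this table, asciify("äöü21") gives "aeoeue21".  The fallback
branch shows why a generic solution is hard: a name written in a
non-Latin script collapses to an empty string.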




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#47455; Package emacs. (Mon, 24 May 2021 22:06:01 GMT)

Message #14 received at 47455 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: "Roland Winkler" <winkler <at> gnu.org>
Cc: 47455 <at> debbugs.gnu.org, Brian Elmegaard <be <at> mek.dtu.dk>
Subject: Re: bug#47455: 27.1; bibtex mode - citation key generation -
 non-ascii characters
Date: Tue, 25 May 2021 00:04:38 +0200
"Roland Winkler" <winkler <at> gnu.org> writes:

> I vaguely remember an old thread that started from the very question
> raised here and expanding on how asciification can be encapsulated
> in some generic piece of elisp code.  But I cannot find it anymore
> and I do not know either whether this would be possible at all.  I
> believe, everyone agrees on asciification of German umlaute like
>
>   ä -> ae
>
> But beyond that, I do not know how to do this satisfactorily.

And this gets even more difficult to deal with for non-Latin scripts.

So I'm not sure anything here can be done programmatically...  the
command could output a warning?  "Probably invalid key"?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#47455; Package emacs. (Wed, 26 May 2021 18:58:02 GMT)

Message #17 received at 47455 <at> debbugs.gnu.org:

From: "Roland Winkler" <winkler <at> gnu.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 47455 <at> debbugs.gnu.org, Brian Elmegaard <be <at> mek.dtu.dk>
Subject: Re: bug#47455: 27.1; bibtex mode - citation key generation -
 non-ascii characters
Date: Wed, 26 May 2021 13:56:53 -0500
On Tue May 25 2021 Lars Ingebrigtsen wrote:
> And this gets even more difficult to deal with for non-Latin scripts.
> 
> So I'm not sure anything here can be done programmatically...  the
> command could output a warning?  "Probably invalid key"?

The warning is a good idea.  Actually, the warning should be issued
if there are non-ASCII characters anywhere in a BibTeX key because
(old-fashioned) BibTeX will choke on those no matter where they
appear.  So I'll add a new element to the user variable
bibtex-entry-format for this.  Then users can enable these warnings
if they use old-fashioned BibTeX.  (Those who use modern variants of
BibTeX need not enable these warnings.)
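
The kind of check described above can be sketched as a standalone
script.  This is a rough Python illustration under stated assumptions,
not the bibtex-entry-format element itself: the names ENTRY_KEY_RE and
non_ascii_keys are hypothetical, and the regular expression only
recognizes the simple "@Type{key," form.

```python
import re

# Assumed, simplified pattern for "@Type{key," at the start of an entry.
ENTRY_KEY_RE = re.compile(r"@\s*(\w+)\s*\{\s*([^,\s}]+)\s*,")


def non_ascii_keys(bibtex_source: str):
    """Return (key, offending_characters) for each key that is not pure ASCII."""
    problems = []
    for match in ENTRY_KEY_RE.finditer(bibtex_source):
        key = match.group(2)
        bad = [ch for ch in key if not ch.isascii()]
        if bad:
            problems.append((key, bad))
    return problems


if __name__ == "__main__":
    sample = "@Article{äöü21,\n  author = {æøå äöü},\n  year = 2021}\n"
    for key, bad in non_ascii_keys(sample):
        print(f"Probably invalid key {key!r}: non-ASCII characters {bad}")
```

Only keys are inspected here; field values would need a separate pass.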

I believe that the real problem here lies in the fact that many
publishers of scientific journals let you download citation records
for their journal articles.  When they offer not only BibTeX-formatted
records but other formats, too, the BibTeX records are often
malformed, decorated with non-ascii characters that BibTeX (and
LaTeX) cannot handle and other things.  I have been fooled a number
of times by "invisible" non-ascii characters.  So I will enable the
new option for myself!




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#47455; Package emacs. (Fri, 28 May 2021 07:18:01 GMT)

Message #20 received at 47455 <at> debbugs.gnu.org:

From: Brian Elmegaard <be <at> mek.dtu.dk>
To: Roland Winkler <winkler <at> gnu.org>, Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: "47455 <at> debbugs.gnu.org" <47455 <at> debbugs.gnu.org>
Subject: RE: bug#47455: 27.1; bibtex mode - citation key generation -
 non-ascii characters
Date: Fri, 28 May 2021 07:17:10 +0000
Hi

Thanks for looking into this.
I understand your reasoning about this being an issue with the tools used.
In AUCTeX I can also enter \newcommand{\ü}{u} without being warned that it will not work with LaTeX.

The warning seems to be a good idea to me as well.

Brian





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#47455; Package emacs. (Sun, 30 May 2021 21:40:02 GMT)

Message #23 received at 47455 <at> debbugs.gnu.org:

From: "Roland Winkler" <winkler <at> gnu.org>
To: Brian Elmegaard <be <at> mek.dtu.dk>
Cc: Lars Ingebrigtsen <larsi <at> gnus.org>,
 "47455 <at> debbugs.gnu.org" <47455 <at> debbugs.gnu.org>
Subject: RE: bug#47455: 27.1; bibtex mode - citation key generation -
 non-ascii characters
Date: Sun, 30 May 2021 16:39:16 -0500
On Fri May 28 2021 Brian Elmegaard wrote:
> I understand your reasoning about this being an issue with the
> tools used.  In auctex I can also enter \newcommand{\ü}{u} without
> being warned that it will not work with latex.
> 
> The warning seems to be a good idea to me as well.

I realized:

One can also instruct font-lock to highlight non-ASCII characters
with something like font-lock-warning-face (controlled by a user
option for enabling this behavior).

And AUCTeX could do the same.

However, this fails with something like the Unicode character
'ZERO WIDTH SPACE' (which has fooled me occasionally in the very
context we are discussing here).  Is it possible to instruct Emacs
to make such "hidden characters" more easily visible, say by using
some display property?

The fontification could also be encapsulated in a minor mode that
one could use for \(Bib\|La\)?TeX files.

Maybe such a minor mode exists already and I am right now
reinventing the wheel?
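
Outside Emacs, such "hidden characters" can at least be located by
their Unicode category.  A minimal Python sketch, assuming that the
format category "Cf" (which includes ZERO WIDTH SPACE and SOFT HYPHEN)
is a reasonable definition of "hidden"; the function name
hidden_characters is hypothetical.

```python
import unicodedata


def hidden_characters(text: str):
    """Return (line, column, name) for every format-category (Cf) character.

    Lines and columns are 1-based; columns count code points.
    """
    found = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if unicodedata.category(ch) == "Cf":
                found.append((line_no, col, unicodedata.name(ch, "UNKNOWN")))
    return found


if __name__ == "__main__":
    sample = "author = {Foo\u200bBar}"
    for line_no, col, name in hidden_characters(sample):
        print(f"line {line_no}, column {col}: {name}")
```

This only reports positions; making the characters visible in a buffer
is a display question that the sketch does not address.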




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#47455; Package emacs. (Mon, 31 May 2021 05:49:01 GMT)

Message #26 received at 47455 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: "Roland Winkler" <winkler <at> gnu.org>
Cc: "47455 <at> debbugs.gnu.org" <47455 <at> debbugs.gnu.org>,
 Brian Elmegaard <be <at> mek.dtu.dk>
Subject: Re: bug#47455: 27.1; bibtex mode - citation key generation -
 non-ascii characters
Date: Mon, 31 May 2021 07:48:45 +0200
"Roland Winkler" <winkler <at> gnu.org> writes:

> One can also instruct font-lock to use for non-ascii characters
> something like font-lock-warning-face (based on a user option for
> enabling this behavior).

Sounds like a good idea.

> However this fails with something like the unicode character
> 'ZERO WIDTH SPACE' (which has fooled me occasionally in the very
> context we are discussing here).  Is it possible to instruct emacs
> to make such "hidden characters" more easily visible, say by using
> some display property?

I vaguely remember a discussion about this not too long ago, but I can't
find it now.

All the other "special" space characters, such as NON-BREAKING SPACE,
are fontified specially by default in Emacs 28.  But the
point of ZERO WIDTH SPACE is that it takes no room, which makes it
difficult to fontify.  :-)

But bibtex could apply a special display property here, or fontify the
surrounding characters in a special way.

> The fontification could also be encapsulated in a minor mode that
> one could use for \(Bib\|La\)?TeX files.
>
> Maybe such a minor mode exists already and I am right now
> reinventing the wheel?

I can't recall any such minor mode.  Anybody else?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#47455; Package emacs. (Tue, 01 Jun 2021 09:16:02 GMT)

Message #29 received at submit <at> debbugs.gnu.org:

From: Arash Esbati <arash <at> gnu.org>
To: Brian Elmegaard via "Bug reports for GNU Emacs, the Swiss army knife of
 text editors" <bug-gnu-emacs <at> gnu.org>
Cc: Lars Ingebrigtsen <larsi <at> gnus.org>,
 "47455 <at> debbugs.gnu.org" <47455 <at> debbugs.gnu.org>,
 Roland Winkler <winkler <at> gnu.org>, Brian Elmegaard <be <at> mek.dtu.dk>
Subject: Re: bug#47455: 27.1; bibtex mode - citation key generation -
 non-ascii characters
Date: Tue, 01 Jun 2021 11:14:50 +0200
Brian Elmegaard via "Bug reports for GNU Emacs, the Swiss army knife of
text editors" <bug-gnu-emacs <at> gnu.org> writes:

> Thanks for looking into this.
> I understand your reasoning about this being an issue with the tools
> used.  In auctex I can also enter \newcommand{\ü}{u} without being
> warned that it will not work with latex.

It's not about LaTeX as the macro package, it's about the underlying
engine.  This example

    \documentclass[10pt]{article}
    \newcommand\ü{foo}
    \begin{document}
    \ü bar
    \end{document}

works when you run lualatex on it, but chokes with pdflatex.  The
reason is described in ltnews30.pdf:

    Improving Unicode handling in pdfTeX

    [...]  What is not possible when using an 8-bit engine such as
    pdfTeX is to use characters other than ascii letters as part of a
    command name. This is due to the fact that all other characters in
    such engines are not single character tokens, but in fact consist of
    a sequence of bytes and this is not supported in command names.

So yes, it is about the tools used :-)

Best, Arash




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#47455; Package emacs. (Tue, 01 Jun 2021 14:38:01 GMT)

Message #35 received at 47455 <at> debbugs.gnu.org:

From: "Roland Winkler" <winkler <at> gnu.org>
To: Arash Esbati <arash <at> gnu.org>
Cc: Lars Ingebrigtsen <larsi <at> gnus.org>,
 "47455 <at> debbugs.gnu.org" <47455 <at> debbugs.gnu.org>,
 Brian Elmegaard <be <at> mek.dtu.dk>
Subject: Re: bug#47455: 27.1; bibtex mode - citation key generation -
 non-ascii characters
Date: Tue, 1 Jun 2021 09:37:01 -0500
On Tue Jun 1 2021 Arash Esbati wrote:
> It's not about LaTeX as the macro package, it's about the underlying
> engine.

This is known and doesn't solve the problem.  If you are required to
use LaTeX and not, say, lualatex, the reason for this being whatever
it is, then you should use only ascii characters.




Reply sent to Roland Winkler <winkler <at> gnu.org>:
You have taken responsibility. (Fri, 30 Dec 2022 06:35:02 GMT)

Notification sent to Brian Elmegaard <be <at> mek.dtu.dk>:
bug acknowledged by developer. (Fri, 30 Dec 2022 06:35:02 GMT)

Message #40 received at 47455-done <at> debbugs.gnu.org:

From: Roland Winkler <winkler <at> gnu.org>
To: Arash Esbati <arash <at> gnu.org>
Cc: Lars Ingebrigtsen <larsi <at> gnus.org>,
 "47455 <at> debbugs.gnu.org" <47455-done <at> debbugs.gnu.org>,
 Brian Elmegaard <be <at> mek.dtu.dk>
Subject: Re: bug#47455: 27.1; bibtex mode - citation key generation -
 non-ascii characters
Date: Fri, 30 Dec 2022 00:34:04 -0600
On Tue, Jun 01 2021, Roland Winkler wrote:
> On Tue Jun 1 2021 Arash Esbati wrote:
>> It's not about LaTeX as the macro package, it's about the underlying
>> engine.
>
> This is known and doesn't solve the problem.  If you are required to
> use LaTeX and not, say, lualatex, the reason for this being whatever
> it is, then you should use only ascii characters.

Closing.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 27 Jan 2023 12:24:10 GMT)

This bug report was last modified 1 year and 83 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.