GNU bug report logs - #38287
26.3.50; filenotify.el: the Chinese file name in the event is messy code


Package: emacs;

Reported by: HaiJun Zhang <netjune <at> outlook.com>

Date: Wed, 20 Nov 2019 03:51:01 UTC

Severity: normal

Tags: patch

Found in version 26.3.50

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.



Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#38287; Package emacs. (Wed, 20 Nov 2019 03:51:02 GMT)

Acknowledgement sent to HaiJun Zhang <netjune <at> outlook.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Wed, 20 Nov 2019 03:51:04 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: HaiJun Zhang <netjune <at> outlook.com>
To: "bug-gnu-emacs <at> gnu.org" <bug-gnu-emacs <at> gnu.org>
Subject: 26.3.50; filenotify.el: the Chinese file name in the event is messy
 code
Date: Wed, 20 Nov 2019 03:50:26 +0000
So the file name comparison in the event callback of filenotify.el always fails, and there is no auto-revert for this file.

[Inline screenshot, attached as Attachment.png]

In GNU Emacs 26.3.50 (build 1, x86_64-apple-darwin17.7.0, NS appkit-1561.61 Version 10.13.6 (Build 17G8037))
 of 2019-10-30 built on jundeMac
Repository revision: 3ee8ee8476fef2a5e8159f7597e36e0953295ce2
Windowing system distributor ‘Apple’, version 10.3.1561
Recent messages:
+++ new: 31, ( *Echo Area 1*)
t
next-line: End of buffer [2 times]
previous-line: Beginning of buffer [13 times]
next-line: End of buffer [14 times]

Configured using:
 ‘configure --with-ns '--enable-locallisppath=/Library/Application
 Support/Emacs/${version}/site-lisp:/Library/Application
 Support/Emacs/site-lisp' --with-modules --disable-acl
 --without-makeinfo CFLAGS=-O2’

Configured features:
JPEG RSVG GLIB NOTIFY GNUTLS LIBXML2 ZLIB TOOLKIT_SCROLL_BARS NS MODULES
THREADS LCMS2

Important settings:
  value of $LANG: zh_CN.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Messages

Minor modes in effect:
  global-auto-revert-mode: t
  shell-dirtrack-mode: t
  ido-everywhere: t
  global-hl-line-mode: t
  tooltip-mode: t
  global-eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  buffer-read-only: t
  line-number-mode: t
  transient-mark-mode: t

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug message rmc puny dired dired-loaddefs
format-spec rfc822 mml mml-sec epa derived epg gnus-util rmail
rmail-loaddefs mm-decode mm-bodies mm-encode mail-parse rfc2231
mailabbrev gmm-utils mailheader sendmail rfc2047 rfc2045 ietf-drums
mm-util mail-prsvr mail-utils thingatpt help-fns radix-tree help-mode
autorevert easy-mmode filenotify subr-x map edmacro kmacro tex-mode
compile shell pcomplete comint ansi-color ring latexenc package easymenu
epg-config url-handlers url-parse auth-source cl-seq eieio eieio-core
cl-macs eieio-loaddefs password-cache url-vars windmove ido seq byte-opt
gv bytecomp byte-compile cconv cl-loaddefs cl-lib display-line-numbers
hl-line elec-pair time-date china-util tooltip eldoc electric uniquify
ediff-hook vc-hooks lisp-float-type mwheel term/ns-win ns-win
ucs-normalize mule-util term/common-win tool-bar dnd fontset image
regexp-opt fringe tabulated-list replace newcomment text-mode elisp-mode
lisp-mode prog-mode register page menu-bar rfn-eshadow isearch timer
select scroll-bar mouse jit-lock font-lock syntax facemenu font-core
term/tty-colors frame cl-generic cham georgian utf-8-lang misc-lang
vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms cp51932
hebrew greek romanian slovak czech european ethiopic indian cyrillic
chinese composite charscript charprop case-table epa-hook jka-cmpr-hook
help simple abbrev obarray minibuffer cl-preloaded nadvice loaddefs
button faces cus-face macroexp files text-properties overlay sha1 md5
base64 format env code-pages mule custom widget hashtable-print-readable
backquote threads kqueue cocoa ns lcms2 multi-tty make-network-process
emacs)

Memory information:
((conses 16 263112 15865)
 (symbols 48 22940 1)
 (miscs 40 405 241)
 (strings 32 42961 1400)
 (string-bytes 1 1130894)
 (vectors 16 40024)
 (vector-slots 8 826529 14554)
 (floats 8 63 332)
 (intervals 56 503 0)
 (buffers 992 14))

[Attachment.png (image/png, inline)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#38287; Package emacs. (Wed, 20 Nov 2019 16:33:02 GMT)

Message #8 received at 38287 <at> debbugs.gnu.org:

From: Michael Albinus <michael.albinus <at> gmx.de>
To: HaiJun Zhang <netjune <at> outlook.com>
Cc: 38287 <at> debbugs.gnu.org
Subject: Re: bug#38287: 26.3.50; filenotify.el: the Chinese file name in the
 event is messy code
Date: Wed, 20 Nov 2019 17:32:24 +0100
HaiJun Zhang <netjune <at> outlook.com> writes:

Hi,

> So the file name comparison in the event callback of filenotify.el always
> fails, and there is no auto-revert for this file.

Well, it is hard to analyse based on a .png file. Could you please
uncomment line 93 in filenotify.el (it is a message call) and rerun
the test? There should then be debug output in the *Messages* buffer.

> In GNU Emacs 26.3.50 (build 1, x86_64-apple-darwin17.7.0, NS
> appkit-1561.61 Version 10.13.6 (Build 17G8037))
>  of 2019-10-30 built on jundeMac
> Repository revision: 3ee8ee8476fef2a5e8159f7597e36e0953295ce2

It's a Mac, which means kqueue is the file-notify backend.

Does the underlying file system support UTF-8? Is it enabled? Maybe
there's something to convert when getting a kevent from the system?

> Important settings:
>   value of $LANG: zh_CN.UTF-8
>   locale-coding-system: utf-8-unix

That looks OK, although I'm not sure whether the coding system should be
utf-8-hfs or something like that.

Unfortunately, I'm not able to debug on Mac :-(

Best regards, Michael.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#38287; Package emacs. (Wed, 20 Nov 2019 17:36:02 GMT)

Message #11 received at 38287 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Michael Albinus <michael.albinus <at> gmx.de>
Cc: 38287 <at> debbugs.gnu.org, netjune <at> outlook.com
Subject: Re: bug#38287: 26.3.50;
 filenotify.el: the Chinese file name in the event is messy code
Date: Wed, 20 Nov 2019 19:35:17 +0200
> From: Michael Albinus <michael.albinus <at> gmx.de>
> Date: Wed, 20 Nov 2019 17:32:24 +0100
> Cc: 38287 <at> debbugs.gnu.org
> 
> Does the underlying file system support UTF-8? Is it enabled? Maybe
> there's something to convert when getting a kevent from the system?
> 
> > Important settings:
> >   value of $LANG: zh_CN.UTF-8
> >   locale-coding-system: utf-8-unix
> 
> That looks OK, although I'm not sure whether the coding system should be
> utf-8-hfs or something like that.

The strings shown in the image are UTF-8 encoded.
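
Escaped bytes like these can be recognized as UTF-8 because UTF-8 has a fixed structure: a lead byte such as \344 (0xE4, bit pattern 1110xxxx) announces a three-byte sequence, and each following byte has the form 10xxxxxx. The C sketch below is illustrative only and is not Emacs code. The function names are hypothetical, and the sample bytes are an assumption (the UTF-8 encoding of 中文), since the screenshot's contents are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Decode the UTF-8 sequence at the start of the NUL-terminated string S.
   Return its code point and store its length in bytes in *LEN, or
   return -1 if S does not start with a well-formed sequence (overlong
   forms, surrogates and values above U+10FFFF are rejected).  */
static long
utf8_decode (const char *s, size_t *len)
{
  assert (s != NULL && len != NULL);
  const unsigned char *p = (const unsigned char *) s;
  size_t n;
  long c, min;

  if (p[0] < 0x80)                 /* 0xxxxxxx: ASCII */
    { n = 1; c = p[0]; min = 0; }
  else if ((p[0] & 0xE0) == 0xC0)  /* 110xxxxx: 2-byte lead */
    { n = 2; c = p[0] & 0x1F; min = 0x80; }
  else if ((p[0] & 0xF0) == 0xE0)  /* 1110xxxx: 3-byte lead, e.g. \344 */
    { n = 3; c = p[0] & 0x0F; min = 0x800; }
  else if ((p[0] & 0xF8) == 0xF0)  /* 11110xxx: 4-byte lead */
    { n = 4; c = p[0] & 0x07; min = 0x10000; }
  else
    return -1;

  for (size_t i = 1; i < n; i++)
    {
      /* Continuation bytes look like 10xxxxxx.  The terminating NUL
         does not, so a truncated sequence is rejected here.  */
      if ((p[i] & 0xC0) != 0x80)
        return -1;
      c = (c << 6) | (p[i] & 0x3F);
    }

  if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
    return -1;
  *len = n;
  return c;
}

/* Count the characters in S, or return -1 if S is not valid UTF-8.  */
static long
utf8_count_chars (const char *s)
{
  long count = 0;
  while (*s != '\0')
    {
      size_t len;
      if (utf8_decode (s, &len) < 0)
        return -1;
      s += len;
      count++;
    }
  return count;
}
```

Fed the bytes \344\270\255\346\226\207, the decoder yields U+4E2D and U+6587, the two characters 中文. A byte string that looked garbled but was not actually UTF-8 would be rejected instead.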




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#38287; Package emacs. (Wed, 20 Nov 2019 17:50:02 GMT)

Message #14 received at 38287 <at> debbugs.gnu.org:

From: Michael Albinus <michael.albinus <at> gmx.de>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 38287 <at> debbugs.gnu.org, netjune <at> outlook.com
Subject: Re: bug#38287: 26.3.50; filenotify.el: the Chinese file name in the
 event is messy code
Date: Wed, 20 Nov 2019 18:49:34 +0100
Eli Zaretskii <eliz <at> gnu.org> writes:

>> That looks OK, although I'm not sure whether the coding system should be
>> utf-8-hfs or something like that.
>
> The strings shown in the image are UTF-8 encoded.

Hmm. kqueue.c is very lazy in using ENCODE_FILE; it uses it only in
kqueue-add-watch. Maybe it is missing somewhere else?

(I always fail to handle utf-8 properly, especially in C code :-( )

Best regards, Michael.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#38287; Package emacs. (Wed, 20 Nov 2019 18:26:01 GMT)

Message #17 received at 38287 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Michael Albinus <michael.albinus <at> gmx.de>
Cc: 38287 <at> debbugs.gnu.org, netjune <at> outlook.com
Subject: Re: bug#38287: 26.3.50; filenotify.el: the Chinese file name in the
 event is messy code
Date: Wed, 20 Nov 2019 20:25:02 +0200
> From: Michael Albinus <michael.albinus <at> gmx.de>
> Cc: netjune <at> outlook.com,  38287 <at> debbugs.gnu.org
> Date: Wed, 20 Nov 2019 18:49:34 +0100
> 
> > The strings shown in the image are UTF-8 encoded.
> 
> Hmm. kqueue.c is very lazy in using ENCODE_FILE; it uses it only in
> kqueue-add-watch. Maybe it is missing somewhere else?

I see one potential problem: in kqueue-add-watch, you encode the file
name, but then pass it to APIs that generally expect multibyte
(i.e. un-encoded) strings, although they will also work with encoded
unibyte strings.  Moreover, you put the unibyte encoded file name into
the watch object.  Not sure if this is related to the issue at hand,
but it would be cleaner to make this change:

diff --git a/src/kqueue.c b/src/kqueue.c
index 76d7fc1..1383d7d 100644
--- a/src/kqueue.c
+++ b/src/kqueue.c
@@ -414,7 +414,7 @@ DEFUN ("kqueue-add-watch", Fkqueue_add_watch, Skqueue_add_watch, 3, 3, 0,
     }
 
   /* Open file.  */
-  file = ENCODE_FILE (file);
+  Lisp_Object encoded_file = ENCODE_FILE (file);
   oflags = O_NONBLOCK;
 #if O_EVTONLY
   oflags |= O_EVTONLY;
@@ -426,7 +426,7 @@ DEFUN ("kqueue-add-watch", Fkqueue_add_watch, Skqueue_add_watch, 3, 3, 0,
 #else
     oflags |= O_NOFOLLOW;
 #endif
-  fd = emacs_open (SSDATA (file), oflags, 0);
+  fd = emacs_open (SSDATA (encoded_file), oflags, 0);
   if (fd == -1)
     report_file_error ("File cannot be opened", file);
 
Btw, I don't think I understand the nature of the problem yet: where
were the unibyte strings shown in the report printed?  Did some Emacs
code print them, and if so, where is that code?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#38287; Package emacs. (Wed, 20 Nov 2019 18:46:01 GMT)

Message #20 received at 38287 <at> debbugs.gnu.org:

From: Michael Albinus <michael.albinus <at> gmx.de>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 38287 <at> debbugs.gnu.org, netjune <at> outlook.com
Subject: Re: bug#38287: 26.3.50; filenotify.el: the Chinese file name in the
 event is messy code
Date: Wed, 20 Nov 2019 19:45:31 +0100
Eli Zaretskii <eliz <at> gnu.org> writes:

>> Hmm. kqueue.c is very lazy in using ENCODE_FILE; it uses it only in
>> kqueue-add-watch. Maybe it is missing somewhere else?
>
> I see one potential problem: in kqueue-add-watch, you encode the file
> name, but then pass it to APIs that generally expect multibyte
> (i.e. un-encoded) strings, although they will also work with encoded
> unibyte strings.  Moreover, you put the unibyte encoded file name into
> the watch object.  Not sure if this is related to the issue at hand,
> but it would be cleaner to make this change:
>
> diff --git a/src/kqueue.c b/src/kqueue.c
> index 76d7fc1..1383d7d 100644
> --- a/src/kqueue.c
> +++ b/src/kqueue.c
> @@ -414,7 +414,7 @@ DEFUN ("kqueue-add-watch", Fkqueue_add_watch, Skqueue_add_watch, 3, 3, 0,
>      }
>
>    /* Open file.  */
> -  file = ENCODE_FILE (file);
> +  Lisp_Object encoded_file = ENCODE_FILE (file);
>    oflags = O_NONBLOCK;
>  #if O_EVTONLY
>    oflags |= O_EVTONLY;
> @@ -426,7 +426,7 @@ DEFUN ("kqueue-add-watch", Fkqueue_add_watch, Skqueue_add_watch, 3, 3, 0,
>  #else
>      oflags |= O_NOFOLLOW;
>  #endif
> -  fd = emacs_open (SSDATA (file), oflags, 0);
> +  fd = emacs_open (SSDATA (encoded_file), oflags, 0);
>    if (fd == -1)
>      report_file_error ("File cannot be opened", file);

Thanks, let's see how far we go with this.

> Btw, I don't think I understand the nature of the problem yet: where
> were the unibyte strings shown in the report printed?  Did some Emacs
> code print them, and if so, where is that code?

Same question here. Looks like the OP has added some prints to the code.

In Emacs 27.0.50, we have file-notify-debug, which does this for us when
set to t. But this is Emacs 26.3.50, which is why I asked you to activate
the relevant debug message manually.

Best regards, Michael.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#38287; Package emacs. (Thu, 21 Nov 2019 00:36:02 GMT)

Message #23 received at 38287 <at> debbugs.gnu.org:

From: HaiJun Zhang <netjune <at> outlook.com>
To: Michael Albinus <michael.albinus <at> gmx.de>, Eli Zaretskii <eliz <at> gnu.org>
Cc: "38287 <at> debbugs.gnu.org" <38287 <at> debbugs.gnu.org>
Subject: Re: bug#38287: 26.3.50; filenotify.el: the Chinese file name in the
 event is messy code
Date: Thu, 21 Nov 2019 00:35:14 +0000
On 21 Nov 2019, 2:24 AM +0800, Eli Zaretskii <eliz <at> gnu.org> wrote:
> Btw, I don't think I understand the nature of the problem yet: where
> were the unibyte strings shown in the report printed? Did some Emacs
> code print them, and if so, where is that code?

It’s my fault. I didn’t describe the problem clearly. I had added some debug messages to filenotify.el.
Auto-revert doesn’t work for many files on my machine, so I wanted to find the cause and added the debug messages. Finally, I found that it is caused by the messy code.

The scenario:

  1.  A low-level file event arrives; the file name in the event contains messy code.
  2.  filenotify.el receives the event, extracts the file name from it, and compares it with the name it stored when the watch was added. The extracted name is messy code, while the stored one is a normal string. They are not equal, so the event is discarded.
  3.  As a result, the file is never auto-reverted.
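
Step 2 of this scenario can be modeled in a few lines of C to show why the comparison fails even though both names contain the same bytes. This is a simplified, hypothetical sketch, not filenotify.el's code: `struct lstring`, `lstring_equal` and `deliver_event` are invented names, and `lstring_equal` only approximates Emacs's `string=`, which treats a unibyte string of raw bytes and a multibyte string with the same UTF-8 bytes as different when non-ASCII characters are involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified model of an Emacs string: its bytes plus a
   flag saying whether they are decoded text (multibyte) or raw bytes
   (unibyte).  For valid UTF-8 text, the multibyte form is modeled with
   the same bytes as the UTF-8 encoding.  */
struct lstring
{
  const char *bytes;
  bool multibyte;
};

static bool
is_ascii (const char *s)
{
  for (; *s; s++)
    if ((unsigned char) *s >= 0x80)
      return false;
  return true;
}

/* Rough analogue of `string=': identical bytes are not enough.  In a
   unibyte string every byte >= 0x80 is a separate raw-byte character,
   so it never equals a multibyte string holding, say, Chinese
   characters.  Mixed representations compare equal only for ASCII.  */
static bool
lstring_equal (struct lstring a, struct lstring b)
{
  assert (a.bytes != NULL && b.bytes != NULL);
  if (strcmp (a.bytes, b.bytes) != 0)
    return false;
  return a.multibyte == b.multibyte || is_ascii (a.bytes);
}

/* Hypothetical event filter for step 2: deliver the event only when
   the name it carries matches the name stored when the watch was
   added; otherwise the event is silently dropped.  */
static bool
deliver_event (struct lstring stored_name, struct lstring event_name)
{
  return lstring_equal (stored_name, event_name);
}
```

In this model, an encoded (unibyte) event name never matches the stored multibyte name of 中文.txt, while ASCII-only names match in either representation. That would be consistent with auto-revert failing only for some files, although the thread does not say exactly which files were affected.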


Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#38287; Package emacs. (Thu, 21 Nov 2019 02:34:01 GMT)

Message #26 received at 38287 <at> debbugs.gnu.org:

From: HaiJun Zhang <netjune <at> outlook.com>
To: Michael Albinus <michael.albinus <at> gmx.de>, Eli Zaretskii <eliz <at> gnu.org>
Cc: "38287 <at> debbugs.gnu.org" <38287 <at> debbugs.gnu.org>
Subject: Re: bug#38287: 26.3.50; filenotify.el: the Chinese file name in the
 event is messy code
Date: Thu, 21 Nov 2019 02:33:42 +0000
On 21 Nov 2019, 2:24 AM +0800, Eli Zaretskii <eliz <at> gnu.org> wrote:
> I see one potential problem: in kqueue-add-watch, you encode the file
> name, but then pass it to APIs that generally expect multibyte
> (i.e. un-encoded) strings, although they will also work with encoded
> unibyte strings. Moreover, you put the unibyte encoded file name into
> the watch object. Not sure if this is related to the issue at hand,
> but it would be cleaner to make this change:

It is fixed by your patch. Thanks.

A question:
I printed the values of file and encoded_file with safe_debug_print in kqueue.c. The former is a normal string; the latter is messy code. What is the encoding of encoded_file? The value of file-name-coding-system is utf-8-hfs. How much does utf-8-hfs differ from utf-8-unix? Is utf-8-hfs not really UTF-8?

Added tag(s) patch. Request was from Michael Albinus <michael.albinus <at> gmx.de> to control <at> debbugs.gnu.org. (Thu, 21 Nov 2019 07:45:02 GMT)

Reply sent to Eli Zaretskii <eliz <at> gnu.org>:
You have taken responsibility. (Thu, 21 Nov 2019 14:43:01 GMT)

Notification sent to HaiJun Zhang <netjune <at> outlook.com>:
bug acknowledged by developer. (Thu, 21 Nov 2019 14:43:02 GMT)

Message #33 received at 38287-done <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: HaiJun Zhang <netjune <at> outlook.com>
Cc: 38287-done <at> debbugs.gnu.org, michael.albinus <at> gmx.de
Subject: Re: bug#38287: 26.3.50; filenotify.el: the Chinese file name in the
 event is messy code
Date: Thu, 21 Nov 2019 16:42:41 +0200
> From: HaiJun Zhang <netjune <at> outlook.com>
> CC: "38287 <at> debbugs.gnu.org" <38287 <at> debbugs.gnu.org>
> Date: Thu, 21 Nov 2019 02:33:42 +0000
> 
> It is fixed by your patch. Thanks. 

Thanks, I installed it.

> I printed the values of file and encoded_file with safe_debug_print in kqueue.c. The former is a normal string;
> the latter is messy code. What is the encoding of encoded_file? The value of file-name-coding-system is
> utf-8-hfs. How much does utf-8-hfs differ from utf-8-unix? Is utf-8-hfs not really UTF-8?

encoded_file is in UTF-8 on your system.  What you perceive as "messy
code" is how Emacs displays unibyte strings, which are actually
sequences of raw bytes, not of characters.
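
A rough C sketch of that display convention, not Emacs's actual printer: bytes below 0x80 are copied as-is, and every byte from 0x80 up is shown as a backslash followed by three octal digits. The function name is hypothetical, and the sample name 中文.txt is an assumption used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Render S the way raw bytes in a unibyte string are shown: bytes
   below 0x80 are copied as-is, every byte from 0x80 up becomes a
   backslash and three octal digits (e.g. 0xE4 -> \344).  Simplified:
   control characters are not escaped.  Returns false if OUT (SIZE
   bytes, SIZE > 0) is too small; OUT is always NUL-terminated.  */
static bool
show_raw_bytes (const char *s, char *out, size_t size)
{
  assert (size > 0);
  size_t used = 0;
  out[0] = '\0';
  for (; *s; s++)
    {
      unsigned char b = (unsigned char) *s;
      int n = (b >= 0x80
               ? snprintf (out + used, size - used, "\\%03o", (unsigned) b)
               : snprintf (out + used, size - used, "%c", b));
      if (n < 0 || (size_t) n >= size - used)
        return false;  /* output truncated */
      used += (size_t) n;
    }
  return true;
}
```

Applied to the UTF-8 bytes of 中文.txt, it produces \344\270\255\346\226\207.txt: six raw bytes for two characters, which is the kind of output that reads as messy code.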

utf-8 and utf-8-hfs are not exactly the same, but for Chinese
characters they produce the same results, because those characters
don't have decompositions.
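
The point about decompositions can be illustrated with a toy encoder. It assumes, as the name suggests, that utf-8-hfs writes file names in the decomposed form used by Apple's HFS+ file system. The two-entry table is a hypothetical stand-in for real Unicode decomposition data, and real normalization also handles multiple combining marks and their ordering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Encode code point C as UTF-8 into OUT (room for 4 bytes); return
   the number of bytes written.  */
static size_t
utf8_encode (unsigned long c, unsigned char *out)
{
  assert (c <= 0x10FFFF);
  if (c < 0x80)
    {
      out[0] = (unsigned char) c;
      return 1;
    }
  if (c < 0x800)
    {
      out[0] = (unsigned char) (0xC0 | (c >> 6));
      out[1] = (unsigned char) (0x80 | (c & 0x3F));
      return 2;
    }
  if (c < 0x10000)
    {
      out[0] = (unsigned char) (0xE0 | (c >> 12));
      out[1] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
      out[2] = (unsigned char) (0x80 | (c & 0x3F));
      return 3;
    }
  out[0] = (unsigned char) (0xF0 | (c >> 18));
  out[1] = (unsigned char) (0x80 | ((c >> 12) & 0x3F));
  out[2] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
  out[3] = (unsigned char) (0x80 | (c & 0x3F));
  return 4;
}

/* Toy stand-in for Unicode decomposition data: two Latin letters only.
   CJK ideographs such as U+4E2D have no canonical decomposition, so a
   real table would not list them either.  */
static const struct { unsigned long composed, base, mark; } toy_decomp[] = {
  { 0x00E9, 0x0065, 0x0301 },  /* é -> e + COMBINING ACUTE ACCENT */
  { 0x00FC, 0x0075, 0x0308 },  /* ü -> u + COMBINING DIAERESIS */
};
#define TOY_DECOMP_COUNT (sizeof toy_decomp / sizeof toy_decomp[0])

/* Encode the N code points in CPS as UTF-8 into OUT, decomposing them
   first when DECOMPOSE is true.  OUT needs room for 8 * N bytes.
   Returns the number of bytes written.  */
static size_t
encode_name (const unsigned long *cps, size_t n, bool decompose,
             unsigned char *out)
{
  size_t len = 0;
  for (size_t i = 0; i < n; i++)
    {
      size_t k = TOY_DECOMP_COUNT;  /* "no decomposition" by default */
      if (decompose)
        for (k = 0; k < TOY_DECOMP_COUNT; k++)
          if (toy_decomp[k].composed == cps[i])
            break;

      if (k < TOY_DECOMP_COUNT)
        {
          len += utf8_encode (toy_decomp[k].base, out + len);
          len += utf8_encode (toy_decomp[k].mark, out + len);
        }
      else
        len += utf8_encode (cps[i], out + len);
    }
  return len;
}
```

Decomposing changes the bytes of "café" (the final c3 a9 becomes 65 cc 81) but leaves 中文 byte-for-byte identical, which is why the two coding systems agree on Chinese file names.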




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 20 Dec 2019 12:24:06 GMT)



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.