GNU bug report logs - #13553
24.3.50; incorrect usage of IS_DIRECTORY_SEP

Previous Next

Package: emacs;

Reported by: Shigeru Fukaya <shigeru.fukaya <at> gmail.com>

Date: Sat, 26 Jan 2013 05:54:01 UTC

Severity: normal

Found in version 24.3.50

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 13553 in the body.
You can then email your comments to 13553 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#13553; Package emacs. (Sat, 26 Jan 2013 05:54:01 GMT) Full text and rfc822 format available.

Acknowledgement sent to Shigeru Fukaya <shigeru.fukaya <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Sat, 26 Jan 2013 05:54:01 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Shigeru Fukaya <shigeru.fukaya <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: 24.3.50; incorrect usage of IS_DIRECTORY_SEP
Date: Sat, 26 Jan 2013 14:52:59 +0900
Hi,

`file-attributes' returns nil as stat info for a file, its name ending
with a character whose second byte is 0x5c.

It is because IS_DIRECTORY_SEP is wrongly used.
IS_DIRECTORY_SEP only works when its argument is surely on a start
byte of dbcs characters.
I find incorrect usage, at least in some functions of `w32.c'.
(readdir, stat_worker, read_unc_volume)


I show, as a silly sample, of easy fix.

#define IS_AT_DIRECTORY_SEP(p,i) \
  (p[i] == '/' || (p[i] == '\\' && _mbsbtype(p,i) == _MBC_SINGLE))

and,

  IS_DIRECTORY_SEP(p[i]) 

to

  IS_AT_DIRECTORY_SEP(p, i) 


Regards,
Shigeru




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#13553; Package emacs. (Sat, 26 Jan 2013 11:10:01 GMT) Full text and rfc822 format available.

Message #8 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Shigeru Fukaya <shigeru.fukaya <at> gmail.com>
Cc: 13553 <at> debbugs.gnu.org
Subject: Re: bug#13553: 24.3.50; incorrect usage of IS_DIRECTORY_SEP
Date: Sat, 26 Jan 2013 05:26:04 -0500
> It is because IS_DIRECTORY_SEP is wrongly used.
> IS_DIRECTORY_SEP only works when its argument is surely on a start
> byte of dbcs characters.

If the file name is properly decoded (i.e. in its multibyte form), there
can't be a 0x5c byte other than at the beginning of a char.

So this bug can only happens if you pass a unibyte filename.  IOW the
bug is probably in the caller rather than in the file-attributes function.

Please show us the actual situation where this happens, since file names
are normally always multibyte.


        Stefan





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#13553; Package emacs. (Sat, 26 Jan 2013 11:18:02 GMT) Full text and rfc822 format available.

Message #11 received at 13553 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 13553 <at> debbugs.gnu.org, shigeru.fukaya <at> gmail.com
Subject: Re: bug#13553: 24.3.50; incorrect usage of IS_DIRECTORY_SEP
Date: Sat, 26 Jan 2013 13:16:50 +0200
> From: Stefan Monnier <monnier <at> iro.umontreal.ca>
> Date: Sat, 26 Jan 2013 05:26:04 -0500
> Cc: 13553 <at> debbugs.gnu.org
> 
> > It is because IS_DIRECTORY_SEP is wrongly used.
> > IS_DIRECTORY_SEP only works when its argument is surely on a start
> > byte of dbcs characters.
> 
> If the file name is properly decoded (i.e. in its multibyte form), there
> can't be a 0x5c byte other than at the beginning of a char.
> 
> So this bug can only happens if you pass a unibyte filename.  IOW the
> bug is probably in the caller rather than in the file-attributes function.

No, it's not in the caller.  It's in w32.c, whose functions almost
always manipulate encoded file names (because they shadow system
APIs).

> Please show us the actual situation where this happens, since file names
> are normally always multibyte.

You call file-attributes, which encodes the file name and passes it to
'lstat'.  The implementation of 'lstat' in w32.c then looks at the
last byte of the encoded file name to see if there's a slash or
backslash there.  Boom!

I will fix that.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#13553; Package emacs. (Sat, 26 Jan 2013 12:58:02 GMT) Full text and rfc822 format available.

Message #14 received at 13553 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Shigeru Fukaya <shigeru.fukaya <at> gmail.com>
Cc: 13553 <at> debbugs.gnu.org
Subject: Re: bug#13553: 24.3.50; incorrect usage of IS_DIRECTORY_SEP
Date: Sat, 26 Jan 2013 14:56:46 +0200
> From: Shigeru Fukaya <shigeru.fukaya <at> gmail.com>
> Date: Sat, 26 Jan 2013 14:52:59 +0900
> 
> `file-attributes' returns nil as stat info for a file, its name ending
> with a character whose second byte is 0x5c.
> 
> It is because IS_DIRECTORY_SEP is wrongly used.
> IS_DIRECTORY_SEP only works when its argument is surely on a start
> byte of dbcs characters.
> I find incorrect usage, at least in some functions of `w32.c'.
> (readdir, stat_worker, read_unc_volume)

Thanks.  I fixed w32.c functions that use IS_DIRECTORY_SEP in revision
111200 on the emacs-24 branch.  As I cannot easily test DBCS file-name
encodings on my system, please test the fixed version and see if the
problem indeed goes away.  The patch relative to emacs-24 branch is
below, for your convenience.

After looking at the code which I fixed, I'm afraid I don't understand
why file-attributes returned nil in your case.  The code that uses
IS_DIRECTORY_SEP there is only a fallback, for when the
GetFileInformationByHandle API fails for some reason.  You don't say
what version of Windows you are using, but I won't expect that
fallback code be executed for any reasonably recent Windows version.

So could you please trace through the code that file-attributes calls,
and see which use of IS_DIRECTORY_SEP caused the problem?  I'm afraid
that it's not in w32.c at all, but elsewhere.


=== modified file 'src/w32.c'
--- src/w32.c	2013-01-25 14:47:37 +0000
+++ src/w32.c	2013-01-26 12:49:34 +0000
@@ -1503,12 +1503,17 @@ parse_root (char * name, char ** pPath)
   else if (IS_DIRECTORY_SEP (name[0]) && IS_DIRECTORY_SEP (name[1]))
     {
       int slashes = 2;
+      int dbcs_p = max_filename_mbslen () > 1;
+
       name += 2;
       do
         {
 	  if (IS_DIRECTORY_SEP (*name) && --slashes == 0)
 	    break;
-	  name++;
+	  if (dbcs_p)
+	    name = CharNextExA (file_name_codepage, name, 0);
+	  else
+	    name++;
 	}
       while ( *name );
       if (IS_DIRECTORY_SEP (name[0]))
@@ -2369,12 +2374,23 @@ get_volume_info (const char * name, cons
     {
       char *str = temp;
       int slashes = 4;
+      int dbcs_p = max_filename_mbslen () > 1;
+
       rootname = temp;
       do
         {
 	  if (IS_DIRECTORY_SEP (*name) && --slashes == 0)
 	    break;
-	  *str++ = *name++;
+	  if (!dbcs_p)
+	    *str++ = *name++;
+	  else
+	    {
+	      const char *p = name;
+
+	      name = CharNextExA (file_name_codepage, name, 0);
+	      memcpy (str, p, name - p);
+	      str += name - p;
+	    }
 	}
       while ( *name );
 
@@ -2610,11 +2626,23 @@ readdir (DIR *dirp)
     {
       char filename[MAXNAMLEN + 3];
       int ln;
+      int dbcs_p = max_filename_mbslen () > 1;
 
       strcpy (filename, dir_pathname);
       ln = strlen (filename) - 1;
-      if (!IS_DIRECTORY_SEP (filename[ln]))
-	strcat (filename, "\\");
+      if (!dbcs_p)
+	{
+	  if (!IS_DIRECTORY_SEP (filename[ln]))
+	    strcat (filename, "\\");
+	}
+      else
+	{
+	  char *end = filename + ln + 1;
+	  char *last_char = CharPrevExA (file_name_codepage, filename, end, 0);
+
+	  if (!IS_DIRECTORY_SEP (*last_char))
+	    strcat (filename, "\\");
+	}
       strcat (filename, "*");
 
       /* Note: No need to resolve symlinks in FILENAME, because
@@ -2719,6 +2747,7 @@ read_unc_volume (HANDLE henum, char *rea
   DWORD bufsize = 512;
   char *buffer;
   char *ptr;
+  int dbcs_p = max_filename_mbslen () > 1;
 
   count = 1;
   buffer = alloca (bufsize);
@@ -2729,7 +2758,13 @@ read_unc_volume (HANDLE henum, char *rea
   /* WNetEnumResource returns \\resource\share...skip forward to "share". */
   ptr = ((LPNETRESOURCE) buffer)->lpRemoteName;
   ptr += 2;
-  while (*ptr && !IS_DIRECTORY_SEP (*ptr)) ptr++;
+  if (!dbcs_p)
+    while (*ptr && !IS_DIRECTORY_SEP (*ptr)) ptr++;
+  else
+    {
+      while (*ptr && !IS_DIRECTORY_SEP (*ptr))
+	ptr = CharNextExA (file_name_codepage, ptr, 0);
+    }
   ptr++;
 
   strncpy (readbuf, ptr, size);
@@ -2766,9 +2801,11 @@ logon_network_drive (const char *path)
 {
   NETRESOURCE resource;
   char share[MAX_PATH];
-  int i, n_slashes;
+  int n_slashes;
   char drive[4];
   UINT drvtype;
+  char *p;
+  int dbcs_p;
 
   if (IS_DIRECTORY_SEP (path[0]) && IS_DIRECTORY_SEP (path[1]))
     drvtype = DRIVE_REMOTE;
@@ -2790,13 +2827,18 @@ logon_network_drive (const char *path)
   n_slashes = 2;
   strncpy (share, path, MAX_PATH);
   /* Truncate to just server and share name.  */
-  for (i = 2; i < MAX_PATH; i++)
+  dbcs_p = max_filename_mbslen () > 1;
+  for (p = share + 2; *p && p < share + MAX_PATH; )
     {
-      if (IS_DIRECTORY_SEP (share[i]) && ++n_slashes > 3)
+      if (IS_DIRECTORY_SEP (*p) && ++n_slashes > 3)
         {
-          share[i] = '\0';
+          *p = '\0';
           break;
         }
+      if (dbcs_p)
+	p = CharNextExA (file_name_codepage, p, 0);
+      else
+	p++;
     }
 
   resource.dwType = RESOURCETYPE_DISK;
@@ -3557,6 +3599,7 @@ stat_worker (const char * path, struct s
   DWORD access_rights = 0;
   DWORD fattrs = 0, serialnum = 0, fs_high = 0, fs_low = 0, nlinks = 1;
   FILETIME ctime, atime, wtime;
+  int dbcs_p;
 
   if (path == NULL || buf == NULL)
     {
@@ -3751,6 +3794,7 @@ stat_worker (const char * path, struct s
 	 did not ask for extra precision, resolving symlinks will fly
 	 in the face of that request, since the user then wants the
 	 lightweight version of the code.  */
+      dbcs_p = max_filename_mbslen () > 1;
       rootdir = (path >= save_name + len - 1
 		 && (IS_DIRECTORY_SEP (*path) || *path == 0));
 
@@ -3778,8 +3822,19 @@ stat_worker (const char * path, struct s
 	}
       else if (rootdir)
 	{
-	  if (!IS_DIRECTORY_SEP (name[len-1]))
-	    strcat (name, "\\");
+	  if (!dbcs_p)
+	    {
+	      if (!IS_DIRECTORY_SEP (name[len-1]))
+		strcat (name, "\\");
+	    }
+	  else
+	    {
+	      char *end = name + len;
+	      char *n = CharPrevExA (file_name_codepage, name, end, 0);
+
+	      if (!IS_DIRECTORY_SEP (*n))
+		strcat (name, "\\");
+	    }
 	  if (GetDriveType (name) < 2)
 	    {
 	      errno = ENOENT;
@@ -3791,15 +3846,37 @@ stat_worker (const char * path, struct s
 	}
       else
 	{
-	  if (IS_DIRECTORY_SEP (name[len-1]))
-	    name[len - 1] = 0;
+	  if (!dbcs_p)
+	    {
+	      if (IS_DIRECTORY_SEP (name[len-1]))
+		name[len - 1] = 0;
+	    }
+	  else
+	    {
+	      char *end = name + len;
+	      char *n = CharPrevExA (file_name_codepage, name, end, 0);
+
+	      if (IS_DIRECTORY_SEP (*n))
+		*n = 0;
+	    }
 
 	  /* (This is hacky, but helps when doing file completions on
 	     network drives.)  Optimize by using information available from
 	     active readdir if possible.  */
 	  len = strlen (dir_pathname);
-	  if (IS_DIRECTORY_SEP (dir_pathname[len-1]))
-	    len--;
+	  if (!dbcs_p)
+	    {
+	      if (IS_DIRECTORY_SEP (dir_pathname[len-1]))
+		len--;
+	    }
+	  else
+	    {
+	      char *end = dir_pathname + len;
+	      char *n = CharPrevExA (file_name_codepage, dir_pathname, end, 0);
+
+	      if (IS_DIRECTORY_SEP (*n))
+		len--;
+	    }
 	  if (dir_find_handle != INVALID_HANDLE_VALUE
 	      && !(is_a_symlink && follow_symlinks)
 	      && strnicmp (save_name, dir_pathname, len) == 0
@@ -4060,6 +4137,7 @@ symlink (char const *filename, char cons
   char linkfn[MAX_PATH], *tgtfn;
   DWORD flags = 0;
   int dir_access, filename_ends_in_slash;
+  int dbcs_p;
 
   /* Diagnostics follows Posix as much as possible.  */
   if (filename == NULL || linkname == NULL)
@@ -4085,6 +4163,8 @@ symlink (char const *filename, char cons
       return -1;
     }
 
+  dbcs_p = max_filename_mbslen () > 1;
+
   /* Note: since empty FILENAME was already rejected, we can safely
      refer to FILENAME[1].  */
   if (!(IS_DIRECTORY_SEP (filename[0]) || IS_DEVICE_SEP (filename[1])))
@@ -4099,8 +4179,21 @@ symlink (char const *filename, char cons
       char tem[MAX_PATH];
       char *p = linkfn + strlen (linkfn);
 
-      while (p > linkfn && !IS_ANY_SEP (p[-1]))
-	p--;
+      if (!dbcs_p)
+	{
+	  while (p > linkfn && !IS_ANY_SEP (p[-1]))
+	    p--;
+	}
+      else
+	{
+	  char *p1 = CharPrevExA (file_name_codepage, linkfn, p, 0);
+
+	  while (p > linkfn && !IS_ANY_SEP (*p1))
+	    {
+	      p = p1;
+	      p1 = CharPrevExA (file_name_codepage, linkfn, p1, 0);
+	    }
+	}
       if (p > linkfn)
 	strncpy (tem, linkfn, p - linkfn);
       tem[p - linkfn] = '\0';
@@ -4115,7 +4208,15 @@ symlink (char const *filename, char cons
      exist, but ends in a slash, we create a symlink to directory.  If
      FILENAME exists and is a directory, we always create a symlink to
      directory.  */
-  filename_ends_in_slash = IS_DIRECTORY_SEP (filename[strlen (filename) - 1]);
+  if (!dbcs_p)
+    filename_ends_in_slash = IS_DIRECTORY_SEP (filename[strlen (filename) - 1]);
+  else
+    {
+      const char *end = filename + strlen (filename);
+      const char *n = CharPrevExA (file_name_codepage, filename, end, 0);
+
+      filename_ends_in_slash = IS_DIRECTORY_SEP (*n);
+    }
   if (dir_access == 0 || filename_ends_in_slash)
     flags = SYMBOLIC_LINK_FLAG_DIRECTORY;
 
@@ -4440,6 +4541,7 @@ chase_symlinks (const char *file)
   char link[MAX_PATH];
   ssize_t res, link_len;
   int loop_count = 0;
+  int dbcs_p;
 
   if (is_windows_9x () == TRUE || !is_symlink (file))
     return (char *)file;
@@ -4447,13 +4549,27 @@ chase_symlinks (const char *file)
   if ((link_len = GetFullPathName (file, MAX_PATH, link, NULL)) == 0)
     return (char *)file;
 
+  dbcs_p = max_filename_mbslen () > 1;
   target[0] = '\0';
   do {
 
     /* Remove trailing slashes, as we want to resolve the last
        non-trivial part of the link name.  */
-    while (link_len > 3 && IS_DIRECTORY_SEP (link[link_len-1]))
-      link[link_len--] = '\0';
+    if (!dbcs_p)
+      {
+	while (link_len > 3 && IS_DIRECTORY_SEP (link[link_len-1]))
+	  link[link_len--] = '\0';
+      }
+    else if (link_len > 3)
+      {
+	char *n = CharPrevExA (file_name_codepage, link, link + link_len, 0);
+
+	while (n >= link + 2 && IS_DIRECTORY_SEP (*n))
+	  {
+	    n[1] = '\0';
+	    n = CharPrevExA (file_name_codepage, link, n, 0);
+	  }
+      }
 
     res = readlink (link, target, MAX_PATH);
     if (res > 0)
@@ -4466,8 +4582,21 @@ chase_symlinks (const char *file)
 	       the symlink, then copy the result back to target.  */
 	    char *p = link + link_len;
 
-	    while (p > link && !IS_ANY_SEP (p[-1]))
-	      p--;
+	    if (!dbcs_p)
+	      {
+		while (p > link && !IS_ANY_SEP (p[-1]))
+		  p--;
+	      }
+	    else
+	      {
+		char *p1 = CharPrevExA (file_name_codepage, link, p, 0);
+
+		while (p > link && !IS_ANY_SEP (*p1))
+		  {
+		    p = p1;
+		    p1 = CharPrevExA (file_name_codepage, link, p1, 0);
+		  }
+	      }
 	    strcpy (p, target);
 	    strcpy (target, link);
 	  }





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#13553; Package emacs. (Sat, 26 Jan 2013 13:24:01 GMT) Full text and rfc822 format available.

Message #17 received at 13553 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 13553 <at> debbugs.gnu.org, shigeru.fukaya <at> gmail.com
Subject: Re: bug#13553: 24.3.50; incorrect usage of IS_DIRECTORY_SEP
Date: Sat, 26 Jan 2013 15:23:02 +0200
> From: Stefan Monnier <monnier <at> iro.umontreal.ca>
> Cc: shigeru.fukaya <at> gmail.com,  13553 <at> debbugs.gnu.org
> Date: Sat, 26 Jan 2013 07:40:08 -0500
> 
> > You call file-attributes, which encodes the file name and passes it to
> > 'lstat'.  The implementation of 'lstat' in w32.c then looks at the
> > last byte of the encoded file name to see if there's a slash or
> > backslash there.  Boom!
> 
> I see.   So, does that meant that w32.c can't faithfully implement lstat
> without doing the moral equivalent of re-decoding its argument?

It can, if we limit such support to Windows codepage encodings.  See
the changes I made on the emacs-24 branch revisions 111194 and 111200.

This will still lose if the user binds file-name-coding-system to
something like shift_jis (instead of using its Windows extension
cp932), but there's nothing I can do about that.  I was lucky to find
in Windows APIs 2 functions that can move forward and backward by DBCS
characters, and that accept a codepage as their argument (i.e. do not
limit support to the current system codepage).  Personally, I think
supporting all possible Windows codepages is good enough.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#13553; Package emacs. (Sat, 26 Jan 2013 14:08:02 GMT) Full text and rfc822 format available.

Message #20 received at 13553 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: shigeru.fukaya <at> gmail.com
Cc: 13553 <at> debbugs.gnu.org
Subject: Re: bug#13553: 24.3.50; incorrect usage of IS_DIRECTORY_SEP
Date: Sat, 26 Jan 2013 16:06:47 +0200
> Date: Sat, 26 Jan 2013 14:56:46 +0200
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: 13553 <at> debbugs.gnu.org
> 
> After looking at the code which I fixed, I'm afraid I don't understand
> why file-attributes returned nil in your case.  The code that uses
> IS_DIRECTORY_SEP there is only a fallback, for when the
> GetFileInformationByHandle API fails for some reason.  You don't say
> what version of Windows you are using, but I won't expect that
> fallback code be executed for any reasonably recent Windows version.

Actually, that fallback code gets also executed if you customize
w32-get-true-file-attributes to nil, or if the file is on a remote
(i.e. networked) filesystem.  Could one of these be true in your case?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#13553; Package emacs. (Sat, 26 Jan 2013 16:34:02 GMT) Full text and rfc822 format available.

Message #23 received at 13553 <at> debbugs.gnu.org (full text, mbox):

From: Shigeru Fukaya <shigeru.fukaya <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 13553 <at> debbugs.gnu.org
Subject: Re: bug#13553: 24.3.50; incorrect usage of IS_DIRECTORY_SEP
Date: Sun, 27 Jan 2013 01:33:32 +0900
I greatly appreciate your very quick work, thank you.

I looked through your fix of all functions
(parse_root, get_volume_info, readdir, read_unc_volume, 
logon_network_drive, star_worker, symlink, chase_symlinks)
one by one, and found good at first look.

I'll try build and check it next, anyway.

Yes, my `w32-get-true-file-attributes' is nil. (Isn't it default?)

As for coding symbol, not a few Japanese use 'cp932 not 'shift-jis I 
suppose.  But we need some notice for users.

Regards,
Shigeru


>> From: Shigeru Fukaya <shigeru.fukaya <at> gmail.com>
>> Date: Sat, 26 Jan 2013 14:52:59 +0900
>> 
>> `file-attributes' returns nil as stat info for a file, its name ending
>> with a character whose second byte is 0x5c.
>> 
>> It is because IS_DIRECTORY_SEP is wrongly used.
>> IS_DIRECTORY_SEP only works when its argument is surely on a start
>> byte of dbcs characters.
>> I find incorrect usage, at least in some functions of `w32.c'.
>> (readdir, stat_worker, read_unc_volume)
>
>Thanks.  I fixed w32.c functions that use IS_DIRECTORY_SEP in revision
>111200 on the emacs-24 branch.  As I cannot easily test DBCS file-name
>encodings on my system, please test the fixed version and see if the
>problem indeed goes away.  The patch relative to emacs-24 branch is
>below, for your convenience.
>
>After looking at the code which I fixed, I'm afraid I don't understand
>why file-attributes returned nil in your case.  The code that uses
>IS_DIRECTORY_SEP there is only a fallback, for when the
>GetFileInformationByHandle API fails for some reason.  You don't say
>what version of Windows you are using, but I won't expect that
>fallback code be executed for any reasonably recent Windows version.
>
>So could you please trace through the code that file-attributes calls,
>and see which use of IS_DIRECTORY_SEP caused the problem?  I'm afraid
>that it's not in w32.c at all, but elsewhere.
>
>
>=== modified file 'src/w32.c'
...




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#13553; Package emacs. (Sat, 26 Jan 2013 17:39:02 GMT) Full text and rfc822 format available.

Message #26 received at 13553 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Shigeru Fukaya <shigeru.fukaya <at> gmail.com>
Cc: 13553 <at> debbugs.gnu.org
Subject: Re: bug#13553: 24.3.50; incorrect usage of IS_DIRECTORY_SEP
Date: Sat, 26 Jan 2013 19:37:57 +0200
> From: Shigeru Fukaya <shigeru.fukaya <at> gmail.com>
> Cc: 13553 <at> debbugs.gnu.org
> Date: Sun, 27 Jan 2013 01:33:32 +0900
> 
> I'll try build and check it next, anyway.

Thank you.

> Yes, my `w32-get-true-file-attributes' is nil. (Isn't it default?)

The default is 'local', which means your local files use a different
code path, one which doesn't need to use IS_DIRECTORY_SEP.

If your value was nil, then I understand why the code I fixed could
cause trouble.

> As for coding symbol, not a few Japanese use 'cp932 not 'shift-jis I 
> suppose.  But we need some notice for users.

Not sure what you are saying here.  Do you think many Japanese Windows
users will set file-name-coding-system to shift_jis, or that most of
them will set it to cp932?

Normally, users don't customize file-name-coding-system at all, in
which case Emacs will use default-file-name-coding-system, that is
automatically set to cp932, according to the system-wide codepage.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#13553; Package emacs. (Sat, 26 Jan 2013 21:20:02 GMT) Full text and rfc822 format available.

Message #29 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 13553 <at> debbugs.gnu.org, shigeru.fukaya <at> gmail.com
Subject: Re: bug#13553: 24.3.50; incorrect usage of IS_DIRECTORY_SEP
Date: Sat, 26 Jan 2013 07:40:08 -0500
> You call file-attributes, which encodes the file name and passes it to
> 'lstat'.  The implementation of 'lstat' in w32.c then looks at the
> last byte of the encoded file name to see if there's a slash or
> backslash there.  Boom!

I see.   So, does that meant that w32.c can't faithfully implement lstat
without doing the moral equivalent of re-decoding its argument?


        Stefan





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#13553; Package emacs. (Sun, 27 Jan 2013 00:22:02 GMT) Full text and rfc822 format available.

Message #32 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 13553 <at> debbugs.gnu.org, shigeru.fukaya <at> gmail.com
Subject: Re: bug#13553: 24.3.50; incorrect usage of IS_DIRECTORY_SEP
Date: Sat, 26 Jan 2013 17:24:37 -0500
>> > You call file-attributes, which encodes the file name and passes it to
>> > 'lstat'.  The implementation of 'lstat' in w32.c then looks at the
>> > last byte of the encoded file name to see if there's a slash or
>> > backslash there.  Boom!
>> I see.   So, does that meant that w32.c can't faithfully implement lstat
>> without doing the moral equivalent of re-decoding its argument?
> It can, if we limit such support to Windows codepage encodings.  See
> the changes I made on the emacs-24 branch revisions 111194 and 111200.

CharNextExA and CharPrevExA, in my mind, do perform the moral equivalent
of decoding the argument (just in an incremental way).

> Personally, I think supporting all possible Windows codepages is
> good enough.

Fine by me,


        Stefan





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#13553; Package emacs. (Sun, 27 Jan 2013 06:58:01 GMT) Full text and rfc822 format available.

Message #35 received at 13553 <at> debbugs.gnu.org (full text, mbox):

From: Shigeru Fukaya <shigeru.fukaya <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 13553 <at> debbugs.gnu.org
Subject: Re: bug#13553: 24.3.50; incorrect usage of IS_DIRECTORY_SEP
Date: Sun, 27 Jan 2013 15:56:54 +0900
Built on migw32, and no trouble, thank you
(I didn't do coverage tests, sorry).

But we must remember, my raise of issue is the incorrect usage of
IS_DIRECTORY_SEP.  There are still more in fileio.c and more.
(I really expect remove the check of '\' from IS_DIRECTORY_SEP, and
handle '\' only where it is necessary, and things wolud be better)

>> As for coding symbol, not a few Japanese use 'cp932 not 'shift-jis I 
>> suppose.  But we need some notice for users.
>
>Not sure what you are saying here.  Do you think many Japanese Windows
>users will set file-name-coding-system to shift_jis, or that most of
>them will set it to cp932?
>
>Normally, users don't customize file-name-coding-system at all, in
>which case Emacs will use default-file-name-coding-system, that is
>automatically set to cp932, according to the system-wide codepage.

Yes, you are right. I mean, maybe, the case of using some remote file
system.  You can check by yourself their (our?) usage if you like.
Some seems still using shift-jis, not cp932.

http://www.google.co.jp/search?q=file-name-coding-system+sjis


Regards,
Shigeru




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#13553; Package emacs. (Sun, 27 Jan 2013 07:07:02 GMT) Full text and rfc822 format available.

Message #38 received at 13553 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 13553 <at> debbugs.gnu.org, shigeru.fukaya <at> gmail.com
Subject: Re: bug#13553: 24.3.50; incorrect usage of IS_DIRECTORY_SEP
Date: Sun, 27 Jan 2013 09:06:40 +0200
> From: Stefan Monnier <monnier <at> iro.umontreal.ca>
> Cc: shigeru.fukaya <at> gmail.com,  13553 <at> debbugs.gnu.org
> Date: Sat, 26 Jan 2013 17:24:37 -0500
> 
> CharNextExA and CharPrevExA, in my mind, do perform the moral equivalent
> of decoding the argument (just in an incremental way).

No, they are the equivalents of NEXT_CHAR_BOUNDARY and
PREV_CHAR_BOUNDARY, except that they can do this for many different
encodings, not just for UTF-8.




Reply sent to Eli Zaretskii <eliz <at> gnu.org>:
You have taken responsibility. (Sun, 27 Jan 2013 07:17:02 GMT) Full text and rfc822 format available.

Notification sent to Shigeru Fukaya <shigeru.fukaya <at> gmail.com>:
bug acknowledged by developer. (Sun, 27 Jan 2013 07:17:02 GMT) Full text and rfc822 format available.

Message #43 received at 13553-done <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Shigeru Fukaya <shigeru.fukaya <at> gmail.com>
Cc: 13553-done <at> debbugs.gnu.org
Subject: Re: bug#13553: 24.3.50; incorrect usage of IS_DIRECTORY_SEP
Date: Sun, 27 Jan 2013 09:15:45 +0200
> From: Shigeru Fukaya <shigeru.fukaya <at> gmail.com>
> Cc: 13553 <at> debbugs.gnu.org
> Date: Sun, 27 Jan 2013 15:56:54 +0900
> 
> Built on migw32, and no trouble, thank you
> (I didn't do coverage tests, sorry).

Thanks.  I'm therefore closing this bug.

> But we must remember, my raise of issue is the incorrect usage of
> IS_DIRECTORY_SEP.  There are still more in fileio.c and more.

Yes.  This is being discussed on emacs-devel, and once that discussion
ends, the fileio.c and dired.c functions will be fixed as well.

> >> As for coding symbol, not a few Japanese use 'cp932 not 'shift-jis I 
> >> suppose.  But we need some notice for users.
> >
> >Not sure what you are saying here.  Do you think many Japanese Windows
> >users will set file-name-coding-system to shift_jis, or that most of
> >them will set it to cp932?
> >
> >Normally, users don't customize file-name-coding-system at all, in
> >which case Emacs will use default-file-name-coding-system, that is
> >automatically set to cp932, according to the system-wide codepage.
> 
> Yes, you are right. I mean, maybe, the case of using some remote file
> system.

Accessing remote files doesn't go through functions in w32.c, it goes
through file handlers (in Tramp).

> You can check by yourself their (our?) usage if you like.
> Some seems still using shift-jis, not cp932.
> 
> http://www.google.co.jp/search?q=file-name-coding-system+sjis

If they want 100% solid support, they will have to change their
customizations, sorry.  (We could implement an equivalence table,
whereby, e.g., shift_jis would be mapped to cp932, but these encodings
are slightly different, so I think that would be a kludge.)

In any case, the new code in w32.c is no worse than the previous one,
when file-name-coding-system is set to anything that is not a
recognized Windows codepage.  It is actually slightly better: it uses
the system-wide ANSI codepage in that case.  In most cases, this would
be the right thing anyway.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#13553; Package emacs. (Sun, 27 Jan 2013 08:38:01 GMT) Full text and rfc822 format available.

Message #46 received at 13553 <at> debbugs.gnu.org (full text, mbox):

From: Shigeru Fukaya <shigeru.fukaya <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 13553 <at> debbugs.gnu.org, Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#13553: 24.3.50; incorrect usage of IS_DIRECTORY_SEP
Date: Sun, 27 Jan 2013 17:36:48 +0900
One more, please.

CharNextExA/CharPrevExA work based on dynamically specified codepage.

And, _mbspbrk/_mbslwr are also used in w32.c.  They work based on MS
Window's locale setting.

Isn't it inconsistent?  _mbsinc/_mbsdec would simply be sufficient?
(Besides, I don't know well, _mbslwr seems to me lower latin
characters' case on some locale.  Is it true, or intended behavior?)

Thank you,
Shigeru


>> From: Stefan Monnier <monnier <at> iro.umontreal.ca>
>> Cc: shigeru.fukaya <at> gmail.com,  13553 <at> debbugs.gnu.org
>> Date: Sat, 26 Jan 2013 17:24:37 -0500
>> 
>> CharNextExA and CharPrevExA, in my mind, do perform the moral equivalent
>> of decoding the argument (just in an incremental way).
>
>No, they are the equivalents of NEXT_CHAR_BOUNDARY and
>PREV_CHAR_BOUNDARY, except that they can do this for many different
>encodings, not just for UTF-8.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#13553; Package emacs. (Sun, 27 Jan 2013 09:41:02 GMT) Full text and rfc822 format available.

Message #49 received at 13553 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Shigeru Fukaya <shigeru.fukaya <at> gmail.com>
Cc: 13553 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
Subject: Re: bug#13553: 24.3.50; incorrect usage of IS_DIRECTORY_SEP
Date: Sun, 27 Jan 2013 11:40:03 +0200
> From: Shigeru Fukaya <shigeru.fukaya <at> gmail.com>
> Cc: Stefan Monnier <monnier <at> iro.umontreal.ca>,
>  13553 <at> debbugs.gnu.org
> Date: Sun, 27 Jan 2013 17:36:48 +0900
> 
> CharNextExA/CharPrevExA work based on dynamically specified codepage.
> 
> And, _mbspbrk/_mbslwr are also used in w32.c.  They work based on MS
> Window's locale setting.
> 
> Isn't it inconsistent?

It is, but I don't know of any practical way of eliminating that
inconsistency.  In the normal case, where the current
file-name-coding-system is the same as the ANSI codepage, there is no
inconsistency, so using CharNextExA/CharPrevExA is never worse than
using _mbsinc/_mbsdec.  It is sometimes better, because the code which
calls the _mbs* functions is executed only by some functions in w32.c.

> _mbsinc/_mbsdec would simply be sufficient?

That would preclude code like this:

  (let ((file-name-coding-system 'cp932))
    (expand-file-name "SOMETHING" "C:/")

I think it's reasonable to expect this to be supported, even if the
default file-name encoding is not cp932.

> (Besides, I don't know well, _mbslwr seems to me lower latin
> characters' case on some locale.  Is it true, or intended behavior?)

The intended behavior is to downcase letters for which lower-case is
defined.  I don't know if this is limited to Latin characters.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 24 Feb 2013 12:24:03 GMT) Full text and rfc822 format available.

This bug report was last modified 11 years and 84 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.