GNU logs - #33044, boring messages


Message sent to bug-guile@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#33044: Invalid read access of chars of wide string in scm_seed_to_random_state
Resent-From: Tom de Vries <tdevries@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-guile@HIDDEN
Resent-Date: Mon, 15 Oct 2018 10:43:01 +0000
Resent-Message-ID: <handler.33044.B.15396001327153 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: report 33044
X-GNU-PR-Package: guile
X-GNU-PR-Keywords: 
To: 33044 <at> debbugs.gnu.org
X-Debbugs-Original-To: bug-guile@HIDDEN
Received: via spool by submit <at> debbugs.gnu.org id=B.15396001327153
          (code B ref -1); Mon, 15 Oct 2018 10:43:01 +0000
Received: (at submit) by debbugs.gnu.org; 15 Oct 2018 10:42:12 +0000
Received: from localhost ([127.0.0.1]:49844 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1gC0KB-0001rH-96
	for submit <at> debbugs.gnu.org; Mon, 15 Oct 2018 06:42:11 -0400
Received: from eggs.gnu.org ([208.118.235.92]:35397)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <tdevries@HIDDEN>) id 1gByUt-0005Fw-NX
 for submit <at> debbugs.gnu.org; Mon, 15 Oct 2018 04:45:08 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from <tdevries@HIDDEN>) id 1gByUm-000294-14
 for submit <at> debbugs.gnu.org; Mon, 15 Oct 2018 04:45:01 -0400
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=disabled
 version=3.3.2
Received: from lists.gnu.org ([2001:4830:134:3::11]:45244)
 by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32)
 (Exim 4.71) (envelope-from <tdevries@HIDDEN>) id 1gByUk-00028K-MX
 for submit <at> debbugs.gnu.org; Mon, 15 Oct 2018 04:44:59 -0400
Received: from eggs.gnu.org ([2001:4830:134:3::10]:44994)
 by lists.gnu.org with esmtp (Exim 4.71)
 (envelope-from <tdevries@HIDDEN>) id 1gByUj-0003ft-HS
 for bug-guile@HIDDEN; Mon, 15 Oct 2018 04:44:58 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from <tdevries@HIDDEN>) id 1gByUg-00024g-B0
 for bug-guile@HIDDEN; Mon, 15 Oct 2018 04:44:57 -0400
Received: from mx2.suse.de ([195.135.220.15]:46892 helo=mx1.suse.de)
 by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)
 (Exim 4.71) (envelope-from <tdevries@HIDDEN>) id 1gByUf-00022y-VG
 for bug-guile@HIDDEN; Mon, 15 Oct 2018 04:44:54 -0400
X-Virus-Scanned: by amavisd-new at test-mx.suse.de
Received: from relay1.suse.de (unknown [195.135.220.254])
 by mx1.suse.de (Postfix) with ESMTP id D093FAC1F
 for <bug-guile@HIDDEN>; Mon, 15 Oct 2018 08:44:51 +0000 (UTC)
From: Tom de Vries <tdevries@HIDDEN>
Message-ID: <469f2345-5e76-1fc5-1105-f1d508611140@HIDDEN>
Date: Mon, 15 Oct 2018 10:44:58 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101
 Thunderbird/60.2.1
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x (no
 timestamps) [generic]
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x
X-Received-From: 2001:4830:134:3::11
X-Spam-Score: -4.0 (----)
X-Mailman-Approved-At: Mon, 15 Oct 2018 06:42:10 -0400
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -5.0 (-----)

Hi,

Consider min.c:
...
#include <locale.h>
#include "libguile.h"

static void *
foo (void *data)
{
  return NULL;
}

int
main (void)
{
  const char *msg = setlocale (LC_CTYPE, "ja_JP.sjis");
  printf ("msg: %s\n", msg);
  scm_with_guile (foo, NULL);
  return 0;
}
...

Compiled with guile-2.2.4:
...
$ gcc min.c -I /home/vries/guile/tarball/guile-2.2.4 -lguile-2.2 \
    -L /home/vries/guile/tarball/guile-2.2.4/libguile/.libs \
    -Wl,-rpath=/home/vries/guile/tarball/guile-2.2.4/libguile/.libs -g
...

We run into a segfault:
...
$ ./a.out
msg: ja_JP.sjis
Segmentation fault (core dumped)
...

The backtrace as reported by gdb is:
...
#0  0x00007ffff7b649ba in scm_variable_ref (var=0x0) at variable.c:92
#1  0x00007ffff7b63868 in scm_throw (key=key@entry=0x7a9580, args=0x7b94c0) at throw.c:266
#2  0x00007ffff7b63e15 in scm_ithrow (key=key@entry=0x7a9580, args=<optimized out>, no_return=no_return@entry=1)
    at throw.c:611
#3  0x00007ffff7af51a5 in scm_error_scm (key=key@entry=0x7a9580, subr=<optimized out>,
    message=message@entry=0x7ba8e0, args=args@entry=0x7b9500, data=data@entry=0x4) at error.c:94
#4  0x00007ffff7af525f in scm_error (key=0x7a9580, subr=subr@entry=0x0,
    message=message@entry=0x7ffff7b93358 "Invalid read access of chars of wide string: ~s", args=0x7b9500,
    rest=rest@entry=0x4) at error.c:59
#5  0x00007ffff7af5642 in scm_misc_error (subr=subr@entry=0x0,
    message=message@entry=0x7ffff7b93358 "Invalid read access of chars of wide string: ~s", args=<optimized out>)
    at error.c:299
#6  0x00007ffff7b5aa9a in scm_i_string_chars (str=<optimized out>, str@entry=0x7ba900) at strings.c:571
#7  0x00007ffff7b3cef8 in scm_seed_to_random_state (seed=0x7ba900) at random.c:444
#8  0x00007ffff7b3ddaa in scm_init_random () at ../libguile/random.x:3
#9  0x00007ffff7b0eb41 in scm_i_init_guile (base=<optimized out>) at init.c:451
#10 0x00007ffff7b62128 in scm_i_init_thread_for_guile (base=0x7fffffffdb10, dynamic_state=0x0) at threads.c:586
#11 0x00007ffff7b62159 in with_guile (base=0x7fffffffdb10, data=0x7fffffffdb40) at threads.c:654
#12 0x00007ffff73a84a5 in GC_call_with_stack_base () from /usr/lib64/libgc.so.1
#13 0x00007ffff7b624a8 in scm_i_with_guile (dynamic_state=<optimized out>, data=<optimized out>,
    func=<optimized out>) at threads.c:704
#14 scm_with_guile (func=<optimized out>, data=<optimized out>) at threads.c:710
#15 0x0000000000400786 in main () at min.c:15
...

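The backtrace shows that the segfault happens while an "Invalid read access
of chars of wide string: ~s" error is being raised; the crash itself is in
scm_variable_ref with var=0x0, apparently because the throw machinery is
not yet fully initialized this early in scm_i_init_guile.  The error is
raised here: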
...
const char *
scm_i_string_chars (SCM str)
{
  SCM buf;
  size_t start;
  get_str_buf_start (&str, &buf, &start);
  if (scm_i_is_narrow_string (str))
    return (const char *) STRINGBUF_CHARS (buf) + start;
  else
    scm_misc_error (NULL, "Invalid read access of chars of wide string: ~s",
                    scm_list_1 (str));
  return NULL;
}
...

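What triggers the error is that here, we create a non-narrow (wide) string
using scm_from_locale_string.  The seed contains a tilde (byte 0x7E), which
Shift_JIS decodes as U+203E OVERLINE, a character outside Latin-1 that
cannot be stored in a narrow string: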
...
#8  0x00007ffff7b3ddaa in scm_init_random () at ../libguile/random.x:3
3       scm_var_random_state = scm_c_define ("*random-state*", scm_seed_to_random_state (scm_from_locale_string ("URL:http://stat.fsu.edu/~geo/diehard.html")));;
...
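To make the mechanism concrete, here is a minimal C sketch (not Guile code) of the decision involved. It assumes only what is stated in this thread: in Shift_JIS, byte 0x5C decodes to U+00A5 (YEN SIGN) and byte 0x7E to U+203E (OVERLINE), while the other bytes below 0x80 match ASCII. Guile keeps a string narrow only when every character fits in Latin-1 (code points up to 0xFF). The helper names sjis_single_byte_to_ucs and sjis_decodes_to_narrow are hypothetical, and double-byte Shift_JIS sequences are deliberately not handled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: decode one byte from the ASCII range of Shift_JIS.
   Only the two bytes that differ from ASCII are special-cased; double-byte
   sequences and half-width katakana are out of scope for this sketch. */
static uint32_t
sjis_single_byte_to_ucs (unsigned char b)
{
  switch (b)
    {
    case 0x5C:
      return 0x00A5;  /* YEN SIGN instead of REVERSE SOLIDUS */
    case 0x7E:
      return 0x203E;  /* OVERLINE instead of TILDE */
    default:
      return b;       /* every other byte below 0x80 matches ASCII */
    }
}

/* Hypothetical helper: returns 1 if every decoded code point fits in
   Latin-1, i.e. if the decoded string could be stored as a narrow string. */
static int
sjis_decodes_to_narrow (const char *s)
{
  for (; *s != '\0'; s++)
    {
      unsigned char b = (unsigned char) *s;
      assert (b < 0x80);  /* this sketch only models ASCII-range bytes */
      if (sjis_single_byte_to_ucs (b) > 0xFF)
        return 0;
    }
  return 1;
}
```

Note that the Yen sign is itself a Latin-1 character, so under this model a backslash alone keeps the string narrow; it is the tilde in the seed URL that forces the wide representation.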

but then scm_seed_to_random_state treats it as a narrow string by
calling scm_i_string_chars:
...
#define FUNC_NAME s_scm_seed_to_random_state
{
  SCM res;
  if (SCM_NUMBERP (seed))
    seed = scm_number_to_string (seed, SCM_UNDEFINED);
  SCM_VALIDATE_STRING (1, seed);
  res = make_rstate (scm_c_make_rstate (scm_i_string_chars (seed),
                                        scm_i_string_length (seed)));
  scm_remember_upto_here_1 (seed);
  return res;

}
...
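One possible way to make a seed function width-agnostic, sketched here under stated assumptions: struct demo_string is a hypothetical stand-in for Guile's narrow (one byte per character, Latin-1) and wide (32-bit code point) representations, and seed_bytes is a hypothetical helper, not a Guile API. Narrow strings keep their raw bytes, so seeds that already work would produce the same state, while wide strings are encoded as UTF-8 instead of triggering an error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the two string representations. */
struct demo_string
{
  int is_narrow;                /* 1: one byte per character (Latin-1) */
  const unsigned char *narrow;  /* valid when is_narrow */
  const uint32_t *wide;         /* valid when !is_narrow */
  size_t len;                   /* length in characters */
};

/* Write the UTF-8 encoding of code point C to OUT and return the number
   of bytes written.  Assumes C is a valid Unicode scalar value. */
static size_t
put_utf8 (uint32_t c, unsigned char *out)
{
  if (c < 0x80)
    {
      out[0] = (unsigned char) c;
      return 1;
    }
  if (c < 0x800)
    {
      out[0] = (unsigned char) (0xC0 | (c >> 6));
      out[1] = (unsigned char) (0x80 | (c & 0x3F));
      return 2;
    }
  if (c < 0x10000)
    {
      out[0] = (unsigned char) (0xE0 | (c >> 12));
      out[1] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
      out[2] = (unsigned char) (0x80 | (c & 0x3F));
      return 3;
    }
  out[0] = (unsigned char) (0xF0 | (c >> 18));
  out[1] = (unsigned char) (0x80 | ((c >> 12) & 0x3F));
  out[2] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
  out[3] = (unsigned char) (0x80 | (c & 0x3F));
  return 4;
}

/* Hypothetical: produce seed bytes for either representation.
   OUT must have room for 4 * S->len bytes.  Returns the byte count. */
static size_t
seed_bytes (const struct demo_string *s, unsigned char *out)
{
  size_t n = 0;
  assert (s != NULL && out != NULL);
  if (s->is_narrow)
    {
      memcpy (out, s->narrow, s->len);  /* unchanged bytes for narrow strings */
      return s->len;
    }
  for (size_t i = 0; i < s->len; i++)
    n += put_utf8 (s->wide[i], out + n);  /* wide strings become UTF-8 */
  return n;
}
```

Whether every seed should instead be encoded the same way (for example always as UTF-8) is a design choice this sketch does not settle; the split above merely avoids changing results for strings that already work.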

Thanks,
- Tom




Message sent:


Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-Mailer: MIME-tools 5.505 (Entity 5.505)
Content-Type: text/plain; charset=utf-8
X-Loop: help-debbugs@HIDDEN
From: help-debbugs@HIDDEN (GNU bug Tracking System)
To: Tom de Vries <tdevries@HIDDEN>
Subject: bug#33044: Acknowledgement (Invalid read access of chars of wide
 string in scm_seed_to_random_state)
Message-ID: <handler.33044.B.15396001327153.ack <at> debbugs.gnu.org>
References: <469f2345-5e76-1fc5-1105-f1d508611140@HIDDEN>
X-Gnu-PR-Message: ack 33044
X-Gnu-PR-Package: guile
Reply-To: 33044 <at> debbugs.gnu.org
Date: Mon, 15 Oct 2018 10:43:02 +0000

Thank you for filing a new bug report with debbugs.gnu.org.

This is an automatically generated reply to let you know your message
has been received.

Your message is being forwarded to the package maintainers and other
interested parties for their attention; they will reply in due course.

Your message has been sent to the package maintainer(s):
 bug-guile@HIDDEN

If you wish to submit further information on this problem, please
send it to 33044 <at> debbugs.gnu.org.

Please do not send mail to help-debbugs@HIDDEN unless you wish
to report a problem with the Bug-tracking system.

-- 
33044: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=33044
GNU Bug Tracking System
Contact help-debbugs@HIDDEN with problems


Message sent to bug-guile@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#33044: Reproduced using guile binary
References: <469f2345-5e76-1fc5-1105-f1d508611140@HIDDEN>
In-Reply-To: <469f2345-5e76-1fc5-1105-f1d508611140@HIDDEN>
Resent-From: Tom de Vries <tdevries@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-guile@HIDDEN
Resent-Date: Mon, 15 Oct 2018 14:21:02 +0000
Resent-Message-ID: <handler.33044.B33044.15396132153431 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 33044
X-GNU-PR-Package: guile
X-GNU-PR-Keywords: 
To: 33044 <at> debbugs.gnu.org
Received: via spool by 33044-submit <at> debbugs.gnu.org id=B33044.15396132153431
          (code B ref 33044); Mon, 15 Oct 2018 14:21:02 +0000
Received: (at 33044) by debbugs.gnu.org; 15 Oct 2018 14:20:15 +0000
Received: from localhost ([127.0.0.1]:50868 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1gC3jC-0000tH-T3
	for submit <at> debbugs.gnu.org; Mon, 15 Oct 2018 10:20:15 -0400
Received: from mx2.suse.de ([195.135.220.15]:56870 helo=mx1.suse.de)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <tdevries@HIDDEN>) id 1gC3jA-0000t3-TF
 for 33044 <at> debbugs.gnu.org; Mon, 15 Oct 2018 10:20:13 -0400
X-Virus-Scanned: by amavisd-new at test-mx.suse.de
Received: from relay2.suse.de (unknown [195.135.220.254])
 by mx1.suse.de (Postfix) with ESMTP id 1C1ACAF85
 for <33044 <at> debbugs.gnu.org>; Mon, 15 Oct 2018 14:20:07 +0000 (UTC)
From: Tom de Vries <tdevries@HIDDEN>
Message-ID: <a0656d24-cc1a-f6e5-ab16-145ab43c4510@HIDDEN>
Date: Mon, 15 Oct 2018 16:20:14 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101
 Thunderbird/60.2.1
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
X-Spam-Score: -2.3 (--)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

Hi,

Using a simple scheme hello world:
...
$ cat hello.scm
(display "hello world")
(newline)
...
we're able to reproduce the problem using the guile binary:
...
$ LC_CTYPE=ja_JP.sjis /home/vries/guile/2.2/install/bin/guile -s hello.scm
Segmentation fault (core dumped)
...

[ Note: When using 2.0, we need to set GUILE_INSTALL_LOCALE=1 in the
environment, otherwise the 'LC_CTYPE=ja_JP.sjis' setting has no effect. ]

Thanks,
- Tom




Message sent to bug-guile@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#33044: Analysis and proposed patch
References: <469f2345-5e76-1fc5-1105-f1d508611140@HIDDEN>
In-Reply-To: <469f2345-5e76-1fc5-1105-f1d508611140@HIDDEN>
Resent-From: Tom de Vries <tdevries@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-guile@HIDDEN
Resent-Date: Mon, 15 Oct 2018 19:00:02 +0000
Resent-Message-ID: <handler.33044.B33044.15396299463198 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 33044
X-GNU-PR-Package: guile
X-GNU-PR-Keywords: 
To: 33044 <at> debbugs.gnu.org
Received: via spool by 33044-submit <at> debbugs.gnu.org id=B33044.15396299463198
          (code B ref 33044); Mon, 15 Oct 2018 19:00:02 +0000
Received: (at 33044) by debbugs.gnu.org; 15 Oct 2018 18:59:06 +0000
Received: from localhost ([127.0.0.1]:51419 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1gC854-0000pV-AB
	for submit <at> debbugs.gnu.org; Mon, 15 Oct 2018 14:59:06 -0400
Received: from mx2.suse.de ([195.135.220.15]:46804 helo=mx1.suse.de)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <tdevries@HIDDEN>) id 1gC850-0000ow-3p
 for 33044 <at> debbugs.gnu.org; Mon, 15 Oct 2018 14:59:02 -0400
X-Virus-Scanned: by amavisd-new at test-mx.suse.de
Received: from relay2.suse.de (unknown [195.135.220.254])
 by mx1.suse.de (Postfix) with ESMTP id F2583ADA9
 for <33044 <at> debbugs.gnu.org>; Mon, 15 Oct 2018 18:58:55 +0000 (UTC)
From: Tom de Vries <tdevries@HIDDEN>
Message-ID: <8a6a308f-a981-fd46-93d5-c2d2870f4eb4@HIDDEN>
Date: Mon, 15 Oct 2018 20:59:03 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101
 Thunderbird/60.2.1
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 8bit
X-Spam-Score: -2.3 (--)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

Hi,

I think there are two independent problems here.

-------

1.

scm_seed_to_random_state should be able to handle a seed argument that
is a non-narrow (wide) string.

2.

The *random-state* variable is documented like this:
...
Note that the initial value of *random-state* is the same every time
Guile starts up. Therefore, if you don’t pass a state parameter to the
above procedures, and you don’t set *random-state* to
(seed->random-state your-seed), where your-seed is something that isn’t
the same every time, you’ll get the same sequence of “random” numbers on
every run.
...

However, because *random-state* is initialized using scm_from_locale_string,
its initial value may differ depending on the locale in effect when Guile
starts.  So we should convert the seed string in a way that's independent
of the locale settings.

-------

The second problem is fixed by using scm_from_latin1_string instead of
scm_from_locale_string:
...
diff --git a/libguile/random.c b/libguile/random.c
index 4051d1f..6f014e1 100644
--- a/libguile/random.c
+++ b/libguile/random.c
@@ -374,7 +374,7 @@ make_rstate (scm_t_rstate *state)
  * Scheme level interface.
  */

-SCM_GLOBAL_VARIABLE_INIT (scm_var_random_state, "*random-state*", scm_seed_to_random_state (scm_from_locale_string ("URL:http://stat.fsu.edu/~geo/diehard.html")));
+SCM_GLOBAL_VARIABLE_INIT (scm_var_random_state, "*random-state*", scm_seed_to_random_state (scm_from_latin1_string ("URL:http://stat.fsu.edu/~geo/diehard.html")));

 SCM_DEFINE (scm_random, "random", 1, 1, 0,
             (SCM n, SCM state),
...
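A minimal sketch of why a Latin-1 conversion is locale-independent: Latin-1 decoding maps each byte to the code point with the same value without consulting setlocale, so the tilde in the seed URL stays U+007E and the result always fits in a narrow string. latin1_decode is a hypothetical helper that models the conversion; it is not Guile's implementation of scm_from_latin1_string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of Latin-1 decoding: byte value == code point.
   No locale is consulted, so the result is identical under any LC_CTYPE,
   and every code point is <= 0xFF (it fits in a narrow string).
   OUT must have room for strlen (S) code points. */
static size_t
latin1_decode (const char *s, uint32_t *out)
{
  size_t n = 0;
  assert (s != NULL && out != NULL);
  for (; *s != '\0'; s++)
    out[n++] = (unsigned char) *s;
  return n;
}
```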

Tested on 2.0.14 and 2.2.4 tarballs.

Thanks,
- Tom




Message sent to bug-guile@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#33044: Guile misbehaves in the "ja_JP.sjis" locale
Resent-From: Mark H Weaver <mhw@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-guile@HIDDEN
Resent-Date: Tue, 16 Oct 2018 01:58:01 +0000
Resent-Message-ID: <handler.33044.B33044.153965506127791 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 33044
X-GNU-PR-Package: guile
X-GNU-PR-Keywords: 
To: Tom de Vries <tdevries@HIDDEN>
Cc: 33044 <at> debbugs.gnu.org
Received: via spool by 33044-submit <at> debbugs.gnu.org id=B33044.153965506127791
          (code B ref 33044); Tue, 16 Oct 2018 01:58:01 +0000
Received: (at 33044) by debbugs.gnu.org; 16 Oct 2018 01:57:41 +0000
Received: from localhost ([127.0.0.1]:51660 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1gCEc8-0007E5-AI
	for submit <at> debbugs.gnu.org; Mon, 15 Oct 2018 21:57:41 -0400
Received: from world.peace.net ([64.112.178.59]:38790)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <mhw@HIDDEN>)
 id 1gCEc5-0007Dk-CL; Mon, 15 Oct 2018 21:57:38 -0400
Received: from mhw by world.peace.net with esmtpsa
 (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.89)
 (envelope-from <mhw@HIDDEN>)
 id 1gCEby-0003tn-9e; Mon, 15 Oct 2018 21:57:31 -0400
From: Mark H Weaver <mhw@HIDDEN>
References: <469f2345-5e76-1fc5-1105-f1d508611140@HIDDEN>
 <8a6a308f-a981-fd46-93d5-c2d2870f4eb4@HIDDEN>
Date: Mon, 15 Oct 2018 21:57:02 -0400
In-Reply-To: <8a6a308f-a981-fd46-93d5-c2d2870f4eb4@HIDDEN> (Tom de Vries's
 message of "Mon, 15 Oct 2018 20:59:03 +0200")
Message-ID: <87y3ayodqp.fsf_-_@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Score: 0.0 (/)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -1.0 (-)

retitle 33044 Guile misbehaves in the "ja_JP.sjis" locale
thanks

Hi Tom,

Thanks for the report, analysis and patch.  I agree with your analysis,
and the patch looks good.

However, there's also a much deeper problem here.  You found and fixed
one occurrence of Guile assuming that the locale encoding is ASCII-
compatible.  In fact, this assumption is widespread in Guile, and I
would guess that it's widespread throughout the POSIX world.

I admit that before I saw your message, I believed that it was
legitimate to assume that the locale encoding was ASCII-compatible.  Now
I'm unsure, although I'll note that according to the 'localedef' utility
from GNU libc, this locale is "not ISO C compliant".  It printed the
following message when I asked it to generate the "ja_JP.sjis" locale:

  [warning] character map `SHIFT_JIS' is not ASCII compatible, locale not ISO C compliant [--no-warnings=ascii]

Shift_JIS is _mostly_ ASCII-compatible, except that code points 0x5C and
0x7E, which represent backslash (\) and tilde (~) in ASCII, are mapped
to the Yen sign (¥) and overline (‾) in Shift_JIS.  Backslash (\) and
tilde (~) are multibyte characters in Shift_JIS.

One common problem is that Guile often uses 'scm_from_locale_string' to
create Scheme strings from ASCII-only C string literals.  These should
all be changed to use either 'scm_from_latin1_string' or
'scm_from_utf8_string'.  I prefer the latter because modern C compilers
typically use UTF-8 as the default execution character set, i.e. the
character set used to encode string and character constants, regardless
of the locale settings.  GCC uses UTF-8 by default unless
-fexec-charset=CHARSET is given at compile time.  I'd prefer to promote
writing code that works for arbitrary string literals, so that code
needn't be adjusted if non-ASCII characters are later added.
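As an illustration of the execution-character-set point, the sketch below decodes a C string literal as UTF-8. It assumes the compiler's execution character set is UTF-8 (GCC's default, as noted above), so the literal "\u00A5~" is stored as the bytes C2 A5 7E and decodes to U+00A5 followed by U+007E regardless of LC_CTYPE. utf8_decode is a hypothetical, deliberately minimal decoder that expects well-formed input and performs no validation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal UTF-8 decoder for well-formed, NUL-terminated input.  It does
   not reject overlong forms, surrogates or truncated sequences.
   Returns the number of code points written to OUT. */
static size_t
utf8_decode (const char *s, uint32_t *out)
{
  const unsigned char *p = (const unsigned char *) s;
  size_t n = 0;
  assert (s != NULL && out != NULL);
  while (*p != '\0')
    {
      uint32_t c;
      int extra;
      if (*p < 0x80)      { c = *p;        extra = 0; }  /* ASCII */
      else if (*p < 0xE0) { c = *p & 0x1F; extra = 1; }  /* 2-byte lead */
      else if (*p < 0xF0) { c = *p & 0x0F; extra = 2; }  /* 3-byte lead */
      else                { c = *p & 0x07; extra = 3; }  /* 4-byte lead */
      p++;
      while (extra-- > 0)
        c = (c << 6) | (*p++ & 0x3F);  /* fold in continuation bytes */
      out[n++] = c;
    }
  return n;
}
```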

A related set of problems is that Guile often applies
'scm_from_locale_string' to char* arguments passed in from the user, or
produced by third-party libraries.  These issues are more difficult to
address.  We provide several C APIs that accept C strings without
specifying what encoding is expected.  If the string ultimately derives
from a C string constant, we probably want UTF-8, whereas if the string
came from I/O, or program arguments, then we probably want the locale
encoding.

For example, consider 'scm_c_eval_string'.  This has been a public API
function since 2002, but we did not specify the encoding of its C string
argument until 2011.  We chose the locale encoding in this case, which I
think is reasonable, but I also expect that code exists in the wild that
passes a C string literal to 'scm_c_eval_string'.

Until now, problems like this have been mostly harmless, since the C
string literals are typically ASCII-only.  However, if we wish to
support non-ASCII-compatible encodings such as Shift_JIS, we can no
longer consider these problems harmless.  For example, programs which
pass C string literals to 'scm_c_eval_string' will fail when using the
"ja_JP.sjis" locale, if any tildes or backslashes are present.
Backslashes are fairly common in Scheme code.

There is various other code scattered throughout Guile that assumes
ASCII characters can be searched for, and sometimes replaced with other
ASCII characters.  For example, several functions in load.c, including
'search_path' and 'load_thunk_from_path', scan through file names in the
locale encoding byte by byte, looking for particular ASCII codes such as
'.', '/', and '\'.
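Byte-wise scanning is riskier than it may look. In Shift_JIS (and Windows code page 932), the second byte of a double-byte character may be 0x5C; for example, 表 is encoded as 0x95 0x5C. Of the bytes listed above, only 0x5C lies in the trail-byte range, but a naive search for '\' can then match inside such a character. The sketch below contrasts a naive scan with one that skips the trail byte after each lead byte (assumed here to be 0x81-0x9F or 0xE0-0xFC). Both helpers are hypothetical illustrations of the hazard, not code from load.c.

```c
#include <assert.h>
#include <stddef.h>

/* Lead bytes of Shift_JIS double-byte characters. */
static int
sjis_is_lead_byte (unsigned char b)
{
  return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Naive scan: counts every byte equal to C, ignoring character boundaries. */
static size_t
count_naive (const char *s, char c)
{
  size_t n = 0;
  assert (s != NULL);
  for (; *s != '\0'; s++)
    if (*s == c)
      n++;
  return n;
}

/* Boundary-aware scan: after a lead byte, skip the trail byte so that a
   0x5C trail byte is never mistaken for '\'. */
static size_t
count_sjis (const char *s, char c)
{
  size_t n = 0;
  assert (s != NULL);
  while (*s != '\0')
    {
      if (sjis_is_lead_byte ((unsigned char) *s) && s[1] != '\0')
        s += 2;  /* skip the whole double-byte character */
      else
        {
          if (*s == c)
            n++;
          s++;
        }
    }
  return n;
}
```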

On MinGW, 'scm_i_mirror_backslashes' in load.c converts backslashes into
forward slashes byte-wise, assuming ASCII-compatibility, and this
transformation is applied to file names in several places.
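The same hazard applies to byte-wise replacement. In this hypothetical sketch (not Guile's scm_i_mirror_backslashes), the naive version rewrites the 0x5C trail byte of 表 (0x95 0x5C) into '/', corrupting the character, while a version that skips double-byte characters leaves it intact. It assumes the Shift_JIS/CP932 lead-byte ranges 0x81-0x9F and 0xE0-0xFC.

```c
#include <assert.h>
#include <stddef.h>

/* Lead bytes of Shift_JIS double-byte characters. */
static int
sjis_is_lead_byte (unsigned char b)
{
  return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Byte-wise: every 0x5C byte becomes '/', including trail bytes. */
static void
mirror_naive (char *s)
{
  assert (s != NULL);
  for (; *s != '\0'; s++)
    if (*s == '\\')
      *s = '/';
}

/* Character-aware: double-byte characters are copied through untouched. */
static void
mirror_sjis (char *s)
{
  assert (s != NULL);
  while (*s != '\0')
    {
      if (sjis_is_lead_byte ((unsigned char) *s) && s[1] != '\0')
        s += 2;
      else
        {
          if (*s == '\\')
            *s = '/';
          s++;
        }
    }
}
```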

While looking into this, I also discovered that Guile's S-expression
reader, i.e. the 'read' procedure, assumes an ASCII-compatible port
encoding, despite the fact that it is meant to support arbitrary
encodings such as UTF-16 and UTF-32.  I just filed a related bug
<https://bug.gnu.org/33057> to track this problem.
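For UTF-16 the difficulty is that ASCII byte values also occur inside other characters. In UTF-16LE, '(' is the byte pair 0x28 0x00, while the CJK ideograph U+4E28 is 0x28 0x4E, so a reader that scans bytes for 0x28 finds a spurious open parenthesis. The sketch compares a byte scan with a scan over 16-bit code units; both functions are hypothetical illustrations, not Guile's reader, and surrogate pairs are ignored for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Byte-wise search, as an ASCII-assuming reader effectively does.
   Returns the byte offset of the first match, or -1. */
static long
find_byte (const unsigned char *buf, size_t len, unsigned char b)
{
  assert (buf != NULL || len == 0);
  for (size_t i = 0; i < len; i++)
    if (buf[i] == b)
      return (long) i;
  return -1;
}

/* Search over UTF-16LE code units.  Returns the index of the first
   matching code unit, or -1.  Surrogate pairs are not treated specially. */
static long
find_utf16le_unit (const unsigned char *buf, size_t len, unsigned int unit)
{
  assert (buf != NULL || len == 0);
  for (size_t i = 0; i + 1 < len; i += 2)
    {
      unsigned int u = (unsigned int) buf[i] | ((unsigned int) buf[i + 1] << 8);
      if (u == unit)
        return (long) (i / 2);
    }
  return -1;
}
```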

These are some of the problems that I'm currently aware of.  I expect
that this bug report will remain open for a while.

To begin, I've started working on a patch to change many occurrences of
'scm_from_locale_string' to 'scm_from_utf8_string', in cases where the C
string clearly originates from a C string literal.

Thanks again for the detailed bug report and analysis.

    Regards,
      Mark




Message received at control <at> debbugs.gnu.org:


Received: (at control) by debbugs.gnu.org; 16 Oct 2018 01:57:40 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Mon Oct 15 21:57:40 2018
Received: from localhost ([127.0.0.1]:51658 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1gCEc7-0007E3-U3
	for submit <at> debbugs.gnu.org; Mon, 15 Oct 2018 21:57:40 -0400
Received: from world.peace.net ([64.112.178.59]:38790)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <mhw@HIDDEN>)
 id 1gCEc5-0007Dk-CL; Mon, 15 Oct 2018 21:57:38 -0400
Received: from mhw by world.peace.net with esmtpsa
 (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.89)
 (envelope-from <mhw@HIDDEN>)
 id 1gCEby-0003tn-9e; Mon, 15 Oct 2018 21:57:31 -0400
From: Mark H Weaver <mhw@HIDDEN>
To: Tom de Vries <tdevries@HIDDEN>
Subject: Re: bug#33044: Guile misbehaves in the "ja_JP.sjis" locale
References: <469f2345-5e76-1fc5-1105-f1d508611140@HIDDEN>
 <8a6a308f-a981-fd46-93d5-c2d2870f4eb4@HIDDEN>
Date: Mon, 15 Oct 2018 21:57:02 -0400
In-Reply-To: <8a6a308f-a981-fd46-93d5-c2d2870f4eb4@HIDDEN> (Tom de Vries's
 message of "Mon, 15 Oct 2018 20:59:03 +0200")
Message-ID: <87y3ayodqp.fsf_-_@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: control
Cc: 33044 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -1.0 (-)

retitle 33044 Guile misbehaves in the "ja_JP.sjis" locale
thanks





Message sent to bug-guile@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#33044: Guile misbehaves in the "ja_JP.sjis" locale
Resent-From: Mark H Weaver <mhw@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-guile@HIDDEN
Resent-Date: Tue, 16 Oct 2018 05:15:02 +0000
Resent-Message-ID: <handler.33044.B33044.153966684621251 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 33044
X-GNU-PR-Package: guile
X-GNU-PR-Keywords: 
To: Tom de Vries <tdevries@HIDDEN>
Cc: 33044 <at> debbugs.gnu.org
Received: via spool by 33044-submit <at> debbugs.gnu.org id=B33044.153966684621251
          (code B ref 33044); Tue, 16 Oct 2018 05:15:02 +0000
Received: (at 33044) by debbugs.gnu.org; 16 Oct 2018 05:14:06 +0000
Received: from localhost ([127.0.0.1]:51723 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1gCHgE-0005Wg-6C
	for submit <at> debbugs.gnu.org; Tue, 16 Oct 2018 01:14:06 -0400
Received: from world.peace.net ([64.112.178.59]:44344)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <mhw@HIDDEN>) id 1gCHgB-0005W6-Sh
 for 33044 <at> debbugs.gnu.org; Tue, 16 Oct 2018 01:14:04 -0400
Received: from mhw by world.peace.net with esmtpsa
 (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.89)
 (envelope-from <mhw@HIDDEN>)
 id 1gCHg5-0007Z2-Su; Tue, 16 Oct 2018 01:13:58 -0400
From: Mark H Weaver <mhw@HIDDEN>
References: <469f2345-5e76-1fc5-1105-f1d508611140@HIDDEN>
 <8a6a308f-a981-fd46-93d5-c2d2870f4eb4@HIDDEN>
 <87y3ayodqp.fsf_-_@HIDDEN>
Date: Tue, 16 Oct 2018 01:13:43 -0400
In-Reply-To: <87y3ayodqp.fsf_-_@HIDDEN> (Mark H. Weaver's message of "Mon, 
 15 Oct 2018 21:57:02 -0400")
Message-ID: <87tvlmo4mw.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Score: 0.0 (/)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -1.0 (-)

Mark H Weaver <mhw@HIDDEN> writes:

> Shift_JIS is _mostly_ ASCII-compatible, except that code points 0x5C and
> 0x7E, which represent backslash (\) and tilde (~) in ASCII, are mapped
> to the Yen sign (¥) and overline (‾) in Shift_JIS.  Backslash (\) and
> tilde (~) are multibyte characters in Shift_JIS.

Although I wrote above that "Backslash (\) and tilde (~) are multibyte
characters in Shift_JIS", that was admittedly my assumption, based on
the absence of those characters in the "First byte" map shown here:

  https://en.wikipedia.org/wiki/Shift_JIS#As_defined_in_JIS_X_0208:1997

However, now I'm unsure.  I've spent some time attempting to find the
Shift_JIS encodings for backslash and tilde, but I've not yet found an
answer.

I've asked Emacs 26 to write a file containing backslashes and Yen signs
using the "shift_jis" encoding, and both characters seem to be mapped to
the same code: 0x5C.

I've also used the 'iconv' utility from GNU libc to convert backslashes
and Yen signs to Shift_JIS, and it also maps these two characters to the
same codes:

--8<---------------cut here---------------start------------->8---
mhw@jojen ~$ echo '\\¥¥' | iconv -f UTF-8 -t SHIFT-JIS > Shift_JIS_test.txt
mhw@jojen ~$ hexdump -C Shift_JIS_test.txt
00000000  5c 5c 5c 5c 0a                                    |\\\\.|
00000005
--8<---------------cut here---------------end--------------->8---

While investigating, I found this bug for GNU libc asking to add an SJIS
locale, and the developers were strongly opposed:

  https://bugzilla.redhat.com/show_bug.cgi?id=136290

At this point, I'm inclined to believe that Shift_JIS is not suitable as
a locale encoding on POSIX systems, and that we should not try to
support it in Guile.

What do you think?

Can you tell me how backslash and tilde are represented in Shift JIS?

     Regards,
       Mark




Message sent to bug-guile@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#33044: Guile misbehaves in the "ja_JP.sjis" locale
Resent-From: John Cowan <cowan@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-guile@HIDDEN
Resent-Date: Tue, 16 Oct 2018 12:54:02 +0000
Resent-Message-ID: <handler.33044.B33044.153969439914745 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 33044
X-GNU-PR-Package: guile
X-GNU-PR-Keywords: 
To: Mark H Weaver <mhw@HIDDEN>
Cc: 33044 <at> debbugs.gnu.org, tdevries@HIDDEN
Received: via spool by 33044-submit <at> debbugs.gnu.org id=B33044.153969439914745
          (code B ref 33044); Tue, 16 Oct 2018 12:54:02 +0000
Received: (at 33044) by debbugs.gnu.org; 16 Oct 2018 12:53:19 +0000
Received: from localhost ([127.0.0.1]:51927 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1gCOqd-0003pk-7s
	for submit <at> debbugs.gnu.org; Tue, 16 Oct 2018 08:53:19 -0400
Received: from mail-wm1-f43.google.com ([209.85.128.43]:38924)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <cowan@HIDDEN>) id 1gCOqb-0003pX-0v
 for 33044 <at> debbugs.gnu.org; Tue, 16 Oct 2018 08:53:17 -0400
Received: by mail-wm1-f43.google.com with SMTP id y144-v6so23074363wmd.4
 for <33044 <at> debbugs.gnu.org>; Tue, 16 Oct 2018 05:53:16 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=ccil-org.20150623.gappssmtp.com; s=20150623;
 h=mime-version:references:in-reply-to:from:date:message-id:subject:to
 :cc; bh=DeSprrRGY7LEbiPA8DojnO0HQAKBCAeAc6UG38C9g7c=;
 b=tWeluz5bMq6/3d874D2/XKjrIXjXJwUhHnScOYUSwk2QJmwtuxi4VYSJZ621xpMU9E
 dCDVJ0ftbUXT4wH+pisyZ7VhlC5CUlttPwsywDE2HE5SEc2tREPUyM9WvJ67TSa67yI1
 LxhipgPT9rbqdjPEDsQ87dwXFdt1r7S7Q6acICxgM31GlXSQsk6a9QBZWZ5rKu/wf2Sx
 amWiLspi3hmNVKcxiixJjnoYtVx8qqWelqYhzTOKnRCv80MszS0S0jw6XzRz5yh6QVhe
 8pn+h0AEWj1/ZoFfaUsBVbi/5RbiK8gR3F04ayb1cvy8oJ1gcQnS8Q/UktR9QKVqvQsZ
 i22g==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:mime-version:references:in-reply-to:from:date
 :message-id:subject:to:cc;
 bh=DeSprrRGY7LEbiPA8DojnO0HQAKBCAeAc6UG38C9g7c=;
 b=ud+pt1uaUg0msn8AjprlvKrrSfF5tGJas13VNjXX1iaKxV3M78QQl98epSzKL/3y4P
 fBGtKc+IIeZ1supazHDk61b3qdcPKdFGVX/5GVnXF9Kq5Q7LEIJW+wOKterHYOcBQ7oW
 lGXoJQ8jRg1+FEGbcAQEacioDOUFyl9Z3TWCtvad6+xdncSqUIasJZwLHebjK5/BBpiw
 B3BguTMKWIboKjX+tPZPXmM2pZN0f39FFn/0YrrSr1PvmxgBddA87Q/HNIY2dbqmS7DW
 YWX2IrWw9khS+x75c/3oX+K9VDsEcC5agWrCYNF62AoDuTfAe0wZPbUKyHLZGnJgc51x
 7SrQ==
X-Gm-Message-State: ABuFfojy+OS0+nvQmojhhw8Eh1k9hR3JBf6avL7e4B84e4c6BSkn04+i
 lE9XZF4FtgH3vnhfpPoCTY1nNQOr2BjXTHxgMcxg8Q==
X-Google-Smtp-Source: ACcGV61jeaqIM3fzBS6ihUXfWNKdFkw7h4rqI/pWisD/AiVV8G/A5wWdcpn3ZuZBwZuUV6TEWfGglcHy05Sk+TOR3FI=
X-Received: by 2002:a1c:8154:: with SMTP id
 c81-v6mr16157140wmd.140.1539694390924; 
 Tue, 16 Oct 2018 05:53:10 -0700 (PDT)
MIME-Version: 1.0
References: <469f2345-5e76-1fc5-1105-f1d508611140@HIDDEN>
 <8a6a308f-a981-fd46-93d5-c2d2870f4eb4@HIDDEN>
 <87y3ayodqp.fsf_-_@HIDDEN> <87tvlmo4mw.fsf@HIDDEN>
In-Reply-To: <87tvlmo4mw.fsf@HIDDEN>
From: John Cowan <cowan@HIDDEN>
Date: Tue, 16 Oct 2018 08:52:59 -0400
Message-ID: <CAD2gp_RLS5PcWsN3EpfVh3zCf+dWWFio_BLUd+Ur3JGdpBtA9g@HIDDEN>
Content-Type: multipart/alternative; boundary="0000000000003bf2a805785809e2"
X-Spam-Score: -0.0 (/)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -1.0 (-)

--0000000000003bf2a805785809e2
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

> At this point, I'm inclined to believe that Shift_JIS is not suitable as
> a locale encoding on POSIX systems, and that we should not try to
> support it in Guile.
>
> What do you think?
>
> Can you tell me how backslash and tilde are represented in Shift JIS?
>

They aren't:  iconv is right.  Japanese Windows users are used to seeing
Windows pathnames that look like "C:¥foo¥bar", and, when writing C, to
strings like "first line¥nsecond line."  So what is happening is that the
character at #\x5C is *functionally* a backslash that is *displayed* as a
yen sign.  This is reinforced by the fact that the round-trip mapping from
Shift_JIS #\x5C is U+005C BACKSLASH, whereas U+00A5 YEN SIGN is mapped only
from Unicode (or other encodings) to Shift_JIS, never the other way around.
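
The asymmetry described above can be modeled as a pair of toy mappings
restricted to the byte 0x5C.  This is a hypothetical C sketch of the
behaviour described here, not a real conversion table: both U+005C and
U+00A5 encode to 0x5C, but 0x5C decodes only to U+005C, so a yen sign
does not survive a round trip.

```c
#include <stdint.h>

/* Toy encoder covering only byte 0x5C; other code points are treated
   as unmapped (-1). */
static int
sjis_encode_0x5c (uint32_t cp)
{
  if (cp == 0x005C || cp == 0x00A5)  /* backslash and yen sign -> 0x5C */
    return 0x5C;
  return -1;
}

/* Toy decoder: 0x5C decodes only to U+005C (backslash). */
static int32_t
sjis_decode_0x5c (int byte)
{
  if (byte == 0x5C)
    return 0x005C;
  return -1;
}
```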

This is the last survivor of the "national characters" concept of ISO 646,
whereby certain 7-bit characters were interpreted differently in different
countries.  For Scandinavian programmers, for example, blocks in C began
with æ and ended with å rather than { and } respectively, and the logical
OR operator was ø.  In the same way, British and Irish programmers used £
instead of # at the beginning of comments in awk and shell programs.  With
the arrival of Latin-{1,2,3,4} this concept was eventually abandoned, and
all systems converged on ISO-646-IRV (the same as US-ASCII) *except*
Japanese systems.

So I recommend that you do what everyone else does and ignore the issue in
JIS-based encodings, of which Shift_JIS is the only one in practical use
(and it _is_ heavily used in Japan, where it is almost the only encoding
for documents on desktops).   Just ignoring the encoding is not an option
in Japan: see the comments by Joel Rees, Norman Diamond, and Ryan Thompson
at the bug you pointed to.

-- 
John Cowan          http://vrici.lojban.org/~cowan        cowan@HIDDEN
In might the Feanorians / that swore the unforgotten oath
brought war into Arvernien / with burning and with broken troth.
and Elwing from her fastness dim / then cast her in the waters wide,
but like a mew was swiftly borne, / uplifted o'er the roaring tide.
        --the Earendillinwe

--0000000000003bf2a805785809e2--




Message sent to bug-guile@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#33044: Guile misbehaves in the "ja_JP.sjis" locale
Resent-From: Tom de Vries <tdevries@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-guile@HIDDEN
Resent-Date: Tue, 16 Oct 2018 23:28:02 +0000
Resent-Message-ID: <handler.33044.B33044.153973245430451 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 33044
X-GNU-PR-Package: guile
X-GNU-PR-Keywords: 
To: Mark H Weaver <mhw@HIDDEN>
Cc: 33044 <at> debbugs.gnu.org
Received: via spool by 33044-submit <at> debbugs.gnu.org id=B33044.153973245430451
          (code B ref 33044); Tue, 16 Oct 2018 23:28:02 +0000
Received: (at 33044) by debbugs.gnu.org; 16 Oct 2018 23:27:34 +0000
Received: from localhost ([127.0.0.1]:54110 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1gCYkP-0007v5-Nq
	for submit <at> debbugs.gnu.org; Tue, 16 Oct 2018 19:27:34 -0400
Received: from mx2.suse.de ([195.135.220.15]:41246 helo=mx1.suse.de)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <tdevries@HIDDEN>) id 1gCYkN-0007un-QI
 for 33044 <at> debbugs.gnu.org; Tue, 16 Oct 2018 19:27:32 -0400
X-Virus-Scanned: by amavisd-new at test-mx.suse.de
Received: from relay2.suse.de (unknown [195.135.220.254])
 by mx1.suse.de (Postfix) with ESMTP id 8F315B00D;
 Tue, 16 Oct 2018 23:27:25 +0000 (UTC)
References: <469f2345-5e76-1fc5-1105-f1d508611140@HIDDEN>
 <8a6a308f-a981-fd46-93d5-c2d2870f4eb4@HIDDEN> <87y3ayodqp.fsf_-_@HIDDEN>
From: Tom de Vries <tdevries@HIDDEN>
Message-ID: <25e93980-834d-cf91-cabf-77c2bb9a31c5@HIDDEN>
Date: Wed, 17 Oct 2018 01:27:33 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101
 Thunderbird/60.2.1
MIME-Version: 1.0
In-Reply-To: <87y3ayodqp.fsf_-_@HIDDEN>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 8bit
X-Spam-Score: -2.3 (--)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

On 10/16/18 3:57 AM, Mark H Weaver wrote:
> retitle 33044 Guile misbehaves in the "ja_JP.sjis" locale
> thanks
> 
> Hi Tom,
> 
> Thanks for the report, analysis and patch.  I agree with your analysis,
> and the patch looks good.
> 

If so, can the patch be committed?

I'm running into this problem in the context of gdb, which fails like this:
...
$ LC_CTYPE=ja_JP.sjis gdb
Segmentation fault (core dumped)
...

So, gdb (which has a dependency on libguile) aborts because of guile
initialization, without gdb actually using the guile functionality, and
the patch fixes this.

> However, there's also a much deeper problem here.  You found and fixed
> one occurrence of Guile assuming that the locale encoding is ASCII-
> compatible.  In fact, this assumption is widespread in Guile, and I
> would guess that it's widespread throughout the POSIX world.
> 
> I admit that before I saw your message, I believed that it was
> legitimate to assume that the locale encoding was ASCII-compatible.  Now
> I'm unsure, although I'll note that according to the 'localedef' utility
> from GNU libc, this locale is "not ISO C compliant".  It printed the
> following message when I asked it to generate the "ja_JP.sjis" locale:
> 
>   [warning] character map `SHIFT_JIS' is not ASCII compatible, locale not ISO C compliant [--no-warnings=ascii]
> 
> Shift_JIS is _mostly_ ASCII-compatible, except that code points 0x5C and
> 0x7E, which represent backslash (\) and tilde (~) in ASCII, are mapped
> to the Yen sign (¥) and overline (‾) in Shift_JIS.  Backslash (\) and
> tilde (~) are multibyte characters in Shift_JIS.
> 
> One common problem is that Guile often uses 'scm_from_locale_string' to
> create Scheme strings from ASCII-only C string literals.  These should
> all be changed to use either 'scm_from_latin1_string' or
> 'scm_from_utf8_string'.  I prefer the latter because modern C compilers
> typically use UTF-8 as the default execution character set, i.e. the
> character set used to encode string and character constants, regardless
> of the locale settings.  GCC uses UTF-8 by default unless
> -fexec-charset=CHARSET is given at compile time.  I'd prefer to promote
> writing code that works for arbitrary string literals, so that code
> needn't be adjusted if non-ASCII characters are later added.
> 
> A related set of problems is that Guile often applies
> 'scm_from_locale_string' to char* arguments passed in from the user, or
> produced by third-party libraries.  These issues are more difficult to
> address.  We provide several C APIs that accept C strings without
> specifying what encoding is expected.  If the string ultimately derives
> from a C string constant, we probably want UTF-8, whereas if the string
> came from I/O, or program arguments, then we probably want the locale
> encoding.
> 
> For example, consider 'scm_c_eval_string'.  This has been a public API
> function since 2002, but we did not specify the encoding of its C string
> argument until 2011.  We chose the locale encoding in this case, which I
> think is reasonable, but I also expect that code exists in the wild that
> passes a C string literal to 'scm_c_eval_string'.
> 
> Until now, problems like this have been mostly harmless, since the C
> string literals are typically ASCII-only.  However, if we wish to
> support non-ASCII-compatible encodings such as Shift_JIS, we can no
> longer consider these problems harmless.  For example, programs which
> pass C string literals to 'scm_c_eval_string' will fail when using the
> "ja_JP.sjis" locale, if any tildes or backslashes are present.
> Backslashes are fairly common in Scheme code.
> 
> There's various other code scattered in Guile that assumes ASCII
> characters can be searched for, and sometimes replaced with other ASCII
> characters.  For example, several functions in load.c, including
> 'search_path' and 'load_thunk_from_path', scan through file names in the
> locale encoding, looking at the bytes for particular ASCII codes
> such as '.', '/', and '\'.
> 
> On MinGW, 'scm_i_mirror_backslashes' in load.c converts backslashes into
> forward slashes byte-wise, assuming ASCII-compatibility, and this
> transformation is applied to file names in several places.
> 
> While looking into this, I also discovered that Guile's S-expression
> reader, i.e. the 'read' procedure, assumes an ASCII-compatible port
> encoding, despite the fact that it is meant to support arbitrary
> encodings such as UTF-16 and UTF-32.  I just filed a related bug
> <https://bug.gnu.org/33057> to track this problem.
> 
> These are some of the problems that I'm currently aware of.  I expect
> that this bug report will remain open for a while.
> 
> To begin, I've started working on a patch to change many occurrences of
> 'scm_from_locale_string' to 'scm_from_utf8_string', in cases where the C
> string clearly originates from a C string literal.
> 

Thanks for the elaboration here, that's helpful for me.

Thanks,
- Tom




Message sent to bug-guile@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#33044: Guile misbehaves in the "ja_JP.sjis" locale
Resent-From: Tom de Vries <tdevries@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-guile@HIDDEN
Resent-Date: Tue, 16 Oct 2018 23:39:02 +0000
Resent-Message-ID: <handler.33044.B33044.153973309631888 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 33044
X-GNU-PR-Package: guile
X-GNU-PR-Keywords: 
To: Mark H Weaver <mhw@HIDDEN>
Cc: 33044 <at> debbugs.gnu.org
Received: via spool by 33044-submit <at> debbugs.gnu.org id=B33044.153973309631888
          (code B ref 33044); Tue, 16 Oct 2018 23:39:02 +0000
Received: (at 33044) by debbugs.gnu.org; 16 Oct 2018 23:38:16 +0000
Received: from localhost ([127.0.0.1]:54115 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1gCYul-0008IF-Rw
	for submit <at> debbugs.gnu.org; Tue, 16 Oct 2018 19:38:16 -0400
Received: from mx2.suse.de ([195.135.220.15]:42036 helo=mx1.suse.de)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <tdevries@HIDDEN>) id 1gCYuj-0008Hz-Tc
 for 33044 <at> debbugs.gnu.org; Tue, 16 Oct 2018 19:38:14 -0400
X-Virus-Scanned: by amavisd-new at test-mx.suse.de
Received: from relay1.suse.de (unknown [195.135.220.254])
 by mx1.suse.de (Postfix) with ESMTP id A763DAFDD;
 Tue, 16 Oct 2018 23:38:07 +0000 (UTC)
References: <469f2345-5e76-1fc5-1105-f1d508611140@HIDDEN>
 <8a6a308f-a981-fd46-93d5-c2d2870f4eb4@HIDDEN> <87y3ayodqp.fsf_-_@HIDDEN>
 <87tvlmo4mw.fsf@HIDDEN>
From: Tom de Vries <tdevries@HIDDEN>
Message-ID: <878acb4d-1f76-5fa1-6ca5-4cd876342912@HIDDEN>
Date: Wed, 17 Oct 2018 01:38:16 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101
 Thunderbird/60.2.1
MIME-Version: 1.0
In-Reply-To: <87tvlmo4mw.fsf@HIDDEN>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
X-Spam-Score: -2.3 (--)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

On 10/16/18 7:13 AM, Mark H Weaver wrote:
> While investigating, I found this bug for GNU libc asking to add an SJIS
> locale, and the developers were strongly opposed:
> 
>   https://bugzilla.redhat.com/show_bug.cgi?id=136290
> 

Thanks for the pointer, that was interesting reading.

> At this point, I'm inclined to believe that Shift_JIS is not suitable as
> a locale encoding on POSIX systems, and that we should not try to
> support it in Guile.
> 
> What do you think?

My interest here is limited to fixing a gdb regression, so not supporting
it is fine for me, though we should have a reasonable failure mode (i.e.,
not a crash, as is the case in this report).
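
One possible shape for such a failure mode, sketched in C under stated
assumptions: query the LC_CTYPE codeset with nl_langinfo (CODESET) and
warn instead of crashing later.  The function names and the list of
codeset names are hypothetical and illustrative only.  The SHIFT_JIS
entries come from the localedef warning and locale names mentioned in
this thread, and the exact string nl_langinfo returns for a given locale
may differ between C libraries.

```c
#define _POSIX_C_SOURCE 200809L /* for strcasecmp under strict -std modes */
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <stdio.h>
#include <strings.h>

/* Hypothetical, non-exhaustive list of codesets that reuse ASCII byte
   values (0x5C, 0x7E) for other characters. */
static const char *const non_ascii_compatible_codesets[] = {
  "SHIFT_JIS", "SJIS", "SHIFT_JISX0213", NULL
};

static int
codeset_is_ascii_compatible (const char *codeset)
{
  for (size_t i = 0; non_ascii_compatible_codesets[i] != NULL; i++)
    if (strcasecmp (codeset, non_ascii_compatible_codesets[i]) == 0)
      return 0;
  return 1;
}

/* Check the current LC_CTYPE codeset; print a warning and return 0 if
   it is on the list above, otherwise return 1. */
static int
check_locale_codeset (void)
{
  const char *codeset = nl_langinfo (CODESET);
  if (!codeset_is_ascii_compatible (codeset))
    {
      fprintf (stderr, "warning: locale codeset %s is not ASCII-compatible;"
               " non-ASCII text may be mishandled\n", codeset);
      return 0;
    }
  return 1;
}
```

Such a check would be called after `setlocale (LC_ALL, "")`, so that
nl_langinfo reports the user's locale rather than the default "C" locale.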

Thanks,
- Tom




Message sent to bug-guile@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#33044: Guile misbehaves in the "ja_JP.sjis" locale
Resent-From: Tom de Vries <tdevries@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-guile@HIDDEN
Resent-Date: Wed, 17 Oct 2018 07:01:01 +0000
Resent-Message-ID: <handler.33044.B33044.153975963721506 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 33044
X-GNU-PR-Package: guile
X-GNU-PR-Keywords: 
To: Mark H Weaver <mhw@HIDDEN>
Cc: 33044 <at> debbugs.gnu.org
Received: via spool by 33044-submit <at> debbugs.gnu.org id=B33044.153975963721506
          (code B ref 33044); Wed, 17 Oct 2018 07:01:01 +0000
Received: (at 33044) by debbugs.gnu.org; 17 Oct 2018 07:00:37 +0000
Received: from localhost ([127.0.0.1]:54313 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1gCfor-0005ao-6x
	for submit <at> debbugs.gnu.org; Wed, 17 Oct 2018 03:00:37 -0400
Received: from mx2.suse.de ([195.135.220.15]:45018 helo=mx1.suse.de)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <tdevries@HIDDEN>) id 1gCfop-0005aa-PX
 for 33044 <at> debbugs.gnu.org; Wed, 17 Oct 2018 03:00:36 -0400
X-Virus-Scanned: by amavisd-new at test-mx.suse.de
Received: from relay2.suse.de (unknown [195.135.220.254])
 by mx1.suse.de (Postfix) with ESMTP id DCD65AD70;
 Wed, 17 Oct 2018 07:00:29 +0000 (UTC)
References: <469f2345-5e76-1fc5-1105-f1d508611140@HIDDEN>
 <8a6a308f-a981-fd46-93d5-c2d2870f4eb4@HIDDEN> <87y3ayodqp.fsf_-_@HIDDEN>
 <87tvlmo4mw.fsf@HIDDEN>
From: Tom de Vries <tdevries@HIDDEN>
Message-ID: <3231389e-fe6a-7fda-10ff-89a34f010e20@HIDDEN>
Date: Wed, 17 Oct 2018 09:00:38 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101
 Thunderbird/60.2.1
MIME-Version: 1.0
In-Reply-To: <87tvlmo4mw.fsf@HIDDEN>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
X-Spam-Score: -2.3 (--)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

On 10/16/18 7:13 AM, Mark H Weaver wrote:
> While investigating, I found this bug for GNU libc asking to add an SJIS
> locale, and the developers were strongly opposed:
> 
>   https://bugzilla.redhat.com/show_bug.cgi?id=136290

FTR, that's a discussion of Fedora/RedHat developers.

This openSUSE/SUSE bug report ("wrong "locale -a" output for
ja_JP.SHIFT_JISX0213 and hy_AM.armscii-8",
https://bugzilla.opensuse.org/show_bug.cgi?id=162501) also discusses
this topic and links to related discussions among glibc maintainers:
- http://www.sourceware.org/ml/libc-locales/2006-q3/msg00054.html
- http://www.sourceware.org/ml/libc-alpha/2000-10/msg00311.html

Thanks,
- Tom




Message sent to bug-guile@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#33044: Guile misbehaves in the "ja_JP.sjis" locale
Resent-From: Mark H Weaver <mhw@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-guile@HIDDEN
Resent-Date: Thu, 18 Oct 2018 01:57:01 +0000
Resent-Message-ID: <handler.33044.B33044.153982782122315 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 33044
X-GNU-PR-Package: guile
X-GNU-PR-Keywords: 
To: Tom de Vries <tdevries@HIDDEN>
Cc: 33044 <at> debbugs.gnu.org
Received: via spool by 33044-submit <at> debbugs.gnu.org id=B33044.153982782122315
          (code B ref 33044); Thu, 18 Oct 2018 01:57:01 +0000
Received: (at 33044) by debbugs.gnu.org; 18 Oct 2018 01:57:01 +0000
Received: from localhost ([127.0.0.1]:56022 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1gCxYa-0005nq-T6
	for submit <at> debbugs.gnu.org; Wed, 17 Oct 2018 21:57:01 -0400
Received: from world.peace.net ([64.112.178.59]:33398)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <mhw@HIDDEN>) id 1gCxYZ-0005nb-F5
 for 33044 <at> debbugs.gnu.org; Wed, 17 Oct 2018 21:56:59 -0400
Received: from mhw by world.peace.net with esmtpsa
 (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.89)
 (envelope-from <mhw@HIDDEN>)
 id 1gCxYT-0002Fq-GY; Wed, 17 Oct 2018 21:56:53 -0400
From: Mark H Weaver <mhw@HIDDEN>
References: <469f2345-5e76-1fc5-1105-f1d508611140@HIDDEN>
 <8a6a308f-a981-fd46-93d5-c2d2870f4eb4@HIDDEN>
 <87y3ayodqp.fsf_-_@HIDDEN>
 <25e93980-834d-cf91-cabf-77c2bb9a31c5@HIDDEN>
Date: Wed, 17 Oct 2018 21:56:37 -0400
In-Reply-To: <25e93980-834d-cf91-cabf-77c2bb9a31c5@HIDDEN> (Tom de Vries's
 message of "Wed, 17 Oct 2018 01:27:33 +0200")
Message-ID: <871s8o10h6.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-Spam-Score: 0.0 (/)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -1.0 (-)

Hi Tom,

Tom de Vries <tdevries@HIDDEN> writes:

> On 10/16/18 3:57 AM, Mark H Weaver wrote:
>> Thanks for the report, analysis and patch.  I agree with your analysis,
>> and the patch looks good.
>> 
>
> If so, can the patch be committed?

I just pushed commit c2a654b7d29f5e2f32fd1313cc80162fd0c8f992 to the
stable-2.2 branch, which includes the fix from your patch (although I
used 'scm_from_utf8_string' instead of 'scm_from_latin1_string'), and
many other instances of the same problem.  These fixes will be in the
upcoming guile-2.2.5 release.

Does that address the problem for you?

I'll leave this bug open at least until 'seed->random-state your-seed'
is fixed to support wide strings.
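
For context on what supporting wide strings involves: Guile stores a
string either narrowly (one byte per character) or widely (one 32-bit
code point per character), and code that reads the characters as a plain
char array is only valid for the narrow case.  A minimal C sketch of a
seed computation that works for both, using a hypothetical str_view
struct rather than Guile's real string representation, and an
FNV-1a-style fold chosen only for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a string stored either narrow (one byte per
   character) or wide (one 32-bit code point per character). */
struct str_view
{
  int is_wide;
  size_t len;                   /* length in characters, not bytes */
  const uint8_t *narrow;
  const uint32_t *wide;
};

/* Return the code point at index I, whichever representation is used. */
static uint32_t
char_at (const struct str_view *s, size_t i)
{
  return s->is_wide ? s->wide[i] : s->narrow[i];
}

/* Fold the code points of S into a 32-bit seed (FNV-1a style, applied
   per character rather than per byte).  Going through char_at gives the
   same result for both representations; indexing wide storage as a
   byte array would instead read the individual bytes of the first few
   code points. */
static uint32_t
fold_seed (const struct str_view *s)
{
  uint32_t h = 2166136261u;
  for (size_t i = 0; i < s->len; i++)
    {
      h ^= char_at (s, i);
      h *= 16777619u;
    }
  return h;
}
```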

Thanks again,

     Mark




Message sent to bug-guile@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#33044: Guile misbehaves in the "ja_JP.sjis" locale
Resent-From: Tom de Vries <tdevries@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-guile@HIDDEN
Resent-Date: Thu, 18 Oct 2018 10:27:01 +0000
Resent-Message-ID: <handler.33044.B33044.153985837214578 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 33044
X-GNU-PR-Package: guile
X-GNU-PR-Keywords: 
To: Mark H Weaver <mhw@HIDDEN>
Cc: 33044 <at> debbugs.gnu.org
Received: via spool by 33044-submit <at> debbugs.gnu.org id=B33044.153985837214578
          (code B ref 33044); Thu, 18 Oct 2018 10:27:01 +0000
Received: (at 33044) by debbugs.gnu.org; 18 Oct 2018 10:26:12 +0000
Received: from localhost ([127.0.0.1]:56141 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1gD5VM-0003n4-5H
	for submit <at> debbugs.gnu.org; Thu, 18 Oct 2018 06:26:12 -0400
Received: from mx2.suse.de ([195.135.220.15]:49768 helo=mx1.suse.de)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <tdevries@HIDDEN>) id 1gD5VK-0003mp-5I
 for 33044 <at> debbugs.gnu.org; Thu, 18 Oct 2018 06:26:10 -0400
X-Virus-Scanned: by amavisd-new at test-mx.suse.de
Received: from relay2.suse.de (unknown [195.135.220.254])
 by mx1.suse.de (Postfix) with ESMTP id 49324B0B4;
 Thu, 18 Oct 2018 10:26:04 +0000 (UTC)
References: <469f2345-5e76-1fc5-1105-f1d508611140@HIDDEN>
 <8a6a308f-a981-fd46-93d5-c2d2870f4eb4@HIDDEN> <87y3ayodqp.fsf_-_@HIDDEN>
 <25e93980-834d-cf91-cabf-77c2bb9a31c5@HIDDEN> <871s8o10h6.fsf@HIDDEN>
From: Tom de Vries <tdevries@HIDDEN>
Message-ID: <673d7121-9d70-5856-511d-2e9c4ef49ec9@HIDDEN>
Date: Thu, 18 Oct 2018 12:26:12 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101
 Thunderbird/60.2.1
MIME-Version: 1.0
In-Reply-To: <871s8o10h6.fsf@HIDDEN>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
X-Spam-Score: -2.3 (--)
X-Spam-Score: -3.3 (---)

On 10/18/18 3:56 AM, Mark H Weaver wrote:
> Hi Tom,
> 
> Tom de Vries <tdevries@HIDDEN> writes:
> 
>> On 10/16/18 3:57 AM, Mark H Weaver wrote:
>>> Thanks for the report, analysis and patch.  I agree with your analysis,
>>> and the patch looks good.
>>>
>>
>> If so, can the patch be committed?
> 
> I just pushed commit c2a654b7d29f5e2f32fd1313cc80162fd0c8f992 to the
> stable-2.2 branch,

Thanks!

> which includes the fix from your patch (although I
> used 'scm_from_utf8_string' instead of 'scm_from_latin1_string'),

Right, that should give the same result on that string.
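Assuming the string in question is pure ASCII, the two calls agree because UTF-8 and Latin-1 both map bytes 0x00-0x7F to code points U+0000-U+007F; they only diverge on bytes >= 0x80. A minimal C sketch of that reasoning follows. The decoders are simplified, hypothetical helpers written for illustration, not Guile's `scm_from_utf8_string` or `scm_from_latin1_string`.

```c
#include <stdint.h>
#include <string.h>

/* Latin-1: every byte is its own code point. */
static size_t decode_latin1(const unsigned char *s, size_t n, uint32_t *out)
{
  for (size_t i = 0; i < n; i++)
    out[i] = s[i];
  return n;
}

/* Simplified UTF-8 decoder (1- to 4-byte sequences, no overlong or
   surrogate checks).  Returns the number of code points written, or
   (size_t)-1 on a malformed or truncated sequence. */
static size_t decode_utf8(const unsigned char *s, size_t n, uint32_t *out)
{
  size_t i = 0, k = 0;
  while (i < n) {
    unsigned char b = s[i];
    size_t len;
    uint32_t cp;
    if (b < 0x80)                { len = 1; cp = b; }
    else if ((b & 0xE0) == 0xC0) { len = 2; cp = b & 0x1F; }
    else if ((b & 0xF0) == 0xE0) { len = 3; cp = b & 0x0F; }
    else if ((b & 0xF8) == 0xF0) { len = 4; cp = b & 0x07; }
    else return (size_t)-1;
    if (i + len > n)
      return (size_t)-1;
    for (size_t j = 1; j < len; j++) {
      if ((s[i + j] & 0xC0) != 0x80)
        return (size_t)-1;
      cp = (cp << 6) | (s[i + j] & 0x3F);
    }
    out[k++] = cp;
    i += len;
  }
  return k;
}

/* Returns 1 if decoding STR as Latin-1 and as UTF-8 yields the same
   sequence of code points, 0 otherwise (inputs over 256 bytes: 0). */
static int same_decoding(const char *str)
{
  const unsigned char *s = (const unsigned char *) str;
  size_t n = strlen(str);
  uint32_t a[256], b[256];
  if (n > 256)
    return 0;
  size_t na = decode_latin1(s, n, a);
  size_t nb = decode_utf8(s, n, b);
  return na == nb && memcmp(a, b, na * sizeof *a) == 0;
}
```

For example, `same_decoding("hello world")` returns 1, while `same_decoding("caf\xC3\xA9")` returns 0: UTF-8 reads the last two bytes as one character (U+00E9), whereas Latin-1 reads them as two.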

> and
> many other instances of the same problem.  These fixes will be in the
> upcoming guile-2.2.5 release.
> 
> Does that address the problem for you?
> 
There are two peculiarities that affect me:
- I'm not able to build from git (and I haven't found instructions in
  the README listing the correct auto* invocation), so I can only test
  with tarballs.
- gdb does not support guile 2.2, so I'd need a backport of this fix to
  the stable-2.0 branch. See also "PR21104 - 7.12.1 does not compile with
  latest guile (2.1.6)"
  ( https://sourceware.org/bugzilla/show_bug.cgi?id=21104 ).

As for testing, I've done the following:
- applied the patch to the 2.2 tarball, built it, ran the tests, and ran
  the hello.scm reproducer
- ported the patch to the 2.0 tarball, built it, ran the tests, and ran
  the hello.scm reproducer
- built gdb master against the 2.0 tarball build, ran the gdb guile
  tests, and ran the gdb reproducer.

This is as far as I can take it, and it all looks good to me.

Thanks,
- Tom

> I'll leave this bug open at least until 'seed->random-state your-seed'
> is fixed to support wide strings.
> 
> Thanks again,
> 
>      Mark
> 




Message sent to bug-guile@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#33044: Guile misbehaves in the "ja_JP.sjis" locale
Resent-From: Mark H Weaver <mhw@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-guile@HIDDEN
Resent-Date: Sat, 20 Oct 2018 02:26:02 +0000
Resent-Message-ID: <handler.33044.B33044.15400023112036 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 33044
X-GNU-PR-Package: guile
X-GNU-PR-Keywords: 
To: Tom de Vries <tdevries@HIDDEN>
Cc: 33044 <at> debbugs.gnu.org
Received: via spool by 33044-submit <at> debbugs.gnu.org id=B33044.15400023112036
          (code B ref 33044); Sat, 20 Oct 2018 02:26:02 +0000
Received: (at 33044) by debbugs.gnu.org; 20 Oct 2018 02:25:11 +0000
Received: from localhost ([127.0.0.1]:60075 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1gDgwx-0000Wm-9s
	for submit <at> debbugs.gnu.org; Fri, 19 Oct 2018 22:25:11 -0400
Received: from world.peace.net ([64.112.178.59]:46440)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <mhw@HIDDEN>) id 1gDgwv-0000WT-Ex
 for 33044 <at> debbugs.gnu.org; Fri, 19 Oct 2018 22:25:09 -0400
Received: from mhw by world.peace.net with esmtpsa
 (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.89)
 (envelope-from <mhw@HIDDEN>)
 id 1gDgwp-0001T4-GX; Fri, 19 Oct 2018 22:25:03 -0400
From: Mark H Weaver <mhw@HIDDEN>
References: <469f2345-5e76-1fc5-1105-f1d508611140@HIDDEN>
 <8a6a308f-a981-fd46-93d5-c2d2870f4eb4@HIDDEN>
 <87y3ayodqp.fsf_-_@HIDDEN>
 <25e93980-834d-cf91-cabf-77c2bb9a31c5@HIDDEN>
 <871s8o10h6.fsf@HIDDEN>
Date: Fri, 19 Oct 2018 22:24:48 -0400
In-Reply-To: <871s8o10h6.fsf@HIDDEN> (Mark H. Weaver's message of "Wed, 17
 Oct 2018 21:56:37 -0400")
Message-ID: <87r2gl4aof.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-Spam-Score: 0.0 (/)
X-Spam-Score: -1.0 (-)

Mark H Weaver <mhw@HIDDEN> writes:
> I'll leave this bug open at least until 'seed->random-state your-seed'
> is fixed to support wide strings.

This part is now fixed in commit
fbdcf6358519c415bd2041ca09bee9b16e9d528a on the stable-2.2 branch.
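The underlying bug was reading the characters of a wide string as if they were narrow bytes. As a minimal sketch, assuming (hypothetically) that the random state is seeded from a hash of the seed string's bytes, one encoding-independent approach is to hash the UTF-8 encoding of each character, so that narrow and wide representations of the same text produce the same seed. The helpers `utf8_encode`, `seed_from_wide` and `seed_from_narrow`, and the choice of FNV-1a as the hash, are illustrative assumptions, not Guile's API or the code in the commit above.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-8 into OUT; returns the byte count.
   Assumes CP is a valid Unicode scalar value (<= 0x10FFFF, not a
   surrogate). */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
  if (cp < 0x80) {
    out[0] = (unsigned char) cp;
    return 1;
  }
  if (cp < 0x800) {
    out[0] = (unsigned char) (0xC0 | (cp >> 6));
    out[1] = (unsigned char) (0x80 | (cp & 0x3F));
    return 2;
  }
  if (cp < 0x10000) {
    out[0] = (unsigned char) (0xE0 | (cp >> 12));
    out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char) (0x80 | (cp & 0x3F));
    return 3;
  }
  out[0] = (unsigned char) (0xF0 | (cp >> 18));
  out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
  out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
  out[3] = (unsigned char) (0x80 | (cp & 0x3F));
  return 4;
}

/* Hypothetical seed hash: 64-bit FNV-1a over a byte sequence. */
#define FNV_OFFSET 0xcbf29ce484222325ULL
#define FNV_PRIME  0x100000001b3ULL

static uint64_t fnv1a_update(uint64_t h, const unsigned char *p, size_t n)
{
  for (size_t i = 0; i < n; i++) {
    h ^= p[i];
    h *= FNV_PRIME;
  }
  return h;
}

/* Wide string (32-bit code points): encode each character to UTF-8
   before hashing, instead of reinterpreting the array as char bytes. */
static uint64_t seed_from_wide(const uint32_t *chars, size_t len)
{
  uint64_t h = FNV_OFFSET;
  unsigned char buf[4];
  for (size_t i = 0; i < len; i++)
    h = fnv1a_update(h, buf, utf8_encode(chars[i], buf));
  return h;
}

/* Narrow (Latin-1) string: same UTF-8 step, so both forms agree. */
static uint64_t seed_from_narrow(const unsigned char *chars, size_t len)
{
  uint64_t h = FNV_OFFSET;
  unsigned char buf[4];
  for (size_t i = 0; i < len; i++)
    h = fnv1a_update(h, buf, utf8_encode(chars[i], buf));
  return h;
}
```

For example, `seed_from_wide` on the code points {'a', 'b'} and `seed_from_narrow` on the bytes "ab" return the same value. Reinterpreting the 32-bit array as `char *` instead would also hash the zero bytes in each ASCII code point and give a different seed.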

      Mark




Message sent to bug-guile@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#33044: Reproduced using guile binary
Resent-From: Tom de Vries <tdevries@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-guile@HIDDEN
Resent-Date: Sun, 21 Oct 2018 16:25:01 +0000
Resent-Message-ID: <handler.33044.B33044.154013908232725 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 33044
X-GNU-PR-Package: guile
X-GNU-PR-Keywords: 
To: 33044 <at> debbugs.gnu.org, Mark H Weaver <mhw@HIDDEN>
Received: via spool by 33044-submit <at> debbugs.gnu.org id=B33044.154013908232725
          (code B ref 33044); Sun, 21 Oct 2018 16:25:01 +0000
Received: (at 33044) by debbugs.gnu.org; 21 Oct 2018 16:24:42 +0000
Received: from localhost ([127.0.0.1]:34528 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1gEGWw-0008Vl-6C
	for submit <at> debbugs.gnu.org; Sun, 21 Oct 2018 12:24:42 -0400
Received: from mx2.suse.de ([195.135.220.15]:44396 helo=mx1.suse.de)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <tdevries@HIDDEN>) id 1gEGWv-0008VX-0Y
 for 33044 <at> debbugs.gnu.org; Sun, 21 Oct 2018 12:24:41 -0400
X-Virus-Scanned: by amavisd-new at test-mx.suse.de
Received: from relay2.suse.de (unknown [195.135.220.254])
 by mx1.suse.de (Postfix) with ESMTP id 37201ADF1;
 Sun, 21 Oct 2018 16:24:35 +0000 (UTC)
From: Tom de Vries <tdevries@HIDDEN>
References: <a0656d24-cc1a-f6e5-ab16-145ab43c4510@HIDDEN>
Message-ID: <c2eb0736-8b5f-0d09-c6b4-5fdc0c333df1@HIDDEN>
Date: Sun, 21 Oct 2018 18:24:45 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101
 Thunderbird/60.2.1
MIME-Version: 1.0
In-Reply-To: <a0656d24-cc1a-f6e5-ab16-145ab43c4510@HIDDEN>
Content-Type: multipart/mixed; boundary="------------C1571D98ED05B116325B5B0F"
Content-Language: en-US
X-Spam-Score: -2.3 (--)
X-Spam-Score: -3.3 (---)

This is a multi-part message in MIME format.
--------------C1571D98ED05B116325B5B0F
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

On 10/15/18 4:20 PM, Tom de Vries wrote:
> Hi,
> 
> Using a simple scheme hello world:
> ...
> $ cat hello.scm
> (display "hello world")
> (newline)
> ...
> we're able to reproduce the problem using the guile binary:
> ....
> $ LC_CTYPE=ja_JP.sjis /home/vries/guile/2.2/install/bin/guile -s hello.scm
> Segmentation fault (core dumped)
> ...
> 
> [ Note: When using 2.0, we need to set GUILE_INSTALL_LOCALE=1 in the
> environment, otherwise the 'LC_CTYPE=ja_JP.sjis' setting has no effect. ]
> 

I managed to create a testcase for this, patch attached.

Tested on master for x86_64, where it fails.

Thanks,
- Tom


--------------C1571D98ED05B116325B5B0F
Content-Type: text/x-patch;
 name="0001-Add-standalone-test-test-ja_JP.sjis.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="0001-Add-standalone-test-test-ja_JP.sjis.patch"

Add standalone test test-ja_JP.sjis

Test for <https://bugs.gnu.org/33044>.

* test-suite/standalone/test-ja_JP.sjis: New test.
* test-suite/standalone/Makefile.am: Add test-ja_JP.sjis.

---
 test-suite/standalone/Makefile.am     | 4 ++++
 test-suite/standalone/test-ja_JP.sjis | 8 ++++++++
 2 files changed, 12 insertions(+)

diff --git a/test-suite/standalone/Makefile.am b/test-suite/standalone/Makefile.am
index 2aba708da..c5ce4bccb 100644
--- a/test-suite/standalone/Makefile.am
+++ b/test-suite/standalone/Makefile.am
@@ -183,6 +183,10 @@ TESTS += test-mb-regexp
 check_SCRIPTS += test-use-srfi
 TESTS += test-use-srfi
 
+# test-ja_JP.sjis
+check_SCRIPTS += test-ja_JP.sjis
+TESTS += test-ja_JP.sjis
+
 # test-scm-c-read
 test_scm_c_read_SOURCES = test-scm-c-read.c
 test_scm_c_read_CFLAGS = ${test_cflags}
diff --git a/test-suite/standalone/test-ja_JP.sjis b/test-suite/standalone/test-ja_JP.sjis
new file mode 100755
index 000000000..4b7ba0d88
--- /dev/null
+++ b/test-suite/standalone/test-ja_JP.sjis
@@ -0,0 +1,8 @@
+#!/bin/sh
+# Test whether guile can run initialization code using ja_JP.sjis locale
+# (bug #33044).
+unset LC_ALL
+export LC_CTYPE
+LC_CTYPE=ja_JP.sjis
+exec guile -q -s "$0" "$@"
+!#

--------------C1571D98ED05B116325B5B0F--





Last modified: Mon, 25 Nov 2019 12:00:02 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.