GNU bug report logs - #11187
multibyte: fmt: Fix incorrect width handling of multibyte

Please note: This is a static page, with minimal formatting, updated once a day.
Click here to see this page with the latest information and nicer formatting.

Package: coreutils; Severity: wishlist; Reported by: Vladimir 'φ-coder/phcoder' Serbinenko <phcoder@HIDDEN>; Keywords: patch; dated Thu, 5 Apr 2012 18:23:01 UTC; Maintainer for coreutils is bug-coreutils@HIDDEN.
Changed bug title to 'multibyte: fmt: Fix incorrect width handling of multibyte' from '[PATCH] Fix incorrect width handling of multibyte characters in fmt'. Request was from Assaf Gordon <assafgordon@HIDDEN> to control <at> debbugs.gnu.org. Full text available.
Severity set to 'wishlist' from 'normal'. Request was from Assaf Gordon <assafgordon@HIDDEN> to control <at> debbugs.gnu.org. Full text available.

Message received at 11187 <at> debbugs.gnu.org:


From: Jim Meyering <jim@HIDDEN>
To: Vladimir 'φ-coder/phcoder' Serbinenko <phcoder@HIDDEN>
Subject: Re: bug#11187: [PATCH] Fix incorrect width handling of multibyte
	characters in fmt
In-Reply-To: <4F7E240F.3030401@HIDDEN> ("Vladimir 'φ-coder/phcoder'
	Serbinenko"'s message of "Fri, 06 Apr 2012 01:00:31 +0200")
References: <4F7DE02D.9050106@HIDDEN> <87d37mq9lz.fsf@HIDDEN>
	<4F7E240F.3030401@HIDDEN>
Date: Fri, 06 Apr 2012 08:34:46 +0200
Message-ID: <87hawxpdvd.fsf@HIDDEN>

Vladimir 'φ-coder/phcoder' Serbinenko wrote:
> On 05.04.2012 21:09, Jim Meyering wrote:
>> Vladimir "'φ-coder/phcoder'" Serbinenko wrote:
>>> Currently fmt assumes that 1 byte= 1 column which creates wrongly
>>> formatted strings. Attached patch fixes it
>> Hi Vlad,
>>
>> Thank you for contributing.
>> This is a large enough change that we'll need an FSF copyright
>> assignment from you.  If you haven't already sent in the one for
>> gnulib, please just add coreutils to the list of affected projects.
>> (you can do up to 4 projects at a time)
> Ok, will do so. I'll also wait till more or less definitive version is
> ready for gnulib before updating the one for coreutils.
> Can I add TP in the same time?

Translation Project?  I don't know if they use the same forms/addresses.
Please ask coordinator@HIDDEN to be sure.




Information forwarded to bug-coreutils@HIDDEN:
bug#11187; Package coreutils. Full text available.

Message received at 11187 <at> debbugs.gnu.org:


Message-ID: <4F7E240F.3030401@HIDDEN>
Date: Fri, 06 Apr 2012 01:00:31 +0200
From: Vladimir 'φ-coder/phcoder' Serbinenko <phcoder@HIDDEN>
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
	rv:10.0.3) Gecko/20120329 Icedove/10.0.3
MIME-Version: 1.0
To: Jim Meyering <jim@HIDDEN>
Subject: Re: bug#11187: [PATCH] Fix incorrect width handling of multibyte
	characters in fmt
References: <4F7DE02D.9050106@HIDDEN> <87d37mq9lz.fsf@HIDDEN>
In-Reply-To: <87d37mq9lz.fsf@HIDDEN>

On 05.04.2012 21:09, Jim Meyering wrote:
> Vladimir "'φ-coder/phcoder'" Serbinenko wrote:
>> Currently fmt assumes that 1 byte= 1 column which creates wrongly
>> formatted strings. Attached patch fixes it
> Hi Vlad,
>
> Thank you for contributing.
> This is a large enough change that we'll need an FSF copyright
> assignment from you.  If you haven't already sent in the one for
> gnulib, please just add coreutils to the list of affected projects.
> (you can do up to 4 projects at a time)
Ok, will do so. I'll also wait till more or less definitive version is
ready for gnulib before updating the one for coreutils.
Can I add TP in the same time?
-- 
Regards
Vladimir 'φ-coder/phcoder' Serbinenko




Information forwarded to bug-coreutils@HIDDEN:
bug#11187; Package coreutils. Full text available.

Message received at 11187 <at> debbugs.gnu.org:


From: Jim Meyering <jim@HIDDEN>
To: Vladimir 'φ-coder/phcoder' Serbinenko <phcoder@HIDDEN>
Subject: Re: bug#11187: [PATCH] Fix incorrect width handling of multibyte
	characters in fmt
In-Reply-To: <4F7DE02D.9050106@HIDDEN> ("Vladimir \"'φ-coder/phcoder'\"
	Serbinenko"'s message of "Thu, 05 Apr 2012 20:10:53 +0200")
References: <4F7DE02D.9050106@HIDDEN>
Date: Thu, 05 Apr 2012 21:09:12 +0200
Message-ID: <87d37mq9lz.fsf@HIDDEN>

Vladimir "'φ-coder/phcoder'" Serbinenko wrote:
> Currently fmt assumes that 1 byte= 1 column which creates wrongly
> formatted strings. Attached patch fixes it

Hi Vlad,

Thank you for contributing.
This is a large enough change that we'll need an FSF copyright
assignment from you.  If you haven't already sent in the one for
gnulib, please just add coreutils to the list of affected projects.
(you can do up to 4 projects at a time)

Here are a few suggested adjustments:

It'd be great to have a test-suite addition that fails without
your patch, yet succeeds with it.  If you simply provide small
sample input/output pairs along with a selected locale name, we
can convert that to an actual test suite script for you.

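As an illustration of the kind of input such a test would exercise, here is a minimal sketch (not part of the patch) contrasting the byte count of a UTF-8 word with its number of code points. The helper name `utf8_code_points` and the `sample` string are hypothetical. Counting code points only approximates display columns, since it ignores wide and zero-width characters; that is presumably why the patch uses wcwidth instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the UTF-8 code points in S by skipping
   continuation bytes (those of the form 10xxxxxx).  Assumes S is
   valid UTF-8.  */
static size_t
utf8_code_points (const char *s)
{
  size_t n = 0;

  assert (s != NULL);
  for (; *s; s++)
    if (((unsigned char) *s & 0xC0) != 0x80)
      n++;
  return n;
}

/* "φ-coder" in UTF-8: φ is encoded as the two bytes 0xCF 0x86.  */
static const char sample[] = "\xCF\x86" "-coder";
```

Here `strlen (sample)` is 8 while `utf8_code_points (sample)` is 7, so a formatter that counts bytes treats this word as one column wider than it is displayed.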
Also, since this is a NEWS-worthy change, it is customary to
add an entry in the NEWS file, too.  I'd put this in a section
entitled "Improvements".

> diff --git a/src/fmt.c b/src/fmt.c
> index 89d13a6..56f7c0b 100644
> --- a/src/fmt.c
> +++ b/src/fmt.c
> @@ -20,6 +20,7 @@
>  #include <stdio.h>
>  #include <sys/types.h>
>  #include <getopt.h>
> +#include <wchar.h>
>
>  /* Redefine.  Otherwise, systems (Unicos for one) with headers that define
>     it to be a type get syntax errors for the variable declaration below.  */
> @@ -135,6 +136,7 @@ struct Word
>
>      const char *text;		/* the text of the word */
>      int length;			/* length of this word */
> +    int width;

Please don't follow the bad example there.  This value is always
unsigned, so it is better to use size_t, to match the type of your
new get_display_width function.

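One point worth keeping in mind once the accumulator is a size_t: wcwidth returns an int and yields -1 for non-printable characters, so adding its result directly to an unsigned total can wrap around. The sketch below is an illustration, not code from the patch. The helper names and the choice to count such characters as zero columns are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: add a wcwidth-style result W to TOTAL,
   treating a negative W (non-printable character) as zero columns.  */
static size_t
add_width (size_t total, int w)
{
  return w > 0 ? total + (size_t) w : total;
}

/* The unguarded form, for comparison: W is converted to size_t before
   the addition, so -1 becomes SIZE_MAX and the sum wraps.  */
static size_t
add_width_unguarded (size_t total, int w)
{
  total += w;
  return total;
}
```

With a total of 0 and a width of -1, the unguarded form returns SIZE_MAX, whereas the guarded form leaves the total at 0.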
>      int space;			/* the size of the following space */
>      unsigned int paren:1;	/* starts with open paren */
>      unsigned int period:1;	/* ends in [.?!])* */
> @@ -259,6 +261,42 @@ static int next_prefix_indent;
>     paragraphs chosen by fmt_paragraph().  */
>  static int last_line_length;
>

Please add a comment saying what the function does/returns,
and naming/describing the arguments.

> +static size_t
> +get_display_width (const char *beg, const char *end)
> +{
> +  const char *ptr;
> +  size_t r = 0;
> +  mbstate_t ps;
> +
> +  memset (&ps, 0, sizeof (ps));

We prefer to initialize mbstate_t variables all on one line, like this:

    mbstate_t ps = { 0, };

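As a small illustration of the two styles (an assumed example, not taken from fmt.c), both the memset call and the brace initializer leave the object in the initial conversion state, which mbsinit can confirm. The function names are hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Style used in the patch: declare, then clear with memset.  */
static int
initial_after_memset (void)
{
  mbstate_t ps;

  memset (&ps, 0, sizeof ps);
  return mbsinit (&ps) != 0;
}

/* Style suggested in the review: initialize at the declaration.  */
static int
initial_after_initializer (void)
{
  mbstate_t ps = { 0, };

  return mbsinit (&ps) != 0;
}
```

Both functions return 1. The initializer form keeps the declaration and the initialization on one line, so the state cannot be used before it is cleared.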
> +  for (ptr = beg; *ptr && ptr < end; )
> +    {
> +      wchar_t wc;
> +      size_t s;

Oops.  You've used TABs for indentation.
Note how mixing TABs and spaces makes the indentation look invalid.
Please use only spaces instead.

> +      s = mbrtowc (&wc, ptr, end - ptr, &ps);
> +      if (s == (size_t) -1)
> +	break;
> +      if (s == (size_t) -2)
> +	{
> +	  ptr++;
> +	  r++;
> +	  continue;
> +	}
> +      if (wc == '\e' && ptr + 3 < end
> +	  && ptr[1] == '[' && (ptr[2] == '0' || ptr[2] == '1')
> +	  && ptr[3] == 'm')
> +	{
> +	  ptr += 4;
> +	  continue;
> +	}
> +      r += wcwidth (wc);
> +      ptr += s;
> +    }
> +  return r;
> +}
> +
>  void
>  usage (int status)
>  {
> @@ -669,7 +707,9 @@ get_line (FILE *f, int c)
>            c = getc (f);
>          }
>        while (c != EOF && !isspace (c));
> -      in_column += word_limit->length = wptr - word_limit->text;
> +      word_limit->length = wptr - word_limit->text;
> +      in_column += word_limit->width = get_display_width (word_limit->text,
> +							  wptr);
>        check_punctuation (word_limit);
>
>        /* Scan inter-word space.  */
> @@ -871,13 +911,13 @@ fmt_paragraph (void)
>            if (w == word_limit)
>              break;
>
> -          len += (w - 1)->space + w->length;	/* w > start >= word */
> +          len += (w - 1)->space + w->width;	/* w > start >= word */
>          }
>        while (len < max_width);
>        start->best_cost = best + base_cost (start);
>      }
>
> -  word_limit->length = saved_length;
> +  word_limit->width = saved_length;
>  }
>
>  /* Return the constant component of the cost of breaking before the
> @@ -902,13 +942,13 @@ base_cost (WORD *this)
>        else if ((this - 1)->punct)
>          cost -= PUNCT_BONUS;
>        else if (this > word + 1 && (this - 2)->final)
> -        cost += WIDOW_COST ((this - 1)->length);
> +        cost += WIDOW_COST ((this - 1)->width);
>      }
>
>    if (this->paren)
>      cost -= PAREN_BONUS;
>    else if (this->final)
> -    cost += ORPHAN_COST (this->length);
> +    cost += ORPHAN_COST (this->width);
>
>    return cost;
>  }
> @@ -983,7 +1023,7 @@ put_word (WORD *w)
>    s = w->text;
>    for (n = w->length; n != 0; n--)
>      putchar (*s++);
> -  out_column += w->length;
> +  out_column += w->width;
>  }
>
>  /* Output to stdout SPACE spaces, or equivalent tabs.  */

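Taken together, the review comments suggest a shape like the following. This is only a sketch under stated assumptions, not the revised patch: the name `display_width_sketch`, the one-column-per-byte fallback for undecodable input, and the treatment of non-printable characters as zero columns are all choices made here for illustration, and the escape-sequence special case from the patch is left out. `_XOPEN_SOURCE` is defined first because glibc declares wcwidth only when an X/Open feature-test macro is in effect.

```c
#define _XOPEN_SOURCE 700       /* for wcwidth on glibc */
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Return the number of display columns occupied by the bytes in
   [BEG, END), decoded in the current locale's multibyte encoding.
   Invalid or incomplete sequences count as one column per byte and
   non-printable characters as zero columns; both are assumptions of
   this sketch.  */
static size_t
display_width_sketch (const char *beg, const char *end)
{
  mbstate_t ps = { 0, };
  size_t cols = 0;
  const char *p = beg;

  assert (beg <= end);
  while (p < end)
    {
      wchar_t wc;
      size_t n = mbrtowc (&wc, p, end - p, &ps);

      if (n == (size_t) -1 || n == (size_t) -2)
        {
          /* Undecodable input: count one byte and restart decoding.  */
          mbstate_t reset = { 0, };
          ps = reset;
          p++;
          cols++;
          continue;
        }
      if (n == 0)
        n = 1;                  /* mbrtowc returns 0 for a NUL byte */

      int w = wcwidth (wc);
      if (w > 0)
        cols += (size_t) w;
      p += n;
    }
  return cols;
}
```

A caller would typically run `setlocale (LC_ALL, "")` once at startup so that mbrtowc decodes according to the user's encoding. For plain ASCII words the result equals the byte length, so behavior for such input is unchanged.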



Information forwarded to bug-coreutils@HIDDEN:
bug#11187; Package coreutils. Full text available.

Message received at submit <at> debbugs.gnu.org:


Message-ID: <4F7DE02D.9050106@HIDDEN>
Date: Thu, 05 Apr 2012 20:10:53 +0200
From: Vladimir 'φ-coder/phcoder' Serbinenko <phcoder@HIDDEN>
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
	rv:10.0.3) Gecko/20120329 Icedove/10.0.3
MIME-Version: 1.0
To: bug-coreutils@HIDDEN
Subject: [PATCH] Fix incorrect width handling of multibyte characters in fmt

Currently fmt assumes that 1 byte= 1 column which creates wrongly
formatted strings. Attached patch fixes it

-- 
Regards
Vladimir 'φ-coder/phcoder' Serbinenko

Attachment: fmt_width.diff

diff --git a/src/fmt.c b/src/fmt.c
index 89d13a6..56f7c0b 100644
--- a/src/fmt.c
+++ b/src/fmt.c
@@ -20,6 +20,7 @@
 #include <stdio.h>
 #include <sys/types.h>
 #include <getopt.h>
+#include <wchar.h>
 
 /* Redefine.  Otherwise, systems (Unicos for one) with headers that define
    it to be a type get syntax errors for the variable declaration below.  */
@@ -135,6 +136,7 @@ struct Word
 
     const char *text;		/* the text of the word */
     int length;			/* length of this word */
+    int width;
     int space;			/* the size of the following space */
     unsigned int paren:1;	/* starts with open paren */
     unsigned int period:1;	/* ends in [.?!])* */
@@ -259,6 +261,42 @@ static int next_prefix_indent;
    paragraphs chosen by fmt_paragraph().  */
 static int last_line_length;
 
+static size_t
+get_display_width (const char *beg, const char *end)
+{
+  const char *ptr;
+  size_t r = 0;
+  mbstate_t ps;
+
+  memset (&ps, 0, sizeof (ps));
+
+  for (ptr = beg; *ptr && ptr < end; )
+    {
+      wchar_t wc;
+      size_t s;
+
+      s = mbrtowc (&wc, ptr, end - ptr, &ps);
+      if (s == (size_t) -1)
+	break;
+      if (s == (size_t) -2)
+	{
+	  ptr++;
+	  r++;
+	  continue;
+	}
+      if (wc == '\e' && ptr + 3 < end
+	  && ptr[1] == '[' && (ptr[2] == '0' || ptr[2] == '1')
+	  && ptr[3] == 'm')
+	{
+	  ptr += 4;
+	  continue;
+	}
+      r += wcwidth (wc);
+      ptr += s;
+    }
+  return r;
+}
+
 void
 usage (int status)
 {
@@ -669,7 +707,9 @@ get_line (FILE *f, int c)
           c = getc (f);
         }
       while (c != EOF && !isspace (c));
-      in_column += word_limit->length = wptr - word_limit->text;
+      word_limit->length = wptr - word_limit->text;
+      in_column += word_limit->width = get_display_width (word_limit->text,
+							  wptr);
       check_punctuation (word_limit);
 
       /* Scan inter-word space.  */
@@ -871,13 +911,13 @@ fmt_paragraph (void)
           if (w == word_limit)
             break;
 
-          len += (w - 1)->space + w->length;	/* w > start >= word */
+          len += (w - 1)->space + w->width;	/* w > start >= word */
         }
       while (len < max_width);
       start->best_cost = best + base_cost (start);
     }
 
-  word_limit->length = saved_length;
+  word_limit->width = saved_length;
 }
 
 /* Return the constant component of the cost of breaking before the
@@ -902,13 +942,13 @@ base_cost (WORD *this)
       else if ((this - 1)->punct)
         cost -= PUNCT_BONUS;
       else if (this > word + 1 && (this - 2)->final)
-        cost += WIDOW_COST ((this - 1)->length);
+        cost += WIDOW_COST ((this - 1)->width);
     }
 
   if (this->paren)
     cost -= PAREN_BONUS;
   else if (this->final)
-    cost += ORPHAN_COST (this->length);
+    cost += ORPHAN_COST (this->width);
 
   return cost;
 }
@@ -983,7 +1023,7 @@ put_word (WORD *w)
   s = w->text;
   for (n = w->length; n != 0; n--)
     putchar (*s++);
-  out_column += w->length;
+  out_column += w->width;
 }
 
 /* Output to stdout SPACE spaces, or equivalent tabs.  */




Acknowledgement sent to Vladimir 'φ-coder/phcoder' Serbinenko <phcoder@HIDDEN>:
New bug report received and forwarded. Copy sent to bug-coreutils@HIDDEN. Full text available.
Report forwarded to bug-coreutils@HIDDEN:
bug#11187; Package coreutils. Full text available.
Last modified: Fri, 19 Oct 2018 01:30:03 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.