GNU bug report logs -
#79824
fmt does not correctly process text with UTF-8 character encoding
To reply to this bug, email your comments to 79824 AT debbugs.gnu.org.
Report forwarded to bug-coreutils <at> gnu.org: bug#79824; Package coreutils. (Wed, 12 Nov 2025 17:08:02 GMT)
Acknowledgement sent to Воронов Андрей Александрович <a.voronov <at> fintech.ru>: New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Wed, 12 Nov 2025 17:08:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org:
Good evening,
When I run fmt to reflow text to the default width of 75 columns, it wraps correctly only text made of ASCII Latin letters.
Lines containing Russian and possibly other non-Latin characters (Cyrillic, Greek), which are stored as two bytes each in UTF-8,
come out about half as long as they should.
See the test case below.
Original text (last 20 lines):
=================================================
$ tail -20 Kolisnichenko_D._Komandnaia_stroka_Linux_2.md
### Пакет coreutils
Программа expand полезна для преобразования табуляций в пробелы.
Например, программу с табуляциями в начале строк (опция `-i`) в файле `hellocool.c`
преобразует табуляции в несколько пробелов и запишет в файл `hc.c`:
expand -i hellocool.c > hc.c
Печатный текст форматируется под страницу (72 символа в строке) утилитой `fmt`.
## Источники
* [CDRDAO](http://cdrdao.sourceforge.net/) ; Disk-At-Once Recording of Audio and Data CD-Rs/CD-RWs
* [BChunk](https://github.com/hessu/bchunk) ;
* [ccd2iso](https://sourceforge.net/projects/ccd2iso/) ;
===================================================
The same text after running it through the fmt utility with default options:
===================================================
$ fmt Kolisnichenko_D._Komandnaia_stroka_Linux_2.md
...
### Пакет coreutils
Программа expand полезна для
преобразования табуляций в пробелы.
Например, программу с табуляциями в
начале строк (опция `-i`) в файле `hellocool.c`
преобразует табуляции в несколько
пробелов и запишет в файл `hc.c`:
expand -i hellocool.c > hc.c
Печатный текст форматируется под
страницу (72 символа в строке) утилитой
`fmt`.
## Источники
* [CDRDAO](http://cdrdao.sourceforge.net/) ; Disk-At-Once Recording of
Audio and Data CD-Rs/CD-RWs * [BChunk](https://github.com/hessu/bchunk)
; * [ccd2iso](https://sourceforge.net/projects/ccd2iso/) ;
===================================================
Sorry for my English.
God bless you.
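The line lengths in the test case above can be checked by comparing character counts with UTF-8 byte counts. The following is a minimal sketch in Python, not part of the original report; the helper name `utf8_len` is hypothetical. It takes one sentence from the sample and shows that the break `fmt` chose is consistent with a 75-unit limit applied to bytes rather than to characters.

```python
# Sentence copied from the test case in the report.
line = "Программа expand полезна для преобразования табуляций в пробелы."

# The first line fmt produced for it, and the word it pushed to the next line.
first_output_line = "Программа expand полезна для"
next_word = "преобразования"


def utf8_len(text: str) -> int:
    """Hypothetical helper: length of text in UTF-8 bytes."""
    return len(text.encode("utf-8"))


# The whole sentence is 64 characters, so it fits within 75 columns,
# but it occupies 114 bytes because each Cyrillic letter takes two bytes.
print(len(line), utf8_len(line))  # 64 114

# The first output line is 47 bytes; adding the next word gives 76 bytes,
# one more than 75, although it is only 43 characters.
print(utf8_len(first_output_line))  # 47
print(utf8_len(first_output_line + " " + next_word))  # 76
print(len(first_output_line + " " + next_word))  # 43
```

The numbers only show that the observed break matches a byte-based width limit for this sentence; they say nothing about how `fmt` is implemented internally.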
Information forwarded to bug-coreutils <at> gnu.org: bug#79824; Package coreutils. (Wed, 12 Nov 2025 18:48:02 GMT)
Message #8 received at 79824 <at> debbugs.gnu.org:
On 12/11/2025 15:14, Воронов Андрей Александрович wrote:
> Good evening,
>
> When I run fmt to reflow text to the default width of 75 columns, it wraps correctly only text made of ASCII Latin letters.
> Lines containing Russian and possibly other non-Latin characters (Cyrillic, Greek), which are stored as two bytes each in UTF-8,
> come out about half as long as they should.
Yes, this is a known issue that we're gradually getting to.
thanks,
Padraig
Information forwarded to bug-coreutils <at> gnu.org: bug#79824; Package coreutils. (Wed, 12 Nov 2025 22:30:02 GMT)
Message #11 received at 79824 <at> debbugs.gnu.org:
Pádraig Brady <P <at> draigBrady.com> writes:
> On 12/11/2025 15:14, Воронов Андрей Александрович wrote:
>> Good evening,
>> When I run fmt to reflow text to the default width of 75 columns, it
>> wraps correctly only text made of ASCII Latin letters.
>> Lines containing Russian and possibly other non-Latin characters (Cyrillic, Greek), which are stored as two bytes each in UTF-8,
>> come out about half as long as they should.
>
> Yes, this is a known issue that we're gradually getting to.
I can have a look at it using mbbuf_t in a similar way to 'fold'.
I think 'fmt' is similar, in that it does not matter much if it is a bit
slower. Handling Unicode characters is more important, IMO.
Collin
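To illustrate what measuring width per character rather than per byte changes, here is a minimal sketch in Python. It is an assumption-laden illustration, not the coreutils code: it does not use `mbbuf_t` (whose interface is not shown in this thread), and it uses a simple greedy wrap rather than the line-breaking logic `fmt` actually applies. The names `byte_width`, `column_width`, and `greedy_wrap` are hypothetical.

```python
import unicodedata
from typing import Callable


def byte_width(text: str) -> int:
    """Width as a byte-oriented tool sees it: the number of UTF-8 bytes."""
    return len(text.encode("utf-8"))


def column_width(text: str) -> int:
    """Approximate display columns.

    Assumption: combining marks take 0 columns, East Asian wide and
    fullwidth characters take 2, and everything else (including
    ambiguous-width characters such as Cyrillic letters) takes 1.
    """
    total = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks occupy no column of their own
        total += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return total


def greedy_wrap(
    paragraph: str,
    max_width: int = 75,
    measure: Callable[[str], int] = column_width,
) -> list[str]:
    """Greedy word wrap: start a new line when the next word would not fit."""
    lines: list[str] = []
    current = ""
    for word in paragraph.split():
        candidate = f"{current} {word}" if current else word
        if current and measure(candidate) > max_width:
            lines.append(current)
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


sentence = "Программа expand полезна для преобразования табуляций в пробелы."

# Measuring in bytes splits the 64-character sentence in two.
print(greedy_wrap(sentence, measure=byte_width))

# Measuring in display columns keeps it on one line.
print(greedy_wrap(sentence, measure=column_width))
```

With the byte-based measure, the sketch splits the sentence after "для", the same place as in the reported output; with the column-based measure the sentence stays on one line. A real fix in C would also have to handle invalid byte sequences and locale-dependent widths, which this sketch ignores.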
This bug report was last modified 1 day ago.