GNU logs - #23086, boring messages


Message sent to bug-gnu-emacs@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#23086: 25.1.50; Emacs ignores Unicode line and paragraph separator characters
Resent-From: Philipp Stephani <p.stephani2@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-gnu-emacs@HIDDEN
Resent-Date: Tue, 22 Mar 2016 10:44:01 +0000
Resent-Message-ID: <handler.23086.B.145864338813150 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: report 23086
X-GNU-PR-Package: emacs
X-GNU-PR-Keywords: 
To: 23086 <at> debbugs.gnu.org
X-Debbugs-Original-To: bug-gnu-emacs@HIDDEN
Received: via spool by submit <at> debbugs.gnu.org id=B.145864338813150
          (code B ref -1); Tue, 22 Mar 2016 10:44:01 +0000
Received: (at submit) by debbugs.gnu.org; 22 Mar 2016 10:43:08 +0000
Received: from localhost ([127.0.0.1]:57735 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1aiJmF-0003Q1-R9
	for submit <at> debbugs.gnu.org; Tue, 22 Mar 2016 06:43:08 -0400
Received: from eggs.gnu.org ([208.118.235.92]:47353)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <p.stephani2@HIDDEN>) id 1aiJmD-0003PX-Lm
 for submit <at> debbugs.gnu.org; Tue, 22 Mar 2016 06:43:05 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from <p.stephani2@HIDDEN>) id 1aiJm3-0000Ts-Tt
 for submit <at> debbugs.gnu.org; Tue, 22 Mar 2016 06:43:00 -0400
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org
X-Spam-Level: *
X-Spam-Status: No, score=1.1 required=5.0 tests=BAYES_50,
 FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM,T_DKIM_INVALID autolearn=disabled
 version=3.3.2
Received: from lists.gnu.org ([2001:4830:134:3::11]:36531)
 by eggs.gnu.org with esmtp (Exim 4.71)
 (envelope-from <p.stephani2@HIDDEN>) id 1aiJm3-0000Ti-Q0
 for submit <at> debbugs.gnu.org; Tue, 22 Mar 2016 06:42:55 -0400
Received: from eggs.gnu.org ([2001:4830:134:3::10]:38288)
 by lists.gnu.org with esmtp (Exim 4.71)
 (envelope-from <p.stephani2@HIDDEN>) id 1aiJm2-00037q-6Y
 for bug-gnu-emacs@HIDDEN; Tue, 22 Mar 2016 06:42:55 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from <p.stephani2@HIDDEN>) id 1aiJm0-0000T3-TO
 for bug-gnu-emacs@HIDDEN; Tue, 22 Mar 2016 06:42:54 -0400
Received: from mail-wm0-x233.google.com ([2a00:1450:400c:c09::233]:38624)
 by eggs.gnu.org with esmtp (Exim 4.71)
 (envelope-from <p.stephani2@HIDDEN>) id 1aiJm0-0000Sy-Id
 for bug-gnu-emacs@HIDDEN; Tue, 22 Mar 2016 06:42:52 -0400
Received: by mail-wm0-x233.google.com with SMTP id l68so157092836wml.1
 for <bug-gnu-emacs@HIDDEN>; Tue, 22 Mar 2016 03:42:52 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=from:to:subject:date:message-id:mime-version
 :content-transfer-encoding;
 bh=8mbV3ZWkCs0ykHki2eH592MwKmbkDwFM0KGRVzBd8Zc=;
 b=AXYDQ1hiOambIfh3JYeS5D5ZFeK2WIAuX8hMsyIDjWVgcSlMdErG6K1bM6mi9DkMqh
 CHoYtesK04PRPGY4BRpf4NMflUGWDQKi4Wg/1N2/OYaHYmGiaT60DH3qN5129w0EFYeS
 cnLTjLQY154y9N5x8fo4zFFlg72z0woGJ/PGbpEn80hUeWM2mY4Gvl0ps7hFTL4UdKyC
 CTYSxBNOcmW7fPN2E6Pb9jRbmUfIk7VbYpPuO1EmeSof/ckUuhHsUkF0lX4nSxxTzsK+
 tBmZu7cX/0BBwOYcU7XrYq3PV4lShnXS59VOTNsGig5cEGBPIWnI22H8uQvK3QNCkdZG
 8TXQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:from:to:subject:date:message-id:mime-version
 :content-transfer-encoding;
 bh=8mbV3ZWkCs0ykHki2eH592MwKmbkDwFM0KGRVzBd8Zc=;
 b=S2IjucHiksvE0dKGDu5a46R4m/RLW2wbW79DuG8FLiLg5DSOvKitw5PYb6YyuOreuP
 vJWpas4oX9bnGG3n8Vus335Q22udMLC5QdXpLUjKLRKzjYSrQ6UgO5FUn79yX6Rr/fpg
 S8+5wiPpbT7IU/aJHPJ7OhJEPo1jgjZCc6ayWXeiwZEPimd6JNgzWmqBhUmLS/8vqHah
 2daW+ZtNtGN020CKcU/l3cGZaxnBhFkzb4IVhz1ejiw5eYj/m2WmJEKSiFqFCEmrt0PR
 KL2AV8lejddF453h0OO0rk+5ZYQVJLAy7z6CMvhpm6koCE74jmdrC8UOhbNuaYjE436E
 aENQ==
X-Gm-Message-State: AD7BkJJ1v5XzI6xNVHHsa+NOsX1nVvTXTO9oMVpaBm5virGz24NjvTK0XLHehniwCQRhpw==
X-Received: by 10.28.50.138 with SMTP id y132mr20555440wmy.52.1458643371488;
 Tue, 22 Mar 2016 03:42:51 -0700 (PDT)
Received: from phst2.muc.corp.google.com ([2a00:79e0:15:4:2067:167c:e8b3:7ba3])
 by smtp.gmail.com with ESMTPSA id v5sm16614098wmg.16.2016.03.22.03.42.49
 for <bug-gnu-emacs@HIDDEN>
 (version=TLS1_2 cipher=AES128-SHA bits=128/128);
 Tue, 22 Mar 2016 03:42:50 -0700 (PDT)
From: Philipp Stephani <p.stephani2@HIDDEN>
Date: Tue, 22 Mar 2016 11:42:46 +0100
Message-ID: <wvr4fuvix07t.fsf@HIDDEN>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x
X-Received-From: 2001:4830:134:3::11
X-Spam-Score: -3.8 (---)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.8 (---)


Type some characters
C-x 8 RET LINE SEPARATOR (or PARAGRAPH SEPARATOR)
Type some more characters
M-q

Expected behavior: Emacs treats these characters as line and paragraph
separators: they are displayed as line breaks, M-q doesn't remove them,
and forward-paragraph etc. treat the paragraph separator as paragraph
end.

Actual behavior: These characters are displayed as one-pixel horizontal
whitespace and otherwise ignored.

Also discussed in
https://lists.gnu.org/archive/html/emacs-devel/2015-08/msg01043.html.
https://www.emacswiki.org/emacs/unicode-whitespace.el supposedly adds
support for these characters, but I think proper treatment of Unicode
separators should be part of Emacs.
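
For reference, the two characters in question are U+2028 LINE SEPARATOR
and U+2029 PARAGRAPH SEPARATOR, which Unicode places in general
categories of their own (Zl and Zp).  The following is a minimal sketch
in Python, not Emacs code; it is used here only because Python's
standard `unicodedata` module exposes these properties.  The helper name
`describe` is made up for illustration.

```python
import unicodedata

LINE_SEPARATOR = "\u2028"
PARAGRAPH_SEPARATOR = "\u2029"


def describe(ch):
    """Return (code point, Unicode name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


if __name__ == "__main__":
    # Print the properties of both separator characters.
    for ch in (LINE_SEPARATOR, PARAGRAPH_SEPARATOR):
        print(describe(ch))
```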



In GNU Emacs 25.1.50.1 (x86_64-unknown-linux-gnu, GTK+ Version 3.10.8)
Repository revision: 780a605e1d2de4b975e6f1f29b491c9af419dcff
Windowing system distributor 'The X.Org Foundation', version 11.0.11501000
System Description:	Ubuntu 14.04 LTS

Configured using:
 'configure --with-modules --disable-build-details 'CFLAGS=-g -O0''

Configured features:
XPM JPEG TIFF GIF PNG RSVG SOUND GPM DBUS GCONF GSETTINGS NOTIFY ACL
LIBSELINUX GNUTLS LIBXML2 FREETYPE M17N_FLT LIBOTF XFT ZLIB
TOOLKIT_SCROLL_BARS GTK3 X11 MODULES

Important settings:
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Text

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Recent messages:
For information about GNU Emacs and the GNU system, type C-h C-a.
Quit
Fill column set to 10 (was 70)
Quit
Making completion list...

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug message dired dired-loaddefs format-spec
rfc822 mml easymenu mml-sec password-cache epa derived epg epg-config
gnus-util rmail rmail-loaddefs mm-decode mm-bodies mm-encode mail-parse
rfc2231 mailabbrev gmm-utils mailheader sendmail rfc2047 rfc2045
ietf-drums mm-util mail-prsvr mail-utils iso-transl time-date mule-util
tooltip eldoc electric uniquify ediff-hook vc-hooks lisp-float-type
mwheel term/x-win x-win term/common-win x-dnd tool-bar dnd fontset image
regexp-opt fringe tabulated-list newcomment elisp-mode lisp-mode
prog-mode register page menu-bar rfn-eshadow timer select scroll-bar
mouse jit-lock font-lock syntax facemenu font-core term/tty-colors frame
cl-generic cham georgian utf-8-lang misc-lang vietnamese tibetan thai
tai-viet lao korean japanese eucjp-ms cp51932 hebrew greek romanian
slovak czech european ethiopic indian cyrillic chinese charscript
case-table epa-hook jka-cmpr-hook help simple abbrev obarray minibuffer
cl-preloaded nadvice loaddefs button faces cus-face macroexp files
text-properties overlay sha1 md5 base64 format env code-pages mule
custom widget hashtable-print-readable backquote dbusbind inotify
dynamic-setting system-font-setting font-render-setting move-toolbar gtk
x-toolkit x multi-tty make-network-process emacs)

Memory information:
((conses 16 174467 8982)
 (symbols 48 30106 0)
 (miscs 40 468 148)
 (strings 32 66519 6641)
 (string-bytes 1 1505951)
 (vectors 16 13333)
 (vector-slots 8 488346 23035)
 (floats 8 167 91)
 (intervals 56 233 2)
 (buffers 976 13)
 (heap 1024 43667 1138))

-- 
Google Germany GmbH
Erika-Mann-Straße 33
80636 München

Court of registration and number: Hamburg, HRB 86891
Registered office: Hamburg
Managing directors: Matthew Scott Sucherman, Paul Terence Manicle

This e-mail is confidential.  If you are not the right addressee please do not
forward it, please inform the sender, and please erase this e-mail including
any attachments.  Thanks.




Message sent:


Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-Mailer: MIME-tools 5.505 (Entity 5.505)
Content-Type: text/plain; charset=utf-8
X-Loop: help-debbugs@HIDDEN
From: help-debbugs@HIDDEN (GNU bug Tracking System)
To: Philipp Stephani <p.stephani2@HIDDEN>
Subject: bug#23086: Acknowledgement (25.1.50; Emacs ignores Unicode line
 and paragraph separator characters)
Message-ID: <handler.23086.B.145864338813150.ack <at> debbugs.gnu.org>
References: <wvr4fuvix07t.fsf@HIDDEN>
X-Gnu-PR-Message: ack 23086
X-Gnu-PR-Package: emacs
Reply-To: 23086 <at> debbugs.gnu.org
Date: Tue, 22 Mar 2016 10:44:02 +0000

Thank you for filing a new bug report with debbugs.gnu.org.

This is an automatically generated reply to let you know your message
has been received.

Your message is being forwarded to the package maintainers and other
interested parties for their attention; they will reply in due course.

Your message has been sent to the package maintainer(s):
 bug-gnu-emacs@HIDDEN

If you wish to submit further information on this problem, please
send it to 23086 <at> debbugs.gnu.org.

Please do not send mail to help-debbugs@HIDDEN unless you wish
to report a problem with the Bug-tracking system.

-- 
23086: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=23086
GNU Bug Tracking System
Contact help-debbugs@HIDDEN with problems


Message sent to bug-gnu-emacs@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#23086: 25.1.50; Emacs ignores Unicode line and paragraph separator characters
Resent-From: Eli Zaretskii <eliz@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-gnu-emacs@HIDDEN
Resent-Date: Tue, 22 Mar 2016 16:14:01 +0000
Resent-Message-ID: <handler.23086.B23086.1458663226556 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 23086
X-GNU-PR-Package: emacs
X-GNU-PR-Keywords: 
To: Philipp Stephani <p.stephani2@HIDDEN>
Cc: 23086 <at> debbugs.gnu.org
Reply-To: Eli Zaretskii <eliz@HIDDEN>
Received: via spool by 23086-submit <at> debbugs.gnu.org id=B23086.1458663226556
          (code B ref 23086); Tue, 22 Mar 2016 16:14:01 +0000
Received: (at 23086) by debbugs.gnu.org; 22 Mar 2016 16:13:46 +0000
Received: from localhost ([127.0.0.1]:60191 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1aiOwD-00008u-JL
	for submit <at> debbugs.gnu.org; Tue, 22 Mar 2016 12:13:45 -0400
Received: from eggs.gnu.org ([208.118.235.92]:46270)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <eliz@HIDDEN>) id 1aiOwB-00008h-LE
 for 23086 <at> debbugs.gnu.org; Tue, 22 Mar 2016 12:13:44 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from <eliz@HIDDEN>) id 1aiOw2-00078b-FX
 for 23086 <at> debbugs.gnu.org; Tue, 22 Mar 2016 12:13:38 -0400
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org
X-Spam-Level: 
X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50,T_RP_MATCHES_RCVD
 autolearn=disabled version=3.3.2
Received: from fencepost.gnu.org ([2001:4830:134:3::e]:58336)
 by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from <eliz@HIDDEN>)
 id 1aiOw2-00078W-Bv; Tue, 22 Mar 2016 12:13:34 -0400
Received: from 84.94.185.246.cable.012.net.il ([84.94.185.246]:4094
 helo=home-c4e4a596f7)
 by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_128_CBC_SHA1:128)
 (Exim 4.82) (envelope-from <eliz@HIDDEN>)
 id 1aiOw1-0008Hb-OP; Tue, 22 Mar 2016 12:13:34 -0400
Date: Tue, 22 Mar 2016 18:13:15 +0200
Message-Id: <831t725w4k.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
In-reply-to: <wvr4fuvix07t.fsf@HIDDEN> (message from
 Philipp Stephani on Tue, 22 Mar 2016 11:42:46 +0100)
References: <wvr4fuvix07t.fsf@HIDDEN>
MIME-version: 1.0
Content-type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]
X-Received-From: 2001:4830:134:3::e
X-Spam-Score: -5.0 (-----)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -5.0 (-----)

> From: Philipp Stephani <p.stephani2@HIDDEN>
> Date: Tue, 22 Mar 2016 11:42:46 +0100
> 
> Type some characters
> C-x 8 RET LINE SEPARATOR (or PARAGRAPH SEPARATOR)
> Type some more characters
> M-q
> 
> Expected behavior: Emacs treats these characters as line and paragraph
> separators: they are displayed as line breaks, M-q doesn't remove them,
> and forward-paragraph etc. treat the paragraph separator as paragraph
> end.
> 
> Actual behavior: These characters are displayed as one-pixel horizontal
> whitespace and otherwise ignored.
> 
> Also discussed in
> https://lists.gnu.org/archive/html/emacs-devel/2015-08/msg01043.html.
> https://www.emacswiki.org/emacs/unicode-whitespace.el supposedly adds
> support for these characters, but I think proper treatment of Unicode
> separators should be part of Emacs.

It is not clear to me what exactly is the requested feature.  Can you
propose a detailed list of requirements?

I'm asking because these characters come in Unicode with non-trivial
baggage that is a far cry from just breaking the line; see

  http://unicode.org/reports/tr14/
  http://unicode.org/reports/tr29/

There are also implications on the bidirectional display (it is
sensitive to where the line and the paragraph begin and end).

If we want to support these two characters, we should think about
which parts of the relevant functionality we want to see in Emacs,
because users will expect that.  In addition, there are other
white-space characters defined by Unicode, and it would make sense to
treat them all alike.  I'm not sure it makes sense to support just the
line-breaking and paragraph-separator parts of only these two
characters.

Then there are Emacs-specific issues, for example:

 . do we treat u+2028 and u+2029 as literal characters, or as a form
   of EOL encoding?
 . if the former, how do we distinguish them from newlines on display?
 . should Isearch find these when looking for "\n"? how about regexp
   search for "$"?
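
To make the search questions above concrete, here is a minimal sketch of
how one other environment already answers them.  It is Python, not
Emacs, and is offered only as a point of comparison: Python's `re`
module lets "$" match before "\n" only, while `str.splitlines` also
treats U+2028 and U+2029 as line boundaries.  The function names
`words_at_eol` and `logical_lines` are made up for illustration.

```python
import re

# "one" and "two" are separated by U+2028; "two" and "three" by a newline.
TEXT = "one\u2028two\nthree"


def words_at_eol(s):
    """Words that a multiline regexp "$" considers to be at end of line."""
    return re.findall(r"\w+$", s, flags=re.MULTILINE)


def logical_lines(s):
    """Lines as split by str.splitlines, which honors U+2028 and U+2029."""
    return s.splitlines()


if __name__ == "__main__":
    print(words_at_eol(TEXT))   # "$" does not match before U+2028
    print(logical_lines(TEXT))  # splitlines does break at U+2028
```

The two functions disagree on the same input, which is the kind of
inconsistency a detailed proposal would have to rule on.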

There are probably more implications; these are just the ones that
popped into my mind in 5 sec.  IOW, I think Someone™ should think this
over and present a detailed proposal.

Thanks.




Message sent to bug-gnu-emacs@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#23086: 25.1.50; Emacs ignores Unicode line and paragraph separator characters
Resent-From: John Wiegley <jwiegley@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-gnu-emacs@HIDDEN
Resent-Date: Sun, 27 Mar 2016 00:21:03 +0000
Resent-Message-ID: <handler.23086.B23086.145903806121207 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 23086
X-GNU-PR-Package: emacs
X-GNU-PR-Keywords: 
To: Eli Zaretskii <eliz@HIDDEN>
Cc: Philipp Stephani <p.stephani2@HIDDEN>, 23086 <at> debbugs.gnu.org
Received: via spool by 23086-submit <at> debbugs.gnu.org id=B23086.145903806121207
          (code B ref 23086); Sun, 27 Mar 2016 00:21:03 +0000
Received: (at 23086) by debbugs.gnu.org; 27 Mar 2016 00:21:01 +0000
Received: from localhost ([127.0.0.1]:38998 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1ajyRx-0005Vy-5l
	for submit <at> debbugs.gnu.org; Sat, 26 Mar 2016 20:21:01 -0400
Received: from mail-pa0-f49.google.com ([209.85.220.49]:33742)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <jwiegley@HIDDEN>) id 1ajyRv-0005VW-7J
 for 23086 <at> debbugs.gnu.org; Sat, 26 Mar 2016 20:20:59 -0400
Received: by mail-pa0-f49.google.com with SMTP id fl4so71197861pad.0
 for <23086 <at> debbugs.gnu.org>; Sat, 26 Mar 2016 17:20:59 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=from:to:cc:subject:in-reply-to:date:message-id:references
 :user-agent:mime-version:content-transfer-encoding;
 bh=6xjEcuC4AoCvDZPVAveUH6TvgWzUjiMPHDBPjooomUM=;
 b=0RgqCfoHAEoyVGXHYXnspoBeCo9shV0t/dCG/BNhp55EpVAg9KcTfPJLEPjoT2vpba
 vx7QIdi6YQzXs4kNvA9JHPFut+ZY1/UyTggBgwIJxzHj42HBOGXiUySmIqN6MXBje0uP
 VX4F3lJ7WUNsRhJ0x/CY/rowZbX9ADDG99PenDetSh6xjFfubNhbRM1ubINko0yf8cG8
 E2VDqZDmV1pPZP+q2/BmHpneWwNlgg54ns8PsmcjmQd59+O/1g4IWAE4d7ZgWb4pyynK
 TVQ3QFGnW945xMWBy69rynZdKGjYEZ7mfYhmjzMpuXsAdS367x8Z9OUZCshFX+tmouqw
 lZsg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:from:to:cc:subject:in-reply-to:date:message-id
 :references:user-agent:mime-version:content-transfer-encoding;
 bh=6xjEcuC4AoCvDZPVAveUH6TvgWzUjiMPHDBPjooomUM=;
 b=RIZVnZwAdfZlihE48T3LNFPbKCJ798ppFRUz9K3XRFpSiKIUDHJDKqD9AvFEh8II6K
 +EidOCL9yAqSUwS/aYB3wMl+hLxn8ebU5YZXvd/YL8OVgTL78vlP1zc1U9q1EJRoPkQW
 871DYCNQp0pP3JUaxl87/nAmtEIcgqZ+krj1ywemtgd/8ly4a3XcTHMq0Y+P7r3pPF3t
 kNZ6AODEmDo3LAlOyjPgZIX8w/SgQE8/pPsKsuqP9G/CYbj4RxO9EFo5Ejys5/dBWBYR
 7GuDpE2fDaJIxJgmiHrRIHH3yj6dNKZVkV3kGdgB88Ku+8ow51hrbqYdHFV4vf9ZDz4H
 Jsmg==
X-Gm-Message-State: AD7BkJIJT9zduqrfvW7wInB1Nx/S0kDG4eA/vHleg5Du5NWx1MNUghrGXYmoBeyku/Om8g==
X-Received: by 10.67.8.100 with SMTP id dj4mr31875399pad.88.1459038053681;
 Sat, 26 Mar 2016 17:20:53 -0700 (PDT)
Received: from Hermes.local (76-234-68-79.lightspeed.frokca.sbcglobal.net.
 [76.234.68.79])
 by smtp.gmail.com with ESMTPSA id 3sm25462056pfn.59.2016.03.26.17.20.51
 (version=TLS1 cipher=AES128-SHA bits=128/128);
 Sat, 26 Mar 2016 17:20:52 -0700 (PDT)
From: John Wiegley <jwiegley@HIDDEN>
X-Google-Original-From: "John Wiegley" <johnw@HIDDEN>
Received: by Hermes.local (Postfix, from userid 501)
 id 102114FB4C57; Sat, 26 Mar 2016 17:20:50 -0700 (PDT)
In-Reply-To: <831t725w4k.fsf@HIDDEN> (Eli Zaretskii's message of "Tue, 22 Mar
 2016 18:13:15 +0200")
Date: Sat, 26 Mar 2016 16:49:53 -0700
Message-ID: <m2shzcre8u.fsf@HIDDEN>
References: <wvr4fuvix07t.fsf@HIDDEN>
 <831t725w4k.fsf@HIDDEN>
User-Agent: Gnus/5.130014 (Ma Gnus v0.14) Emacs/25.0.92 (darwin)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Score: -0.7 (/)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -0.7 (/)

>>>>> Eli Zaretskii <eliz@HIDDEN> writes:

> There are probably more implications; these are just the ones that popped
> into my mind in 5 sec. IOW, I think Someone™ should think this over and
> present a detailed proposal.

Very much agreed. Reading this bug description gives me that "There be
dragons" feeling. :)

-- 
John Wiegley                  GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com                          60E1 46C4 BD1A 7AC1 4BA2




Message sent to bug-gnu-emacs@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#23086: 25.1.50; Emacs ignores Unicode line and paragraph separator characters
Resent-From: Eli Zaretskii <eliz@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-gnu-emacs@HIDDEN
Resent-Date: Mon, 17 Jul 2017 15:10:02 +0000
Resent-Message-ID: <handler.23086.B23086.15003041881083 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 23086
X-GNU-PR-Package: emacs
X-GNU-PR-Keywords: 
To: p.stephani2@HIDDEN
Cc: 23086 <at> debbugs.gnu.org
Reply-To: Eli Zaretskii <eliz@HIDDEN>
Received: via spool by 23086-submit <at> debbugs.gnu.org id=B23086.15003041881083
          (code B ref 23086); Mon, 17 Jul 2017 15:10:02 +0000
Received: (at 23086) by debbugs.gnu.org; 17 Jul 2017 15:09:48 +0000
Received: from localhost ([127.0.0.1]:44643 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1dX7ed-0000HO-RV
	for submit <at> debbugs.gnu.org; Mon, 17 Jul 2017 11:09:48 -0400
Received: from eggs.gnu.org ([208.118.235.92]:52225)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <eliz@HIDDEN>) id 1dX7ec-0000H9-Td
 for 23086 <at> debbugs.gnu.org; Mon, 17 Jul 2017 11:09:47 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from <eliz@HIDDEN>) id 1dX7eU-0002SV-BD
 for 23086 <at> debbugs.gnu.org; Mon, 17 Jul 2017 11:09:41 -0400
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org
X-Spam-Level: 
X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50,RP_MATCHES_RCVD
 autolearn=disabled version=3.3.2
Received: from fencepost.gnu.org ([2001:4830:134:3::e]:35197)
 by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from <eliz@HIDDEN>)
 id 1dX7eU-0002SR-7B; Mon, 17 Jul 2017 11:09:38 -0400
Received: from 84.94.185.246.cable.012.net.il ([84.94.185.246]:4324
 helo=home-c4e4a596f7)
 by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256)
 (Exim 4.82) (envelope-from <eliz@HIDDEN>)
 id 1dX7eT-0006ko-LZ; Mon, 17 Jul 2017 11:09:38 -0400
Date: Mon, 17 Jul 2017 18:09:46 +0300
Message-Id: <83o9sjcd6t.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
In-reply-to: <831t725w4k.fsf@HIDDEN> (message from Eli Zaretskii on Tue, 22
 Mar 2016 18:13:15 +0200)
References: <wvr4fuvix07t.fsf@HIDDEN>
 <831t725w4k.fsf@HIDDEN>
MIME-version: 1.0
Content-type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]
X-Received-From: 2001:4830:134:3::e
X-Spam-Score: -5.0 (-----)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -5.0 (-----)

> Date: Tue, 22 Mar 2016 18:13:15 +0200
> From: Eli Zaretskii <eliz@HIDDEN>
> Cc: 23086 <at> debbugs.gnu.org
> 
> > From: Philipp Stephani <p.stephani2@HIDDEN>
> > Date: Tue, 22 Mar 2016 11:42:46 +0100
> > 
> > Type some characters
> > C-x 8 RET LINE SEPARATOR (or PARAGRAPH SEPARATOR)
> > Type some more characters
> > M-q
> > 
> > Expected behavior: Emacs treats these characters as line and paragraph
> > separators: they are displayed as line breaks, M-q doesn't remove them,
> > and forward-paragraph etc. treat the paragraph separator as paragraph
> > end.
> > 
> > Actual behavior: These characters are displayed as one-pixel horizontal
> > whitespace and otherwise ignored.
> > 
> > Also discussed in
> > https://lists.gnu.org/archive/html/emacs-devel/2015-08/msg01043.html.
> > https://www.emacswiki.org/emacs/unicode-whitespace.el supposedly adds
> > support for these characters, but I think proper treatment of Unicode
> > separators should be part of Emacs.
> 
> It is not clear to me what exactly is the requested feature.  Can you
> propose a detailed list of requirements?
> 
> I'm asking because these characters come in Unicode with non-trivial
> baggage that is a far cry from just breaking the line; see
> 
>   http://unicode.org/reports/tr14/
>   http://unicode.org/reports/tr29/
> 
> There are also implications on the bidirectional display (it is
> sensitive to where the line and the paragraph begin and end).
> 
> If we want to support these two characters, we should think about
> which parts of the relevant functionality we want to see in Emacs,
> because users will expect that.  In addition, there are other
> white-space characters defined by Unicode, and it would make sense to
> treat them all alike.  I'm not sure it makes sense to support just the
> line-breaking and paragraph-separator parts of only these two
> characters.
> 
> Then there are Emacs-specific issues, for example:
> 
>  . do we treat u+2028 and u+2029 as literal characters, or as a form
>    of EOL encoding?
>  . if the former, how do we distinguish them from newlines on display?
>  . should Isearch find these when looking for "\n"? how about regexp
>    search for "$"?
> 
> There are probably more implications; these are just the ones that
> popped into my mind in 5 sec.  IOW, I think Someone™ should think this
> over and present a detailed proposal.

So I've dusted off this year-old bug report and decided to improve
Emacs in this area.  Here's what I propose:

 . u+2028 and u+2029 (and also perhaps u+0085) will be treated as a
   form of EOL encoding, which means they will not appear on display,
   and will cause the next character to be displayed on the next
   screen line
 . M-q will remove u+2028, as it removes newlines, and put newlines
   at all EOLs as part of filling
 . M-q will NOT remove u+2029, unless the user wants to refill several
   paragraphs as a single paragraph, and there happens to be a u+2029
   between some of the paragraphs
 . forward-paragraph etc. will treat u+2029 as paragraph end
 . bidi reordering will treat u+2029 as paragraph end
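
The filling rules above can be sketched as a toy model.  This is Python,
not Emacs Lisp, and it is not the proposed implementation: it assumes
paragraphs are delimited only by u+2029, ignores blank-line paragraph
separation and the "refill several paragraphs as one" case, and uses the
standard `textwrap` module in place of Emacs's fill code.  The function
name `refill` is hypothetical.

```python
import textwrap

LS = "\u2028"  # LINE SEPARATOR: removed by filling, like a newline
PS = "\u2029"  # PARAGRAPH SEPARATOR: preserved as a paragraph boundary


def refill(text, width=70):
    """Refill each u+2029-delimited paragraph of TEXT to WIDTH columns."""
    filled = []
    for para in text.split(PS):
        # Existing line breaks (newline or u+2028) become plain spaces...
        flat = para.replace(LS, " ").replace("\n", " ")
        # ...and filling puts newlines at the new ends of lines.
        filled.append(textwrap.fill(flat, width=width))
    # u+2029 survives between the refilled paragraphs.
    return PS.join(filled)


if __name__ == "__main__":
    sample = "aa" + LS + "bb cc" + PS + "dd\nee"
    print(repr(refill(sample, width=5)))
```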

There are some compromises in these decisions, but they make the job
much easier and less intrusive, and I think they will advance the
level of our Unicode support quite a bit.

Comments?

I think we should also make $ match these two characters, in addition
to the newline, but that could be more difficult.  Would someone who
knows their way around regex.c want to work on this part?





Last modified: Mon, 25 Nov 2019 12:00:02 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.