GNU bug report logs - #1877
Request: Regular expressions that can match Unicode general categories

Please note: This is a static page, with minimal formatting, updated once a day.
Click here to see this page with the latest information and nicer formatting.

Package: emacs; Severity: wishlist; Reported by: Derick Eddington <derick.eddington@HIDDEN>; Done: Lars Ingebrigtsen <larsi@HIDDEN>; Maintainer for emacs is bug-gnu-emacs@HIDDEN.
Bug closed; send any further explanations to 1877 <at> debbugs.gnu.org and Derick Eddington <derick.eddington@HIDDEN>. Request was from Lars Ingebrigtsen <larsi@HIDDEN> to control <at> debbugs.gnu.org.

Message received at 1877 <at> debbugs.gnu.org:


From: Lars Ingebrigtsen <larsi@HIDDEN>
To: Eli Zaretskii <eliz@HIDDEN>
Subject: Re: bug#1877: Request: Regular expressions that can match Unicode
 general categories
References: <1231792692.22467.115.camel@eep> <87zhimfcs4.fsf@HIDDEN>
 <83r23ycgv9.fsf@HIDDEN>
X-Now-Playing: Charles Manier's _Two Synths, A Guitar (And) A Drum Machine_:
 "Sift Through Art Collecting People"
Date: Sun, 14 Nov 2021 07:28:06 +0100
In-Reply-To: <83r23ycgv9.fsf@HIDDEN> (Eli Zaretskii's message of "Mon, 30 Sep
 2019 11:45:14 +0300")
Message-ID: <877ddbb6a1.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/29.0.50 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
Cc: derick.eddington@HIDDEN, 1877 <at> debbugs.gnu.org

Eli Zaretskii <eliz@HIDDEN> writes:

> It is not clear to me which categories are of interest here.  Some of
> them are nowadays definitely available indirectly via the classes
> mentioned above (they weren't available in Emacs 23 when the bug was
> filed).  Maybe the OP could provide an explicit list of categories
> needed for this Scheme mode, together with their required usage in
> this mode.  Looking at R6RS sec 4.2.1, all I see is "whitespace"
> (which we provide via [:blank:]), "letter" (provided by [:alpha:]),
> "digit" (provided by [:digit:]), and "intraline whitespace" (provided
> by [:blank:]).  If this is all, then we have all the required support
> now.

There was no response here (in two years), so I'm guessing that we have
the categories required, and I'm closing this bug report.  If there are
any further categories that would be useful to have added, please
respond to the debbugs address and we'll reopen.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#1877; Package emacs.
Removed tag(s) moreinfo. Request was from Stefan Kangas <stefan@HIDDEN> to control <at> debbugs.gnu.org.

Message received at 1877 <at> debbugs.gnu.org:


Date: Mon, 30 Sep 2019 11:45:14 +0300
Message-Id: <83r23ycgv9.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
To: Lars Ingebrigtsen <larsi@HIDDEN>
In-reply-to: <87zhimfcs4.fsf@HIDDEN> (message from Lars Ingebrigtsen on Mon, 
 30 Sep 2019 09:45:15 +0200)
Subject: Re: bug#1877: Request: Regular expressions that can match Unicode
 general categories
References: <1231792692.22467.115.camel@eep> <87zhimfcs4.fsf@HIDDEN>
MIME-version: 1.0
Content-type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Cc: derick.eddington@HIDDEN, 1877 <at> debbugs.gnu.org

> From: Lars Ingebrigtsen <larsi@HIDDEN>
> Date: Mon, 30 Sep 2019 09:45:15 +0200
> Cc: 1877 <at> debbugs.gnu.org
> 
> Derick Eddington <derick.eddington@HIDDEN> writes:
> 
> > A new Scheme major mode I've made [1] requires regular expressions that
> > can match characters by their Unicode general categories.  It seems
> > Emacs regular expressions do not provide a way to do that directly (I'm
> > using GNU Emacs 23.0.60.1)
> 
> (I'm going through old bug reports that unfortunately didn't get any
> response at the time.)
> 
> I'm not quite sure what Unicode general categories you're referring to,
> but the Emacs regexp matcher has gained a bunch of categories in the ten
> years since you made the request.
> 
> Are the categories below what you were thinking of?
> 
> ‘[:print:]’
>      This matches any printing character—either whitespace, or a graphic
>      character matched by ‘[:graph:]’.
> ‘[:punct:]’
>      This matches any punctuation character.  (At present, for multibyte
>      characters, it matches anything that has non-word syntax.)
> ‘[:space:]’
>      This matches any character that has whitespace syntax (*note Syntax
>      Class Table::).
> ‘[:upper:]’
>      This matches any upper-case letter, as determined by the current
>      case table (*note Case Tables::).  If ‘case-fold-search’ is
>      non-‘nil’, this also matches any lower-case letter.
> ‘[:word:]’
>      This matches any character that has word syntax (*note Syntax Class
>      Table::).

No, he means the categories described in the node "Character
Properties" of the ELisp manual.

We don't yet have full support for the Unicode Regular Expressions, as
specified in UTS#18.  In particular, see

  http://unicode.org/reports/tr18/#General_Category_Property

for General Category regexp specs.
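
As a rough illustration of what matching by General Category means, here is a minimal sketch in Python rather than Emacs Lisp. It assumes the UTS#18 convention that a one-letter spec such as "L" covers every category starting with that letter, while a two-letter spec such as "Lu" names exactly one category. The helper names `in_general_category` and `scan` are hypothetical and are not part of Emacs or of any code in this report.

```python
import unicodedata


def in_general_category(ch, spec):
    """True if ch is in category spec ("Lu") or in a one-letter group ("L")."""
    cat = unicodedata.category(ch)
    return cat == spec or (len(spec) == 1 and cat.startswith(spec))


def scan(text, spec):
    """Return the maximal runs of characters in text that belong to spec."""
    runs, current = [], []
    for ch in text:
        if in_general_category(ch, spec):
            current.append(ch)
        elif current:
            # A non-matching character ends the current run.
            runs.append("".join(current))
            current = []
    if current:
        runs.append("".join(current))
    return runs


if __name__ == "__main__":
    sample = "\u03bbx 42 \u03a9mega"
    print(scan(sample, "L"))   # runs of letters of any kind
    print(scan(sample, "Lu"))  # runs of upper-case letters only
```

A regexp construct of the kind requested would do this test inside the matcher, so the pattern would only need to name the category instead of listing its members.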

It is not clear to me which categories are of interest here.  Some of
them are nowadays definitely available indirectly via the classes
mentioned above (they weren't available in Emacs 23 when the bug was
filed).  Maybe the OP could provide an explicit list of categories
needed for this Scheme mode, together with their required usage in
this mode.  Looking at R6RS sec 4.2.1, all I see is "whitespace"
(which we provide via [:blank:]), "letter" (provided by [:alpha:]),
"digit" (provided by [:digit:]), and "intraline whitespace" (provided
by [:blank:]).  If this is all, then we have all the required support
now.




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#1877; Package emacs.
Added tag(s) moreinfo. Request was from Lars Ingebrigtsen <larsi@HIDDEN> to control <at> debbugs.gnu.org.

Message received at 1877 <at> debbugs.gnu.org:


From: Lars Ingebrigtsen <larsi@HIDDEN>
To: Derick Eddington <derick.eddington@HIDDEN>
Subject: Re: bug#1877: Request: Regular expressions that can match Unicode
 general categories
References: <1231792692.22467.115.camel@eep>
Date: Mon, 30 Sep 2019 09:45:15 +0200
In-Reply-To: <1231792692.22467.115.camel@eep> (Derick Eddington's message of
 "Mon, 12 Jan 2009 12:38:12 -0800")
Message-ID: <87zhimfcs4.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Cc: 1877 <at> debbugs.gnu.org

Derick Eddington <derick.eddington@HIDDEN> writes:

> A new Scheme major mode I've made [1] requires regular expressions that
> can match characters by their Unicode general categories.  It seems
> Emacs regular expressions do not provide a way to do that directly (I'm
> using GNU Emacs 23.0.60.1)

(I'm going through old bug reports that unfortunately didn't get any
response at the time.)

I'm not quite sure what Unicode general categories you're referring to,
but the Emacs regexp matcher has gained a bunch of categories in the ten
years since you made the request.

Are the categories below what you were thinking of?

‘[:print:]’
     This matches any printing character: either whitespace, or a graphic
     character matched by ‘[:graph:]’.
‘[:punct:]’
     This matches any punctuation character.  (At present, for multibyte
     characters, it matches anything that has non-word syntax.)
‘[:space:]’
     This matches any character that has whitespace syntax (*note Syntax
     Class Table::).
‘[:upper:]’
     This matches any upper-case letter, as determined by the current
     case table (*note Case Tables::).  If ‘case-fold-search’ is
     non-‘nil’, this also matches any lower-case letter.
‘[:word:]’
     This matches any character that has word syntax (*note Syntax Class
     Table::).

(etc)

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#1877; Package emacs.
Severity set to `wishlist' from `normal'. Request was from Glenn Morris <rgm@HIDDEN> to control@HIDDEN.

Message received at submit@HIDDEN:


Subject: Request: Regular expressions that can match Unicode general
 categories
From: Derick Eddington <derick.eddington@HIDDEN>
To: bug-gnu-emacs@HIDDEN
Content-Type: text/plain
Date: Mon, 12 Jan 2009 12:38:12 -0800
Message-Id: <1231792692.22467.115.camel@eep>
Mime-Version: 1.0
X-Mailer: Evolution 2.24.2 
Content-Transfer-Encoding: 7bit

A new Scheme major mode I've made [1] requires regular expressions that
can match characters by their Unicode general categories.  It seems
Emacs regular expressions do not provide a way to do that directly (I'm
using GNU Emacs 23.0.60.1).  I couldn't find anything about it in the
Emacs documentation, on emacswiki.org, by asking on
help-gnu-emacs@HIDDEN, or in that list's archives.  So currently I
pre-compute character sets for the needed general categories (using
`get-char-code-property') and splice them into the larger regular
expressions.  This has three problems.  First, including character sets
for every general category I need makes the regular expressions too
large for Emacs, which signals an error when trying to use them (some of
them are pretty big), so currently I don't support all of the required
categories.  Second, these character sets are duplicated across
different regular expressions, and since they're so large this bloats
the code.  Third, I suspect that matching against character sets this
large is not very time-efficient.
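
For illustration only, the pre-computation workaround described above can be sketched in Python rather than Emacs Lisp. This is not the code from the Scheme mode in [1]. It assumes Python's `unicodedata.category` as a stand-in for `get-char-code-property`, and the helpers `category_ranges` and `char_class` are hypothetical names. The sketch collects every code point of one general category, collapses consecutive code points into ranges, and renders the ranges as a bracket expression.

```python
import re
import sys
import unicodedata


def category_ranges(category):
    """Return inclusive (first, last) code point ranges for one general category."""
    ranges = []
    start = prev = None
    for cp in range(sys.maxunicode + 1):
        if unicodedata.category(chr(cp)) != category:
            continue
        if prev is not None and cp == prev + 1:
            prev = cp  # extend the current range
            continue
        if start is not None:
            ranges.append((start, prev))  # close the previous range
        start = prev = cp
    if start is not None:
        ranges.append((start, prev))
    return ranges


def char_class(ranges):
    """Render ranges as a bracket expression with escaped code points."""
    def esc(cp):
        return "\\u%04x" % cp if cp <= 0xFFFF else "\\U%08x" % cp
    parts = [esc(a) if a == b else esc(a) + "-" + esc(b) for a, b in ranges]
    return "[" + "".join(parts) + "]"


zs_ranges = category_ranges("Zs")
pattern = char_class(zs_ranges)
compiled = re.compile(pattern)

if __name__ == "__main__":
    # The pattern length grows with the number of ranges in the category.
    print(len(zs_ranges), len(pattern))
```

A small category such as Zs yields a short bracket expression, but categories with many scattered ranges produce very long ones, and each regular expression that needs the category carries its own copy. That is the size and duplication problem described above.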

If Emacs regular expressions had some construct, similar to the existing
`\cC' one, that matched a character by its general category, I think
that would solve all the above issues nicely.  PLT Scheme regular
expressions have this ability [2].  

[1]
https://code.launchpad.net/~derick-eddington/scheme-mode/derick-.emacs.d
[2] http://docs.plt-scheme.org/reference/regexp.html

Thank you for your work on Emacs and for your time,

-- 
: Derick
----------------------------------------------------------------







Acknowledgement sent to Derick Eddington <derick.eddington@HIDDEN>:
New bug report received and forwarded. Copy sent to Emacs Bugs <bug-gnu-emacs@HIDDEN>.
Report forwarded to bug-submit-list@HIDDEN, Emacs Bugs <bug-gnu-emacs@HIDDEN>:
bug#1877; Package emacs.
Last modified: Sun, 14 Nov 2021 06:30:02 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.