GNU bug report logs - #41006
26.3; regular expressions documentation

Package: emacs;

Reported by: jan <rtm443x <at> googlemail.com>

Date: Fri, 1 May 2020 19:07:01 UTC

Severity: wishlist

Found in version 26.3

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 41006 in the body.
You can then email your comments to 41006 AT debbugs.gnu.org in the normal way.

Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#41006; Package emacs. (Fri, 01 May 2020 19:07:01 GMT)

Acknowledgement sent to jan <rtm443x <at> googlemail.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Fri, 01 May 2020 19:07:01 GMT)

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: jan <rtm443x <at> googlemail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: 26.3; regular expressions documentation
Date: Fri, 1 May 2020 20:06:06 +0100

Hi, 3 issues (nothing major).

1. Suggest emacs' excellent documentation should not distinguish between
Regexps and Regexp Backslash in the manual.
That is, these 2 should be combined:


  * Regexps::                   Syntax of regular expressions.
  * Regexp Backslash::          Regular expression constructs starting with ‘\’.

AFAICS the difference is purely arbitrary.
There have been times I've looked for the syntax of such constructs and
found it only because I knew it was there, just not in the same
section. A beginner might conclude emacs doesn't support them.


2. The documentation for {N,M} repetition matching is incomplete (and has
been for a long time). The docs say M may be omitted, but from testing I find N
may be omitted also, i.e. \{,M\} is valid, and useful.

e.g.

ba\{,2\}d

correctly matches only the top 2 of

bad
baad
baaad
baaaad
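
The behaviour described in point 2 can also be checked outside Emacs. What follows is only a sketch that uses Python's `re` module as a stand-in. It relies on an omitted lower bound meaning zero there, as it does in Emacs, and Python writes the construct as {,2} where Emacs writes \{,2\}. The names `pattern`, `candidates` and `matches` are made up for illustration.

```python
import re

# Python spells the bounded repetition {,2}; the Emacs spelling is \{,2\}.
# With the lower bound omitted, the minimum number of repetitions is zero.
pattern = re.compile(r"ba{,2}d")

candidates = ["bad", "baad", "baaad", "baaaad"]

# fullmatch() requires the whole string to match, so runs of more than
# two "a" characters are rejected.
matches = [s for s in candidates if pattern.fullmatch(s)]
print(matches)
```

Run as a script, this prints ['bad', 'baad'], the top two entries of the list. Since the minimum is zero, the string bd would match as well.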


3. This from the docs

‘\=’
     matches the empty string, but only at point.

baffles me. I've had a hard look for any examples of how it may be used
and found nothing. I feel I may be missing something extremely useful, I
just don't know what!

thanks

jan



In GNU Emacs 26.3 (build 1, x86_64-w64-mingw32)
 of 2019-08-29 built on CIRROCUMULUS
Repository revision: 96dd0196c28bc36779584e47fffcca433c9309cd
Windowing system distributor 'Microsoft Corp.', version 6.1.7601
Recent messages:
Loading desktop...done
Warning: desktop file appears to be in use by PID 4740.
Using it may cause conflicts.  Use it anyway? (y or n) n
Desktop file in use; not loaded.
For information about GNU Emacs and the GNU system, type C-h C-a.
Mark saved where search started
Mark set
Quit
Mark saved where search started

Configured using:
 'configure --without-dbus --host=x86_64-w64-mingw32
 --without-compress-install 'CFLAGS=-O2 -static -g3''

Configured features:
XPM JPEG TIFF GIF PNG RSVG SOUND NOTIFY ACL GNUTLS LIBXML2 ZLIB
TOOLKIT_SCROLL_BARS THREADS LCMS2

Important settings:
  value of $LANG: ENG
  locale-coding-system: cp1252

Major mode: Text

Minor modes in effect:
  desktop-save-mode: t
  tooltip-mode: t
  global-eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: (only . t)

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug message rmc puny dired dired-loaddefs
format-spec rfc822 mml mml-sec epa derived epg gnus-util rmail
rmail-loaddefs mm-decode mm-bodies mm-encode mail-parse rfc2231
mailabbrev gmm-utils mailheader sendmail rfc2047 rfc2045 ietf-drums
mm-util mail-prsvr mail-utils browse-url url-util thingatpt misearch
multi-isearch elec-pair edmacro kmacro desktop frameset cus-start
cus-load finder-inf info package easymenu epg-config url-handlers
url-parse auth-source cl-seq eieio eieio-core cl-macs eieio-loaddefs
password-cache url-vars seq byte-opt gv bytecomp byte-compile cconv
cl-loaddefs cl-lib time-date mule-util tooltip eldoc electric uniquify
ediff-hook vc-hooks lisp-float-type mwheel dos-w32 ls-lisp disp-table
term/w32-win w32-win w32-vars term/common-win tool-bar dnd fontset image
regexp-opt fringe tabulated-list replace newcomment text-mode elisp-mode
lisp-mode prog-mode register page menu-bar rfn-eshadow isearch timer
select scroll-bar mouse jit-lock font-lock syntax facemenu font-core
term/tty-colors frame cl-generic cham georgian utf-8-lang misc-lang
vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms cp51932
hebrew greek romanian slovak czech european ethiopic indian cyrillic
chinese composite charscript charprop case-table epa-hook jka-cmpr-hook
help simple abbrev obarray minibuffer cl-preloaded nadvice loaddefs
button faces cus-face macroexp files text-properties overlay sha1 md5
base64 format env code-pages mule custom widget hashtable-print-readable
backquote threads w32notify w32 lcms2 multi-tty make-network-process
emacs)

Memory information:
((conses 16 132611 9138)
 (symbols 48 23577 1)
 (miscs 40 56 151)
 (strings 32 41277 1782)
 (string-bytes 1 1117275)
 (vectors 16 18093)
 (vector-slots 8 539691 7632)
 (floats 8 60 91)
 (intervals 56 259 5)
 (buffers 992 12))

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41006; Package emacs. (Sun, 03 May 2020 03:41:02 GMT)

Message #8 received at 41006 <at> debbugs.gnu.org (full text, mbox):

From: Richard Stallman <rms <at> gnu.org>
To: jan <rtm443x <at> googlemail.com>
Cc: 41006 <at> debbugs.gnu.org
Subject: Re: bug#41006: 26.3; regular expressions documentation
Date: Sat, 02 May 2020 23:40:27 -0400

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > 1. Suggest emacs' excellent documentation should not distinguish between
  > Regexps and Regexp Backslash in the manual.
  > That is, these 2 should be combined:

When a node is too long, browsing in Info becomes inconvenient.
Therefore, we look for a reasonable way to split up the node.

We found that way to split up the node on regexps.
There is no logical _need_ to split the topic that way, but it is not
unreasonable, so it was a valid solution to the overlongness.

I expect that many nodes are too long now, and we should look for
reasonable ways to split them.

-- 
Dr Richard Stallman
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41006; Package emacs. (Sun, 03 May 2020 10:33:01 GMT)

Message #11 received at 41006 <at> debbugs.gnu.org (full text, mbox):

From: jan <rtm443x <at> googlemail.com>
To: rms <at> gnu.org
Cc: 41006 <at> debbugs.gnu.org
Subject: Re: bug#41006: 26.3; regular expressions documentation
Date: Sun, 3 May 2020 11:31:52 +0100

Well there's a name I recognise!  Wasn't expecting that.

Anyway, allow me to push back slightly.  I had training in
user interfaces long ago, but I'm no expert.

To transmit info it has to be, among other things, 'evident' or
'obvious'.  In this case I'd say that means visible.
Section "15.6 Syntax of Regular Expressions" is very visible.  It's
easily navigable up and down by mouse or keyboard, also the scroll bar
gives clearer indication of how much there is and where you are on the
page.  Great stuff.
Even better it's easy to browse 'exhaustively' - if I start at the top
of the page and read to the bottom I know I've covered everything.  I
find that property very useful as I do *a lot* of technical reading.

But it's only showing about half the information.  There is no evident
indication that there is more.  I'm not the first person to get
confused by this
(<http://emacs.1067599.n8.nabble.com/regex-question-td75006.html> "The
answer you seek seems to be in a separate section (Regexp Backslash),
at least in the version I am currently using" - that took about 15
seconds to find).

If you combine both sections together it would be visible and
exhaustive.  Whether it would be too long is something I can't answer,
but my opinion would be it's okay (based purely on my evidence-free
opinion).

If you don't want to combine them then make the other half reasonably
visible (it's rather odd that the top of the section points you at
"(elisp)Regular Expressions" but not to backslash section).  About the
only evidence there is a second section is right at the very top in
the breadcrumbs (Next: Regexp Backslash) and a little hint at the end
("...since backslashes can legitimately precede these characters where
they _have_ special meaning, as in...").

Specifically, may I suggest that the start, which currently looks like this:

"
15.6 Syntax of Regular Expressions
==================================

This section (and this manual in general)...
"

perhaps have a direct link to the next section, like this

"
15.6 Syntax of Regular Expressions
==================================

Non-Backslash Regular Expressions   <--- dead link because you're here
Backslash Regular Expressions (more regexp syntax)  <--- live link

This section (and this manual in general)...
"

And perhaps repeat that link at the end of that help page as well.

Or something else.  Whatever you think works, assuming you even think
I have a point.

regards

jan

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41006; Package emacs. (Sun, 03 May 2020 13:08:02 GMT)

Message #14 received at 41006 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattiase <at> acm.org>
To: jan <rtm443x <at> googlemail.com>
Cc: Richard Stallman <rms <at> gnu.org>, 41006 <at> debbugs.gnu.org
Subject: Re: bug#41006: 26.3; regular expressions documentation
Date: Sun, 3 May 2020 15:07:39 +0200

The disposition of the regexp documentation could be improved, yes. Currently it's arranged by syntax, which is the implementor's view, rather than by function, which is the user's. Condensing related text into a single page would help. (Cf. the more recently written section on rx in Emacs 27.)

The manual does say

‘\{M,N\}’
     [...] If M is omitted, the minimum is 0; if N is omitted, there is no maximum.

so you may be mistaken on that point.

The \= anchor is probably less frequently used than the other zero-width assertions such as $, \< etc but does come in handy occasionally. It's there in case you need it.
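
One situation where "match only at point" helps is scanning text token by token: each match must begin exactly where the previous one ended, not somewhere further on. The sketch below shows that idea by analogy in Python. Its `re` module has no \=, but `Pattern.match(string, pos)` likewise refuses to look past the given position. `TOKEN`, `SPACE` and `tokenize` are hypothetical names and the token grammar is invented for the example; none of it comes from Emacs or its manual.

```python
import re

# Invented token grammar: integers, identifiers, and a few operators.
TOKEN = re.compile(r"\d+|[A-Za-z_]\w*|[-+*/()=]")
SPACE = re.compile(r"[ \t]*")


def tokenize(text):
    """Split text into tokens, matching only at the current position."""
    pos = 0  # plays the role of point
    tokens = []
    while True:
        # Skip blanks; this pattern can match the empty string, so it never fails.
        pos = SPACE.match(text, pos).end()
        if pos == len(text):
            return tokens
        # match() succeeds only if a token starts exactly at pos.
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append(m.group())
        pos = m.end()
```

Here tokenize("x = 42 + y") returns ['x', '=', '42', '+', 'y'], while tokenize("x = $") raises ValueError. An unanchored search would have skipped silently over the stray character. In Emacs the corresponding effect would come from putting \= at the start of the regexp given to a search command.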

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41006; Package emacs. (Sun, 03 May 2020 14:02:02 GMT)

Message #17 received at 41006 <at> debbugs.gnu.org (full text, mbox):

From: jan <rtm443x <at> googlemail.com>
To: Mattias Engdegård <mattiase <at> acm.org>
Cc: Richard Stallman <rms <at> gnu.org>, 41006 <at> debbugs.gnu.org
Subject: Re: bug#41006: 26.3; regular expressions documentation
Date: Sun, 3 May 2020 15:00:43 +0100

> Currently it's arranged by syntax, which is the implementor's view, rather than by function, which is the user's.

Nicely put.

> The manual does say
>
> ‘\{M,N\}’
>      [...] If M is omitted, the minimum is 0; if N is omitted, there is no
> maximum.

I did install 26.3 to make sure, and I've checked again and I
genuinely can't see that, but if it does, sorted, thanks.

The \= I'm sure is great, I just don't know where it might be useful.
Some examples might be of help; however, the manual isn't the place for
them.
The value of the other anchors is obvious, I've used them all IIRC.

Thank you all.

jan

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41006; Package emacs. (Sun, 03 May 2020 20:11:01 GMT)

Message #20 received at 41006 <at> debbugs.gnu.org (full text, mbox):

From: Drew Adams <drew.adams <at> oracle.com>
To: Mattias Engdegård <mattiase <at> acm.org>, jan
 <rtm443x <at> googlemail.com>
Cc: Richard Stallman <rms <at> gnu.org>, 41006 <at> debbugs.gnu.org
Subject: RE: bug#41006: 26.3; regular expressions documentation
Date: Sun, 3 May 2020 13:08:14 -0700 (PDT)

> The disposition of the regexp documentation could be improved, yes.
> Currently it's arranged by syntax, which is the implementor's view,
> rather than by function, which is the user's.

FWIW, I disagree with that characterization.

Especially when it comes to the doc for regexp
patterns, as a user I want it to be organized
according to syntax.  A regexp (regardless of
the particular syntax system used for regexps
in a given language) is very much about syntax.

And if you try to organize the content instead
by the functions performed by different regexp
constructs (syntax) or their combinations, then
there are a zillion, conflicting possibilities.

A given such "use" organization might be perfect
for user U1 when looking for help with use case
C1, but it won't be so great for user U2 or even
for U1 when looking for help with a different use
case.

That's the trouble with use-case/task-oriented
doc.  Everyone thinks it's a great idea: "If I
just had some doc that directly addressed this
particular problem...".  And it is a great idea
as far as it goes.  But in general it is not a
good way to structure doc.  A set of tasks/use
cases is not easily structured in a useful way
for users.  Searching the doc can help, but
that's about it.

The Elisp manual is a combination of reference
doc (what) with user-guide doc (how-to).  Guide
doc can usefully include task help.  But guide
doc necessarily supplements - stands on top of -
reference doc; it's no substitute for it.

And when it comes to regexp doc in the Elisp
manual, we need solid reference doc, first and
foremost.  And the best organization for it in
this case is in terms of regexp syntax.

That doesn't mean that we can't _also_ have
some guidance (how-to) doc, which directly
addresses _using_ regexps: what you can (and
can't) do with them, and examples of how to
make best use of them in certain cases.

(Just one opinion.)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41006; Package emacs. (Sun, 03 May 2020 20:32:02 GMT)

Message #23 received at 41006 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Kangas <stefan <at> marxist.se>
To: Drew Adams <drew.adams <at> oracle.com>
Cc: Mattias Engdegård <mattiase <at> acm.org>,
 Richard Stallman <rms <at> gnu.org>, 41006 <at> debbugs.gnu.org,
 jan <rtm443x <at> googlemail.com>
Subject: Re: bug#41006: 26.3; regular expressions documentation
Date: Sun, 03 May 2020 22:31:20 +0200

Drew Adams <drew.adams <at> oracle.com> writes:

>> The disposition of the regexp documentation could be improved, yes.
>> Currently it's arranged by syntax, which is the implementor's view,
>> rather than by function, which is the user's.
>
> FWIW, I disagree with that characterization.
>
> Especially when it comes to the doc for regexp
> patterns, as a user I want it to be organized
> according to syntax.  A regexp (regardless of
> the particular syntax system used for regexps
> in a given language) is very much about syntax.

For me, this is different.  My background before ELisp was already
having used regexps extensively in other languages, having read the
"Mastering Regular Expressions" book, and so on.  (I expect that this
is fairly typical.)

So I go to the "Regular Expressions" node, looking mostly for how to
use them.  But I find nothing on that there.  I only find a review of
what looks like everything I already knew about regular expressions.

In the past, I did this: scratched my head, gave up and searched the
web instead.  And it left me thinking that it's weird that Emacs
documentation on Regular Expression is so poor...

I have since learned that the information I have been looking for is
actually in a separate node, for some reason not sorting under
"Regular Expressions", called "Regexp Search".  This section is
expertly written and exactly what I would have needed, only too bad I
couldn't find it!  :-)

It definitely seems to me that there is room for improvement here.
And I think it's more about the structure than content.

---

BTW, while we're on it, it would be very handy to have an overview in
the manual of the quirks of regexps in Emacs in comparison to other
languages.  Mastering Regular Expressions does a very good job here,
as far as I recall.  That plus a list of which functions to use would
get me, when I first started out with ELisp, 99% of where I needed to
be, I think.

Just my 2 cents here.

Best regards,
Stefan Kangas

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41006; Package emacs. (Mon, 04 May 2020 01:02:02 GMT)

Message #26 received at 41006 <at> debbugs.gnu.org (full text, mbox):

From: Drew Adams <drew.adams <at> oracle.com>
To: Stefan Kangas <stefan <at> marxist.se>
Cc: Mattias Engdegård <mattiase <at> acm.org>,
 Richard Stallman <rms <at> gnu.org>, 41006 <at> debbugs.gnu.org,
 jan <rtm443x <at> googlemail.com>
Subject: RE: bug#41006: 26.3; regular expressions documentation
Date: Sun, 3 May 2020 18:00:48 -0700 (PDT)

> >> The disposition of the regexp documentation could be improved, yes.
> >> Currently it's arranged by syntax, which is the implementor's view,
> >> rather than by function, which is the user's.
> >
> > FWIW, I disagree with that characterization.
> >
> > Especially when it comes to the doc for regexp
> > patterns, as a user I want it to be organized
> > according to syntax.  A regexp (regardless of
> > the particular syntax system used for regexps
> > in a given language) is very much about syntax.
> 
> For me, this is different.  My background before ELisp was already
> having using regexps extensively in other languages, having read the
> "Mastering Regular Expressions" book, and so on.  (I expect that this
> is fairly typical.)

It's my case too, FWIW.  So far "this is different" isn't.

> So I go to the "Regular Expressions" node, looking mostly for how to
> use them.  But I find nothing on that there.  I only find a review of
> what looks like everything I already knew about regular expressions.

Hm.  Use of regexps is precisely what I'd have thought
you had experience with before Elisp.  What's particular
about Elisp regexps is their syntax and the behavior of
the regexp engine.  (And those peculiarities are, BTW,
presented in "Mastering Regular Expressions", where
different regexp dialects are compared.)

> In the past, I did this: scratched my head, gave up and searched the
> web instead.  And it left me thinking that it's weird that Emacs
> documentation on Regular Expression is so poor...

Can you give an example (even artificial, if you don't
recall) of some "use" of regexps that you might have
wanted to find, and could not, in the manual?  Or even
just one that you wanted to find (even if you found it)?
I think I'm probably not getting what you have in mind.

> I have since learned that the information I have been looking for is
> actually in a separate node, for some reason not sorting under
> "Regular Expressions", called "Regexp Search".  This section is
> expertly written and exactly what I would have needed, only too bad I
> couldn't find it!  :-)

Ah, I see.  That's what you mean by using regexps.
OK, makes sense.  But that info, as you say, is there.
It's just that you couldn't find it at first.  You
didn't care about the Elisp regexp syntax etc.  You
wanted to know how to use a regexp to match text.

So the problem, I guess, was only that you had some
difficulty finding that doc.  Do you have an idea
what the difficulty was?  A guess: could it have
been because that doc was represented as being about
"searching" and you were looking for something about
"matching"?  Maybe the index could be improved, if
it's something as simple as that.

In addition, perhaps there could be a cross-reference
to the doc you were really looking for (node `Regexp
Search') from nodes `Regular Expressions' and `Regexp
Functions'.  Do you think that would help?

(Note BTW that the menu in node `Searching and
Matching' lists menu items `Regular Expressions' and
`Regexp Search'.)

> It definitely seems to me that there is room for improvement here.
> And I think it's more about the structure than content.

Got it.  Think it over and see if you can come up
with a suggested change.  I'm thinking indexing
and xrefs, but maybe something else is needed.

Maybe the order of nodes `Regular Expressions' and
`Regexp Search' should be switched: present how to
use them, before diving deep down into the exact
syntax.  I think that might make sense.

> BTW, while we're on it, it would be very handy to have an overview in
> the manual of the quirks of regexps in Emacs in comparison to other
> languages.  Mastering Regular Expressions does a very good job here,
> as far as I recall.  That plus a list of which functions to use would
> get me, when I first started out with ELisp, 99% of where I needed to
> be, I think.

That's just the comparison I mentioned above.
I'd suggest that instead of reproducing something
like that (which needs updating from time to time)
in the manual, the manual just have an external
link to such a comparison on the web.  If that
already exists then the updating might take care
of itself.  If not, then so be it.  If the book
is available as HTML, that could work.  If not,
and if there's no existing comparison, Someone(TM)
could create it on EmacsWiki.  Just a thought.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41006; Package emacs. (Mon, 04 May 2020 03:11:01 GMT)

Message #29 received at 41006 <at> debbugs.gnu.org (full text, mbox):

From: Richard Stallman <rms <at> gnu.org>
To: Mattias Engdegård <mattiase <at> acm.org>
Cc: rtm443x <at> googlemail.com, 41006 <at> debbugs.gnu.org
Subject: Re: bug#41006: 26.3; regular expressions documentation
Date: Sun, 03 May 2020 23:10:33 -0400

  > The disposition of the regexp documentation could be improved,
  > yes. Currently it's arranged by syntax, which is the implementor's
  > view, rather than by function, which is the user's.

Would you like to propose an ordering and classification by function?
Then we could think about whether it is better.

-- 
Dr Richard Stallman
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41006; Package emacs. (Mon, 04 May 2020 09:14:01 GMT)

Message #32 received at 41006 <at> debbugs.gnu.org (full text, mbox):

From: jan <rtm443x <at> googlemail.com>
To: rms <at> gnu.org
Cc: Mattias Engdegård <mattiase <at> acm.org>,
 41006 <at> debbugs.gnu.org
Subject: Re: bug#41006: 26.3; regular expressions documentation
Date: Mon, 4 May 2020 10:13:06 +0100

Hi all,
I'd like to push back here a little, to what I originally raised
because it's simple.

In my case I couldn't find ~50% of the regex docs. I only continued to
hunt because I knew it was there.
In the linked question I gave
(<http://emacs.1067599.n8.nabble.com/regex-question-td75006.html>) the
questioner had to be actually told of the other section in the manual.
The problem is the information is split into sections but without
visibility from one section that the other exists.

Solution: that's up to you. I'd say simply make one long list instead
of 2 shorter, but if not that then *make a clear link between them*.
The fact that they're split by some historic classification is true,
but not IMO important any more - all I want is the basic property of
findability. Unify these 2 islands or show a clear bridge between
them, anything that stops one being marooned for lack of
visibility.

Stefan's experience indicates a third island, also unbridged if I read
him right.

Making the relationship between these semantically related places highly
visible is all I'm suggesting.

cheers

jan

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41006; Package emacs. (Tue, 05 May 2020 02:57:02 GMT)

Message #35 received at 41006 <at> debbugs.gnu.org (full text, mbox):

From: Richard Stallman <rms <at> gnu.org>
To: jan <rtm443x <at> googlemail.com>
Cc: mattiase <at> acm.org, 41006 <at> debbugs.gnu.org
Subject: Re: bug#41006: 26.3; regular expressions documentation
Date: Mon, 04 May 2020 22:56:26 -0400

You've explained that the division of the regexp documentation caused
a problem for you.  I understand the kind of problem you describe,
but I don't understand why the problem happened.

You ask for these two nodes to be combined.

  * Regexps::                   Syntax of regular expressions.
  * Regexp Backslash::          Regular expression constructs starting with ‘\’.

What version of the Emacs Lisp Reference Manual were
you looking at?  From which Emacs version?

The current master version has a subsection called
Syntax of Regexps, which has these three subsubsections:

* Regexp Special::      Special characters in regular expressions.
* Char Classes::        Character classes used in regular expressions.
* Regexp Backslash::    Backslash-sequences in regular expressions.

Does this change in structure fix the problem?

-- 
Dr Richard Stallman
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41006; Package emacs. (Tue, 05 May 2020 02:57:02 GMT)

Message #38 received at 41006 <at> debbugs.gnu.org (full text, mbox):

From: Richard Stallman <rms <at> gnu.org>
To: Stefan Kangas <stefan <at> marxist.se>
Cc: mattiase <at> acm.org, 41006 <at> debbugs.gnu.org, drew.adams <at> oracle.com,
 rtm443x <at> googlemail.com
Subject: Re: bug#41006: 26.3; regular expressions documentation
Date: Mon, 04 May 2020 22:56:28 -0400

  > So I go to the "Regular Expressions" node, looking mostly for how to
  > use them.  But I find nothing on that there.  I only find a review of
  > what looks like everything I already knew about regular expressions.

What I see in master is this structure.
Is this different from what you saw?
Does this structure eliminate the problem you had?


@node Regular Expressions
@section Regular Expressions
@cindex regular expression
@cindex regexp

...

@menu
* Syntax of Regexps::       Rules for writing regular expressions.
* Regexp Example::          Illustrates regular expression syntax.
@ifnottex
* Rx Notation::             An alternative, structured regexp notation.
@end ifnottex
* Regexp Functions::        Functions for operating on regular expressions.
@end menu

-- 
Dr Richard Stallman
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41006; Package emacs. (Tue, 05 May 2020 10:03:02 GMT)

Message #41 received at 41006 <at> debbugs.gnu.org (full text, mbox):

From: jan <rtm443x <at> googlemail.com>
To: rms <at> gnu.org
Cc: mattiase <at> acm.org, 41006 <at> debbugs.gnu.org
Subject: Re: bug#41006: 26.3; regular expressions documentation
Date: Tue, 5 May 2020 11:02:13 +0100

Hi,
C-h C-a ('about emacs') gives the splash screen with

GNU Emacs 26.3 (build 1, x86_64-w64-mingw32)
 of 2019-08-29

I'm using the online reference manual.  From the menu:
'Search Documentation' submenu 'Look Up Subject In User Manual'

minibuffer says:
'Subject to look up:'

I enter:
regexp RET

I immediately get taken to the page "15.6 Syntax of Regular
Expressions", directly into the node, *not* into the one level higher
menu which would visibly show there are 2 regular expression nodes.

I would guess this is how most people find it.

cheers

jan (but please see other reply on this thread which shows why I've
been getting confused about what I'm seeing)


On 05/05/2020, Richard Stallman <rms <at> gnu.org> wrote:
> [[[ To any NSA and FBI agents reading my email: please consider    ]]]
> [[[ whether defending the US Constitution against all enemies,     ]]]
> [[[ foreign or domestic, requires you to follow Snowden's example. ]]]
>
> You've explained that the division of the regexp documentation caused
> a problem for you.  I understand the kind of problem you describe,
> but I don't understand why the problem happened.
>
> You ask for these two nodes to be combined.
>
>   * Regexps::                   Syntax of regular expressions.
>   * Regexp Backslash::          Regular expression constructs starting with ‘\’.
>
> What version of the Emacs Lisp Reference Manual were
> you looking at?  From which Emacs version?
>
> The current master version has a subsection called
> Syntax of Regexps, which has these three subsubsections:
>
> * Regexp Special::      Special characters in regular expressions.
> * Char Classes::        Character classes used in regular expressions.
> * Regexp Backslash::    Backslash-sequences in regular expressions.
>
> Does this change in structure fix the problem?
>
> --
> Dr Richard Stallman
> Chief GNUisance of the GNU Project (https://gnu.org)
> Founder, Free Software Foundation (https://fsf.org)
> Internet Hall-of-Famer (https://internethalloffame.org)
>
>
>

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41006; Package emacs. (Tue, 05 May 2020 10:06:01 GMT) Full text and rfc822 format available.

Message #44 received at 41006 <at> debbugs.gnu.org (full text, mbox):

From: jan <rtm443x <at> googlemail.com>
To: rms <at> gnu.org
Cc: mattiase <at> acm.org, Stefan Kangas <stefan <at> marxist.se>, 41006 <at> debbugs.gnu.org,
 drew.adams <at> oracle.com
Subject: Re: bug#41006: 26.3; regular expressions documentation
Date: Tue, 5 May 2020 11:05:25 +0100

Hi all,
I know this was addressed to Stefan but I'm getting increasingly
puzzled by the difference between what other people are describing and
what I'm seeing.
In the other e-mail Mr Stallman said he could see this:

* Regexp Special::      Special characters in regular expressions.
* Char Classes::        Character classes used in regular expressions.
* Regexp Backslash::    Backslash-sequences in regular expressions.

My emacs (Windows, 26.3) shows

* Regexp Search::             Search for match for a regexp.
* Regexps::                   Syntax of regular expressions.
* Regexp Backslash::          Regular expression constructs starting with ‘\’.
* Regexp Example::            A complex regular expression explained.

I started up my Ubuntu machine and tried to look in the manual to see if
it was different from the Windows one. FYI, I got "Info file emacs does not
exist".

OK... it finally twigged. I've just looked elsewhere. From the emacs menu:
'Search Documentation' submenu 'Look Up Subject In Elisp Manual'

Enter:
regexp RET

Click 'Syntax Of Regexps' gets you to:

"34.3.1 Syntax of Regular Expressions" which looks like what Mr
Stallman has been describing

and finally I can see what Eli has described:

"
‘\{M,N\}’
     is a more general postfix operator that specifies repetition with a
     minimum of M repeats and a maximum of N repeats.  If M is omitted,
     the minimum is 0; if N is omitted, there is no maximum.  For both
     forms, M and N, if specified, may be no larger than 2**15 - 1 .

     For example, ‘c[ad]\{1,2\}r’ matches the strings ‘car’, ‘cdr’,
     ‘caar’, ‘cadr’, ‘cdar’, and ‘cddr’, and nothing else.
     ‘\{0,1\}’ or ‘\{,1\}’ is equivalent to ‘?’.
     ‘\{0,\}’ or ‘\{,\}’ is equivalent to ‘*’.
     ‘\{1,\}’ is equivalent to ‘+’.
"

Now compare this to what's in the user manual (not the Elisp manual):

"
‘\{N,M\}’
     is a postfix operator specifying between N and M repetitions—that
     is, the preceding regular expression must match at least N times,
     but no more than M times.  If M is omitted, then there is no upper
     limit, but the preceding regular expression must match at least N
     times.
     ‘\{0,1\}’ is equivalent to ‘?’.
     ‘\{0,\}’ is equivalent to ‘*’.
     ‘\{1,\}’ is equivalent to ‘+’.
"
The user manual doesn't state that the {,x} syntax is valid where the
Elisp manual does.
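
A minimal Emacs Lisp sketch for checking the omitted-lower-bound form
yourself. The helper `my-full-match-p` is a hypothetical name introduced
only for illustration, not something Emacs provides; it relies on the
existing function `string-match-p` and the `\`` / `\'` anchors. The
expected results follow the Elisp manual text quoted above.

```elisp
;; Hypothetical helper (not part of Emacs).
(defun my-full-match-p (regexp string)
  "Return t if REGEXP matches all of STRING, nil otherwise."
  (let ((case-fold-search nil))
    ;; Wrap REGEXP in a shy group and anchor it at both ends of STRING.
    (and (string-match-p (concat "\\`\\(?:" regexp "\\)\\'") string) t)))

;; The example from the quoted Elisp manual text.
(my-full-match-p "c[ad]\\{1,2\\}r" "cadr")   ; => t
(my-full-match-p "c[ad]\\{1,2\\}r" "caddr")  ; => nil

;; Omitted lower bound: \{,1\} behaves like \{0,1\}, that is, like ?.
(my-full-match-p "ab\\{,1\\}c" "ac")    ; => t
(my-full-match-p "ab\\{,1\\}c" "abc")   ; => t
(my-full-match-p "ab\\{,1\\}c" "abbc")  ; => nil
```

Evaluating the forms with C-x C-e in *scratch* is enough to compare the
behaviour against either manual.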

thanks

jan



On 05/05/2020, Richard Stallman <rms <at> gnu.org> wrote:
> [[[ To any NSA and FBI agents reading my email: please consider    ]]]
> [[[ whether defending the US Constitution against all enemies,     ]]]
> [[[ foreign or domestic, requires you to follow Snowden's example. ]]]
>
>   > So I go to the "Regular Expressions" node, looking mostly for how to
>   > use them.  But I find nothing on that there.  I only find a review of
>   > what looks like everything I already knew about regular expressions.
>
> What I see in master is this structure.
> Is this different from what you saw?
> Does this structure eliminate the problem you had?
>
>
> @node Regular Expressions
> @section Regular Expressions
> @cindex regular expression
> @cindex regexp
>
> ...
>
> @menu
> * Syntax of Regexps::       Rules for writing regular expressions.
> * Regexp Example::          Illustrates regular expression syntax.
> @ifnottex
> * Rx Notation::             An alternative, structured regexp notation.
> @end ifnottex
> * Regexp Functions::        Functions for operating on regular expressions.
> @end menu
>
> --
> Dr Richard Stallman
> Chief GNUisance of the GNU Project (https://gnu.org)
> Founder, Free Software Foundation (https://fsf.org)
> Internet Hall-of-Famer (https://internethalloffame.org)
>
>
>

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41006; Package emacs. (Tue, 05 May 2020 16:52:02 GMT) Full text and rfc822 format available.

Message #47 received at 41006 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattiase <at> acm.org>
To: jan <rtm443x <at> googlemail.com>
Cc: 41006 <at> debbugs.gnu.org, Stefan Kangas <stefan <at> marxist.se>, rms <at> gnu.org,
 drew.adams <at> oracle.com
Subject: Re: bug#41006: 26.3; regular expressions documentation
Date: Tue, 5 May 2020 18:51:33 +0200

5 maj 2020 kl. 12.05 skrev jan <rtm443x <at> googlemail.com>:

> The user manual doesn't state that the {,x} syntax is valid where the
> Elisp manual does.

Sorry, I somehow got the impression that you referred to the Elisp manual; I should have read your message more attentively.

It looks like they both would benefit from some restructuring; having "Regexp backslash" as a separate, parallel section doesn't make sense in either. Note that omitting \{,N\} from the Emacs manual is no great sin, because the user can just use \{0,N\} instead.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41006; Package emacs. (Tue, 05 May 2020 17:13:02 GMT) Full text and rfc822 format available.

Message #50 received at 41006 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattiase <at> acm.org>
To: Richard Stallman <rms <at> gnu.org>
Cc: jan <rtm443x <at> googlemail.com>, Stefan Kangas <stefan <at> marxist.se>,
 41006 <at> debbugs.gnu.org
Subject: Re: bug#41006: 26.3; regular expressions documentation
Date: Tue, 5 May 2020 19:12:50 +0200

4 maj 2020 kl. 05.10 skrev Richard Stallman <rms <at> gnu.org>:

> Would you like to propose an ordering and classification by function?

Start with the lexical details: special chars, escaping, literals.
Then something like this:

* concatenation and alternative
* repetition: * + ? etc
* bracketing: \(?: ... \)
* single-character expressions: [...] '.' \cX etc
* zero-width assertions: ^ $ \< etc
* capture groups and backrefs
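
To make the categories above concrete, here is a rough Emacs Lisp sketch
with one sample regexp per category. The helper name `my-first-match` is
hypothetical (introduced for illustration only); it uses the existing
functions `string-match` and `match-string`, and it assumes the standard
syntax table, where letters are word constituents.

```elisp
;; Hypothetical helper (not part of Emacs).
(defun my-first-match (regexp string)
  "Return the first substring of STRING matching REGEXP, or nil."
  (let ((case-fold-search nil))
    (save-match-data
      (when (string-match regexp string)
        (match-string 0 string)))))

;; concatenation and alternative
(my-first-match "foo\\|bar" "a bar")           ; => "bar"
;; repetition
(my-first-match "ab*" "xabbby")                ; => "abbb"
;; bracketing with a shy (non-capturing) group
(my-first-match "\\(?:ab\\)+" "xababy")        ; => "abab"
;; single-character expressions: a bracket set followed by `.'
(my-first-match "[ad]." "cdr")                 ; => "dr"
;; zero-width assertions: word boundaries
(my-first-match "\\<cat\\>" "concat")          ; => nil
(my-first-match "\\<cat\\>" "concat cat")      ; => "cat"
;; capture group and backreference
(my-first-match "\\(a\\|b\\)\\1" "abba")       ; => "bb"
```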

Aim for all in a single node, using subheadings as appropriate, as this is what the user probably wants to see. Use subnodes for some in-depth information such as named character classes.

As Stefan pointed out, a section comparing the syntax with that of other regexp implementations likely seen by the user would be very welcome. It would permit the reader to make use of existing knowledge while reducing mistakes coming from incorrect assumptions.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41006; Package emacs. (Tue, 05 May 2020 17:42:02 GMT) Full text and rfc822 format available.

Message #53 received at 41006 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Mattias Engdegård <mattiase <at> acm.org>
Cc: rtm443x <at> googlemail.com, stefan <at> marxist.se, rms <at> gnu.org,
 41006 <at> debbugs.gnu.org
Subject: Re: bug#41006: 26.3; regular expressions documentation
Date: Tue, 05 May 2020 20:40:03 +0300

> From: Mattias Engdegård <mattiase <at> acm.org>
> Date: Tue, 5 May 2020 19:12:50 +0200
> Cc: jan <rtm443x <at> googlemail.com>, Stefan Kangas <stefan <at> marxist.se>,
>  41006 <at> debbugs.gnu.org
> 
> Aim for all in a single node, using subheadings as appropriate

I usually prefer not to use subheadings, because Info doesn't have
commands to go to a subheading.  Can we make each subheading a
subsubsection and give it a node?

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41006; Package emacs. (Tue, 05 May 2020 17:51:02 GMT) Full text and rfc822 format available.

Message #56 received at 41006 <at> debbugs.gnu.org (full text, mbox):

From: jan <rtm443x <at> googlemail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Mattias Engdegård <mattiase <at> acm.org>, stefan <at> marxist.se,
 rms <at> gnu.org, 41006 <at> debbugs.gnu.org
Subject: Re: bug#41006: 26.3; regular expressions documentation
Date: Tue, 5 May 2020 18:50:36 +0100

I'd prefer to see a single longish page (it's still not that long I
don't think, but purely a personal view). If you were to split it

1. how would it be split?

2. how would it be done to prevent the 'islands of information' effect
that may be causing problems?
In a sense that's my concern - one page or 5, just so long as a person
can't get stranded on one.

thanks

jan

On 05/05/2020, Eli Zaretskii <eliz <at> gnu.org> wrote:
>> From: Mattias Engdegård <mattiase <at> acm.org>
>> Date: Tue, 5 May 2020 19:12:50 +0200
>> Cc: jan <rtm443x <at> googlemail.com>, Stefan Kangas <stefan <at> marxist.se>,
>>  41006 <at> debbugs.gnu.org
>>
>> Aim for all in a single node, using subheadings as appropriate
>
> I usually prefer not to use subheadings, because Info doesn't have
> commands to go to a subheading.  Can we make each subheading a
> subsubsection and give it a node?
>

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41006; Package emacs. (Tue, 05 May 2020 18:11:01 GMT) Full text and rfc822 format available.

Message #59 received at 41006 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Kangas <stefan <at> marxist.se>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Mattias Engdegård <mattiase <at> acm.org>,
 Richard Stallman <rms <at> gnu.org>, 41006 <at> debbugs.gnu.org, rtm443x <at> googlemail.com
Subject: Re: bug#41006: 26.3; regular expressions documentation
Date: Tue, 5 May 2020 20:09:55 +0200

Eli Zaretskii <eliz <at> gnu.org> writes:

> I usually prefer not to use subheadings, because Info doesn't have
> commands to go to a subheading.

Sorry if this is slightly off-topic, but is there any reason not to
add such a command?  Would it be hard to do so?  Just a thought.

Best regards,
Stefan Kangas

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41006; Package emacs. (Tue, 05 May 2020 18:22:02 GMT) Full text and rfc822 format available.

Message #62 received at 41006 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Kangas <stefan <at> marxist.se>
To: Richard Stallman <rms <at> gnu.org>
Cc: Mattias Engdegård <mattiase <at> acm.org>,
 41006 <at> debbugs.gnu.org, Drew Adams <drew.adams <at> oracle.com>,
 rtm443x <at> googlemail.com
Subject: Re: bug#41006: 26.3; regular expressions documentation
Date: Tue, 5 May 2020 20:20:59 +0200

Richard Stallman <rms <at> gnu.org> writes:

> What I see in master is this structure.
> Is this different from what you saw?
> Does this structure eliminate the problem you had?

I'm also looking at master.  Let me be more specific.

> @menu
> * Syntax of Regexps::       Rules for writing regular expressions.
> * Regexp Example::          Illustrates regular expression syntax.
> @ifnottex
> * Rx Notation::             An alternative, structured regexp notation.
> @end ifnottex
> * Regexp Functions::        Functions for operating on regular expressions.
> @end menu

The above is the problem, for me: it does not include "Regexp Search".
This is where I find any function to actually use the regexps I learn
to construct here.

If I go up one node, to (elisp) Searching and Matching, I can see:

* Regular Expressions::   Describing classes of strings.
* Regexp Search::         Searching for a match for a regexp.

So I suggest to move "Regexp Search" so that it is a section under
"Regular Expressions" (instead of parallel to it).  Maybe conceptually
this is not as clean, but it is more pedagogical and user-friendly,
IMHO.

I would also suggest to place "Regexp Search" first, even before
"Syntax of Regexps", but this is just my personal preference and less
important.  It should at least, from my point of view, come before "Rx
Notation".

While we're at it, I also think "Regexp Search" could state, at the
top, how to access match data.  This is a bit hard to find, IME.
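
As a rough illustration of the kind of pointer I mean, here is a minimal
sketch that searches with `re-search-forward' and then reads the match
data with `match-string'. The function name `my-parse-version' and the
sample text are made up for this example; only `with-temp-buffer',
`re-search-forward' and `match-string' are real Emacs functions.

```elisp
;; Hypothetical example function (not part of Emacs).
(defun my-parse-version (text)
  "Return (WHOLE MAJOR MINOR) for the first N.N in TEXT, or nil."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    ;; A successful search records the match data; group 0 is the whole
    ;; match, groups 1 and 2 are the \\(...\\) sub-expressions.
    (when (re-search-forward "\\([0-9]+\\)\\.\\([0-9]+\\)" nil t)
      (list (match-string 0) (match-string 1) (match-string 2)))))

(my-parse-version "GNU Emacs 26.3 (build 1)")  ; => ("26.3" "26" "3")
(my-parse-version "no digits here")            ; => nil
```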

Best regards,
Stefan Kangas

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41006; Package emacs. (Tue, 05 May 2020 18:24:02 GMT) Full text and rfc822 format available.

Message #65 received at 41006 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stefan Kangas <stefan <at> marxist.se>
Cc: mattiase <at> acm.org, rms <at> gnu.org, 41006 <at> debbugs.gnu.org,
 rtm443x <at> googlemail.com
Subject: Re: bug#41006: 26.3; regular expressions documentation
Date: Tue, 05 May 2020 21:23:14 +0300

> From: Stefan Kangas <stefan <at> marxist.se>
> Date: Tue, 5 May 2020 20:09:55 +0200
> Cc: Mattias Engdegård <mattiase <at> acm.org>, 
> 	Richard Stallman <rms <at> gnu.org>, rtm443x <at> googlemail.com, 41006 <at> debbugs.gnu.org
> 
> > I usually prefer not to use subheadings, because Info doesn't have
> > commands to go to a subheading.
> 
> Sorry if this is slightly off-topic, but is there any reason not to
> add such a command?  Would it be hard to do so?  Just a thought.

Maybe not hard, but I think this needs to be negotiated with the
Texinfo maintainers, since Emacs is not the only Info reader out
there.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41006; Package emacs. (Tue, 05 May 2020 18:41:02 GMT) Full text and rfc822 format available.

Message #68 received at 41006 <at> debbugs.gnu.org (full text, mbox):

From: Drew Adams <drew.adams <at> oracle.com>
To: Stefan Kangas <stefan <at> marxist.se>, Richard Stallman <rms <at> gnu.org>
Cc: Mattias Engdegård <mattiase <at> acm.org>,
 41006 <at> debbugs.gnu.org, rtm443x <at> googlemail.com
Subject: RE: bug#41006: 26.3; regular expressions documentation
Date: Tue, 5 May 2020 11:40:36 -0700 (PDT)

> If I go up one node, to (elisp) Searching and Matching, I can see:
> 
> * Regular Expressions::   Describing classes of strings.
> * Regexp Search::         Searching for a match for a regexp.
> 
> So I suggest to move "Regexp Search" so that it is a section under
> "Regular Expressions" (instead of parallel to it).  Maybe conceptually
> this is not as clean, but it is more pedagogical and user-friendly,
> IMHO.
> 
> I would also suggest to place "Regexp Search" first, even before
> "Syntax of Regexps", but this is just my personal preference and less
> important.  It should at least, from my point of view, come before "Rx
> Notation".

I think `Regexp Search' belongs under `Searching
and Matching'.

But I think `Regexp Search' should come before
`Regular Expressions'.

And maybe `POSIX Regexps' and `Standard Regexps'
should be under `Regular Expressions'.

IOW, separate using regexps (searching and matching)
from details about what regexps are and what their
syntax is.

We can present how to use them without requiring
much knowledge of what they are.

And of course we should have xrefs between the
doc about what they are and the doc about using
them.  (Probably there are already some such.)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41006; Package emacs. (Tue, 05 May 2020 19:05:02 GMT) Full text and rfc822 format available.

Message #71 received at 41006 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Kangas <stefan <at> marxist.se>
To: Drew Adams <drew.adams <at> oracle.com>, Richard Stallman <rms <at> gnu.org>
Cc: Mattias Engdegård <mattiase <at> acm.org>,
 41006 <at> debbugs.gnu.org, rtm443x <at> googlemail.com
Subject: RE: bug#41006: 26.3; regular expressions documentation
Date: Tue, 5 May 2020 15:04:25 -0400

Drew Adams <drew.adams <at> oracle.com> writes:

> IOW, separate using regexps (searching and matching)
> from details about what regexps are and what their
> syntax is.

This is the exact opposite of what I proposed, I think.

I argued that keeping them together would be more user-friendly and
pedagogical.  Could you expand on why keeping them separate is better,
in your opinion?

Best regards,
Stefan Kangas

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41006; Package emacs. (Tue, 05 May 2020 19:43:02 GMT) Full text and rfc822 format available.

Message #74 received at 41006 <at> debbugs.gnu.org (full text, mbox):

From: Drew Adams <drew.adams <at> oracle.com>
To: Stefan Kangas <stefan <at> marxist.se>, Richard Stallman <rms <at> gnu.org>
Cc: Mattias Engdegård <mattiase <at> acm.org>,
 41006 <at> debbugs.gnu.org, rtm443x <at> googlemail.com
Subject: RE: bug#41006: 26.3; regular expressions documentation
Date: Tue, 5 May 2020 12:42:08 -0700 (PDT)

> > IOW, separate using regexps (searching and matching)
> > from details about what regexps are and what their
> > syntax is.
> 
> This is the exact opposite of what I proposed, I think.
> 
> I argued that keeping them together would be more user-friendly and
> pedagogical.  Could you expand on why keeping them separate is better,
> in your opinion?

It's not so much about separating them physically.

1. Let users who happen to read the manual
consecutively learn about _using_ regexps before
delving into the detailed reference info about
what they are - their syntax, etc.

Which was the problem reported: you were looking
for info about how to _use_ regexps, having prior
knowledge about what regexps are and what, in
general, their syntax is.

Info about _use_ before reference info about
_what_ they are.  That's possible, and probably
more helpful.

But yes, the order between "use" (search & match)
and "what" isn't all that important, especially
since each is itself a big topic with multiple
subtopics.  The main thing is grouping like with
like, "together" - #2 (next).

2. Group all of the what-they-are info together.
POSIX etc. belongs with the reference info about
regexp syntax etc.

3. Now, as to "together" in terms of getting to the
"use" info if you happen to land in the "what"
part: xrefs.  And vice versa: getting to what
they are from the info about using them - xrefs.

4. Putting all together physically, in one giant
node, is not feasible (especially if you include
all the other nodes about "what").  And it's not
helpful.

Just one opinion.

If you disagree, fine.  But go back to your
original problem.  You were looking for info
about how to _use_ regexps in Elisp.  And you
instead landed in the bowels of _what_ they are,
including syntax details etc.

The solution for that is (a) better or additional
indexing, and (b) possibly changing the order
(organization).  My suggestion here was for (b):
put all the "what they are" info together, and
put it under the top of the "what": `Regular
Expressions', not directly under searching and
matching.

Caveat: I haven't looked into details of what
moving this stuff around would really give.
My suggestion is to take a look and see whether
it makes sense, in particular for your use case.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41006; Package emacs. (Tue, 05 May 2020 21:24:01 GMT) Full text and rfc822 format available.

Message #77 received at 41006 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Kangas <stefan <at> marxist.se>
To: Drew Adams <drew.adams <at> oracle.com>, Richard Stallman <rms <at> gnu.org>
Cc: Mattias Engdegård <mattiase <at> acm.org>,
 41006 <at> debbugs.gnu.org, rtm443x <at> googlemail.com
Subject: RE: bug#41006: 26.3; regular expressions documentation
Date: Tue, 5 May 2020 17:23:12 -0400

Drew Adams <drew.adams <at> oracle.com> writes:

> 1. Let users who happen to read the manual
> consecutively learn about _using_ regexps before
> delving into the detailed reference info about
> what they are - their syntax, etc.

Thanks for explaining.  My suggestion was simply to have it all under
one node: "Regular Expressions".  The idea being that when you go
there, you probably also want to know how to use them.

The same goes for the index, IMHO.  But that's just how I imagine the
manual could be improved.

> 4. Putting all together physically, in one giant
> node, is not feasible (especially if you include
> all the other nodes about "what").  And it's not
> helpful.

Agreed, my idea was to have them as subnodes.

Best regards,
Stefan Kangas

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41006; Package emacs. (Tue, 05 May 2020 21:43:02 GMT) Full text and rfc822 format available.

Message #80 received at 41006 <at> debbugs.gnu.org (full text, mbox):

From: Drew Adams <drew.adams <at> oracle.com>
To: Stefan Kangas <stefan <at> marxist.se>, Richard Stallman <rms <at> gnu.org>
Cc: Mattias Engdegård <mattiase <at> acm.org>,
 41006 <at> debbugs.gnu.org, rtm443x <at> googlemail.com
Subject: RE: bug#41006: 26.3; regular expressions documentation
Date: Tue, 5 May 2020 14:37:30 -0700 (PDT)

> > 1. Let users who happen to read the manual
> > consecutively learn about _using_ regexps before
> > delving into the detailed reference info about
> > what they are - their syntax, etc.
> 
> Thanks for explaining.  My suggestion was simply to have it all under
> one node: "Regular Expressions".  The idea being that when you go
> there, you probably also want to know how to use them.
> 
> The same goes for the index, IMHO.  But that's just how I imagine the
> manual could be improved.

That's OK too.

It was your problem - you had difficulty finding the
kind of info you were looking for - so the solution
that appeals to you most is likely the best one, at
least for addressing that particular problem.

In that case, in the section on search there'd need
to be some mention of the ability to do regexp search,
and xrefs to the relevant "use" sections of the
regexp doc.

My own view favors "Every Page Is Page One"

https://everypageispageone.com/the-book/

Readers will arrive at a given part of the doc any
which way.  (In general, for doc on the web they
arrive from a search-engine search.)

With that perspective, _the_ most important thing
is to have rich, rich, rich linking among related
topics.

Very few of us will sit down and read a manual,
or even a chunk of it, in order, from beginning to
end.

That's not an argument that order/structure doesn't
matter.  It's just an argument for good linking
among related parts.

> > 4. Putting all together physically, in one giant
> > node, is not feasible (especially if you include
> > all the other nodes about "what").  And it's not
> > helpful.
> 
> Agreed, my idea was to have them as subnodes.

OK by me.  Go for it, if it's OK'd.  But please be
sure that the resulting search section still ends
up useful when deprived of the regexp-search stuff.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41006; Package emacs. (Thu, 07 May 2020 02:41:01 GMT) Full text and rfc822 format available.

Message #83 received at 41006 <at> debbugs.gnu.org (full text, mbox):

From: Richard Stallman <rms <at> gnu.org>
To: Stefan Kangas <stefan <at> marxist.se>
Cc: mattiase <at> acm.org, 41006 <at> debbugs.gnu.org, drew.adams <at> oracle.com,
 rtm443x <at> googlemail.com
Subject: Re: bug#41006: 26.3; regular expressions documentation
Date: Wed, 06 May 2020 22:40:52 -0400

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > Thanks for explaining.  My suggestion was simply to have it all under
  > one node: "Regular Expressions".  The idea being that when you go
  > there, you probably also want to know how to use them.

The problem with a long node is that, when something links to that node,
it is not easy to find the pertinent text in that node.
For example, if the text about re-search-forward is in a large node,
the link from the index will point to the start of the node.
It will be a hassle to find that text in the node.

Therefore we subdivide nodes when they are inconveniently long.

Having a parent node with a number of subnodes is the way we
handle this.  The Lisp Manual already seems to be set up that way.
Do you think a change is needed there?

It looks like the Emacs Manual could use a small reorganization to a similar
structure.  Would you like to propose a patch?


-- 
Dr Richard Stallman
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41006; Package emacs. (Thu, 07 May 2020 02:42:02 GMT) Full text and rfc822 format available.

Message #86 received at 41006 <at> debbugs.gnu.org (full text, mbox):

From: Richard Stallman <rms <at> gnu.org>
To: Drew Adams <drew.adams <at> oracle.com>
Cc: mattiase <at> acm.org, stefan <at> marxist.se, 41006 <at> debbugs.gnu.org,
 rtm443x <at> googlemail.com
Subject: Re: bug#41006: 26.3; regular expressions documentation
Date: Wed, 06 May 2020 22:41:44 -0400

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > But I think `Regexp Search' should come before
  > `Regular Expressions'.

To me, what a regular expression consists of
is logically prior to how to search for one.
You can't search for one
unless you can write one.


-- 
Dr Richard Stallman
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41006; Package emacs. (Thu, 07 May 2020 02:42:02 GMT) Full text and rfc822 format available.

Message #89 received at 41006 <at> debbugs.gnu.org (full text, mbox):

From: Richard Stallman <rms <at> gnu.org>
To: Stefan Kangas <stefan <at> marxist.se>
Cc: mattiase <at> acm.org, 41006 <at> debbugs.gnu.org, drew.adams <at> oracle.com,
 rtm443x <at> googlemail.com
Subject: Re: bug#41006: 26.3; regular expressions documentation
Date: Wed, 06 May 2020 22:41:52 -0400

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > While we're at it, I also think "Regexp Search" could state, at the
  > top, how to access match data.  This is a bit hard to find, IME.

How about proposing a patch to point at that?

-- 
Dr Richard Stallman
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41006; Package emacs. (Thu, 07 May 2020 02:43:01 GMT) Full text and rfc822 format available.

Message #92 received at 41006 <at> debbugs.gnu.org (full text, mbox):

From: Richard Stallman <rms <at> gnu.org>
To: Stefan Kangas <stefan <at> marxist.se>
Cc: mattiase <at> acm.org, 41006 <at> debbugs.gnu.org, drew.adams <at> oracle.com,
 rtm443x <at> googlemail.com
Subject: Re: bug#41006: 26.3; regular expressions documentation
Date: Wed, 06 May 2020 22:41:51 -0400

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > If I go up one node, to (elisp) Searching and Matching, I can see:

  > * Regular Expressions::   Describing classes of strings.
  > * Regexp Search::         Searching for a match for a regexp.

  > So I suggest to move "Regexp Search" so that it is a section under
  > "Regular Expressions" (instead of parallel to it).

It seems to me that the syntax of regular expressions and how to do
things with regular expressions ought to be two parallel topics.  If
this needs to be clarified, rather than making one a subnode of the
other, could we make some OTHER change that would do the job?



-- 
Dr Richard Stallman
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41006; Package emacs. (Thu, 07 May 2020 02:43:02 GMT) Full text and rfc822 format available.

Message #95 received at 41006 <at> debbugs.gnu.org (full text, mbox):

From: Richard Stallman <rms <at> gnu.org>
To: Mattias Engdegård <mattiase <at> acm.org>
Cc: rtm443x <at> googlemail.com, stefan <at> marxist.se, 41006 <at> debbugs.gnu.org
Subject: Re: bug#41006: 26.3; regular expressions documentation
Date: Wed, 06 May 2020 22:42:38 -0400

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > * concatenation and alternative
  > * repetition: * + ? etc
  > * bracketing: \(?: ... \)
  > * single-character expressions: [...] '.' \cX etc
  > * zero-width assertions: ^ $ \< etc
  > * capture groups and backrefs

That list could be a starting point.  (What is a "capture group"?
I have no idea.)

However, I don't think alternatives are very useful without
brackets, so brackets should come before alternatives.

I think single-character expressions
should come before repetition, because the former are more basic
and conceptually simpler.

-- 
Dr Richard Stallman
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41006; Package emacs. (Thu, 07 May 2020 03:18:01 GMT) Full text and rfc822 format available.

Message #98 received at 41006 <at> debbugs.gnu.org (full text, mbox):

From: Drew Adams <drew.adams <at> oracle.com>
To: rms <at> gnu.org
Cc: mattiase <at> acm.org, stefan <at> marxist.se, 41006 <at> debbugs.gnu.org,
 rtm443x <at> googlemail.com
Subject: RE: bug#41006: 26.3; regular expressions documentation
Date: Wed, 6 May 2020 20:17:11 -0700 (PDT)

>   > But I think `Regexp Search' should come before
>   > `Regular Expressions'.
> 
> To me, what a regular expression consists of
> is logically prior to how to search for one.
> You can't search for one
> unless you can write one.

I also said (in a different msg):

  Info about _use_ before reference info about
  _what_ they are.  That's possible, and probably
  more helpful.

and

  But yes, the order between "use" (search & match)
  and "what" isn't all that important, especially
  since each is itself a big topic with multiple
  subtopics.  The main thing is grouping like with
  like, "together"

Logically, as you say, in-depth/practical use of
regexps to search/match requires understanding
regexps, including some details of their syntax
and behavior.

It doesn't follow that general use of regexps to
search/match ("how") can't usefully be introduced
before the full "what".

You saw quite a lot about the use of a car before
you had any good knowledge of the relations of its
parts, how it and they function, or the details of
how to drive.  To really know how to drive, yes,
you need to know something about the relations
among engine, brakes, lights, signals, etc.  But
to get the overall picture of how to drive, no -
you mainly need to know what a car is for, and
have some experience of seeing a car driven.

Anyway, I have no druthers here.  I was trying
to help with Stefan K's problem: he was _looking_
for how-to-use regexps, he already had knowledge
about regexps (from other languages), and when
trying to find the how-to-use info he was plunged
into the what-regexps-are info, which he didn't
care about (at that point).

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41006; Package emacs. (Thu, 07 May 2020 03:19:02 GMT) Full text and rfc822 format available.

Message #101 received at 41006 <at> debbugs.gnu.org (full text, mbox):

From: Drew Adams <drew.adams <at> oracle.com>
To: rms <at> gnu.org, Stefan Kangas <stefan <at> marxist.se>
Cc: mattiase <at> acm.org, 41006 <at> debbugs.gnu.org, rtm443x <at> googlemail.com
Subject: RE: bug#41006: 26.3; regular expressions documentation
Date: Wed, 6 May 2020 20:18:17 -0700 (PDT)

>   > So I suggest to move "Regexp Search" so that it is a section under
>   > "Regular Expressions" (instead of parallel to it).
> 
> It seems to me that the syntax of regular expressions and how to do
> things with regular expressions ought to be two parallel topics.

I agree, FWIW.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41006; Package emacs. (Thu, 07 May 2020 10:32:01 GMT) Full text and rfc822 format available.

Message #104 received at 41006 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Kangas <stefan <at> marxist.se>
To: rms <at> gnu.org, Drew Adams <drew.adams <at> oracle.com>
Cc: mattiase <at> acm.org, 41006 <at> debbugs.gnu.org, rtm443x <at> googlemail.com
Subject: Re: bug#41006: 26.3; regular expressions documentation
Date: Thu, 7 May 2020 06:31:31 -0400

Richard Stallman <rms <at> gnu.org> writes:

>   > But I think `Regexp Search' should come before
>   > `Regular Expressions'.
>
> To me, what a regular expression consists of
> is logically prior to how to search for one.
> You can't search for one
> unless you can write one.

Having thought about this a bit more, I think you're right.

The most important question to me is why the "Regular Expressions"
chapter does not include documentation on how to use them.

Best regards,
Stefan Kangas

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41006; Package emacs. (Thu, 07 May 2020 10:33:02 GMT) Full text and rfc822 format available.

Message #107 received at 41006 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Kangas <stefan <at> marxist.se>
To: rms <at> gnu.org
Cc: mattiase <at> acm.org, 41006 <at> debbugs.gnu.org, drew.adams <at> oracle.com,
 rtm443x <at> googlemail.com
Subject: Re: bug#41006: 26.3; regular expressions documentation
Date: Thu, 7 May 2020 06:32:01 -0400

Richard Stallman <rms <at> gnu.org> writes:

>   > If I go up one node, to (elisp) Searching and Matching, I can see:
>
>   > * Regular Expressions::   Describing classes of strings.
>   > * Regexp Search::         Searching for a match for a regexp.
>
>   > So I suggest to move "Regexp Search" so that it is a section under
>   > "Regular Expressions" (instead of parallel to it).
>
> It seems to me that the syntax of regular expressions and how to do
> things with regular expressions ought to be two parallel topics.  If
> this needs to be clarified, rather than making one a subnode of the
> other, could we make some OTHER change that would do the job?

I used the word "parallel" only in the sense that you first have to
leave the chapter called "Regular Expressions" to find "Regexp
Search".  I didn't mean to suggest this information should not be
divided into separate sections.

My proposal is to include "how to use them" inside the chapter called
"Regular Expressions".  Is there any reason not to do that?

(FWIW, I believe most other programming languages do it like that.)

Best regards,
Stefan Kangas

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41006; Package emacs. (Fri, 08 May 2020 02:50:02 GMT) Full text and rfc822 format available.

Message #110 received at 41006 <at> debbugs.gnu.org (full text, mbox):

From: Richard Stallman <rms <at> gnu.org>
To: Stefan Kangas <stefan <at> marxist.se>
Cc: mattiase <at> acm.org, 41006 <at> debbugs.gnu.org, rtm443x <at> googlemail.com
Subject: Re: bug#41006: 26.3; regular expressions documentation
Date: Thu, 07 May 2020 22:49:22 -0400

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > My proposal is to include "how to use them" inside the chapter called
  > "Regular Expressions".  Is there any reason not to do that?

Now we have

>   > * Regular Expressions::   Describing classes of strings.
>   > * Regexp Search::         Searching for a match for a regexp.

We could convert Regexp Search into a subsection under Regular
Expressions.  I don't see any harm in doing that.

The Regexp Search node could come before, or after, the
existing subsection, Regexp Functions.  Which would be better?

Does anyone object to this change?


Meanwhile, the node Regexp Search is 230 lines long.
Index entries pointing to such a long node are not
very helpful.  It would be good to subdivide that
(soon to be) subsection into several subsubsections.

This kind of splitting job calls for creativity,
perhaps reordering material in the node.

Splitting a node doesn't stop you from reading it as a whole.
If you visit in Info a node that has subnodes, 
you can read it and its subnodes sequentially
just by typing SPC repeatedly.

-- 
Dr Richard Stallman
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41006; Package emacs. (Fri, 08 May 2020 02:52:01 GMT) Full text and rfc822 format available.

Message #113 received at 41006 <at> debbugs.gnu.org (full text, mbox):

From: Richard Stallman <rms <at> gnu.org>
To: Drew Adams <drew.adams <at> oracle.com>
Cc: mattiase <at> acm.org, stefan <at> marxist.se, 41006 <at> debbugs.gnu.org,
 rtm443x <at> googlemail.com
Subject: Re: bug#41006: 26.3; regular expressions documentation
Date: Thu, 07 May 2020 22:51:02 -0400

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > It doesn't follow that general use of regexps to
  > search/match ("how") can't usefully be introduced
  > before the full "what".

In an introductory manual, or introductory chapters at the beginning,
we sometimes do that.  However, in a reference manual, that would
make it hard for the reader to find the whole of any topic.

-- 
Dr Richard Stallman
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41006; Package emacs. (Fri, 08 May 2020 06:49:02 GMT) Full text and rfc822 format available.

Message #116 received at 41006 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: rms <at> gnu.org
Cc: mattiase <at> acm.org, stefan <at> marxist.se, 41006 <at> debbugs.gnu.org,
 rtm443x <at> googlemail.com
Subject: Re: bug#41006: 26.3; regular expressions documentation
Date: Fri, 08 May 2020 09:47:46 +0300

> From: Richard Stallman <rms <at> gnu.org>
> Date: Thu, 07 May 2020 22:49:22 -0400
> Cc: mattiase <at> acm.org, 41006 <at> debbugs.gnu.org, rtm443x <at> googlemail.com
> 
>   > My proposal is to include "how to use them" inside the chapter called
>   > "Regular Expressions".  Is there any reason not to do that?
> 
> Now we have
> 
> >   > * Regular Expressions::   Describing classes of strings.
> >   > * Regexp Search::         Searching for a match for a regexp.
> 
> We could convert Regexp Search into a subsection under Regular
> Expressions.  I don't see any harm in doing that.

Let's first decide whether we are talking about the Emacs user manual,
the ELisp reference manual, or both.  The current organization of this
stuff is slightly different in each one.  The OP meant the user
manual, AFAIU.

To answer the specific question you asked: this is all part of a
chapter called "Searching and Replacement" in the user manual and
"Searching and Matching" in the ELisp manual.  So having there a
section called "Regular Expressions" which would include a subsection
about regexp search makes less sense to me than the other way around:
have a section "Regular Expression Search" which would start with
syntax of regexps and go on to a subsection that describes the regexp
search facilities (it should then probably include the "POSIX Regexps"
section as well).

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41006; Package emacs. (Fri, 08 May 2020 10:06:01 GMT) Full text and rfc822 format available.

Message #119 received at 41006 <at> debbugs.gnu.org (full text, mbox):

From: jan <rtm443x <at> googlemail.com>
To: rms <at> gnu.org
Cc: mattiase <at> acm.org, Stefan Kangas <stefan <at> marxist.se>, 41006 <at> debbugs.gnu.org
Subject: Re: bug#41006: 26.3; regular expressions documentation
Date: Fri, 8 May 2020 11:04:58 +0100

> Now we have
>
>>   > * Regular Expressions::   Describing classes of strings.
>>   > * Regexp Search::         Searching for a match for a regexp.
>
> We could convert Regexp Search into a subsection under Regular
> Expressions.  I don't see any harm in doing that.

I don't know if this is relevant, but I'd *mentally* place  Regexp
Search as a subtype of Search, from a purely classification POV.

> Splitting a node doesn't stop you from reading it as a whole.
> If you visit in Info a node that has subnodes,
> you can read it and its subnodes sequentially
> just by typing SPC repeatedly.

Hmm. I'd forgotten you could do that.
But crucially being able to do so just by itself does not solve the
'marooned on one island' problem.  The user has to have some way of
knowing there is a *more*. It has to be very clear somehow that there
*is* more.
Otherwise, yup.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41006; Package emacs. (Fri, 08 May 2020 10:11:01 GMT) Full text and rfc822 format available.

Message #122 received at 41006 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Kangas <stefan <at> marxist.se>
To: Eli Zaretskii <eliz <at> gnu.org>, rms <at> gnu.org
Cc: mattiase <at> acm.org, 41006 <at> debbugs.gnu.org, rtm443x <at> googlemail.com
Subject: Re: bug#41006: 26.3; regular expressions documentation
Date: Fri, 8 May 2020 03:10:45 -0700

Eli Zaretskii <eliz <at> gnu.org> writes:

>> Now we have
>>
>> >   > * Regular Expressions::   Describing classes of strings.
>> >   > * Regexp Search::         Searching for a match for a regexp.
>>
>> We could convert Regexp Search into a subsection under Regular
>> Expressions.  I don't see any harm in doing that.

I'm glad to hear that.

> Let's first decide whether we are talking about the Emacs user manual,
> the ELisp reference manual, or both.  The current organization of this
> stuff is slightly different in each one.  The OP meant the user
> manual, AFAIU.

Yes, I brought up the ELisp manual.  Sorry to bring this related issue
into this discussion without clearly marking it as such.

> To answer the specific question you asked: this is all part of a
> chapter called "Searching and Replacement" in the user manual and
> "Searching and Matching" in the ELisp manual.  So having there a
> section called "Regular Expressions" which would include a subsection
> about regexp search makes less sense to me than the other way around:
> have a section "Regular Expression Search" which would start with
> syntax of regexps and go on to a subsection that describes the regexp
> search facilities (it should then probably include the "POSIX Regexps"
> section as well).

I see your point here.

In the user manual, perhaps we could re-organize what we have now
(excluding other subsections for the sake of brevity):

* Searching and Replacement
** Regexp Search::             Search for match for a regexp.
** Regexps::                   Syntax of regular expressions.
** Regexp Backslash::          Regular expression constructs starting with ‘\’.
** Regexp Example::            A complex regular expression explained.

Into something like:

* Searching and Replacement
** Regular Expression Search   Search for match for a regexp.
*** Regexp Syntax::             Syntax of regular expressions.
*** Regexp Backslash::          Regular expression constructs starting with ‘\’.
*** Regexp Example::            A complex regular expression explained.

I think one needs to look at it in the context of the user manual, and
what the Info node `(emacs) Search' looks like to fully understand the
merit of your argument.  Every other subheading in that chapter is
called something with "<foo> Search" (besides one, which is called
"Replace").

---

In the ELisp manual, I think it could be fine to have a section called
simply "Regular Expressions".  Either the user already knows about
regexps, in which case we have no problem, or the user does not know but
will soon find out.

I argue this only because I find the shorter name more elegant.  This is
a minor stylistic point, though, and I'm fine with "Regular Expression
Search" there, too.

In any case, it looks like it needs a bit more work.  There are actually
bits and pieces about regular expressions spread out in the chapter
`(elisp) Searching and Matching':

* Regular Expressions::   Describing classes of strings.
* Regexp Search::         Searching for a match for a regexp.
* POSIX Regexps::         Searching POSIX-style for the longest match.
* Match Data::            Finding out which part of the text matched,
                            after a string or regexp search.
* Search and Replace::    Commands that loop, searching and replacing.
* Standard Regexps::      Useful regexps for finding sentences, pages,...

Moving "Regexp Search" into a more general section called "Regular
Expressions" is a step in the right direction.  But then comes the
problem with finding match data, which is part of the "Match Data"
section, or using `re-search-forward', which is in the "Search and
Replace" section.  I have found all this information to be hard to find
in the past.

Maybe we can take a small first step here.  But perhaps this section
needs a bigger rethink?

Best regards,
Stefan Kangas

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41006; Package emacs. (Fri, 08 May 2020 10:32:01 GMT) Full text and rfc822 format available.

Message #125 received at 41006 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stefan Kangas <stefan <at> marxist.se>
Cc: mattiase <at> acm.org, rms <at> gnu.org, 41006 <at> debbugs.gnu.org,
 rtm443x <at> googlemail.com
Subject: Re: bug#41006: 26.3; regular expressions documentation
Date: Fri, 08 May 2020 13:31:28 +0300

> From: Stefan Kangas <stefan <at> marxist.se>
> Date: Fri, 8 May 2020 03:10:45 -0700
> Cc: mattiase <at> acm.org, 41006 <at> debbugs.gnu.org, rtm443x <at> googlemail.com
> 
> In the user manual, perhaps we could re-organize what we have now
> (excluding other subsections for the sake of brevity):
> 
> * Searching and Replacement
> ** Regexp Search::             Search for match for a regexp.
> ** Regexps::                   Syntax of regular expressions.
> ** Regexp Backslash::          Regular expression constructs starting with ‘\’.
> ** Regexp Example::            A complex regular expression explained.
> 
> Into something like:
> 
> * Searching and Replacement
> ** Regular Expression Search   Search for match for a regexp.
> *** Regexp Syntax::             Syntax of regular expressions.
> *** Regexp Backslash::          Regular expression constructs starting with ‘\’.
> *** Regexp Example::            A complex regular expression explained.

Is "Regular Expression Search" just a parent node of its children, or
does it have content of its own (apart from the menu)?  If the former,
we need a subsection about searching with regexps, after we describe
the regular expressions.  If the latter, then do you suggest to
describe the search before describing the regexps themselves?

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41006; Package emacs. (Fri, 08 May 2020 16:50:02 GMT) Full text and rfc822 format available.

Message #128 received at 41006 <at> debbugs.gnu.org (full text, mbox):

From: Drew Adams <drew.adams <at> oracle.com>
To: rms <at> gnu.org, Stefan Kangas <stefan <at> marxist.se>
Cc: mattiase <at> acm.org, 41006 <at> debbugs.gnu.org, rtm443x <at> googlemail.com
Subject: RE: bug#41006: 26.3; regular expressions documentation
Date: Fri, 8 May 2020 09:49:11 -0700 (PDT)

>   > My proposal is to include "how to use them" inside the chapter
>   > called "Regular Expressions".  Is there any reason not to do that?
> 
> Now we have
> 
> * Regular Expressions::   Describing classes of strings.
> * Regexp Search::         Searching for a match for a regexp.
> 
> We could convert Regexp Search into a subsection under Regular
> Expressions.  I don't see any harm in doing that.

I too agreed with that, and added:

  In that case, in the section on search there'd need
  to be some mention of the ability to do regexp search,
  and xrefs to the relevant "use" sections of the
  regexp doc.

IOW, if regexp search is under regexps then the (other)
search section needs to link to the regexp-search topic.

> The Regexp Search node could come before, or after, the
> existing subsection, Regexp Functions.  Which would be better?

Suggestion:

1. Some high-level description of regexps - an idea
   of what they are and what you can use them for.
2. How to use them to match/search: regexp search.
3. Details of what they are - syntax, etc.

But the order between 2 & 3 could be the opposite.
What's important is for #1 to introduce both 2 & 3.

And of course links among topics, for people who
land on one of them out of the blue (e.g. by
googling and getting to the HTML manual on the web).

> Does anyone object to this change?

(Not I.)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41006; Package emacs. (Fri, 08 May 2020 18:18:02 GMT) Full text and rfc822 format available.

Message #131 received at 41006 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Kangas <stefan <at> marxist.se>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: mattiase <at> acm.org, rms <at> gnu.org, 41006 <at> debbugs.gnu.org,
 rtm443x <at> googlemail.com
Subject: Re: bug#41006: 26.3; regular expressions documentation
Date: Fri, 8 May 2020 11:17:50 -0700

Eli Zaretskii <eliz <at> gnu.org> writes:

>> * Searching and Replacement
>> ** Regular Expression Search   Search for match for a regexp.
>> *** Regexp Syntax::             Syntax of regular expressions.
>> *** Regexp Backslash::          Regular expression constructs starting with ‘\’.
>> *** Regexp Example::            A complex regular expression explained.
>
> Is "Regular Expression Search" just a parent node of its children, or
> does it have content of its own (apart from the menu)?

I can imagine both ways working, but I think my preference would be to
have a brief introduction.

> If the former, we need a subsection about searching with regexps,
> after we describe the regular expressions.

Agreed.

> If the latter, then do you suggest to describe the search before
> describing the regexps themselves?

For the user manual, I actually like that the commands are introduced
early in `(emacs) Regexp Search', which is what users are looking for.
Then we can explain the details of how to use them.

Best regards,
Stefan Kangas

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41006; Package emacs. (Fri, 08 May 2020 18:48:01 GMT) Full text and rfc822 format available.

Message #134 received at 41006 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stefan Kangas <stefan <at> marxist.se>
Cc: mattiase <at> acm.org, rms <at> gnu.org, 41006 <at> debbugs.gnu.org,
 rtm443x <at> googlemail.com
Subject: Re: bug#41006: 26.3; regular expressions documentation
Date: Fri, 08 May 2020 21:47:30 +0300

> From: Stefan Kangas <stefan <at> marxist.se>
> Date: Fri, 8 May 2020 11:17:50 -0700
> Cc: rms <at> gnu.org, mattiase <at> acm.org, 41006 <at> debbugs.gnu.org, 
> 	rtm443x <at> googlemail.com
> 
> > If the latter, then do you suggest to describe the search before
> > describing the regexps themselves?
> 
> For the user manual, I actually like that the commands are introduced
> early in `(emacs) Regexp Search', which is what users are looking for.
> Then we can explain the details of how to use them.

I think this will be suboptimal.  The chapter (or the whole manual) is
either read in its entirety (if the reader is new to the topic), or is
used as a reference.  For the former use case, we should introduce
regular expressions before we explain how to use them, otherwise the
text will be confusing (see below).  For the latter use case, the
order is entirely immaterial, since the readers are supposed to use
index-search to get directly where the subject they are looking for is
described.

IME, it sometimes helps to describe usage of something before
explaining what that something is, but only if the latter follows very
closely in the footsteps of the former.  Something like this (to take
an example of something I used just today):

  In buffers where Font Lock mode is enabled, patterns are highlighted
  using font lock.  In buffers where Font Lock mode is disabled,
  patterns are applied using overlays; in this case, the highlighting
  will not be updated as you type.  The Font Lock mode is considered
  "enabled" in a buffer if its `major-mode' causes
  `font-lock-specified-p' to return non-nil, which means the major
  mode specifies support for Font Lock.

This is not the case in the regexp search description case: there, we
will have a long list of search commands, and the reader will read all
that and all the time wonder what the heck is that "regular
expression" thing and how to write one.  When I bump into text that
doesn't explain an unusual term a sentence or two after it is first
used, I become annoyed and scan quickly forward until I find the
definition; only then can I continue reading.  My recommendation is to
avoid such "delayed" definitions.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41006; Package emacs. (Fri, 08 May 2020 20:10:02 GMT) Full text and rfc822 format available.

Message #137 received at 41006 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Kangas <stefan <at> marxist.se>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: mattiase <at> acm.org, rms <at> gnu.org, 41006 <at> debbugs.gnu.org,
 rtm443x <at> googlemail.com
Subject: Re: bug#41006: 26.3; regular expressions documentation
Date: Fri, 8 May 2020 13:09:08 -0700

Eli Zaretskii <eliz <at> gnu.org> writes:

> I think this will be suboptimal.  [...]

OK, you have convinced me.

Best regards,
Stefan Kangas

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41006; Package emacs. (Sat, 09 May 2020 03:49:02 GMT) Full text and rfc822 format available.

Message #140 received at 41006 <at> debbugs.gnu.org (full text, mbox):

From: Richard Stallman <rms <at> gnu.org>
To: Stefan Kangas <stefan <at> marxist.se>
Cc: mattiase <at> acm.org, 41006 <at> debbugs.gnu.org, drew.adams <at> oracle.com,
 rtm443x <at> googlemail.com
Subject: Re: bug#41006: 26.3; regular expressions documentation
Date: Fri, 08 May 2020 23:48:08 -0400

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > > IOW, separate using regexps (searching and matching)
  > > from details about what regexps are and what their
  > > syntax is.

  > This is the exact opposite of what I proposed, I think.

  > I argued that keeping them together would be more user-friendly and
  > pedagogical.  Could you expand on why keeping them separate is better,
  > in your opinion?

I have a hunch that you are miscommunicating using "separate" and
"together" in slightly different ways, and that you don't really
disagree much.

-- 
Dr Richard Stallman
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41006; Package emacs. (Sat, 09 May 2020 03:55:01 GMT) Full text and rfc822 format available.

Message #143 received at 41006 <at> debbugs.gnu.org (full text, mbox):

From: Richard Stallman <rms <at> gnu.org>
To: Drew Adams <drew.adams <at> oracle.com>
Cc: mattiase <at> acm.org, stefan <at> marxist.se, 41006 <at> debbugs.gnu.org,
 rtm443x <at> googlemail.com
Subject: Re: bug#41006: 26.3; regular expressions documentation
Date: Fri, 08 May 2020 23:53:58 -0400

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > 1. Some high-level description of regexps - an idea
  >    of what they are and what you can use them for.
  > 2. How to use them to match/search: regexp search.
  > 3. Details of what they are - syntax, etc.

This suggestion is rather vague because it doesn't relate to
the node structure.  The node structure is crucial for a manual in Info.

The overall node for regexps, Regular Expressions, should start with a
brief introduction.  (Every node that has subnodes should start with a
brief introduction.)  It could be as much as 10 or 12 lines.

But the subnode menu should fit on the screen, so more than 10 or 12 lines
is too many.

The node intro normally does NOT start to teach those topics; it only
says what the topics are.  We can't describe very much of regexp
syntax there.

It could say a little bit about some very simple regexps, but no more.

So the first subnodes need to describe how to write a regexp.

I am thinking about the Lisp Manual here.

-- 
Dr Richard Stallman
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41006; Package emacs. (Sat, 09 May 2020 03:55:02 GMT) Full text and rfc822 format available.

Message #146 received at 41006 <at> debbugs.gnu.org (full text, mbox):

From: Richard Stallman <rms <at> gnu.org>
To: Drew Adams <drew.adams <at> oracle.com>
Cc: mattiase <at> acm.org, stefan <at> marxist.se, 41006 <at> debbugs.gnu.org,
 rtm443x <at> googlemail.com
Subject: Re: bug#41006: 26.3; regular expressions documentation
Date: Fri, 08 May 2020 23:54:00 -0400

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

Once any bigger changes are made, please suggest links anywhere.
It will be easy to install them then.

-- 
Dr Richard Stallman
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41006; Package emacs. (Sat, 09 May 2020 03:57:01 GMT) Full text and rfc822 format available.

Message #149 received at 41006 <at> debbugs.gnu.org (full text, mbox):

From: Richard Stallman <rms <at> gnu.org>
To: jan <rtm443x <at> googlemail.com>
Cc: mattiase <at> acm.org, stefan <at> marxist.se, 41006 <at> debbugs.gnu.org
Subject: Re: bug#41006: 26.3; regular expressions documentation
Date: Fri, 08 May 2020 23:56:00 -0400

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > I don't know if this is relevant, but I'd *mentally* place  Regexp
  > Search as a subtype of Search, from a purely classification POV.

In the Lisp manual, regexps are in the Searching and Matching chapter,
so it is already in the right place.

  > But crucially being able to do so just by itself does not solve the
  > 'marooned on one island' problem.  The user has to have some way of
  > knowing there is a *more*.

Unless you're at the end of the index, there is _always_ more.
SPC will take you through the whole manual if you keep typing it.

At some point SPC will take you to some other topic, and that tells
you there was no more of the topic you were looking at.


-- 
Dr Richard Stallman
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41006; Package emacs. (Fri, 29 Apr 2022 12:23:01 GMT) Full text and rfc822 format available.

Message #152 received at 41006 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: jan <rtm443x <at> googlemail.com>
Cc: 41006 <at> debbugs.gnu.org
Subject: Re: bug#41006: 26.3; regular expressions documentation
Date: Fri, 29 Apr 2022 14:22:36 +0200

jan <rtm443x <at> googlemail.com> writes:

> 1. Suggest emacs' excellent documentation should not distinguish between
> Regexps and Regexp Backslash in the manual.
> That is, these 2 should be combined:
>
>   * Regexps::                   Syntax of regular expressions.
>   * Regexp Backslash::          Regular expression constructs starting with ‘\’.
>
> AFAICS the difference is purely arbitrary.
> There have been times I've looked for the syntax for such syntax and
> found it only because I knew it was there, but not in the same
> section. A beginner might conclude emacs doesn't support them.

(I'm going through old bug reports that unfortunately weren't resolved
at the time.)

Skimming this bug thread and the two nodes in the Emacs user manual, I
don't really see much to improve here.  The Emacs user manual is there
for users to read to introduce them to concepts and guide them -- it's
not a reference manual to look up things like this.  Reading those two
chapters, it seems like they do that quite well.

So I don't think we want to rearrange these chapters, and I'm therefore
closing this bug report.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no

bug closed, send any further explanations to 41006 <at> debbugs.gnu.org and jan <rtm443x <at> googlemail.com> Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Fri, 29 Apr 2022 12:23:02 GMT) Full text and rfc822 format available.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 28 May 2022 11:24:04 GMT) Full text and rfc822 format available.

This bug report was last modified 3 years and 36 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.
