GNU bug report logs - #41706
26.1; sort-subr predicate cannot be set successfully

Previous Next

Package: emacs;

Reported by: Marvin Gülker <post+ebugs <at> guelker.eu>

Date: Thu, 4 Jun 2020 11:10:01 UTC

Severity: minor

Found in version 26.1

To reply to this bug, email your comments to 41706 AT debbugs.gnu.org.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#41706; Package emacs. (Thu, 04 Jun 2020 11:10:01 GMT) Full text and rfc822 format available.

Acknowledgement sent to Marvin Gülker <post+ebugs <at> guelker.eu>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Thu, 04 Jun 2020 11:10:01 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Marvin Gülker <post+ebugs <at> guelker.eu>
To: bug-gnu-emacs <at> gnu.org
Subject: 26.1; sort-subr predicate cannot be set successfully
Date: Thu, 4 Jun 2020 13:09:22 +0200
Dear maintainers,

I was trying to define a variant of `sort-lines' which sorts lines 
according to the current locale's rules, i.e. instead of comparing the 
strings with the default (as the docs of the underlying function 
`sort-subr' say) predicate of `string<' they should be compared with 
`string-collate-lessp'. I copied the source code of `sort-lines' and 
adapted the call to `sort-subr' to use `string-collate-lessp', resulting 
in this function:

(defun my-sort-lines-collate (reverse beg end)
 (interactive "P\nr")
 (save-excursion
   (save-restriction
     (narrow-to-region beg end)
     (goto-char (point-min))
     (let ;; To make `end-of-line' and etc. to ignore fields.
	  ((inhibit-field-text-motion t))
	(sort-subr reverse 'forward-line 'end-of-line nil nil 'string-collate-lessp)))))

This function, when called, only results in an error "Wrong type 
argument: stringp" (tested in `emacs -Q'). Interestingly, if the 6th 
argument to `sort-subr' is changed to `string<' (which is said to be the 
default value!) the very same error results.

It does not make a difference to explicitely define it with a lambda 
instead:

(defun my-sort-lines-collate (reverse beg end)
 (interactive "P\nr")
 (save-excursion
   (save-restriction
     (narrow-to-region beg end)
     (goto-char (point-min))
     (let ;; To make `end-of-line' and etc. to ignore fields.
	  ((inhibit-field-text-motion t))
	(sort-subr reverse 'forward-line 'end-of-line nil nil (lambda (a b) (string-collate-lessp a b)))))))

This code results in the same error.

I assume this is a kind of bug. Either way, I'd like to request as a 
feature a variant of `sort-lines' that is aware of the locale's sorting 
rules.


In GNU Emacs 26.1 (build 2, x86_64-pc-linux-gnu, GTK+ Version 3.24.5)
of 2019-09-23, modified by Debian built on x86-grnet-01
Windowing system distributor 'The X.Org Foundation', version 11.0.12004000
System Description:	Debian GNU/Linux 10 (buster)

Recent messages:
For information about GNU Emacs and the GNU system, type C-h C-a.
Mark set
my-sort-lines-collate
Mark set
Wrong type argument: stringp, (193 . 215)
Configured using:
'configure --build x86_64-linux-gnu --prefix=/usr
--sharedstatedir=/var/lib --libexecdir=/usr/lib
--localstatedir=/var/lib --infodir=/usr/share/info
--mandir=/usr/share/man --enable-libsystemd --with-pop=yes
--enable-locallisppath=/etc/emacs:/usr/local/share/emacs/26.1/site-lisp:/usr/local/share/emacs/site-lisp:/usr/share/emacs/26.1/site-lisp:/usr/share/emacs/site-lisp
--with-sound=alsa --without-gconf --with-mailutils --build
x86_64-linux-gnu --prefix=/usr --sharedstatedir=/var/lib
--libexecdir=/usr/lib --localstatedir=/var/lib
--infodir=/usr/share/info --mandir=/usr/share/man --enable-libsystemd
--with-pop=yes
--enable-locallisppath=/etc/emacs:/usr/local/share/emacs/26.1/site-lisp:/usr/local/share/emacs/site-lisp:/usr/share/emacs/26.1/site-lisp:/usr/share/emacs/site-lisp
--with-sound=alsa --without-gconf --with-mailutils --with-x=yes
--with-x-toolkit=gtk3 --with-toolkit-scroll-bars 'CFLAGS=-g -O2
-fdebug-prefix-map=/build/emacs-StqULU/emacs-26.1+1=. -fstack-protector-strong
-Wformat -Werror=format-security -Wall' 'CPPFLAGS=-Wdate-time
-D_FORTIFY_SOURCE=2' LDFLAGS=-Wl,-z,relro'

Configured features:
XPM JPEG TIFF GIF PNG RSVG IMAGEMAGICK SOUND GPM DBUS GSETTINGS NOTIFY
ACL LIBSELINUX GNUTLS LIBXML2 FREETYPE M17N_FLT LIBOTF XFT ZLIB
TOOLKIT_SCROLL_BARS GTK3 X11 THREADS LIBSYSTEMD LCMS2

Important settings:
 value of $LANG: de_DE.UTF-8
 locale-coding-system: utf-8-unix

Major mode: Lisp Interaction

Minor modes in effect:
 tooltip-mode: t
 global-eldoc-mode: t
 eldoc-mode: t
 electric-indent-mode: t
 mouse-wheel-mode: t
 tool-bar-mode: t
 menu-bar-mode: t
 file-name-shadow-mode: t
 global-font-lock-mode: t
 font-lock-mode: t
 blink-cursor-mode: t
 auto-composition-mode: t
 auto-encryption-mode: t
 auto-compression-mode: t
 line-number-mode: t
 transient-mark-mode: t

Load-path shadows:
None found.

Features:
(shadow mail-extr emacsbug message rmc puny seq byte-opt gv bytecomp
byte-compile cconv cl-loaddefs cl-lib dired dired-loaddefs format-spec
rfc822 mml easymenu mml-sec password-cache epa derived epg epg-config
gnus-util rmail rmail-loaddefs mm-decode mm-bodies mm-encode mail-parse
rfc2231 mailabbrev gmm-utils mailheader sendmail rfc2047 rfc2045
ietf-drums mm-util mail-prsvr mail-utils sort elec-pair time-date
mule-util tooltip eldoc electric uniquify ediff-hook vc-hooks
lisp-float-type mwheel term/x-win x-win term/common-win x-dnd tool-bar
dnd fontset image regexp-opt fringe tabulated-list replace newcomment
text-mode elisp-mode lisp-mode prog-mode register page menu-bar
rfn-eshadow isearch timer select scroll-bar mouse jit-lock font-lock
syntax facemenu font-core term/tty-colors frame cl-generic cham georgian
utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean
japanese eucjp-ms cp51932 hebrew greek romanian slovak czech european
ethiopic indian cyrillic chinese composite charscript charprop
case-table epa-hook jka-cmpr-hook help simple abbrev obarray minibuffer
cl-preloaded nadvice loaddefs button faces cus-face macroexp files
text-properties overlay sha1 md5 base64 format env code-pages mule
custom widget hashtable-print-readable backquote dbusbind inotify lcms2
dynamic-setting system-font-setting font-render-setting move-toolbar gtk
x-toolkit x multi-tty make-network-process emacs)

Memory information:
((conses 16 95777 7946)
(symbols 48 20407 1)
(miscs 40 49 118)
(strings 32 28381 1410)
(string-bytes 1 744119)
(vectors 16 14643)
(vector-slots 8 496828 7804)
(floats 8 50 209)
(intervals 56 304 7)
(buffers 992 11))




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41706; Package emacs. (Thu, 04 Jun 2020 12:12:02 GMT) Full text and rfc822 format available.

Message #8 received at 41706 <at> debbugs.gnu.org (full text, mbox):

From: Michael Heerdegen <michael_heerdegen <at> web.de>
To: Marvin Gülker <post+ebugs <at> guelker.eu>
Cc: 41706 <at> debbugs.gnu.org
Subject: Re: bug#41706: 26.1; sort-subr predicate cannot be set successfully
Date: Thu, 04 Jun 2020 14:11:33 +0200
Marvin Gülker <post+ebugs <at> guelker.eu> writes:

> I copied the source code of `sort-lines' and
> adapted the call to `sort-subr' to use `string-collate-lessp',
> resulting in this function:
>
> (defun my-sort-lines-collate (reverse beg end)
>  (interactive "P\nr")
>  (save-excursion
>    (save-restriction
>      (narrow-to-region beg end)
>      (goto-char (point-min))
>      (let ;; To make `end-of-line' and etc. to ignore fields.
> 	  ((inhibit-field-text-motion t))
> 	(sort-subr reverse 'forward-line 'end-of-line nil nil 'string-collate-lessp)))))
>
> This function, when called, only results in an error "Wrong type
> argument: stringp" (tested in `emacs -Q')

> Wrong type argument: stringp, (193 . 215)

I think you use it wrong: the keys are not strings but cons cells.  So
the predicate function should be something like
`compare-buffer-substrings'.

I guess it would be better to specify a STARTKEYFUN to return a value to
be used as the key.

Michael.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41706; Package emacs. (Thu, 04 Jun 2020 19:06:01 GMT) Full text and rfc822 format available.

Message #11 received at 41706 <at> debbugs.gnu.org (full text, mbox):

From: Marvin Gülker <post+ebugs <at> guelker.eu>
To: Michael Heerdegen <michael_heerdegen <at> web.de>
Cc: 41706 <at> debbugs.gnu.org
Subject: Re: bug#41706: 26.1; sort-subr predicate cannot be set successfully
Date: Thu, 4 Jun 2020 21:05:26 +0200
Am 04. Juni 2020 um 14:11 Uhr +0200 schrieb Michael Heerdegen:
>I think you use it wrong: the keys are not strings but cons cells.  So
>the predicate function should be something like
>`compare-buffer-substrings'.

You are correct. Your comment put me on the right track (but see below). 
This version works and sorts the lines according to the locale's rules:

   (defun my-sort-lines-collate (reverse beg end)
    (interactive "P\nr")
    (save-excursion
      (save-restriction
        (narrow-to-region beg end)
        (goto-char (point-min))
        (let ;; To make `end-of-line' and etc. to ignore fields.
            ((inhibit-field-text-motion t))
          (sort-subr
           reverse 'forward-line 'end-of-line nil nil
           (lambda (a b) (string-collate-lessp (buffer-substring (car a) (cdr a)) (buffer-substring (car b) (cdr b)))))))))

I have broken it slightly different for readability. The change I did was 
to use `buffer-substring' in the predicate lambda. It was not clear to me 
that the arguments given to the predicate function are actually buffer 
positions and not the buffer substrings themselves as I expected. This 
is not explained in the manual (section § 32.15) on `sort-subr').

>To understand how ‘sort-subr’ works, consider the whole accessible
>portion of the buffer as being divided into disjoint pieces called
>“sort records”.  The records may or may not be contiguous, but they
>must not overlap.  A portion of each sort record (perhaps all of
>it) is designated as the sort key. [...]
>The argument PREDICATE is the function to use to compare keys.  If
>keys are numbers, it defaults to ‘<’; otherwise it defaults to
>string<’.

No further mention of how to use PREDICATE is made, and the manual 
continues with reproducing the source code of `sort-lines'. I took that 
section to mean that the PREDICATE is in the default case `<' and 
with strings `string<'. From the manual's text it became not clear to me 
under which condition the keys are not strings, and I assumed they will 
normally always be strings as buffer substrings are always strings.

Now I see that the documentation for the `sort-subr' function is a 
little more specific, but still it doesn't really make clear which kind 
of "key" is received as arguments in the PREDICATE under which 
conditions. The call above appearently yields cons cell keys whose CAR 
and CDR correspond to buffer positions, but the documentation makes me 
suspect that calling `sort-subr' differently will yield other kinds of 
keys.

I would suggest to amend the manual to be more precise about the 
arguments that PREDICATE receives, and perhaps add an example.

Thank you for your explanation.

-- 
Blog: https://mg.guelker.eu




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41706; Package emacs. (Fri, 05 Jun 2020 08:34:01 GMT) Full text and rfc822 format available.

Message #14 received at 41706 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Marvin Gülker <post+ebugs <at> guelker.eu>
Cc: michael_heerdegen <at> web.de, 41706 <at> debbugs.gnu.org
Subject: Re: bug#41706: 26.1; sort-subr predicate cannot be set successfully
Date: Fri, 05 Jun 2020 11:32:51 +0300
> Date: Thu, 4 Jun 2020 21:05:26 +0200
> From: Marvin Gülker <post+ebugs <at> guelker.eu>
> Cc: 41706 <at> debbugs.gnu.org
> 
> >To understand how ‘sort-subr’ works, consider the whole accessible
> >portion of the buffer as being divided into disjoint pieces called
> >“sort records”.  The records may or may not be contiguous, but they
> >must not overlap.  A portion of each sort record (perhaps all of
> >it) is designated as the sort key. [...]
> >The argument PREDICATE is the function to use to compare keys.  If
> >keys are numbers, it defaults to ‘<’; otherwise it defaults to
> >string<’.
> 
> No further mention of how to use PREDICATE is made, and the manual 
> continues with reproducing the source code of `sort-lines'. I took that 
> section to mean that the PREDICATE is in the default case `<' and 
> with strings `string<'. From the manual's text it became not clear to me 
> under which condition the keys are not strings, and I assumed they will 
> normally always be strings as buffer substrings are always strings.
> 
> Now I see that the documentation for the `sort-subr' function is a 
> little more specific, but still it doesn't really make clear which kind 
> of "key" is received as arguments in the PREDICATE under which 
> conditions. The call above appearently yields cons cell keys whose CAR 
> and CDR correspond to buffer positions, but the documentation makes me 
> suspect that calling `sort-subr' differently will yield other kinds of 
> keys.
> 
> I would suggest to amend the manual to be more precise about the 
> arguments that PREDICATE receives, and perhaps add an example.

It is hard to be specific about the arguments to PREDICATE, because
they depend on what the other arguments to sort-subr return.  I
augmented the manual to at least have the same information as the doc
string; more detail is only possible if we discuss specific values of
STARTKEYFUN and ENDKEYFUN.

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41706; Package emacs. (Fri, 05 Jun 2020 09:44:01 GMT) Full text and rfc822 format available.

Message #17 received at 41706 <at> debbugs.gnu.org (full text, mbox):

From: Michael Heerdegen <michael_heerdegen <at> web.de>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 41706 <at> debbugs.gnu.org,
 Marvin Gülker <post+ebugs <at> guelker.eu>
Subject: Re: bug#41706: 26.1; sort-subr predicate cannot be set successfully
Date: Fri, 05 Jun 2020 11:43:19 +0200
Eli Zaretskii <eliz <at> gnu.org> writes:

> It is hard to be specific about the arguments to PREDICATE, because
> they depend on what the other arguments to sort-subr return.  I
> augmented the manual to at least have the same information as the doc
> string; more detail is only possible if we discuss specific values of
> STARTKEYFUN and ENDKEYFUN.

But this sentence is not correct:

"It [i.e. STARTKEYFUN, added by me] may return either a non-nil value to
be used as the key, or else the key is the substring between the values
of point after STARTKEYFUN and ENDKEYFUN are called."

This is the sentence that misled the OP I think.  Could we be more
precise and tell that the key is actually the cons cell (BEG . END)
denoting these strings?


Michael.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41706; Package emacs. (Fri, 05 Jun 2020 11:44:02 GMT) Full text and rfc822 format available.

Message #20 received at 41706 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Michael Heerdegen <michael_heerdegen <at> web.de>
Cc: 41706 <at> debbugs.gnu.org, post+ebugs <at> guelker.eu
Subject: Re: bug#41706: 26.1; sort-subr predicate cannot be set successfully
Date: Fri, 05 Jun 2020 14:43:39 +0300
> From: Michael Heerdegen <michael_heerdegen <at> web.de>
> Cc: Marvin Gülker <post+ebugs <at> guelker.eu>,
>   41706 <at> debbugs.gnu.org
> Date: Fri, 05 Jun 2020 11:43:19 +0200
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> > It is hard to be specific about the arguments to PREDICATE, because
> > they depend on what the other arguments to sort-subr return.  I
> > augmented the manual to at least have the same information as the doc
> > string; more detail is only possible if we discuss specific values of
> > STARTKEYFUN and ENDKEYFUN.
> 
> But this sentence is not correct:
> 
> "It [i.e. STARTKEYFUN, added by me] may return either a non-nil value to
> be used as the key, or else the key is the substring between the values
> of point after STARTKEYFUN and ENDKEYFUN are called."

I didn't change that part.

> This is the sentence that misled the OP I think.  Could we be more
> precise and tell that the key is actually the cons cell (BEG . END)
> denoting these strings?

Patches welcome.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41706; Package emacs. (Wed, 10 Jun 2020 19:00:03 GMT) Full text and rfc822 format available.

Message #23 received at 41706 <at> debbugs.gnu.org (full text, mbox):

From: Michael Heerdegen <michael_heerdegen <at> web.de>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 41706 <at> debbugs.gnu.org, post+ebugs <at> guelker.eu
Subject: Re: bug#41706: 26.1; sort-subr predicate cannot be set successfully
Date: Wed, 10 Jun 2020 20:59:30 +0200
[Message part 1 (text/plain, inline)]
Eli Zaretskii <eliz <at> gnu.org> writes:

> > This is the sentence that misled the OP I think.  Could we be more
> > precise and tell that the key is actually the cons cell (BEG . END)
> > denoting these strings?
>
> Patches welcome.

Something like this (maybe enhanced using better English):

[subr.patch (text/x-diff, inline)]
diff --git a/lisp/sort.el b/lisp/sort.el
index de0e1b9519..a87f33eaee 100644
--- a/lisp/sort.el
+++ b/lisp/sort.el
@@ -67,11 +67,12 @@ sort-subr
 ENDRECFUN is called with point within the record.
 It should move point to the end of the record.

-STARTKEYFUN moves from the start of the record to the start of the key.
-It may return either a non-nil value to be used as the key, or
-else the key is the substring between the values of point after
-STARTKEYFUN and ENDKEYFUN are called.  If STARTKEYFUN is nil, the key
-starts at the beginning of the record.
+STARTKEYFUN moves from the start of the record to the start of
+the key.  It may return either a non-nil value to be used as the
+key, or else the key is a cons (BEG . END) designating the
+substring of the record between the values of point after
+STARTKEYFUN and ENDKEYFUN are called.  If STARTKEYFUN is nil, the
+key starts at the beginning of the record.

 ENDKEYFUN moves from the start of the sort key to the end of the sort key.
 ENDKEYFUN may be nil if STARTKEYFUN returns a value or if it would be the
[Message part 3 (text/plain, inline)]
Then the doc should already list all possible key formats AFAIU.

BTW, what about the suggestion to support collation order out of the
box?


Thanks,

Michael.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41706; Package emacs. (Wed, 10 Jun 2020 19:09:02 GMT) Full text and rfc822 format available.

Message #26 received at 41706 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Michael Heerdegen <michael_heerdegen <at> web.de>
Cc: 41706 <at> debbugs.gnu.org, post+ebugs <at> guelker.eu
Subject: Re: bug#41706: 26.1; sort-subr predicate cannot be set successfully
Date: Wed, 10 Jun 2020 22:08:21 +0300
> From: Michael Heerdegen <michael_heerdegen <at> web.de>
> Cc: post+ebugs <at> guelker.eu,  41706 <at> debbugs.gnu.org
> Date: Wed, 10 Jun 2020 20:59:30 +0200
> 
> +STARTKEYFUN moves from the start of the record to the start of
> +the key.  It may return either a non-nil value to be used as the
> +key, or else the key is a cons (BEG . END) designating the
> +substring of the record between the values of point after
> +STARTKEYFUN and ENDKEYFUN are called.  If STARTKEYFUN is nil, the
> +key starts at the beginning of the record.

That's a beginning, thanks.  But it needs more work, IMO:

 . the cons cell case hints on a possible return value of STARTKEYFUN,
   AFAIU, but the text doesn't say so
 . the case of STARTKEYFUN being nil is another use case for what
   PREDICATE must cope with, and the text should say so
 . the text about ENDKEYFUN should also be augmented, IMO
 . the description of PREDICATE should reference this text in some
   useful way

> BTW, what about the suggestion to support collation order out of the
> box?

What collation would you like to support, and in what form?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41706; Package emacs. (Thu, 11 Jun 2020 12:56:01 GMT) Full text and rfc822 format available.

Message #29 received at 41706 <at> debbugs.gnu.org (full text, mbox):

From: Michael Heerdegen <michael_heerdegen <at> web.de>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 41706 <at> debbugs.gnu.org, post+ebugs <at> guelker.eu
Subject: Re: bug#41706: 26.1; sort-subr predicate cannot be set successfully
Date: Thu, 11 Jun 2020 14:55:14 +0200
[Message part 1 (text/plain, inline)]
Eli Zaretskii <eliz <at> gnu.org> writes:

> That's a beginning, thanks.  But it needs more work, IMO:
>
>  . the cons cell case hints on a possible return value of STARTKEYFUN,
>    AFAIU, but the text doesn't say so
>  . the case of STARTKEYFUN being nil is another use case for what
>    PREDICATE must cope with, and the text should say so
>  . the text about ENDKEYFUN should also be augmented, IMO
>  . the description of PREDICATE should reference this text in some
>    useful way

I must admit I found the doc sufficient for all of these.  Actually,
reading the implementation might be simpler than describing it:

#+begin_src emacs-lisp
(setq key (catch 'key
		  (or (and startkeyfun (funcall startkeyfun))
		      ;; If key was not returned as value,
		      ;; move to end of key and get key from the buffer.
		      (let ((start (point)))
			(funcall (or endkeyfun
				     (prog1 endrecfun (setq done t))))
			(cons start (point))))))
#+end_src

What the doc is missing to mention is the `catch' - `sort-regexp-fields'
uses this feature for example.

Here is an attempt (not including the `catch' feature), anyway:

[sort-subr.patch (text/x-diff, inline)]
diff --git a/lisp/sort.el b/lisp/sort.el
index de0e1b9519..e22f062dbd 100644
--- a/lisp/sort.el
+++ b/lisp/sort.el
@@ -80,7 +80,15 @@ sort-subr
 PREDICATE, if non-nil, is the predicate function for comparing
 keys; it is called with two arguments, the keys to compare, and
 should return non-nil if the first key should sort before the
-second key.  If PREDICATE is nil, comparison is done with `<' if
+second key.  The key values PREDICATE is called with are the
+either the return values of STARTKEYFUN when that function is
+specified and returns a non-nil value.  In any other case the keys
+are cons cells of the form (BEG . END), where BEG is the value of
+point after calling STARTKEYFUN when given, else after calling
+ENDRECFUN, and END is the value of point after calling ENDKEYFUN when
+given, and after calling ENDRECFUN else.
+
+If PREDICATE is nil, comparison is done with `<' if
 the keys are numbers, with `compare-buffer-substrings' if the
 keys are cons cells (the car and cdr of each cons cell are taken
 as start and end positions), and with `string<' otherwise."
[Message part 3 (text/plain, inline)]
What I also would like to add to the docstring of this function, and of
that of `sort', is that the PREDICATE must be transitive and
antisymmetric - mentioning only in the manual is not enough IMHO.


> > BTW, what about the suggestion to support collation order out of the
> > box?
>
> What collation would you like to support, and in what form?

I don't know much about this stuff.  The canonical way from my ignorant
point of view would be that `compare-buffer-substrings' would not only
respect `case-fold-search' but also some other variable which would tell
how the behavior should be w.r.t. collation.

Michael.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41706; Package emacs. (Thu, 11 Jun 2020 13:41:02 GMT) Full text and rfc822 format available.

Message #32 received at 41706 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Michael Heerdegen <michael_heerdegen <at> web.de>
Cc: 41706 <at> debbugs.gnu.org, post+ebugs <at> guelker.eu
Subject: Re: bug#41706: 26.1; sort-subr predicate cannot be set successfully
Date: Thu, 11 Jun 2020 16:39:59 +0300
> From: Michael Heerdegen <michael_heerdegen <at> web.de>
> Cc: post+ebugs <at> guelker.eu,  41706 <at> debbugs.gnu.org
> Date: Thu, 11 Jun 2020 14:55:14 +0200
> 
> >  . the cons cell case hints on a possible return value of STARTKEYFUN,
> >    AFAIU, but the text doesn't say so
> >  . the case of STARTKEYFUN being nil is another use case for what
> >    PREDICATE must cope with, and the text should say so
> >  . the text about ENDKEYFUN should also be augmented, IMO
> >  . the description of PREDICATE should reference this text in some
> >    useful way
> 
> I must admit I found the doc sufficient for all of these.  Actually,
> reading the implementation might be simpler than describing it:

It is? if so, that's a clear sign that the description "needs work",
IME.

> --- a/lisp/sort.el
> +++ b/lisp/sort.el
> @@ -80,7 +80,15 @@ sort-subr
>  PREDICATE, if non-nil, is the predicate function for comparing
>  keys; it is called with two arguments, the keys to compare, and
>  should return non-nil if the first key should sort before the
> -second key.  If PREDICATE is nil, comparison is done with `<' if
> +second key.  The key values PREDICATE is called with are the
> +either the return values of STARTKEYFUN when that function is
> +specified and returns a non-nil value.

The last sentence is incomplete and/or needs some fixing, AFAICT.

>                                         In any other case the keys
> +are cons cells of the form (BEG . END), where BEG is the value of
> +point after calling STARTKEYFUN when given, else after calling
> +ENDRECFUN, and END is the value of point after calling ENDKEYFUN when
> +given, and after calling ENDRECFUN else.

This seems to contradict the following part, which seems to say that
the key arguments are not always cons cells:

> +If PREDICATE is nil, comparison is done with `<' if
>  the keys are numbers, with `compare-buffer-substrings' if the
>  keys are cons cells (the car and cdr of each cons cell are taken
>  as start and end positions), and with `string<' otherwise."

Am I missing something?

> What I also would like to add to the docstring of this function, and of
> that of `sort', is that the PREDICATE must be transitive and
> antisymmetric - mentioning only in the manual is not enough IMHO.

Fine with me, provided that you explain what those two attributes
mean.

> > > BTW, what about the suggestion to support collation order out of the
> > > box?
> >
> > What collation would you like to support, and in what form?
> 
> I don't know much about this stuff.  The canonical way from my ignorant
> point of view would be that `compare-buffer-substrings' would not only
> respect `case-fold-search' but also some other variable which would tell
> how the behavior should be w.r.t. collation.

The main problem with collation is that it's locale-specific, and
different C libraries implement collation for the same locale in
slightly different ways.  The result is that the sorted text may not
be the same even if you do that on two systems in the same locale.
How would you suggest to solve this issue?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41706; Package emacs. (Thu, 11 Jun 2020 14:00:02 GMT) Full text and rfc822 format available.

Message #35 received at 41706 <at> debbugs.gnu.org (full text, mbox):

From: Michael Heerdegen <michael_heerdegen <at> web.de>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 41706 <at> debbugs.gnu.org, post+ebugs <at> guelker.eu
Subject: Re: bug#41706: 26.1; sort-subr predicate cannot be set successfully
Date: Thu, 11 Jun 2020 15:59:39 +0200
Eli Zaretskii <eliz <at> gnu.org> writes:

> > +second key.  The key values PREDICATE is called with are the
> > +either the return values of STARTKEYFUN when that function is
> > +specified and returns a non-nil value.
>
> The last sentence is incomplete and/or needs some fixing, AFAICT.

If you mean something else than in you subsequent comment, please
elaborate.

> >                                         In any other case the keys
> > +are cons cells of the form (BEG . END), where BEG is the value of
> > +point after calling STARTKEYFUN when given, else after calling
> > +ENDRECFUN, and END is the value of point after calling ENDKEYFUN when
> > +given, and after calling ENDRECFUN else.
>
> This seems to contradict the following part, which seems to say that
> the key arguments are not always cons cells:
>
> > +If PREDICATE is nil, comparison is done with `<' if
> >  the keys are numbers, with `compare-buffer-substrings' if the
> >  keys are cons cells (the car and cdr of each cons cell are taken
> >  as start and end positions), and with `string<' otherwise."
>
> Am I missing something?

Not really.  AFAIU the only case where this applies is when STARTKEYFUN
is specified and returns values of this type - else `sort-build-lists'
always generates conses.  The different defaults of PREDICATE depending
on some types of STARTKEYFUN seem to be predefined just for convenience.

> > What I also would like to add to the docstring of this function, and of
> > that of `sort', is that the PREDICATE must be transitive and
> > antisymmetric - mentioning only in the manual is not enough IMHO.
>
> Fine with me, provided that you explain what those two attributes
> mean.

Is it enough to relegate to the manual?

> The main problem with collation is that it's locale-specific, and
> different C libraries implement collation for the same locale in
> slightly different ways.  The result is that the sorted text may not
> be the same even if you do that on two systems in the same locale.
> How would you suggest to solve this issue?

I have no clue.  Is this a general requirement?  For `sort-subr' I think
the requirement is not so much absolute reproducibility (sorry if that
word doesn't exist) but that it is able to sort lexicographically in a
reasonably sensible way.

Michael.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41706; Package emacs. (Thu, 11 Jun 2020 16:37:01 GMT) Full text and rfc822 format available.

Message #38 received at 41706 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Michael Heerdegen <michael_heerdegen <at> web.de>
Cc: 41706 <at> debbugs.gnu.org, post+ebugs <at> guelker.eu
Subject: Re: bug#41706: 26.1; sort-subr predicate cannot be set successfully
Date: Thu, 11 Jun 2020 19:36:09 +0300
> From: Michael Heerdegen <michael_heerdegen <at> web.de>
> Cc: post+ebugs <at> guelker.eu,  41706 <at> debbugs.gnu.org
> Date: Thu, 11 Jun 2020 15:59:39 +0200
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> > > +second key.  The key values PREDICATE is called with are the
> > > +either the return values of STARTKEYFUN when that function is
> > > +specified and returns a non-nil value.
> >
> > The last sentence is incomplete and/or needs some fixing, AFAICT.
> 
> If you mean something else than in you subsequent comment, please
> elaborate.

I meant that a sentence which says "either ..." is expected to say
"or ..." at some later point, and also that "are the either the return
values" doesn't sound correct English to me.

> > >                                         In any other case the keys
> > > +are cons cells of the form (BEG . END), where BEG is the value of
> > > +point after calling STARTKEYFUN when given, else after calling
> > > +ENDRECFUN, and END is the value of point after calling ENDKEYFUN when
> > > +given, and after calling ENDRECFUN else.
> >
> > This seems to contradict the following part, which seems to say that
> > the key arguments are not always cons cells:
> >
> > > +If PREDICATE is nil, comparison is done with `<' if
> > >  the keys are numbers, with `compare-buffer-substrings' if the
> > >  keys are cons cells (the car and cdr of each cons cell are taken
> > >  as start and end positions), and with `string<' otherwise."
> >
> > Am I missing something?
> 
> Not really.  AFAIU the only case where this applies is when STARTKEYFUN
> is specified and returns values of this type - else `sort-build-lists'
> always generates conses.

I think this should be stated in the doc string, not implied.

> > > What I also would like to add to the docstring of this function, and of
> > > that of `sort', is that the PREDICATE must be transitive and
> > > antisymmetric - mentioning only in the manual is not enough IMHO.
> >
> > Fine with me, provided that you explain what those two attributes
> > mean.
> 
> Is it enough to relegate to the manual?

Not IMO.

> > The main problem with collation is that it's locale-specific, and
> > different C libraries implement collation for the same locale in
> > slightly different ways.  The result is that the sorted text may not
> > be the same even if you do that on two systems in the same locale.
> > How would you suggest to solve this issue?
> 
> I have no clue.  Is this a general requirement?

It doesn't have to be a requirement, but if it isn't, then what is the
utility of an API that yields different results in different
environments?  What is/are the use case(s) you have in mind?

> For `sort-subr' I think
> the requirement is not so much absolute reproducibility (sorry if that
> word doesn't exist) but that it is able to sort lexicographically in a
> reasonably sensible way.

My point is that lexicographical sorting is locale dependent, and some
people might be surprised by seeing a sort after A, but before B.

However, if this is clearly documented, I guess it's up to the Lisp
programs that call such an API to deal with the consequences.  Bonus
points for allowing the caller to specify the locale, like
string-collate-lessp does.




This bug report was last modified 3 years and 327 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.