GNU bug report logs -
#55315
[elpa/csv-mode] [PATCH] CSV separator guessing
Previous Next
Reported by: Simen Heggestøyl <simenheg <at> runbox.com>
Date: Sun, 8 May 2022 14:13:02 UTC
Severity: normal
Tags: patch
Done: Simen Heggestøyl <simenheg <at> runbox.com>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 55315 in the body.
You can then email your comments to 55315 AT debbugs.gnu.org in the normal way.
Toggle the display of automated, internal messages from the tracker.
Report forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#55315
; Package
emacs
.
(Sun, 08 May 2022 14:13:02 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
Simen Heggestøyl <simenheg <at> runbox.com>
:
New bug report received and forwarded. Copy sent to
bug-gnu-emacs <at> gnu.org
.
(Sun, 08 May 2022 14:13:02 GMT)
Full text and
rfc822 format available.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Hi.
Attached is a proposed patch to csv-mode.el in GNU ELPA which adds CSV
separator guessing functionality to CSV mode.
It adds two new commands: `csv-guess-set-separator' that automatically
guesses and sets the CSV separator of the current buffer, and
`csv-set-separator' for setting it manually.
The idea is that `csv-guess-set-separator' can be useful to add to the
mode hook to have CSV mode guess and set the separator automatically
when visiting a buffer:
(add-hook 'csv-mode-hook 'csv-guess-set-separator)
Been using it myself for the past weeks and have been happy with it so
far.
[0001-Add-CSV-separator-guessing-functionality.patch (text/x-diff, attachment)]
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#55315
; Package
emacs
.
(Sun, 08 May 2022 17:57:02 GMT)
Full text and
rfc822 format available.
Message #8 received at 55315 <at> debbugs.gnu.org (full text, mbox):
> + (setq csv-separator-chars (mapcar #'string-to-char value))
> + (let ((quoted-value (mapcar #'regexp-quote value)))
> + (setq csv--skip-chars (apply #'concat "^\n" quoted-value))
> + (setq csv-separator-regexp
> + (apply #'concat `("[" ,@quoted-value "]"))))
`regexp-quote` produces a regexp from a string literal, but what goes inside the square brackets is not a regexp -- the syntax rules are different. More specifically, other characters are special, and backslash does not quote anything.
To produce a regexp that matches one in a set of characters, try rx-to-string or regexp-opt. For example,
(setq csv-separator-regexp (rx-to-string `(or ,@csv-separator-chars) t))
The same applies to csv--skip-chars: this isn't a regexp either, but uses yet another syntax so regexp-quote is inappropriate here too. Easiest is to precede each char with a backslash since that always yields a correctly quoted character: "ABC" -> "\\A\\B\\C".
This is not a judgement on the rest of the patch which may be fine for all I know.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#55315
; Package
emacs
.
(Sun, 08 May 2022 19:32:02 GMT)
Full text and
rfc822 format available.
Message #11 received at 55315 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Mattias Engdegård <mattiase <at> acm.org> writes:
>> + (setq csv-separator-chars (mapcar #'string-to-char value))
>> + (let ((quoted-value (mapcar #'regexp-quote value)))
>> + (setq csv--skip-chars (apply #'concat "^\n" quoted-value))
>> + (setq csv-separator-regexp
>> + (apply #'concat `("[" ,@quoted-value "]"))))
>
> `regexp-quote` produces a regexp from a string literal, but what goes
> inside the square brackets is not a regexp -- the syntax rules are
> different. More specifically, other characters are special, and
> backslash does not quote anything.
>
> To produce a regexp that matches one in a set of characters, try rx-to-string or regexp-opt. For example,
>
> (setq csv-separator-regexp (rx-to-string `(or ,@csv-separator-chars) t))
>
> The same applies to csv--skip-chars: this isn't a regexp either, but
> uses yet another syntax so regexp-quote is inappropriate here
> too. Easiest is to precede each char with a backslash since that
> always yields a correctly quoted character: "ABC" -> "\\A\\B\\C".
>
> This is not a judgement on the rest of the patch which may be fine for all I know.
Thanks Mattias.
Does it look better in the updated patch attached?
Note that `csv--skip-chars' and `csv-separator-regexp' are set in two
different places in the patch, the first time from a list of strings in
`csv-separators', and the second time from a single character in
`csv-set-separator'. Am I right in thinking that the use of
`regexp-quote' in the `csv-set-separator' case gives the right result?
-- Simen
[0001-Add-CSV-separator-guessing-functionality.patch (text/x-diff, attachment)]
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#55315
; Package
emacs
.
(Mon, 09 May 2022 09:38:01 GMT)
Full text and
rfc822 format available.
Message #14 received at 55315 <at> debbugs.gnu.org (full text, mbox):
8 maj 2022 kl. 21.31 skrev Simen Heggestøyl <simenheg <at> runbox.com>:
> Am I right in thinking that the use of
> `regexp-quote' in the `csv-set-separator' case gives the right result?
Yes, I think so. `csv-set-separator` should probably escape the character in `csv--skip-chars`, however:
(setq-local csv--skip-chars (format "^\n%c" sep))
should be
(setq-local csv--skip-chars (format "^\n\\%c" sep))
I'm not sure if a separator can be chosen that needs escaping here but better be safe; who knows how the code will be used.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#55315
; Package
emacs
.
(Mon, 09 May 2022 11:05:02 GMT)
Full text and
rfc822 format available.
Message #17 received at 55315 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Mattias Engdegård <mattiase <at> acm.org> writes:
> 8 maj 2022 kl. 21.31 skrev Simen Heggestøyl <simenheg <at> runbox.com>:
>
>> Am I right in thinking that the use of
>> `regexp-quote' in the `csv-set-separator' case gives the right result?
>
> Yes, I think so. `csv-set-separator` should probably escape the character in `csv--skip-chars`, however:
>
> (setq-local csv--skip-chars (format "^\n%c" sep))
>
> should be
>
> (setq-local csv--skip-chars (format "^\n\\%c" sep))
>
> I'm not sure if a separator can be chosen that needs escaping here but
> better be safe; who knows how the code will be used.
Ah, thanks, I misread the docstring of `skip-chars-forward':
(but not at the end of a range; quoting is never needed there)
I somehow misinterpreted that as quoting not being necessary at the end
of the string fed to `skip-chars-forward'.
Updated patch with your proposed fix attached.
[0001-Add-CSV-separator-guessing-functionality.patch (text/x-diff, attachment)]
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#55315
; Package
emacs
.
(Mon, 09 May 2022 11:30:02 GMT)
Full text and
rfc822 format available.
Message #20 received at 55315 <at> debbugs.gnu.org (full text, mbox):
9 maj 2022 kl. 13.03 skrev Simen Heggestøyl <simenheg <at> runbox.com>:
> Updated patch with your proposed fix attached.
Thanks, looks fine with respect to the regexp and skip-set generation.
For the remainder of the patch (the vast bulk) you are probably more qualified to judge!
By the way, thanks for the reference to the CSV wrangling paper.
Reply sent
to
Simen Heggestøyl <simenheg <at> runbox.com>
:
You have taken responsibility.
(Thu, 12 May 2022 20:00:02 GMT)
Full text and
rfc822 format available.
Notification sent
to
Simen Heggestøyl <simenheg <at> runbox.com>
:
bug acknowledged by developer.
(Thu, 12 May 2022 20:00:02 GMT)
Full text and
rfc822 format available.
Message #25 received at 55315-done <at> debbugs.gnu.org (full text, mbox):
Mattias Engdegård <mattiase <at> acm.org> writes:
> 9 maj 2022 kl. 13.03 skrev Simen Heggestøyl <simenheg <at> runbox.com>:
>
>> Updated patch with your proposed fix attached.
>
> Thanks, looks fine with respect to the regexp and skip-set generation.
Good! Thanks for taking another look. I've merged the patch.
-- Simen
bug archived.
Request was from
Debbugs Internal Request <help-debbugs <at> gnu.org>
to
internal_control <at> debbugs.gnu.org
.
(Fri, 10 Jun 2022 11:24:05 GMT)
Full text and
rfc822 format available.
This bug report was last modified 1 year and 292 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.