GNU bug report logs - #55315
[elpa/csv-mode] [PATCH] CSV separator guessing

Previous Next

Package: emacs;

Reported by: Simen Heggestøyl <simenheg <at> runbox.com>

Date: Sun, 8 May 2022 14:13:02 UTC

Severity: normal

Tags: patch

Done: Simen Heggestøyl <simenheg <at> runbox.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 55315 in the body.
You can then email your comments to 55315 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#55315; Package emacs. (Sun, 08 May 2022 14:13:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Simen Heggestøyl <simenheg <at> runbox.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Sun, 08 May 2022 14:13:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Simen Heggestøyl <simenheg <at> runbox.com>
To: bug-gnu-emacs <at> gnu.org
Subject: [elpa/csv-mode] [PATCH] CSV separator guessing
Date: Sun, 08 May 2022 16:12:00 +0200
[Message part 1 (text/plain, inline)]
Hi.

Attached is a proposed patch to csv-mode.el in GNU ELPA which adds CSV
separator guessing functionality to CSV mode.

It adds two new commands: `csv-guess-set-separator' that automatically
guesses and sets the CSV separator of the current buffer, and
`csv-set-separator' for setting it manually.

The idea is that `csv-guess-set-separator' can be useful to add to the
mode hook to have CSV mode guess and set the separator automatically
when visiting a buffer:

  (add-hook 'csv-mode-hook 'csv-guess-set-separator)

Been using it myself for the past weeks and have been happy with it so
far.

[0001-Add-CSV-separator-guessing-functionality.patch (text/x-diff, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#55315; Package emacs. (Sun, 08 May 2022 17:57:02 GMT) Full text and rfc822 format available.

Message #8 received at 55315 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattiase <at> acm.org>
To: Simen Heggestøyl <simenheg <at> runbox.com>
Cc: 55315 <at> debbugs.gnu.org
Subject: bug#55315: [elpa/csv-mode] [PATCH] CSV separator guessing
Date: Sun, 8 May 2022 19:56:00 +0200
> +         (setq csv-separator-chars (mapcar #'string-to-char value))
> +         (let ((quoted-value (mapcar #'regexp-quote value)))
> +           (setq csv--skip-chars (apply #'concat "^\n" quoted-value))
> +           (setq csv-separator-regexp
> +                 (apply #'concat `("[" ,@quoted-value "]"))))

`regexp-quote` produces a regexp from a string literal, but what goes inside the square brackets is not a regexp -- the syntax rules are different. More specifically, other characters are special, and backslash does not quote anything.

To produce a regexp that matches one in a set of characters, try rx-to-string or regexp-opt. For example,

(setq csv-separator-regexp (rx-to-string `(or ,@csv-separator-chars) t))

The same applies to csv--skip-chars: this isn't a regexp either, but uses yet another syntax so regexp-quote is inappropriate here too. Easiest is to precede each char with a backslash since that always yields a correctly quoted character: "ABC" -> "\\A\\B\\C".

This is not a judgement on the rest of the patch which may be fine for all I know.





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#55315; Package emacs. (Sun, 08 May 2022 19:32:02 GMT) Full text and rfc822 format available.

Message #11 received at 55315 <at> debbugs.gnu.org (full text, mbox):

From: Simen Heggestøyl <simenheg <at> runbox.com>
To: Mattias Engdegård <mattiase <at> acm.org>
Cc: 55315 <at> debbugs.gnu.org
Subject: Re: bug#55315: [elpa/csv-mode] [PATCH] CSV separator guessing
Date: Sun, 08 May 2022 21:31:12 +0200
[Message part 1 (text/plain, inline)]
Mattias Engdegård <mattiase <at> acm.org> writes:

>> +         (setq csv-separator-chars (mapcar #'string-to-char value))
>> +         (let ((quoted-value (mapcar #'regexp-quote value)))
>> +           (setq csv--skip-chars (apply #'concat "^\n" quoted-value))
>> +           (setq csv-separator-regexp
>> +                 (apply #'concat `("[" ,@quoted-value "]"))))
>
> `regexp-quote` produces a regexp from a string literal, but what goes
> inside the square brackets is not a regexp -- the syntax rules are
> different. More specifically, other characters are special, and
> backslash does not quote anything.
>
> To produce a regexp that matches one in a set of characters, try rx-to-string or regexp-opt. For example,
>
> (setq csv-separator-regexp (rx-to-string `(or ,@csv-separator-chars) t))
>
> The same applies to csv--skip-chars: this isn't a regexp either, but
> uses yet another syntax so regexp-quote is inappropriate here
> too. Easiest is to precede each char with a backslash since that
> always yields a correctly quoted character: "ABC" -> "\\A\\B\\C".
>
> This is not a judgement on the rest of the patch which may be fine for all I know.

Thanks Mattias.

Does it look better in the updated patch attached?

Note that `csv--skip-chars' and `csv-separator-regexp' are set in two
different places in the patch, the first time from a list of strings in
`csv-separators', and the second time from a single character in
`csv-set-separator'. Am I right in thinking that the use of
`regexp-quote' in the `csv-set-separator' case gives the right result?

-- Simen

[0001-Add-CSV-separator-guessing-functionality.patch (text/x-diff, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#55315; Package emacs. (Mon, 09 May 2022 09:38:01 GMT) Full text and rfc822 format available.

Message #14 received at 55315 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattiase <at> acm.org>
To: Simen Heggestøyl <simenheg <at> runbox.com>
Cc: 55315 <at> debbugs.gnu.org
Subject: Re: bug#55315: [elpa/csv-mode] [PATCH] CSV separator guessing
Date: Mon, 9 May 2022 11:37:41 +0200
8 maj 2022 kl. 21.31 skrev Simen Heggestøyl <simenheg <at> runbox.com>:

> Am I right in thinking that the use of
> `regexp-quote' in the `csv-set-separator' case gives the right result?

Yes, I think so. `csv-set-separator` should probably escape the character in `csv--skip-chars`, however:

  (setq-local csv--skip-chars (format "^\n%c" sep))

should be

  (setq-local csv--skip-chars (format "^\n\\%c" sep))

I'm not sure if a separator can be chosen that needs escaping here but better be safe; who knows how the code will be used.





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#55315; Package emacs. (Mon, 09 May 2022 11:05:02 GMT) Full text and rfc822 format available.

Message #17 received at 55315 <at> debbugs.gnu.org (full text, mbox):

From: Simen Heggestøyl <simenheg <at> runbox.com>
To: Mattias Engdegård <mattiase <at> acm.org>
Cc: 55315 <at> debbugs.gnu.org
Subject: Re: bug#55315: [elpa/csv-mode] [PATCH] CSV separator guessing
Date: Mon, 09 May 2022 13:03:48 +0200
[Message part 1 (text/plain, inline)]
Mattias Engdegård <mattiase <at> acm.org> writes:

> 8 maj 2022 kl. 21.31 skrev Simen Heggestøyl <simenheg <at> runbox.com>:
>
>> Am I right in thinking that the use of
>> `regexp-quote' in the `csv-set-separator' case gives the right result?
>
> Yes, I think so. `csv-set-separator` should probably escape the character in `csv--skip-chars`, however:
>
>   (setq-local csv--skip-chars (format "^\n%c" sep))
>
> should be
>
>   (setq-local csv--skip-chars (format "^\n\\%c" sep))
>
> I'm not sure if a separator can be chosen that needs escaping here but
> better be safe; who knows how the code will be used.

Ah, thanks, I misread the docstring of `skip-chars-forward':

  (but not at the end of a range; quoting is never needed there)

I somehow misinterpreted that as quoting not being necessary at the end
of the string fed to `skip-chars-forward'.

Updated patch with your proposed fix attached.

[0001-Add-CSV-separator-guessing-functionality.patch (text/x-diff, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#55315; Package emacs. (Mon, 09 May 2022 11:30:02 GMT) Full text and rfc822 format available.

Message #20 received at 55315 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattiase <at> acm.org>
To: Simen Heggestøyl <simenheg <at> runbox.com>
Cc: 55315 <at> debbugs.gnu.org
Subject: Re: bug#55315: [elpa/csv-mode] [PATCH] CSV separator guessing
Date: Mon, 9 May 2022 13:28:50 +0200
9 maj 2022 kl. 13.03 skrev Simen Heggestøyl <simenheg <at> runbox.com>:

> Updated patch with your proposed fix attached.

Thanks, looks fine with respect to the regexp and skip-set generation.
For the remainder of the patch (the vast bulk) you are probably more qualified to judge!

By the way, thanks for the reference to the CSV wrangling paper.





Reply sent to Simen Heggestøyl <simenheg <at> runbox.com>:
You have taken responsibility. (Thu, 12 May 2022 20:00:02 GMT) Full text and rfc822 format available.

Notification sent to Simen Heggestøyl <simenheg <at> runbox.com>:
bug acknowledged by developer. (Thu, 12 May 2022 20:00:02 GMT) Full text and rfc822 format available.

Message #25 received at 55315-done <at> debbugs.gnu.org (full text, mbox):

From: Simen Heggestøyl <simenheg <at> runbox.com>
To: Mattias Engdegård <mattiase <at> acm.org>
Cc: 55315-done <at> debbugs.gnu.org
Subject: Re: bug#55315: [elpa/csv-mode] [PATCH] CSV separator guessing
Date: Thu, 12 May 2022 21:59:32 +0200
Mattias Engdegård <mattiase <at> acm.org> writes:

> 9 maj 2022 kl. 13.03 skrev Simen Heggestøyl <simenheg <at> runbox.com>:
>
>> Updated patch with your proposed fix attached.
>
> Thanks, looks fine with respect to the regexp and skip-set generation.

Good! Thanks for taking another look. I've merged the patch.

-- Simen




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 10 Jun 2022 11:24:05 GMT) Full text and rfc822 format available.

This bug report was last modified 1 year and 292 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.