GNU bug report logs - #20154
25.0.50; json-encode-string is too slow for large strings


Package: emacs;

Reported by: Dmitry Gutov <dgutov <at> yandex.ru>

Date: Fri, 20 Mar 2015 14:27:01 UTC

Severity: normal

Found in version 25.0.50

Done: Dmitry Gutov <dgutov <at> yandex.ru>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 20154 in the body.
You can then email your comments to 20154 AT debbugs.gnu.org in the normal way.




Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#20154; Package emacs. (Fri, 20 Mar 2015 14:27:01 GMT) Full text and rfc822 format available.

Acknowledgement sent to Dmitry Gutov <dgutov <at> yandex.ru>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Fri, 20 Mar 2015 14:27:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: bug-gnu-emacs <at> gnu.org
Subject: 25.0.50; json-encode-string is too slow for large strings
Date: Fri, 20 Mar 2015 16:26:07 +0200
A 300Kb string takes 0.5s to encode on my machine. Example:

(defvar s (apply #'concat (cl-loop for i from 1 to 30000
                                   collect "0123456789\n")))

(length (json-encode-string s))

For comparison, the built-in json module in my local Python
installation takes only 2ms to do that for the same string.
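
For reference, the timing can be reproduced with the built-in
`benchmark-run' macro; a minimal sketch, assuming the `s' defined above:

(require 'json)
;; Returns (ELAPSED-SECONDS GC-RUNS GC-SECONDS); ~0.5s per the report.
(benchmark-run 1 (json-encode-string s))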

This is important for advanced code completion in general, because JSON
is a common transport format, and sending the contents of the current
buffer to the server is a common approach to avoid needlessly saving it
(and running associated hooks, etc).

And in this specific case, our JSON encoding speed is a bottleneck when
working with ycmd, the editor-agnostic code completion daemon extracted
from a popular Vim package:
https://github.com/company-mode/company-mode/issues/325#issuecomment-83120928

I've tried to reimplement this function using `replace-regexp-in-string'
or `re-search-forward' with a temp buffer, to minimize the number of
concatenations and `json-encode-char' calls in the fast case (all
characters are ASCII), but as long as characters that need to be encoded
(such as newlines) still occur throughout the contents of the string,
the speed improvement is nowhere near the acceptable level. Should it be
written in C?

In GNU Emacs 25.0.50.1 (x86_64-unknown-linux-gnu, GTK+ Version 3.12.2)
 of 2015-03-20 on axl
Repository revision: 8142fc97af742e083fb83e4d0470da59b123a467
Windowing system distributor `The X.Org Foundation', version 11.0.11601901
System Description:	Ubuntu 14.10




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20154; Package emacs. (Fri, 20 Mar 2015 14:35:01 GMT) Full text and rfc822 format available.

Message #8 received at 20154 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: 20154 <at> debbugs.gnu.org
Subject: Re: bug#20154: 25.0.50;
 json-encode-string is too slow for large strings
Date: Fri, 20 Mar 2015 16:34:34 +0200
> From: Dmitry Gutov <dgutov <at> yandex.ru>
> Date: Fri, 20 Mar 2015 16:26:07 +0200
> 
> A 300Kb string takes 0.5s to encode on my machine. Example:
> 
> (defvar s (apply #'concat (cl-loop for i from 1 to 30000
>                                    collect "0123456789\n")))
> 
> (length (json-encode-string s))
> 
> For comparison, the built-in json module in my local Python
> installation takes only 2ms to do that for the same string.
> 
> This is important for advanced code completion in general, because JSON
> is a common transport format, and sending the contents of the current
> buffer to the server is a common approach to avoid needlessly saving it
> (and running associated hooks, etc).
> 
> And in this specific case, our JSON encoding speed is a bottleneck when
> working with ycmd, the editor-agnostic code completion daemon extracted
> from a popular Vim package:
> https://github.com/company-mode/company-mode/issues/325#issuecomment-83120928
> 
> I've tried to reimplement this function using `replace-regexp-in-string'
> or `re-search-forward' with a temp buffer, to minimize the number of
> concatenations and `json-encode-char' calls in the fast case (all
> characters are ASCII), but as long as characters that need to be encoded
> (such as newlines) still occur throughout the contents of the string,
> the speed improvement is nowhere near the acceptable level. Should it be
> written in C?

I suggest to start with a detailed profile of the current
implementation, because only then we will be able to talk
intelligently about what part(s) need(s) to be sped up.
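
Such a profile can be captured with the built-in profiler; a minimal
sketch (the call trees in the following messages are its report format):

(profiler-start 'cpu+mem)          ; sample CPU time and allocations
(length (json-encode-string s))    ; the workload under test
(profiler-report)                  ; display the call trees
(profiler-stop)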

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20154; Package emacs. (Fri, 20 Mar 2015 14:44:01 GMT) Full text and rfc822 format available.

Message #11 received at 20154 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 20154 <at> debbugs.gnu.org
Subject: Re: bug#20154: 25.0.50; json-encode-string is too slow for
 large strings
Date: Fri, 20 Mar 2015 16:43:36 +0200
On 03/20/2015 04:34 PM, Eli Zaretskii wrote:

> I suggest to start with a detailed profile of the current
> implementation,

Any suggestions there? By the way, I've included the example, so you can 
also profile it yourself.

Here's the output of the built-in profiler, after using the current 
implementation:

CPU:

- command-execute                                   1147  79%
 - call-interactively                               1147  79%
  - funcall-interactively                           1082  74%
   - eval-last-sexp                                 1035  71%
    - elisp--eval-last-sexp                         1035  71%
     - eval                                         1035  71%
      - length                                      1035  71%
       - json-encode-string                         1035  71%
        - mapconcat                                  874  60%
           json-encode-char                          571  39%
   + execute-extended-command                         40   2%
   + previous-line                                     7   0%
  + byte-code                                         65   4%
- ...                                                298  20%
   Automatic GC                                      298  20%

Memory:

- command-execute                            255,362,537  99%
 - call-interactively                        255,362,537  99%
  - funcall-interactively                    255,349,159  99%
   - eval-last-sexp                          248,605,484  97%
    - elisp--eval-last-sexp                  248,605,484  97%
     - eval                                  217,011,432  84%
      - length                               217,011,432  84%
       - json-encode-string                  217,011,432  84%
        - mapconcat                           93,689,099  36%
           json-encode-char                   81,784,197  32%
          format                                     814   0%
       elisp--eval-last-sexp-print-value          11,954   0%
     + elisp--preceding-sexp                       2,048   0%
   + execute-extended-command                  6,743,643   2%
   + previous-line                                    32   0%
  + byte-code                                     13,378   0%
+ xselect-convert-to-string                          176   0%
  ...                                                  0   0%
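
For context, the `json-encode-string' being profiled here was essentially
a `mapconcat' over `json-encode-char'; paraphrasing the json.el of that
era:

(defun json-encode-string (string)
  "Return a JSON representation of STRING."
  ;; Encode every single character, then join the results.
  (format "\"%s\"" (mapconcat 'json-encode-char string "")))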





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20154; Package emacs. (Fri, 20 Mar 2015 15:05:01 GMT) Full text and rfc822 format available.

Message #14 received at 20154 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: 20154 <at> debbugs.gnu.org
Subject: Re: bug#20154: 25.0.50;
 json-encode-string is too slow for large strings
Date: Fri, 20 Mar 2015 17:03:57 +0200
> Date: Fri, 20 Mar 2015 16:43:36 +0200
> From: Dmitry Gutov <dgutov <at> yandex.ru>
> CC: 20154 <at> debbugs.gnu.org
> 
> On 03/20/2015 04:34 PM, Eli Zaretskii wrote:
> 
> > I suggest to start with a detailed profile of the current
> > implementation,
> 
> Any suggestions there?

json-encode-char and mapconcat take most of the time, so it seems.

> By the way, I've included the example, so you can also profile it
> yourself.

Yes, I could.  What's your point, though?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20154; Package emacs. (Fri, 20 Mar 2015 15:21:02 GMT) Full text and rfc822 format available.

Message #17 received at 20154 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 20154 <at> debbugs.gnu.org
Subject: Re: bug#20154: 25.0.50; json-encode-string is too slow for
 large strings
Date: Fri, 20 Mar 2015 17:20:25 +0200
On 03/20/2015 05:03 PM, Eli Zaretskii wrote:

> Yes, I could.  What's your point, though?

That if asking the question takes the same time as doing the profiling 
yourself, the latter would be more efficient. I don't really mind, just 
puzzled.

> json-encode-char and mapconcat take most of the time, so it seems.

So it does. But here's an alternative implementation I tried:

(defun json-encode-big-string (str)
  (with-temp-buffer
    (insert str)
    (goto-char (point-min))
    (while (re-search-forward "[\"\\/\b\f\n\r\t]\\|[^ -~]" nil t)
      (replace-match (json-encode-char (char-after (match-beginning 0)))
                     t t))
    (format "\"%s\"" (buffer-string))))

It takes 0.15s here, which is still too long. Here's its profile.

CPU:

- command-execute                                   1245  95%
 - call-interactively                               1245  95%
  - funcall-interactively                           1184  90%
   - eval-last-sexp                                 1140  87%
    - elisp--eval-last-sexp                         1140  87%
     - eval                                         1140  87%
      - length                                      1140  87%
       - json-encode-big-string                     1140  87%
        - let                                       1140  87%
         - save-current-buffer                      1140  87%
          - unwind-protect                          1136  87%
           - progn                                  1136  87%
            - while                                  980  75%
             - replace-match                         332  25%
              - json-encode-char                     212  16%
                 char-after                            4   0%
              format                                   4   0%
   + execute-extended-command                         37   2%
   + previous-line                                     7   0%
  + byte-code                                         61   4%
+ ...                                                 57   4%

Memory:

- command-execute                             76,018,070 100%
 - call-interactively                         76,018,070 100%
  - funcall-interactively                     76,005,728  99%
   - eval-last-sexp                           69,257,352  91%
    - elisp--eval-last-sexp                   69,257,352  91%
     - eval                                   69,242,772  91%
      - length                                69,242,772  91%
       - json-encode-big-string               69,242,772  91%
        - let                                 69,242,772  91%
         - save-current-buffer                69,234,412  91%
          - unwind-protect                    69,040,810  90%
           - progn                            69,033,546  90%
            - while                           55,201,778  72%
             - replace-match                  17,829,052  23%
                json-encode-char              10,471,476  13%
              format                           2,640,256   3%
           generate-new-buffer                     8,360   0%
       elisp--eval-last-sexp-print-value          12,532   0%
     + elisp--preceding-sexp                       2,048   0%
   - execute-extended-command                  6,748,360   8%
    - command-execute                          6,685,480   8%
     - call-interactively                      6,685,480   8%
      - funcall-interactively                  6,685,464   8%
       + profiler-report                       6,681,851   8%
       + profiler-start                            3,613   0%
    + sit-for                                      3,320   0%
   + previous-line                                    16   0%
  + byte-code                                     12,342   0%
  ...                                                  0   0%





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20154; Package emacs. (Fri, 20 Mar 2015 16:04:01 GMT) Full text and rfc822 format available.

Message #20 received at 20154 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: 20154 <at> debbugs.gnu.org
Subject: Re: bug#20154: 25.0.50;
 json-encode-string is too slow for large strings
Date: Fri, 20 Mar 2015 18:02:57 +0200
> Date: Fri, 20 Mar 2015 17:20:25 +0200
> From: Dmitry Gutov <dgutov <at> yandex.ru>
> CC: 20154 <at> debbugs.gnu.org
> 
> On 03/20/2015 05:03 PM, Eli Zaretskii wrote:
> 
> > Yes, I could.  What's your point, though?
> 
> That if asking the question takes the same time as doing the profiling 
> yourself, the latter would be more efficient. I don't really mind, just 
> puzzled.

I have other things on my plate while I read email.  If my advice
bothers you, I can shut up in the future.

>  > json-encode-char and mapconcat take most of the time, so it seems.
> 
> So it does. But here's an alternative implementation I tried:
> 
> (defun json-encode-big-string (str)
>    (with-temp-buffer
>      (insert str)
>      (goto-char (point-min))
>      (while (re-search-forward "[\"\\/\b\f\n\r\t]\\|[^ -~]" nil t)
>        (replace-match (json-encode-char (char-after (match-beginning 0)))
>                       t t))
>      (format "\"%s\"" (buffer-string))))
> 
> It takes 0.15s here, which is still too long.

I suggest to rewrite json-encode-char, it does a lot of unnecessary
stuff, starting with the call to encode-char (which was needed in
Emacs 22 and before, but no more).  The call to rassoc is also
redundant, since you already have that covered in your regexp.
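
For illustration, a trimmed-down encoder along those lines, assuming the
caller's regexp has already established that the character needs escaping
(the name is hypothetical; json.el must be loaded for `json-special-chars'):

(defun json--encode-special-char (char)
  "Escape CHAR for JSON; CHAR is known to need escaping."
  (let ((special (car (rassq char json-special-chars))))
    (if special
        (string ?\\ special)       ; e.g. ?\n => backslash followed by n
      (format "\\u%04x" char))))   ; fallback: \uNNNN code point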




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20154; Package emacs. (Fri, 20 Mar 2015 16:22:01 GMT) Full text and rfc822 format available.

Message #23 received at 20154 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 20154 <at> debbugs.gnu.org
Subject: Re: bug#20154: 25.0.50; json-encode-string is too slow for
 large strings
Date: Fri, 20 Mar 2015 18:21:46 +0200
On 03/20/2015 06:02 PM, Eli Zaretskii wrote:

> I have other things on my plate while I read email.  If my advice
> bothers you, I can shut up in the future.

Very well. Please continue with the advice.

> I suggest to rewrite json-encode-char, it does a lot of unnecessary
> stuff, starting with the call to encode-char (which was needed in
> Emacs 22 and before, but no more).  The call to rassoc is also
> redundant, since you already have that covered in your regexp.

Yes, I thought about that, but since the number of calls to 
`json-encode-char' must have decreased by a factor of 10 in the new 
version (only every 10th character needs to be encoded), while the 
runtime only decreased by a factor of 3 (or 2, in a different example I 
have), the total improvement can't be dramatic enough even if 
`json-encode-char' is lightning-fast.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20154; Package emacs. (Fri, 20 Mar 2015 16:45:01 GMT) Full text and rfc822 format available.

Message #26 received at 20154 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: 20154 <at> debbugs.gnu.org
Subject: Re: bug#20154: 25.0.50;
 json-encode-string is too slow for large strings
Date: Fri, 20 Mar 2015 18:44:35 +0200
> Date: Fri, 20 Mar 2015 18:21:46 +0200
> From: Dmitry Gutov <dgutov <at> yandex.ru>
> CC: 20154 <at> debbugs.gnu.org
> 
> > I suggest to rewrite json-encode-char, it does a lot of unnecessary
> > stuff, starting with the call to encode-char (which was needed in
> > Emacs 22 and before, but no more).  The call to rassoc is also
> > redundant, since you already have that covered in your regexp.
> 
> Yes, I thought about that, but since the number of calls to 
> `json-encode-char' must have decreased by a factor of 10 in the new 
> version (only every 10th character needs to be encoded), while the 
> runtime only decreased by a factor of 3 (or 2, in a different example I 
> have), the total improvement can't be dramatic enough even if 
> `json-encode-char' is lightning-fast.

To see how much of the time is taken by json-encode-char, replace it
with something trivial, like 1+, and see what speedup you get.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20154; Package emacs. (Fri, 20 Mar 2015 16:53:02 GMT) Full text and rfc822 format available.

Message #29 received at 20154 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 20154 <at> debbugs.gnu.org
Subject: Re: bug#20154: 25.0.50; json-encode-string is too slow for
 large strings
Date: Fri, 20 Mar 2015 18:52:26 +0200
On 03/20/2015 06:44 PM, Eli Zaretskii wrote:

> To see how much of the time is taken by json-encode-char, replace it
> with something trivial, like 1+, and see what speedup you get.

Yep. Replacing the second definition with

(defun json-encode-big-string (str)
  (with-temp-buffer
    (insert str)
    (goto-char (point-min))
    (while (re-search-forward "[\"\\/\b\f\n\r\t]\\|[^ -~]" nil t)
      (replace-match "z" t t))
    (format "\"%s\"" (buffer-string))))

still makes it take ~100ms on the example string (as opposed to 2ms in 
the optimized Python implementation).




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20154; Package emacs. (Fri, 20 Mar 2015 17:45:02 GMT) Full text and rfc822 format available.

Message #32 received at 20154 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: 20154 <at> debbugs.gnu.org
Subject: Re: bug#20154: 25.0.50;
 json-encode-string is too slow for large strings
Date: Fri, 20 Mar 2015 19:44:37 +0200
> Date: Fri, 20 Mar 2015 18:52:26 +0200
> From: Dmitry Gutov <dgutov <at> yandex.ru>
> CC: 20154 <at> debbugs.gnu.org
> 
> On 03/20/2015 06:44 PM, Eli Zaretskii wrote:
> 
> > To see how much of the time is taken by json-encode-char, replace it
> > with something trivial, like 1+, and see what speedup you get.
> 
> Yep. Replacing the second definition with
> 
> (defun json-encode-big-string (str)
>    (with-temp-buffer
>      (insert str)
>      (goto-char (point-min))
>      (while (re-search-forward "[\"\\/\b\f\n\r\t]\\|[^ -~]" nil t)
>        (replace-match "z" t t))
>      (format "\"%s\"" (buffer-string))))
> 
> still makes it take ~100ms on the example string (as opposed to 2ms in 
> the optimized Python implementation).

That's not what I see here.  I cannot get the time above 1 sec even
with a 1000 time longer input string, if I replace json-encode-char
with 1+.

So I think your 100ms is the constant overhead of some kind.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20154; Package emacs. (Fri, 20 Mar 2015 18:43:02 GMT) Full text and rfc822 format available.

Message #35 received at 20154 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 20154 <at> debbugs.gnu.org
Subject: Re: bug#20154: 25.0.50; json-encode-string is too slow for
 large strings
Date: Fri, 20 Mar 2015 20:42:14 +0200
On 03/20/2015 07:44 PM, Eli Zaretskii wrote:

> That's not what I see here.  I cannot get the time above 1 sec even
> with a 1000 time longer input string, if I replace json-encode-char
> with 1+.

What code exactly have you tried? You can't just replace 
json-encode-char with 1+. The former returns a string, the latter 
returns a number (or a char, I guess).

> So I think your 100ms is the constant overhead of some kind.

If you just changed the upper bound in the defvar init form (from 30000 
to something), I suspect you forgot to use C-M-x instead of C-x C-e, to 
update the actual value of the variable.

Making the string 10 times longer increases the runtime by ~5 here (0.1 
-> 0.5). Another 10x increase in length makes it run 4.3 seconds.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20154; Package emacs. (Fri, 20 Mar 2015 21:15:02 GMT) Full text and rfc822 format available.

Message #38 received at 20154 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: 20154 <at> debbugs.gnu.org
Subject: Re: bug#20154: 25.0.50;
 json-encode-string is too slow for large strings
Date: Fri, 20 Mar 2015 23:14:26 +0200
> Date: Fri, 20 Mar 2015 20:42:14 +0200
> From: Dmitry Gutov <dgutov <at> yandex.ru>
> CC: 20154 <at> debbugs.gnu.org
> 
> Making the string 10 times longer increases the runtime by ~5 here (0.1 
> -> 0.5). Another 10x increase in length makes it run 4.3 seconds.

So maybe writing this in C is the way to go.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20154; Package emacs. (Fri, 20 Mar 2015 22:03:02 GMT) Full text and rfc822 format available.

Message #41 received at 20154 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 20154 <at> debbugs.gnu.org
Subject: Re: bug#20154: 25.0.50; json-encode-string is too slow for
 large strings
Date: Sat, 21 Mar 2015 00:02:51 +0200
On 03/20/2015 11:14 PM, Eli Zaretskii wrote:

>> Making the string 10 times longer increases the runtime by ~5 here (0.1
>> -> 0.5). Another 10x increase in length makes it run 4.3 seconds.
>
> So maybe writing this in C is the way to go.

Maybe implementing `json-encode-string` itself in C isn't strictly 
necessary, or even particularly advantageous.

How about trying to optimize `replace-match' or 
`replace-regexp-in-string' (which are the main two approaches we can use 
to implement `json-encode-string') for the case of large input?

Take this example:

(setq s1 (apply #'concat (cl-loop for i from 1 to 30000
                                  collect "123456789\n"))
      s2 (apply #'concat (cl-loop for i from 1 to 15000
                                  collect "1234567890123456789\n")))

On my machine,

(replace-regexp-in-string "\n" "z" s1 t t)

takes ~0.13s, while

(replace-regexp-in-string "\n" "z" s2 t t)

clocks at ~0.08-0.10.

Which is, again, pretty slow by modern standards.

(And I've only now realized that the above function is implemented in 
Lisp; all the more reason to move it to C).

Replacing "z" with #'identity (so now we include a function call 
overhead) increases the averages to 0.15s and 0.10s respectively.
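
That is, passing a function as the replacement: `replace-regexp-in-string'
calls it on each matched string and uses its return value. A minimal
sketch of such a timing, assuming the `s1' above:

;; #'identity replaces each match with itself, so only the
;; function-call overhead is added to the measurement.
(benchmark-run 1 (replace-regexp-in-string "\n" #'identity s1 t t))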




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20154; Package emacs. (Fri, 20 Mar 2015 22:27:01 GMT) Full text and rfc822 format available.

Message #44 received at 20154 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 20154 <at> debbugs.gnu.org
Subject: Re: bug#20154: 25.0.50; json-encode-string is too slow for
 large strings
Date: Sat, 21 Mar 2015 00:26:03 +0200
As per your comment, this seems to be the best we can do for long 
strings, without diving into C:

(defun json-encode-string-1 (string)
  "Return a JSON representation of STRING."
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    ;; Skip over ASCIIish printable characters.
    (while (re-search-forward "\\([\"\\/\b\f\n\r\t]\\)\\|[^ -~]" nil t)
      (replace-match
       (if (match-beginning 1)
           ;; Special JSON character (\n, \r, etc.).
           (format "\\%c" (char-after (match-beginning 0)))
         ;; Fallback: UCS code point in \uNNNN form.
         (format "\\u%04x" (char-after (match-beginning 0))))
       t t))
    (format "\"%s\"" (buffer-string))))

It brings the execution time down to ~0.14s here, on the same example.

And there'll need to be a fallback for short strings, because 
`with-temp-buffer' overhead is non-trivial.
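
That fallback could be a simple length-based dispatch; a minimal sketch
(the 1000-character threshold is an arbitrary guess, not a measured
cutoff):

(defun json--encode-string-dispatch (string)
  ;; Hypothetical dispatcher: the mapconcat-based `json-encode-string'
  ;; is cheaper for short inputs; the buffer-based `json-encode-string-1'
  ;; wins on long ones.
  (if (< (length string) 1000)
      (json-encode-string string)
    (json-encode-string-1 string)))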




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20154; Package emacs. (Sat, 21 Mar 2015 07:59:01 GMT) Full text and rfc822 format available.

Message #47 received at 20154 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: 20154 <at> debbugs.gnu.org
Subject: Re: bug#20154: 25.0.50;
 json-encode-string is too slow for large strings
Date: Sat, 21 Mar 2015 09:58:43 +0200
> Date: Sat, 21 Mar 2015 00:02:51 +0200
> From: Dmitry Gutov <dgutov <at> yandex.ru>
> CC: 20154 <at> debbugs.gnu.org
> 
> On 03/20/2015 11:14 PM, Eli Zaretskii wrote:
> 
> >> Making the string 10 times longer increases the runtime by ~5 here (0.1
> >> -> 0.5). Another 10x increase in length makes it run 4.3 seconds.
> >
> > So maybe writing this in C is the way to go.
> 
> Maybe implementing `json-encode-string` itself in C isn't strictly 
> necessary, or even particularly advantageous.

It depends on your requirements.  How fast would it need to run to
satisfy your needs?

> How about trying to optimize `replace-match' or 
> `replace-regexp-in-string' (which are the main two approaches we can use 
> to implement `json-encode-string') for the case of large input?
> 
> Take this example:
> 
> (setq s1 (apply #'concat (cl-loop for i from 1 to 30000
>                                    collect "123456789\n"))
>        s2 (apply #'concat (cl-loop for i from 1 to 15000
>                                    collect "1234567890123456789\n")))
> 
> On my machine,
> 
> (replace-regexp-in-string "\n" "z" s1 t t)
> 
> takes ~0.13s, while
> 
> (replace-regexp-in-string "\n" "z" s2 t t)
> 
> clocks at ~0.08-0.10.
> 
> Which is, again, pretty slow by modern standards.

You don't really need regexp replacement functions with all its
features here, do you?  What you need is a way to skip characters that
are "okay", then replace the character that is "not okay" with its
encoded form, then repeat.

So an alternative strategy would be to use 'skip-chars-forward' to
skip to the next locus where encoding is necessary, and 'append' to
construct the output string from the "okay" parts and the encoded "not
okay" part.  This bypasses most of the unneeded complications in
'replace-match' and 'replace-regexp-in-string'.  I don't know if that
will be faster, but I think it's worth trying.  For starters, how fast
can you iterate through the string with 'skip-chars-forward', stopping
at characters that need encoding, without actually encoding them, but
just consing the output string by appending the parts delimited by
places where 'skip-chars-forward' stopped?  That's the lower bound on
performance using this method.
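
A minimal sketch of that lower-bound probe (an illustration, not from the
thread; the skip set is printable ASCII minus the JSON specials `"', `/'
and `\'):

(defun json--skip-probe (string)
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    (let (parts)
      (while (not (eobp))
        (let ((beg (point)))
          ;; Skip runs of characters that need no encoding.
          (skip-chars-forward " !#-.0-[]-~")
          (push (buffer-substring beg (point)) parts)
          ;; Step over the character that would need encoding.
          (unless (eobp) (forward-char 1))))
      (apply #'concat (nreverse parts)))))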

> (And I've only now realized that the above function is implemented in 
> Lisp; all the more reason to move it to C).

I think the latest tendency is the opposite: move to Lisp everything
that doesn't need to be in C.  If some specific application needs more
speed than we can provide, the first thing I'd try is think of a new
primitive by abstracting your use case enough to be more useful than
just for JSON.

Of course, implementing the precise use case in C first is probably a
prerequisite, since it could turn out that the problem is somewhere
else, or that even in C you won't get the speed you want.

> Replacing "z" with #'identity (so now we include a function call 
> overhead) increases the averages to 0.15s and 0.10s respectively.

Sounds like the overhead of the Lisp interpreter is a significant
factor here, no?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20154; Package emacs. (Sat, 21 Mar 2015 08:08:01 GMT) Full text and rfc822 format available.

Message #50 received at 20154 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: 20154 <at> debbugs.gnu.org
Subject: Re: bug#20154: 25.0.50;
 json-encode-string is too slow for large strings
Date: Sat, 21 Mar 2015 10:07:22 +0200
> Date: Sat, 21 Mar 2015 00:26:03 +0200
> From: Dmitry Gutov <dgutov <at> yandex.ru>
> CC: 20154 <at> debbugs.gnu.org
> 
> As per your comment, this seems to be the best we can do for long 
> strings, without diving into C:
> 
> (defun json-encode-string-1 (string)
>    "Return a JSON representation of STRING."
>    (with-temp-buffer
>      (insert string)
>      (goto-char (point-min))
>      ;; Skip over ASCIIish printable characters.
>      (while (re-search-forward "\\([\"\\/\b\f\n\r\t]\\)\\|[^ -~]" nil t)
>        (replace-match
>         (if (match-beginning 1)
>             ;; Special JSON character (\n, \r, etc.).
>             (format "\\%c" (char-after (match-beginning 0)))

Do you really need 'format' here?  Why not

    (concat "\\" (char-to-string (char-after (match-beginning 0))))

instead?

Or even simply insert these two parts at the match point, after
deleting the match.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20154; Package emacs. (Sat, 21 Mar 2015 08:14:02 GMT) Full text and rfc822 format available.

Message #53 received at 20154 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: dgutov <at> yandex.ru
Cc: 20154 <at> debbugs.gnu.org
Subject: Re: bug#20154: 25.0.50;
 json-encode-string is too slow for large strings
Date: Sat, 21 Mar 2015 10:12:55 +0200
> Date: Sat, 21 Mar 2015 09:58:43 +0200
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: 20154 <at> debbugs.gnu.org
> 
> So an alternative strategy would be to use 'skip-chars-forward' to
> skip to the next locus where encoding is necessary, and 'append' to
                                                          ^^^^^^^^
Sorry, I meant 'concat', of course.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20154; Package emacs. (Sat, 21 Mar 2015 20:01:02 GMT) Full text and rfc822 format available.

Message #56 received at 20154 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 20154 <at> debbugs.gnu.org
Subject: Re: bug#20154: 25.0.50; json-encode-string is too slow for
 large strings
Date: Sat, 21 Mar 2015 22:00:46 +0200
On 03/21/2015 09:58 AM, Eli Zaretskii wrote:

> It depends on your requirements.  How fast would it need to run to
> satisfy your needs?

In this case, the buffer contents are encoded to JSON at most once per 
keypress. So 50ms or below should be fast enough, especially since most 
files are smaller than that.

Of course, I'm sure there are use cases for fast JSON encoding/decoding 
of even bigger volumes of data, but they can probably wait until we have 
FFI.

> You don't really need regexp replacement functions with all its
> features here, do you?  What you need is a way to skip characters that
> are "okay", then replace the character that is "not okay" with its
> encoded form, then repeat.

It doesn't seem like regexp searching is the slow part: save for the GC 
pauses, looking for the non-matching regexp in the same string -

(replace-regexp-in-string "x" "z" s1 t t)

- only takes ~3ms.

And likewise, after changing them to use `concat' instead of `format', 
both alternative json-encode-string implementations that I have "encode" 
a numbers-only (without newlines) string of the same length in a few 
milliseconds. Again, save for the GC pauses, which can add 30-40ms.

> For starters, how fast
> can you iterate through the string with 'skip-chars-forward', stopping
> at characters that need encoding, without actually encoding them, but
> just consing the output string by appending the parts delimited by
> places where 'skip-chars-forward' stopped?  That's the lower bound on
> performance using this method.

70-90ms if we simply skip 0-9, even without nreverse-ing and 
concatenating. But the change in runtime after adding an (apply #'concat 
(nreverse res)) step looks statistically insignificant. Here's 
the implementation I tried:

(defun foofoo (string)
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    (let (res)
      (while (not (eobp))
        (let ((skipped (skip-chars-forward "0-9")))
          (push (buffer-substring (- (point) skipped) (point))
                res))
        (forward-char 1))
      res)))

But that actually goes down to 30ms if we don't accumulate the result.
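
Presumably along these lines; a sketch with the accumulation dropped, so
only the skipping itself is measured:

(defun foofoo-no-acc (string)
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    (while (not (eobp))
      (skip-chars-forward "0-9")
      ;; The guard avoids an end-of-buffer error when the
      ;; string ends with a digit run.
      (unless (eobp) (forward-char 1)))))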

> I think the latest tendency is the opposite: move to Lisp everything
> that doesn't need to be in C.

Yes, and often that's great, if we're dealing with some piece of UI 
infrastructure that only gets called at most a few times per command, 
with inputs of size we can anticipate in advance.

> If some specific application needs more
> speed than we can provide, the first thing I'd try is think of a new
> primitive by abstracting your use case enough to be more useful than
> just for JSON.

That's why I suggested to do that with `replace-regexp-in-string' first. 
That's a very common feature, and in Python and Ruby it's written in C. 
Ruby's calling convention is even pretty close (the replacement can be a 
string, or it can take a block, which is a kind of a function).

> Of course, implementing the precise use case in C first is probably a
> prerequisite, since it could turn out that the problem is somewhere
> else, or that even in C you won't get the speed you want.

A fast `replace-regexp-in-string' may not get us where I want, but it 
should get us close. It will still be generally useful, and it'll save 
us from having two `json-encode-string' implementations - for long and 
short strings.

>> Replacing "z" with #'identity (so now we include a function call
>> overhead) increases the averages to 0.15s and 0.10s respectively.
>
> Sounds like the overhead of the Lisp interpreter is a significant
> factor here, no?

Yes and no. Given the 50ms budget, I think we can live with it for now, 
when it's the only problem.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20154; Package emacs. (Sat, 21 Mar 2015 20:27:02 GMT) Full text and rfc822 format available.

Message #59 received at 20154 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: 20154 <at> debbugs.gnu.org
Subject: Re: bug#20154: 25.0.50;
 json-encode-string is too slow for large strings
Date: Sat, 21 Mar 2015 22:25:48 +0200
> Date: Sat, 21 Mar 2015 22:00:46 +0200
> From: Dmitry Gutov <dgutov <at> yandex.ru>
> CC: 20154 <at> debbugs.gnu.org
> 
> On 03/21/2015 09:58 AM, Eli Zaretskii wrote:
> 
> > It depends on your requirements.  How fast would it need to run to
> > satisfy your needs?
> 
> In this case, the buffer contents are encoded to JSON at most once per 
> keypress. So 50ms or below should be fast enough, especially since most 
> files are smaller than that.

So each keypress you need to encode the whole buffer, including the
last keypress and all those before it?

I guess I don't really understand why each keypress should trigger
encoding of the whole buffer.

> > You don't really need regexp replacement functions with all its
> > features here, do you?  What you need is a way to skip characters that
> > are "okay", then replace the character that is "not okay" with its
> > encoded form, then repeat.
> 
> It doesn't seem like regexp searching is the slow part: save for the GC 
> pauses, looking for the non-matching regexp in the same string -
> 
> (replace-regexp-in-string "x" "z" s1 t t)
> 
> - only takes ~3ms.

Then a series of calls to replace-regexp-in-string, one each for every
one of the "special" characters, should get you close to your goal,
right?

> And likewise, after changing them to use `concat' instead of `format', 
> both alternative json-encode-string implementations that I have "encode" 
> a numbers-only (without newlines) string of the same length in a few 
> milliseconds. Again, save for the GC pauses, which can add 30-40ms.

So does this mean you have your solution?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20154; Package emacs. (Sat, 21 Mar 2015 21:07:01 GMT) Full text and rfc822 format available.

Message #62 received at 20154 <at> debbugs.gnu.org (full text, mbox):

From: Drew Adams <drew.adams <at> oracle.com>
To: Dmitry Gutov <dgutov <at> yandex.ru>, Eli Zaretskii <eliz <at> gnu.org>
Cc: 20154 <at> debbugs.gnu.org
Subject: RE: bug#20154: 25.0.50; json-encode-string is too slow for large
 strings
Date: Sat, 21 Mar 2015 14:05:55 -0700 (PDT)
> > I think the latest tendency is the opposite: move to Lisp everything
> > that doesn't need to be in C.
> 
> Yes, and often that's great, if we're dealing with some piece of UI
> infrastructure that only gets called at most a few times per command,
> with inputs of size we can anticipate in advance.

(FYI, I'm not following this thread.)  I will just say that if you want
or need to have something like `json-encode-string' be coded in C for
speed, an alternative might be for the actual code to invoke a Lisp
function when bound to a variable, e.g., `json-encode-string-function'.

That is, it can be OK to define something like the default encoding
of JSON in C, but perhaps you can give users the possibility of
optionally providing their own encoding Lisp function as well.  What
would, I think, be too bad would be to make it impossible or difficult
for users to provide their own encoding function (without messing with
C and rebuilding Emacs).




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20154; Package emacs. (Sat, 21 Mar 2015 21:11:01 GMT) Full text and rfc822 format available.

Message #65 received at 20154 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 20154 <at> debbugs.gnu.org
Subject: Re: bug#20154: 25.0.50; json-encode-string is too slow for
 large strings
Date: Sat, 21 Mar 2015 23:09:57 +0200
On 03/21/2015 10:07 AM, Eli Zaretskii wrote:

> Do you really need 'format' here?  Why not
>
>      (concat "\\" (char-to-string (char-after (match-beginning 0))))
>
> instead?
>
> Or even simply insert these two parts at the match point, after
> deleting the match.

Yes, thanks. It gave a small improvement, by 5ms or so. Here's the 
updated definition (by the way, `json-special-chars' is still needed, to 
convert ?\n to ?n, and so on, and the performance hit is negligible).

(defun json-encode-string-1 (string)
  "Return a JSON representation of STRING."
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    ;; Skip over ASCIIish printable characters.
    (while (re-search-forward "\\([\"\\/\b\f\n\r\t]\\)\\|[^ -~]" nil t)
      (let ((c (char-before)))
        (delete-region (1- (point)) (point))
        (if (match-beginning 1)
            ;; Special JSON character (\n, \r, etc.).
            (insert "\\" (car (rassoc c json-special-chars)))
          ;; Fallback: UCS code point in \uNNNN form.
          (insert (format "\\u%04x" c)))))
    (concat "\"" (buffer-string) "\"")))
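
As an aside, that `rassoc' lookup maps a raw character back to its escape
letter; for instance:

(require 'json)
(car (rassoc ?\n json-special-chars)) ; => ?n
(car (rassoc ?\t json-special-chars)) ; => ?t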

Further, it seems I didn't measure its performance well enough to begin 
with. The average time over 10 runs comes down to 85ms.

Compare it to 150ms, from this implementation:

(defun json-encode-string-2 (string)
  "Return a JSON representation of STRING."
  (concat "\""
          (replace-regexp-in-string
           "\\([\"\\/\b\f\n\r\t]\\)\\|[^ -~]"
           (lambda (s)
             (if (match-beginning 1)
                 (format "\\%c" (car (rassoc (string-to-char s)
                                             json-special-chars)))
               (format "\\u%04x" (string-to-char s))))
           string t t)
          "\""))





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20154; Package emacs. (Sat, 21 Mar 2015 21:27:01 GMT) Full text and rfc822 format available.

Message #68 received at 20154 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 20154 <at> debbugs.gnu.org
Subject: Re: bug#20154: 25.0.50; json-encode-string is too slow for
 large strings
Date: Sat, 21 Mar 2015 23:26:19 +0200
On 03/21/2015 10:25 PM, Eli Zaretskii wrote:

> So each keypress you need to encode the whole buffer, including the
> last keypress and all those before it?

Pretty much. The ycmd server uses caching heavily, so it's not bothered 
by the frequent requests. And when extracting it from the YCM Vim 
package, the author measured the transport overhead, saw it was 
negligible, and went with the "send everything" approach. Here's the 
blog post about it:

https://plus.google.com/+StrahinjaMarković/posts/Zmr5uf2jCHm

(He says there that the json module used is pure Python; he's most 
likely mistaken about that part.)

> I guess I don't really understand why each keypress should trigger
> encoding of the whole buffer.

It's not necessary, just the recommended workflow. The server can take 
it: 
https://github.com/company-mode/company-mode/issues/325#issuecomment-83154084, 
and this way the suggestions reach the user the soonest.

Of course, we can wait until Emacs is idle for a bit, but even so, if 
encoding takes 100ms (never mind the 500ms it takes now), that can 
create visible stutters where there don't have to be any, if the user 
starts typing again in the middle of it.

>> (replace-regexp-in-string "x" "z" s1 t t)
>>
>> - only takes ~3ms.
>
> Then a series of calls to replace-regexp-in-string, one each for every
> one of the "special" characters, should get you close to your goal,
> right?

No no no. There are no "x" characters in s1. I just wanted to 
demonstrate that the regexp searching by itself is not a bottleneck, so 
`skip-chars-forward' isn't really warranted. As long as we're replacing 
an actual character present in the string, it takes well above 3ms.

>> And likewise, after changing them to use `concat' instead of `format',
>> both alternative json-encode-string implementations that I have "encode"
>> a numbers-only (without newlines) string of the same length in a few
>> milliseconds. Again, save for the GC pauses, which can add 30-40ms.
>
> So does this mean you have your solution?

No. An actual buffer has lots of newlines, which need to be encoded. 
Again, the above is about the speed of the regexp engine.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20154; Package emacs. (Sat, 21 Mar 2015 21:33:01 GMT) Full text and rfc822 format available.

Message #71 received at 20154 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Drew Adams <drew.adams <at> oracle.com>, Eli Zaretskii <eliz <at> gnu.org>
Cc: 20154 <at> debbugs.gnu.org
Subject: Re: bug#20154: 25.0.50; json-encode-string is too slow for
 large strings
Date: Sat, 21 Mar 2015 23:32:49 +0200
On 03/21/2015 11:05 PM, Drew Adams wrote:

> (FYI, I'm not following this thread.)  I will just say that if you want
> or need to have something like `json-encode-string' be coded in C for
> speed, an alternative might be for the actual code to invoke a Lisp
> function when bound to a variable, e.g., `json-encode-string-function'.

Rather, `json-encode-function'.

But that's solving a different problem, one we'd be lucky to have.

> What
> would, I think, be too bad would be to make it impossible or difficult
> for users to provide their own encoding function (without messing with
> C and rebuilding Emacs).

Yes, without FFI in Emacs that's pretty useless.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20154; Package emacs. (Sat, 21 Mar 2015 22:22:02 GMT) Full text and rfc822 format available.

Message #74 received at 20154 <at> debbugs.gnu.org (full text, mbox):

From: Ivan Shmakov <ivan <at> siamics.net>
To: 20154 <at> debbugs.gnu.org
Subject: Re: bug#20154: 25.0.50;
 json-encode-string is too slow for large strings 
Date: Sat, 21 Mar 2015 22:20:51 +0000
>>>>> Dmitry Gutov <dgutov <at> yandex.ru> writes:

[…]

 > Here's the updated definition (by the way, `json-special-chars' is
 > still needed, to convert ?\n to ?n, and so on, and the performance
 > hit is negligible).

	Perhaps a plain vector may fit there?

 > (defun json-encode-string-1 (string)
 >   "Return a JSON representation of STRING."
 >   (with-temp-buffer
 >     (insert string)
 >     (goto-char (point-min))
 >     ;; Skip over ASCIIish printable characters.
 >     (while (re-search-forward "\\([\"\\/\b\f\n\r\t]\\)\\|[^ -~]" nil t)
 >       (let ((c (char-before)))
 >         (delete-region (1- (point)) (point))
 >         (if (match-beginning 1)
 >             ;; Special JSON character (\n, \r, etc.).
 >             (insert "\\" (car (rassoc c json-special-chars)))
 >           ;; Fallback: UCS code point in \uNNNN form.
 >           (insert (format "\\u%04x" c)))))
 >     (concat "\"" (buffer-string) "\"")))

	FWIW, using replace-match in the loop seems to speed up the
	routine by another few percent.

    (while (re-search-forward "\\([\"\\/\b\f\n\r\t]\\)\\|[^ -~]" nil t)
      (let ((c (char-before)))
        (replace-match
         (if (match-beginning 1)
             ;; Special JSON character (\n, \r, etc.).
             (string ?\\ (car (rassq c json-special-chars)))
           ;; Fallback: UCS code point in \uNNNN form.
           (format "\\u%04x" c))
         t t)))

[…]

-- 
FSF associate member #7257  http://boycottsystemd.org/  … 3013 B6A0 230E 334A




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20154; Package emacs. (Sat, 21 Mar 2015 23:38:02 GMT) Full text and rfc822 format available.

Message #77 received at 20154 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: 20154 <at> debbugs.gnu.org
Cc: Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#20154: 25.0.50; json-encode-string is too slow for
 large strings
Date: Sun, 22 Mar 2015 01:36:55 +0200
On 03/22/2015 12:20 AM, Ivan Shmakov wrote:

> 	Perhaps a plain vector may fit there?

Where?

> 	FWIW, using replace-match in the loop seem to speed up the
> 	routine by another few percents.

Indeed, it seems so, counter to Eli's advice earlier. Just by a bit.

Anyway, the small boost is nice to have, but the buffer-based 
implementation is actually worse than the current one on small strings 
(because of `with-temp-buffer'). So I don't think we can simply replace 
it. A fast `replace-regexp-in-string' implementation would make that 
possible.

P.S. Please keep the discussion participants in Cc, even if you prefer 
not to receive the copy email.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20154; Package emacs. (Sun, 22 Mar 2015 14:53:02 GMT) Full text and rfc822 format available.

Message #80 received at 20154 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 20154 <at> debbugs.gnu.org
Subject: Re: bug#20154: 25.0.50; json-encode-string is too slow for
 large strings
Date: Sun, 22 Mar 2015 16:52:03 +0200
Here's the version I've arrived at:

(defun json-encode-string-3 (string)
  "Return a JSON representation of STRING."
  ;; Reimplement the meat of `replace-regexp-in-string', for
  ;; performance (bug#20154).
  (let ((l (length string))
        (start 0)
        (res (list "\"")))
    ;; Skip over ASCIIish printable characters.
    (while (string-match "[\"\\/\b\f\n\r\t]\\|[^ -~]" string start)
      (let* ((mb (match-beginning 0))
             (c (aref string mb))
             (special (rassoc c json-special-chars)))
        (push (substring string start mb) res)
        (push (if special
                  ;; Special JSON character (\n, \r, etc.).
                  (string ?\\ (car special))
                ;; Fallback: UCS code point in \uNNNN form.
                (format "\\u%04x" c))
              res)
        (setq start (1+ mb))))
    (push (substring string start l) res)
    (push "\"" res)
    (apply #'concat (nreverse res))))

A bit slower than the temp-buffer version (90ms vs 80ms), but 
consistently faster than the current version, even on small strings. 
Probably the best we can do in Lisp.
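
The comparison can be reproduced along these lines (a sketch, assuming
the 300Kb test string `s' from the start of the thread):

;; Each form returns (ELAPSED-SECONDS GC-RUNS GC-SECONDS),
;; summed over 10 runs.
(benchmark-run 10 (json-encode-string-3 s))  ; this version
(benchmark-run 10 (json-encode-string-1 s))  ; temp-buffer version
(benchmark-run 10 (json-encode-string s))    ; current mapconcat version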

If no one has any better ideas, I'm going to install it.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20154; Package emacs. (Sun, 22 Mar 2015 16:16:02 GMT) Full text and rfc822 format available.

Message #83 received at 20154 <at> debbugs.gnu.org (full text, mbox):

From: Ivan Shmakov <ivan <at> siamics.net>
To: 20154 <at> debbugs.gnu.org, Dmitry Gutov <dgutov <at> yandex.ru>
Subject: Re: bug#20154: 25.0.50;
 json-encode-string is too slow for large strings 
Date: Sun, 22 Mar 2015 16:15:48 +0000
>>>>> Dmitry Gutov <dgutov <at> yandex.ru> writes:

 > (defun json-encode-string-3 (string)
 >   "Return a JSON representation of STRING."
 >   (let ((l (length string))
 >         (start 0)
 >         (res (list "\"")))
 >     ;; Skip over ASCIIish printable characters.
 >     (while (string-match "[\"\\/\b\f\n\r\t]\\|[^ -~]" string start)
 >       (let* ((mb (match-beginning 0))

	Why not ‘let’ mb above and use (while (setq mb (string-match …))
	…) here (instead of going through match-beginning)?

 >              (c (aref string mb))
 >              (special (rassoc c json-special-chars)))

	Is there a specific reason to prefer rassoc over rassq here?

 >         (push (substring string start mb) res)
 >         (push (if special
 >                   ;; Special JSON character (\n, \r, etc.).
 >                   (string ?\\ (car special))
 >                 ;; Fallback: UCS code point in \uNNNN form.
 >                 (format "\\u%04x" c))
 >               res)
 >         (setq start (1+ mb))))
 >     (push (substring string start l) res)
 >     (push "\"" res)
 >     (apply #'concat (nreverse res))))

	I guess you can (apply #'concat "\"" (substring …) (nreverse …))
	just as well, instead of pushing to the list just before getting
	rid of it.

[…]

 > Please keep the discussion participants in Cc, even if you prefer not
 > to receive the copy email.

	Curiously, per my experience, the practice of Cc:-ing the
	subscribers tends to be frowned upon when it comes to the lists
	dedicated to free software.  For the reasons I do not know or
	understand, Emacs lists are apparently an exception, though.

	I don’t seem to recall any issues with me trying to stick to the
	custom (of /not/ Cc:-ing) I’ve learned earlier; but if
	necessary, I surely can remember to Cc: those who so request.

-- 
FSF associate member #7257  http://boycottsystemd.org/  … 3013 B6A0 230E 334A




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20154; Package emacs. (Sun, 22 Mar 2015 16:48:02 GMT) Full text and rfc822 format available.

Message #86 received at 20154 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: 20154 <at> debbugs.gnu.org
Subject: Re: bug#20154: 25.0.50; json-encode-string is too slow for
 large strings
Date: Sun, 22 Mar 2015 18:47:24 +0200
On 03/22/2015 06:15 PM, Ivan Shmakov wrote:

> 	Why not ‘let’ mb above and use (while (setq mb (string-match …))
> 	…) here (instead of going through match-beginning)?

Good point, thanks. It wins a few milliseconds.

> 	Is there a specific reason to prefer rassoc over rassq here?

Not at all. Good call, though no performance improvement.

>   >     (push (substring string start l) res)
>   >     (push "\"" res)
>   >     (apply #'concat (nreverse res))))
>
> 	I guess you can (apply #'concat "\"" (substring …) (nreverse …))
> 	just as well, instead of pushing to the list just before getting
> 	rid of it.

Also a good idea, but it only works partially. That gets rid of the 
initial binding for `res', but the (substring ...) value and the closing 
quote have to go at the end of the string; we can't pass them as the 
last arguments to `apply'.
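
In other words, only the opening quote can become a direct argument,
since `apply' spreads only its final list argument; the trailing pieces
have to be folded into the list before reversing. A minimal sketch: the
last three forms of `json-encode-string-3' could become

(apply #'concat "\""
       (nreverse (cons "\"" (cons (substring string start) res))))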

>   > Please keep the discussion participants in Cc, even if you prefer not
>   > to receive the copy email.
>
> 	Curiously, per my experience, the practice of Cc:-ing the
> 	subscribers tends to be frowned upon when it comes to the lists
> 	dedicated to free software.  For the reasons I do not know or
> 	understand, Emacs lists are apparently an exception, though.

To the best of my knowledge, debbugs only sends a copy to the bug's 
author, and there's no way to subscribe. So that excludes Eli (although 
he probably subscribes to all bugs anyway).




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20154; Package emacs. (Sun, 22 Mar 2015 16:51:02 GMT) Full text and rfc822 format available.

Message #89 received at 20154 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: 20154 <at> debbugs.gnu.org
Subject: Re: bug#20154: 25.0.50;
 json-encode-string is too slow for large strings
Date: Sun, 22 Mar 2015 18:50:05 +0200
> Date: Sun, 22 Mar 2015 16:52:03 +0200
> From: Dmitry Gutov <dgutov <at> yandex.ru>
> CC: 20154 <at> debbugs.gnu.org
> 
> (defun json-encode-string-3 (string)
>    "Return a JSON representation of STRING."
>    ;; Reimplement the meat of `replace-regexp-in-string', for
>    ;; performance (bug#20154).
>    (let ((l (length string))
>          (start 0)
>          (res (list "\"")))
>      ;; Skip over ASCIIish printable characters.
>      (while (string-match "[\"\\/\b\f\n\r\t]\\|[^ -~]" string start)
>        (let* ((mb (match-beginning 0))
>               (c (aref string mb))
>               (special (rassoc c json-special-chars)))

Did you try a 'cond' with specific characters instead of calling
rassoc every time?  The list in json-special-chars is not long, is it?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20154; Package emacs. (Sun, 22 Mar 2015 16:52:01 GMT) Full text and rfc822 format available.

Message #92 received at 20154 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Ivan Shmakov <ivan <at> siamics.net>
Cc: dgutov <at> yandex.ru, 20154 <at> debbugs.gnu.org
Subject: Re: bug#20154: 25.0.50;
 json-encode-string is too slow for large strings
Date: Sun, 22 Mar 2015 18:51:00 +0200
> From: Ivan Shmakov <ivan <at> siamics.net>
> Date: Sun, 22 Mar 2015 16:15:48 +0000
> 
>  > Please keep the discussion participants in Cc, even if you prefer not
>  > to receive the copy email.
> 
> 	Curiously, per my experience, the practice of Cc:-ing the
> 	subscribers tends to be frowned upon when it comes to the lists
> 	dedicated to free software.  For the reasons I do not know or
> 	understand, Emacs lists are apparently an exception, though.

The lists that require you NOT to CC are a nuisance, IMO.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20154; Package emacs. (Sun, 22 Mar 2015 17:11:02 GMT) Full text and rfc822 format available.

Message #95 received at 20154 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 20154 <at> debbugs.gnu.org
Subject: Re: bug#20154: 25.0.50; json-encode-string is too slow for
 large strings
Date: Sun, 22 Mar 2015 19:10:02 +0200
On 03/22/2015 06:50 PM, Eli Zaretskii wrote:

> Did you try a 'cond' with specific characters instead of calling
> rassoc every time?  The list in json-special-chars is not long, is it?

Tried that now, but it only made performance worse, by a few ms.

And anyway, I wouldn't want to duplicate json-special-chars, that list 
is also used in another function.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20154; Package emacs. (Sun, 22 Mar 2015 17:32:01 GMT) Full text and rfc822 format available.

Message #98 received at 20154 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: 20154 <at> debbugs.gnu.org
Subject: Re: bug#20154: 25.0.50;
 json-encode-string is too slow for large strings
Date: Sun, 22 Mar 2015 19:31:24 +0200
> Date: Sat, 21 Mar 2015 23:26:19 +0200
> From: Dmitry Gutov <dgutov <at> yandex.ru>
> CC: 20154 <at> debbugs.gnu.org
> 
> On 03/21/2015 10:25 PM, Eli Zaretskii wrote:
> 
> > So each keypress you need to encode the whole buffer, including the
> > last keypress and all those before it?
> 
> Pretty much. Ycmd server uses caching heavily, so it's not bothered by 
> the frequent requests. And when extracting it from the YCM Vim package, 
> the author measured the transport overhead, saw it's negligible, and 
> went with the "send everything" approach. Here's the blog post about it:
> 
> https://plus.google.com/+StrahinjaMarković/posts/Zmr5uf2jCHm
> 
> (He says there that the json module used is pure Python; he's most 
> likely mistaken about that part.)
> 
> > I guess I don't really understand why each keypress should trigger
> > encoding of the whole buffer.
> 
> It's not necessary, just the recommended workflow. The server can take 
> it: 
> https://github.com/company-mode/company-mode/issues/325#issuecomment-83154084, 
> and this way the suggestions reach the user the soonest.

I understand why you _send_ everything, but not why you need to
_encode_ everything.  Why not encode only the new stuff?

> >> (replace-regexp-in-string "x" "z" s1 t t)
> >>
> >> - only takes ~3ms.
> >
> > Then a series of calls to replace-regexp-in-string, one each for every
> > one of the "special" characters, should get you close to your goal,
> > right?
> 
> No no no. There are no "x" characters in s1.

I know.  I meant something like

  (replace-regexp-in-string "\n" "\\n" s1 t t)
  (replace-regexp-in-string "\f" "\\f" s1 t t)

etc.  After all, the list of characters to be encoded is not very
long, is it?
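
Chained together, that would look roughly like this (untested; the
name is made up, only a few of the escapes are shown, and non-ASCII
characters are left alone):

  ;; Untested sketch.  Backslash must be replaced first; otherwise
  ;; the backslashes introduced by the later passes would get
  ;; doubled by it.
  (defun my-encode-string (s)
    (dolist (pair '(("\\" . "\\\\")
                    ("\"" . "\\\"")
                    ("\n" . "\\n")
                    ("\f" . "\\f")
                    ("\r" . "\\r")
                    ("\t" . "\\t")))
      (setq s (replace-regexp-in-string (regexp-quote (car pair))
                                        (cdr pair) s t t)))
    (concat "\"" s "\""))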

> > So does this mean you have your solution?
> 
> No. An actual buffer has lots of newlines, which need to be encoded. 
> Again, the above is about the speed of the regexp engine.

But when you've encoded them once, you only need to encode the
additions, no?  If you can do this incrementally, the amount of work
for each keystroke will be much smaller, I think.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20154; Package emacs. (Sun, 22 Mar 2015 17:44:01 GMT) Full text and rfc822 format available.

Message #101 received at 20154 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: 20154 <at> debbugs.gnu.org
Subject: Re: bug#20154: 25.0.50;
 json-encode-string is too slow for large strings
Date: Sun, 22 Mar 2015 19:43:04 +0200
> Date: Sun, 22 Mar 2015 18:47:24 +0200
> From: Dmitry Gutov <dgutov <at> yandex.ru>
> 
> To the best of my knowledge, debbugs only sends a copy to the bug's 
> author, and there's no way to subscribe. So that excludes Eli (although 
> he probably subscribes to all bugs anyway).

I subscribe to the list, actually.  But people who send messages
shouldn't rely on that, and not everyone subscribes anyway.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20154; Package emacs. (Sun, 22 Mar 2015 18:14:01 GMT) Full text and rfc822 format available.

Message #104 received at 20154 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 20154 <at> debbugs.gnu.org
Subject: Re: bug#20154: 25.0.50; json-encode-string is too slow for
 large strings
Date: Sun, 22 Mar 2015 20:13:30 +0200
On 03/22/2015 07:31 PM, Eli Zaretskii wrote:

> I understand why you _send_ everything, but not why you need to
> _encode_ everything.  Why not encode only the new stuff?

That's the protocol. You're welcome to bring the question up with the 
author, but for now, as already described, there has been no need to 
complicate it, because Vim compiled with Python support can encode even 
a large buffer quickly enough.

>>> Then a series of calls to replace-regexp-in-string, one each for every
>>> one of the "special" characters, should get you close to your goal,
>>> right?

Actually, that wouldn't work anyway: aside from the special characters, 
any non-ASCII characters have to be encoded as JSON \uNNNN escapes. Look 
at the "Fallback: UCS code point" comment.

> I meant something like
>
>    (replace-regexp-in-string "\n" "\\n" s1 t t)
>    (replace-regexp-in-string "\f" "\\f" s1 t t)
>
> etc.  After all, the list of characters to be encoded is not very
> long, is it?

One (replace-regexp-in-string "\n" "\\n" s1 t t) call already takes 
~100ms, which is more than the latest proposed json-encode-string 
implementation takes.
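
(Measured with something along the lines of:

  ;; The first element of the result is the elapsed time in seconds:
  (benchmark-run 10 (replace-regexp-in-string "\n" "\\n" s1 t t))

on the same 300Kb test string.)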

> But when you've encoded them once, you only need to encode the
> additions, no?  If you can do this incrementally, the amount of work
> for each keystroke will be much smaller, I think.

Sure, that's optimizable, with a sufficiently smart server (which ycmd 
currently isn't), and at the cost of some buffer state tracking and 
diffing logic.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20154; Package emacs. (Sun, 22 Mar 2015 18:23:02 GMT) Full text and rfc822 format available.

Message #107 received at 20154 <at> debbugs.gnu.org (full text, mbox):

From: Glenn Morris <rgm <at> gnu.org>
To: 20154 <at> debbugs.gnu.org
Cc: Dmitry Gutov <dgutov <at> yandex.ru>
Subject: Re: bug#20154: 25.0.50;
 json-encode-string is too slow for large strings
Date: Sun, 22 Mar 2015 14:22:04 -0400
Ivan Shmakov wrote:

> 	Curiously, per my experience, the practice of Cc:-ing the
> 	subscribers tends to be frowned upon when it comes to the lists
> 	dedicated to free software.  For reasons I do not know or
> 	understand, Emacs lists are apparently an exception, though.

Both statements are contrary to my own experience.

> 	I don't recall any issues from trying to stick to the
> 	custom (of /not/ Cc:-ing) I learned earlier;

Obviously _you_ won't see any issues.

> but if necessary, I surely can remember to Cc: those who so request.

People should not have to "request" this.

AFAIK the policy on GNU mailing lists has always been "reply-to-all, and
do not assume people are subscribed." Works fine for me. Mailman has
optional duplicate suppression, and so does any decent mail client (e.g.
Gnus).

This is especially true for a bug list, where there should be zero
expectation for any correspondent to be subscribed.

I'm actually slightly surprised that

https://www.gnu.org/prep/maintain/html_node/Mail.html

does not mention this.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20154; Package emacs. (Sun, 22 Mar 2015 18:27:02 GMT) Full text and rfc822 format available.

Message #110 received at 20154 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 20154 <at> debbugs.gnu.org
Subject: Re: bug#20154: 25.0.50; json-encode-string is too slow for
 large strings
Date: Sun, 22 Mar 2015 20:26:37 +0200
On 03/22/2015 07:31 PM, Eli Zaretskii wrote:

> But when you've encoded them once, you only need to encode the
> additions, no?  If you can do this incrementally, the amount of work
> for each keystroke will be much smaller, I think.

It seems I've misunderstood you here, sorry.

The question of "why encode everything again" comes down to programmer's 
convenience, and to not re-implementing parts of the JSON encoder.

At least until `json-encode' has a way to pass an already-encoded string 
verbatim, how else would you encode an alist like

      `(("file_data" .
         ((,full-path . (("contents" . ,file-contents)
                         ("filetypes" . ,file-types)))))
        ("filepath" . ,full-path)
        ("line_num" . ,line-num)
        ("column_num" . ,column-num))

to JSON, except by encoding everything again?
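
Right now that's a single call at the very end, i.e. just:

  ;; `request-data' (a made-up name) being the alist above:
  (json-encode request-data)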




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20154; Package emacs. (Sun, 22 Mar 2015 18:34:01 GMT) Full text and rfc822 format available.

Message #113 received at 20154 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: 20154 <at> debbugs.gnu.org
Subject: Re: bug#20154: 25.0.50;
 json-encode-string is too slow for large strings
Date: Sun, 22 Mar 2015 20:32:38 +0200
> Date: Sun, 22 Mar 2015 20:26:37 +0200
> From: Dmitry Gutov <dgutov <at> yandex.ru>
> CC: 20154 <at> debbugs.gnu.org
> 
> The question of "why encode everything again" comes down to programmer's 
> convenience, and to not re-implementing parts of the JSON encoder.
> 
> At least until `json-encode' has a way to pass an already-encoded string 
> verbatim, how else would you encode an alist like
> 
>        `(("file_data" .
>           ((,full-path . (("contents" . ,file-contents)
>                           ("filetypes" . ,file-types)))))
>          ("filepath" . ,full-path)
>          ("line_num" . ,line-num)
>          ("column_num" . ,column-num))
> 
> to JSON, except by encoding everything again?

Caveat: I'm probably missing something simple here, so excuse me in
advance for asking stupid questions.

You said you need to encode everything on every keystroke, so I was
wondering why you couldn't encode just the new keystroke, and append
the result to what you already encoded earlier.  Then send everything
to the server, as it expects.  The problem is in encoding, not in
sending.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20154; Package emacs. (Sun, 22 Mar 2015 19:04:01 GMT) Full text and rfc822 format available.

Message #116 received at 20154 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 20154 <at> debbugs.gnu.org
Subject: Re: bug#20154: 25.0.50; json-encode-string is too slow for
 large strings
Date: Sun, 22 Mar 2015 21:03:19 +0200
On 03/22/2015 08:32 PM, Eli Zaretskii wrote:

> You said you need to encode everything on every keystroke, so I was

That was a simplification: "every keystroke" is an approximate 
description of frequency. But anyway, not every keystroke inserts a 
character. Some delete them, some invoke various commands, which may do 
complex things to buffer contents.

> wondering why you couldn't encode just the new keystroke, and append
> the result to what you already encoded earlier.
> Then send everything
> to the server, as it expects.  The problem is in encoding, not in
> sending.

The server expects a certain JSON-encoded structure. You can't just 
append stuff at the end; that wouldn't be valid JSON.

I suppose you could first encode a "skeleton" alist, with some tagged 
values replacing the values that you'd expect to be big strings, and 
then substitute the values in before sending.
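
Roughly like this (untested; the placeholder convention,
`cached-json-string' and `send-to-server' are all made-up names):

  ;; `cached-json-string' would hold the buffer contents already
  ;; encoded, quotes included, e.g. by json-encode-string.
  (let ((skeleton (json-encode `(("contents" . "@CONTENTS@")
                                 ("line_num" . ,line-num)))))
    (send-to-server
     (replace-regexp-in-string "\"@CONTENTS@\"" cached-json-string
                               skeleton t t)))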

I'm sure you can imagine the (admittedly rare) edge cases where this 
will break, and it imposes new constraints on the organization of the 
code: where before you could simply pass Emacs structures between 
functions and encode them at the end, just before sending, now you have 
to worry about the stuff above.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20154; Package emacs. (Sun, 22 Mar 2015 19:16:02 GMT) Full text and rfc822 format available.

Message #119 received at 20154 <at> debbugs.gnu.org (full text, mbox):

From: Ivan Shmakov <ivan <at> siamics.net>
To: 20154 <at> debbugs.gnu.org, Dmitry Gutov <dgutov <at> yandex.ru>
Subject: Re: bug#20154: 25.0.50;
 json-encode-string is too slow for large strings 
Date: Sun, 22 Mar 2015 19:15:18 +0000
>>>>> Dmitry Gutov <dgutov <at> yandex.ru> writes:
>>>>> On 03/22/2015 06:15 PM, Ivan Shmakov wrote:

[…]

 >>> (push (substring string start l) res)
 >>> (push "\"" res)
 >>> (apply #'concat (nreverse res))))

 >> I guess you can (apply #'concat "\"" (substring …) (nreverse …))
 >> just as well, instead of pushing to the list just before getting rid
 >> of it.

 > Also a good idea, but only partially.  That gets rid of the initial
 > binding for `res', but the (substring ...) value and the quote have
 > to go at the end of the string.  We can't pass them as the last
 > arguments to `apply'.

	Indeed, I’ve misunderstood the code a bit.
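
	To wit, `apply' spreads only its /last/ argument, so pieces
	may well be prepended as extra leading arguments, but anything
	that belongs at the end must already be on the list before the
	nreverse.  A toy example:

	    ;; The opening quote can go straight into the call;
	    ;; the closing one has to be pushed beforehand:
	    (let ((res ()))
	      (push "a" res)
	      (push "\"" res)
	      (apply #'concat "\"" (nreverse res)))
	    ;; => "\"a\""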

[…]

 >> Curiously, per my experience, the practice of Cc:-ing the
 >> subscribers tends to be frowned upon when it comes to the lists
 >> dedicated to free software.  For reasons I do not know or
 >> understand, Emacs lists are apparently an exception, though.

 > To the best of my knowledge, debbugs only sends a copy to the bug's
 > author,

	Not at all.

 > and there's no way to subscribe.

	It’s possible to subscribe to bug-gnu-emacs@.  It’s also
	possible to subscribe to nntp://news.gmane.org/gmane.emacs.bugs/
	instead; that way only the messages requested by the user agent
	(assuming one which does support both mail /and/ news) will
	actually be transferred.  No clutter in the mailbox, either.

	Moreover, unless I am mistaken, that’s a limitation of the
	debbugs.gnu.org instance; the software itself /does/ allow for
	such subscriptions.

 > So that excludes Eli (although he probably subscribes to all bugs
 > anyway).

	Yes.

-- 
FSF associate member #7257  http://boycottsystemd.org/  … 3013 B6A0 230E 334A




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20154; Package emacs. (Sun, 22 Mar 2015 22:58:02 GMT) Full text and rfc822 format available.

Message #122 received at 20154 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 20154 <at> debbugs.gnu.org
Subject: Re: bug#20154: 25.0.50; json-encode-string is too slow for
 large strings
Date: Mon, 23 Mar 2015 00:57:19 +0200
On 03/20/2015 06:02 PM, Eli Zaretskii wrote:

> I suggest rewriting json-encode-char; it does a lot of unnecessary
> stuff, starting with the call to encode-char (which was needed in
> Emacs 22 and before, but not anymore).

Continuing this line of thought, should we also remove the alias for 
`decode-char', as well as its one use in `json-read-escaped-char'?
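
Unless I misremember the name, that's this compatibility shim:

  ;; The alias in question; its only remaining use is in
  ;; `json-read-escaped-char'.
  (defalias 'json-decode-char0 'decode-char)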




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20154; Package emacs. (Mon, 23 Mar 2015 15:38:02 GMT) Full text and rfc822 format available.

Message #125 received at 20154 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: 20154 <at> debbugs.gnu.org
Subject: Re: bug#20154: 25.0.50;
 json-encode-string is too slow for large strings
Date: Mon, 23 Mar 2015 17:37:00 +0200
> Date: Mon, 23 Mar 2015 00:57:19 +0200
> From: Dmitry Gutov <dgutov <at> yandex.ru>
> CC: 20154 <at> debbugs.gnu.org
> 
> On 03/20/2015 06:02 PM, Eli Zaretskii wrote:
> 
> > I suggest rewriting json-encode-char; it does a lot of unnecessary
> > stuff, starting with the call to encode-char (which was needed in
> > Emacs 22 and before, but not anymore).
> 
> Continuing this line of thought, should we also remove the alias for 
> `decode-char', as well as its one use in `json-read-escaped-char'?

Yes, I recommend that.




Reply sent to Dmitry Gutov <dgutov <at> yandex.ru>:
You have taken responsibility. (Tue, 07 Apr 2015 13:32:04 GMT) Full text and rfc822 format available.

Notification sent to Dmitry Gutov <dgutov <at> yandex.ru>:
bug acknowledged by developer. (Tue, 07 Apr 2015 13:32:06 GMT) Full text and rfc822 format available.

Message #130 received at 20154-done <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 20154-done <at> debbugs.gnu.org
Subject: Re: bug#20154: 25.0.50; json-encode-string is too slow for
 large strings
Date: Tue, 07 Apr 2015 16:31:47 +0300
On 03/23/2015 05:37 PM, Eli Zaretskii wrote:

> Yes, I recommend that.

Thanks.

It appears there's not much more that can be done in Lisp. The current 
speed is much better, although it could still use improvement.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20154; Package emacs. (Mon, 20 Apr 2015 22:21:02 GMT) Full text and rfc822 format available.

Message #133 received at 20154 <at> debbugs.gnu.org (full text, mbox):

From: Ted Zlatanov <tzz <at> lifelogs.com>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 20154 <at> debbugs.gnu.org
Subject: Re: bug#20154: 25.0.50;
 json-encode-string is too slow for large strings
Date: Mon, 20 Apr 2015 18:20:32 -0400
On Sat, 21 Mar 2015 00:02:51 +0200 Dmitry Gutov <dgutov <at> yandex.ru> wrote: 

DG> Maybe implementing `json-encode-string` itself in C isn't strictly
DG> necessary, or even particularly advantageous.

Absolutely, implementing it ourselves is dumb, because it's been *done
already* many times, in libjson in particular.

DG> How about trying to optimize `replace-match' or
DG> `replace-regexp-in-string' (which are the main two approaches we can
DG> use to implement `json-encode-string') for the case of large input?

I think all the time and effort spent on that would be better spent on
the FFI work, which would also enable libyaml integration and many other
improvements.

Ted




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20154; Package emacs. (Mon, 20 Apr 2015 22:43:02 GMT) Full text and rfc822 format available.

Message #136 received at 20154 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Ted Zlatanov <tzz <at> lifelogs.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 20154 <at> debbugs.gnu.org
Subject: Re: bug#20154: 25.0.50; json-encode-string is too slow for
 large strings
Date: Tue, 21 Apr 2015 01:41:54 +0300
On 04/21/2015 01:20 AM, Ted Zlatanov wrote:

> Absolutely, implementing it ourselves is dumb. Because it's been *done
> already* many times, in libjson in particular.

Is libjson available on all our target platforms?

> I think all the time and effort spent on that would be better spent on
> the FFI work, which would also enable libyaml integration and many other
> improvements.

A faster `replace-regexp-in-string' could also mean a slight speed 
improvement in many different places. And that project would be roughly 
two orders of magnitude simpler than FFI.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20154; Package emacs. (Mon, 20 Apr 2015 23:12:02 GMT) Full text and rfc822 format available.

Message #139 received at 20154 <at> debbugs.gnu.org (full text, mbox):

From: Ted Zlatanov <tzz <at> lifelogs.com>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: 20154 <at> debbugs.gnu.org
Subject: Re: bug#20154: 25.0.50;
 json-encode-string is too slow for large strings
Date: Mon, 20 Apr 2015 19:11:05 -0400
On Tue, 21 Apr 2015 01:41:54 +0300 Dmitry Gutov <dgutov <at> yandex.ru> wrote: 

DG> On 04/21/2015 01:20 AM, Ted Zlatanov wrote:
>> Absolutely, implementing it ourselves is dumb. Because it's been *done
>> already* many times, in libjson in particular.

DG> Is libjson available on all our target platforms?

I think it's available on the free ones, but I'm not positive about
*all* of them. It's under the GPL, in any case, and I think in 2015 we
can safely assume any platform of strategic interest to the GNU project
will have it.  Do you know of any counterexamples?

Besides the relatively large performance gains from libjson, please
don't disregard the significant work already sunk into making sure it is
bug-free.

>> I think all the time and effort spent on that would be better spent on
>> the FFI work, which would also enable libyaml integration and many other
>> improvements.

DG> A faster `replace-regexp-in-string' could also mean a slight speed
DG> improvement in many different places. And that project would be like
DG> two orders of magnitude simpler than FFI.

I didn't say it wouldn't be simpler to do things as this thread proposes.

I am also not saying improvements in `replace-regexp-in-string' are a
waste of time. I am specifically talking about making them for the sake
of speeding up the encoding of JSON data.

Ted




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 19 May 2015 11:24:06 GMT) Full text and rfc822 format available.
