GNU bug report logs - #46881
28.0.50; pdumper dumping causes way too many syscalls


Package: emacs;

Reported by: Pip Cet <pipcet <at> gmail.com>

Date: Tue, 2 Mar 2021 20:35:01 UTC

Severity: normal

Found in version 28.0.50

Done: Mattias Engdegård <mattiase <at> acm.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 46881 in the body.
You can then email your comments to 46881 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#46881; Package emacs. (Tue, 02 Mar 2021 20:35:01 GMT) Full text and rfc822 format available.

Acknowledgement sent to Pip Cet <pipcet <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Tue, 02 Mar 2021 20:35:01 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: 28.0.50; pdumper dumping causes way too many syscalls
Date: Tue, 2 Mar 2021 20:33:42 +0000
Playing around with the WebAssembly port, I noticed that pdumper, in
creating the dump file, makes way too many syscalls: it uses
emacs_write(), not fwrite(), so these calls translate to actual
syscalls and context switches. On immature systems (or in special
circumstances like a device mounted synchronously), they might
actually cause a hardware write for each syscall, which would wear out
flash quickly and be generally wasteful.
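The cost difference described above can be illustrated with a small simulation. This is only a sketch: `syscall_count`, `unbuffered_write`, and `struct staging` are hypothetical names invented here, and the counter merely stands in for real kernel crossings; it does not call emacs_write() or touch any file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel boundary: counts simulated
   syscalls instead of performing real ones.  */
static size_t syscall_count;

/* Unbuffered path: every logical write crosses into the "kernel".  */
static void
unbuffered_write (const void *data, size_t n)
{
  (void) data;
  (void) n;
  syscall_count++;
}

/* Staged path: bytes accumulate in memory and are flushed once.  */
struct staging
{
  char bytes[4096];
  size_t used;
};

static void
staged_write (struct staging *s, const void *data, size_t n)
{
  assert (n <= sizeof s->bytes - s->used);
  memcpy (s->bytes + s->used, data, n);
  s->used += n;
}

static void
staged_flush (struct staging *s)
{
  if (s->used > 0)
    {
      syscall_count++;		/* one simulated write for everything staged */
      s->used = 0;
    }
}

/* Simulated syscalls needed to emit COUNT four-byte words one by one.  */
static size_t
count_unbuffered (size_t count)
{
  const char word[4] = { 0 };
  syscall_count = 0;
  for (size_t i = 0; i < count; i++)
    unbuffered_write (word, sizeof word);
  return syscall_count;
}

/* Simulated syscalls for the same words when staged in memory first.  */
static size_t
count_staged (size_t count)
{
  const char word[4] = { 0 };
  struct staging s = { .used = 0 };
  syscall_count = 0;
  for (size_t i = 0; i < count; i++)
    staged_write (&s, word, sizeof word);
  staged_flush (&s);
  return syscall_count;
}
```

With many small writes, the unbuffered count grows linearly with the number of calls, while the staged variant pays once at flush time.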

I've looked into the problem, and it seems easy to solve and worth it
in terms of debuggability and performance.

Patch will be attached once this has a bug number.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46881; Package emacs. (Tue, 02 Mar 2021 20:46:02 GMT) Full text and rfc822 format available.

Message #8 received at 46881 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: 46881 <at> debbugs.gnu.org
Subject: Re: bug#46881: 28.0.50; pdumper dumping causes way too many syscalls
Date: Tue, 2 Mar 2021 20:45:04 +0000
On Tue, Mar 2, 2021 at 8:35 PM Pip Cet <pipcet <at> gmail.com> wrote:
> I've looked into the problem, and it seems easy to solve and worth it
> in terms of debuggability and performance.

Very rough benchmarks, but this seems to be clearly worth it:

Performance:
With patch:
real    0m3.861s
user    0m3.776s
sys    0m0.085s

Without patch:
real    0m7.001s
user    0m4.476s
sys    0m2.511s

Number of syscalls:
With patch: 415442
Without patch: 2028307

> Patch will be attached once this has a bug number.

And here's the patch. Testing would be very appreciated.

I'm unsure about the precise usage of dump_off vs ptrdiff_t here; I
don't think it matters, but suggestions, nitpicks, and comments, on
this or any other aspect, would be very appreciated.

Pip
[0001-Prepare-pdumper-dump-file-in-memory-write-it-in-one-.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46881; Package emacs. (Tue, 02 Mar 2021 21:09:01 GMT) Full text and rfc822 format available.

Message #11 received at 46881 <at> debbugs.gnu.org (full text, mbox):

From: Alan Third <alan <at> idiocy.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: 46881 <at> debbugs.gnu.org
Subject: Re: bug#46881: 28.0.50; pdumper dumping causes way too many syscalls
Date: Tue, 2 Mar 2021 21:07:52 +0000
On Tue, Mar 02, 2021 at 08:45:04PM +0000, Pip Cet wrote:
> On Tue, Mar 2, 2021 at 8:35 PM Pip Cet <pipcet <at> gmail.com> wrote:
> > I've looked into the problem, and it seems easy to solve and worth it
> > in terms of debuggability and performance.
> 
> Very rough benchmarks, but this seems to be clearly worth it:
> 
> Performance:
> With patch:
> real    0m3.861s
> user    0m3.776s
> sys    0m0.085s
> 
> Without patch:
> real    0m7.001s
> user    0m4.476s
> sys    0m2.511s
> 
> Number of syscalls:
> With patch: 415442
> Without patch: 2028307

My quick test on macOS by doing:

rm src/*.pdmp
time make

sees it going from ~26s without patch to ~10s with patch, so a
considerable improvement.

> > Patch will be attached once this has a bug number.
> 
> And here's the patch. Testing would be very appreciated.

It appears to work fine here, but I don't know if there's anything
specific to test other than just running Emacs.
-- 
Alan Third




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46881; Package emacs. (Wed, 03 Mar 2021 05:52:02 GMT) Full text and rfc822 format available.

Message #14 received at 46881 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> gmail.com>, Daniel Colascione <dancol <at> dancol.org>,
 Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 46881 <at> debbugs.gnu.org
Subject: Re: bug#46881: 28.0.50; pdumper dumping causes way too many syscalls
Date: Wed, 03 Mar 2021 07:51:05 +0200
> From: Pip Cet <pipcet <at> gmail.com>
> Date: Tue, 2 Mar 2021 20:45:04 +0000
> 
> On Tue, Mar 2, 2021 at 8:35 PM Pip Cet <pipcet <at> gmail.com> wrote:
> > I've looked into the problem, and it seems easy to solve and worth it
> > in terms of debuggability and performance.
> 
> Very rough benchmarks, but this seems to be clearly worth it:
> 
> Performance:
> With patch:
> real    0m3.861s
> user    0m3.776s
> sys    0m0.085s
> 
> Without patch:
> real    0m7.001s
> user    0m4.476s
> sys    0m2.511s
> 
> Number of syscalls:
> With patch: 415442
> Without patch: 2028307
> 
> > Patch will be attached once this has a bug number.
> 
> And here's the patch. Testing would be very appreciated.
> 
> I'm unsure about the precise usage of dump_off vs ptrdiff_t here; I
> don't think it matters, but suggestions, nitpicks, and comments, on
> this or any other aspect, would be very appreciated.

> From 92ee138852b34ede2f43dd7f93f310fc746bb3bf Mon Sep 17 00:00:00 2001
> From: Pip Cet <pipcet <at> gmail.com>
> Date: Tue, 2 Mar 2021 20:38:23 +0000
> Subject: [PATCH] Prepare pdumper dump file in memory, write it in one go
>  (Bug#46881)
> 
> * src/pdumper.c (struct dump_context): Add buf, buf_size, max_offset fields.
> (dump_grow_buffer): New function.
> (dump_write): Use memcpy, not an actual emacs_write.
> (dump_seek): Keep track of maximum seen offset.
> (Fdump_emacs_portable): Write out the file contents when done.
> ---
>  src/pdumper.c | 20 ++++++++++++++++++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/src/pdumper.c b/src/pdumper.c
> index 337742fda4ade..62ddad8ee5e34 100644
> --- a/src/pdumper.c
> +++ b/src/pdumper.c
> @@ -473,6 +473,10 @@ dump_fingerprint (char const *label,
>  {
>    /* Header we'll write to the dump file when done.  */
>    struct dump_header header;
> +  /* Data that will be written to the dump file.  */
> +  void *buf;
> +  ptrdiff_t buf_size;
> +  ptrdiff_t max_offset;
>  
>    Lisp_Object old_purify_flag;
>    Lisp_Object old_post_gc_hook;
> @@ -581,6 +585,13 @@ dump_fingerprint (char const *label,
>  
>  /* Dump file creation */
>  
> +static void dump_grow_buffer (struct dump_context *ctx)
> +{
> +  ctx->buf = xrealloc (ctx->buf, ctx->buf_size = (ctx->buf_size ?
> +						  (ctx->buf_size * 2)
> +						  : 1024 * 1024));
> +}
> +
>  static dump_off dump_object (struct dump_context *ctx, Lisp_Object object);
>  static dump_off dump_object_for_offset (struct dump_context *ctx,
>  					Lisp_Object object);
> @@ -747,8 +758,9 @@ dump_write (struct dump_context *ctx, const void *buf, dump_off nbyte)
>    eassert (nbyte == 0 || buf != NULL);
>    eassert (ctx->obj_offset == 0);
>    eassert (ctx->flags.dump_object_contents);
> -  if (emacs_write (ctx->fd, buf, nbyte) < nbyte)
> -    report_file_error ("Could not write to dump file", ctx->dump_filename);
> +  while (ctx->offset + nbyte > ctx->buf_size)
> +    dump_grow_buffer (ctx);
> +  memcpy ((char *)ctx->buf + ctx->offset, buf, nbyte);
>    ctx->offset += nbyte;
>  }
>  
> @@ -828,6 +840,8 @@ dump_tailq_pop (struct dump_tailq *tailq)
>  static void
>  dump_seek (struct dump_context *ctx, dump_off offset)
>  {
> +  if (ctx->max_offset < ctx->offset)
> +    ctx->max_offset = ctx->offset;
>    eassert (ctx->obj_offset == 0);
>    if (lseek (ctx->fd, offset, SEEK_SET) < 0)
>      report_file_error ("Setting file position",
> @@ -4159,6 +4173,8 @@ DEFUN ("dump-emacs-portable",
>    ctx->header.magic[0] = dump_magic[0];
>    dump_seek (ctx, 0);
>    dump_write (ctx, &ctx->header, sizeof (ctx->header));
> +  if (emacs_write (ctx->fd, ctx->buf, ctx->max_offset) < ctx->max_offset)
> +    report_file_error ("Could not write to dump file", ctx->dump_filename);
>  
>    dump_off
>      header_bytes = header_end - header_start,
> -- 
> 2.30.1

Thanks.

Daniel, Paul: any comments?  In particular, is it safe to allocate
large amounts of memory off the heap while dumping?  A couple of
places in pdumper.c say some parts of code should not call malloc.
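For readers who want the idea of the patch without the surrounding pdumper code, here is a minimal standalone sketch of a growable in-memory buffer with write, seek, and a high-water mark. It is not the patch itself: `struct membuf` and its functions are hypothetical names, it uses plain realloc rather than xrealloc, it starts at 1 KiB rather than 1 MiB, and it updates the high-water mark on write rather than on seek.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory stand-in for the dump file.  */
struct membuf
{
  char *data;
  size_t capacity;
  size_t offset;		/* current write position */
  size_t max_offset;		/* highest byte written so far */
};

/* Make sure at least NEEDED bytes are available, doubling as we go.  */
static void
membuf_reserve (struct membuf *b, size_t needed)
{
  size_t cap = b->capacity ? b->capacity : 1024;
  while (cap < needed)
    cap *= 2;
  if (cap != b->capacity)
    {
      char *p = realloc (b->data, cap);
      if (p == NULL)
	abort ();
      b->data = p;
      b->capacity = cap;
    }
}

/* Copy N bytes from SRC at the current position; no I/O happens here.  */
static void
membuf_write (struct membuf *b, const void *src, size_t n)
{
  if (n == 0)
    return;
  membuf_reserve (b, b->offset + n);
  memcpy (b->data + b->offset, src, n);
  b->offset += n;
  if (b->max_offset < b->offset)
    b->max_offset = b->offset;
}

/* Move the write position; only already-written regions are allowed,
   so the buffer never contains uninitialized gaps.  */
static void
membuf_seek (struct membuf *b, size_t offset)
{
  assert (offset <= b->max_offset);
  b->offset = offset;
}
```

Once dumping is finished, the first `max_offset` bytes of `data` are what would be handed to a single write call, which is the step the patch adds to Fdump_emacs_portable.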




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46881; Package emacs. (Wed, 03 Mar 2021 07:12:01 GMT) Full text and rfc822 format available.

Message #17 received at 46881 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Alan Third <alan <at> idiocy.org>, Pip Cet <pipcet <at> gmail.com>
Cc: 46881 <at> debbugs.gnu.org
Subject: Re: bug#46881: 28.0.50; pdumper dumping causes way too many syscalls
Date: Wed, 3 Mar 2021 07:10:28 +0000
> My quick test on macOS by doing:
>
> rm src/*.pdmp
> time make
>
> sees it going from ~26s without patch to ~10s with patch, so a
> considerable improvement.

Thanks for testing!

> > > Patch will be attached once this has a bug number.
> >
> > And here's the patch. Testing would be very appreciated.
>
> It appears to work fine here, but I don't know if there's anything
> specific to test other than just running Emacs.

I suspect there may be problems on systems with very little memory. Do
you have an easy way to determine maximum resident size (on Debian
GNU/Linux, /bin/time works)? It would be interesting to see if that's
actually different.

Thanks again!
Pip
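As a rough illustration of what "maximum resident size" refers to, the POSIX getrusage call exposes a peak-RSS field for the calling process. The helper name `peak_rss` is made up for this sketch, and it measures the current process only, not a child such as temacs; wrapper tools like /bin/time report a comparable figure for the command they run.

```c
#include <assert.h>
#include <sys/resource.h>

/* Peak resident set size of the calling process, or -1 on error.
   The unit is platform dependent: kilobytes on Linux, bytes on macOS.  */
static long
peak_rss (void)
{
  struct rusage ru;
  if (getrusage (RUSAGE_SELF, &ru) != 0)
    return -1;
  return ru.ru_maxrss;
}
```

Because the units differ between Linux and macOS, numbers from the two systems should not be compared directly.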




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46881; Package emacs. (Wed, 03 Mar 2021 07:37:02 GMT) Full text and rfc822 format available.

Message #20 received at 46881 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 46881 <at> debbugs.gnu.org, Daniel Colascione <dancol <at> dancol.org>,
 Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#46881: 28.0.50; pdumper dumping causes way too many syscalls
Date: Wed, 3 Mar 2021 07:35:45 +0000
On Wed, Mar 3, 2021 at 5:51 AM Eli Zaretskii <eliz <at> gnu.org> wrote:
> > From: Pip Cet <pipcet <at> gmail.com>
> > Date: Tue, 2 Mar 2021 20:45:04 +0000
> >
> > On Tue, Mar 2, 2021 at 8:35 PM Pip Cet <pipcet <at> gmail.com> wrote:
> > > I've looked into the problem, and it seems easy to solve and worth it
> > > in terms of debuggability and performance.

Since debuggability is such a concern, we probably shouldn't leak the
buffer memory. Revised patch attached. (This patch also removes the
lseek() syscalls; while not quite as numerous as the write() ones,
those did clutter up straces here.)

> In particular, is it safe to allocate
> large amounts of memory off the heap while dumping?

Even if it isn't, we'd still be faster re-running the dump after
growing the dumper image than the current approach is.

> A couple of
> places in pdumper.c say some parts of code should not call malloc.

IIUC, the prohibition on calling malloc, if it is still a concern,
applies only when loading the dump, not while writing it.

My main concern is the possibility of a partly-written dump file,
since we no longer turn "!UMPEDGNUEMACS" into "DUMPEDGNUEMACS" after
the dump. Maybe it would make sense to restore that feature?

Pip
[0001-Prepare-pdumper-dump-file-in-memory-write-it-in-one-.patch (text/x-patch, attachment)]
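The partly-written-file concern above refers to a two-step commit: the file starts with a placeholder magic string and only receives the real one after everything else has been written. The sketch below shows that pattern with plain stdio under stated assumptions: the two magic strings come from the message above, while `write_with_commit_marker` and `looks_complete` are hypothetical helpers, not pdumper functions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Final magic as quoted in the thread; the placeholder differs only
   in its first byte ('!' instead of 'D').  */
static const char final_magic[] = "DUMPEDGNUEMACS";

/* Write placeholder magic plus PAYLOAD, then flip the first byte as
   the very last step.  F must be open for binary update.
   Returns 0 on success, -1 on any I/O error.  */
static int
write_with_commit_marker (FILE *f, const void *payload, size_t n)
{
  char magic[sizeof final_magic - 1];
  memcpy (magic, final_magic, sizeof magic);
  magic[0] = '!';		/* "!UMPEDGNUEMACS": not yet valid */

  if (fwrite (magic, 1, sizeof magic, f) != sizeof magic)
    return -1;
  if (n > 0 && fwrite (payload, 1, n, f) != n)
    return -1;
  if (fflush (f) != 0)
    return -1;
  /* Everything else is out; only now mark the file as complete.  */
  if (fseek (f, 0, SEEK_SET) != 0)
    return -1;
  if (fputc (final_magic[0], f) == EOF)
    return -1;
  return fflush (f) == 0 ? 0 : -1;
}

/* A reader rejects any file whose magic was never flipped.  */
static int
looks_complete (FILE *f)
{
  char magic[sizeof final_magic - 1];
  if (fseek (f, 0, SEEK_SET) != 0)
    return 0;
  return (fread (magic, 1, sizeof magic, f) == sizeof magic
	  && memcmp (magic, final_magic, sizeof magic) == 0);
}
```

If the process dies before the final byte is written, the file still begins with '!' and a loader performing the `looks_complete` check would refuse it.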

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46881; Package emacs. (Wed, 03 Mar 2021 15:10:02 GMT) Full text and rfc822 format available.

Message #23 received at 46881 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: 46881 <at> debbugs.gnu.org, Eli Zaretskii <eliz <at> gnu.org>,
 Daniel Colascione <dancol <at> dancol.org>, Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#46881: 28.0.50; pdumper dumping causes way too many syscalls
Date: Wed, 03 Mar 2021 16:09:15 +0100
Pip Cet <pipcet <at> gmail.com> writes:

> Since debuggability is such a concern, we probably shouldn't leak the
> buffer memory. Revised patch attached. (This patch also removes the
> lseek() syscalls; while not quite as numerous as the write() ones,
> those did clutter up straces here.)

I've tried the patch on a couple of systems here, and the resulting
Emacs works fine (as expected), and the pdumping is significantly faster
here, too.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46881; Package emacs. (Wed, 03 Mar 2021 19:36:02 GMT) Full text and rfc822 format available.

Message #26 received at 46881 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Pip Cet <pipcet <at> gmail.com>, Eli Zaretskii <eliz <at> gnu.org>
Cc: 46881 <at> debbugs.gnu.org, Daniel Colascione <dancol <at> dancol.org>
Subject: Re: bug#46881: 28.0.50; pdumper dumping causes way too many syscalls
Date: Wed, 3 Mar 2021 11:35:31 -0800
On 3/2/21 11:35 PM, Pip Cet wrote:
> IIUC, the prohibition on calling malloc, if it is still a concern,
> applies only when loading the dump, not while writing it.

That's my understanding as well.

> My main concern is the possibility of a partly-written dump file,
> since we no longer turn "!UMPEDGNUEMACS" into "DUMPEDGNUEMACS" after
> the dump. Maybe it would make sense to restore that feature?

Wouldn't hurt, though I'd make it low priority.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46881; Package emacs. (Wed, 03 Mar 2021 19:59:02 GMT) Full text and rfc822 format available.

Message #29 received at 46881 <at> debbugs.gnu.org (full text, mbox):

From: Alan Third <alan <at> idiocy.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: 46881 <at> debbugs.gnu.org
Subject: Re: bug#46881: 28.0.50; pdumper dumping causes way too many syscalls
Date: Wed, 3 Mar 2021 19:57:59 +0000
On Wed, Mar 03, 2021 at 07:10:28AM +0000, Pip Cet wrote:
> > My quick test on macOS by doing:
> >
> > rm src/*.pdmp
> > time make
> >
> > sees it going from ~26s without patch to ~10s with patch, so a
> > considerable improvement.
> 
> Thanks for testing!
> 
> > > > Patch will be attached once this has a bug number.
> > >
> > > And here's the patch. Testing would be very appreciated.
> >
> > It appears to work fine here, but I don't know if there's anything
> > specific to test other than just running Emacs.
> 
> I suspect there may be problems on systems with very little memory. Do
> you have an easy way to determine maximum resident size (on Debian
> GNU/Linux, /bin/time works)? It would be interesting to see if that's
> actually different.

I tried using time -l the same as above and both came out with
roughly the same values, but I suspect that's probably just me getting
the values for running make.

Is there a better way of testing dumping?
-- 
Alan Third




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46881; Package emacs. (Thu, 04 Mar 2021 07:27:01 GMT) Full text and rfc822 format available.

Message #32 received at 46881 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Alan Third <alan <at> idiocy.org>, Pip Cet <pipcet <at> gmail.com>,
 46881 <at> debbugs.gnu.org
Subject: Re: bug#46881: 28.0.50; pdumper dumping causes way too many syscalls
Date: Thu, 4 Mar 2021 07:25:13 +0000
On Wed, Mar 3, 2021 at 7:58 PM Alan Third <alan <at> idiocy.org> wrote:
> On Wed, Mar 03, 2021 at 07:10:28AM +0000, Pip Cet wrote:
> > > My quick test on macOS by doing:
> > >
> > > rm src/*.pdmp
> > > time make
> > >
> > > sees it going from ~26s without patch to ~10s with patch, so a
> > > considerable improvement.
> >
> > Thanks for testing!
> >
> > > > > Patch will be attached once this has a bug number.
> > > >
> > > > And here's the patch. Testing would be very appreciated.
> > >
> > > It appears to work fine here, but I don't know if there's anything
> > > specific to test other than just running Emacs.
> >
> > I suspect there may be problems on systems with very little memory. Do
> > you have an easy way to determine maximum resident size (on Debian
> > GNU/Linux, /bin/time works)? It would be interesting to see if that's
> > actually different.
>
> I tried using time -l the same as above and both came  out with
> roughly the same values, but I suspect that's probably just me getting
> the values for running make.

Thanks.

> Is there a better way of testing dumping?

I guess you could run "time ./temacs --batch -l loadup
--temacs=pbootstrap" directly (in src/)...

I realize that's not an answer to your question, but IME, dumping bugs
will often lead to immediate failures to bootstrap, whereas GC bugs
are tricky and don't. Since this bug doesn't affect GC in any way, I
think we've done enough initial testing.

Pip




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46881; Package emacs. (Thu, 04 Mar 2021 22:27:02 GMT) Full text and rfc822 format available.

Message #35 received at 46881 <at> debbugs.gnu.org (full text, mbox):

From: Daniel Colascione <dancol <at> dancol.org>
To: Eli Zaretskii <eliz <at> gnu.org>, Pip Cet <pipcet <at> gmail.com>,
 Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 46881 <at> debbugs.gnu.org
Subject: Re: bug#46881: 28.0.50; pdumper dumping causes way too many syscalls
Date: Thu, 4 Mar 2021 17:26:32 -0500
On 3/3/21 12:51 AM, Eli Zaretskii wrote:

> Thanks.
>
> Daniel, Paul: any comments?  In particular, is it safe to allocate
> large amounts of memory off the heap while dumping?  A couple of
> places in pdumper.c say some parts of code should not call malloc.

It looks fine, but wouldn't dumping to a FILE* (with internal buffering) 
do the same basic thing in a simpler way? There aren't any particular 
constraints on the environment _during_ the dump: we even make new lisp 
objects. It's when loading the dump, early in initialization, that you 
have to be careful.





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46881; Package emacs. (Fri, 05 Mar 2021 02:31:01 GMT) Full text and rfc822 format available.

Message #38 received at 46881 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Daniel Colascione <dancol <at> dancol.org>
Cc: 46881 <at> debbugs.gnu.org, Eli Zaretskii <eliz <at> gnu.org>,
 Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#46881: 28.0.50; pdumper dumping causes way too many syscalls
Date: Fri, 5 Mar 2021 02:30:13 +0000
On Thu, Mar 4, 2021 at 10:26 PM Daniel Colascione <dancol <at> dancol.org> wrote:
> On 3/3/21 12:51 AM, Eli Zaretskii wrote:
> > Daniel, Paul: any comments?  In particular, is it safe to allocate
> > large amounts of memory off the heap while dumping?  A couple of
> > places in pdumper.c say some parts of code should not call malloc.
>
> It looks fine, but wouldn't dumping to a FILE* (with internal buffering)
> do the same basic thing in a simpler way?

I initially set out to do that, but decided against it. We don't just
write sequentially (when FILE I/O helps, a little), we also have the
seek-and-fixup phase, and it didn't seem any simpler at that point.

> There aren't any particular
> constraints on the environment _during_ the dump: we even make new lisp
> objects. It's when loading the dump, early in initialization, that you
> have to be careful.

Thanks!

Pip




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46881; Package emacs. (Fri, 05 Mar 2021 07:20:01 GMT) Full text and rfc822 format available.

Message #41 received at 46881 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: 46881 <at> debbugs.gnu.org, dancol <at> dancol.org, eggert <at> cs.ucla.edu
Subject: Re: bug#46881: 28.0.50; pdumper dumping causes way too many syscalls
Date: Fri, 05 Mar 2021 09:19:00 +0200
> From: Pip Cet <pipcet <at> gmail.com>
> Date: Fri, 5 Mar 2021 02:30:13 +0000
> Cc: Eli Zaretskii <eliz <at> gnu.org>, Paul Eggert <eggert <at> cs.ucla.edu>, 46881 <at> debbugs.gnu.org
> 
> > It looks fine, but wouldn't dumping to a FILE* (with internal buffering)
> > do the same basic thing in a simpler way?
> 
> I initially set out to do that, but decided against it. We don't just
> write sequentially (when FILE I/O helps, a little), we also have the
> seek-and-fixup phase, and it didn't seem any simpler at that point..

I'm not sure I understand: what's wrong with fseek?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46881; Package emacs. (Fri, 05 Mar 2021 07:40:01 GMT) Full text and rfc822 format available.

Message #44 received at 46881 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 46881 <at> debbugs.gnu.org, Daniel Colascione <dancol <at> dancol.org>,
 eggert <at> cs.ucla.edu
Subject: Re: bug#46881: 28.0.50; pdumper dumping causes way too many syscalls
Date: Fri, 5 Mar 2021 07:38:27 +0000
On Fri, Mar 5, 2021 at 7:19 AM Eli Zaretskii <eliz <at> gnu.org> wrote:
> > From: Pip Cet <pipcet <at> gmail.com>
> > Date: Fri, 5 Mar 2021 02:30:13 +0000
> > Cc: Eli Zaretskii <eliz <at> gnu.org>, Paul Eggert <eggert <at> cs.ucla.edu>, 46881 <at> debbugs.gnu.org
> >
> > > It looks fine, but wouldn't dumping to a FILE* (with internal buffering)
> > > do the same basic thing in a simpler way?
> >
> > I initially set out to do that, but decided against it. We don't just
> > write sequentially (when FILE I/O helps, a little), we also have the
> > seek-and-fixup phase, and it didn't seem any simpler at that point..
>
> I'm not sure I understand: what's wrong with fseek?

Nothing, assuming you're fine with the current performance. Many libcs
aren't going to be smart enough to avoid I/O when you fseek through a
"large" file and write a word here and there, and my suspicion is that
would include glibc.

Also, we're not currently using fseek-and-write anywhere in Emacs.

We're talking about a file which Emacs is going to have to keep in
memory anyway, when reading the dump. The only case in which there
might be a problem is if the build machine has significantly less
available memory than the machine we intend to run on, and I just
don't think that's going to happen.

Pip
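The access pattern under discussion, a sequential write followed by scattered small patches, looks roughly like this with stdio. This is a sketch only: `stream_then_fixup` is a hypothetical function, and the claim that each repositioning may cost real I/O is the suspicion voiced above, not something this code measures.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write BODY sequentially, then seek back and overwrite one byte at
   each of the NFIX offsets with PATCH.  F must be open for binary
   update.  Returns 0 on success, -1 on any I/O error.  */
static int
stream_then_fixup (FILE *f, const char *body, size_t n,
		   const long *fixup_offsets, size_t nfix, char patch)
{
  /* Phase 1: sequential output, where stdio buffering helps.  */
  if (n > 0 && fwrite (body, 1, n, f) != n)
    return -1;

  /* Phase 2: scattered fixups.  Each fseek repositions the stream,
     which may force the library to flush or discard its buffer.  */
  for (size_t i = 0; i < nfix; i++)
    {
      if (fseek (f, fixup_offsets[i], SEEK_SET) != 0)
	return -1;
      if (fputc (patch, f) == EOF)
	return -1;
    }
  return fflush (f) == 0 ? 0 : -1;
}
```

With an in-memory buffer, phase 2 becomes a handful of memcpy calls and the file is touched only once at the end.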




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46881; Package emacs. (Fri, 05 Mar 2021 07:56:01 GMT) Full text and rfc822 format available.

Message #47 received at 46881 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: 46881 <at> debbugs.gnu.org, dancol <at> dancol.org, eggert <at> cs.ucla.edu
Subject: Re: bug#46881: 28.0.50; pdumper dumping causes way too many syscalls
Date: Fri, 05 Mar 2021 09:54:41 +0200
> From: Pip Cet <pipcet <at> gmail.com>
> Date: Fri, 5 Mar 2021 07:38:27 +0000
> Cc: Daniel Colascione <dancol <at> dancol.org>, eggert <at> cs.ucla.edu, 46881 <at> debbugs.gnu.org
> 
> > I'm not sure I understand: what's wrong with fseek?
> 
> Nothing, assuming you're fine with the current performance. Many libcs
> aren't going to be smart enough to avoid I/O when you fseek through a
> "large" file and write a word here and there, and my suspicion is that
> would include glibc.

Could we benchmark the two implementations instead of acting on
suspicions?

In general, I'd prefer not to reinvent the wheel, and trust modern
libc's that they are efficient enough in handling buffered streams,
unless we have hard evidence to the contrary.  If nothing else, it
would prevent people asking, like Daniel did, why didn't we use stdio
in the first place.

> Also, we're not currently using fseek-and-write anywhere in Emacs.

I don't see why this would be important.  Since we open the file in
binary mode, fseek should work correctly even on non-Posix systems.
Am I missing something?

> We're talking about a file which Emacs is going to have to keep in
> memory anyway, when reading the dump. The only case in which there
> might be a problem is if the build machine has significantly less
> available memory than the machine we intend to run on, and I just
> don't think that's going to happen.

You are thinking about memory consumption, while I am thinking how to
avoid implementing our own private buffered streams.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46881; Package emacs. (Fri, 05 Mar 2021 09:36:01 GMT) Full text and rfc822 format available.

Message #50 received at 46881 <at> debbugs.gnu.org (full text, mbox):

From: Andreas Schwab <schwab <at> linux-m68k.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: 46881 <at> debbugs.gnu.org, Eli Zaretskii <eliz <at> gnu.org>, eggert <at> cs.ucla.edu
Subject: Re: bug#46881: 28.0.50; pdumper dumping causes way too many syscalls
Date: Fri, 05 Mar 2021 10:35:26 +0100
On Mar 05 2021, Pip Cet wrote:

> We're talking about a file which Emacs is going to have to keep in
> memory anyway, when reading the dump.

While reading the dump, you only have the data once, and you don't have
to realloc the whole data.

Andreas.

-- 
Andreas Schwab, schwab <at> linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46881; Package emacs. (Fri, 05 Mar 2021 09:43:02 GMT) Full text and rfc822 format available.

Message #53 received at 46881 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Andreas Schwab <schwab <at> linux-m68k.org>
Cc: 46881 <at> debbugs.gnu.org, Eli Zaretskii <eliz <at> gnu.org>, eggert <at> cs.ucla.edu
Subject: Re: bug#46881: 28.0.50; pdumper dumping causes way too many syscalls
Date: Fri, 5 Mar 2021 09:41:47 +0000
On Fri, Mar 5, 2021 at 9:35 AM Andreas Schwab <schwab <at> linux-m68k.org> wrote:
> On Mar 05 2021, Pip Cet wrote:
>
> > We're talking about a file which Emacs is going to have to keep in
> > memory anyway, when reading the dump.
>
> While reading the dump, you only have the data once, and you don't have
> to realloc the whole data.

Correct. I think a build memory usage of 28 MB for a 10 MB dump file
is something we can live with...

Pip
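The doubling policy in the posted patch (start at 1024 * 1024 bytes, then double) makes the buffer size easy to work out for a given dump size. The helper below is a hypothetical sketch of that arithmetic only; it does not reproduce the 28 MB figure, whose measurement method is not given in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Capacity reached by starting at 1 MiB and doubling until at least
   NEEDED bytes fit; 0 if nothing is needed.  */
static size_t
grown_capacity (size_t needed)
{
  size_t cap = 0;
  while (cap < needed)
    cap = cap ? cap * 2 : (size_t) 1024 * 1024;
  return cap;
}
```

For a dump of about 10 MiB this yields a 16 MiB buffer. If realloc cannot extend the previous 8 MiB block in place, the old and new blocks may coexist briefly, for a transient total of about 24 MiB of buffer memory.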




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46881; Package emacs. (Fri, 05 Mar 2021 09:56:02 GMT) Full text and rfc822 format available.

Message #56 received at 46881 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 46881 <at> debbugs.gnu.org, Daniel Colascione <dancol <at> dancol.org>,
 eggert <at> cs.ucla.edu
Subject: Re: bug#46881: 28.0.50; pdumper dumping causes way too many syscalls
Date: Fri, 5 Mar 2021 09:54:32 +0000
On Fri, Mar 5, 2021 at 7:55 AM Eli Zaretskii <eliz <at> gnu.org> wrote:
> > From: Pip Cet <pipcet <at> gmail.com>
> > Date: Fri, 5 Mar 2021 07:38:27 +0000
> > Cc: Daniel Colascione <dancol <at> dancol.org>, eggert <at> cs.ucla.edu, 46881 <at> debbugs.gnu.org
> >
> > > I'm not sure I understand: what's wrong with fseek?
> >
> > Nothing, assuming you're fine with the current performance. Many libcs
> > aren't going to be smart enough to avoid I/O when you fseek through a
> > "large" file and write a word here and there, and my suspicion is that
> > would include glibc.
>
> Could we benchmark the two implementations instead of acting on
> suspicions?

Sure.

My patch:

real    0m1.988s
user    0m1.916s
sys    0m0.073s

fwrite-based patch:

real    0m3.576s
user    0m2.571s
sys    0m1.006s

This is as I expected: glibc just isn't doing a very good job for this
buffered stream.
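
For readers following along, the access pattern under discussion (append data, then seek back and overwrite a few bytes written earlier) can be sketched with plain stdio as below. This is only an illustration: the helper names `append_bytes` and `patch_bytes` are hypothetical and are not taken from either patch.

```c
#include <assert.h>
#include <stdio.h>

/* Append LEN bytes at the end of FP and return the offset at which
   they were written, or -1 on failure.  (Hypothetical helper.)  */
static long
append_bytes (FILE *fp, const void *data, size_t len)
{
  assert (fp != NULL);
  if (fseek (fp, 0, SEEK_END) != 0)
    return -1;
  long offset = ftell (fp);
  if (offset < 0 || fwrite (data, 1, len, fp) != len)
    return -1;
  return offset;
}

/* Overwrite LEN bytes at OFFSET, e.g. to fill in a value that was not
   known when the surrounding data was first written.  Returns 0 on
   success, -1 on failure.  (Hypothetical helper.)  */
static int
patch_bytes (FILE *fp, long offset, const void *data, size_t len)
{
  assert (fp != NULL);
  if (fseek (fp, offset, SEEK_SET) != 0)
    return -1;
  return fwrite (data, 1, len, fp) == len ? 0 : -1;
}
```

Each `fseek` gives the C library an opportunity to flush or discard its buffer, which is one plausible reason the "sys" time stays high when many small patches are scattered over a large file.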

> In general, I'd prefer not to reinvent the wheel, and trust modern
> libc's that they are efficient enough in handling buffered streams,
> unless we have hard evidence to the contrary.

We do, now.

> If nothing else, it
> would prevent people asking, like Daniel did, why didn't we use stdio
> in the first place.

I think it's a very good question (in fact, the branch I'm working on
is called pdumper-fwrite because I decided only after creating it that
all the seeking would hurt performance too much). I'll try including a
comment explaining why.

> > Also, we're not currently using fseek-and-write anywhere in Emacs.
>
> I don't see why this would be important.

Because the stream returned by emacs_fopen might not be generally seekable?

> Since we open the file in
> binary mode, fseek should work correctly even on non-Posix systems.

I guess I should have used emacs_fopen :-)

> > We're talking about a file which Emacs is going to have to keep in
> > memory anyway, when reading the dump. The only case in which there
> > might be a problem is if the build machine has significantly less
> > available memory than the machine we intend to run on, and I just
> > don't think that's going to happen.
>
> You are thinking about memory consumption, while I am thinking how to
> avoid implementing our own private buffered streams.

By preparing the data in memory and writing it in one go, which
doesn't require any of the major complications of implementing
buffered streams.
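
A minimal sketch of "prepare the data in memory, then write it out", assuming a POSIX system. The type `struct dump_buffer` and the functions `buffer_write_at` and `buffer_flush` are hypothetical names chosen for this illustration; they are not the identifiers used in the actual patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* A growable in-memory image of the file being produced.  */
struct dump_buffer
{
  unsigned char *data;
  size_t used;      /* highest offset written so far */
  size_t capacity;  /* bytes allocated */
};

/* Copy LEN bytes from SRC to OFFSET, growing the buffer as needed.
   Seeking back to patch earlier data is just a memcpy, no syscall.
   Returns 0 on success, -1 if memory runs out.  (Overflow of
   OFFSET + LEN is not checked in this sketch.)  */
static int
buffer_write_at (struct dump_buffer *b, size_t offset,
                 const void *src, size_t len)
{
  assert (b != NULL);
  size_t end = offset + len;
  if (end > b->capacity)
    {
      size_t cap = b->capacity ? b->capacity : 4096;
      while (cap < end)
        cap *= 2;
      unsigned char *p = realloc (b->data, cap);
      if (p == NULL)
        return -1;
      /* Zero the new region so unwritten gaps are deterministic.  */
      memset (p + b->capacity, 0, cap - b->capacity);
      b->data = p;
      b->capacity = cap;
    }
  memcpy (b->data + offset, src, len);
  if (end > b->used)
    b->used = end;
  return 0;
}

/* Write the whole image to FD, retrying after partial writes.
   EINTR handling is omitted for brevity.  */
static int
buffer_flush (const struct dump_buffer *b, int fd)
{
  size_t done = 0;
  while (done < b->used)
    {
      ssize_t n = write (fd, b->data + done, b->used - done);
      if (n <= 0)
        return -1;
      done += (size_t) n;
    }
  return 0;
}
```

The loop in `buffer_flush` reflects that a single `write` may be partial, so "one go" in practice means a handful of large writes rather than many tiny ones.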

Pip




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46881; Package emacs. (Fri, 05 Mar 2021 10:24:01 GMT) Full text and rfc822 format available.

Message #59 received at 46881 <at> debbugs.gnu.org (full text, mbox):

From: Andrea Corallo <akrl <at> sdf.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: 46881 <at> debbugs.gnu.org, Eli Zaretskii <eliz <at> gnu.org>, eggert <at> cs.ucla.edu
Subject: Re: bug#46881: 28.0.50; pdumper dumping causes way too many syscalls
Date: Fri, 05 Mar 2021 10:23:45 +0000
Pip Cet <pipcet <at> gmail.com> writes:

[...]

> By preparing the data in memory and writing it in one go, which
> doesn't require any of the major complications of implementing
> buffered streams.

Preparing data in memory might also be seen as a small step in the
direction of having pdumper as a generic de/serializer.  IMO that would
be helpful for a number of tasks (native compiler included).

  Andrea




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46881; Package emacs. (Fri, 05 Mar 2021 12:08:01 GMT) Full text and rfc822 format available.

Message #62 received at 46881 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: 46881 <at> debbugs.gnu.org, dancol <at> dancol.org, eggert <at> cs.ucla.edu
Subject: Re: bug#46881: 28.0.50; pdumper dumping causes way too many syscalls
Date: Fri, 05 Mar 2021 14:06:55 +0200
> From: Pip Cet <pipcet <at> gmail.com>
> Date: Fri, 5 Mar 2021 09:54:32 +0000
> Cc: Daniel Colascione <dancol <at> dancol.org>, eggert <at> cs.ucla.edu, 46881 <at> debbugs.gnu.org
> 
> My patch:
> 
> real    0m1.988s
> user    0m1.916s
> sys    0m0.073s
> 
> fwrite-based patch:
> 
> real    0m3.576s
> user    0m2.571s
> sys    0m1.006s

30% slowdown and 1.5 sec absolute time difference doesn't sound bad
enough to me to justify a homemade solution.  I say let's go with
stdio.

> > > Also, we're not currently using fseek-and-write anywhere in Emacs.
> >
> > I don't see why this would be important.
> 
> Because the stream returned by emacs_fopen might not be generally seekable?

I don't see how that could happen.

> > Since we open the file in
> > binary mode, fseek should work correctly even on non-Posix systems.
> 
> I guess I should have used emacs_fopen :-)

Yes, of course.  Especially as with fopen there are problems with
non-ASCII file names on MS-Windows.

> By preparing the data in memory and writing it in one go, which
> doesn't require any of the major complications of implementing
> buffered streams.

There are no complications I can see, not in our sources.  (And you
don't actually write it in one go anyway, see emacs_full_write.)

So let's go with the stdio solution, please.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46881; Package emacs. (Fri, 05 Mar 2021 12:50:02 GMT) Full text and rfc822 format available.

Message #65 received at 46881 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 46881 <at> debbugs.gnu.org, eggert <at> cs.ucla.edu, Pip Cet <pipcet <at> gmail.com>
Subject: Re: bug#46881: 28.0.50; pdumper dumping causes way too many syscalls
Date: Fri, 05 Mar 2021 13:49:42 +0100
Eli Zaretskii <eliz <at> gnu.org> writes:

> 30% slowdown and 1.5 sec absolute time difference doesn't sound bad
> enough to me to justify a homemade solution.  I say let's go with
> stdio.

Seems significant to me -- we're building Emacs a lot, and this bit
can't be parallelised.  And the savings in electricity alone should make
us go for the most efficient solution.

There don't seem to be any significant drawbacks to doing it the
efficient way, either.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46881; Package emacs. (Fri, 05 Mar 2021 13:17:02 GMT) Full text and rfc822 format available.

Message #68 received at 46881 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 46881 <at> debbugs.gnu.org, Daniel Colascione <dancol <at> dancol.org>,
 eggert <at> cs.ucla.edu
Subject: Re: bug#46881: 28.0.50; pdumper dumping causes way too many syscalls
Date: Fri, 5 Mar 2021 13:16:14 +0000
On Fri, Mar 5, 2021 at 12:07 PM Eli Zaretskii <eliz <at> gnu.org> wrote:
> > From: Pip Cet <pipcet <at> gmail.com>
> > Date: Fri, 5 Mar 2021 09:54:32 +0000
> > Cc: Daniel Colascione <dancol <at> dancol.org>, eggert <at> cs.ucla.edu, 46881 <at> debbugs.gnu.org
> >
> > My patch:
> >
> > real    0m1.988s
> > user    0m1.916s
> > sys    0m0.073s
> >
> > fwrite-based patch:
> >
> > real    0m3.576s
> > user    0m2.571s
> > sys    0m1.006s
>
> 30% slowdown and 1.5 sec absolute time difference doesn't sound bad
> enough to me to

It's a 30% slowdown of the entire dump process, including the
CPU-intensive part which loads Emacs. I think you get a better idea of
the performance difference from the "sys" numbers above.

And the absolute time difference is more than that, because Emacs is
dumped twice during each build; the first dump file is about 2.5 times
the size of the ultimate dump file, so my guess (as I said before,
unfortunately Intel decided to make this system not have a predictable
CPU clock, so I can't really run good benchmarks) is we're talking
about 4.5 seconds here.

> justify a homemade solution.

"Create a buffer in memory and do all the IO at once" is such an old
solution that even the GNU Coding Standards explicitly recommend it
(albeit for input files):

  "You could keep the entire input file in memory and scan it there
  instead of using stdio"

> I say let's go with stdio.

Maybe setbuffer(3) could help us here? I could run some benchmarks for
that if the idea isn't out of the question.

> > > > Also, we're not currently using fseek-and-write anywhere in Emacs.
> > >
> > > I don't see why this would be important.
> >
> > Because the stream returned by emacs_fopen might not be generally seekable?
>
> I don't see how that could happen.

It has, to me, but I'm willing to accept I did some inadvisable things first.

> > By preparing the data in memory and writing it in one go, which
> > doesn't require any of the major complications of implementing
> > buffered streams.
>
> There are no complications I can see, not in our sources.  (And you
> don't actually write it in one go anyway, see emacs_full_write.)

Er, precisely. I was the one saying there are no complications, so we
shouldn't let the idea of "implementing our own buffered streams"
scare us, because that is a complicated project but it's also not what
we are doing.

> So let's go with the stdio solution, please.

Should I add a sync after every seek to make absolutely certain,
rather than merely likely, this will destroy someone's flash chip one
day?

Pip




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46881; Package emacs. (Fri, 05 Mar 2021 13:25:01 GMT) Full text and rfc822 format available.

Message #71 received at 46881 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 46881 <at> debbugs.gnu.org, eggert <at> cs.ucla.edu, pipcet <at> gmail.com
Subject: Re: bug#46881: 28.0.50; pdumper dumping causes way too many syscalls
Date: Fri, 05 Mar 2021 15:23:55 +0200
> From: Lars Ingebrigtsen <larsi <at> gnus.org>
> Cc: Pip Cet <pipcet <at> gmail.com>,  46881 <at> debbugs.gnu.org,  eggert <at> cs.ucla.edu
> Date: Fri, 05 Mar 2021 13:49:42 +0100
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> > 30% slowdown and 1.5 sec absolute time difference doesn't sound bad
> > enough to me to justify a homemade solution.  I say let's go with
> > stdio.
> 
> Seems significant to me -- we're building Emacs a lot, and this bit
> can't be parallelised.  And the savings in electricity alone should make
> us go for the most efficient solution.
> 
> There doesn't seem to be any significant drawbacks to doing it the
> efficient way, either.

<Shrug> Fine, let's do it that way, then.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46881; Package emacs. (Fri, 05 Mar 2021 14:04:02 GMT) Full text and rfc822 format available.

Message #74 received at 46881 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 46881 <at> debbugs.gnu.org, Daniel Colascione <dancol <at> dancol.org>,
 eggert <at> cs.ucla.edu
Subject: Re: bug#46881: 28.0.50; pdumper dumping causes way too many syscalls
Date: Fri, 5 Mar 2021 14:02:33 +0000
On Fri, Mar 5, 2021 at 1:16 PM Pip Cet <pipcet <at> gmail.com> wrote:
> > I say let's go with stdio.
>
> Maybe setbuffer(3) could help us here? I could run some benchmarks for
> that if the idea isn't out of the question.

Actually, calling setbuffer() with a large buffer will make glibc
reread the entire file on every fseek(), rendering the dump so slow I
gave up and interrupted it.

However, there's open_memstream: that would have the added advantage
of actually making glibc write out the entire file in one go, so it
seems to shave a few extra milliseconds off the build.

(However however, glibc's memstreams are somewhat broken. I'll file a
bug report if there isn't one already...)

Still, that way we could use stdio today, leave the buffering to
glibc, and hopefully be able to switch trivially to a "open this file
but keep it in memory" combined memstream/FILE* one day.
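
A rough sketch of the open_memstream idea, assuming a libc that provides the POSIX.1-2008 function. The function name `build_image` and the length-prefixed layout are invented for this example. As I read POSIX, the size reported after fclose() depends on the current file position, so the sketch seeks back to the end before closing.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Build a tiny "image" in memory through stdio: a placeholder header,
   then the payload, then seek back and patch the header with the
   payload length.  On success, *OUT is a malloc'd buffer of *OUT_LEN
   bytes that the caller must free.  (Hypothetical helper.)  */
static int
build_image (const void *payload, uint32_t payload_len,
             char **out, size_t *out_len)
{
  assert (out != NULL && out_len != NULL);
  FILE *fp = open_memstream (out, out_len);
  if (fp == NULL)
    return -1;

  uint32_t placeholder = 0;
  int ok = (fwrite (&placeholder, sizeof placeholder, 1, fp) == 1
            && fwrite (payload, 1, payload_len, fp) == payload_len);

  /* Remember where the data ends before seeking backwards.  */
  long end = ok ? ftell (fp) : -1;
  ok = (ok && end >= 0
        && fseek (fp, 0, SEEK_SET) == 0
        && fwrite (&payload_len, sizeof payload_len, 1, fp) == 1
        /* Return to the end so the reported size covers everything.  */
        && fseek (fp, end, SEEK_SET) == 0);

  if (fclose (fp) != 0)
    ok = 0;
  if (!ok)
    {
      free (*out);
      *out = NULL;
      return -1;
    }
  return 0;
}
```

The resulting buffer could then be handed to a single large write; the code producing the data only ever sees a `FILE *`.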

Pip




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46881; Package emacs. (Fri, 05 Mar 2021 14:14:01 GMT) Full text and rfc822 format available.

Message #77 received at 46881 <at> debbugs.gnu.org (full text, mbox):

From: Daniel Colascione <dancol <at> dancol.org>
To: Pip Cet <pipcet <at> gmail.com>, Eli Zaretskii <eliz <at> gnu.org>
Cc: 46881 <at> debbugs.gnu.org, eggert <at> cs.ucla.edu
Subject: Re: bug#46881: 28.0.50; pdumper dumping causes way too many syscalls
Date: Fri, 5 Mar 2021 09:13:49 -0500
On 3/5/21 9:02 AM, Pip Cet wrote:

> On Fri, Mar 5, 2021 at 1:16 PM Pip Cet <pipcet <at> gmail.com> wrote:
>>> I say let's go with stdio.
>> Maybe setbuffer(3) could help us here? I could run some benchmarks for
>> that if the idea isn't out of the question.
> Actually, calling setbuffer() with a large buffer will make glibc
> reread the entire file on every fseek(), rendering the dump so slow I
> gave up and interrupted it.
>
> However, there's open_memstream: that would have the added advantage
> of actually making glibc write out the entire file in one go, so it
> seems to shave a few extra milliseconds off the build.
>
> (However however, glibc's memstreams are somewhat broken. I'll file a
> bug report if there isn't one already...)
>
> Still, that way we could use stdio today, leave the buffering to
> glibc, and hopefully be able to switch trivially to a "open this file
> but keep it in memory" combined memstream/FILE* one day.

You could also use fopencookie to make your own stdio stream that does 
the right thing while still using the stdio abstraction in the part of 
the code actually doing the writing.
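
A sketch of what such a stream might look like, assuming glibc's fopencookie (a GNU extension, hence `_GNU_SOURCE`). The names `struct mem_cookie`, `mem_write`, `mem_seek` and `mem_open` are hypothetical and only illustrate the idea; no such code appears in this thread.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Backing store for the custom stream.  */
struct mem_cookie
{
  char *data;
  size_t used;      /* highest offset written so far */
  size_t capacity;  /* bytes allocated */
  size_t pos;       /* current stream position */
};

/* Write callback: copy into the buffer at the current position.
   Per the fopencookie convention, return 0 on error.  */
static ssize_t
mem_write (void *c, const char *buf, size_t size)
{
  struct mem_cookie *m = c;
  size_t end = m->pos + size;
  if (end > m->capacity)
    {
      size_t cap = m->capacity ? m->capacity : 4096;
      while (cap < end)
        cap *= 2;
      char *p = realloc (m->data, cap);
      if (p == NULL)
        return 0;
      memset (p + m->capacity, 0, cap - m->capacity);
      m->data = p;
      m->capacity = cap;
    }
  memcpy (m->data + m->pos, buf, size);
  m->pos = end;
  if (end > m->used)
    m->used = end;
  return (ssize_t) size;
}

/* Seek callback: move the position without any I/O.  */
static int
mem_seek (void *c, off64_t *offset, int whence)
{
  struct mem_cookie *m = c;
  off64_t base = (whence == SEEK_SET ? 0
                  : whence == SEEK_CUR ? (off64_t) m->pos
                  : (off64_t) m->used);
  off64_t target = base + *offset;
  if (target < 0)
    return -1;
  m->pos = (size_t) target;
  *offset = target;
  return 0;
}

/* Open a write-only stdio stream backed by M.  */
static FILE *
mem_open (struct mem_cookie *m)
{
  assert (m != NULL);
  cookie_io_functions_t io
    = { .read = NULL, .write = mem_write, .seek = mem_seek, .close = NULL };
  return fopencookie (m, "w", io);
}
```

After fclose(), `m.data` and `m.used` describe the finished image, which the caller can write out and free. Since fopencookie is not in POSIX, a fallback would still be needed on other systems.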





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46881; Package emacs. (Fri, 05 Mar 2021 14:57:02 GMT) Full text and rfc822 format available.

Message #80 received at 46881 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: 46881 <at> debbugs.gnu.org, dancol <at> dancol.org, eggert <at> cs.ucla.edu
Subject: Re: bug#46881: 28.0.50; pdumper dumping causes way too many syscalls
Date: Fri, 05 Mar 2021 16:55:43 +0200
> From: Pip Cet <pipcet <at> gmail.com>
> Date: Fri, 5 Mar 2021 14:02:33 +0000
> Cc: Daniel Colascione <dancol <at> dancol.org>, eggert <at> cs.ucla.edu, 46881 <at> debbugs.gnu.org
> 
> However, there's open_memstream

That's glibc-only, AFAIK.  Not portable enough for us.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46881; Package emacs. (Fri, 05 Mar 2021 15:14:02 GMT) Full text and rfc822 format available.

Message #83 received at 46881 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 46881 <at> debbugs.gnu.org, Daniel Colascione <dancol <at> dancol.org>,
 eggert <at> cs.ucla.edu
Subject: Re: bug#46881: 28.0.50; pdumper dumping causes way too many syscalls
Date: Fri, 5 Mar 2021 15:12:31 +0000
On Fri, Mar 5, 2021 at 2:56 PM Eli Zaretskii <eliz <at> gnu.org> wrote:
> > From: Pip Cet <pipcet <at> gmail.com>
> > Date: Fri, 5 Mar 2021 14:02:33 +0000
> > Cc: Daniel Colascione <dancol <at> dancol.org>, eggert <at> cs.ucla.edu, 46881 <at> debbugs.gnu.org
> >
> > However, there's open_memstream
>
> That's glibc-only, AFAIK.  Not portable enough for us.

POSIX.1-2008. Not portable enough to require, certainly, but portable
enough to use?

Pip
[0001-Use-stdio-memstreams-if-available-for-pdumper-Bug-46.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46881; Package emacs. (Tue, 15 Jun 2021 09:27:01 GMT) Full text and rfc822 format available.

Message #86 received at 46881 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattiase <at> acm.org>
To: Pip Cet <pipcet <at> gmail.com>, Eli Zaretskii <eliz <at> gnu.org>,
 46881 <at> debbugs.gnu.org, Daniel Colascione <dancol <at> dancol.org>,
 Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#46881: 28.0.50; pdumper dumping causes way too many syscalls
Date: Tue, 15 Jun 2021 11:25:47 +0200
Any reason not to apply the patch from https://debbugs.gnu.org/cgi/bugreport.cgi?bug=46881#20 right away? I've been using it locally for quite some time with very good results.

There seems to be a consensus for it, and while there may be even better solutions, having this in place now won't hurt.





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46881; Package emacs. (Tue, 15 Jun 2021 12:59:02 GMT) Full text and rfc822 format available.

Message #89 received at 46881 <at> debbugs.gnu.org (full text, mbox):

From: Daniel Colascione <dancol <at> dancol.org>
To: Mattias Engdegård <mattiase <at> acm.org>,
 Pip Cet <pipcet <at> gmail.com>, Eli Zaretskii <eliz <at> gnu.org>,
 46881 <at> debbugs.gnu.org, Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#46881: 28.0.50; pdumper dumping causes way too many syscalls
Date: Tue, 15 Jun 2021 05:58:26 -0700
On 6/15/21 2:25 AM, Mattias Engdegård wrote:

> Any reason not to apply the patch from https://debbugs.gnu.org/cgi/bugreport.cgi?bug=46881#20 right away? I've been using it locally for quite some time with very good results.
>
> There seems to be a consensus for it, and while there may be even better solutions, having this in place now won't hurt.

I thought we had consensus for an fwrite-based approach?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46881; Package emacs. (Tue, 15 Jun 2021 13:07:01 GMT) Full text and rfc822 format available.

Message #92 received at 46881 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Daniel Colascione <dancol <at> dancol.org>, Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 46881 <at> debbugs.gnu.org, mattiase <at> acm.org, eggert <at> cs.ucla.edu,
 pipcet <at> gmail.com
Subject: Re: bug#46881: 28.0.50; pdumper dumping causes way too many syscalls
Date: Tue, 15 Jun 2021 16:06:28 +0300
> From: Daniel Colascione <dancol <at> dancol.org>
> Date: Tue, 15 Jun 2021 05:58:26 -0700
> 
> On 6/15/21 2:25 AM, Mattias Engdegård wrote:
> 
> > Any reason not to apply the patch from https://debbugs.gnu.org/cgi/bugreport.cgi?bug=46881#20 right away? I've been using it locally for quite some time with very good results.
> >
> > There seems to be a consensus for it, and while there may be even better solutions, having this in place now won't hurt.
> I thought we had consensus for an fwrite based approach?

That's what I preferred (still do), but Lars thought that the slightly
faster version without stdio was preferable.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46881; Package emacs. (Tue, 15 Jun 2021 13:19:02 GMT) Full text and rfc822 format available.

Message #95 received at 46881 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 46881 <at> debbugs.gnu.org, mattiase <at> acm.org,
 Daniel Colascione <dancol <at> dancol.org>, pipcet <at> gmail.com, eggert <at> cs.ucla.edu
Subject: Re: bug#46881: 28.0.50; pdumper dumping causes way too many syscalls
Date: Tue, 15 Jun 2021 15:17:51 +0200
Eli Zaretskii <eliz <at> gnu.org> writes:

> That's what I preferred (still do), but Lars thought that the slightly
> faster version without stdio was preferable.

Well, it's 30% faster in a single-threaded phase of the compilation, so
it seems like a significant improvement to me.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46881; Package emacs. (Tue, 15 Jun 2021 13:27:02 GMT) Full text and rfc822 format available.

Message #98 received at 46881 <at> debbugs.gnu.org (full text, mbox):

From: Daniel Colascione <dancol <at> dancol.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>, Eli Zaretskii <eliz <at> gnu.org>
Cc: 46881 <at> debbugs.gnu.org, mattiase <at> acm.org, eggert <at> cs.ucla.edu,
 pipcet <at> gmail.com
Subject: Re: bug#46881: 28.0.50; pdumper dumping causes way too many syscalls
Date: Tue, 15 Jun 2021 06:25:50 -0700

On June 15, 2021 6:18:11 AM Lars Ingebrigtsen <larsi <at> gnus.org> wrote:

> Eli Zaretskii <eliz <at> gnu.org> writes:
>
>> That's what I preferred (still do), but Lars thought that the slightly
>> faster version without stdio was preferable.
>
> Well, it's 30% faster in a single-threaded phase of the compilation, so
> it seems like a significant improvement to me.
>
> --
> (domestic pets only, the antidote for overdose, milk.)
>   bloggy blog: http://lars.ingebrigtsen.n

The in-memory version is 30% faster than the FILE* version or 30% faster 
than the current write () version?






Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46881; Package emacs. (Tue, 15 Jun 2021 13:31:02 GMT) Full text and rfc822 format available.

Message #101 received at 46881 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Daniel Colascione <dancol <at> dancol.org>
Cc: 46881 <at> debbugs.gnu.org, mattiase <at> acm.org, larsi <at> gnus.org, eggert <at> cs.ucla.edu,
 pipcet <at> gmail.com
Subject: Re: bug#46881: 28.0.50; pdumper dumping causes way too many syscalls
Date: Tue, 15 Jun 2021 16:30:36 +0300
> From: Daniel Colascione <dancol <at> dancol.org>
> CC: <mattiase <at> acm.org>, <pipcet <at> gmail.com>, <46881 <at> debbugs.gnu.org>, <eggert <at> cs.ucla.edu>
> Date: Tue, 15 Jun 2021 06:25:50 -0700
> 
> >> That's what I preferred (still do), but Lars thought that the slightly
> >> faster version without stdio was preferable.
> >
> > Well, it's 30% faster in a single-threaded phase of the compilation, so
> > it seems like a significant improvement to me.
> >
> > --
> > (domestic pets only, the antidote for overdose, milk.)
> >   bloggy blog: http://lars.ingebrigtsen.n
> 
> The in-memory version is 30% faster than the FILE* version or 30% faster 
> than the current write () version?

The former, see

  https://debbugs.gnu.org/cgi/bugreport.cgi?bug=46881#56




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46881; Package emacs. (Tue, 15 Jun 2021 15:33:01 GMT) Full text and rfc822 format available.

Message #104 received at 46881 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattiase <at> acm.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 46881 <at> debbugs.gnu.org, Eli Zaretskii <eliz <at> gnu.org>,
 Daniel Colascione <dancol <at> dancol.org>, pipcet <at> gmail.com, eggert <at> cs.ucla.edu
Subject: Re: bug#46881: 28.0.50; pdumper dumping causes way too many syscalls
Date: Tue, 15 Jun 2021 17:32:12 +0200
15 juni 2021 kl. 15.17 skrev Lars Ingebrigtsen <larsi <at> gnus.org>:

> Well, it's 30% faster in a single-threaded phase of the compilation, so
> it seems like a significant improvement to me.

The patch seems to be good in all respects -- simple, low risk, better performance, no compatibility trouble. And nothing prevents us from replacing it with something better later on, should the need arise.

All set then?





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46881; Package emacs. (Tue, 15 Jun 2021 22:45:02 GMT) Full text and rfc822 format available.

Message #107 received at 46881 <at> debbugs.gnu.org (full text, mbox):

From: Daniel Colascione <dancol <at> dancol.org>
To: Mattias Engdegård <mattiase <at> acm.org>,
 Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 46881 <at> debbugs.gnu.org, Eli Zaretskii <eliz <at> gnu.org>, eggert <at> cs.ucla.edu,
 pipcet <at> gmail.com
Subject: Re: bug#46881: 28.0.50; pdumper dumping causes way too many syscalls
Date: Tue, 15 Jun 2021 15:44:29 -0700
On 6/15/21 8:32 AM, Mattias Engdegård wrote:
> 15 juni 2021 kl. 15.17 skrev Lars Ingebrigtsen <larsi <at> gnus.org>:
>> Well, it's 30% faster in a single-threaded phase of the compilation, so
>> it seems like a significant improvement to me.
> The patch seems to be good in all respects -- simple, low risk, better performance, no compatibility trouble. And nothing prevents us from replacing it with something better later on, should the need arise.
>
> All set then?
>
Okay. I'm convinced.






Reply sent to Mattias Engdegård <mattiase <at> acm.org>:
You have taken responsibility. (Wed, 16 Jun 2021 08:01:01 GMT) Full text and rfc822 format available.

Notification sent to Pip Cet <pipcet <at> gmail.com>:
bug acknowledged by developer. (Wed, 16 Jun 2021 08:01:02 GMT) Full text and rfc822 format available.

Message #112 received at 46881-done <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattiase <at> acm.org>
To: Daniel Colascione <dancol <at> dancol.org>
Cc: 46881-done <at> debbugs.gnu.org, Lars Ingebrigtsen <larsi <at> gnus.org>,
 Paul Eggert <eggert <at> cs.ucla.edu>, Eli Zaretskii <eliz <at> gnu.org>,
 Pip Cet <pipcet <at> gmail.com>
Subject: Re: bug#46881: 28.0.50; pdumper dumping causes way too many syscalls
Date: Wed, 16 Jun 2021 10:00:27 +0200
16 juni 2021 kl. 00.44 skrev Daniel Colascione <dancol <at> dancol.org>:

> Okay. I'm convinced.

All right, pushed, and closing this bug.
Thanks, Pip!





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46881; Package emacs. (Wed, 16 Jun 2021 08:15:01 GMT) Full text and rfc822 format available.

Message #115 received at 46881-done <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Mattias Engdegård <mattiase <at> acm.org>
Cc: 46881-done <at> debbugs.gnu.org, Eli Zaretskii <eliz <at> gnu.org>,
 Daniel Colascione <dancol <at> dancol.org>, Pip Cet <pipcet <at> gmail.com>,
 Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#46881: 28.0.50; pdumper dumping causes way too many syscalls
Date: Wed, 16 Jun 2021 10:14:22 +0200
Mattias Engdegård <mattiase <at> acm.org> writes:

> All right, pushed, and closing this bug.
> Thanks, Pip!

Great; thanks.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46881; Package emacs. (Wed, 16 Jun 2021 08:18:01 GMT) Full text and rfc822 format available.

Message #118 received at 46881-done <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Mattias Engdegård <mattiase <at> acm.org>
Cc: 46881-done <at> debbugs.gnu.org, Lars Ingebrigtsen <larsi <at> gnus.org>,
 Daniel Colascione <dancol <at> dancol.org>, Eli Zaretskii <eliz <at> gnu.org>,
 Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#46881: 28.0.50; pdumper dumping causes way too many syscalls
Date: Wed, 16 Jun 2021 08:16:41 +0000
On Wed, Jun 16, 2021 at 8:00 AM Mattias Engdegård <mattiase <at> acm.org> wrote:
>
> 16 juni 2021 kl. 00.44 skrev Daniel Colascione <dancol <at> dancol.org>:
>
> > Okay. I'm convinced.
>
> All right, pushed, and closing this bug.
> Thanks, Pip!

Thank you for following up on this! Sorry, Emacs has kind of lost
priority for me right now, but I'll get back to it...




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46881; Package emacs. (Wed, 16 Jun 2021 14:14:01 GMT) Full text and rfc822 format available.

Message #121 received at 46881 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Pip Cet <pipcet <at> gmail.com>
Cc: 46881 <at> debbugs.gnu.org
Subject: Re: bug#46881: 28.0.50; pdumper dumping causes way too many syscalls
Date: Wed, 16 Jun 2021 10:13:01 -0400
Pip Cet [2021-03-02 20:33:42] wrote:
> Playing around with the WebAssembly port, I noticed that pdumper, in
> creating the dump file, makes way too many syscalls: it uses
> emacs_write(), not fwrite(), so these calls translate to actual
> syscalls and context switches. On immature systems (or in special
> circumstances like a device mounted synchronously),

Thanks for this patch.  As an anecdote, this inefficiency showed
up on one of my Thinkpads running GNU/Linux with a plain
old ext4 partition mounted in the most standard way (no synchronous
mount or other funny business):

https://serverfault.com/questions/996495/writes-throttled-to-500kb-s

The way this manifested itself is that after some uptime individual
writes to the SSD became very slow.  For most operations, this was
completely invisible, but it was quite noticeable during Emacs's dump,
which sometimes took several minutes (while all the rest of the
compilation, including the "load" part of the dump, progressed at
apparently usual speeds).


        Stefan





bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 15 Jul 2021 11:24:06 GMT) Full text and rfc822 format available.

This bug report was last modified 2 years and 258 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.