GNU bug report logs - #54586
dd conv options doc


Package: coreutils;

Reported by: Karl Berry <karl <at> freefriends.org>

Date: Sat, 26 Mar 2022 20:30:02 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 54586 in the body.
You can then email your comments to 54586 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-coreutils <at> gnu.org:
bug#54586; Package coreutils. (Sat, 26 Mar 2022 20:30:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Karl Berry <karl <at> freefriends.org>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Sat, 26 Mar 2022 20:30:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Karl Berry <karl <at> freefriends.org>
To: bug-coreutils <at> gnu.org
Subject: dd conv options doc
Date: Sat, 26 Mar 2022 14:29:42 -0600

The dd Texinfo doc says, for the conv= option
(https://gnu.org/s/coreutils/manual/html_node/dd-invocation.html)

     'fdatasync'
          Synchronize output data just before finishing.  This forces a
          physical write of output data.

     'fsync'
          Synchronize output data and metadata just before finishing.
          This forces a physical write of output data and metadata.

Weirdly, these descriptions are inducing quite a bit of FUD in me.

Why would I ever want the writes to be incomplete after running dd?
Seems like that is dd's whole purpose.

Well, I suppose it is too late to make such a radical change as forcing
a final sync. In which case I suggest adding another sentence along the
lines of "If these options are not specified, the data will be
physically written when the system schedules the syncs, ordinarily every
few seconds" (correct?). "You can also manually sync the output
filesystem yourself afterwards (xref sync)." Otherwise it feels
uncertain when or whether the data will be physically written, or how to
look into it further.

As for "metadata", what does dd have to do with metadata?  My wild guess
is that this is referring to filesystem metadata, not anything about dd
specifically. Whatever the case, I suggest adding a word or two to the
doc to give a clue.

Further, why would I want data to be synced and not metadata? Seems like
fdatasync and fsync should both do both; or at least document that
normally they'd be used together. Or, if there is a real-life case where
a user would want one and not the other, how about documenting that? My
imagination is failing me, but presumably these seemingly-undesirable
options were invented for a reason.

BTW, I came across these options on a random page discussing dumping a
.iso to a USB drive; the example was
  dd if=foo.iso of=/dev/sde conv=fdatasync
.. seems now like fsync should also have been given, for certainty.
--thanks, karl.





Information forwarded to bug-coreutils <at> gnu.org:
bug#54586; Package coreutils. (Mon, 04 Apr 2022 20:25:02 GMT) Full text and rfc822 format available.

Message #8 received at 54586 <at> debbugs.gnu.org (full text, mbox):

From: Bob Proulx <bob <at> proulx.com>
To: Karl Berry <karl <at> freefriends.org>
Cc: 54586 <at> debbugs.gnu.org
Subject: Re: bug#54586: dd conv options doc
Date: Mon, 4 Apr 2022 14:24:25 -0600

Karl Berry wrote:
>      'fdatasync'
>           Synchronize output data just before finishing.  This forces a
>           physical write of output data.
>
>      'fsync'
>           Synchronize output data and metadata just before finishing.
>           This forces a physical write of output data and metadata.
>
> Weirdly, these descriptions are inducing quite a bit of FUD in me.
>
> Why would I ever want the writes to be incomplete after running dd?
> Seems like that is dd's whole purpose.

Yes.  FUD.  The writes are not incomplete.  It is no different than
any other write.

    echo "Hello, World!" > file1

Is that write complete?  It's no different.  If one is incomplete then
so is the other.  Note that the documentation does not say
"incomplete" but says "physical write".  As in, chiseled into stone.

The dd utility exists with a plethora of low-level options not
typically available in other utilities such as cp.  That very
low-level control of option flags is one of the distinguishing
features that makes dd useful in a large number of cases where we
would otherwise use cp, rsync, or one of the others.  But just because
options exist does not mean they should always be used.  Most of the
time they should not be used.

> Well, I suppose it is too late to make such a radical change as forcing
> a final sync.

Please, no.  Opposing this is my motivation for writing this
response.  Things are wastefully slow already because of the number of
fsync() calls now coded into programs all over the place.  (Other
programs; I am not referring to coreutils here.)  Let's not make the
problem worse by adding them where they are not desired.  That is why
it is an option to dd and not on by default.  In those specific cases
where it is useful it can be specified as an option.  dd is exposing
the interface for when it is useful.

As a practical matter I think with GNU dd's extensions that I never
ever use conv=fsync or conv=fdatasync but instead would always in
those same cases use oflag=direct,sync.  Such as when writing a
removable storage device like a USB drive, that I subsequently will
want to remove.  There is no benefit to caching the data since it will
be invalidated immediately.  Not using buffer cache avoids flushing
some other data that would be useful to keep in file system buffer
cache.  When the write is done then the removable media can be
removed.  This avoids needing to run sync explicitly, which syncs
*everything*.
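
As a rough sketch of that preference (not part of the original
message), the following runs the oflag=direct,sync form against
scratch files instead of a real device.  The file names, the bs=1M
block size, and the conv=fsync fallback are assumptions of this
sketch: some filesystems refuse direct I/O, in which case the copy is
retried through the cache and flushed once at the end.

```shell
# Scratch files stand in for the image and the removable device.
work=$(mktemp -d)
head -c 4194304 /dev/urandom > "$work/someimage.img"

# Preferred form from the message: bypass the cache, write synchronously.
# Fallback (an assumption of this sketch): if the filesystem rejects
# direct I/O, retry with a cached write that is flushed before dd exits.
if dd if="$work/someimage.img" of="$work/target.img" bs=1M oflag=direct,sync 2>/dev/null
then
    method=direct
else
    dd if="$work/someimage.img" of="$work/target.img" bs=1M conv=fsync 2>/dev/null
    method=fsync
fi

# Either way the copy should be byte-identical to the source.
cmp -s "$work/someimage.img" "$work/target.img" && copy_ok=yes
```

On a real removable device the fallback would normally not be needed;
it is here only so the sketch runs on any scratch directory.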

> In which case I suggest adding another sentence along the lines of
> "If these options are not specified, the data will be physically
> written when the system schedules the syncs, ordinarily every few
> seconds" (correct?).

Yes.  However the behavior might vary slightly between the different
kernels such as Linux kernel, BSD kernel, or even HP-UX kernel.
Therefore the documentation of it is kernel specific.  Even if all of
the kernels operated similarly.

> "You can also manually sync the output filesystem yourself
> afterwards (xref sync)." Otherwise it feels uncertain when or
> whether the data will be physically written, or how to look into it
> further.

Generally this is a task that the operating system should be handling.
A programmer who takes explicit control and defeats the cache is almost
always going to be less efficient at it than the operating system.

However as you later mention writing an image to a removable storage
device like a USB thumbdrive needs to have the data flushed through
before removing the device.  GNU dd is good for this as I will
describe below but otherwise yes a "sync" (either the standalone or
the oflag) would be needed to ensure that the data has been flushed
through.

> As for "metadata", what does dd have to do with metadata?  My wild guess
> is that this is referring to filesystem metadata, not anything about dd
> specifically. Whatever the case, I suggest adding a word or two to the
> doc to give a clue.

It's not dd's fault.  The OS created it first!  It's a property given
meaning by the OS.  The OS defines the option flags.  The dd utility
is simply a thin layer giving access to the OS file option flags.

> Further, why would I want data to be synced and not metadata? Seems like
> fdatasync and fsync should both do both; or at least document that
> normally they'd be used together. Or, if there is a real-life case where
> a user would want one and not the other, how about documenting that? My
> imagination is failing me, but presumably these seemingly-undesirable
> options were invented for a reason.

The fdatasync() man page provides the information.

    The aim of fdatasync() is to reduce disk activity for applications
    that do not require all metadata to be synchronized with the disk.

In short fdatasync() is less heavy than fsync().
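
To make the difference concrete, here is a minimal sketch, assuming
GNU dd and scratch files under a temporary directory (the file names
and the 64K block size are illustrative, not from the thread).  Both
commands copy the same data; they differ only in which flush dd
requests on the output file just before it exits.

```shell
# Scratch files only; nothing here touches a real device.
tmpdir=$(mktemp -d)
head -c 1048576 /dev/zero > "$tmpdir/input.img"

# conv=fdatasync: flush the output file's data before dd exits.
dd if="$tmpdir/input.img" of="$tmpdir/out_fdatasync.img" bs=64K conv=fdatasync 2>/dev/null

# conv=fsync: flush the output file's data and metadata before dd exits.
dd if="$tmpdir/input.img" of="$tmpdir/out_fsync.img" bs=64K conv=fsync 2>/dev/null

# The copies are byte-identical either way; only the flushing differs.
cmp "$tmpdir/input.img" "$tmpdir/out_fdatasync.img" && fdatasync_ok=yes
cmp "$tmpdir/input.img" "$tmpdir/out_fsync.img" && fsync_ok=yes
```

Since conv=fsync already covers the data, giving both options together
should add nothing over conv=fsync alone.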

> BTW, I came across these options on a random page discussing dumping a
> .iso to a USB drive; the example was
>   dd if=foo.iso of=/dev/sde conv=fdatasync
> .. seems now like fsync should also have been given, for certainty.

For completely portable use one can only write the data, call sync
afterward, and remove the removable storage once the sync completes.
I don't know of any better fully portable way.  sync is silent if
there are no errors.  Depending upon the speed of the destination it
might be tens of minutes before it completes.

    dd if=someimage.img of=/dev/sdX obs=16M
    sync

Where /dev/sdX is the device path name of the destination.  Always be
very careful to ensure the correct destination name.  Do not overwrite
the wrong target destination.  Doing so could destroy your system.
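
The two-step sequence above can be wrapped up and tried safely on
scratch files first.  This is only a sketch: write_and_sync is a
hypothetical helper name, not part of coreutils, and the scratch file
stands in for /dev/sdX.

```shell
# write_and_sync SRC DST: hypothetical helper, not part of coreutils.
write_and_sync() {
    src=$1
    dst=$2
    # Plain copy; the kernel may still hold the data in its cache afterwards.
    dd if="$src" of="$dst" obs=16M 2>/dev/null || return 1
    # Ask the system to flush all pending writes, not just this file's.
    sync
}

# Demonstration on scratch files rather than a real /dev/sdX.
demo_dir=$(mktemp -d)
head -c 2097152 /dev/urandom > "$demo_dir/someimage.img"
write_and_sync "$demo_dir/someimage.img" "$demo_dir/target.img" && write_status=ok
cmp -s "$demo_dir/someimage.img" "$demo_dir/target.img" && verify_status=ok
```

Stopping when dd fails avoids running a system-wide sync after a write
that did not succeed anyway.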

For writing images to USB with GNU dd and the Linux kernel I prefer
the following combination.  It's the most friendly, with very good
user feedback.

    pv someimage.img | dd of=/dev/sdX obs=16M oflag=direct,sync

The use of pv will provide a nice progress notification.  Check it out!

    4.31GiB 0:08:13 [8.94MiB/s] [==============================================>] 100%

The main point is to use an output buffer size large enough to be
efficient but small enough that progress is reported to the user
regularly.  If it is too large then the progress reporting will be too
"chunky".  Ideally it will be a multiple of the internal flash NAND
write block size, which we can't know and can only guess at.
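
A back-of-the-envelope sketch of that trade-off, in shell arithmetic.
The 4 GiB image size and the 8 MiB/s write rate are assumed round
figures for illustration, not measurements from this thread; the
16 MiB block size is the obs value used above.

```shell
# Assumed figures for illustration: a 4 GiB image and a 16 MiB output block.
image_bytes=$((4 * 1024 * 1024 * 1024))
obs_bytes=$((16 * 1024 * 1024))

# Number of output writes, rounding up for a final partial block.
writes=$(( (image_bytes + obs_bytes - 1) / obs_bytes ))

# At an assumed sustained 8 MiB/s, seconds spent writing each output block.
rate_bytes_per_s=$((8 * 1024 * 1024))
seconds_per_write=$(( obs_bytes / rate_bytes_per_s ))

echo "$writes writes, about $seconds_per_write s each"
```

Under those assumptions the copy is split into 256 writes of about two
seconds each; a much larger block size would stretch the gap between
progress updates accordingly.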

To keep this entirely within GNU dd there is the new status=progress
option.

    $ dd if=someimage.img of=/dev/sdX obs=16M oflag=direct status=progress
    426349056 bytes (426 MB, 407 MiB) copied, 3 s, 142 MB/s
    ...

Honestly though it isn't anywhere near as nice as the progress report
from pv and I always use pv+dd for this task.  Give it a try! :-)

Bob




Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>:
You have taken responsibility. (Thu, 07 Jul 2022 04:48:02 GMT) Full text and rfc822 format available.

Notification sent to Karl Berry <karl <at> freefriends.org>:
bug acknowledged by developer. (Thu, 07 Jul 2022 04:48:02 GMT) Full text and rfc822 format available.

Message #13 received at 54586-done <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Karl Berry <karl <at> freefriends.org>
Cc: 54586-done <at> debbugs.gnu.org
Subject: Re: bug#54586: dd conv options doc
Date: Wed, 6 Jul 2022 23:47:36 -0500

On 3/26/22 15:29, Karl Berry wrote:
> why would I want data to be synced and not metadata?

Performance, in apps that don't care about the metadata. Admittedly for 
dd the use case is rare; it's mostly present so that dd exports all the 
open flags to the user.

I installed the attached to try to document this better.
[0001-dd-doc-improvement-Bug-54586.patch (text/x-patch, attachment)]

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 04 Aug 2022 11:24:06 GMT) Full text and rfc822 format available.

This bug report was last modified 1 year and 259 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.