GNU bug report logs - #51345
dd with conv=fsync sometimes returns when its writes are still cached

Reported by: Sworddragon <sworddragon2 <at> gmail.com>

Date: Sat, 23 Oct 2021 09:20:02 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 51345 in the body.
You can then email your comments to 51345 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox

Report forwarded to bug-coreutils <at> gnu.org:
bug#51345; Package coreutils. (Sat, 23 Oct 2021 09:20:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Sworddragon <sworddragon2 <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Sat, 23 Oct 2021 09:20:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Sworddragon <sworddragon2 <at> gmail.com>
To: bug-coreutils <at> gnu.org
Subject: dd with conv=fsync sometimes returns when its writes are still cached
Date: Sat, 23 Oct 2021 08:18:49 +0000

[Message part 1 (text/plain, inline)]

On Knoppix 9.1 with the Linux Kernel 5.10.10-64 x86_64 and GNU Coreutils
8.32 I wanted to overwrite my USB Thumb Drive a few times with random data
via "dd if=/dev/random of=/dev/sdb bs=1M conv=fsync". While it usually
takes ~2+ minutes to perform this action dd returned once after less than
60 seconds which made me a bit curious. On a later attempt dd returned with
this command after ~1 minute again but the LED on my USB Thumb Drive was
still blinking and an immediated executed sync on the terminal blocked for
~1 minute and once it returned the LED on my USB Thumb Drive also stopped
blinking.

Knoppix 9.1 was booted as a live system from a DVD and the only another
persistent storage attached to it was an internal HDD which was already
overwritten the same way via a past booting session.


Unfortunately Linux is not my main operating system anymore so I randomly
encountered this issue on a nearly 1 year old distribution which might be
slightly outdated. But the issue is pretty severe as it can lead to
significant data loss so it is worth at least reporting it (and having the
"bad" behavior to always call sync after dd nonetheless will probably now
continue to stay for quite a while).

[Message part 2 (text/html, inline)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#51345; Package coreutils. (Sat, 23 Oct 2021 12:56:02 GMT) Full text and rfc822 format available.

Message #8 received at 51345 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Sworddragon <sworddragon2 <at> gmail.com>, 51345 <at> debbugs.gnu.org
Subject: Re: bug#51345: dd with conv=fsync sometimes returns when its writes
 are still cached
Date: Sat, 23 Oct 2021 13:55:19 +0100

On 23/10/2021 09:18, Sworddragon wrote:
> On Knoppix 9.1 with the Linux Kernel 5.10.10-64 x86_64 and GNU Coreutils
> 8.32 I wanted to overwrite my USB Thumb Drive a few times with random data
> via "dd if=/dev/random of=/dev/sdb bs=1M conv=fsync". While it usually
> takes ~2+ minutes to perform this action dd returned once after less than
> 60 seconds which made me a bit curious. On a later attempt dd returned with
> this command after ~1 minute again but the LED on my USB Thumb Drive was
> still blinking and an immediated executed sync on the terminal blocked for
> ~1 minute and once it returned the LED on my USB Thumb Drive also stopped
> blinking.
> 
> Knoppix 9.1 was booted as a live system from a DVD and the only another
> persistent storage attached to it was an internal HDD which was already
> overwritten the same way via a past booting session.
> 
> 
> Unfortunately Linux is not my main operating system anymore so I randomly
> encountered this issue on a nearly 1 year old distribution which might be
> slightly outdated. But the issue is pretty severe as it can lead to
> significant data loss so it is worth at least reporting it (and having the
> "bad" behavior to always call sync after dd nonetheless will probably now
> continue to stay for quite a while).
> 

Well we're relying on the kernel here to not return from fync()
until appropriate.  You could try running the following immediately after,
to see if it also returns quickly:

  blockdev --flushbufs /dev/sdb

Note a subsequent `sync /dev/sdb` would probably not be effective,
since that would only consider data written by theh sync process
(which would be nothing).

cheers,
Pádraig

Information forwarded to bug-coreutils <at> gnu.org:
bug#51345; Package coreutils. (Sun, 24 Oct 2021 08:32:02 GMT) Full text and rfc822 format available.

Message #11 received at 51345 <at> debbugs.gnu.org (full text, mbox):

From: Sworddragon <sworddragon2 <at> gmail.com>
To: 51345 <at> debbugs.gnu.org
Date: Sun, 24 Oct 2021 07:35:59 +0000

[Message part 1 (text/plain, inline)]

> You could try running the following immediately after,
> to see if it also returns quickly:
>
>    blockdev --flushbufs /dev/sdb


The issue does not reproduce always and the related USB Thumb Drive has
already been prepared for and does store important data so that is not an
easy task. The USB Thumb Drive is also a pretty old device (roughly 10
years or even older) with only 1 GB of storage space. When dd with
conv=fsync returned after half of its usual writing time I guess it is
unlikely that the controller of the USB Thumb Drive has its own dedicated
512 MiB buffer attached to it.


> Well we're relying on the kernel here to not return from fync()
> until appropriate.

But the question is if there is a minor unobvious bug somewhere is the
controlling logic of dd that might still cause such a bug. But I checked
the manpages for the sync() and fsync() calls and they are actually quite
interesting. fsync() describes as flushing the caches for even data
retrieval after crashes/reboots. But the interesting part here is that it
describes after that it blocks until the devices reports completion. But
what happens if the device reports completion even if the kernel still sees
cached writes in its memory-mapped area (since storage devices are like
their own small computers and could lie or have faulty firmwares)? If
fsync() returns early here it would not be against the documention in the
manpage. sync() is here more simple as it describes itself as writing all
pending modifications for file (meta)data to the underlying filesystems. If
this would result returning after the device reports completion but the
kernel still sees cached writes in its context that would be strictly a
bug. The interesting part here is that the notes section of sync() sets
sync(), syncfs() and fsync() equal in guarantees.

With this information I see 3 possibilities here:

1. This is a bug in the controlling logic of dd that might not be obvious
at all.
2. This is a bug in fsync() or somewhere more below in the Linux Kernel.
3. Returning early is the intended behavior of fsync() and does not
strictly conflict with the manpage.

If the last is the case it might be worth proposing a change to the Linux
Kernel to additionally ensure that all cached writes are being sent out
from the Kernel's context before a return from fsync() is possible. It
would also mean that currently users can't rely on fsync() (e.g. via dd
with conv=fsync) to ensure the data has been flushed - instead they would
need to take additional action like executing a sync in the terminal
afterwards.

[Message part 2 (text/html, inline)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#51345; Package coreutils. (Tue, 26 Oct 2021 00:20:02 GMT) Full text and rfc822 format available.

Message #14 received at 51345 <at> debbugs.gnu.org (full text, mbox):

From: Bob Proulx <bob <at> proulx.com>
To: Sworddragon <sworddragon2 <at> gmail.com>
Cc: 51345 <at> debbugs.gnu.org
Subject: Re: bug#51345: dd with conv=fsync sometimes returns when its writes
 are still cached
Date: Mon, 25 Oct 2021 18:19:24 -0600

Sworddragon wrote:
> On Knoppix 9.1 with the Linux Kernel 5.10.10-64 x86_64 and GNU Coreutils
> 8.32 I wanted to overwrite my USB Thumb Drive a few times with random data
> via "dd if=/dev/random of=/dev/sdb bs=1M conv=fsync". While it usually
> takes ~2+ minutes to perform this action dd returned once after less than
> 60 seconds which made me a bit curious.

I suggest another try using oflag=direct instead of conv=fsync.

    dd if=/dev/random of=/dev/sdb bs=1M oflag=direct

Also with rates status.

    dd if=/dev/random of=/dev/sdb bs=1M oflag=direct status=progress

Here is the documentation for it.

  ‘oflag=FLAG[,FLAG]...’

     ‘direct’
          Use direct I/O for data, avoiding the buffer cache.  Note that
          the kernel may impose restrictions on read or write buffer
          sizes.  For example, with an ext4 destination file system and
          a Linux-based kernel, using ‘oflag=direct’ will cause writes
          to fail with ‘EINVAL’ if the output buffer size is not a
          multiple of 512.

Bob

Information forwarded to bug-coreutils <at> gnu.org:
bug#51345; Package coreutils. (Thu, 28 Oct 2021 23:57:02 GMT) Full text and rfc822 format available.

Message #17 received at 51345 <at> debbugs.gnu.org (full text, mbox):

From: Sworddragon <sworddragon2 <at> gmail.com>
To: 51345 <at> debbugs.gnu.org
Date: Thu, 28 Oct 2021 21:59:06 +0000

[Message part 1 (text/plain, inline)]

Despite I'm not using Linux as main system anymore and wanted to avoid
getting into too much work I found some time to do some tests as this issue
bugs me just too much.

> You could try running the following immediately after,
> to see if it also returns quickly:
>
>   blockdev --flushbufs /dev/sdb

Yes, this command also blocks for a bit over 1 minute when this issue
occurs. Here is the output (I had to freely translate the strings since
this Knoppix instance is only in german so they might be slightly
inaccurate; Also I had to type all text since it was executed on a
different system but carefully checked to not introduce any typos):

root <at> Microknoppix:~# dd if=/dev/random of=/dev/sdb bs=1M conv=fsync
status=progress
1039138816 Bytes(1,0 GB, 991 MiB) copied, 56 s, 18,5 MB/s
dd: Error on writing '/dev/sdb': The device has not enough free space
999+0 records in
998+0 records out
1047526912 Bytes (1,0 GB, 999 MiB) copied, 57,0283 s, 18,4 MB/s
root <at> Microknoppix:~# time blockdev --flushbufs /dev/sdb

real    1m9,747s
user    0m0,001s
sys     0m0,005s


(A bit offtopic, but maybe another minor bug: The first line of the copied
bytes differ slightly with every attempt (like 991 MiB, 994 MiB, 997 MiB,
etc.) but the last line seems to be always the same up to the last byte. I
had the impression the output from status=progress does not do a final
update once dd throws the writing error because the free space went out, is
my assumption correct? If so, it would probably make sense and be helpful
for others on debugging attempts if dd would do a final update. But back to
the original topic)


> Also with rates status.
>
>     dd if=/dev/random of=/dev/sdb bs=1M oflag=direct status=progress

With conv=fsync the bug usually occurs every 3-4 attempts (but the sample
size is small) so I decided to do 20 attempts with oflag=direct. The issue
did not appear once and every attempt was roughly 129 seconds (+/- 0.5
seconds) and an executed sync afterwards always returned immediately. Does
direct I/O signal the related storage device to not use its bultin-cache?
If not this implies that the USB Thumb Drive's firmware is very likely not
faulty and does not contribute to this issue.

But I noticed some other thing (I also noticed it in the past but it got
now my attention after all the stable oflag=direct writes): With conv=fsync
the writing time is sometimes around double as high as it should be. On
doing now 3 attempts with it dd showed me final times of 244,382s, 239,127s
and 135,389s, while the status progress always stopped after 56s. So
waiting for the buffers to be flushed after dd claimed about no free empty
storage space the times differ very significantly at best roughly as fast
as oflag=direct and at worst ~2.5 times slower than they should be.


To sum this up we have 2 issues:

1. The status progress from status=progress does not make a final update
(which would be useful) - but that is offtopic and probably easy to fix so
I guess there is no more for me to do here.
2. fsync returns often either early when the buffers are not fully flushed
yet or it drastically increases the writing time compared to normal
attempts (even the normal attempts are apparently quite a bit slower than
direct I/O while probably the opposite should be the case) which is very
likely either a bug in dd or something in the Linux Kernel's fsync()
imlementation is very broken.

[Message part 2 (text/html, inline)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#51345; Package coreutils. (Fri, 29 Oct 2021 12:39:02 GMT) Full text and rfc822 format available.

Message #20 received at 51345 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Sworddragon <sworddragon2 <at> gmail.com>, 51345 <at> debbugs.gnu.org
Subject: Re: bug#51345:
Date: Fri, 29 Oct 2021 13:38:36 +0100

On 28/10/2021 22:59, Sworddragon wrote:
> Despite I'm not using Linux as main system anymore and wanted to avoid
> getting into too much work I found some time to do some tests as this issue
> bugs me just too much.
> 
>> You could try running the following immediately after,
>> to see if it also returns quickly:
>>
>>    blockdev --flushbufs /dev/sdb
> 
> Yes, this command also blocks for a bit over 1 minute when this issue
> occurs.

Right that suggests the conv=fsync to dd was ineffective

> Here is the output (I had to freely translate the strings since
> this Knoppix instance is only in german so they might be slightly
> inaccurate; Also I had to type all text since it was executed on a
> different system but carefully checked to not introduce any typos):
> 
> root <at> Microknoppix:~# dd if=/dev/random of=/dev/sdb bs=1M conv=fsync
> status=progress
> 1039138816 Bytes(1,0 GB, 991 MiB) copied, 56 s, 18,5 MB/s
> dd: Error on writing '/dev/sdb': The device has not enough free space
> 999+0 records in
> 998+0 records out

Ah right. What's happening is dd is not doing the fsync()
as it's exiting early due to write(2) getting ENOSPC.

As you've seen you can avoid the need for fsync()
to flush buffers with oflag=direct.
A reason that might be faster also is that depending on your free mem,
you would avoid churning the kernel caches.

Another way to at least ensure the conv=fsync was effective,
would be to not write too much.  I.e. you could determine
the exact size of the disk (with `blockdev --getsize64 /dev/sdb` for e.g.)
and then use an appropriate bs= and count=. That's awkward though
and difficult to do with good performance due to larger block sizes
not generally aligning with the device size.

So this is a gotcha that should at least be documented.
Though I'm leaning towards improving this by always
doing an fsync on exit if we get a read or write error
and have successfully written any data, so that
previously written data is sync'd as requested.

cheers,
Pádraig

Information forwarded to bug-coreutils <at> gnu.org:
bug#51345; Package coreutils. (Fri, 29 Oct 2021 13:33:01 GMT) Full text and rfc822 format available.

Message #23 received at 51345 <at> debbugs.gnu.org (full text, mbox):

From: Sworddragon <sworddragon2 <at> gmail.com>
To: 51345 <at> debbugs.gnu.org
Date: Fri, 29 Oct 2021 13:32:25 +0000

[Message part 1 (text/plain, inline)]

I got an email that somebody replied to this report but it looks like they
did send it to the wrong mail address (maybe they will be merged in order)
but I will still reply to it:

> Suggestion as a possible workaround: Have a look at random(4) and
random(7),
> and ask yourself whether your use of /dev/random, rather than
/dev/urandom,
> is really necessary for your application. If not, you might try
/dev/urandom
> instead and report what you observe.
>
> As documented in those man pages, there are good reasons to avoid using
> /dev/random, not the least of which is that it blocks frequently (every
time
> the entropy pool is exhausted), whereas /dev/urandom does not. That alone
may
> explain the execution time inconsistencies you're reporting.

I'm aware of the differences of both and I don't see how the use of
/dev/random here could explain any of the issues:

- The drastically increased writing times are caused after dd claimed about
no free empty space (assuming this means dd has finished doing its explicit
writing task, e.g. writing to the transparent underlying cache) and dd's
times until this point were fine implying that either significant blocking
has never occured or dd correcly handles blocking input files by
asynchronously continuing to write to the output file to avoid unneccessary
delays.
- That dd returns while the buffers are not flushed can't also be explained
by the use of /dev/random unless there would be some very crazy out of mind
bug somewhere.

But I still did some tests with /dev/urandom and conv=fsync and I did see
all 3 cases too (Normal writing times that are slightly longer (133.247s);
Drastically increased writing times (235.906s) and early returning from dd
(56.4327s) while an immediated executed sync still blocked for over a
minute).

Also for the slightly delayed writing times (~133s with conv=fsync compared
to ~129s with oflag=direct) I noticed that with conv=fsync the LED of the
USB Thumb Drive starts to blink a few seconds after dd started showing
status progress so I assume Knoppix's kernel/settings cause the cache to be
flushed slightly delayed which seems more or less normal and would explain
this specific case of being ~4s slower.


And while I was writing this the next message got in:


> Ah right. What's happening is dd is not doing the fsync()
> as it's exiting early due to write(2) getting ENOSPC.

But that would make not much sense for 2 reasons:

1. The error message that the space went empty is the expected behavior
here and there is no rational dd should exit then instead of still calling
fsync().
2. In the most attempts after dd has thrown the error message about no free
empty space it still blocked for at least over a minute and an immediated
excuted sync always returned.

So it looks dd sometimes does call fsync() on ENOSPC and sometimes it
doesn't. And why a successfull call to fsync() then is sometimes ~2.5
slower is also a mystery.


> So this is a gotcha that should at least be documented.
> Though I'm leaning towards improving this by always
> doing an fsync on exit if we get a read or write error
> and have successfully written any data, so that
> previously written data is sync'd as requested.

Yes, ensuring fsync() being called seems to be the better option here. But
this still leaves the question from above why dd seemingly does this
sometimes already on ENOSPC? Maybe it is a race between ENOSPC and fsync()
in dd causing this bug? Eventually the sometimes occuring very delayed
writes (e.g. ~235s instead of ~133s) play a role here?

[Message part 2 (text/html, inline)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#51345; Package coreutils. (Fri, 29 Oct 2021 16:38:01 GMT) Full text and rfc822 format available.

Message #26 received at 51345 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Sworddragon <sworddragon2 <at> gmail.com>, 51345 <at> debbugs.gnu.org
Subject: Re: bug#51345:
Date: Fri, 29 Oct 2021 13:06:31 +0100

On 28/10/2021 22:59, Sworddragon wrote:
> Despite I'm not using Linux as main system anymore and wanted to avoid
> getting into too much work I found some time to do some tests as this issue
> bugs me just too much.
> 
>> You could try running the following immediately after,
>> to see if it also returns quickly:
>>
>>    blockdev --flushbufs /dev/sdb
> 
> Yes, this command also blocks for a bit over 1 minute when this issue
> occurs.

Right, so that suggests conv=fsync on dd was ineffective.

> Here is the output (I had to freely translate the strings since
> this Knoppix instance is only in german so they might be slightly
> inaccurate; Also I had to type all text since it was executed on a
> different system but carefully checked to not introduce any typos):
> 
> root <at> Microknoppix:~# dd if=/dev/random of=/dev/sdb bs=1M conv=fsync
> status=progress
> 1039138816 Bytes(1,0 GB, 991 MiB) copied, 56 s, 18,5 MB/s
> dd: Error on writing '/dev/sdb': The device has not enough free space
> 999+0 records in
> 998+0 records out

Ah right. What's probably happening is that dd is not doing the fsync
because it's exiting early because of the ENOSPC from write(2).

To avoid needing the buffer drain (with fsync), you can use

Information forwarded to bug-coreutils <at> gnu.org:
bug#51345; Package coreutils. (Fri, 29 Oct 2021 19:59:01 GMT) Full text and rfc822 format available.

Message #29 received at 51345 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Pádraig Brady <P <at> draigBrady.com>,
 Sworddragon <sworddragon2 <at> gmail.com>, 51345 <at> debbugs.gnu.org
Subject: Re: bug#51345:
Date: Fri, 29 Oct 2021 12:58:10 -0700

On 10/29/21 05:38, Pádraig Brady wrote:
> I'm leaning towards improving this by always
> doing an fsync on exit if we get a read or write error
> and have successfully written any data, so that
> previously written data is sync'd as requested.

Sounds like a good idea to me too.

Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>:
You have taken responsibility. (Fri, 28 Jan 2022 08:07:02 GMT) Full text and rfc822 format available.

Notification sent to Sworddragon <sworddragon2 <at> gmail.com>:
bug acknowledged by developer. (Fri, 28 Jan 2022 08:07:02 GMT) Full text and rfc822 format available.

Message #34 received at 51345-done <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Sworddragon <sworddragon2 <at> gmail.com>
Cc: 51345-done <at> debbugs.gnu.org
Subject: Re: bug#51345: dd with conv=fsync sometimes returns when its writes
 are still cached
Date: Fri, 28 Jan 2022 00:05:51 -0800

[Message part 1 (text/plain, inline)]

I found a bit of time to work on this and installed the attached patch, 
which should address the issue. Thanks for reporting it.

[0001-dd-synchronize-output-after-write-errors.patch (text/x-patch, attachment)]

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 25 Feb 2022 12:24:08 GMT) Full text and rfc822 format available.

This bug report was last modified 3 years and 126 days ago.

Previous Next

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.

GNU bug report logs - #51345 dd with conv=fsync sometimes returns when its writes are still cached

GNU bug report logs - #51345
dd with conv=fsync sometimes returns when its writes are still cached