GNU bug report logs - #47883
sort -o loses data when it crashes

Please note: This is a static page, with minimal formatting, updated once a day.
Click here to see this page with the latest information and nicer formatting.

Package: coreutils; Reported by: "Peter van Dijk" <peter@HIDDEN>; dated Sun, 18 Apr 2021 22:44:01 UTC; Maintainer for coreutils is bug-coreutils@HIDDEN.
Disconnected #47883 from all other report(s). Request was from Paul Eggert <eggert@HIDDEN> to control <at> debbugs.gnu.org.

Message received at 47883 <at> debbugs.gnu.org:


To: L A Walsh <coreutils@HIDDEN>
References: <0910fdcb-ec97-45e2-9128-4bbb369d74d9@HIDDEN>
 <31aa9fcc-4be5-7a13-4682-3c320b46091d@HIDDEN>
 <608479A6.2000701@HIDDEN>
From: Paul Eggert <eggert@HIDDEN>
Organization: UCLA Computer Science Department
Subject: Re: bug#47883: sort -o loses data when it crashes
Message-ID: <c0b6a19e-3537-4a2f-9f3a-3c15b1a67cb8@HIDDEN>
Date: Sat, 24 Apr 2021 15:00:21 -0700
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
 Thunderbird/78.7.1
In-Reply-To: <608479A6.2000701@HIDDEN>
Cc: 47883 <at> debbugs.gnu.org, peter@HIDDEN

As I wrote you privately last month, the coreutils maintainers (who are
not me) are pretty busy. The proposed change in bug#47883 would be
incompatible with longstanding tradition and would almost certainly
break some existing scripts running on GNU/Linux. This is not something
to do lightly.

It might be possible to come up with a different change that would
address the issue raised without being so disruptive. Whatever change
(if any) is chosen, someone needs to think it through, code it up,
document it, and test it. Although nobody's found the time to do that,
perhaps you could volunteer or find someone who could volunteer; that
would surely accelerate the process.

You mentioned that we have multiple bug reports (now 47059, 47883,
48002) on basically the same topic, so I have taken the liberty of
merging them.




Information forwarded to bug-coreutils@HIDDEN:
bug#47883; Package coreutils.
Merged 47059 47883 48002. Request was from Paul Eggert <eggert@HIDDEN> to control <at> debbugs.gnu.org.

Message received at 47883 <at> debbugs.gnu.org:


To: Peter van Dijk <peter@HIDDEN>
References: <0910fdcb-ec97-45e2-9128-4bbb369d74d9@HIDDEN>
From: Paul Eggert <eggert@HIDDEN>
Organization: UCLA Computer Science Department
Subject: Re: bug#47883: sort -o loses data when it crashes
Message-ID: <31aa9fcc-4be5-7a13-4682-3c320b46091d@HIDDEN>
Date: Wed, 21 Apr 2021 19:11:14 -0700
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
 Thunderbird/78.7.1
In-Reply-To: <0910fdcb-ec97-45e2-9128-4bbb369d74d9@HIDDEN>
Cc: 47883 <at> debbugs.gnu.org

On 4/18/21 10:46 AM, Peter van Dijk wrote:
> While the manual (but not the manpage) mentions the data loss, I think it would be great if sort did not have this problem at all, and I think the OpenGroup text also says it should not have this problem.

I don't know of any 'sort' implementation that does not have the problem
at all. For example, FreeBSD 'sort -o file file' can lose 'file' in some
(rare) cases. The only portable way to avoid this problem in a shell
script is to output to some other file first and make sure that worked,
before attempting to replace the input file.
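
The approach described above can be sketched as follows. This is only an illustration, not code from coreutils: safe_sort_in_place is a hypothetical helper name, and the sketch assumes a POSIX-like shell plus the mktemp utility, which is widely available but not specified by POSIX. The sorted data is copied back with 'cat' so that the original file (and any hard links to it) is overwritten in place only after 'sort' has succeeded.

```shell
#!/bin/sh
# Hypothetical helper: sort a file "in place" without touching the
# original until sort has finished successfully.
safe_sort_in_place() {
    file=$1
    # Put the temporary file next to the input file.
    dir=$(dirname -- "$file")
    tmp=$(mktemp "$dir/sort.XXXXXX") || return 1

    # Step 1: write the sorted output to the temporary file.
    if ! sort -- "$file" > "$tmp"; then
        rm -f -- "$tmp"        # sort failed; the original is untouched
        return 1
    fi

    # Step 2: copy the result over the original, keeping its inode.
    if ! cat -- "$tmp" > "$file"; then
        echo "copy-back failed; sorted data kept in $tmp" >&2
        return 1
    fi

    rm -f -- "$tmp"
}
```

The copy-back step can itself fail (for example on a full disk), so the sketch keeps the temporary file in that case instead of deleting it.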

Also, I don't see where the Open Group spec says what you're saying. On
the contrary, the spec merely says that '-o output' should cause output
to be sent to the output file. If there are multiple hard links to the
output file, this suggests 'sort' should update the output file's
contents without breaking any hard links. Admittedly the Open Group spec
is a bit vague in this area, but I certainly don't see anything implying
that GNU 'sort' does not conform to POSIX in this area.

FreeBSD 'sort' has a problem, in that 'sort -o A B' preserves all hard
links to A's file, but 'sort -o A A' does not because it breaks the link
from A. That's confusing.

Traditional Unix 'sort -o A' behaves the way GNU 'sort' does; it
preserves all hard links to A's file. So there is a compatibility
argument for doing things the way GNU 'sort' does them, even if that
might lead to more data loss in rare cases.
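
One way to see which behavior a given 'sort' has is to compare inode numbers after sorting a file onto itself. The following is a minimal sketch with made-up file names; it assumes mktemp is available and makes no claim about what any particular implementation will print beyond what is described above.

```shell
#!/bin/sh
# Sketch: check whether 'sort -o A A' keeps A's hard links intact.
# The directory and file names are illustrative only.
dir=$(mktemp -d) || exit 1
printf 'b\na\nc\n' > "$dir/A"
ln "$dir/A" "$dir/B"          # B is a second hard link to A's file

sort -o "$dir/A" "$dir/A"

# Equal inode numbers mean A and B still name the same file.
inode_a=$(ls -i "$dir/A" | awk '{print $1}')
inode_b=$(ls -i "$dir/B" | awk '{print $1}')
if [ "$inode_a" = "$inode_b" ]; then
    result="hard link preserved"
else
    result="hard link broken"
fi
echo "$result"
rm -rf "$dir"
```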




Information forwarded to bug-coreutils@HIDDEN:
bug#47883; Package coreutils.

Message received at submit <at> debbugs.gnu.org:


X-Mailer: MessagingEngine.com Webmail Interface
User-Agent: Cyrus-JMAP/3.5.0-alpha0-273-g8500d2492d-fm-20210323.002-g8500d249
Mime-Version: 1.0
Message-Id: <0910fdcb-ec97-45e2-9128-4bbb369d74d9@HIDDEN>
Date: Sun, 18 Apr 2021 19:46:01 +0200
From: "Peter van Dijk" <peter@HIDDEN>
To: bug-coreutils@HIDDEN
Subject: sort -o loses data when it crashes
Content-Type: text/plain

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/sort.html: -o  output
    Specify the name of an output file to be used instead of the standard output. This file can be the same as one of the input files.

https://www.gnu.org/software/coreutils/manual/html_node/sort-invocation.html: "data may be lost if the system crashes or sort encounters an I/O or other serious error while a file is being sorted in place" and "sort with --merge (-m) can open the output file before reading all input"

While the manual (but not the manpage) mentions the data loss, I think it would be great if sort did not have this problem at all, and I think the OpenGroup text also says it should not have this problem. I looked around, and a lot of software does get this right (by opening a randomly-named temp file to write to, and only moving it into place when it is written successfully) - GNU sed -i, OpenBSD sort, and surely there are more. As a bonus, doing this would also make the `-o someinputfile -m` case safe.
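
The temp-file-and-rename technique mentioned above can be sketched in shell as follows. This is an illustration only, not how GNU sed or OpenBSD sort are actually implemented: sort_via_rename is a hypothetical name, and mktemp is assumed to be available. The temporary file is created in the same directory as the target so that the final 'mv' is a rename within one file system.

```shell
#!/bin/sh
# Hypothetical helper: write sorted output to a randomly named
# temporary file and move it into place only on success.
sort_via_rename() {
    file=$1
    dir=$(dirname -- "$file")
    tmp=$(mktemp "$dir/.sort.XXXXXX") || return 1

    # Replace the original only if sort wrote everything successfully.
    if sort -- "$file" > "$tmp" && mv -f -- "$tmp" "$file"; then
        return 0
    fi

    rm -f -- "$tmp"            # failure: the original is left untouched
    return 1
}
```

Note the trade-offs of this sketch: the rename gives the name a new inode, so other hard links to the old file keep the unsorted contents, and the result carries the temporary file's permissions rather than the original's.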

Reproduction of the data loss is easy:

$ seq 10000 > 10000 ; prlimit --fsize=10 sort -R -o 10000 10000 ; wc -l 10000
File size limit exceeded (core dumped)
2 10000


(coreutils shuf has the same problem even though not all code appears to be shared - for example, sort opens the file for writing even before it opens it for reading, while shuf reverses the order of those two operations. That difference does not change the effect, though.)

-- 
  Peter van Dijk
  peter@HIDDEN




Acknowledgement sent to "Peter van Dijk" <peter@HIDDEN>:
New bug report received and forwarded. Copy sent to bug-coreutils@HIDDEN.
Report forwarded to bug-coreutils@HIDDEN:
bug#47883; Package coreutils.
Last modified: Mon, 21 Feb 2022 09:15:02 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.