Paul Eggert <eggert@HIDDEN>
to control <at> debbugs.gnu.org
.
From: Paul Eggert <eggert@HIDDEN>
To: L A Walsh <coreutils@HIDDEN>
Cc: 47883 <at> debbugs.gnu.org, peter@HIDDEN
Subject: Re: bug#47883: sort -o loses data when it crashes
Date: Sat, 24 Apr 2021 15:00:21 -0700
Message-ID: <c0b6a19e-3537-4a2f-9f3a-3c15b1a67cb8@HIDDEN>
In-Reply-To: <608479A6.2000701@HIDDEN>

As I wrote you privately last month, the coreutils maintainers (who are
not me) are pretty busy. The proposed change in bug#47883 would be
incompatible with longstanding tradition and would almost certainly
break some existing scripts running on GNU/Linux. This is not something
to do lightly.

It might be possible to come up with a different change that would
address the issue raised without being so disruptive. Whatever change
(if any) is chosen, someone needs to think it through, code it up,
document it, and test it. Although nobody's found the time to do that,
perhaps you could volunteer or find someone who could volunteer; that
would surely accelerate the process.

You mentioned that we have multiple bug reports (now 47059, 47883,
48002) on basically the same topic, so I have taken the liberty of
merging them.
bug-coreutils@HIDDEN
:bug#47883
; Package coreutils
.
Paul Eggert <eggert@HIDDEN>
to control <at> debbugs.gnu.org
.
From: Paul Eggert <eggert@HIDDEN>
To: Peter van Dijk <peter@HIDDEN>
Cc: 47883 <at> debbugs.gnu.org
Subject: Re: bug#47883: sort -o loses data when it crashes
Date: Wed, 21 Apr 2021 19:11:14 -0700
Message-ID: <31aa9fcc-4be5-7a13-4682-3c320b46091d@HIDDEN>
In-Reply-To: <0910fdcb-ec97-45e2-9128-4bbb369d74d9@HIDDEN>

On 4/18/21 10:46 AM, Peter van Dijk wrote:
> While the manual (but not the manpage) mentions the data loss, I think
> it would be great if sort did not have this problem at all, and I think
> the OpenGroup text also says it should not have this problem.

I don't know of any 'sort' implementation that does not have the problem
at all. For example, FreeBSD 'sort -o file file' can lose 'file' in some
(rare) cases. The only portable way to avoid this problem in a shell
script is to output to some other file first and make sure that worked,
before attempting to replace the input file.

Also, I don't see where the Open Group spec says what you're saying. On
the contrary, the spec merely says that '-o output' should cause output
to be sent to the output file. If there are multiple hard links to the
output file, this suggests 'sort' should update the output file's
contents without breaking any hard links. Admittedly the Open Group spec
is a bit vague in this area, but I certainly don't see anything implying
that GNU 'sort' does not conform to POSIX in this area.

FreeBSD 'sort' has a problem, in that 'sort -o A B' preserves all hard
links to A's file, but 'sort -o A A' does not because it breaks the link
from A. That's confusing.

Traditional Unix 'sort -o A' behaves the way GNU 'sort' does; it
preserves all hard links to A's file. So there is a compatibility
argument for doing things the way GNU 'sort' does them, even if that
might lead to more data loss in rare cases.
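
The approach described in the message above (write to some other file, confirm that worked, then replace the input) can be sketched as a small shell function. This is only an illustration under stated assumptions, not code from the thread or from coreutils: the name sort_keep_links is hypothetical, and mktemp is assumed to be available even though POSIX does not specify it. The result is copied back with cat rather than renamed, so existing hard links to the file keep pointing at the updated contents.

```shell
# Hypothetical helper (not part of coreutils): sort FILE by writing to a
# temporary file first, then copying the result back over the original.
# Assumption: mktemp exists; it is widespread but not specified by POSIX.
sort_keep_links() {
    file=$1
    # Put the temporary file next to the input, on the same filesystem.
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    # Step 1: sort into the temporary file. If this fails, FILE is untouched.
    if ! sort -- "$file" > "$tmp"; then
        rm -f -- "$tmp"
        return 1
    fi
    # Step 2: overwrite FILE's contents in place, which keeps its inode and
    # therefore its hard links. A crash during this copy can still truncate
    # FILE, but the sorted data then survives in the temporary file.
    cat -- "$tmp" > "$file" && rm -f -- "$tmp"
}
```

Calling `sort_keep_links data` leaves the temporary file behind only if the final copy fails, so the sorted data can be recovered by hand in that case.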
bug-coreutils@HIDDEN
:bug#47883
; Package coreutils
.
From: "Peter van Dijk" <peter@HIDDEN>
To: bug-coreutils@HIDDEN
Subject: sort -o loses data when it crashes
Date: Sun, 18 Apr 2021 19:46:01 +0200
Message-Id: <0910fdcb-ec97-45e2-9128-4bbb369d74d9@HIDDEN>

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/sort.html:

    -o output
        Specify the name of an output file to be used instead of the
        standard output. This file can be the same as one of the input
        files.

https://www.gnu.org/software/coreutils/manual/html_node/sort-invocation.html:
"data may be lost if the system crashes or sort encounters an I/O or
other serious error while a file is being sorted in place" and "sort
with --merge (-m) can open the output file before reading all input"

While the manual (but not the manpage) mentions the data loss, I think
it would be great if sort did not have this problem at all, and I think
the OpenGroup text also says it should not have this problem.

I looked around, and a lot of software does get this right (by opening a
randomly-named temp file to write to, and only moving it into place when
it is written successfully) - GNU sed -i, OpenBSD sort, and surely there
are more.

As a bonus, doing this would also make the `-o someinputfile -m` case
safe.

Reproduction of the data loss is easy:

    $ seq 10000 > 10000 ; prlimit --fsize=10 sort -R -o 10000 10000 ; wc -l 10000
    File size limit exceeded (core dumped)
    2 10000

(coreutils shuf has the same problem even though not all code appears to
be shared - for example, sort opens the file for writing even before it
opens it for reading, while shuf reverses the order of those two
operations. That difference makes no difference in the effect, though.)

-- 
Peter van Dijk
peter@HIDDEN
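
The temp-file-and-rename technique proposed in the report above might look like the following when done by hand in a script. This is a minimal sketch under stated assumptions, not how GNU sed -i or OpenBSD sort are actually implemented: the function name sort_via_rename is hypothetical, and mktemp is assumed to be available although POSIX does not specify it.

```shell
# Hypothetical helper (not part of coreutils): write the sorted output to a
# randomly named temporary file and rename it over FILE only on success.
# Assumption: mktemp exists; it is widespread but not specified by POSIX.
sort_via_rename() {
    file=$1
    shift
    # Create the temporary file in FILE's directory so the rename stays on
    # one filesystem.
    dir=$(dirname -- "$file")
    tmp=$(mktemp "$dir/.sort.XXXXXX") || return 1
    # Any remaining arguments are passed through to sort as options.
    if sort "$@" -o "$tmp" -- "$file"; then
        # Move the finished output into place; FILE is never half-written.
        mv -f -- "$tmp" "$file"
    else
        status=$?
        # sort failed: discard the partial output and leave FILE untouched.
        rm -f -- "$tmp"
        return "$status"
    fi
}
```

For example, `sort_via_rename data -n` sorts data numerically. The rename gives FILE a new inode, so other hard links keep the old contents and the file mode may change (mktemp typically creates files readable only by the owner); that is the compatibility cost raised elsewhere in this thread.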
"Peter van Dijk" <peter@HIDDEN>
:bug-coreutils@HIDDEN
.
bug-coreutils@HIDDEN
:bug#47883
; Package coreutils
.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997 nCipher Corporation Ltd,
1994-97 Ian Jackson.