X-Loop: help-debbugs@HIDDEN
Subject: bug#22108: diff wrapper script for very large files, low memory
Resent-From: Taco van Dijk <taco@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-diffutils@HIDDEN
Resent-Date: Mon, 07 Dec 2015 16:17:02 +0000
Resent-Message-ID: <handler.22108.B.144950499124887 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: report 22108
X-GNU-PR-Package: diffutils
X-GNU-PR-Keywords:
To: 22108 <at> debbugs.gnu.org
X-Debbugs-Original-To: bug-diffutils@HIDDEN
Received: via spool by submit <at> debbugs.gnu.org id=B.144950499124887
(code B ref -1); Mon, 07 Dec 2015 16:17:02 +0000
Received: (at submit) by debbugs.gnu.org; 7 Dec 2015 16:16:31 +0000
Received: from localhost ([127.0.0.1]:41868 helo=debbugs.gnu.org)
by debbugs.gnu.org with esmtp (Exim 4.80)
(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
id 1a5ySj-0006TJ-RS
for submit <at> debbugs.gnu.org; Mon, 07 Dec 2015 11:16:30 -0500
Received: from eggs.gnu.org ([208.118.235.92]:33389)
by debbugs.gnu.org with esmtp (Exim 4.80)
(envelope-from <taco@HIDDEN>) id 1a5uEt-00076l-8H
for submit <at> debbugs.gnu.org; Mon, 07 Dec 2015 06:46:14 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
(envelope-from <taco@HIDDEN>) id 1a5uEr-0007ir-UF
for submit <at> debbugs.gnu.org; Mon, 07 Dec 2015 06:45:54 -0500
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org
X-Spam-Level:
X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50 autolearn=disabled
version=3.3.2
Received: from lists.gnu.org ([2001:4830:134:3::11]:54345)
by eggs.gnu.org with esmtp (Exim 4.71)
(envelope-from <taco@HIDDEN>) id 1a5uEr-0007in-RA
for submit <at> debbugs.gnu.org; Mon, 07 Dec 2015 06:45:53 -0500
Received: from eggs.gnu.org ([2001:4830:134:3::10]:43395)
by lists.gnu.org with esmtp (Exim 4.71)
(envelope-from <taco@HIDDEN>) id 1a5uEq-0007o2-Az
for bug-diffutils@HIDDEN; Mon, 07 Dec 2015 06:45:53 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
(envelope-from <taco@HIDDEN>) id 1a5uEl-0007iW-5f
for bug-diffutils@HIDDEN; Mon, 07 Dec 2015 06:45:52 -0500
Received: from mx.waag.org ([195.169.149.61]:47158 helo=zimbra.waag.org)
by eggs.gnu.org with esmtp (Exim 4.71)
(envelope-from <taco@HIDDEN>) id 1a5uEk-0007fZ-Vh
for bug-diffutils@HIDDEN; Mon, 07 Dec 2015 06:45:47 -0500
Received: from localhost (localhost [127.0.0.1])
by zimbra.waag.org (Postfix) with ESMTP id DBE212A60197
for <bug-diffutils@HIDDEN>; Mon, 7 Dec 2015 12:45:51 +0100 (CET)
Received: from zimbra.waag.org ([127.0.0.1])
by localhost (zimbra.waag.org [127.0.0.1]) (amavisd-new, port 10032)
with ESMTP id VDypINDgLNGj for <bug-diffutils@HIDDEN>;
Mon, 7 Dec 2015 12:45:51 +0100 (CET)
Received: from localhost (localhost [127.0.0.1])
by zimbra.waag.org (Postfix) with ESMTP id 1B1012A60199
for <bug-diffutils@HIDDEN>; Mon, 7 Dec 2015 12:45:51 +0100 (CET)
X-Virus-Scanned: amavisd-new at zimbra.waag.org
Received: from zimbra.waag.org ([127.0.0.1])
by localhost (zimbra.waag.org [127.0.0.1]) (amavisd-new, port 10026)
with ESMTP id ZhFojm8wnyY2 for <bug-diffutils@HIDDEN>;
Mon, 7 Dec 2015 12:45:50 +0100 (CET)
Received: from zimbra.waag.org (zimbra.waag.org [195.169.149.61])
by zimbra.waag.org (Postfix) with ESMTP id 9612A2A60197
for <bug-diffutils@HIDDEN>; Mon, 7 Dec 2015 12:45:50 +0100 (CET)
Date: Mon, 7 Dec 2015 12:45:50 +0100 (CET)
From: Taco van Dijk <taco@HIDDEN>
Message-ID: <1787778417.63622490.1449488750054.JavaMail.zimbra@HIDDEN>
In-Reply-To: <1613289466.63616406.1449488225960.JavaMail.zimbra@HIDDEN>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [195.169.149.2]
X-Mailer: Zimbra 8.0.9_GA_6191 (ZimbraWebClient - FF42 (Mac)/8.0.9_GA_6191)
Thread-Topic: diff wrapper script for very large files, low memory
Thread-Index: MJqCFyiIElu9Esv/E1GQK7OUVzf34w==
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x
X-Received-From: 2001:4830:134:3::11
X-Spam-Score: -4.0 (----)
X-Mailman-Approved-At: Mon, 07 Dec 2015 11:16:28 -0500
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>,
<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>,
<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Hi,
For our current project we faced the following problem:
when trying to compare two large files (more than 4 GB each), which together exceed the RAM of the machine,
the machine would become unresponsive.
We worked around this with an approach based on xxhash that might be worth sharing.
For anyone interested, you can find it here:
https://github.com/waagsociety/hashed-diff
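As a rough illustration of the general idea (not the code in the repository above, whose details are not described in this message), one could replace every line by a short fixed-size digest and compare the digest sequences instead of the full text. The sketch below makes several assumptions: it uses Python, it substitutes the standard library's blake2b for xxhash so that it runs without third-party packages, and the helper names `line_digests` and `hashed_diff` are hypothetical. Memory use then depends on the number of lines rather than on their length.

```python
import difflib
import hashlib


def line_digests(path):
    """Stream a file line by line, yielding a short hex digest per line."""
    with open(path, "rb") as f:
        for line in f:
            # Assumption: blake2b (stdlib) stands in for xxhash here.
            yield hashlib.blake2b(line, digest_size=8).hexdigest()


def hashed_diff(path_a, path_b):
    """Return the non-equal opcodes (tag, a_start, a_end, b_start, b_end).

    Only the per-line digests are held in memory, never the line contents.
    Indices are 0-based line numbers, with the end index exclusive.
    """
    a = list(line_digests(path_a))
    b = list(line_digests(path_b))
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]
```

The reported line ranges can then be used to read only the differing lines back from the original files. With 8-byte digests a collision between two different lines is unlikely but possible, so a careful tool would verify the ranges reported as equal.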
Kind regards,
Taco van Dijk & Lodewijk Loos
Waag Society
--
PGP: 82EDF574
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-Mailer: MIME-tools 5.503 (Entity 5.503)
Content-Type: text/plain; charset=utf-8
X-Loop: help-debbugs@HIDDEN
From: help-debbugs@HIDDEN (GNU bug Tracking System)
To: Taco van Dijk <taco@HIDDEN>
Subject: bug#22108: Acknowledgement (diff wrapper script for very large files, low memory)
Message-ID: <handler.22108.B.144950499124887.ack <at> debbugs.gnu.org>
References: <1787778417.63622490.1449488750054.JavaMail.zimbra@HIDDEN>
X-Gnu-PR-Message: ack 22108
X-Gnu-PR-Package: diffutils
Reply-To: 22108 <at> debbugs.gnu.org
Date: Mon, 07 Dec 2015 16:17:02 +0000
Thank you for filing a new bug report with debbugs.gnu.org.
This is an automatically generated reply to let you know your message
has been received.
Your message is being forwarded to the package maintainers and other
interested parties for their attention; they will reply in due course.
Your message has been sent to the package maintainer(s):
bug-diffutils@HIDDEN
If you wish to submit further information on this problem, please
send it to 22108 <at> debbugs.gnu.org.
Please do not send mail to help-debbugs@HIDDEN unless you wish
to report a problem with the Bug-tracking system.
--
22108: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=22108
GNU Bug Tracking System
Contact help-debbugs@HIDDEN with problems
X-Loop: help-debbugs@HIDDEN
Subject: bug#22108: diff wrapper script for very large files, low memory
References: <1787778417.63622490.1449488750054.JavaMail.zimbra@HIDDEN>
In-Reply-To: <1787778417.63622490.1449488750054.JavaMail.zimbra@HIDDEN>
Resent-From: Jim Meyering <jim@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-diffutils@HIDDEN
Resent-Date: Mon, 02 May 2016 02:01:02 +0000
Resent-Message-ID: <handler.22108.B22108.146215443313783 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 22108
X-GNU-PR-Package: diffutils
X-GNU-PR-Keywords:
To: 22108 <at> debbugs.gnu.org, taco@HIDDEN
Received: via spool by 22108-submit <at> debbugs.gnu.org id=B22108.146215443313783
(code B ref 22108); Mon, 02 May 2016 02:01:02 +0000
Received: (at 22108) by debbugs.gnu.org; 2 May 2016 02:00:33 +0000
Received: from localhost ([127.0.0.1]:32908 helo=debbugs.gnu.org)
by debbugs.gnu.org with esmtp (Exim 4.84_2)
(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
id 1ax3A0-0003aF-Vc
for submit <at> debbugs.gnu.org; Sun, 01 May 2016 22:00:33 -0400
Received: from mail-oi0-f46.google.com ([209.85.218.46]:33796)
by debbugs.gnu.org with esmtp (Exim 4.84_2)
(envelope-from <meyering@HIDDEN>) id 1ax39z-0003a2-M7
for 22108 <at> debbugs.gnu.org; Sun, 01 May 2016 22:00:31 -0400
Received: by mail-oi0-f46.google.com with SMTP id k142so175551083oib.1
for <22108 <at> debbugs.gnu.org>; Sun, 01 May 2016 19:00:31 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
h=mime-version:sender:from:date:message-id:subject:to;
bh=bVAYSsL3EdSpq9eXQegBlXiB+9i5Xx8kjO+7OwgLaqQ=;
b=lVfI1AofeIcGfUPRdEFsQeRlcyrMhLcZnA0qYpBP9pLfbs5o8j7fw7O517WBJv6+t+
HCPYRa9WzWWohlG9L2HVpkDk8rumPq7MR6JlhlxoLSnHJ11uSL+9D0nzftEvjOSa24mV
eOyxuGW+FucXdzWgiZ2fvZUCt5/TivnMo7tu4p1b3TQWE14NXm1wkguRv1+yJD3g+jKr
ayz7NxpUFAx4PF3QgtQRKr0ifD10SY2j1ybLzN9Ool3GGREvAdYYbQD/B9gXjSx7PcWw
oATG3ShnSgM6hMVee0yG5thKOON25NrBWclO8K2ZeW/eUOM9HxQLAo0SQ+TvEwbBnDuO
aVDw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
d=1e100.net; s=20130820;
h=x-gm-message-state:mime-version:sender:from:date:message-id:subject
:to; bh=bVAYSsL3EdSpq9eXQegBlXiB+9i5Xx8kjO+7OwgLaqQ=;
b=f7zBlmDYV79OkBLYTXX5FBCHx/oAIUrB/5G1ekxvK0GsnaLflPLTE8Nouf6BmeWxe+
W3Nva/C1vJfQdWWcB4Exx64W3gDM6gcYZRwa9qMKpDCDWwXiWgw17476v9lLaRXc051Q
AFkA3Gjp+9Y+slz4r0+6zm1Y/ZfInwx98W1uDI9dqkxbfjiX/LYwcBnKTwX9ow5qDbNZ
+DjcmaRNOYtTYuHciXnnajwdf4U9IAxFZOqChWHniqXt6tRW9qYmlayJ8/oLItfPT3Fh
Lo70DwRjeu4SzVbkQC9k0f2C0o7zt9QejTTr1vBTNglF4etPHdLB/Znai+6baejPlg9X
nszA==
X-Gm-Message-State: AOPr4FXEI/rv1RcEZh4wUJBnEkvc163CQnZt4d0uyATW+FoHlXKzMzx83htTineoQ3HUmEjcuJvZNbtuh6psdA==
X-Received: by 10.157.1.120 with SMTP id 111mr12975022otu.172.1462154426077;
Sun, 01 May 2016 19:00:26 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.202.175.193 with HTTP; Sun, 1 May 2016 19:00:06 -0700 (PDT)
From: Jim Meyering <jim@HIDDEN>
Date: Sun, 1 May 2016 19:00:06 -0700
X-Google-Sender-Auth: _tb1AnzEmaOMpTRba9IwMK47lIc
Message-ID: <CA+8g5KGN0e8npXT7nJDWMJmy_kYYoOKKYcFawDTNgQ7xiBCrdg@HIDDEN>
Content-Type: text/plain; charset=UTF-8
X-Spam-Score: -0.5 (/)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>,
<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>,
<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
tags 22108 wishlist
done
Thanks for the suggestion and pointer.
FYI, your problem is very similar to that described at http://bugs.gnu.org/21665
I'm marking this auto-created issue as "wishlist".
A combination of this approach and using mmap may be profitable when
input files are too large for available RAM.
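To make the mmap suggestion concrete, here is a minimal sketch in Python (an illustration only, not diffutils code, which is written in C). It assumes per-line hashing as the "approach" being combined, uses the standard library's blake2b as a stand-in for xxhash, and the function name `mmap_line_digests` is hypothetical. The file is mapped read-only, so the operating system pages data in on demand and can drop it again under memory pressure instead of the process holding the whole file.

```python
import hashlib
import mmap


def mmap_line_digests(path):
    """Yield an 8-byte digest per line, reading through a read-only mmap."""
    with open(path, "rb") as f:
        try:
            mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        except ValueError:
            # An empty file cannot be mapped; it has no lines to hash.
            return
        with mm:
            # readline() returns b"" at end of file, which stops the loop.
            for line in iter(mm.readline, b""):
                # Assumption: blake2b (stdlib) stands in for xxhash here.
                yield hashlib.blake2b(line, digest_size=8).digest()
```

The resulting digest sequences for two files can be compared in place of the files themselves; whether this is actually faster than plain buffered reads depends on the platform and access pattern.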