GNU bug report logs - #22108
diff wrapper script for very large files, low memory

Please note: This is a static page, with minimal formatting, updated once a day.

Package: diffutils; Reported by: Taco van Dijk <taco@HIDDEN>; dated Mon, 7 Dec 2015 16:17:02 UTC; Maintainer for diffutils is bug-diffutils@HIDDEN.

Message received at 22108 <at> debbugs.gnu.org:


From: Jim Meyering <jim@HIDDEN>
Date: Sun, 1 May 2016 19:00:06 -0700
Message-ID: <CA+8g5KGN0e8npXT7nJDWMJmy_kYYoOKKYcFawDTNgQ7xiBCrdg@HIDDEN>
Subject: Re: diff wrapper script for very large files, low memory
To: 22108 <at> debbugs.gnu.org, taco@HIDDEN
Content-Type: text/plain; charset=UTF-8

tags 22108 wishlist
done

Thanks for the suggestion and pointer.
FYI, your problem is very similar to that described at http://bugs.gnu.org/21665

I'm marking this auto-created issue as "wishlist".
A combination of this approach and using mmap may be profitable when
input files are too large for available RAM.
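
As a rough illustration of the combination mentioned above (not code from
diffutils or from the referenced script), here is a Python sketch that
memory-maps a file and reduces each line to an 8-byte digest, so that only the
digests are kept by the process rather than the line contents. Assumptions:
hashlib.blake2b stands in for xxhash, which is not in the Python standard
library, and the function name line_digests is hypothetical.

```python
import hashlib
import mmap
import os


def line_digests(path):
    """Yield an 8-byte digest for every line of the file at `path`."""
    # mmap cannot map an empty file, so handle that case up front.
    if os.path.getsize(path) == 0:
        return
    with open(path, "rb") as f:
        # Map the file read-only; the kernel pages data in and out on demand
        # instead of the process reading the whole file into its own buffers.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # mm.readline() returns b"" at end of file, which stops iter().
            for line in iter(mm.readline, b""):
                # blake2b is a stand-in for xxhash (assumption, see above).
                yield hashlib.blake2b(line, digest_size=8).digest()
```

Whether this actually helps depends on the platform's paging behaviour and on
how long the lines are; for files made of very short lines the digests may take
as much room as the text itself.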




Information forwarded to bug-diffutils@HIDDEN:
bug#22108; Package diffutils. Full text available.

Message received at submit <at> debbugs.gnu.org:


Date: Mon, 7 Dec 2015 12:45:50 +0100 (CET)
From: Taco van Dijk <taco@HIDDEN>
To: bug-diffutils@HIDDEN
Message-ID: <1787778417.63622490.1449488750054.JavaMail.zimbra@HIDDEN>
In-Reply-To: <1613289466.63616406.1449488225960.JavaMail.zimbra@HIDDEN>
Subject: diff wrapper script for very large files, low memory

Hi,

For our current project we faced the following problem:
when trying to compare two large files (two files of 4+ GB each) whose size exceeds the RAM of the machine,
the machine would become unresponsive.

To solve this problem we have found a solution, based on xxhash, that might be worth sharing.

Anyone interested can find it here:

https://github.com/waagsociety/hashed-diff
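
For readers who want a feel for the general idea, the following Python sketch
shows one way a hash-based comparison could work. It is an illustration under
stated assumptions, not the code from the repository above: each line is
reduced to a 64-bit hash, and the two hash sequences are compared instead of
the full text. hashlib.blake2b is used as a stand-in for xxhash (which is not
in the Python standard library), and the names line_hashes and changed_ranges
are hypothetical.

```python
import difflib
import hashlib


def line_hashes(path):
    """Return a list of 64-bit integer hashes, one per line of `path`."""
    hashes = []
    with open(path, "rb") as f:
        for line in f:  # streams the file; only one line is held at a time
            # blake2b is a stand-in for xxhash (assumption, see above).
            digest = hashlib.blake2b(line, digest_size=8).digest()
            hashes.append(int.from_bytes(digest, "big"))
    return hashes


def changed_ranges(path_a, path_b):
    """Return (tag, a_start, a_end, b_start, b_end) for each differing region.

    Line indices are 0-based and end-exclusive, as in difflib opcodes.
    """
    a = line_hashes(path_a)
    b = line_hashes(path_b)
    # Compare the hash sequences rather than the line contents.
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]
```

Two different lines can in principle share a hash, so a result from this kind
of comparison would need to be checked against the original lines before being
trusted. difflib is also not tuned for inputs with many millions of lines; it
is used here only to keep the sketch self-contained.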

Kind regards,

Taco van Dijk & Lodewijk Loos
Waag Society

-- 
PGP: 82EDF574 




Acknowledgement sent to Taco van Dijk <taco@HIDDEN>:
New bug report received and forwarded. Copy sent to bug-diffutils@HIDDEN. Full text available.
Report forwarded to bug-diffutils@HIDDEN:
bug#22108; Package diffutils. Full text available.
Last modified: Mon, 25 Nov 2019 12:00:02 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.