X-Loop: help-debbugs@HIDDEN Subject: bug#46169: Parallelize merge sort Resent-From: Ole Tange <tange@HIDDEN> Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org> Resent-CC: bug-coreutils@HIDDEN Resent-Date: Fri, 29 Jan 2021 09:08:02 +0000 Resent-Message-ID: <handler.46169.B.161191125228513 <at> debbugs.gnu.org> Resent-Sender: help-debbugs@HIDDEN X-GNU-PR-Message: report 46169 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: To: 46169 <at> debbugs.gnu.org X-Debbugs-Original-To: bug-coreutils@HIDDEN Received: via spool by submit <at> debbugs.gnu.org id=B.161191125228513 (code B ref -1); Fri, 29 Jan 2021 09:08:02 +0000 Received: (at submit) by debbugs.gnu.org; 29 Jan 2021 09:07:32 +0000 Received: from localhost ([127.0.0.1]:51147 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>) id 1l5PkZ-0007Pp-Ve for submit <at> debbugs.gnu.org; Fri, 29 Jan 2021 04:07:32 -0500 Received: from lists.gnu.org ([209.51.188.17]:42248) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <ole.tange@HIDDEN>) id 1l5PkW-0007Pd-8g for submit <at> debbugs.gnu.org; Fri, 29 Jan 2021 04:07:30 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]:41866) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <ole.tange@HIDDEN>) id 1l5PkV-0007Cs-KS for bug-coreutils@HIDDEN; Fri, 29 Jan 2021 04:07:27 -0500 Received: from mail-ot1-f43.google.com ([209.85.210.43]:45769) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from <ole.tange@HIDDEN>) id 1l5PkT-0000CU-Ul for bug-coreutils@HIDDEN; Fri, 29 Jan 2021 04:07:27 -0500 Received: by mail-ot1-f43.google.com with SMTP id n42so7962828ota.12 for <bug-coreutils@HIDDEN>; Fri, 29 Jan 2021 01:07:25 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:from:date:message-id:subject:to; bh=C3pWjYX3HBGjH008hGvDvXBF7PpbTOkeNOImYlYRKTY=; b=bituOmv/Xto1N6IdxSazknTi3o88oEl0Ec/mdO9lQISrXhpa21K/h5GZRbP+8m7ni5 jVCjYX/2bV5MYQmovqanHMc+xy7Gpo9XfaE5/SLzIB+HcrU91tDD6TSucFPjRjPmZDwL b3WvW2iEaCgqVNpeE1o7+zYnhOIAEZOSIK0NmV+Xen/bs4OmJ9s23g/W1vwiY1MOCfPA pJakVO/Vudn2ZBRyMlgVODMclexXMCtlHMhB295ZBDFeHvFZRX8tjH6JCeEZoAjMFefQ h6KTVp6qCmyJlnvzA2NCXQl7hzkyJMkSsiacbEmaza7sUriKVxH8agpSnREzEWQ6Bm4b SvlA== X-Gm-Message-State: AOAM531p5B8/wOW9/gxBpgcvdiivK5Rb7tVIX3Qv4Z2CRMg+ictMoUmK 7ESq8smfOdvZfK9q+wiycICKPcTJ/9x/Ud89HFKWeTN0wCU= X-Google-Smtp-Source: ABdhPJwXzci+BZ8iZefoVlOgL1VGki6+A7syhulbq3zHSYKV5sm5Ss/CX1HcV2IRUT1+ZHSXGK+s7v8bGy6qYSdwZy8= X-Received: by 2002:a05:6830:1ac3:: with SMTP id r3mr2552641otc.363.1611911244129; Fri, 29 Jan 2021 01:07:24 -0800 (PST) MIME-Version: 1.0 From: Ole Tange <tange@HIDDEN> Date: Fri, 29 Jan 2021 10:07:12 +0100 Message-ID: <CA+4vN7w1kiqihiTBB0LwKuC4Kav+ENUjaf3fDhr-7dpuQT5XmA@HIDDEN> Content-Type: text/plain; charset="UTF-8" Received-SPF: pass client-ip=209.85.210.43; envelope-from=ole.tange@HIDDEN; helo=mail-ot1-f43.google.com X-Spam_score_int: 10 X-Spam_score: 1.0 X-Spam_bar: + X-Spam_report: (1.0 / 5.0 requ) BAYES_00=-1.9, FREEMAIL_FORGED_FROMDOMAIN=0.249, FREEMAIL_FROM=0.001, HEADER_FROM_DIFFERENT_DOMAINS=0.25, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, TO_NO_BRKTS_PCNT=2.403 autolearn=no autolearn_force=no X-Spam_action: no action X-Spam-Score: 1.6 (+) X-Spam-Report: Spam detection software, running on the system "debbugs.gnu.org", has NOT identified this incoming email as spam. The original message has been attached to this so you can view it or label similar future email. If you have any questions, see the administrator of that system for details. Content preview: During the past 6 months I have been using GNU sort on a lot of files in the GByte range on my 64 core machine. I have discovered, that GNU sort is bad at using all the cores. I have built parsort (now distributed as part of GNU parallel) as a small wrapper around GNU sort and it performs 2-3x better than GNU sort for files of this size on this machine. Content analysis details: (1.6 points, 10.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- -2.3 RCVD_IN_DNSWL_MED RBL: Sender listed at https://www.dnswl.org/, medium trust [209.51.188.17 listed in list.dnswl.org] -0.0 SPF_HELO_PASS SPF: HELO matches SPF record 0.0 FREEMAIL_FROM Sender email is commonly abused enduser mail provider (ole.tange[at]gmail.com) 1.0 SPF_SOFTFAIL SPF: sender does not match SPF record (softfail) 0.2 HEADER_FROM_DIFFERENT_DOMAINS From and EnvelopeFrom 2nd level mail domains are different 0.0 RCVD_IN_MSPIKE_H4 RBL: Very Good reputation (+4) [209.51.188.17 listed in wl.mailspike.net] 0.2 FREEMAIL_FORGED_FROMDOMAIN 2nd level domains in From and EnvelopeFrom freemail headers are different 0.0 RCVD_IN_MSPIKE_WL Mailspike good senders 2.4 TO_NO_BRKTS_PCNT To: lacks brackets + percentage X-BeenThere: debbugs-submit <at> debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe> List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/> List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help> List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe> Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org> X-Spam-Score: -1.8 (-) During the past 6 months I have been using GNU sort on a lot of files in the GByte range on my 64 core machine. I have discovered, that GNU sort is bad at using all the cores. I have built parsort (now distributed as part of GNU parallel) as a small wrapper around GNU sort and it performs 2-3x better than GNU sort for files of this size on this machine. The first step works great in GNU sort: Here you can really use all your cores. But after the first step, it seems GNU sort does a merge sort of all the results. And this pegs a single CPU at 100% while the rest of the cores are pretty lazy. It seems the merging is not parallelized. parsort does that a little better than GNU sort. Instead of doing a k-way merge at the final step, it does a 2-way merge on each core. It still has to a 2-way merge of all data on a single core in the end, so it is not even close to optimal. Wikipedia describes a way of doing parallel merge: https://en.wikipedia.org/wiki/Merge_sort#Parallel_multiway_merge_sort I am no C-coder, and I respect that we want coreutils to be written in C. Unfortunately that means I cannot really help doing the actual coding (at least not in a way that would not introduce a plethora of bugs). But looking at the pseudocode https://en.wikipedia.org/wiki/Merge_sort#Pseudocode it does not look that hard to implement. This method requires all data to fit in memory, but there are even more advanced methods (such as https://www.sciencedirect.com/science/article/pii/S0304397500002875) that would not require this. Could you consider implementing a parallel merge, so I can retire parsort? (Also: Thanks for maintaining a vital piece of software). /Ole
Content-Disposition: inline Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Mailer: MIME-tools 5.505 (Entity 5.505) Content-Type: text/plain; charset=utf-8 X-Loop: help-debbugs@HIDDEN From: help-debbugs@HIDDEN (GNU bug Tracking System) To: Ole Tange <tange@HIDDEN> Subject: bug#46169: Acknowledgement (Parallelize merge sort) Message-ID: <handler.46169.B.161191125228513.ack <at> debbugs.gnu.org> References: <CA+4vN7w1kiqihiTBB0LwKuC4Kav+ENUjaf3fDhr-7dpuQT5XmA@HIDDEN> X-Gnu-PR-Message: ack 46169 X-Gnu-PR-Package: coreutils Reply-To: 46169 <at> debbugs.gnu.org Date: Fri, 29 Jan 2021 09:08:02 +0000 Thank you for filing a new bug report with debbugs.gnu.org. This is an automatically generated reply to let you know your message has been received. Your message is being forwarded to the package maintainers and other interested parties for their attention; they will reply in due course. Your message has been sent to the package maintainer(s): bug-coreutils@HIDDEN If you wish to submit further information on this problem, please send it to 46169 <at> debbugs.gnu.org. Please do not send mail to help-debbugs@HIDDEN unless you wish to report a problem with the Bug-tracking system. --=20 46169: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=3D46169 GNU Bug Tracking System Contact help-debbugs@HIDDEN with problems
X-Loop: help-debbugs@HIDDEN Subject: bug#46169: Parallelize merge sort Resent-From: Paul Eggert <eggert@HIDDEN> Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org> Resent-CC: bug-coreutils@HIDDEN Resent-Date: Fri, 29 Jan 2021 22:20:01 +0000 Resent-Message-ID: <handler.46169.B46169.161195877229364 <at> debbugs.gnu.org> Resent-Sender: help-debbugs@HIDDEN X-GNU-PR-Message: followup 46169 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: To: Ole Tange <tange@HIDDEN> Cc: 46169 <at> debbugs.gnu.org Received: via spool by 46169-submit <at> debbugs.gnu.org id=B46169.161195877229364 (code B ref 46169); Fri, 29 Jan 2021 22:20:01 +0000 Received: (at 46169) by debbugs.gnu.org; 29 Jan 2021 22:19:32 +0000 Received: from localhost ([127.0.0.1]:53131 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>) id 1l5c71-0007dX-QL for submit <at> debbugs.gnu.org; Fri, 29 Jan 2021 17:19:32 -0500 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:40614) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <eggert@HIDDEN>) id 1l5c70-0007dG-EJ for 46169 <at> debbugs.gnu.org; Fri, 29 Jan 2021 17:19:31 -0500 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id CBF211600F8; Fri, 29 Jan 2021 14:19:24 -0800 (PST) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id tHsb35WeeiDk; Fri, 29 Jan 2021 14:19:24 -0800 (PST) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 0504416013E; Fri, 29 Jan 2021 14:19:24 -0800 (PST) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id p_MehN53Xns6; Fri, 29 Jan 2021 14:19:23 -0800 (PST) Received: from [192.168.1.9] (cpe-23-243-218-95.socal.res.rr.com [23.243.218.95]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id D4E871600F8; Fri, 29 Jan 2021 14:19:23 -0800 (PST) References: <CA+4vN7w1kiqihiTBB0LwKuC4Kav+ENUjaf3fDhr-7dpuQT5XmA@HIDDEN> From: Paul Eggert <eggert@HIDDEN> Organization: UCLA Computer Science Department Message-ID: <84c45bc2-402a-70a1-bd7e-8523bf69eb77@HIDDEN> Date: Fri, 29 Jan 2021 14:19:23 -0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.6.1 MIME-Version: 1.0 In-Reply-To: <CA+4vN7w1kiqihiTBB0LwKuC4Kav+ENUjaf3fDhr-7dpuQT5XmA@HIDDEN> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit X-Spam-Score: -0.7 (/) X-BeenThere: debbugs-submit <at> debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe> List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/> List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help> List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe> Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org> X-Spam-Score: -1.7 (-) On 1/29/21 1:07 AM, Ole Tange wrote: > Could you consider implementing a parallel merge, so I can retire > parsort? Yes, improving that part of 'sort' performance has been on my long list of things to do for quite some time. If someone else could take up the task it'd be done quicker.... Anyway, thanks for reporting it as a bug, so that we can track it in our bug database.
Received: (at control) by debbugs.gnu.org; 21 Feb 2022 10:07:19 +0000 From debbugs-submit-bounces <at> debbugs.gnu.org Mon Feb 21 05:07:19 2022 Received: from localhost ([127.0.0.1]:35125 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>) id 1nM5bC-000710-UX for submit <at> debbugs.gnu.org; Mon, 21 Feb 2022 05:07:19 -0500 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:50900) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <eggert@HIDDEN>) id 1nM5bA-00070m-Li for control <at> debbugs.gnu.org; Mon, 21 Feb 2022 05:07:17 -0500 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id CBC83160103 for <control <at> debbugs.gnu.org>; Mon, 21 Feb 2022 02:07:10 -0800 (PST) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id pU8ggTA_GIUg for <control <at> debbugs.gnu.org>; Mon, 21 Feb 2022 02:07:10 -0800 (PST) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 39B66160106 for <control <at> debbugs.gnu.org>; Mon, 21 Feb 2022 02:07:10 -0800 (PST) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id YTtUGD4EEzS5 for <control <at> debbugs.gnu.org>; Mon, 21 Feb 2022 02:07:10 -0800 (PST) Received: from [192.168.1.9] (cpe-172-91-119-151.socal.res.rr.com [172.91.119.151]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 1812E160103 for <control <at> debbugs.gnu.org>; Mon, 21 Feb 2022 02:07:10 -0800 (PST) Message-ID: <9bcb9cf6-f624-45be-a13b-1f7753d938db@HIDDEN> Date: Mon, 21 Feb 2022 02:07:09 -0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.5.0 Content-Language: en-US To: control <at> debbugs.gnu.org From: Paul Eggert <eggert@HIDDEN> Organization: UCLA Computer Science Department Subject: coreutils bugs Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: control X-BeenThere: debbugs-submit <at> debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe> List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/> List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help> List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe> Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org> X-Spam-Score: -3.3 (---) tags 47085 notabug close 47085 close 47014 severity 46346 wishlist severity 46169 wishlist severity 45924 wishlist tags 45700 notabug close 45700
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997 nCipher Corporation Ltd,
1994-97 Ian Jackson.