GNU bug report logs - #16471
New entropy coding: faster than Huffman, compression rate like arithmetic

Please note: This is a static page, with minimal formatting, updated once a day.
Click here to see this page with the latest information and nicer formatting.

Package: gzip; Severity: wishlist; Reported by: Jarosław Duda <dudaj@HIDDEN>; dated Thu, 16 Jan 2014 23:52:01 UTC; Maintainer for gzip is bug-gzip@HIDDEN.
Severity set to 'wishlist' from 'normal' Request was from Paul Eggert <eggert@HIDDEN> to control <at> debbugs.gnu.org. Full text available.

Message received at submit <at> debbugs.gnu.org:


Received: (at submit) by debbugs.gnu.org; 16 Jan 2014 23:51:25 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Thu Jan 16 18:51:25 2014
Received: from localhost ([127.0.0.1]:54196 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.80)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1W3wia-0003Qf-Ej
	for submit <at> debbugs.gnu.org; Thu, 16 Jan 2014 18:51:24 -0500
Received: from eggs.gnu.org ([208.118.235.92]:48647)
 by debbugs.gnu.org with esmtp (Exim 4.80)
 (envelope-from <dudaj@HIDDEN>) id 1W3wfB-0003GQ-Iy
 for submit <at> debbugs.gnu.org; Thu, 16 Jan 2014 18:47:54 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from <dudaj@HIDDEN>) id 1W3wf6-0006hF-OK
 for submit <at> debbugs.gnu.org; Thu, 16 Jan 2014 18:47:53 -0500
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org
X-Spam-Level: 
X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50 autolearn=disabled
 version=3.3.2
Received: from lists.gnu.org ([2001:4830:134:3::11]:58383)
 by eggs.gnu.org with esmtp (Exim 4.71)
 (envelope-from <dudaj@HIDDEN>) id 1W3wf6-0006hB-Kv
 for submit <at> debbugs.gnu.org; Thu, 16 Jan 2014 18:47:48 -0500
Received: from eggs.gnu.org ([2001:4830:134:3::10]:50339)
 by lists.gnu.org with esmtp (Exim 4.71)
 (envelope-from <dudaj@HIDDEN>) id 1W3wf2-0004dc-6w
 for bug-gzip@HIDDEN; Thu, 16 Jan 2014 18:47:48 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from <dudaj@HIDDEN>) id 1W3wex-0006gJ-UF
 for bug-gzip@HIDDEN; Thu, 16 Jan 2014 18:47:44 -0500
Received: from mailhub129.itcs.purdue.edu ([128.210.5.129]:53790)
 by eggs.gnu.org with esmtp (Exim 4.71)
 (envelope-from <dudaj@HIDDEN>) id 1W3wex-0006g5-OS
 for bug-gzip@HIDDEN; Thu, 16 Jan 2014 18:47:39 -0500
Received: from [10.184.172.48] (pal-nat184-172-048.itap.purdue.edu
 [10.184.172.48]) (authenticated bits=0)
 by mailhub129.itcs.purdue.edu (8.14.4/8.14.4/mta-auth.smtp.purdue.edu) with
 ESMTP id s0GNlcWW032218
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT)
 for <bug-gzip@HIDDEN>; Thu, 16 Jan 2014 18:47:38 -0500
Message-ID: <52D86F98.1030301@HIDDEN>
Date: Thu, 16 Jan 2014 18:47:36 -0500
From: =?UTF-8?B?SmFyb3PFgmF3IER1ZGE=?= <dudaj@HIDDEN>
User-Agent: Mozilla/5.0 (Windows NT 6.2; WOW64;
 rv:17.0) Gecko/20130801 Thunderbird/17.0.8
MIME-Version: 1.0
To: bug-gzip@HIDDEN
Subject: New entropy coding: faster than Huffman,
 compression rate like arithmetic
Content-Type: text/plain; charset=UTF-8; format=flowed
X-PMX-Version: 6.0.2.2308539
X-PerlMx-Virus-Scanned: Yes
Content-Transfer-Encoding: quoted-printable
X-MIME-Autoconverted: from 8bit to quoted-printable by
 mailhub129.itcs.purdue.edu id s0GNlcWW032218
X-detected-operating-system: by eggs.gnu.org: Genre and OS details not
 recognized.
X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address
 (bad octet value).
X-Received-From: 2001:4830:134:3::11
X-Spam-Score: -5.0 (-----)
X-Debbugs-Envelope-To: submit
X-Mailman-Approved-At: Thu, 16 Jan 2014 18:51:22 -0500
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <http://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <http://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <http://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -5.0 (-----)

Hallo,

Since I haven't found gzip development mailing list, I am sending it here.
There is a new approach to entropy coding: asymmetric numeral systems.=20
It has some connection with arithmetic coding, but is simpler: uses a=20
single natural number as the state, instead of two to represent the=20
range. Thanks of it, we can put the entire behavior for given large=20
alphabet probability distribution into a relatively small table: a few=20
kilobytes for 256 size alphabet.
This way it turns out about 50% faster than Huffman decoding, still=20
providing accuracy (compression rate) like arithmetic coding.

Here is some implementation: https://github.com/Cyan4973/FiniteStateEntro=
py
Just replacing Huffman with it, we get improvement of both speed and=20
compression rate, like recently in Zhuff of Yann Collet:=20
http://fastcompression.blogspot.fr/2013/12/zhuff-v09-or-first-fse-applica=
tion.html
I thought it could also improve gzip DEFLATE in the same way?

Best,
Jarek

-- dr Jaros=C5=82aw Duda
Center for Science of Information, Purdue University, USA
http://www.soihub.org/people.php?id=3D484





Acknowledgement sent to Jarosław Duda <dudaj@HIDDEN>:
New bug report received and forwarded. Copy sent to bug-gzip@HIDDEN. Full text available.
Report forwarded to bug-gzip@HIDDEN:
bug#16471; Package gzip. Full text available.
Please note: This is a static page, with minimal formatting, updated once a day.
Click here to see this page with the latest information and nicer formatting.
Last modified: Mon, 10 Nov 2014 18:45:02 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.