GNU bug report logs - #8519
24.0.50; doc-view: allow pdftotext -layout instead of -raw

Please note: This is a static page, with minimal formatting, updated once a day.
Click here to see this page with the latest information and nicer formatting.

Package: emacs; Reported by: trentbuck@HIDDEN (Trent W. Buck); dated Mon, 18 Apr 2011 09:24:03 UTC; Maintainer for emacs is bug-gnu-emacs@HIDDEN.

Message received at 8519 <at> debbugs.gnu.org:


Received: (at 8519) by debbugs.gnu.org; 30 Jun 2011 22:07:22 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Thu Jun 30 18:07:21 2011
Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.69)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1QcPOL-0003KG-8w
	for submit <at> debbugs.gnu.org; Thu, 30 Jun 2011 18:07:21 -0400
Received: from smtp-out4.starman.ee ([85.253.0.6] helo=mx2.starman.ee)
	by debbugs.gnu.org with esmtp (Exim 4.69)
	(envelope-from <juri@HIDDEN>) id 1QcPOJ-0003K3-7m
	for 8519 <at> debbugs.gnu.org; Thu, 30 Jun 2011 18:07:20 -0400
X-Virus-Scanned: by Amavisd-New at mx2.starman.ee
Received: from mail.starman.ee (62.65.210.87.cable.starman.ee [62.65.210.87])
	by mx2.starman.ee (Postfix) with ESMTP id 2176D3F40BC
	for <8519 <at> debbugs.gnu.org>; Fri,  1 Jul 2011 01:07:11 +0300 (EEST)
From: Juri Linkov <juri@HIDDEN>
To: 8519 <at> debbugs.gnu.org
Subject: Re: 24.0.50; doc-view: allow pdftotext -layout instead of -raw
Organization: JURTA
References: <87fwpf7w13.fsf@HIDDEN>
Date: Fri, 01 Jul 2011 00:57:45 +0300
In-Reply-To: <87fwpf7w13.fsf@HIDDEN> (Trent W. Buck's message of "Mon,
	18 Apr 2011 19:23:36 +1000")
Message-ID: <87k4c33rqq.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.0.50 (x86_64-pc-linux-gnu)
MIME-Version: 1.0
Content-Type: text/plain
X-Spam-Score: -3.2 (---)
X-Debbugs-Envelope-To: 8519
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.11
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <http://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>,
	<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <http://debbugs.gnu.org/pipermail/debbugs-submit>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <http://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>,
	<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Sender: debbugs-submit-bounces <at> debbugs.gnu.org
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
X-Spam-Score: -3.2 (---)

> doc-view supports using pdftotext on ttys.
> Unfortunately it is hard-coded to pass -raw.
> I would prefer to pass -layout.
>
> Please modify doc-view to allow me to support something like
>
>     (setq doc-view-pdftotext-program-args '("-layout" "-nopgbrk"))

I came across the same need and found this bug report.

I think doc-view should also support other free software
that processes PDF files:

1. pdftk

pdftk is able to extract the PDF metadata (title, author, bookmarks, etc.),
e.g.

    pdftk file1.pdf dump_data output file1.txt

So for a large PDF document, doc-view could present the
Table of Contents, let the user navigate to a selected page,
and then convert only the displayed pages instead of all pages,
which is terribly slow for a 1000-page document.
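
To illustrate the Table of Contents idea, here is a minimal sketch that
turns the dump_data text into a list of bookmarks.  The function name
`my-pdftk-parse-bookmarks' is hypothetical, and the sketch assumes that
pdftk writes each bookmark as consecutive BookmarkTitle, BookmarkLevel
and BookmarkPageNumber lines; other pdftk versions may differ.

```elisp
;; Sketch only: `my-pdftk-parse-bookmarks' is a hypothetical helper,
;; not an existing doc-view function.  It assumes that dump_data lists
;; each bookmark as BookmarkTitle, BookmarkLevel and BookmarkPageNumber
;; lines, in that order.
(defun my-pdftk-parse-bookmarks (dump)
  "Return a list of (TITLE LEVEL PAGE) entries found in DUMP.
DUMP is the text written by `pdftk FILE dump_data'."
  (let ((start 0)
        (entries nil))
    ;; Scan forward through DUMP, one bookmark record per match.
    (while (string-match
            (concat "^BookmarkTitle: \\(.*\\)\n"
                    "BookmarkLevel: \\([0-9]+\\)\n"
                    "BookmarkPageNumber: \\([0-9]+\\)")
            dump start)
      (push (list (match-string 1 dump)
                  (string-to-number (match-string 2 dump))
                  (string-to-number (match-string 3 dump)))
            entries)
      (setq start (match-end 0)))
    ;; `push' built the list backwards; restore document order.
    (nreverse entries)))
```

Each entry carries the page number, so a Table of Contents buffer could
jump to that page and convert only it.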

pdftk can also prepare the PDF text for editing in Emacs.
From `man pdftk':

  -compress useful when you want to edit PDF code
            in a text editor like vim or emacs.

   Uncompress PDF page streams for editing the PDF
   in a text editor (e.g., vim, emacs):

       pdftk doc.pdf output doc.unc.pdf uncompress

This feature could be used after typing `C-c C-c'.

Since pdftk is dependent on Java, doc-view should not require it
and should be able to detect the installed PDF processing programs
(with e.g. `(executable-find "pdftk")') and select one of them
according to the user's priority list.

2. A better program is `qpdf'. It has none of the problems mentioned above.
So doc-view should also detect its availability with
`(executable-find "qpdf")' and provide the same option for its
command line arguments (and use all features relevant to doc-view).

3. Using the PDF rendering library `poppler', it's possible
to implement in Emacs a PDF viewer like `apvlv' for Vim.
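
The detection described in items 1 and 2 could look roughly like the
sketch below.  The names `my-pdf-tool-priority' and `my-pdf-tool-find'
are hypothetical, not existing doc-view options; only `executable-find'
is a real Emacs function.

```elisp
;; Sketch only: the variable and function names are hypothetical.
(defvar my-pdf-tool-priority '("qpdf" "pdftk")
  "PDF processing programs to look for, most preferred first.")

(defun my-pdf-tool-find (&optional candidates)
  "Return (NAME . PATH) for the first installed program in CANDIDATES.
CANDIDATES defaults to `my-pdf-tool-priority'.  Return nil if none
of the programs is found on `exec-path'."
  (let ((names (or candidates my-pdf-tool-priority))
        (found nil))
    ;; Walk the priority list and stop at the first program found.
    (while (and names (not found))
      (let ((path (executable-find (car names))))
        (when path
          (setq found (cons (car names) path))))
      (setq names (cdr names)))
    found))
```

A user who prefers pdftk would only need to reorder the list; a nil
result means doc-view should fall back to whatever it does today.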




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs@HIDDEN:
bug#8519; Package emacs. Full text available.

Message received at submit <at> debbugs.gnu.org:


Received: (at submit) by debbugs.gnu.org; 18 Apr 2011 09:23:55 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Mon Apr 18 05:23:55 2011
Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.69)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1QBkgV-0002s3-2S
	for submit <at> debbugs.gnu.org; Mon, 18 Apr 2011 05:23:55 -0400
Received: from eggs.gnu.org ([140.186.70.92])
	by debbugs.gnu.org with esmtp (Exim 4.69)
	(envelope-from <trentbuck@HIDDEN>) id 1QBkgS-0002rr-Pm
	for submit <at> debbugs.gnu.org; Mon, 18 Apr 2011 05:23:53 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <trentbuck@HIDDEN>) id 1QBkgM-00053B-Hm
	for submit <at> debbugs.gnu.org; Mon, 18 Apr 2011 05:23:47 -0400
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on eggs.gnu.org
X-Spam-Level: 
X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00,FREEMAIL_FROM,
	RCVD_IN_DNSWL_LOW, RFC_ABUSE_POST, T_DKIM_INVALID,
	T_TO_NO_BRKTS_FREEMAIL autolearn=unavailable version=3.3.1
Received: from lists.gnu.org ([140.186.70.17]:45235)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <trentbuck@HIDDEN>) id 1QBkgM-000537-GB
	for submit <at> debbugs.gnu.org; Mon, 18 Apr 2011 05:23:46 -0400
Received: from eggs.gnu.org ([140.186.70.92]:42386)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <trentbuck@HIDDEN>) id 1QBkgL-0005iG-6c
	for bug-gnu-emacs@HIDDEN; Mon, 18 Apr 2011 05:23:46 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <trentbuck@HIDDEN>) id 1QBkgJ-00052i-Jh
	for bug-gnu-emacs@HIDDEN; Mon, 18 Apr 2011 05:23:45 -0400
Received: from mail-pw0-f41.google.com ([209.85.160.41]:37648)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <trentbuck@HIDDEN>) id 1QBkgJ-00052Z-C0
	for bug-gnu-emacs@HIDDEN; Mon, 18 Apr 2011 05:23:43 -0400
Received: by pwi10 with SMTP id 10so3074821pwi.0
	for <bug-gnu-emacs@HIDDEN>; Mon, 18 Apr 2011 02:23:42 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:from:to:subject:x-debbugs-cc:date:message-id
	:mime-version:content-type;
	bh=OXqOuu0Ees1yVmbNIr6aMnn/mufdXWRHJX0rlmutZ2g=;
	b=s+Ak1w75ASppy3aNYziJZTOvTi9sXdGCOYywKJmJXbpQRk28XUJrkJn7wPrZmin3qB
	Cc36eF8PgVVMbGyWBr7ZeZbdofIvUoUcw2VgK1RW3rHT23U5EMU8hEYL1bQCtC5jStaX
	uEI4sH12tANPJRAY1IQ9H/XbI2Yo3PB+oFI1Q=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=from:to:subject:x-debbugs-cc:date:message-id:mime-version
	:content-type;
	b=HiPrpoMZhl8H+B4V7xpIzdNvYDZ42NdEIRtVKW4V6u5s9Kqo1fwiN3/Xl8RNaRH61G
	YO50ojmKq4W/ujLN7PEMXERlEAttzSC36ed+3AZPeKP1y18UU9WUFtfoK5TYM8qfjvuW
	e4GVuk1exL5L/T6+acQ02JAndjyhBj+XJ8IWc=
Received: by 10.68.50.70 with SMTP id a6mr6590965pbo.25.1303118622240;
	Mon, 18 Apr 2011 02:23:42 -0700 (PDT)
Received: from localhost (office.cyber.com.au [203.7.155.20])
	by mx.google.com with ESMTPS id j7sm2192007pbg.65.2011.04.18.02.23.40
	(version=TLSv1/SSLv3 cipher=OTHER);
	Mon, 18 Apr 2011 02:23:41 -0700 (PDT)
From: trentbuck@HIDDEN (Trent W. Buck)
To: bug-gnu-emacs@HIDDEN
Subject: 24.0.50; doc-view: allow pdftotext -layout instead of -raw
X-Debbugs-Cc: rfrancoise@HIDDEN
Date: Mon, 18 Apr 2011 19:23:36 +1000
Message-ID: <87fwpf7w13.fsf@HIDDEN>
MIME-Version: 1.0
Content-Type: text/plain
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 2)
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3)
X-Received-From: 140.186.70.17
X-Spam-Score: -5.4 (-----)
X-Debbugs-Envelope-To: submit
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.11
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <http://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>,
	<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <http://debbugs.gnu.org/pipermail/debbugs-submit>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <http://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>,
	<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Sender: debbugs-submit-bounces <at> debbugs.gnu.org
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
X-Spam-Score: -5.4 (-----)

doc-view supports using pdftotext on ttys.
Unfortunately it is hard-coded to pass -raw.
I would prefer to pass -layout.

Please modify doc-view to allow me to support something like

    (setq doc-view-pdftotext-program-args '("-layout" "-nopgbrk"))

FYI, my pdftotext manpage says -raw is discouraged:

       -layout

              Maintain (as best as possible) the original physical
              layout of the text.  The default is to `undo' physical
              layout (columns, hyphenation, etc.)  and output the text
              in reading order.

       -raw   Keep the text in content stream order.  This is a hack
              which often "undoes" column formatting, etc.  Use of raw
              mode is no longer recommended.
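
As an illustration of the request, here is a minimal sketch of how a
user-settable argument list could replace the hard-coded -raw.  The
names `my-doc-view-pdftotext-program-args' and
`my-doc-view-pdftotext-args' are hypothetical and do not describe
doc-view's actual code; -f and -l are pdftotext's first-page and
last-page options.

```elisp
;; Sketch only: both names below are hypothetical.
(defvar my-doc-view-pdftotext-program-args '("-raw")
  "Extra arguments passed to pdftotext.
The default mirrors the hard-coded -raw described in this report.")

(defun my-doc-view-pdftotext-args (pdf txt &optional page)
  "Return the pdftotext argument list for converting PDF to TXT.
If PAGE is non-nil, restrict the conversion to that single page."
  (append my-doc-view-pdftotext-program-args
          ;; -f and -l select the first and last page to convert.
          (when page
            (list "-f" (number-to-string page)
                  "-l" (number-to-string page)))
          (list pdf txt)))
```

With the variable set to '("-layout" "-nopgbrk"), the call
(my-doc-view-pdftotext-args "in.pdf" "out.txt") returns
("-layout" "-nopgbrk" "in.pdf" "out.txt"), which could then be handed
to `call-process' or `start-process'.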


In GNU Emacs 24.0.50.1 (x86_64-pc-linux-gnu)
 of 2010-12-14 on elegiac, modified by Debian
 (emacs-snapshot package, version 1:20101212-2)
configured using `configure  '--build' 'x86_64-linux-gnu' '--host' 'x86_64-linux-gnu' '--prefix=/usr' '--sharedstatedir=/var/lib' '--libexecdir=/usr/lib' '--localstatedir=/var' '--infodir=/usr/share/info' '--mandir=/usr/share/man' '--with-pop=yes' '--enable-locallisppath=/etc/emacs-snapshot:/etc/emacs:/usr/local/share/emacs/24.0.50/site-lisp:/usr/local/share/emacs/site-lisp:/usr/share/emacs/24.0.50/site-lisp:/usr/share/emacs/site-lisp' '--without-compress-info' '--with-x=no' '--without-dbus' '--without-sound' 'build_alias=x86_64-linux-gnu' 'host_alias=x86_64-linux-gnu' 'CFLAGS=-DDEBIAN -DSITELOAD_PURESIZE_EXTRA=5000 -g -O2' 'LDFLAGS=-g -Wl,--as-needed' 'CPPFLAGS=''

Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: C
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: en_AU.utf8
  value of $XMODIFIERS: nil
  locale-coding-system: utf-8-unix
  default enable-multibyte-characters: t

Major mode: Man

Minor modes in effect:
  diff-auto-refine-mode: t
  shell-dirtrack-mode: t
  rcirc-track-minor-mode: t
  xterm-mouse-mode: t
  ido-everywhere: t
  savehist-mode: t
  icomplete-mode: t
  show-paren-mode: t
  delete-selection-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  column-number-mode: t
  line-number-mode: t
  transient-mark-mode: t

Recent input:
O A ESC O A ESC O A ESC O A ESC O A ESC O A ESC O A 
ESC O A ESC O A ESC O A ESC O A ESC O A ESC O A ESC 
O A ESC O A ESC O A ESC O A ESC O A ESC O A ESC O A 
ESC O A ESC O A ESC O A ESC O A ESC O A ESC O A ESC 
O A ESC O A ESC O A ESC O A ESC O A ESC O A ESC O A 
ESC O A ESC O B ESC O B ESC C-b ESC O A ESC O A ESC 
O A C-e RET RET ( e v a l - a f t e r - l o a d SPC 
" p DEL d o v - DEL DEL c - v i e w " RET TAB ' ( C-y 
C-x C-x C-g ESC s ESC O B TAB ESC O A C-e ESC C-k ESC 
O B ESC O B ESC O B ESC b ESC O B ESC b ESC b ESC d 
l a o u t DEL DEL DEL y o u t ESC O B ESC O A ESC O 
A ESC O A ESC O A ESC C-x C-x C-s ESC a ESC a C-x ESC 
O D C-x C-k C-x C-k RET C-x C-k RET y C-x 1 C-v C-v 
C-v C-v C-v ESC x m a n RET p d f t o t e x t RET C-x 
0 C-s r a w ESC O C ESC O C ESC O B C-v ESC x r e p 
o r t SPC e m a c s RET b u g RET

Recent messages:
Copying /scpc:soy:/cyber/tmp/split-handshake.pdf to /tmp/tramp.24520Pw.pdf...done
Tramp: Inserting local temp file `/tmp/tramp.24520Pw.pdf'...done
Wrote /tmp/docview1000/split-handshake.pdf
No PNG support is available, or some conversion utility for pdf files is missing.
Unable to render file.  View extracted text instead? (y or n)  y
Invoking man pdftotext in the background
Please wait: formatting the pdftotext man page...
pdftotext man page formatted
Mark saved where search started
call-interactively: End of buffer [2 times]

Load-path shadows:
/home/twb/.emacs.d/lisp/magit/magit-svn hides /usr/share/emacs/24.0.50/site-lisp/magit/magit-svn
/home/twb/.emacs.d/lisp/magit/magit-key-mode hides /usr/share/emacs/24.0.50/site-lisp/magit/magit-key-mode
/home/twb/.emacs.d/lisp/magit/magit hides /usr/share/emacs/24.0.50/site-lisp/magit/magit
/home/twb/.emacs.d/lisp/magit/magit-topgit hides /usr/share/emacs/24.0.50/site-lisp/magit/magit-topgit
/usr/share/emacs/24.0.50/site-lisp/puppet-el/puppet-mode hides /usr/share/emacs/site-lisp/puppet-mode
/usr/share/emacs/24.0.50/site-lisp/debian-startup hides /usr/share/emacs/site-lisp/debian-startup

Features:
(shadow mail-extr emacsbug eldoc paredit find-func apropos cus-edit
cus-start cus-load ibuf-ext ibuffer sort tramp-cmds noutline outline
w3m-cookie thingatpt w3m-search mule-util w3m-form w3m-symbol
w3m-bookmark w3m-session w3m browse-url doc-view image-mode timezone
w3m-hist w3m-fb bookmark-w3m w3m-ems w3m-ccl ccl w3m-favicon w3m-image
w3m-proc w3m-util cc-mode cc-fonts cc-menus cc-cmds cc-styles cc-align
cc-engine cc-vars cc-defs woman tabify man assoc conf-mode vc-rcs
newcomment rect sh-script executable grep whitespace log-edit pcvs-util
add-log gnus-cite gnus-art mm-uu mml2015 epg-config mm-view smime dig
mailcap nnir gnus-sum macroexp nnoo gnus-group gnus-undo nnmail
mail-source gnus-start gnus-spec gnus-int message sendmail rfc822 mml
mml-sec mm-decode mm-bodies mm-encode mail-parse rfc2231 rfc2047 rfc2045
ietf-drums mailabbrev gmm-utils mailheader gnus-win gnus-range gnus
gnus-ems nnheader mail-utils mm-util mail-prsvr wid-edit rst compile
tool-bar etags windmove diff-mode vc help-mode easymenu view tramp-sh
shell comint tramp-cache tramp tramp-compat auth-source netrc gnus-util
password-cache format-spec advice help-fns advice-preload tramp-loaddefs
ffap vc-dispatcher vc-darcs cl xml vc-git image wdired multi-isearch
dired-aux dired regexp-opt disp-table rcirc time-date ring server
jka-compr edmacro kmacro xt-mouse ido savehist icomplete paren delsel
saveplace debian-el debian-el-loaddefs w3m-load emacs-goodies-el
emacs-goodies-custom emacs-goodies-loaddefs easy-mmode dpkg-dev-el
dpkg-dev-el-loaddefs ediff-hook vc-hooks lisp-float-type lisp-mode
register page menu-bar rfn-eshadow timer select mouse jit-lock font-lock
syntax facemenu font-core frame cham georgian utf-8-lang misc-lang
vietnamese tibetan thai tai-viet lao korean japanese hebrew greek
romanian slovak czech european ethiopic indian cyrillic chinese
case-table epa-hook jka-cmpr-hook help simple abbrev loaddefs button
minibuffer faces cus-face files text-properties overlay md5 base64
format env code-pages mule custom widget hashtable-print-readable
backquote make-network-process multi-tty emacs)




Acknowledgement sent to trentbuck@HIDDEN (Trent W. Buck):
New bug report received and forwarded. Copy sent to rfrancoise@HIDDEN, bug-gnu-emacs@HIDDEN. Full text available.
Report forwarded to owner <at> debbugs.gnu.org, rfrancoise@HIDDEN, bug-gnu-emacs@HIDDEN:
bug#8519; Package emacs. Full text available.
Last modified: Fri, 31 Oct 2014 17:00:04 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.