GNU bug report logs - #31665
`libxml-parse-html-region' doesn't extract text in tables
Reported by: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>
Date: Thu, 31 May 2018 09:56:02 UTC
Severity: minor
Tags: fixed, moreinfo
Fixed in version 27.1
Done: Lars Ingebrigtsen <larsi <at> gnus.org>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 31665 in the body.
You can then email your comments to 31665 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-gnu-emacs <at> gnu.org: bug#31665; Package emacs. (Thu, 31 May 2018 09:56:02 GMT)
Acknowledgement sent to 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>: New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Thu, 31 May 2018 09:56:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org:
Dear bug-gnu-emacs, `libxml-parse-html-region' doesn't extract text in <table>s,
KY> I found that Emacs' built-in function `libxml-parse-html-region'
KY> doesn't extract text existing in the table clause.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31665; Package emacs. (Thu, 31 May 2018 10:59:02 GMT)
Message #8 received at 31665 <at> debbugs.gnu.org:
積丹尼 Dan Jacobson <jidanni <at> jidanni.org> writes:
> Dear bug-gnu-emacs, `libxml-parse-html-region' doesn't extract text in
> <table>s,
Do you have an example table that `libxml-parse-html-region' doesn't
"extract" text from?
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
Added tag(s) moreinfo. Request was from Noam Postavsky <npostavs <at> gmail.com> to control <at> debbugs.gnu.org. (Sun, 03 Jun 2018 00:19:02 GMT)
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31665; Package emacs. (Thu, 07 Jun 2018 07:41:01 GMT)
Message #13 received at 31665 <at> debbugs.gnu.org:
[Message part 1 (text/plain, inline)]
>>>>> "LI" == Lars Ingebrigtsen <larsi <at> gnus.org> writes:
LI> Do you have an example table that `libxml-parse-html-region' doesn't
LI> "extract" text from?
OK, here is a mail from my phone bill, which I cleaned of personal details:
[gg.gz (application/gzip, attachment)]
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31665; Package emacs. (Sun, 29 Sep 2019 08:36:02 GMT)
Message #16 received at 31665 <at> debbugs.gnu.org:
積丹尼 Dan Jacobson <jidanni <at> jidanni.org> writes:
>>>>>> "LI" == Lars Ingebrigtsen <larsi <at> gnus.org> writes:
>
> LI> Do you have an example table that `libxml-parse-html-region' doesn't
> LI> "extract" text from?
>
> OK, here is a mail from my phone bill, which I cleaned of personal details:
What was it you think is missing from that table? I don't read Chinese,
but there didn't seem to be any text in that table, just a bunch of
images.
Added tag(s) moreinfo. Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Sun, 29 Sep 2019 08:36:03 GMT)
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31665; Package emacs. (Sun, 29 Sep 2019 16:53:01 GMT)
Message #21 received at 31665 <at> debbugs.gnu.org:
>>>>> "LI" == Lars Ingebrigtsen <larsi <at> gnus.org> writes:
LI> 積丹尼 Dan Jacobson <jidanni <at> jidanni.org> writes:
>>>>>>> "LI" == Lars Ingebrigtsen <larsi <at> gnus.org> writes:
>>
LI> Do you have an example table that `libxml-parse-html-region' doesn't
LI> "extract" text from?
>>
>> OK, here is a mail from my phone bill, which I cleaned of personal details:
LI> What was it you think is missing from that table? I don't read Chinese,
LI> but there didn't seem to be any text in that table, just a bunch of
LI> images.
It should look like:
+----------------------------------------------------------------------------------------------------------------------------------------------------+
|+---------------------------------------------------------------------------------------------------------------------+ |
||+------------------------------------------------------------------------------------------------------------------+ | |
|||[banner2] | | |
|||------------------------------------------------------------------------------------------------------------------| | |
|||+---------------------------------------------------------------------------------------------------------------+ | | |
|||| |親愛的客戶,您好: | | | | |
|||| |-------------------------------------| | | | |
|||| |為保障您資料的安全,請輸入密碼開啟附 | | | | |
|||| |加檔案瀏覽您本期的帳單,密碼為『身分 | | | | |
|||| [IS1] |證號碼』(英文字母須大寫),營業人客戶 | [IS2] | | | |
|||| |不需輸入密碼即可瀏覽。 | | | | |
|||| |若無法開啟附加檔案,請先確認是否已下 | | | | |
|||| |載Acrobat Reader軟體。 | | | | |
|||| |-------------------------------------| | | | |
|||+---------------------------------------------------------------------------------------------------------------+ | | |
||+------------------------------------------------------------------------------------------------------------------+ | |
||++ | |
|||| | |
||++ | |
||+-------------------------------------------------------------------------------------------------------------------+| |
|||[new1] || |
|||+-----------------------------------------------------------------------------------------------------------------+|| |
|||| | [enf201]||| |
|||| |--------------------------------------------------------||| |
||||[end101] | [enl301]||| |
|||| |--------------------------------------------------------||| |
|||| | [enl401]||| |
|||+-----------------------------------------------------------------------------------------------------------------+|| |
||+-------------------------------------------------------------------------------------------------------------------+| |
||++ | |
|||| | |
||++ | |
||+------------------------------------------------------------------------------------------------------------------+ | |
|||[hot1] | | |
|||------------------------------------------------------------------------------------------------------------------| | |
|||+----------------------------------+ | | |
||||[hot1]|[hot2]|[hot3]|[hot4]|[hot5]| | | |
|||+----------------------------------+ | | |
||+------------------------------------------------------------------------------------------------------------------+ | |
||++ | |
|||| | |
||++ | |
||+------------------------------------------------------------------------------------------------------------------+ | |
|||[link1] | | |
|||+-----------------------------------------------------------------+ | | |
|||||| | | | | | | |
||||++------------+----------------+----------------+----------------| | | |
||||||電子帳單Q&A | 費率說明 | 客戶消費資訊 | 線上繳費 | | | |
||||++------------+----------------+----------------+----------------| | | |
|||||| 服務專線 | 貼心提醒 |不可不知行動優惠| HiNet好康優惠 | | | |
|||+-----------------------------------------------------------------+ | | |
||+------------------------------------------------------------------------------------------------------------------+ | |
||++ | |
|||| | |
||++ | |
||+------------------------------------------------------------------------------------------------------------------+ | |
||| [cht] | | |
||+------------------------------------------------------------------------------------------------------------------+ | |
|+---------------------------------------------------------------------------------------------------------------------+ |
+----------------------------------------------------------------------------------------------------------------------------------------------------+
[English translation of the Chinese text in the table above: "Dear customer: To protect the security of your data, please enter a password to open the attachment and view this period's bill. The password is your national ID number (letters must be uppercase); business customers can view the bill without entering a password. If you cannot open the attachment, please first check that you have downloaded Acrobat Reader." Link row: E-bill Q&A, Rate information, Customer usage information, Online payment, Service hotline, Helpful reminders, Must-know mobile offers, HiNet deals.]
But instead all we get is:
From: Phone Co. <p <at> cht.com.tw>
Subject: Phone Bill
To: "jidanni <at> jidanni.org" <jidanni <at> jidanni.org>
Date: Thu, 17 May 2018 12:12:06 +0800
Reply-To: x <at> cht.com.tw
[1. text/html]
中華電信電子帳單 [Chunghwa Telecom e-bill]
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31665; Package emacs. (Mon, 30 Sep 2019 05:06:01 GMT)
Message #24 received at 31665 <at> debbugs.gnu.org:
The HTML in that email is invalid. It's basically of the form
<table>
<tbody>
foo
</tbody>
</table>
"foo" won't be rendered by shr.
shr does try to deal with invalid tables, though. If the <tbody>
elements hadn't been there, then the "foo" would have been rendered, so I
guess some more work is required in that area.
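Message #24 traces the missing text to invalid table markup: text placed directly inside <tbody> instead of inside a cell. The following is a minimal Python sketch (standard-library html.parser; it is not shr or libxml2 code, and the helper name find_stray_table_text is hypothetical) that flags such stray text, i.e. non-whitespace text whose innermost open element is <table>, <thead>, <tbody>, <tfoot> or <tr> rather than a <td>/<th> cell. It assumes reasonably well-nested input and does not attempt the error-recovery rules a browser applies.

```python
from html.parser import HTMLParser

# Elements whose direct text children are invalid in HTML table markup;
# text is only allowed inside cells (<td>/<th>) or <caption>.
TABLE_STRUCTURE = {"table", "thead", "tbody", "tfoot", "tr"}

# Void elements never get an end tag, so they are not pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class StrayTableText(HTMLParser):
    """Collect text whose innermost open element is a table-structure
    element instead of a cell (hypothetical helper, for illustration)."""

    def __init__(self):
        super().__init__()
        self.stack = []   # names of currently open (non-void) elements
        self.stray = []   # (parent element, text) pairs found so far

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children;
        # ignore end tags that were never opened.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack and self.stack[-1] in TABLE_STRUCTURE:
            self.stray.append((self.stack[-1], text))


def find_stray_table_text(html):
    """Return (parent, text) pairs for text sitting outside table cells."""
    parser = StrayTableText()
    parser.feed(html)
    parser.close()
    return parser.stray
```

For example, find_stray_table_text("<table><tbody>foo</tbody></table>") returns [("tbody", "foo")], while text inside a <td> is not reported.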
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31665; Package emacs. (Mon, 30 Sep 2019 05:29:01 GMT)
Message #27 received at 31665 <at> debbugs.gnu.org:
Lars Ingebrigtsen <larsi <at> gnus.org> writes:
> shr does try to deal with invalid tables, though. If the <tbody>
> elements hadn't been there, then the "foo" would have been, so I guess
> some more work is required in that area.
I've now fixed this on the trunk.
Added tag(s) fixed. Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Mon, 30 Sep 2019 05:29:03 GMT)
Bug marked as fixed in version 27.1; send any further explanations to 31665 <at> debbugs.gnu.org and 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>. Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Mon, 30 Sep 2019 05:29:03 GMT)
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31665; Package emacs. (Tue, 01 Oct 2019 02:44:01 GMT)
Message #34 received at 31665 <at> debbugs.gnu.org:
On Mon, 30 Sep 2019 07:28:19 +0200, Lars Ingebrigtsen wrote:
> I've now fixed this on the trunk.
Verified. Thank you for improving it!
Bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 29 Oct 2019 11:24:07 GMT)
GNU bug tracking system. Copyright (C) 1999 Darren O. Benham, 1997, 2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.