GNU bug report logs - #77539
31.0.50; nnatom: missing content in articles.

Package: emacs;

Reported by: Fernando de Morais <fernandodemorais.jf <at> gmail.com>

Date: Fri, 4 Apr 2025 18:35:01 UTC

Severity: normal

Found in version 31.0.50

To reply to this bug, email your comments to 77539 AT debbugs.gnu.org.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#77539; Package emacs. (Fri, 04 Apr 2025 18:35:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Fernando de Morais <fernandodemorais.jf <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Fri, 04 Apr 2025 18:35:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Fernando de Morais <fernandodemorais.jf <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Cc: Daniel Semyonov <daniel <at> dsemy.com>
Subject: 31.0.50; nnatom: missing content in articles.
Date: Fri, 04 Apr 2025 15:33:53 -0300
When accessing certain feeds with `nnatom', it is possible to notice the
absence of characters such as curly quotes.

To reproduce, which is possible with `emacs -Q', please:

1. M-x load-library RET gnus RET
2. M-x gnus
3. In the groups buffer:
   - press B (shift+b), then type nnatom and insert the address
     planet.emacslife.com/atom.xml then RET
4. In the browse server buffer, RET and select an article with RET

Results in:

        - It should be noticeable that curly quotes are missing from the
          text.  This can be validated by opening the original article
          link in a browser.

Additionally, in feeds like YouTube channels, the link to the video is
missing.  It is also possible to reproduce this with `emacs -Q' by
following the steps described above, but change the address to (Prot's
channel):

- www.youtube.com/feeds/videos.xml?channel_id=UC0uTPqBCFIpZxlz_Lv1tk_g

Results in:

        - Instead of the link, the content of the article is:
          (nil (Content-Type . text/html) links)
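
For context, the failing case is an entry that carries links but no content element. Below is a minimal sketch, in Python and outside Emacs, of extracting the links from such an entry with the standard library. It is not nnatom's code; the sample entry and the `entry_links` helper are hypothetical and hand-written for illustration, not copied from the real feed.

```python
import xml.etree.ElementTree as ET

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical, hand-written entry: it has links but no <content> element.
SAMPLE_ENTRY = """\
<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Example video</title>
  <link rel="alternate" href="https://example.org/watch?v=abc"/>
  <author>
    <name>Example channel</name>
    <uri>https://example.org/channel/xyz</uri>
  </author>
</entry>
"""


def entry_links(entry_xml):
    """Return (content_text_or_None, list_of_hrefs) for one Atom entry."""
    entry = ET.fromstring(entry_xml)
    content = entry.find("atom:content", ATOM_NS)
    text = content.text if content is not None else None
    hrefs = [link.get("href") for link in entry.findall("atom:link", ATOM_NS)]
    return text, hrefs


text, hrefs = entry_links(SAMPLE_ENTRY)
print(text)   # None, because the entry has no body
print(hrefs)  # the single link declared in the sample entry
```

A reader consuming such an entry has to treat "no content" as a normal case and still show the links.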


In GNU Emacs 31.0.50 (build 1, x86_64-pc-linux-gnu, GTK+ Version
 3.24.49, cairo version 1.18.4) of 2025-04-04 built on sekai
Repository revision: 8c411381c69bf889243dc8a40cda22557e4b32be
Repository branch: master
System Description: Arch Linux

Configured using:
 'configure --with-pgtk --sysconfdir=/etc --prefix=/usr
 --libexecdir=/usr/lib --localstatedir=/var 'CFLAGS=-march=x86-64
 -mtune=generic -O2 -pipe -fno-plt -fexceptions -Wp,-D_FORTIFY_SOURCE=3
 -Wformat -Werror=format-security -fstack-clash-protection
 -fcf-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -g
 -ffile-prefix-map=/home/fernando/Documentos/Projetos/emacs/src=/usr/src/debug/emacs-git
 -flto=auto' 'LDFLAGS=-Wl,-O1 -Wl,--sort-common -Wl,--as-needed
 -Wl,-z,relro -Wl,-z,now -Wl,-z,pack-relative-relocs -flto=auto''

Configured features:
ACL CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GPM GSETTINGS HARFBUZZ JPEG
LCMS2 LIBOTF LIBSYSTEMD LIBXML2 MODULES NATIVE_COMP NOTIFY INOTIFY
PDUMPER PGTK PNG RSVG SECCOMP SOUND SQLITE3 THREADS TIFF
TOOLKIT_SCROLL_BARS TREE_SITTER WEBP XIM GTK3 ZLIB

Important settings:
  value of $LANG: pt_BR.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Org

Minor modes in effect:
  citar-embark-mode: t
  org-indent-mode: t
  org-superstar-mode: t
  pdf-occur-global-minor-mode: t
  flyspell-mode: t
  electric-pair-mode: t
  recentf-mode: t
  delete-selection-mode: t
  global-so-long-mode: t
  winner-mode: t
  icomplete-vertical-mode: t
  icomplete-mode: t
  minibuffer-depth-indicate-mode: t
  minibuffer-electric-default-mode: t
  savehist-mode: t
  marginalia-mode: t
  server-mode: t
  goto-address-mode: t
  consult-denote-mode: t
  denote-menu-bar-mode: t
  minions-mode: t
  movemail-auto-fetch-mode: t
  windmove-mode: t
  display-time-mode: t
  global-pulse-line-mode: t
  pulse-line-mode: t
  mode-line-visible-bell-mode: t
  override-global-mode: t
  display-battery-mode: t
  gcmh-mode: t
  global-eldoc-mode: t
  eldoc-mode: t
  show-paren-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  minibuffer-regexp-mode: t
  size-indication-mode: t
  column-number-mode: t
  line-number-mode: t
  auto-fill-function: org-auto-fill-function
  visual-line-mode: t
  transient-mark-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  temp-buffer-resize-mode: t

Load-path shadows:
None found.

Features:
(shadow footnote emacsbug citar-capf citar-embark embark-org
embark-consult embark citar citar-file citar-cache citar-format parsebib
image-file image-converter org-indent oc-basic ol-eww eww-lnum eww
vtable url-queue ol-rmail ol-mhe ol-irc ol-info ol-gnus nnselect
ol-docview doc-view filenotify ol-bibtex bibtex ol-bbdb ol-w3m ol-doi
org-link-doi cape-keyword cape pdf-sync pdf-outline pdf-links
pdf-history pdf-annot oc-bibtex ob-octave org-superstar org-element
org-persist org-id org-refile org-element-ast inline avl-tree generator
org-contrib org ob ob-tangle ob-ref ob-lob ob-table ob-exp org-macro
org-src sh-script smie treesit executable ob-comint org-pcomplete
org-list org-footnote org-faces org-entities org-version ob-emacs-lisp
ob-core ob-eval org-cycle org-table ol org-fold org-fold-core org-keys
oc org-loaddefs holidays holiday-loaddefs cal-menu calendar cal-loaddefs
org-compat org-macs facemenu pdf-occur ibuffer-vc ibuffer-tramp
tramp-cache time-stamp tramp trampver tramp-integration files-x
tramp-message tramp-compat shell pcmpl-args pcmpl-gnu pcmpl-linux
pcmpl-unix pcomplete tramp-loaddefs ibuf-ext ibuffer ibuffer-loaddefs
tablist advice tablist-filter semantic/wisent/comp semantic/wisent
semantic/wisent/wisent semantic/util-modes semantic/util semantic
semantic/tag semantic/lex semantic/fw mode-local cedet pdf-isearch
pdf-misc imenu pdf-loader pdf-tools pdf-view pdf-cache pdf-info tq
pdf-util pdf-macs image-mode exif noutline outline jka-compr misearch
multi-isearch qp help-fns radix-tree mule-util sort gnus-cite smiley
textsec uni-scripts idna-mapping ucs-normalize uni-confusable
textsec-check mail-extr gnus-bcklg gnus-ml disp-table pulse color
mm-archive network-stream url-http url-gw nsm url-cache url-auth nnatom
nnfeed nnmaildir nnagent nnml nnnil nnrss mm-url gnus-topic mairix
ecomplete gnus-search eieio-opt speedbar ezimage dframe find-func
gnus-dup gnus-draft nndraft nnmh gnus-demon gnus-async comp comp-cstr
gnus-agent gnus-srvr gnus-score score-mode nnvirtual gnus-msg gnus-art
mm-uu mml2015 mm-view mml-smime smime gnutls dig nntp gnus-cache
gnus-sum shr-tag-pre-highlight language-detection shr pixel-fill kinsoku
url-file svg dom gnus-group gnus-undo gnus-start gnus-dbus gnus-cloud
nnimap nnmail mail-source utf7 nnoo parse-time iso8601 gnus-spec
gnus-int gnus-range message sendmail yank-media puny rfc822 mml mml-sec
mm-decode mm-bodies mm-encode mail-parse rfc2231 rfc2047 rfc2045
ietf-drums mailabbrev gmm-utils mailheader gnus-win gnus nnheader
gnus-util mail-utils range mm-util mail-prsvr face-remap view
flymake-languagetool time-date checkdoc lisp-mnt flymake warnings
display-line-numbers epa-file epa derived epg rfc6068 epg-config
flyspell ispell hl-line hideshow corfu rainbow-delimiters ffap elec-pair
recentf tree-widget delsel so-long winner icomplete mb-depth
minibuf-eldef savehist marginalia cus-start auth-source-pass server
goto-addr thingatpt consult-denote denote dired-x dired-aux dired
dired-loaddefs xref project consult bookmark minions let-alist pcase
ibuf-macs orderless compat windmove time cl-extra help-mode edmacro
kmacro bind-key easy-mmode format-spec battery dbus compile
text-property-search comint ansi-osc ansi-color ring comp-run
comp-common xml gcmh system-packages site-start auctex-autoloads
tex-site bbdb-csv-import-autoloads bbdb-autoloads cape-autoloads
citar-embark-autoloads citar-autoloads citeproc-autoloads
consult-denote-autoloads consult-dir-autoloads consult-eglot-autoloads
corfu-autoloads csv-mode-autoloads denote-autoloads dired-du-autoloads
diredfl-autoloads edit-indirect-autoloads elpher-autoloads
embark-consult-autoloads consult-autoloads embark-autoloads
emms-autoloads engrave-faces-autoloads eww-lnum-autoloads
fennel-mode-autoloads flymake-languagetool-autoloads
flyspell-correct-autoloads gcmh-autoloads gemini-mode-autoloads
htmlize-autoloads ibuffer-tramp-autoloads ibuffer-vc-autoloads
lua-mode-autoloads marginalia-autoloads markdown-mode-autoloads
minions-autoloads nov-autoloads esxml-autoloads kv-autoloads
ob-sagemath-autoloads olivetti-autoloads orderless-autoloads
org-contrib-autoloads org-superstar-autoloads ox-gemini-autoloads
parsebib-autoloads pass-autoloads f-autoloads dash-autoloads
password-store-otp-autoloads password-store-autoloads
pcmpl-args-autoloads pcsv-autoloads pdf-tools-autoloads
platformio-mode-autoloads async-autoloads projectile-autoloads
pyvenv-autoloads queue-autoloads rainbow-delimiters-autoloads
rainbow-mode-autoloads rec-mode-autoloads s-autoloads
sage-shell-mode-autoloads rx deferred-autoloads
shr-tag-pre-highlight-autoloads language-detection-autoloads
string-inflection-autoloads sxhkdrc-mode-autoloads
system-packages-autoloads tablist-autoloads transmission-autoloads info
with-editor-autoloads yaml-mode-autoloads yasnippet-autoloads
modus-vivendi-tinted-theme modus-themes package browse-url xdg url
url-proxy url-privacy url-expand url-methods url-history url-cookie
generate-lisp-file url-domsuf url-util mailcap url-handlers url-parse
auth-source cl-seq eieio eieio-core cl-macs password-cache json subr-x
map byte-opt gv bytecomp byte-compile url-vars cus-edit pp cus-load
icons wid-edit cl-loaddefs cl-lib rmc iso-transl tooltip cconv eldoc
paren electric uniquify ediff-hook vc-hooks lisp-float-type elisp-mode
mwheel term/pgtk-win pgtk-win term/common-win touch-screen pgtk-dnd
tool-bar dnd fontset image regexp-opt fringe tabulated-list replace
newcomment text-mode lisp-mode prog-mode register page tab-bar menu-bar
rfn-eshadow isearch easymenu timer select scroll-bar mouse jit-lock
font-lock syntax font-core term/tty-colors frame minibuffer nadvice seq
simple cl-generic indonesian philippine cham georgian utf-8-lang
misc-lang vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms
cp51932 hebrew greek romanian slovak czech european ethiopic indian
cyrillic chinese composite emoji-zwj charscript charprop case-table
epa-hook jka-cmpr-hook help abbrev obarray oclosure cl-preloaded button
loaddefs theme-loaddefs faces cus-face macroexp files window
text-properties overlay sha1 md5 base64 format env code-pages mule
custom widget keymap hashtable-print-readable backquote threads dbusbind
inotify dynamic-setting system-font-setting font-render-setting cairo
gtk pgtk lcms2 multi-tty move-toolbar make-network-process
tty-child-frames native-compile emacs)

Memory information:
((conses 16 1183543 258584) (symbols 48 49343 5) (strings 32 362585 68966)
 (string-bytes 1 15034677) (vectors 16 339750) (vector-slots 8 3265006 149918)
 (floats 8 91743 939) (intervals 56 5166 4752) (buffers 992 46))

-- 
Regards,
Fernando de Morais.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#77539; Package emacs. (Fri, 04 Apr 2025 22:44:02 GMT) Full text and rfc822 format available.

Message #8 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Daniel Semyonov <daniel <at> dsemy.com>
To: Fernando de Morais <fernandodemorais.jf <at> gmail.com>
Cc: bug-gnu-emacs <at> gnu.org
Subject: Re: 31.0.50; nnatom: missing content in articles.
Date: Sat, 05 Apr 2025 01:40:54 +0300
>>>>> Fernando de Morais writes:

    > When accessing certain feeds with `nnatom', it is possible to notice the
    > absence of characters such as curly quotes.

    > To reproduce, which is possible with `emacs -Q', please:

    > 1. M-x load-library RET gnus RET
    > 2. M-x gnus
    > 3. In the groups buffer:
    >    - press B (shift+b), then type nnatom and insert the address
    >      planet.emacslife.com/atom.xml then RET
    > 4. In the browse server buffer, RET and select an article with RET

    > Results in:

    >         - It should be noticeable that curly quotes are missing from the
    >           text.  This can be validated by opening the original article
    >           link in a browser.

I think those characters disappear due to the text being HTML, which
causes Gnus to display it differently.
AFAICT, the characters do appear in the body of the text saved by nnatom.

    > Additionally, in feeds like YouTube channels, the link to the video is
    > missing.  It is also possible to reproduce this with `emacs -Q' by
    > following the steps described above, but change the address to (Prot's
    > channel):

    > - www.youtube.com/feeds/videos.xml?channel_id=UC0uTPqBCFIpZxlz_Lv1tk_g

    > Results in:

    >         - Instead of the link, the content of the article is:
    >           (nil (Content-Type . text/html) links)

YouTube "articles" should actually contain two links: one to the channel
and one to the video.  Unfortunately, even though I remember testing
this, it seems I made a typo, so any article with an empty body (other
than links) is broken in this way.
Attached a patch which fixes this.

Daniel

[0001-nnatom-Fix-parsing-of-empty-articles.patch (text/x-patch, attachment)]
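
The attached patch is not reproduced here. Purely as an illustration of the case it addresses (an article whose body is empty apart from links), here is a sketch in Python rather than nnatom's Emacs Lisp. The function `render_body` and its arguments are hypothetical names, and the example URLs are placeholders.

```python
from html import escape


def render_body(content_html, links):
    """Build an HTML article body from optional content and (label, url) links.

    Empty or missing content must still yield the links, rather than
    leaking an internal data structure into the article text.
    """
    parts = []
    if content_html:  # skips both None and the empty string
        parts.append(content_html)
    for label, url in links:
        parts.append('<p><a href="{}">{}</a></p>'.format(
            escape(url, quote=True), escape(label)))
    return "\n".join(parts)


# Hypothetical entry with no content and two links (channel and video).
print(render_body(None, [("Channel", "https://example.org/channel/xyz"),
                         ("Video", "https://example.org/watch?v=abc")]))
```

The point of the sketch is only that the empty-content branch needs its own test; the actual fix lives in the patch above.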

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#77539; Package emacs. (Sat, 05 Apr 2025 03:14:01 GMT) Full text and rfc822 format available.

Message #11 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Fernando de Morais <fernandodemorais.jf <at> gmail.com>
To: Daniel Semyonov <daniel <at> dsemy.com>
Cc: bug-gnu-emacs <at> gnu.org
Subject: Re: 31.0.50; nnatom: missing content in articles.
Date: Sat, 05 Apr 2025 00:10:38 -0300
Hello Daniel,

Thank you for checking on this issue.  I applied the patch and it fixed
the problem with YouTube channel feeds!

Daniel Semyonov <daniel <at> dsemy.com> writes:

> I think those characters disappear due to the text being HTML, which
> causes Gnus to display it differently.
> AFAICT, the characters do appear in the body of the text saved by nnatom.

In that case, do you think this is something that can be solved in
`nnatom', or would it require investigating other parts of Gnus?  I ask
because I used `nnrss' + atom2rss.xml for a while to read some Atom
feeds, and this issue with the characters didn't appear.

Interestingly, Philip Kaludercic's feed (amodernist.com/all.atom)
doesn't show the problem when accessed via `nnatom'...

Thanks again!

-- 
Regards,
Fernando de Morais.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#77539; Package emacs. (Sat, 05 Apr 2025 09:18:01 GMT) Full text and rfc822 format available.

Message #14 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Daniel Semyonov <daniel <at> dsemy.com>
To: Fernando de Morais <fernandodemorais.jf <at> gmail.com>
Cc: bug-gnu-emacs <at> gnu.org
Subject: Re: 31.0.50; nnatom: missing content in articles.
Date: Sat, 05 Apr 2025 12:15:00 +0300
>>>>> Fernando de Morais writes:

    > Hello Daniel,
    > Thank you for checking on this issue.  I applied the patch and it fixed
    > the problem with YouTube channel feeds!

    > Daniel Semyonov <daniel <at> dsemy.com> writes:

    >> I think those characters disappear due to the text being HTML, which
    >> causes Gnus to display it differently.
    >> AFAICT, the characters do appear in the body of the text saved by nnatom.

    > In that case, do you think this is something that can be solved in
    > `nnatom', or would it require investigating other parts of Gnus?  I ask
    > because I used `nnrss' + atom2rss.xml for a while to read some Atom
    > feeds, and this issue with the characters didn't appear.

IIRC HTML articles are displayed by Gnus using shr.el, I assume this
would need to be fixed there (though I'm not sure if this is really a
"bug" or expected behavior for an HTML renderer).

    > Interestingly, Philip Kaludercic's feed (amodernist.com/all.atom)
    > doesn't show the problem when accessed via `nnatom'...

The curly quotes I found in this feed do not appear literally in the
feed, but rather appear as "&#8220;" and "&#8221;" in the HTML content
of the article; AFAIU this is the correct way to include such characters
in HTML, so it doesn't surprise me they show up correctly in Gnus.
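
The difference between the two spellings can be seen outside Emacs. The sketch below (Python, standard library only) shows that numeric character references are plain ASCII and decode to the same characters however the surrounding bytes are interpreted, whereas literal curly quotes only survive when the bytes are decoded with the right charset. Whether a charset mismatch is what actually happens in Gnus is an assumption; nothing in this thread establishes the cause.

```python
from html import unescape

LEFT, RIGHT = "\u201c", "\u201d"  # the curly double quotes

# Spelling 1: numeric character references (pure ASCII on the wire).
referenced = "&#8220;quoted&#8221;"
wire = referenced.encode("ascii")
# Decoding as UTF-8 or Latin-1 gives the same markup, hence the same text.
assert unescape(wire.decode("utf-8")) == unescape(wire.decode("latin-1"))
print(unescape(referenced) == LEFT + "quoted" + RIGHT)  # True

# Spelling 2: literal characters, encoded as UTF-8.
literal = (LEFT + "quoted" + RIGHT).encode("utf-8")
print(literal.decode("utf-8") == LEFT + "quoted" + RIGHT)    # True
# Assumed failure mode: decoding the same bytes with the wrong charset.
print(literal.decode("latin-1") == LEFT + "quoted" + RIGHT)  # False
```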

    > Thanks again!

Thank you for reporting these issues,
Daniel




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#77539; Package emacs. (Sat, 05 Apr 2025 13:35:01 GMT) Full text and rfc822 format available.

Message #17 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Fernando de Morais <fernandodemorais.jf <at> gmail.com>
To: Daniel Semyonov <daniel <at> dsemy.com>
Cc: bug-gnu-emacs <at> gnu.org
Subject: Re: 31.0.50; nnatom: missing content in articles.
Date: Sat, 05 Apr 2025 10:33:52 -0300
Hello Daniel,

Daniel Semyonov <daniel <at> dsemy.com> writes:

> IIRC HTML articles are displayed by Gnus using shr.el, I assume this
> would need to be fixed there (though I'm not sure if this is really a
> "bug" or expected behavior for an HTML renderer).

`shr' is also the default HTML renderer for EWW, if I'm not mistaken.  I
did a test: I repeated the same steps from my original report and
navigated, via the article link, to the original page using EWW, and the
characters appeared normally.

I also navigated to the feed page (planet.emacslife.com) via EWW, and
everything is displayed normally there.  Maybe it is not related to `shr',
then?

> Thank you for reporting these issues,

You're welcome!

-- 
Regards,
Fernando de Morais.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#77539; Package emacs. (Sat, 05 Apr 2025 14:01:02 GMT) Full text and rfc822 format available.

Message #20 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Daniel Semyonov <daniel <at> dsemy.com>
To: Fernando de Morais <fernandodemorais.jf <at> gmail.com>
Cc: bug-gnu-emacs <at> gnu.org
Subject: Re: 31.0.50; nnatom: missing content in articles.
Date: Sat, 05 Apr 2025 16:57:15 +0300
>>>>> Fernando de Morais writes:

    > Hello Daniel,
    > Daniel Semyonov <daniel <at> dsemy.com> writes:

    >> IIRC HTML articles are displayed by Gnus using shr.el, I assume this
    >> would need to be fixed there (though I'm not sure if this is really a
    >> "bug" or expected behavior for an HTML renderer).

    > `shr' is also the default HTML renderer for EWW, if I'm not mistaken.  I
    > did a test: I repeated the same steps from my original report and
    > navigated, via the article link, to the original page using EWW, and the
    > characters appeared normally.

    > I also navigated to the feed page (planet.emacslife.com) via EWW, and
    > everything is displayed normally there.  Maybe it is not related to `shr',
    > then?

Interesting, looking at the page source it seems the quotes are included
literally like they are in the Atom feed, so I would expect it to be
broken in the same way, but it also seems fine on my end; I'll have to
do some more tests.




This bug report was last modified 7 days ago.

