GNU bug report logs -
#46807
[website] return 404 with HTTP header 'Accept-Language: zh-CN,zh'
Reported by: ylc991 <ylc991 <at> 163.com>
Date: Sat, 27 Feb 2021 02:45:02 UTC
Severity: normal
Done: "pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 46807 in the body.
You can then email your comments to 46807 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-guix <at> gnu.org: bug#46807; Package guix. (Sat, 27 Feb 2021 02:45:02 GMT)
Acknowledgement sent to ylc991 <ylc991 <at> 163.com>: New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Sat, 27 Feb 2021 02:45:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/html, inline)]
Information forwarded to bug-guix <at> gnu.org: bug#46807; Package guix. (Sat, 27 Feb 2021 12:32:01 GMT)
Message #8 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Ylc991,
Thanks for the report!
My verbose notes so far; I need to (finally!) set up a local build
of the Web site first.
ylc991 wrote:
> Hello! My webbrowser has set ‘Accept-Language’ to 'zh-CN,zh' by
> default, and https://guix.gnu.org returns 404.
Indeed, handling of zh-CN specifically is broken. :-(
--8<---------------cut here---------------start------------->8---
~ λ curl -LI -H 'Accept-Language: zh-cn' https://guix.gnu.org
HTTP/1.1 404 Not Found
[...]
--8<---------------cut here---------------end--------------->8---
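For anyone who prefers to script the check, here is a minimal Python sketch of the same probe. The helper names `build_probe` and `probe_status` are made up for illustration; only standard-library calls are used. It roughly mirrors `curl -LI` (urllib follows redirects by default), but it is not guaranteed to behave identically.

```python
import urllib.error
import urllib.request


def build_probe(url, accept_language):
    """Build a HEAD request carrying the given Accept-Language header."""
    return urllib.request.Request(
        url,
        headers={"Accept-Language": accept_language},
        method="HEAD",
    )


def probe_status(request):
    """Send the request and return the HTTP status code.

    urllib raises HTTPError for 4xx/5xx responses, so the code is read
    from the exception in that case.
    """
    try:
        with urllib.request.urlopen(request) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code
```

Calling `probe_status(build_probe("https://guix.gnu.org", "zh-CN,zh"))` would perform the actual network request.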
This is because our nginx configuration
(maintenance/hydra/nginx/berlin.scm) does:
--8<---------------cut here---------------start------------->8---
set_from_accept_language $lang en de es fr zh-CN;
[...]
try_files $uri /$lang/$uri /$lang/$uri/index.html =404;
--8<---------------cut here---------------end--------------->8---
i.e., it looks in /srv/guix.gnu.org/zh-CN, but our website uses...
--8<---------------cut here---------------start------------->8---
nckx <at> berlin ~$ ls -d /srv/guix.gnu.org/zh*
/srv/guix.gnu.org/zh-cn/
--8<---------------cut here---------------end--------------->8---
...lowercase. This questionable choice comes from
artwork/po/ietf-tags.scm:
--8<---------------cut here---------------start------------->8---
;;; This file contains an association list for each translation from
;;; the locale to an IETF language tag to be used in the URL path of
;;; translated pages. The language tag results from the translation
;;; team’s language code from
;;; <https://translationproject.org/team/index.html>. The underscore
;;; in the team’s code is replaced by a hyphen. For example, az would
;;; be used for the Azerbaijani language (not az-Latn) and zh-CN would
;;; be used for mainland Chinese (not zh-Hans-CN)
([...]
("zh_CN" . "zh-cn"))
--8<---------------cut here---------------end--------------->8---
Questionable only because, while a lowercase region is technically
valid, it's so rare that it's likely to cause problems -- as we
found out.
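To make "conventional" concrete: RFC 5646 (section 2.1.1) recommends lowercase for all subtags except two-letter subtags (uppercase) and four-letter subtags (title case) that are neither the first subtag nor placed after a singleton. The following Python sketch applies that casing rule. The function name `conventional_case` is hypothetical, and the sketch only adjusts case; it does not validate tags.

```python
def conventional_case(tag):
    """Return TAG with the casing recommended by RFC 5646, section 2.1.1."""
    subtags = tag.split("-")
    result = [subtags[0].lower()]
    # Once a singleton (one-character subtag) has been seen,
    # everything after it stays lowercase.
    after_singleton = len(subtags[0]) == 1
    for subtag in subtags[1:]:
        if after_singleton:
            result.append(subtag.lower())
        elif len(subtag) == 2:
            result.append(subtag.upper())       # region, e.g. CN
        elif len(subtag) == 4:
            result.append(subtag.capitalize())  # script, e.g. Hans
        else:
            result.append(subtag.lower())
        if len(subtag) == 1:
            after_singleton = True
    return "-".join(result)


print(conventional_case("zh-cn"))       # zh-CN
print(conventional_case("zh-hans-cn"))  # zh-Hans-CN
```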
> I have tested with curl, 'zh-CN,zh', 'zh-CN', [is 404]
These are valid, so the nginx accept-language module accepts them,
but then looks for a subdirectory that doesn't exist and returns
404.
> 'zh-cn' is 404
This is valid, but since we configure the accept-language module
to use ‘zh-CN’ it normalises $lang to the latter. Which is good,
but it causes the same 404 as above.
> 'zh_CN' is 200.
This is bogus (‘_’ is not valid), hence ignored, and so the site
falls back to English 200.
> 'zh' [is 200]
Valid but the accept-language module is not clever; we need to add
an explicit 'zh' entry for that to work:
--8<---------------cut here---------------start------------->8---
set_from_accept_language $lang en de es fr zh-CN zh en;
--8<---------------cut here---------------end--------------->8---
I expect that adding it and changing ietf-tags.scm to use "zh-CN"
will fix both 404s, but need to check that it doesn't break
anything else.
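The four observations can be reproduced with a small model. This Python sketch is not the nginx module's code. `pick_language` and `status` are hypothetical helpers that assume, as described above, that the module matches tags without regard to case, reports the configured spelling, skips anything it cannot match, and falls back to the first listed language, while the directory lookup is case-sensitive. q-values are ignored for simplicity.

```python
def pick_language(accept_language, supported):
    """Return the configured spelling of the first requested tag that
    matches a supported tag (ignoring case), else the first supported tag."""
    by_lower = {tag.lower(): tag for tag in supported}
    for item in accept_language.split(","):
        tag = item.split(";")[0].strip().lower()
        if tag in by_lower:
            return by_lower[tag]
    return supported[0]


def status(lang, directories):
    """Model the try_files lookup: 200 if /<lang>/ exists, else 404."""
    return 200 if lang in directories else 404


SUPPORTED = ["en", "de", "es", "fr", "zh-CN"]     # nginx configuration
ON_DISK = {"en", "de", "es", "fr", "zh-cn"}       # before the rename
RENAMED = {"en", "de", "es", "fr", "zh-CN"}       # after the rename

for header in ["zh-CN,zh", "zh-cn", "zh_CN", "zh"]:
    lang = pick_language(header, SUPPORTED)
    print(header, "->", lang, status(lang, ON_DISK), status(lang, RENAMED))
```

In this model the first two headers select zh-CN and only succeed once the directory is renamed, while zh_CN and zh fall through to en.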
The other untested solution is using lowercase
--8<---------------cut here---------------start------------->8---
set_from_accept_language $lang en de es fr zh-cn zh en;
--8<---------------cut here---------------end--------------->8---
but, assuming that even works, I'm not fond of making the
unconventional the norm.
Kind regards,
T G-R
[signature.asc (application/pgp-signature, inline)]
Information forwarded to bug-guix <at> gnu.org: bug#46807; Package guix. (Sat, 27 Feb 2021 12:32:02 GMT)
Information forwarded to bug-guix <at> gnu.org: bug#46807; Package guix. (Sat, 27 Feb 2021 12:36:02 GMT)
Message #14 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
It might be related to translations. When you use zh-cn, we have a translation for that language, so you're redirected to it. Not sure why you get a 404 though.
On 26 February 2021 21:18:12 GMT-05:00, ylc991 <ylc991 <at> 163.com> wrote:
>Hello! My webbrowser has set ‘Accept-Language’ to 'zh-CN,zh' by
>default, and https://guix.gnu.org returns 404. I have tested with curl,
>'zh-CN,zh', 'zh-CN', 'zh-cn' is 404 while 'zh', 'zh_CN' is 200.
>
>
>I first noticed it on 2021-02-23. It did not happen one or two
>months ago. I think there may be something wrong with
>the web server.
[Message part 2 (text/html, inline)]
Information forwarded to bug-guix <at> gnu.org: bug#46807; Package guix. (Mon, 01 Mar 2021 10:08:02 GMT)
Message #20 received at 46807 <at> debbugs.gnu.org (full text, mbox):
Hello,
ylc991 <ylc991 <at> 163.com> writes:
> Hello! My webbrowser has set ‘Accept-Language’ to 'zh-CN,zh' by default, and https://guix.gnu.org returns 404. I have tested with curl, 'zh-CN,zh', 'zh-CN',
> 'zh-cn' is 404 while 'zh', 'zh_CN' is 200.
Florian, could it be that we’re not normalizing language tags
appropriately? Does that ring a bell?
Thanks for your report!
Ludo’.
Information forwarded to bug-guix <at> gnu.org: bug#46807; Package guix. (Mon, 01 Mar 2021 10:50:01 GMT)
Message #23 received at 46807 <at> debbugs.gnu.org (full text, mbox):
Hello,
On Mon, Mar 01, 2021 at 11:06:59AM +0100, Ludovic Courtès wrote:
> Florian, could it be that we’re not normalizing language tags
> appropriately? Does that ring a bell?
Tobias’ analysis is likely correct. I haven’t yet built a current
berlin virtual machine to test, though.
We’re not normalizing language tags at all currently. Doing URL
redirects in nginx confuses me greatly; I have no idea how to
concisely specify redirects *and* have them execute in the right
order. The many lines
(redirect "/blog/2006/purely-functional-software-deployment-model" "/$lang/blog/2006/purely-functional-software-deployment-model/")
and similar in maintenance.git’s hydra/nginx/berlin.scm file are a bad
solution and are testament to my confusion. I would not like one line
for each package.
Regards,
Florian
Information forwarded to bug-guix <at> gnu.org: bug#46807; Package guix. (Thu, 04 Mar 2021 11:04:01 GMT)
Message #26 received at 46807 <at> debbugs.gnu.org (full text, mbox):
On Sat, Feb 27, 2021 at 01:31:40PM +0100, Tobias Geerinckx-Rice via Bug reports for GNU Guix wrote:
> I expect that adding it and changing ietf-tags.scm to use "zh-CN" will fix
> both 404s, but need to check that it doesn't break anything else.
I made the tiny change to guix-artwork’s ietf-tags.scm as
04c96a370b8cae48ed162e4414b8950cc65c513b now (sorry for taking so
long):
diff --git a/website/po/ietf-tags.scm b/website/po/ietf-tags.scm
index 32b81ef..5bd22f4 100644
--- a/website/po/ietf-tags.scm
+++ b/website/po/ietf-tags.scm
@@ -10,4 +10,4 @@
 ("de_DE" . "de")
 ("es_ES" . "es")
 ("fr_FR" . "fr")
- ("zh_CN" . "zh-cn"))
+ ("zh_CN" . "zh-CN"))
Note that the prior zh-cn URLs will be broken.
Later I will experiment with nginx’s map directive so that zh-cn and
zh Accept-Language settings lead to the proper URL; after that I will
close this bug. zh-cn URLs remain invalid. Links to the manual
continue to use zh-cn.
For testing I dug out the VM code
<https://lists.gnu.org/archive/html/bug-guix/2020-04/msg00195.html>
where I had removed parts of berlin that are not relevant to the
website. The change breaks neither website nor manual.
Thanks ylc991 for the report!
Regards,
Florian
Information forwarded to bug-guix <at> gnu.org: bug#46807; Package guix. (Fri, 05 Mar 2021 11:55:01 GMT)
Message #29 received at 46807 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Hello all,
On Mon, Mar 01, 2021 at 11:06:59AM +0100, Ludovic Courtès wrote:
> Florian, could it be that we’re not normalizing language tags
> appropriately? Does that ring a bell?
The attached patch to maintenance.git fixes the remaining minor issue:
Now Accept-Language language codes get normalized, zh to zh-CN, so web
browsers requesting any kind of Chinese get the website in mainland
Chinese. (This is a minor issue. The only valid URL is /zh-CN/ since
my last patch to guix-artwork because I don’t know how to
rewrite/redirect URLs in nginx.)
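The attached patch itself is not reproduced in this log. As a rough illustration of the idea only, here is a Python sketch of a normalization table in the spirit of a map lookup. The names `NORMALIZED` and `url_prefix` are hypothetical and the table contents are an assumption based on the description above (zh becomes zh-CN); this is not the nginx configuration that was pushed.

```python
# Hypothetical table: selected language code -> directory name on the site.
NORMALIZED = {
    "zh": "zh-CN",
}


def url_prefix(selected):
    """Return the URL prefix for a selected language code, normalizing
    codes listed in NORMALIZED and passing others through unchanged."""
    return "/" + NORMALIZED.get(selected, selected) + "/"


print(url_prefix("zh"))     # /zh-CN/
print(url_prefix("zh-CN"))  # /zh-CN/
print(url_prefix("fr"))     # /fr/
```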
The patch was tested on a berlin VM.
There is no copyright header in maintenance.git’s
hydra/nginx/berlin.scm so I did not add a copyright. I hereby license
the patch CC0
<https://creativecommons.org/publicdomain/zero/1.0/legalcode>.
Shall I just push? A reconfigure of berlin will be necessary but is
not urgent.
Regards,
Florian
[0001-nginx-berlin-Normalize-Accept-Language-language-code.patch (text/plain, attachment)]
Information forwarded to bug-guix <at> gnu.org: bug#46807; Package guix. (Fri, 05 Mar 2021 16:11:03 GMT)
Message #32 received at 46807 <at> debbugs.gnu.org (full text, mbox):
Thank you for your help! Everything goes fine now.
Information forwarded to bug-guix <at> gnu.org: bug#46807; Package guix. (Mon, 08 Mar 2021 13:28:01 GMT)
Message #35 received at 46807 <at> debbugs.gnu.org (full text, mbox):
Hi,
"pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de> writes:
> The attached patch to maintenance.git fixes the remaining minor issue:
> Now Accept-Language language codes get normalized, zh to zh-CN, so web
> browsers requesting any kind of Chinese get the website in mainland
> Chinese. (This is a minor issue. The only valid URL is /zh-CN/ since
> my last patch to guix-artwork because I don’t know how to
> rewrite/redirect URLs in nginx.)
>
> The patch was tested on a berlin VM.
Yay!
> There is no copyright header in maintenance.git’s
> hydra/nginx/berlin.scm so I did not add a copyright. I hereby license
> the patch CC0
> <https://creativecommons.org/publicdomain/zero/1.0/legalcode>.
Good point; I guess it was meant to be GPLv3+ like the rest, but thanks
for clarifying.
> Shall I just push? A reconfigure of berlin will be necessary but is
> not urgent.
Yes, sounds good!
We’ll reconfigure sooner or later, just ping if you don’t see it happen
within two weeks or so.
Thanks,
Ludo’.
Reply sent to "pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de>: You have taken responsibility. (Thu, 11 Mar 2021 00:03:01 GMT)
Notification sent to ylc991 <ylc991 <at> 163.com>: bug acknowledged by developer. (Thu, 11 Mar 2021 00:03:02 GMT)
Message #40 received at 46807-done <at> debbugs.gnu.org (full text, mbox):
Pushed to maintenance.git as 82b075685b6089c7f98acb0993c003936d833776.
Closing. Thank you all!
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 08 Apr 2021 11:24:04 GMT)
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.