GNU bug report logs - #46807
[website] return 404 with HTTP header 'Accept-Language: zh-CN,zh'

Package: guix;

Reported by: ylc991 <ylc991 <at> 163.com>

Date: Sat, 27 Feb 2021 02:45:02 UTC

Severity: normal

Done: "pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 46807 in the body.
You can then email your comments to 46807 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-guix <at> gnu.org:
bug#46807; Package guix. (Sat, 27 Feb 2021 02:45:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to ylc991 <ylc991 <at> 163.com>:
New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Sat, 27 Feb 2021 02:45:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: ylc991 <ylc991 <at> 163.com>
To: bug-guix <at> gnu.org
Subject: [website] return 404 with HTTP header 'Accept-Language: zh-CN,zh'
Date: Sat, 27 Feb 2021 10:18:12 +0800
[Message part 1 (text/html, inline)]

Information forwarded to bug-guix <at> gnu.org:
bug#46807; Package guix. (Sat, 27 Feb 2021 12:32:01 GMT) Full text and rfc822 format available.

Message #8 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Tobias Geerinckx-Rice <me <at> tobias.gr>
To: ylc991 <ylc991 <at> 163.com>
Cc: bug-guix <at> gnu.org, 46807 <at> debbugs.gnu.org
Subject: Re: bug#46807: [website] return 404 with HTTP header
 'Accept-Language: zh-CN, zh'
Date: Sat, 27 Feb 2021 13:31:40 +0100
[Message part 1 (text/plain, inline)]
Ylc991,

Thanks for the report!

My verbose notes so far; I need to (finally!) set up a local build 
of the Web site first.

ylc991 wrote:
> Hello! My webbrowser has set ‘Accept-Language’ to 'zh-CN,zh' by 
> default, and https://guix.gnu.org returns 404.

Indeed, handling of zh-CN specifically is broken.  :-(

--8<---------------cut here---------------start------------->8---
~ λ curl -LI -H 'Accept-Language: zh-cn' https://guix.gnu.org
HTTP/1.1 404 Not Found
[...]
--8<---------------cut here---------------end--------------->8---

This is because our nginx configuration 
(maintenance/hydra/nginx/berlin.scm) does:

--8<---------------cut here---------------start------------->8---
set_from_accept_language $lang en de es fr zh-CN;
[...]
try_files $uri /$lang/$uri /$lang/$uri/index.html =404;
--8<---------------cut here---------------end--------------->8---

i.e., it looks in /srv/guix.gnu.org/zh-CN, but our website uses...

--8<---------------cut here---------------start------------->8---
nckx <at> berlin ~$ ls -d /srv/guix.gnu.org/zh*
/srv/guix.gnu.org/zh-cn/
--8<---------------cut here---------------end--------------->8---

...lowercase.  This questionable choice comes from 
artwork/po/ietf-tags.scm:

--8<---------------cut here---------------start------------->8---
;;; This file contains an association list for each translation from
;;; the locale to an IETF language tag to be used in the URL path of
;;; translated pages.  The language tag results from the translation
;;; team’s language code from
;;; <https://translationproject.org/team/index.html>.  The underscore
;;; in the team’s code is replaced by a hyphen.  For example, az would
;;; be used for the Azerbaijani language (not az-Latn) and zh-CN would
;;; be used for mainland Chinese (not zh-Hans-CN)
([...]
("zh_CN" . "zh-cn"))
--8<---------------cut here---------------end--------------->8---

Questionable only because, while a lowercase region is technically 
valid, it's so rare that it's likely to cause problems -- as we 
found out.

> I have tested with curl, 'zh-CN,zh', 'zh-CN', [is 404]

These are valid, so the nginx accept-language module accepts them, 
but then looks for a subdirectory that doesn't exist and returns 
404.

> 'zh-cn' is 404

This is valid, but since we configure the accept-language module 
to use ‘zh-CN’ it normalises $lang to the latter.  Which is good, 
but it causes the same 404 as above.

> 'zh_CN' is 200.

This is bogus (‘_’ is not valid), hence ignored, and so the site 
falls back to English and returns 200.

> 'zh' [is 200]

Valid but the accept-language module is not clever; we need to add 
an explicit 'zh' entry for that to work:

--8<---------------cut here---------------start------------->8---
set_from_accept_language $lang en de es fr zh-CN zh en;
--8<---------------cut here---------------end--------------->8---

I expect that adding it and changing ietf-tags.scm to use "zh-CN" 
will fix both 404s, but need to check that it doesn't break 
anything else.

The other untested solution is using lowercase

--8<---------------cut here---------------start------------->8---
set_from_accept_language $lang en de es fr zh-cn zh en;
--8<---------------cut here---------------end--------------->8---

but, assuming that even works, I'm not fond of making the 
unconventional the norm.

Kind regards,

T G-R
[signature.asc (application/pgp-signature, inline)]
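
The behaviour analysed in the message above can be modelled roughly as follows. This is a minimal Python sketch, not the nginx accept-language module's actual code: the names `SUPPORTED`, `SITE_DIRS`, `pick_lang` and `status` are hypothetical, and the matching rules (case-insensitive comparison, normalisation to the configured spelling, fallback to the first configured language) are assumptions taken from the description in the message.

```python
# Hypothetical model of the lookup described above; not the real nginx module.
SUPPORTED = ["en", "de", "es", "fr", "zh-CN"]  # from set_from_accept_language
SITE_DIRS = {"en", "de", "es", "fr", "zh-cn"}  # directories on disk (lowercase region)


def pick_lang(accept_language, supported=SUPPORTED):
    """Return the configured spelling of the first supported tag requested.

    Matching is case-insensitive and the first configured language is the
    fallback, as the message describes. Quality values (;q=) are ignored.
    """
    for item in accept_language.split(","):
        tag = item.split(";")[0].strip().lower()
        for candidate in supported:
            if candidate.lower() == tag:
                return candidate
    return supported[0]


def status(accept_language, supported=SUPPORTED, dirs=SITE_DIRS):
    """Model try_files: 200 if /$lang/ exists on a case-sensitive file system."""
    return 200 if pick_lang(accept_language, supported) in dirs else 404


for header in ["zh-CN,zh", "zh-CN", "zh-cn", "zh_CN", "zh"]:
    print(header, status(header))
```

Under these assumptions the model reproduces the reported results: 'zh-CN,zh', 'zh-CN' and 'zh-cn' give 404, while 'zh_CN' and 'zh' fall back to English and give 200. Passing a directory set that contains "zh-CN" instead of "zh-cn" turns the 404s into 200s, which is the effect expected from changing ietf-tags.scm.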

Information forwarded to bug-guix <at> gnu.org:
bug#46807; Package guix. (Sat, 27 Feb 2021 12:32:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-guix <at> gnu.org:
bug#46807; Package guix. (Sat, 27 Feb 2021 12:36:02 GMT) Full text and rfc822 format available.

Message #14 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Julien Lepiller <julien <at> lepiller.eu>
To: bug-guix <at> gnu.org,ylc991 <ylc991 <at> 163.com>,46807 <at> debbugs.gnu.org
Subject: Re: bug#46807: [website] return 404 with HTTP header
 'Accept-Language: zh-CN, zh'
Date: Sat, 27 Feb 2021 07:34:45 -0500
[Message part 1 (text/plain, inline)]
It might be related to translations. When you use zh-cn, we have a translation for that language, so you're redirected to it. Not sure why you get a 404 though.

On 26 February 2021 21:18:12 GMT-05:00, ylc991 <ylc991 <at> 163.com> wrote:
>Hello! My webbrowser has set ‘Accept-Language’ to 'zh-CN,zh' by
>default, and https://guix.gnu.org returns 404. I have tested with curl,
>'zh-CN,zh', 'zh-CN', 'zh-cn' is 404 while 'zh', 'zh_CN' is 200.
>
>
>I first noticed it on 2021-02-23. It did not happen one or two
>months ago. I think there may be something wrong with the web
>server.
[Message part 2 (text/html, inline)]

Information forwarded to bug-guix <at> gnu.org:
bug#46807; Package guix. (Mon, 01 Mar 2021 10:08:02 GMT) Full text and rfc822 format available.

Message #20 received at 46807 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: ylc991 <ylc991 <at> 163.com>
Cc: Julien Lepiller <julien <at> lepiller.eu>,
 Florian Pelz <pelzflorian <at> pelzflorian.de>, 46807 <at> debbugs.gnu.org
Subject: Re: bug#46807: [website] return 404 with HTTP header
 'Accept-Language: zh-CN, zh'
Date: Mon, 01 Mar 2021 11:06:59 +0100
Hello,

ylc991 <ylc991 <at> 163.com> wrote:

> Hello! My webbrowser has set ‘Accept-Language’ to 'zh-CN,zh' by default, and https://guix.gnu.org returns 404. I have tested with curl, 'zh-CN,zh', 'zh-CN',
> 'zh-cn' is 404 while 'zh', 'zh_CN' is 200.

Florian, could it be that we’re not normalizing language tags
appropriately?  Does that ring a bell?

Thanks for your report!

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#46807; Package guix. (Mon, 01 Mar 2021 10:50:01 GMT) Full text and rfc822 format available.

Message #23 received at 46807 <at> debbugs.gnu.org (full text, mbox):

From: "pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: ylc991 <ylc991 <at> 163.com>, Julien Lepiller <julien <at> lepiller.eu>,
 46807 <at> debbugs.gnu.org
Subject: Re: bug#46807: [website] return 404 with HTTP header
 'Accept-Language: zh-CN, zh'
Date: Mon, 1 Mar 2021 11:49:30 +0100
Hello,

On Mon, Mar 01, 2021 at 11:06:59AM +0100, Ludovic Courtès wrote:
> Florian, could it be that we’re not normalizing language tags
> appropriately?  Does that ring a bell?

Tobias’ analysis is likely correct.  I haven’t yet built a current
berlin virtual machine to test, though.

We’re not normalizing language tags at all currently.  Doing URL
redirects in nginx confuses me greatly; I have no idea how to
concisely specify redirects *and* have them execute in the right
order.  The many lines

(redirect "/blog/2006/purely-functional-software-deployment-model" "/$lang/blog/2006/purely-functional-software-deployment-model/")

and similar in maintenance.git’s hydra/nginx/berlin.scm file are a bad
solution and are testament to my confusion.  I would not like one line
for each package.

Regards,
Florian




Information forwarded to bug-guix <at> gnu.org:
bug#46807; Package guix. (Thu, 04 Mar 2021 11:04:01 GMT) Full text and rfc822 format available.

Message #26 received at 46807 <at> debbugs.gnu.org (full text, mbox):

From: "pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de>
To: Tobias Geerinckx-Rice <me <at> tobias.gr>
Cc: ylc991 <ylc991 <at> 163.com>, 46807 <at> debbugs.gnu.org
Subject: Re: bug#46807: [website] return 404 with HTTP header
 'Accept-Language: zh-CN, zh'
Date: Thu, 4 Mar 2021 12:03:00 +0100
On Sat, Feb 27, 2021 at 01:31:40PM +0100, Tobias Geerinckx-Rice via Bug reports for GNU Guix wrote:
> I expect that adding it and changing ietf-tags.scm to use "zh-CN" will fix
> both 404s, but need to check that it doesn't break anything else.

I made the tiny change to guix-artwork’s ietf-tags.scm as
04c96a370b8cae48ed162e4414b8950cc65c513b now (sorry for taking so
long):

diff --git a/website/po/ietf-tags.scm b/website/po/ietf-tags.scm
index 32b81ef..5bd22f4 100644
--- a/website/po/ietf-tags.scm
+++ b/website/po/ietf-tags.scm
@@ -10,4 +10,4 @@
  ("de_DE" . "de")
  ("es_ES" . "es")
  ("fr_FR" . "fr")
- ("zh_CN" . "zh-cn"))
+ ("zh_CN" . "zh-CN"))

Note that the prior zh-cn URLs will be broken.

I will play around with nginx’s map directive later to make zh-cn and
zh Accept-Language settings direct to the proper URL; afterwards I
will close this bug.  zh-cn URLs remain invalid.  Links to the manual
continue to use zh-cn.

For testing I dug out the VM code
<https://lists.gnu.org/archive/html/bug-guix/2020-04/msg00195.html>
where I had removed parts of berlin that are not relevant to the
website.  The change breaks neither website nor manual.

Thanks ylc991 for the report!

Regards,
Florian
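
The effect of the one-line change to ietf-tags.scm can be illustrated with a small Python sketch. The dictionary mirrors the association list visible in the diff after the change; the helper names `conventional_case` and `url_path`, and the page name used in the example, are hypothetical. The casing rule (lowercase language, title-case four-letter script, uppercase two-letter region) is the convention recommended for IETF language tags, simplified here to ignore extension and private-use subtags.

```python
# Mirrors the association list in the diff above, after the change.
IETF_TAGS = {
    "de_DE": "de",
    "es_ES": "es",
    "fr_FR": "fr",
    "zh_CN": "zh-CN",
}


def conventional_case(tag):
    """Return the tag in its conventional casing, e.g. "zh-cn" -> "zh-CN".

    Simplified: two-letter subtags after the first are treated as regions,
    four-letter subtags as scripts, everything else is lowercased.
    """
    parts = tag.split("-")
    result = [parts[0].lower()]
    for part in parts[1:]:
        if len(part) == 2 and part.isalpha():
            result.append(part.upper())
        elif len(part) == 4 and part.isalpha():
            result.append(part.title())
        else:
            result.append(part.lower())
    return "-".join(result)


def url_path(locale, page=""):
    """Return the URL path of a translated page for a locale such as "zh_CN"."""
    return f"/{IETF_TAGS[locale]}/{page}"


print(url_path("zh_CN"))           # /zh-CN/
print(conventional_case("zh-cn"))  # zh-CN
```

Because paths are compared case-sensitively, `/zh-CN/` and the earlier `/zh-cn/` are different URLs, which is why links to the old spelling break after this change.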




Information forwarded to bug-guix <at> gnu.org:
bug#46807; Package guix. (Fri, 05 Mar 2021 11:55:01 GMT) Full text and rfc822 format available.

Message #29 received at 46807 <at> debbugs.gnu.org (full text, mbox):

From: "pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de>
To: 46807 <at> debbugs.gnu.org
Cc: Ludovic Courtès <ludo <at> gnu.org>,
 Tobias Geerinckx-Rice <me <at> tobias.gr>, Julien Lepiller <julien <at> lepiller.eu>
Subject: Re: bug#46807: [website] return 404 with HTTP header
 'Accept-Language: zh-CN, zh'
Date: Fri, 5 Mar 2021 12:54:42 +0100
[Message part 1 (text/plain, inline)]
Hello all,

On Mon, Mar 01, 2021 at 11:06:59AM +0100, Ludovic Courtès wrote:
> Florian, could it be that we’re not normalizing language tags
> appropriately?  Does that ring a bell?

The attached patch to maintenance.git fixes the remaining minor issue:
Now Accept-Language language codes get normalized, zh to zh-CN, so web
browsers requesting any kind of Chinese get the website in mainland
Chinese.  (This is a minor issue.  The only valid URL is /zh-CN/ since
my last patch to guix-artwork because I don’t know how to
rewrite/redirect URLs in nginx.)

The patch was tested on a berlin VM.

There is no copyright header in maintenance.git’s
hydra/nginx/berlin.scm so I did not add a copyright.  I hereby license
the patch CC0
<https://creativecommons.org/publicdomain/zero/1.0/legalcode>.

Shall I just push?  A reconfigure of berlin will be necessary but is
not urgent.

Regards,
Florian
[0001-nginx-berlin-Normalize-Accept-Language-language-code.patch (text/plain, attachment)]
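
The attached patch is not reproduced in this log. As a rough illustration only of the normalisation this message describes (zh and zh-cn both ending up at /zh-CN/), here is a minimal Python sketch. The table `NORMALIZED` and the functions `parse_accept_language` and `site_language` are hypothetical; honouring quality values and falling back to the primary language subtag are assumptions of this sketch and may not match what the nginx configuration does.

```python
# Hypothetical normalisation table: lowercased requested tag -> site directory.
NORMALIZED = {
    "en": "en",
    "de": "de",
    "es": "es",
    "fr": "fr",
    "zh": "zh-CN",
    "zh-cn": "zh-CN",
}


def parse_accept_language(header):
    """Return the requested tags, lowercased, ordered by descending q-value."""
    entries = []
    for position, item in enumerate(header.split(",")):
        tag, _, params = item.strip().partition(";")
        tag = tag.strip().lower()
        params = params.strip()
        quality = 1.0
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        if tag:
            # Sort by quality first, then by position in the header.
            entries.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(entries)]


def site_language(header, default="en"):
    """Map an Accept-Language header to the directory name used by the site."""
    for tag in parse_accept_language(header):
        if tag in NORMALIZED:
            return NORMALIZED[tag]
        primary = tag.split("-")[0]  # assumption: any zh-* counts as Chinese
        if primary in NORMALIZED:
            return NORMALIZED[primary]
    return default


for header in ["zh", "zh-cn", "zh-CN,zh", "zh_CN", "fr;q=0.5, zh;q=0.9"]:
    print(header, "->", site_language(header))
```

In this sketch every spelling of a Chinese request resolves to the single directory name "zh-CN", while the invalid 'zh_CN' still falls back to English.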

Information forwarded to bug-guix <at> gnu.org:
bug#46807; Package guix. (Fri, 05 Mar 2021 16:11:03 GMT) Full text and rfc822 format available.

Message #32 received at 46807 <at> debbugs.gnu.org (full text, mbox):

From: YLC <ylc991 <at> 163.com>
To: 46807 <at> debbugs.gnu.org
Cc: julien <at> lepiller.eu, me <at> tobias.gr, ludo <at> gnu.org, pelzflorian <at> pelzflorian.de
Subject: Re: bug#46807: [website] return 404 with HTTP header
 'Accept-Language: zh-CN, zh'
Date: Fri, 5 Mar 2021 18:03:18 +0800 (CST)
Thank you for your help! Everything works fine now.




Information forwarded to bug-guix <at> gnu.org:
bug#46807; Package guix. (Mon, 08 Mar 2021 13:28:01 GMT) Full text and rfc822 format available.

Message #35 received at 46807 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: "pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de>
Cc: Julien Lepiller <julien <at> lepiller.eu>, Tobias Geerinckx-Rice <me <at> tobias.gr>,
 46807 <at> debbugs.gnu.org
Subject: Re: bug#46807: [website] return 404 with HTTP header
 'Accept-Language: zh-CN, zh'
Date: Mon, 08 Mar 2021 14:27:26 +0100
Hi,

"pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de> wrote:

> The attached patch to maintenance.git fixes the remaining minor issue:
> Now Accept-Language language codes get normalized, zh to zh-CN, so web
> browsers requesting any kind of Chinese get the website in mainland
> Chinese.  (This is a minor issue.  The only valid URL is /zh-CN/ since
> my last patch to guix-artwork because I don’t know how to
> rewrite/redirect URLs in nginx.)
>
> The patch was tested on a berlin VM.

Yay!

> There is no copyright header in maintenance.git’s
> hydra/nginx/berlin.scm so I did not add a copyright.  I hereby license
> the patch CC0
> <https://creativecommons.org/publicdomain/zero/1.0/legalcode>.

Good point; I guess it was meant to be GPLv3+ like the rest, but thanks
for clarifying.

> Shall I just push?  A reconfigure of berlin will be necessary but is
> not urgent.

Yes, sounds good!

We’ll reconfigure sooner or later, just ping if you don’t see it happen
within two weeks or so.

Thanks,
Ludo’.




Reply sent to "pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de>:
You have taken responsibility. (Thu, 11 Mar 2021 00:03:01 GMT) Full text and rfc822 format available.

Notification sent to ylc991 <ylc991 <at> 163.com>:
bug acknowledged by developer. (Thu, 11 Mar 2021 00:03:02 GMT) Full text and rfc822 format available.

Message #40 received at 46807-done <at> debbugs.gnu.org (full text, mbox):

From: "pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de>
To: 46807-done <at> debbugs.gnu.org
Subject: Re: bug#46807: [website] return 404 with HTTP header
 'Accept-Language: zh-CN, zh'
Date: Thu, 11 Mar 2021 01:01:50 +0100
Pushed to maintenance.git as 82b075685b6089c7f98acb0993c003936d833776.

Closing.  Thank you all!




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 08 Apr 2021 11:24:04 GMT) Full text and rfc822 format available.

This bug report was last modified 3 years and 12 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.