[Bug-wget] language of the page depends on --compression={auto, none}?
From: balducci
Subject: [Bug-wget] language of the page depends on --compression={auto, none}?
Date: Sun, 13 Oct 2019 13:42:13 +0200
Hello,
I'm experiencing a problem with wget's --compression option.
I usually run wget with --compression=auto because some servers apparently
serve gzip-compressed pages unconditionally (e.g.
http://llvm.org/releases/download.html).
For the past few days (since 2019-10-11), for some reason, wget with
--compression=auto downloads the SPANISH version of this page:
https://pypi.org/project/nose/, whereas with --compression=none it
downloads the (usual) ENGLISH version of the same page.
I never had this "Spanish" problem before (the wget command has been run
in a daily procedure for many years); this makes me think that the
problem might be related to a server upgrade on that site (I have no
record of the previous server version; at present it reports
Server: nginx/1.13.9).
I haven't noticed any similar behavior with the many other sites I
monitor daily using wget (also with --compression=auto).
In any case, I wouldn't expect the *language* of a downloaded page to
depend on the --compression command-line argument (!)
I don't know whether this is a problem with wget or with the HTTP server...
I enclose a short script that others can use to quickly check whether
the problem is reproducible.
(I can add that with curl I have no problem.)
thanks a lot
ciao
gabriele
----8<----
#!/bin/sh
WGET=wget
root_dir=/tmp/wget
rm -rf ${root_dir}
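# Brace expansion is not available in POSIX sh, so list the directories.
mkdir -p ${root_dir}/llvm/auto ${root_dir}/llvm/none \
         ${root_dir}/pypi/auto ${root_dir}/pypi/none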
echo
echo "https://pypi.org/project/nose/"
echo "=============================="
cd ${root_dir}/pypi/auto
rm -f ./auto.html && \
${WGET} -d --compression=auto -O ./auto.html \
https://pypi.org/project/nose/ >./auto.log 2>&1 && \
grep -qi fecha ./auto.html && \
echo "--compression=auto: spanish"
cd ${root_dir}/pypi/none
rm -f ./none.html && \
${WGET} -d --compression=none -O ./none.html \
https://pypi.org/project/nose/ >./none.log 2>&1 && \
grep -qi 'Upload date' ./none.html && \
echo "--compression=none: english"
echo
echo "http://llvm.org/releases/download.html"
echo "======================================"
cd ${root_dir}/llvm/auto
rm -f ./auto.html && \
${WGET} -d --compression=auto -O ./auto.html \
http://llvm.org/releases/download.html >./auto.log 2>&1 && \
echo "--compression=auto: $(file ./auto.html | sed -e 's|^.*: ||' -e 's|,.*||')"
echo " $(grep 'Accept-Encoding' ./auto.log | head -1)"
echo " $(grep 'Content-Encoding' ./auto.log)"
cd ${root_dir}/llvm/none
rm -f ./none.html && \
${WGET} -d --compression=none -O ./none.html \
http://llvm.org/releases/download.html >./none.log 2>&1 && \
echo "--compression=none: $(file ./none.html | sed -e 's|^.*: ||' -e 's|,.*||')"
echo " $(grep 'Accept-Encoding' ./none.log | head -1)"
echo " $(grep 'Content-Encoding' ./none.log)"
exit
---->8----
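As a possible follow-up check, the debug logs written by the script above could be compared for the headers involved in content negotiation. This is only a sketch: the function name negotiation_headers is hypothetical, the list of header names is my own guess at what may be relevant, and it assumes the logs are still under /tmp/wget. It does not establish which side is at fault; it only shows what each run sent and received.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original script): print the
# headers related to content negotiation found in a "wget -d" log.
# The header list is an assumption about what might matter here.
negotiation_headers() {
    grep -iE '^[[:space:]]*(Accept-Encoding|Accept-Language|Content-Encoding|Content-Language|Vary):' "$1" \
        | sed -e 's/^[[:space:]]*//' \
        | sort -u
}

# Same location as in the script above; override by setting root_dir.
root_dir=${root_dir:-/tmp/wget}

for mode in auto none; do
    log="${root_dir}/pypi/${mode}/${mode}.log"
    # Skip silently if the log for this mode has not been produced.
    if [ -f "$log" ]; then
        echo "--compression=${mode}:"
        negotiation_headers "$log" | sed -e 's/^/  /'
    fi
done
```

If the two runs differ only in Accept-Encoding on the request side but differ in Content-Language on the response side, that might suggest the server (or a cache in front of it) is varying the page on more than it should.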