[Bug-wget] wget mirror site failing due to file / directory name clashes
From: Paul Beckett (ITCS)
Subject: [Bug-wget] wget mirror site failing due to file / directory name clashes
Date: Fri, 12 Oct 2012 13:38:57 +0000
I am attempting to use wget to create a mirrored copy of a CMS (Liferay)
website. I want to be able to fail over to this static copy in case the
application server goes offline, so I need the URLs to remain
absolutely identical. The problem is that I cannot figure out how to
configure wget so that it copes with both of these URLs:
http://www.example.com/about
http://www.example.com/about/something
In this case either the file or the directory 'about' already exists and prevents
the second from being created.
Initially I thought the most obvious solution was to rely on Apache's
DirectoryIndex and save the files as:
/about/index.html
/about/something/index.html
However, I can't figure out how to do this in a way that neither breaks
the relative paths to other pages nor creates links to index.html
rather than to the original location. I need the links (a href etc.) to still go
to /about and not explicitly name /index.html, because otherwise people may
bookmark URLs that won't exist once the CMS comes back.
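To make the intended layout concrete, here is a minimal sketch of the URL-path-to-file mapping described above. The function name url_to_index_path is hypothetical (it is not a wget option or feature); it only illustrates where each page would have to be stored so that Apache's DirectoryIndex can serve /about and /about/something without a file/directory clash.

```shell
#!/bin/sh
# Hypothetical helper (not part of wget): map a URL path to the file that
# would hold the page if every page were saved as <path>/index.html.
url_to_index_path() {
    path=${1%/}                      # drop a single trailing slash, if present
    printf '%s/index.html\n' "$path"
}

url_to_index_path /about             # prints /about/index.html
url_to_index_path /about/something   # prints /about/something/index.html
```

With this layout 'about' is only ever a directory, so both pages can coexist on disk. It does not by itself address the link-rewriting issue: the links inside the saved pages would still have to point at /about rather than /about/index.html.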
If anyone can offer advice on how I can achieve this, either with the correct
options or by patching the source code, I would be
extremely grateful.
Thanks,
Paul
/usr/local/bin/wget --background --append-output=/tmp/wget-log --no-verbose \
  --tries=20 --waitretry=10 --retry-connrefused --limit-rate=100m --quota=10000m \
  --timestamping \
  --directory-prefix=/usr/local/apache2/content/uk.ac.uea.www_flat2 \
  --protocol-directories --user-agent="UEA WebSite Flattener" --backup-converted \
  -e robots=off --page-requisites --convert-links --recursive --level=inf \
  --trust-server-names --domains example.com www.example.com