bug-wget

[Bug-wget] trouble with URL vs local file names


From: Tobias Senz
Subject: [Bug-wget] trouble with URL vs local file names
Date: Thu, 18 Feb 2010 10:54:11 +0100
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.7) Gecko/20100111 Thunderbird/3.0.1

Heja :)

When using recursive or page-requisite downloading, local folders are
created. Is there any way to switch that off without losing the URL?
For archival it would make a lot of sense to me if local file names
were "flat" but included as much of the URL as possible, with all files
in ONE folder.

Like so, made up example:

wget -p -k www.google.de

creates locally the folders / files
./www.google.de
./www.google.de/index.html
./www.google.de/logos
./www.google.de/logos/olympics10-skeleton-hp.png

I'd prefer if that were ONLY files
./www.google.de@index.html
./www.google.de@logos@olympics10-skeleton-hp.png

Or a URL like this
http://www.google.de/csi?v=3&s=webhp&action=&e=17259,17311,22713,23386,23756,23806&ei=JwZ9S-StKYaC_Aatp-z3BA&expi=17259,17311,22713,23386,23756,23806&imc=1&imn=1&imp=1&rt=prt.41,xjsls.93,xjses.163,xjsee.206,xjs.229,ol.468,iml.241

locally as file name
www.google.de@csi@v@3@s@webhp@action@@e@17259,17311,22713,23386,23756
[etc ...]
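
For illustration, the naming scheme described above could be
approximated outside wget with a small script. This is only a sketch:
the function name flat_name and the exact set of replaced characters
are assumptions made for the example, not anything wget provides. The
mapping is lossy, since "/", "?", "&" and "=" all become "@", so the
original URL cannot be rebuilt exactly from the file name.

```python
import re
from urllib.parse import urlsplit

# Characters replaced by the placeholder. This set is an assumption based on
# the characters named in this post: path separators, "?", "&", "=", "%", "+".
UNSAFE = re.compile(r"[/\\?&=%+]")


def flat_name(url, placeholder="@"):
    """Map a URL to a single flat file name (hypothetical helper, not part of wget)."""
    # Accept bare host names such as "www.google.de" by assuming http.
    parts = urlsplit(url if "://" in url else "http://" + url)
    path = parts.path or "/"
    # Mimic wget's default of naming directory URLs index.html.
    if path.endswith("/"):
        path += "index.html"
    rest = path
    if parts.query:
        rest += "?" + parts.query
    # The protocol is dropped; host, path and query are kept.
    return UNSAFE.sub(placeholder, parts.netloc + rest)


if __name__ == "__main__":
    print(flat_name("www.google.de"))
    print(flat_name("http://www.google.de/logos/olympics10-skeleton-hp.png"))
```

A port in the host part (":8080") would be passed through unchanged,
which is not a valid file name character on Windows; adding ":" to the
replaced set would be one way to handle that.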

How would that be possible?
On more complicated pages, or when fetching more than one page into one
folder, everything is otherwise spread across many (sub-)folders, which
makes viewing later on more difficult than need be.

I'm aware of "--no-directories", but that does not retain the
information (or an approximation thereof) about what file name
something had on the server, or even which server it came from. (I
could live without the protocol in the name. With host spanning, not so
much without the server.)
When using "--no-directories" with -N, files just get overwritten;
without -N there would be far too many copies of the same URL across
several calls to wget. It also makes further processing impossible, as
the original URL is pretty much lost.


And I'm also having trouble with the way files are named locally, the
"--restrict-file-names=" thingie.
Is there any way to also block "%", "&", "=" (and possibly others I
can't think of right now, "+" maybe?) locally, as these seem to prevent
further processing in batch scripts? As mentioned above, I'm more of a
fan of "@" as a placeholder. It is rarely (never?) used in HTTP URLs
and does not seem to cause any trouble when scripting.

I'm on Windows with Cygwin, and mixing of both batch (cmd.exe) and shell
(sh, tcsh ...) scripting as well as (Win)DOS and Cygwin utilities might
happen. In other words, "these are unsafe to me" when a file name is
passed to anything via the command line. (I really haven't found any way
to escape these in some situations. Different types of quotes a-plenty,
backslashes too, nothing helps.)

http://wget.addictivecode.org/FeatureSpecifications
mentions "ContentFilters". Where are these, or a description of them,
to be found? Future?
Is the "translate URIs to local filenames" mentioned there the same one
I'm having trouble with?
Preferably the "flat path-/filenames" thing could be built in to wget
as "--flat" :)
(Maybe I'm imagining it a lot simpler than it is, but if there already
is a central point where escaping for local file names happens, could
the slashes and backslashes just be removed before creating folders?)
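
Until something like that exists, one possible workaround is to flatten
the folder tree after wget has finished. The following is a minimal
sketch under stated assumptions: flatten_tree is a made-up helper, "@"
is the placeholder suggested above, and the destination folder must lie
outside the tree being flattened. Note that links rewritten by -k would
still point at the old folder structure and would need separate fixing.

```python
import os
import shutil


def flatten_tree(root, dest, placeholder="@"):
    """Move every file below root into dest, encoding its path in the name.

    Hypothetical post-processing helper, not a wget feature.
    dest must not be located inside root.
    """
    os.makedirs(dest, exist_ok=True)
    moved = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            # Path relative to root, e.g. www.google.de/logos/x.png
            rel = os.path.relpath(src, root)
            flat = rel.replace(os.sep, placeholder)
            target = os.path.join(dest, flat)
            # Refuse to overwrite silently if two paths collapse to one name.
            if os.path.exists(target):
                raise FileExistsError(target)
            shutil.move(src, target)
            moved.append(flat)
    return sorted(moved)
```

Called as flatten_tree("download", "flat") after a run of wget -p inside
a folder named download, this would leave files such as
www.google.de@index.html in the folder flat.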

Thanks!



