Re: [Bug-wget] [Bug-Wget] Issues with Metalink support


From: Darshit Shah
Subject: Re: [Bug-wget] [Bug-Wget] Issues with Metalink support
Date: Mon, 7 Apr 2014 16:42:17 +0200

On Mon, Apr 7, 2014 at 4:21 PM, L Walsh <address@hidden> wrote:
>
>
> Darshit Shah wrote:
>>
>> Wget could, in theory, use fallocate() for Linux, posix_fallocate() for
>> other POSIX-compliant systems, and SetFileInformationByHandle (is this
>> available on older versions of Windows?) for Windows systems. That isn't
>> going too far out of our way, and it ensures Wget plays well on each
>> system. However, it is going to lead to way too many code paths and ifdef
>> statements, so personally speaking, I'd rather we use only
>> posix_fallocate() everywhere and the Windows syscalls on Windows.
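A rough sketch of how such a wrapper might look (untested; the function
name and the HAVE_* config macros are placeholders, not actual Wget
code -- on Windows one would instead call SetFileInformationByHandle
with FileAllocationInfo, which needs Vista or later):

    /* Hypothetical best-effort preallocation wrapper.  */
    #define _GNU_SOURCE
    #include <sys/types.h>
    #include <fcntl.h>

    static int
    preallocate (int fd, off_t len)
    {
    #if defined __linux__ && defined HAVE_FALLOCATE
      /* Linux: allocate blocks cheaply, without writing zeros.  */
      if (fallocate (fd, 0, 0, len) == 0)
        return 0;
      /* Fall through if the file system doesn't support it.  */
    #endif
    #ifdef HAVE_POSIX_FALLOCATE
      /* Portable POSIX call; returns an error number, not -1/errno.  */
      return posix_fallocate (fd, 0, len);
    #else
      return 0;  /* No preallocation available; treat as best-effort.  */
    #endif
    }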
>
> ----
>         Hey, that'd be fine with me -- OR, if the length is not known,
> allocating 1 meg chunks at a time and truncating at the final
> write.  If performance were an issue, I'd fork off the truncation
> in the background -- I do something similar in a file util that can
> delete duplicates: the deletions are done with async I/O in the
> background so they won't slow down the primary function.
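Something like this (a hypothetical sketch, error handling trimmed)
would implement the "grow in 1 meg chunks, truncate at the end" idea:

    #include <sys/types.h>
    #include <fcntl.h>
    #include <unistd.h>

    #define CHUNK (1024 * 1024)   /* grow in 1 MiB steps */

    /* Ensure at least NEEDED bytes are allocated, rounding up to a
       CHUNK boundary; *ALLOCATED tracks how far we have grown.  */
    static int
    grow_in_chunks (int fd, off_t needed, off_t *allocated)
    {
      if (needed <= *allocated)
        return 0;
      off_t target = ((needed + CHUNK - 1) / CHUNK) * CHUNK;
      if (posix_fallocate (fd, *allocated, target - *allocated) != 0)
        return -1;
      *allocated = target;
      return 0;
    }

    /* After the final write, cut the file back to its real size.  */
    static int
    finish_file (int fd, off_t final_len)
    {
      return ftruncate (fd, final_len);
    }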
>
>         I don't usually have a problem with fragmentation on Linux,
> as I run xfs, which will do some pre-allocation for you (more in recent
> kernels with its "speculative preallocation"), AND for those who
> have degenerate use cases or who are anal-retentive (*cough*) there
> is a file-system reorganizer that can be run when needed or on a nightly
> cron job...  So this isn't really a problem for me -- I was answering
> the question because MS took preventive measures to try to slow
> down disk fragmentation, as NTFS (and FAT, for that matter)
> will suffer when it gets bad, like many file systems do.  Most don't
> protect themselves to the extremes that xfs does to prevent it.
>
>         But a sane middle ground like using the POSIX pre-alloc calls
> seems reasonable -- or preallocating larger spaces when downloading
> large files....
>
>         I.e., you probably don't want to allocate a meg for each little
> 1k file on a mirror, but if you see that the file size is large (size
> known), or you have downloaded a meg or more, then preallocation with
> a truncate starts to make some sense...
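In code, that heuristic might look something like this (the threshold
and names are invented purely for illustration):

    #define PREALLOC_THRESHOLD (1024 * 1024)

    /* Preallocate only when it's worth it: the file is known to be
       big, or (size unknown, passed as -1) we have already received
       a meg and counting.  */
    static int
    should_preallocate (long long known_size, long long received)
    {
      if (known_size >= PREALLOC_THRESHOLD)
        return 1;
      if (known_size < 0 && received >= PREALLOC_THRESHOLD)
        return 1;
      return 0;   /* tiny mirror files: don't bother */
    }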
>
I *think* we might be straying from the original issue. Wget as it is
right now on origin/master seems to work perfectly. We could probably
improve or optimize it, but that calls for a separate discussion.
The issue at hand is how parallel-wget works. Now, one thing we must
remember in this use case is that we *always* know the file size. If we
don't, Wget should automatically fall back to a non-parallel download
of the file.

Armed with the knowledge of the file size, I believe the right way is
to allocate the complete block with a .tmp/.swp or similar extension
and then rename(2) the completed download into place. This is
important since, when downloading a single file in multiple parts, you
want to be able to write to different locations of the file at random.
Continuing such downloads after an interruption would be a problem.
The guys from metalink, curl, etc. have a better idea about such
download scenarios than we do and could probably suggest some easier
alternatives.
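A bare-bones sketch of that scheme (function names hypothetical, error
handling omitted): preallocate "<name>.tmp" at the full known size,
let each part writer use pwrite() at its own offset, and rename() into
place only once every part has finished:

    #include <sys/types.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Create "<name>.tmp" preallocated to the known total size.  */
    static int
    open_tmp_preallocated (const char *name, off_t total,
                           char *tmp, size_t tmp_len)
    {
      snprintf (tmp, tmp_len, "%s.tmp", name);
      int fd = open (tmp, O_CREAT | O_TRUNC | O_WRONLY, 0644);
      if (fd < 0 || posix_fallocate (fd, 0, total) != 0)
        return -1;
      return fd;
    }

    /* Each parallel segment writes at its own offset; pwrite()
       never touches the shared file offset, so writers to disjoint
       regions don't race.  */
    static ssize_t
    write_segment (int fd, const void *buf, size_t len, off_t off)
    {
      return pwrite (fd, buf, len, off);
    }

    /* When all segments are done, atomically move it into place.  */
    static int
    commit_download (const char *tmp, const char *name)
    {
      return rename (tmp, name);
    }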

>         I was just speaking up to answer the question you posed about
> why someone might copy to one place and then another... it wasn't meant
> to create a problem so much as to give some insight into why it might
> be done.
>
I never tried to insinuate that you were. :)
All help and advice is always welcome here as we try to learn about
and understand new things.



-- 
Thanking You,
Darshit Shah


