[Bug-wget] How to ensure data completeness/integrity for the file downlo
From: Anamika Jindal
Subject: [Bug-wget] How to ensure data completeness/integrity for the file downloaded using wget
Date: Tue, 28 Jul 2009 16:49:00 +0530
Hi,
We have an open audit issue regarding files that are pulled from
external interfaces. We download these files with the wget utility. The
wget commands are called from Pro*C batches; for reference, the code is
something like:
<< sprintf (WGET, "%s%s%s/%s.%s", "wget -P ", FEEDFILE_PATH,
            " ftp://username:address@hidden", FileName, "Z"); >>
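As a first automated check that does not depend on parsing wget's messages, the batch could look at the exit status of the command it runs. The following is a minimal sketch, assuming a POSIX system; run_command is a hypothetical helper name, not part of wget or Pro*C. wget returns 0 when it reports no problems; the meaning of individual non-zero values depends on the wget version, so the sketch only distinguishes zero from non-zero.

```c
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical helper: run a shell command and return its exit status
 * (0-255), or -1 if the command could not be started or was terminated
 * by a signal. */
int run_command(const char *command)
{
    int status = system(command);

    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

With the buffer from the snippet above, the call would be run_command(WGET), treating any non-zero result as a failed download. A zero exit status shows only that wget itself saw no error; it does not prove that the file is complete, so it would complement a checksum or size check rather than replace it.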
Now, the audit issue is to ensure the data integrity and data completeness
for the file that has been downloaded using wget.
Option 1 -> The recommended option is of course the checksum approach: we
obtain a checksum (e.g. MD5 or SHA-1) of the file on the remote
server, compute the checksum of the local file (just downloaded using
wget), and compare the two to confirm that the file has been downloaded
successfully and completely. I checked Google and the wget manual. wget
does not provide any option to get the checksum, although its sources
contain files such as gnu_md5.c; I don't know what these are used for.
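The local half of the checksum comparison described above could be sketched as follows. This is a minimal sketch under several assumptions: the md5sum utility from GNU coreutils is installed on the local server, the expected digest is published by the data provider through some separate channel (for example a sidecar file, which is hypothetical here), and the names local_md5_hex and digests_match are illustrative only. wget is not involved in computing either digest.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for popen/pclose */
#endif
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MD5_HEX_LEN 32

/* Hypothetical helper: run `md5sum <path>` and copy the 32-character hex
 * digest (lower case) into out. Returns 0 on success, -1 on failure.
 * The path is wrapped in single quotes for the shell, so paths that
 * contain a single quote are rejected. */
int local_md5_hex(const char *path, char *out, size_t out_size)
{
    char cmd[1024];
    char line[256];
    FILE *pipe;
    int n, ok = -1;
    size_t i;

    if (out_size < MD5_HEX_LEN + 1 || strchr(path, '\'') != NULL)
        return -1;
    n = snprintf(cmd, sizeof cmd, "md5sum '%s' 2>/dev/null", path);
    if (n < 0 || (size_t)n >= sizeof cmd)
        return -1;

    pipe = popen(cmd, "r");
    if (pipe == NULL)
        return -1;
    if (fgets(line, sizeof line, pipe) != NULL && strlen(line) >= MD5_HEX_LEN) {
        ok = 0;
        for (i = 0; i < MD5_HEX_LEN; i++) {
            if (!isxdigit((unsigned char)line[i])) {
                ok = -1;
                break;
            }
            out[i] = (char)tolower((unsigned char)line[i]);
        }
        out[MD5_HEX_LEN] = '\0';
    }
    if (pclose(pipe) != 0)   /* md5sum missing or failed */
        ok = -1;
    return ok;
}

/* Case-insensitive comparison of two hex digests. Returns 1 when equal. */
int digests_match(const char *expected, const char *actual)
{
    size_t i;

    if (strlen(expected) != strlen(actual))
        return 0;
    for (i = 0; expected[i] != '\0'; i++)
        if (tolower((unsigned char)expected[i]) !=
            tolower((unsigned char)actual[i]))
            return 0;
    return 1;
}
```

The batch would call local_md5_hex on the downloaded file and pass the result, together with the digest supplied by the remote side, to digests_match, raising an error when it returns 0. The same structure would work with sha1sum and a 40-character digest.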
Option 2 -> Check the file size on the remote FTP server. After
retrieving the file (using wget), our application can compare the remote
size with the size of the retrieved file. If the sizes do not match, an
error will be raised. wget does not provide any direct option for
getting the file size, but it gives that information in its output
messages, e.g.
*************************************************************************
--2009-07-28 09:52:41-- ftp://....
Resolving http-proxy.gslb.db.com... 10.233.152.36
Connecting to http-proxy.gslb.db.com|10.233.152.36|:8080... connected.
Proxy request sent, awaiting response... 200 OK
Length: 22774 (22K)
Saving to: `C090725.eod'
100%[===================================================================================>]
22,774 --.-K/s in 0.004s
Last-modified header missing -- time-stamps turned off.
2009-07-28 09:52:43 (5.09 MB/s) - `C090725.eod' saved [22774/22774]
***************************************************************************
The issue is that I cannot automate this reliably. If my batch reads this
output (e.g. greps for the file size or for "100%"), it depends on text
that is not guaranteed to stay the same across wget versions; the output
can change in a new version of wget.
Even with the same version, if I fetch a different file from a different
server, the output message is different, so I do not want to rely on this
information.
***************************
Connecting to 10.140.76.100:21... connected.
Logging in as pardev ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD not needed.
==> SIZE CLO_090722.csv_22-07-2009 ... 5147277786087434
==> PASV ... done. ==> RETR CLO_090722.csv_22-07-2009 ... done.
Length: 5147277786087434 (4.6P)
0% [ ] 1,198,444 --.-K/s in 0.1s
2009-07-28 10:17:37 (8.89 MB/s) - `CLO_090722.csv_22-07-2009' saved [1198444]
***************************
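The local side of the size comparison in Option 2 does not need wget's output at all; it can be read with stat(). The following is a minimal sketch, assuming a POSIX system and assuming the expected size comes from a source the application trusts (for example a manifest from the data provider, which is hypothetical here); size_matches is an illustrative name. Note that in the second log above the server's SIZE reply (5147277786087434) differs from the number of bytes actually saved (1198444), so a size reported by the FTP server may itself be unreliable.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if the file at path is exactly
 * expected_size bytes, 0 if the size differs, and -1 if the file
 * cannot be examined (for example because it does not exist). */
int size_matches(const char *path, long long expected_size)
{
    struct stat st;

    if (stat(path, &st) != 0) {
        perror(path);   /* report why the file could not be examined */
        return -1;
    }
    return ((long long)st.st_size == expected_size) ? 1 : 0;
}
```

A matching size only shows that the byte count is right; it cannot detect corrupted content, which is why the checksum comparison remains the stronger check.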
Option 3 -> I checked other options and found this in the manual:
When running Wget with -N, with or without -r, the decision as to whether
or not to download a newer copy of a file depends on the local and remote
timestamp and size of the file.
So we thought that after downloading the file using wget, we could run
wget -N again; if that command reports that the file is unchanged, this
would imply that the (timestamp, size) of the local file is the same as
the (timestamp, size) on the remote server. But when I tried this in my
production environment, I got this message:
<<Proxy request sent, awaiting response... 400 Bad Request
2009-07-28 09:55:39 ERROR 400: Bad Request.>>
This was working fine with a sample file in the test environment.
Now, my requirement is very simple: to ensure data
completeness/integrity. Can somebody please suggest which options I
should or can use? My first preference is to compare checksums.
Thanks & Regards,
Anamika Jindal