Re: [Bug-wget] [PATCH] Add option to write URL rejections to a CSV log.
From: Daniel Kahn Gillmor
Subject: Re: [Bug-wget] [PATCH] Add option to write URL rejections to a CSV log.
Date: Tue, 28 Jul 2015 16:45:03 -0400
User-agent: Notmuch/0.20.2 (http://notmuchmail.org) Emacs/24.5.1 (x86_64-pc-linux-gnu)

On Tue 2015-07-28 01:47:30 -0400, Jookia wrote:
> This allows you to figure out why URLs are being rejected and some context
> around it. CSV is used as the output format since it can be easily parsed,
> and importantly only URL quoted fields are written.
[...]
> +static void write_url_csv (FILE* f, struct url *url)
> +{
> +  if (!f)
> +    return;
> +
> +  char const *scheme_str = 0;
> +  switch (url->scheme)
> +    {
> +      case SCHEME_HTTP:    scheme_str = "SCHEME_HTTP";    break;
> +      #ifdef HAVE_SSL
> +      case SCHEME_HTTPS:   scheme_str = "SCHEME_HTTPS";   break;
> +      #endif
> +      case SCHEME_FTP:     scheme_str = "SCHEME_FTP";     break;
> +      case SCHEME_INVALID: scheme_str = "SCHEME_INVALID"; break;
> +    }
> +
> +  fprintf (f, "%s,%s,%s,%i,%s,%s,%s,%s",
> +    url->url,
> +    scheme_str,
> +    url->host,
> +    url->port,
> +    url->path,
> +    url->params,
> +    url->query,
> +    url->fragment);
> +}

I like this idea, but i think the output file could be corrupted if any
of these pieces of the URL contains a comma character (",") or a
newline. Perhaps newlines are going to be "safe" because of
URL-escaping, but i don't think a comma needs to be URL-escaped.
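
One possible way around this, sketched below, is to quote each field
before writing it. This is only an illustration, not part of the patch:
write_csv_field is a hypothetical helper name, and it assumes the log
is meant to follow the usual CSV quoting convention (RFC 4180), where a
field containing a comma, double quote, CR or LF is wrapped in double
quotes and embedded double quotes are doubled. It also assumes a NULL
string should become an empty field.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not in the patch): write one CSV field to F.
   Fields containing a comma, double quote, CR or LF are wrapped in
   double quotes, with embedded double quotes doubled.  A NULL string
   is written as an empty field (an assumption for this sketch).  */
static void
write_csv_field (FILE *f, const char *s)
{
  if (!s)
    return;

  /* Nothing special in the field: write it unchanged.  */
  if (!strpbrk (s, ",\"\r\n"))
    {
      fputs (s, f);
      return;
    }

  fputc ('"', f);
  for (; *s; s++)
    {
      if (*s == '"')
        fputc ('"', f);         /* escape a quote by doubling it */
      fputc (*s, f);
    }
  fputc ('"', f);
}
```

With something like this, write_url_csv could call the helper once per
string field and emit the separating commas itself, instead of passing
everything through a single fprintf format string.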
--dkg