bug-wget


Re: [Bug-wget] [PATCH] Add option to write URL rejections to a CSV log.


From: Daniel Kahn Gillmor
Subject: Re: [Bug-wget] [PATCH] Add option to write URL rejections to a CSV log.
Date: Tue, 28 Jul 2015 16:45:03 -0400
User-agent: Notmuch/0.20.2 (http://notmuchmail.org) Emacs/24.5.1 (x86_64-pc-linux-gnu)

On Tue 2015-07-28 01:47:30 -0400, Jookia wrote:
> This allows you to figure out why URLs are being rejected and some context
> around it. CSV is used as the output format since it can be easily parsed,
> and importantly only URL-quoted fields are written.
 [...]
> +static void write_url_csv (FILE* f, struct url *url)
> +{
> +  if (!f)
> +    return;
> +
> +  char const *scheme_str = 0;
> +  switch (url->scheme)
> +    {
> +      case SCHEME_HTTP:    scheme_str = "SCHEME_HTTP";    break;
> +      #ifdef HAVE_SSL
> +        case SCHEME_HTTPS: scheme_str = "SCHEME_HTTPS";   break;
> +      #endif
> +      case SCHEME_FTP:     scheme_str = "SCHEME_FTP";     break;
> +      case SCHEME_INVALID: scheme_str = "SCHEME_INVALID"; break;
> +    }
> +
> +  fprintf (f, "%s,%s,%s,%i,%s,%s,%s,%s",
> +    url->url,
> +    scheme_str,
> +    url->host,
> +    url->port,
> +    url->path,
> +    url->params,
> +    url->query,
> +    url->fragment);
> +}

I like this idea, but I think the output file could be corrupted if any
of these pieces of the URL contain a comma character (",") or a
newline.  Perhaps newlines are going to be "safe" because of
URL-escaping, but I don't think a comma needs to be URL-escaped.

              --dkg


