Re: [Bug-wget] Overly permissive hostname matching


From: Daniel Kahn Gillmor
Subject: Re: [Bug-wget] Overly permissive hostname matching
Date: Wed, 19 Mar 2014 10:59:05 -0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Icedove/24.2.0

On 03/19/2014 06:19 AM, Tim Ruehsen wrote:
> As a programmer, I want to have control. E.g. the option to load from a 
> different file, or to switch off loading. Why ? e.g. for testing purposes, or 
> simply imagine a "swiss army knife" client for experts - maybe they want to 
> have control via CLI args. Or you are in a controlled environment and simply 
> don't want to waste CPU cycles when downloading a single file from a trusted 
> server. Just some examples.
> And then, clients like Wget would like to have access, at least for checking 
> cookies.

i understand, and i think we're probably not disagreeing -- you want the
ability to control it; i want sane defaults so that people who don't
touch it get sensible behavior.

> I just took a quick look but I am not sure about the API (i did not have this 
> 'aha' effect). But what I don't like is the dependency on PHP which is used 
> to 'compile' the PSL before the C functions can use it. While the idea of 
> compilation/preprocessing is a good one, it should at least be optional.

pre-compilation/preprocessing is probably a reasonable performance
optimization for heavy use; we might even want a C library to embed a
precompiled version of the most recent known list at time of
compilation, so that it can be used with no initialization step or when
no file is available.  I don't think depending on php for the
pre-compilation step is a problem; that's just an additional
build-dependency, same as (for example) bison or cmake or python for
other C projects.  (though i confess i'd rather work with pretty much
any language other than PHP in general)

I agree that we probably want the library to support the generic case of
reading the PSL from a file, though.
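To make the file-reading case concrete, here is a minimal sketch of what the parsing side might look like. It assumes only the published list format: "//" starts a comment, each rule ends at the first whitespace, a leading "!" marks an exception rule, and a leading "*." marks a wildcard rule. The names psl_rule_kind, psl_parse_line and psl_count_rules are hypothetical and not part of any existing library; a real loader would store the rules rather than count them.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical rule classification, for illustration only. */
enum psl_rule_kind {
    PSL_RULE_NONE,      /* blank line, comment, or unusable rule */
    PSL_RULE_NORMAL,    /* e.g. "co.uk" */
    PSL_RULE_WILDCARD,  /* e.g. "*.ck", stored as "ck" */
    PSL_RULE_EXCEPTION  /* e.g. "!www.ck", stored as "www.ck" */
};

/* Classify one line of the list and copy the bare rule text
   (without the "!" or "*." prefix) into out. */
enum psl_rule_kind psl_parse_line(const char *line, char *out, size_t outlen)
{
    assert(line != NULL && out != NULL);

    size_t n = strcspn(line, " \t\r\n");  /* rule ends at first whitespace */
    enum psl_rule_kind kind = PSL_RULE_NORMAL;

    if (n == 0 || strncmp(line, "//", 2) == 0)
        return PSL_RULE_NONE;

    if (line[0] == '!') {
        kind = PSL_RULE_EXCEPTION;
        line += 1;
        n -= 1;
    } else if (n >= 2 && line[0] == '*' && line[1] == '.') {
        kind = PSL_RULE_WILDCARD;
        line += 2;
        n -= 2;
    }

    if (n == 0 || n >= outlen)
        return PSL_RULE_NONE;  /* nothing left, or rule does not fit */

    memcpy(out, line, n);
    out[n] = '\0';
    return kind;
}

/* Count the usable rules in an open stream. Lines longer than the
   buffer are split by fgets; this sketch does not handle that. */
size_t psl_count_rules(FILE *fp)
{
    char line[256], rule[256];
    size_t count = 0;

    while (fgets(line, sizeof line, fp) != NULL)
        if (psl_parse_line(line, rule, sizeof rule) != PSL_RULE_NONE)
            count++;
    return count;
}
```

This deliberately ignores internationalized names and case folding, which are exactly the canonicalization questions that still need deciding.
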

I'm imagining a C library API that has a public suffix list context
object that can do efficient lookups (however we define the lookups),
and the library would bundle a pre-compiled context, based on the
currently-known public suffix list.

something like:

---------------
struct psl_ctx;
typedef struct psl_ctx * psl_ctx_t;
extern const psl_ctx_t psl_builtin; /* defined once inside the library */

psl_ctx_t psl_new_ctx_from_filename(const char* filename);
psl_ctx_t psl_new_ctx_from_fd(int fd);
void psl_free_ctx(psl_ctx_t ctx);

/*
  query forms, very rough draft -- do we need both?
  need to consider memory allocation responsibilities and
  DNS internationalization/canonicalization issues
*/

const char* psl_get_public_suffix(const psl_ctx_t, const char* domain);
const char* psl_get_registered_domain(const psl_ctx_t, const char* d);
---------------
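
As a rough illustration of how the two query forms could behave, here is a toy implementation under loud assumptions: the context is just a fixed array of four made-up-for-the-example suffixes standing in for a precompiled list, only plain rules are handled (no wildcard or exception rules), and input is assumed to be lowercase ASCII with no empty labels. The struct layout and the table are hypothetical; only the function names come from the draft above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy context: a flat array of plain suffix rules. */
struct psl_ctx {
    const char *const *suffixes;
    size_t count;
};
typedef struct psl_ctx *psl_ctx_t;

/* Stand-in for a list embedded at compile time. */
static const char *const builtin_suffixes[] = { "com", "org", "uk", "co.uk" };
static struct psl_ctx builtin_ctx = { builtin_suffixes, 4 };
const psl_ctx_t psl_builtin = &builtin_ctx;

/* Return a pointer into domain at the longest matching suffix,
   or NULL if no rule matches. */
const char *psl_get_public_suffix(const psl_ctx_t ctx, const char *domain)
{
    assert(ctx != NULL && domain != NULL);

    /* Try each label boundary from the left; the first hit is the longest. */
    for (const char *p = domain; p != NULL && *p != '\0'; ) {
        for (size_t i = 0; i < ctx->count; i++)
            if (strcmp(p, ctx->suffixes[i]) == 0)
                return p;
        p = strchr(p, '.');
        if (p != NULL)
            p++;  /* step past the dot to the next label */
    }
    return NULL;
}

/* Return a pointer into domain at the public suffix plus one label,
   or NULL if there is no suffix or the domain is itself a suffix. */
const char *psl_get_registered_domain(const psl_ctx_t ctx, const char *domain)
{
    const char *suffix = psl_get_public_suffix(ctx, domain);

    if (suffix == NULL || suffix == domain)
        return NULL;

    const char *p = suffix - 1;  /* the dot just before the suffix */
    while (p > domain && p[-1] != '.')
        p--;                     /* back up to the start of that label */
    return p;
}
```

With this table, querying "www.example.co.uk" gives "co.uk" as the public suffix and "example.co.uk" as the registered domain. Returning pointers into the caller's string is one way to sidestep the memory allocation question, at the cost of not being able to return a canonicalized form.
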

> "the folks" it's me ;-)

Hi "the folks" :)  (and thanks for your work on mget!)

> I already thought of splitting libmget into several smaller libraries, like 
> libmget-common, libmget-cookies, libmget-psl ... whatever is needed.
> 
> What exactly do you think of ? What can I do to make Debian packaging easy ?

hm, it looks like libmget isn't in Debian at all right now.  I'm swamped
with packaging work, and i'm not prepared to review something as
full-featured as libmget itself.  if you could break out the
publicsuffix code so that it was a distinct project from mget, but
provided the API that met the needs of libmget-cookies, that would be
the simplest thing for me to review and package;  we could run any
proposed API by Nikos to make sure it meets the needs of GnuTLS as well,
if you think it's a good idea to push this verification into the TLS
stack itself.

thanks for thinking about this,

        --dkg
