[Bug-wget] libpsl design [was: Re: Overly permissive hostname matching]
From: Daniel Kahn Gillmor
Date: Fri, 21 Mar 2014 16:13:43 -0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Icedove/24.3.0

On 03/21/2014 05:03 AM, Tim Ruehsen wrote:
> Maybe you could just open issues (or even better, fork the repo, make your
> changes and create pull requests).
i've just pushed some cleanup suggestions here:
https://github.com/rockdaboot/libpsl/pull/1
i see you've pulled them already, thanks!
i've got three more conceptual issues which warrant discussion, rather
than a patch, though. If there's a better place to have this discussion
than this mailing list, i'm happy to move to it, please let me know where.
psl_is_tld() semantics
----------------------
the way i see it, we know what it means for psl_is_tld() to return
"true" -- but "false" could mean either:
(A) "this zone is subordinate to a TLD" (as example.com is to com)
or
(B) "this zone is superior to a TLD" (as uk is to co.uk). Note that
"uk" is not a public suffix.
libpsl in its current state appears to assume that psl_is_tld("uk")
returns "true" even though "uk" is not a TLD, is not a public suffix,
and does not meet Ángel's "one domain under which anyone* can register a
subdomain" definition.
perhaps if we invert the sense of the current test it will match more
cleanly. what about:
psl_is_private(char* d)
so:
psl_is_private("uk") → false
psl_is_private("example.com") → true
psl_is_private("www.example.com") → true
psl_is_private("a.b.c.example.com") → true
psl_is_private(".") → false
psl_is_private("com") → false
psl_is_private("co.ar") → false
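To make the proposed semantics concrete, here is a minimal sketch of such a predicate. Everything in it is an assumption for illustration: sketch_is_private() and toy_is_public_suffix() are hypothetical names, the four-entry table stands in for the real list, and wildcard rules, exception rules, case folding and IDNA are all ignored. It is not libpsl code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for the public suffix list; NOT the real data. */
static const char *const toy_suffixes[] = { "com", "net", "co.uk", "co.ar" };

static bool toy_is_public_suffix(const char *s)
{
    for (size_t i = 0; i < sizeof toy_suffixes / sizeof toy_suffixes[0]; i++)
        if (strcmp(s, toy_suffixes[i]) == 0)
            return true;
    return false;
}

/* Hypothetical psl_is_private(): d is private when it is not itself a
 * public suffix and at least one strictly shorter suffix of d is one. */
bool sketch_is_private(const char *d)
{
    if (d == NULL || toy_is_public_suffix(d))
        return false;

    for (const char *p = strchr(d, '.'); p != NULL; p = strchr(p, '.')) {
        p++; /* step past the dot to the next, shorter suffix */
        if (toy_is_public_suffix(p))
            return true;
    }
    return false;
}
```

With this toy table, "uk", ".", "com" and "co.ar" all come out false and every name under "com" comes out true, so "false" no longer has to distinguish cases (A) and (B).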
the other API that might be relevant would be something like
psl_get_private_zone(char* d), which would return the shortest private
zone that contains d. so:
psl_get_private_zone("www.example.com") → "example.com"
psl_get_private_zone("example.co.uk") → "example.co.uk"
psl_get_private_zone("a.b.c.d.example.net") → "example.net"
psl_get_private_zone("com") → ERROR
psl_get_private_zone("uk") → ERROR
(this is the API supplied by regdom-libs, i think)
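A rough sketch of how the lookup could work, under stated assumptions: sketch_get_private_zone() is a hypothetical name, ERROR is modelled as a NULL return, the result is a pointer into the caller's string, and the suffix table is a four-entry toy rather than the real list (no wildcard or exception rules, no case folding, no IDNA). It walks the suffixes of d from longest to shortest and stops at the first public one, which gives longest-match behaviour.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for the public suffix list; NOT the real data. */
static const char *const toy_suffixes[] = { "com", "net", "co.uk", "co.ar" };

static bool toy_is_public_suffix(const char *s)
{
    for (size_t i = 0; i < sizeof toy_suffixes / sizeof toy_suffixes[0]; i++)
        if (strcmp(s, toy_suffixes[i]) == 0)
            return true;
    return false;
}

/* Hypothetical psl_get_private_zone(): returns a pointer into d at the
 * start of the public suffix plus one label, or NULL for "ERROR". */
const char *sketch_get_private_zone(const char *d)
{
    const char *prev = NULL; /* suffix that is one label longer than cur */
    const char *cur = d;

    while (cur != NULL) {
        if (toy_is_public_suffix(cur))
            return prev; /* NULL when d itself is a public suffix */
        prev = cur;
        const char *dot = strchr(cur, '.');
        cur = dot ? dot + 1 : NULL;
    }
    return NULL; /* no public suffix matched, e.g. "uk" with this table */
}
```

Returning a pointer into the input avoids allocation, but it only works while input and output share an encoding; a design that normalizes the name first would have to return a copy instead.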
I chose the term "private" in contrast with the "public" from "public
suffix list" -- if folks have a better word to use, i'm happy to swap
something else in. regdom-libs uses the term "registered", which i
think means "placed in the public registry", which is intelligible to
me, but maybe only because i've thought about this problem way more than
anyone should have to. i don't know how much sense it would make to
users of the library.
IDNA
----
I hate to bring this up, because it's a nightmare and i have no good
answers, but what does this library expect to do about non-ASCII domain
names? effective_tld_names.dat contains the limits in unicode, encoded
as UTF-8, e.g.:
// xn--mgba3a4f16a.ir (<iran>.ir, Persian YEH)
ایران.ir
should we assume that the input from the user is in a similar form? do
we care about locale issues? what about unicode canonicalization? what
if the incoming data is in punycode (the xn--* ascii form) already?
the GNU folks have done the ugly ugly work for us if we're willing to
link to lgpl'ed libraries:
https://www.gnu.org/software/libidn/
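Whatever the answer turns out to be, the library probably has to tell these input forms apart before it can pick a normalization path. The sketch below only classifies a name; it performs no conversion, no locale handling and no unicode canonicalization. The enum and function names are hypothetical, and the input is assumed to be a non-NULL, NUL-terminated byte string. The actual conversion could be left to something like libidn's idna_to_ascii_8z(), which is not shown here.

```c
#include <stddef.h>

enum name_form {
    FORM_PLAIN_ASCII, /* only ASCII bytes, no "xn--" label          */
    FORM_ACE,         /* only ASCII bytes, at least one "xn--" label */
    FORM_NON_ASCII    /* at least one byte >= 0x80 (UTF-8? Latin-1?) */
};

/* Case-insensitive test for the "xn--" prefix; the short-circuit
 * evaluation never reads past the terminating NUL. */
static int has_ace_prefix(const unsigned char *p)
{
    return (p[0] == 'x' || p[0] == 'X') &&
           (p[1] == 'n' || p[1] == 'N') &&
           p[2] == '-' && p[3] == '-';
}

/* Hypothetical helper: report which form the name d arrived in. */
enum name_form sketch_classify(const char *d)
{
    enum name_form form = FORM_PLAIN_ASCII;
    int at_label_start = 1;

    for (const unsigned char *p = (const unsigned char *)d; *p; p++) {
        if (*p >= 0x80)
            return FORM_NON_ASCII; /* needs decoding before any lookup */
        if (at_label_start && has_ace_prefix(p))
            form = FORM_ACE;
        at_label_start = (*p == '.');
    }
    return form;
}
```

A name classified as FORM_NON_ASCII still says nothing about which encoding it is in; that part of the question remains open.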
malformed inputs
----------------
What should the library do with malformed inputs? i'm thinking about
super-long strings, strings starting with more than one dot, or with
multiple dots adjacent to each other, strings that don't match whatever
encoding we're expecting users to send, etc.
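One possible answer is to reject such inputs up front with a cheap structural check. The sketch below is an assumption, not a proposal for the actual API: sketch_is_well_formed() is a hypothetical name, the limits of 253 bytes per name and 63 bytes per label are taken from the usual DNS limits, a single trailing dot is tolerated, and the bare root "." is accepted as a special case. It does not check the encoding at all.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_NAME_LEN  253 /* assumed limit for a textual name */
#define SKETCH_MAX_LABEL_LEN 63  /* assumed limit for a single label */

/* Hypothetical pre-check: false for NULL, empty or over-long names,
 * over-long labels, leading dots and adjacent dots. */
bool sketch_is_well_formed(const char *d)
{
    size_t total = 0;
    size_t label_len = 0;

    if (d == NULL)
        return false;
    if (strcmp(d, ".") == 0)
        return true; /* the root zone, written as a single dot */

    for (const char *p = d; *p; p++) {
        if (++total > SKETCH_MAX_NAME_LEN)
            return false; /* stop early on super-long strings */
        if (*p == '.') {
            if (label_len == 0)
                return false; /* leading dot or two dots in a row */
            label_len = 0;
        } else if (++label_len > SKETCH_MAX_LABEL_LEN) {
            return false;
        }
    }
    return total > 0;
}
```

The loop bails out as soon as a limit is exceeded, so a huge input costs at most 254 byte reads. Whether the library should reject such names or try to repair them is a separate decision.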
--dkg