Re: [Demexp-dev] What to check in an HTTP URL?
From: David MENTRE
Subject: Re: [Demexp-dev] What to check in an HTTP URL?
Date: Fri, 23 Sep 2005 16:45:19 +0200
User-agent: Gnus/5.1006 (Gnus v5.10.6) Emacs/21.4 (gnu/linux)
Hello Félix and FX,
François-Xavier Ponscarme <address@hidden> writes:
> http://www.foad.org/~abigail/Perl/url2.html
Well, it is rather complicated to do an extensive check. I had a quick
look at the BigBrother code, but it uses Pcre with camlp4 extensions, so
I don't want to dig into that code for now.
Right now, I do the following checks:
- link field size is limited to 256 bytes;
- the link should match the following (OCaml Str[1]) regexp:
^http://[-A-Za-z0-9_.]+\\(:[0-9]+\\)?[-A-Za-z0-9+&:;@_.%=?/]*$
This regexp limits us to a pretty basic character set and to HTTP. I
prefer to be too restrictive now and to loosen the check afterwards if
needed.
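
For illustration, here is a minimal sketch of how the two checks above
could be combined with the Str library. The names max_link_length,
link_regexp and is_valid_link are hypothetical and are not taken from
the demexp code. I assume "limited to 256 bytes" means at most 256. The
final Str.match_end test is an extra guard added in this sketch: in Str,
'$' also matches just before a newline, so without it a link followed by
"\n" and arbitrary text would pass. The code must be linked with the Str
library.

```ocaml
(* Sketch only; names are hypothetical, not from the demexp sources.
   Needs the Str library, e.g.:
   ocamlfind ocamlopt -package str -linkpkg check.ml *)

(* Assumed reading of the size limit: at most 256 bytes. *)
let max_link_length = 256

(* Inside an OCaml string literal, Str's grouping operators \( and \)
   are written "\\(" and "\\)". *)
let link_regexp =
  Str.regexp
    "^http://[-A-Za-z0-9_.]+\\(:[0-9]+\\)?[-A-Za-z0-9+&:;@_.%=?/]*$"

(* [is_valid_link s] is true when [s] is short enough and the regexp
   matches from position 0 up to the very end of [s]. *)
let is_valid_link (s : string) : bool =
  String.length s <= max_link_length
  && Str.string_match link_regexp s 0
  (* Extra guard (an addition of this sketch): '$' also matches just
     before a '\n', so require that the match consumed the whole string. *)
  && Str.match_end () = String.length s
```

With this sketch, is_valid_link "http://example.org:8080/a?b=1" is
accepted, while an "https://" link, a link containing a space, or a link
longer than 256 bytes is rejected.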
Let me know if you see potential issues in this check.
Yours,
d.
Footnotes:
[1] As in Emacs, grouping is written \( and \), and each backslash is
itself escaped inside the string literal, hence \\( and \\).
--
pub 1024D/A3AD7A2A 2004-10-03 David MENTRE <address@hidden>
5996 CC46 4612 9CA4 3562 D7AC 6C67 9E96 A3AD 7A2A