help-gnu-emacs

Re: Decoding URLs input


From: Yuri Khan
Subject: Re: Decoding URLs input
Date: Sun, 4 Jul 2021 03:16:37 +0700

On Sun, 4 Jul 2021 at 02:20, Jean Louis <bugs@gnu.support> wrote:

> From docstring of `url-unhex-string' I did not expect it would give
> just bytes back, then that should be IMHO described there, I am not
> sure really. Maybe it is assumed for programmer to know that.

I just fed it some percent-encoded sequences that I knew would be
invalid UTF-8 once decoded. If it were doing a full decode, I would
have expected it to signal an error. It didn’t, so evidently it stops
at percent-decoding and hands back raw bytes.
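
For illustration, a rough version of that kind of experiment. The byte sequences here are picked for the example and are not necessarily the ones tried above; it assumes `url-util' is available, as it is in a stock Emacs.

```elisp
(require 'url-util)

;; %C3%A9 is the UTF-8 encoding of "é": two bytes, one character.
(let ((raw (url-unhex-string "%C3%A9")))
  (list (length raw)                                  ; 2: still raw bytes
        (length (decode-coding-string raw 'utf-8))))  ; 1: a character now

;; The byte FF can never appear in valid UTF-8, yet no error is
;; signaled; the result is simply a one-byte string.
(length (url-unhex-string "%FF"))
```

Turning the bytes into text is a separate step, which is why the `decode-coding-string' call is needed on top.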

> The docstring is poor, it says like: "Remove %XX embedded spaces, etc in a
> URL." -- with "remove" I don't expect converting UTF-8 into bytes.

Yeah, that is bad. If I see “remove %xx” in a docstring, I expect
(string= (f "Hello%20World") "HelloWorld").

> I am now solving the issue that spaces are converted to plus sign and
> that I have to convert + signs maybe before:
> (decode-coding-string (url-unhex-string "Hello+There") 'utf-8)
> but maybe not before, maybe I leave it and convert later.

You have to replace them before percent-decoding. If you try it after
percent-decoding, you will not be able to distinguish a + that encodes
a space from a + that you just decoded from %2B. Luckily, literal
spaces never occur in a validly encoded query string, so replacing +
with a space first is unambiguous; if they could occur and carried
some meaning, you’d have to decode + *at the same time* as %xx.

Here, have some test cases:

    "Hello+There%7DWorld"   → "Hello There}World"
    "Hello%2BThere%7DWorld" → "Hello+There}World"

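A minimal sketch of a decoder that satisfies both test cases, assuming the input is a single form-urlencoded query component whose bytes are UTF-8. The name `my-decode-query-component' is hypothetical, not something Emacs provides.

```elisp
(require 'url-util)

(defun my-decode-query-component (encoded)
  "Decode ENCODED, a form-urlencoded query component, into text.
Hypothetical helper for illustration; assumes UTF-8 content."
  ;; Step 1: turn every literal + into a space.  A + that is written
  ;; as %2B is still percent-encoded at this point, so it is safe.
  (let ((spaced (subst-char-in-string ?+ ?\s encoded)))
    ;; Step 2: percent-decode to raw bytes.
    ;; Step 3: interpret those bytes as UTF-8 text.
    (decode-coding-string (url-unhex-string spaced) 'utf-8)))

(my-decode-query-component "Hello+There%7DWorld")   ; "Hello There}World"
(my-decode-query-component "Hello%2BThere%7DWorld") ; "Hello+There}World"
```

Note that without its optional second argument, `url-unhex-string' turns decoded CR and LF bytes into spaces; pass a non-nil ALLOW-NEWLINES if line breaks in the value need to survive.
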

By the way, you’re in for some unspecified amount of pain by trying to
implement a web application without a framework. (And by a framework I
mean a library that would give you well-tested means to encode/decode
URL parts, HTTP headers, gzipped request/response bodies, base64,
quoted-printable, application/x-www-form-urlencoded,
multipart/form-data, json, …) CGI is not nearly as simple as it
initially appears to be when you read a hello-cgi-world tutorial.


