Re: Decoding URLs input
From: Yuri Khan
Subject: Re: Decoding URLs input
Date: Sat, 3 Jul 2021 18:10:47 +0700
On Sat, 3 Jul 2021 at 16:41, Jean Louis <bugs@gnu.support> wrote:
> As I am developing a Double Opt-In CGI script served by Emacs, I am
> unsure whether this function is correct to use on the encoded strings
> that come from URL GET requests, like http://www.example.com/?message=Hello%20There
>
> (rfc2231-decode-encoded-string "Hello%20there") ⇒ "Hello there"
>
> If anybody knows or has any clues, let me know. In other programming
> languages I have not had to think about RFCs, so I don't know which RFC
> applies here.
Why not look at the RFC referenced in order to see whether it is or is
not relevant to your task?
https://datatracker.ietf.org/doc/html/rfc2231
It talks about encoding parameter values in MIME headers, which is not
what you're dealing with; and its encoded strings look like
<charset>'<language>'<percent-encoded-string>, which is not what you
have.
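For contrast, a minimal sketch of a genuine RFC 2231 value passed to the same decoder. It assumes `rfc2231.el`, which ships with Gnus in Emacs, is loadable; the sample value is made up for illustration. The charset field in front is what tells the decoder how to interpret the bytes, and a URL query string never carries one:

```elisp
(require 'rfc2231)

;; charset ' language ' percent-encoded text
(rfc2231-decode-encoded-string "utf-8'fr'caf%C3%A9")
```
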
What you are dealing with is a URL, specifically its query string
part. URLs are specified in RFC 3986, which describes percent-encoding
in sections 2.1 and 2.5.
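To see the encoding direction those sections describe, a small sketch using `url-hexify-string` from `url-util.el`: it converts multibyte text to UTF-8 bytes and writes every byte outside the unreserved set as `%` followed by two uppercase hex digits. The sample text is arbitrary:

```elisp
(require 'url-util)

;; "é" is U+00E9, which UTF-8 encodes as the two bytes #xC3 #xA9.
(encode-coding-string "é" 'utf-8)   ; ⇒ "\303\251"
;; Each such byte becomes %XX, and the space becomes %20.
(url-hexify-string "Hello café")    ; ⇒ "Hello%20caf%C3%A9"
```

Decoding is the same two steps in reverse order: percent sequences back to bytes, then bytes back to characters.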
(url-unhex-string …) will do half the work for you: it decodes
percent-encoded sequences into raw bytes. By convention, characters in
URLs are UTF-8-encoded before percent-encoding (see RFC 3986 §
2.5), so you'll need to use:
(decode-coding-string (url-unhex-string s) 'utf-8)
to get a fully decoded text string.
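Wrapping that expression for a CGI handler might look like the sketch below. The names `my/url-decode-query-value` and `my/url-parse-query` are hypothetical, and treating `+` as a space is an assumption: it comes from the HTML `application/x-www-form-urlencoded` convention used by form submissions, not from RFC 3986, so drop that step if your URLs never come from forms.

```elisp
(require 'url-util)

(defun my/url-decode-query-value (s)
  "Decode S, one percent-encoded query-string component, into text.
Assumes S is ASCII, that the percent-encoded bytes are UTF-8
\(RFC 3986 section 2.5), and that `+' stands for a space as in
HTML form submissions."
  (decode-coding-string
   ;; Replace `+' first, so that an encoded %2B survives as a literal `+'.
   (url-unhex-string (replace-regexp-in-string "\\+" " " s t t))
   'utf-8))

(defun my/url-parse-query (query)
  "Split QUERY, the text after `?' in a URL, into an alist of (KEY . VALUE).
A parameter without `=' gets the empty string as its value."
  (mapcar (lambda (pair)
            ;; Record the position of `=' before decoding, because the
            ;; decoding functions overwrite the match data.
            (let ((i (string-match "=" pair)))
              (if i
                  (cons (my/url-decode-query-value (substring pair 0 i))
                        (my/url-decode-query-value (substring pair (1+ i))))
                (cons (my/url-decode-query-value pair) ""))))
          (split-string query "&" t)))

;; (my/url-parse-query "message=Hello%20There&note=caf%C3%A9+au+lait")
;; ⇒ (("message" . "Hello There") ("note" . "café au lait"))
```

Note that `url-unhex-string` does not decode %0D and %0A as line breaks unless its optional ALLOW-NEWLINES argument is non-nil, which matters if a field such as a message may span several lines.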