From: Adam Porter
Subject: Re: How to get buffer byte length (not number of characters)?
Date: Thu, 22 Aug 2024 07:26:58 -0500
User-agent: Mozilla Thunderbird
Hi Joseph, et al,

On 8/22/24 02:24, Joseph Turner wrote:

>>> plz.el does not manually encode buffer text *within Emacs* when sending requests to curl, but by default, plz.el sends data to curl with --data, which tells curl to strip CR and newlines. With the :body-type 'binary argument, plz.el instead uses --data-binary, which does no conversion.
>>
>> Newlines is a relatively minor issue (although it, too, needs to be considered). My main concern is with the text encoding. How can it be TRT to use 'binary when sending buffer text to curl? that would mean we are more-or-less always sending the internal representation of characters, which is superset of UTF-8. If the data was originally encoded in anything but UTF-8, reading it into Emacs and then sending it back will change the byte sequences from that other encoding to UTF-8. Moreover, 'binary does not guarantee that the result is valid UTF-8. So maybe I misunderstand how these plz.el facilities are used, but up front this sounds like a mistake.
>
> It could be. Eli, Adam, what do you think about the default coding systems for encoding the request body in the attached patch?

From an API perspective, I'm not sure. My idea for plz.el is to provide a simple, somewhat idiomatic Elisp API for making HTTP requests (and, of course, to make "correct" requests, in compliance with specifications and expectations). Given the relatively few clients of plz thus far, some issues are yet to be fully explored and developed, and encoding/decoding may be one of those rougher edges. For the use cases I'm aware of, it seems to work well and correctly, but there are undoubtedly improvements to be made.
Encoding/decoding is not exactly a simple matter, especially with regard to API design. Ultimately, no library can fully spare its users the need to understand it. I also want plz's API to change no more than necessary over time, so I'd want to be very deliberate with any changes to it. That makes it appealing to do as little as possible in this regard, leaving as much as possible for the calling code to handle outside of plz.
One way to do that is to do what hyperdrive.el is basically doing now, to tell plz to tell curl to handle the data as binary, i.e. to pass it through unchanged. But it seems that we haven't covered all of the bases with regard to these issues; rather, we have tested a subset of them that seem to work as expected.
Also, where it's possible to make plz DTRT automatically, integrating naturally with Elisp APIs and data structures, I'm certainly in favor of that. So, e.g. automatically using a buffer's expected encoding when passing its data to curl seems like the right thing to do, which plz doesn't do yet (and perhaps we could do the same thing when returning a buffer of data).
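As a rough sketch of the buffer case (not something plz does today), the encoding step could look roughly like the following. The helper name `my/buffer-encoded-body` and the fallback to `utf-8-unix` are assumptions made for illustration; `encode-coding-region` with a destination of `t` returns the encoded bytes as a unibyte string and leaves the buffer unchanged.

```elisp
;; -*- lexical-binding: t; -*-
;; Hypothetical helper for illustration; not part of plz.el.

(defun my/buffer-encoded-body (&optional buffer)
  "Return the accessible text of BUFFER as a unibyte string of encoded bytes.
BUFFER defaults to the current buffer.  The text is encoded with
the buffer's `buffer-file-coding-system'; falling back to
`utf-8-unix' when that is nil is an assumption of this sketch."
  (with-current-buffer (or buffer (current-buffer))
    (let ((coding (or buffer-file-coding-system 'utf-8-unix)))
      ;; A DESTINATION of t returns the encoded text as a string
      ;; instead of replacing the region in the buffer.
      (encode-coding-region (point-min) (point-max) coding t))))
```

Calling `string-bytes` on the result then gives the byte length of the body as it would be sent, which is the number the subject of this thread asks about.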
Of course, AFAIK we can't do such a thing when passing a string, so I guess the most we can do there is document recommended patterns for the user; IOW I'm tempted to leave encoding of strings to the user rather than add another argument for that, but we can talk about it.
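One pattern that could be documented for strings is for the caller to encode explicitly and then ask for the bytes to be passed through unchanged. The sketch below assumes plz.el is installed and that `plz` accepts a unibyte string as `:body` together with `:body-type 'binary`; the names `my/encode-body` and `my/put-text` are hypothetical.

```elisp
;; -*- lexical-binding: t; -*-
;; Hypothetical helpers for illustration; neither is part of plz.el.

(defun my/encode-body (text coding-system)
  "Return TEXT encoded with CODING-SYSTEM as a unibyte string."
  (encode-coding-string text coding-system))

(defun my/put-text (url text coding-system)
  "PUT TEXT to URL after encoding it with CODING-SYSTEM.
Assumes plz.el is installed and on `load-path'."
  (require 'plz)
  (plz 'put url
    ;; The caller, not plz or curl, decides the encoding.
    :body (my/encode-body text coding-system)
    ;; Ask curl to send the already-encoded bytes unchanged.
    :body-type 'binary))
```

With this approach the byte length is also known up front: `(string-bytes (my/encode-body "café" 'utf-8-unix))` is 5, while the same text encoded with `iso-latin-1-unix` is 4 bytes.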
Thanks,
Adam