emacs-devel

Re: How to get buffer byte length (not number of characters)?


From: Joseph Turner
Subject: Re: How to get buffer byte length (not number of characters)?
Date: Wed, 21 Aug 2024 16:52:39 -0700

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Joseph Turner <joseph@ushin.org>
>> Cc: emacs-devel@gnu.org, Andreas Schwab <schwab@suse.de>, Adam Porter
>>  <adam@alphapapa.net>
>> Date: Wed, 21 Aug 2024 02:20:09 -0700
>> 
>> Let's say I create a textual hyperdrive file using hyperdrive.el, and
>> then I upload it by sending its contents via curl to the local HTTP
>> server.  What coding system should be used when the file is uploaded?
>> 
>> Let's say I have a `iso-latin-1'-encoded file "foo.txt" on my local
>> filesystem.  I upload this encoded file to my hyperdrive by passing the
>> filename to curl, which uploads the bytes with no conversion.  Then I
>> open the "foo.txt" hyperdrive file using hyperdrive.el, which receives
>> the contents via curl from the local HTTP server.  In the hyperdrive
>> file buffer, buffer-file-coding-system should be `iso-latin-1' (right?).
>
> It's what I would expect, yes.  But you can try it yourself, of course
> and make sure it is indeed what happens.
>
>> Then, I edit the buffer and save it to the hyperdrive again with
>> hyperdrive.el, which this time sends the modified contents over the wire
>> to curl.  The uploaded file should be `iso-latin-1'-encoded (right?).
>
> Again, that'd be my expectation.  But it's better to test this
> assumption.
>
>> Currently, plz.el always creates the curl subprocess like so:
>> 
>> (make-process :coding 'binary ...)
>> 
>> https://git.savannah.gnu.org/cgit/emacs/elpa.git/tree/plz.el?h=externals-release/plz#n519
>> 
>> Does this DTRT?
>
> It could be TRT if plz.el encodes the buffer text "by hand" before
> sending the results to curl and decodes it when it receives text from
> curl.  Which I think is what happens there.

plz.el does not manually encode buffer text *within Emacs* when sending
requests to curl, but by default, plz.el sends data to curl with --data,
which tells curl to strip CR and newlines.  With the :body-type 'binary
argument, plz.el instead uses --data-binary, which does no conversion.

We don't want to strip newlines from hyperdrive files, so we always use
:body-type 'binary when sending buffer contents.  Should hyperdrive.el
encode data with `buffer-file-coding-system' before passing to plz.el?
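
If the answer turns out to be yes, the encoding step could look roughly
like the sketch below.  The helper name `my/buffer-bytes-for-upload' is
hypothetical (it is not part of hyperdrive.el or plz.el), and falling
back to utf-8 when the buffer has no usable coding system is my own
assumption, not something settled in this thread.

```elisp
;; Sketch only: `my/buffer-bytes-for-upload' is a hypothetical helper.
(defun my/buffer-bytes-for-upload (&optional buffer)
  "Return the text of BUFFER as a unibyte string of encoded bytes.
Encode with BUFFER's `buffer-file-coding-system'.  If that is nil or
an `undecided' variant, use utf-8 instead (an assumption)."
  (with-current-buffer (or buffer (current-buffer))
    (let* ((coding buffer-file-coding-system)
           (coding (if (and coding
                            (not (eq (coding-system-type coding)
                                     'undecided)))
                       coding
                     'utf-8)))
      (save-restriction
        ;; Encode the whole buffer even if it is narrowed.
        (widen)
        (encode-coding-string
         (buffer-substring-no-properties (point-min) (point-max))
         coding)))))
```

The resulting string holds bytes rather than characters, so
`string-bytes' on it gives the upload size, and it could be handed to
plz.el as the :body together with :body-type 'binary so that curl does
no further conversion.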

When receiving text from curl, plz.el optionally decodes the text
according to the charset in the 'Content-Type' header (e.g., "text/html;
charset=utf-8"), or as utf-8 if no charset is found.

Perhaps hyperdrive.el should check the 'Content-Type' header charset,
then fall back to guessing the coding system based on filename and file
contents with `set-auto-coding' (to avoid decoding images, etc.), and
then finally fall back to something else?
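
To make that fallback chain concrete, here is a rough sketch.  The
function name `my/guess-coding-system' and its arguments are
hypothetical, and returning `undecided' as the last resort is only a
placeholder for the "something else" above, not a recommendation.

```elisp
;; Sketch only: `my/guess-coding-system' is a hypothetical helper.
(defun my/guess-coding-system (content-type filename bytes)
  "Guess a coding system for BYTES, a unibyte string of raw file data.
CONTENT-TYPE is the value of the 'Content-Type' header or nil.
FILENAME is the name of the file the data came from."
  (or
   ;; 1. A charset in the Content-Type header, if Emacs knows it.
   (when (and content-type
              (string-match "charset=\"?\\([^\";[:space:]]+\\)"
                            content-type))
     (let ((cs (intern (downcase (match-string 1 content-type)))))
       (and (coding-system-p cs) cs)))
   ;; 2. Filename and contents via `set-auto-coding', which consults
   ;;    `auto-coding-alist', "coding:" cookies, and so on.
   (with-temp-buffer
     (set-buffer-multibyte nil)
     (insert bytes)
     (goto-char (point-min))
     (set-auto-coding filename (- (point-max) (point-min))))
   ;; 3. Placeholder last resort (an assumption).
   'undecided))
```

With the default `auto-coding-alist', step 2 should return
`no-conversion' for names such as "photo.png", which is what keeps
images from being decoded as text.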

Thank you!

Joseph


