bug-gnu-emacs

bug#50946: insert-file-contents can corrupt buffers.


From: Eli Zaretskii
Subject: bug#50946: insert-file-contents can corrupt buffers.
Date: Sun, 03 Oct 2021 18:25:57 +0300

> Date: Sun, 3 Oct 2021 15:04:27 +0000
> Cc: joaotavora@gmail.com, 50946@debbugs.gnu.org
> From: Alan Mackenzie <acm@muc.de>
> 
> Here is an updated patch, superseding my patch from midday.  I have
> amended the descriptions of the two functions, replacing "corruption" of
> the buffer by "inserting raw-text characters" in the first function, and
> added explanation to the second.

Thanks; see some comments below.

> I wasn't able to find a suitable target for a cross-reference explaining
> "raw-text".

I think "Coding System Basics" is where we describe that encoding.

> --- a/doc/lispref/files.texi
> +++ b/doc/lispref/files.texi
> @@ -556,14 +556,18 @@ Reading from Files
>  
>  If @var{beg} and @var{end} are non-@code{nil}, they should be numbers
>  that are byte offsets specifying the portion of the file to insert.
> -In this case, @var{visit} must be @code{nil}.  For example,
> +In this case, @var{visit} must be @code{nil}.  Be careful to ensure
> +that these byte positions are at character boundaries.  Otherwise,
> +Emacs's character code conversion will insert one or more raw-text
> +characters into the buffer, which is probably not what you want.  For

This isn't the whole story.  The problem is mainly with the
autodetection of the encoding: it can go awry if you give it only a
portion of the file.  But if you bind coding-system-for-read, that
problem goes away, and the effect of the BEG and END arguments is
limited to the characters at the beginning and end of the portion
read.  In particular, if you read a file in chunks, a character that
straddles a chunk boundary could end up as 2 or more raw bytes; but as
long as you bind coding-system-for-read, no other parts are supposed
to be affected.  And that sequence of raw bytes can then be converted
back to the original character with very simple Lisp.
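A rough analogy in Python may make the chunk-boundary point concrete.  This is not Emacs code and not how Emacs implements decoding: a fixed codec ("utf-8") stands in for a bound coding-system-for-read, and the lone surrogates produced by Python's "surrogateescape" error handler stand in for Emacs's raw-byte characters.  The helper names (read_chunk, read_in_chunks, is_raw_byte, repair) are hypothetical, and encoding autodetection is not modeled at all.

```python
# Analogy only: Python's codecs stand in for Emacs's decoder.
# CODEC plays the role of a bound coding-system-for-read, and lone
# surrogates U+DC80..U+DCFF (from the "surrogateescape" handler) play
# the role of Emacs's raw-byte characters.

CODEC = "utf-8"


def read_chunk(data: bytes, beg: int, end: int) -> str:
    """Decode the byte range [beg, end) of DATA with a fixed codec.

    Bytes that do not form a complete character inside the range are
    kept as raw-byte stand-ins rather than guessed at.
    """
    return data[beg:end].decode(CODEC, errors="surrogateescape")


def read_in_chunks(data: bytes, size: int) -> str:
    """Concatenate decoded chunks, as if inserting a file piece by piece."""
    return "".join(
        read_chunk(data, beg, min(beg + size, len(data)))
        for beg in range(0, len(data), size)
    )


def is_raw_byte(ch: str) -> bool:
    """True if CH is a stand-in for an undecoded byte."""
    return 0xDC80 <= ord(ch) <= 0xDCFF


def repair(text: str) -> str:
    """Turn raw-byte stand-ins back into bytes and decode them again.

    Ordinary characters round-trip unchanged; adjacent raw bytes that
    together form a valid sequence become the original character.
    """
    return text.encode(CODEC, errors="surrogateescape").decode(
        CODEC, errors="surrogateescape"
    )


if __name__ == "__main__":
    original = "héllo wörld"
    chunked = read_in_chunks(original.encode(CODEC), 3)
    # Only "ö" straddles a chunk boundary, so only its 2 bytes are raw.
    print(sum(is_raw_byte(c) for c in chunked))
    print(repair(chunked) == original)
```

With 3-byte chunks, "é" happens to fit inside the first chunk and decodes normally, while "ö" is split between two chunks and shows up as two raw-byte stand-ins; every other character is untouched, and re-decoding the joined bytes recovers the original text.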

So the text you propose is too "frightening": it basically says
"don't use this".  That is too harsh, because valid use cases for this
feature do exist, and if the programmer knows what he/she is doing, it
doesn't have to produce garbled buffers.  For the manual, we need more
informative text that mentions coding-system-for-read.
