help-gnu-emacs

Re: Text copied from *grep* buffer has NUL (0x00) characters


From: R. Diez
Subject: Re: Text copied from *grep* buffer has NUL (0x00) characters
Date: Sun, 9 May 2021 20:47:28 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.8.1


> There's nothing wrong with null bytes in a UTF-8 encoded file, not in
> general.

Well, that's true by the book.

I already mentioned Meld and Pluma. Xfce's text editor, Mousepad, also refuses
to open UTF-8 files with BOM if they contain a NUL character.

gedit at least used to have the same problem:
https://superuser.com/questions/246014/use-gedit-to-open-file-with-null-characters

Geany truncates the file at the first NUL.

So it is a problem in practice.
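
To make the "valid by the book, troublesome in practice" point concrete, here is a minimal sketch in Python (chosen only for illustration; the function names and the sample text are made up, not taken from any of the editors above). It shows that a buffer containing a NUL still decodes as valid UTF-8, and reports where the NUL bytes sit so that they can be found before the file reaches a less tolerant editor.

```python
import codecs
from typing import List


def find_nul_offsets(data: bytes) -> List[int]:
    """Return the byte offset of every NUL (0x00) in data."""
    return [offset for offset, byte in enumerate(data) if byte == 0]


def describe(data: bytes) -> str:
    """Report whether data is valid UTF-8 and how many NUL bytes it holds."""
    try:
        data.decode("utf-8")
        valid = True
    except UnicodeDecodeError:
        valid = False
    return f"valid_utf8={valid} nul_count={len(find_nul_offsets(data))}"


# Hypothetical sample: a UTF-8 BOM followed by text with one embedded NUL.
sample = codecs.BOM_UTF8 + "src/main.c\x0042:int main(void)\n".encode("utf-8")

print(describe(sample))
print(find_nul_offsets(sample))
```

The decoder accepts the data without complaint, which is why a separate check is needed if the goal is to keep such files openable elsewhere.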

But we could of course insist on everyone switching to a proper text editor when they try to open our UTF-8 files with embedded NULs. That will surely make us even more popular... ]8-)


> We could have an optional warning about null bytes (when?
> when you save the buffer?).  But I see no reason to do that by
> default, especially since such a feature would require a costly search
> of the entire buffer.

Some terminal emulators warn when pasting suspicious text.

Emacs is already checking all bytes on save. I inserted an invalid sequence and 
got this warning on save:

-------------8<-------------8<-------------
These default coding systems were tried to encode text
in the buffer ‘Test3.txt’:
  (utf-8-with-signature-dos (11 . 4194176) (12 . 4194239))
However, each of them encountered characters it couldn’t encode:
  utf-8-with-signature-dos cannot encode these: \200 \277

Click on a character (or switch to this window by ‘C-x o’
and select the characters by RET) to jump to the place it appears,
where ‘M-x universal-argument C-x =’ will give information about it.

Select one of the safe coding systems listed below,
or cancel the writing with C-g and edit the buffer
   to remove or modify the problematic characters,
or specify any other coding system (and risk losing
   the problematic characters).

  raw-text no-conversion
-------------8<-------------8<-------------

Therefore, I don't think it would cost too much to check for NULs at the same 
time, and give users the choice.
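
As a rough illustration of why a combined check need not add much cost, the sketch below walks a string once and records both NULs and characters that strict UTF-8 cannot encode. It is a Python analogy only, and it is not how Emacs implements its coding-system check: the function name is hypothetical, and Python's "surrogateescape" stand-ins merely play the role of the raw bytes \200 and \277 from the warning above.

```python
from typing import List, Tuple


def scan_once(text: str) -> Tuple[List[int], List[int]]:
    """Return (nul_positions, unencodable_positions) found in a single pass."""
    nuls: List[int] = []
    unencodable: List[int] = []
    for index, char in enumerate(text):
        if char == "\0":
            nuls.append(index)
        elif 0xD800 <= ord(char) <= 0xDFFF:
            # Lone surrogates cannot be written out as strict UTF-8.
            unencodable.append(index)
    return nuls, unencodable


# Assumption for the demo: invalid bytes are represented the way Python's
# "surrogateescape" error handler represents them.
raw = b"\x80\xbf".decode("utf-8", "surrogateescape")
buffer_text = "ok\0" + raw

print(scan_once(buffer_text))
```

Both kinds of finding come out of the same loop, so the NUL check rides along with a scan that has to happen anyway.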


> This is easy to fix: customize the Grep command to not include
> "--null".  That switch is mainly for systems that allow newlines in
> file names, which MS-Windows doesn't allow, so if this switch causes
> trouble in your usage, simply remove it.

I am using Linux. Of course, now that I know what the issue is, I can just remove --null from the grep command and be done with it. That would quietly fix the problem for me.
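
For readers wondering where the NUL comes from: with --null, grep writes a zero byte after the file name instead of the usual ":". The sketch below (Python, for illustration only; the GrepHit type, the function name and the sample line are hypothetical) parses one such hit as produced together with -n, and shows why the zero byte is unambiguous even when the file name itself contains a colon.

```python
from typing import NamedTuple, Optional


class GrepHit(NamedTuple):
    filename: str
    line_number: int
    text: str


def parse_null_hit(line: str) -> Optional[GrepHit]:
    """Parse 'FILE<NUL>LINE:TEXT'; return None if the line has another shape."""
    filename, sep, rest = line.partition("\0")
    if not sep:
        return None  # no NUL separator, so not --null style output
    number, sep, text = rest.partition(":")
    if not sep or not number.isdecimal():
        return None
    return GrepHit(filename, int(number), text)


# Hypothetical hit in a file whose name contains a colon.
hit = parse_null_hit("notes:2021.txt\x0017:TODO: fix encoding")
print(hit)
```

The same zero byte that makes parsing robust is what ends up in the kill ring when a hit is copied verbatim.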

The reason I wrote a long e-mail is to illustrate my head scratching when I got hit several times, because it is not obvious where the problem is coming from.

I'll post again if I manage to reproduce a more serious variant of this issue, where the file started to show Chinese characters in other editors while Emacs started showing ^M at the end of the lines. My guess is that it was a similar gotcha, because I have been copying from the *grep* buffer a few times over the last few days.

I believe that this NUL gotcha is going to hit many people, who will then think "this is just another Emacs quirk". After all, passing --null to grep is a relatively recent change, introduced in Emacs 26.1. And many log files have embedded NUL characters too, so you may inadvertently copy NUL characters along.


> For the detection of NULs in UTF-8 files, you could also ask for such
> a feature via `M-x report-emacs-bug` but it should be pretty easy to get
> something comparable with something like:
> [...]

I don't think it is desirable for users to install such Lisp hooks to deal with such corner cases. My opinion is that Emacs should be more helpful here by default. But maybe this mailing list post is enough, if users facing such "corruption" or character encoding problems manage to enter the right search terms.


> This "what you see is NOT what you get" is indeed undesirable.  I'm not
> sure it's easy to fix in a reliable way in Emacs (beside not using
> `--null` as Eli points out), but I suggest you `M-x report-emacs-bug`.
> Maybe grep-mode can add a `filter-buffer-substring-function` that
> converts those NUL into `:`.

That seems fair. I'll report that as a bug.
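
To show the transformation such a filter would perform, here is a small Python sketch; it is not Emacs Lisp and not the proposed grep-mode change, and the function name and sample text are hypothetical. It replaces only the first NUL on each line, on the assumption that this one is the file-name separator written by grep --null.

```python
def nul_to_colon(copied: str) -> str:
    """Replace the first NUL on each line with ':' and leave the rest alone."""
    lines = copied.split("\n")
    return "\n".join(line.replace("\0", ":", 1) for line in lines)


# Hypothetical text as it might be copied from a *grep* buffer.
copied = "a.txt\x001:foo\nb.txt\x002:bar"
print(nul_to_colon(copied))
```

Limiting the replacement to the first NUL per line is a design choice in this sketch: a NUL that belongs to the matched text itself stays visible instead of being silently rewritten.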

Regards,
  rdiez


