Re: dired-do-find-regexp failure with latin-1 encoding
From: Dmitry Gutov
Subject: Re: dired-do-find-regexp failure with latin-1 encoding
Date: Sat, 28 Nov 2020 23:04:10 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0
On 28.11.2020 22:29, Eli Zaretskii wrote:
Ah, so this way the user explicitly searches for a regexp encoded as
latin-1?
More accurately, this is how to search in files encoded in Latin-1.
(The regexp also gets encoded in latin-1, but the important part is
the files' encoding.)
Right. So when there are files in different encodings, the result will
not be great, as expected.
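To illustrate why a single forced encoding only suits files that share it, here is a minimal Python sketch. It is not from the thread: the sample text and variable names are made up, and plain byte matching stands in for what Grep does with an encoded pattern.

```python
import re

# Hypothetical sample data, not taken from the thread.
pattern_text = "caf\u00e9"  # "café"
latin1_file = "un caf\u00e9 noir\n".encode("latin-1")
utf8_file = "un caf\u00e9 noir\n".encode("utf-8")

# Encode the search string as Latin-1, as a forced coding system would.
needle = re.escape(pattern_text.encode("latin-1"))

# The Latin-1 bytes of the pattern occur in the Latin-1 file only;
# in the UTF-8 file the same character is stored as two other bytes.
matches_latin1 = re.search(needle, latin1_file) is not None
matches_utf8 = re.search(needle, utf8_file) is not None

print(matches_latin1, matches_utf8)  # prints: True False
```

The same text stored as UTF-8 is silently missed, which is the "not great" outcome for a directory with mixed encodings.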
Adding -a probably cannot do any harm, but its support should be
detected, since I don't think it's portable enough (it isn't in the
latest Posix spec, at least).
Are you sure about that? Are we sure it won't make searching binary
files slower, for example?
It will be slower, but more useful: by default Grep just says "Binary
file foo matches".
Do we want to search the "binary" files at all? Right now we simply
filter such matches out (see the definition of xref-matches-in-files),
and I have seen no complaints.
Also, the manual has this warning:
Warning: The -a option might output binary garbage, which can have
nasty side effects if the output is a terminal and if the terminal
driver interprets some of it as commands.
...which might conceivably mess up our parsing of Grep output sometimes?
This is not relevant, since we read that output, there's no terminal
device driver to interpret it and get messed up.
Our interpreter is our regexp with which we parse. But I suppose as long
as Grep doesn't insert unexpected newlines, the parser will be fine.
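A rough Python sketch of that concern follows. The regexp below is a simplified, hypothetical stand-in for a "file:line:text" parser, not the one xref actually uses, and the sample output is invented.

```python
import re

# Hypothetical, simplified pattern for "file:line:text" output lines.
GREP_LINE = re.compile(rb"^(?P<file>[^:\n]+):(?P<line>\d+):(?P<text>.*)$")

def parse_grep_output(output: bytes):
    """Return (file, line number, raw text) for each line that parses."""
    hits = []
    for raw in output.split(b"\n"):
        m = GREP_LINE.match(raw)
        if m:
            hits.append((m["file"].decode("latin-1"), int(m["line"]), m["text"]))
    return hits

# Control bytes inside the matched text do not confuse the parser...
garbage = b"a.bin:7:\x1b[2J\x07\xff match\n"
garbage_hits = parse_grep_output(garbage)

# ...but a newline inside the matched text cuts the record short.
split_line = b"a.bin:7:left\nright\n"
split_hits = parse_grep_output(split_line)

print(garbage_hits)
print(split_hits)
```

In this sketch the escape sequences pass through as ordinary bytes, while the embedded newline leaves only `left` attached to the match and drops `right`.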
I actually don't think I understand why we need -a in this case, since
Grep looks for null bytes to decide this is a binary file, and encoded
non-ASCII characters don't have null bytes (except if they are in
UTF-16).
Good question.
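The null-byte observation can be checked with a small Python sketch. It models only the null-byte test mentioned above; the function name and sample text are hypothetical, and real Grep may apply further checks that this ignores.

```python
def looks_binary(data: bytes) -> bool:
    """Null-byte heuristic only: treat data containing NUL as binary."""
    return b"\x00" in data

# Hypothetical sample text with non-ASCII characters ("Füße und Straße").
text = "F\u00fc\u00dfe und Stra\u00dfe"

latin1_is_binary = looks_binary(text.encode("latin-1"))
utf8_is_binary = looks_binary(text.encode("utf-8"))
utf16_is_binary = looks_binary(text.encode("utf-16"))

print(latin1_is_binary, utf8_is_binary, utf16_is_binary)  # prints: False False True
```

Under this heuristic alone, Latin-1 and UTF-8 text would not be classified as binary, whereas UTF-16 text would, because each ASCII-range character carries a zero byte.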
P.S. Or we can forgo all that and ask the users who want to search for
non-ASCII strings to install ripgrep.
We should support Grep regardless, since not everyone will have
ripgrep. And in any case, "C-x RET c" will be needed with it as well,
no?
I'd have to test it explicitly to say for sure, but:
ripgrep supports searching files in text encodings other than UTF-8,
such as UTF-16, latin-1, GBK, EUC-JP, Shift_JIS and more. (Some
support for automatically detecting UTF-16 is provided. Other text
encodings must be specifically specified with the -E/--encoding flag.)
https://blog.burntsushi.net/ripgrep/#pitch
So if the file encoding is UTF-8, UTF-16, or latin-1 (AND the current
system locale matches that encoding), the search should work fine across
such files in different encodings, and without 'C-x RET c'.
Which doesn't cover all situations, of course, but it's about as much as
can be expected. And more than Grep can.