bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps

bug-gnu-emacs



From:	Dmitry Gutov
Subject:	bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
Date:	Thu, 17 Dec 2020 02:40:09 +0200
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0

On 16.12.2020 22:32, Juri Linkov wrote:

>>> Another backup plan is to use ripgrep.  Its multiline mode (-U)
>>> also allows searching for words while ignoring any whitespace
>>> between them, even newlines.  This is like isearch-lax-whitespace
>>> using search-whitespace-regexp when it contains a newline,
>>> e.g. "[ \t\r\n]+".
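As a rough illustration of the lax-whitespace idea, here is a small Python sketch. It uses Python's `re` module, not ripgrep's engine or Emacs regexps, and the helper name `lax_whitespace_regexp` is hypothetical: literal words are joined with a whitespace class that includes newlines, so the pattern can match across line breaks. With ripgrep the comparable search would presumably be something like `rg -U 'file[ \t\r\n]+names'`.

```python
import re

# Whitespace class that also covers newlines, mirroring "[ \t\r\n]+" above.
LAX_WHITESPACE = r"[ \t\r\n]+"


def lax_whitespace_regexp(phrase: str) -> str:
    """Hypothetical helper: match PHRASE with any whitespace run between words."""
    return LAX_WHITESPACE.join(re.escape(word) for word in phrase.split())


text = "Rename the file\n   names before saving."
pattern = lax_whitespace_regexp("file names")
match = re.search(pattern, text)
print(pattern)               # file[ \t\r\n]+names
print(repr(match.group(0)))  # 'file\n   names'
```

Escaping each word keeps regexp metacharacters in the phrase literal, so only the gaps between words become flexible.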


>> Right. It has a problem of its own, though: it still outputs a file name
>> per line, even when a match is spread across several lines (unlike
>> pcregrep). So we're left guessing where a given multiline match ends.
>>
>> Also, 'sort' doesn't seem to be able to treat both : and \0 as separators
>> at the same time.
>>
>> Here's a rough patch, for illustration.
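`sort -t` accepts only a single separator character, which is the difficulty with mixing `:` and `\0`. A general-purpose parser can simply split each record twice. A minimal Python sketch, assuming records of the form `PATH\0LINE:TEXT` (roughly what `rg --null --no-heading -n` is expected to print); the function names are hypothetical:

```python
from typing import List, Tuple

Record = Tuple[str, int, str]


def parse_null_record(raw: str) -> Record:
    """Split an assumed PATH\\0LINE:TEXT record into its three fields."""
    path, rest = raw.split("\0", 1)      # NUL ends the file name, so ':' in paths is safe
    line_str, text = rest.split(":", 1)  # first ':' after the NUL ends the line number
    return path, int(line_str), text


def sort_records(raw_lines: List[str]) -> List[Record]:
    """Sort by file name, then numerically by line number."""
    records = [parse_null_record(r) for r in raw_lines]
    return sorted(records, key=lambda r: (r[0], r[1]))


output = [
    "b.txt\x0010:file",
    "a:b.txt\x002:names here",
    "b.txt\x009:some text",
]
for rec in sort_records(output):
    print(rec)
```

Sorting on the parsed integer puts line 9 before line 10, which a plain text sort on the raw records would not.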


> Thanks, now finally it's possible to search text ignoring whitespace
> between words, for example:
>
>    Find regexp: file[   
> ]+names
>
> finds everything correctly, even though the current implementation may
> not be the most elegant.

>> It's kind of working, but I'm not loving it.


> What do you think about using the option `rg --json`?
> Emacs has a fast JSON parsing library now, so using
> JSON output would be more reliable.

Very interesting. It returns better data: each multiline match is wholly
in one entry instead of being spread across lines. Even the matches are
annotated with match string/length/absolute position.

We should really investigate it, but perhaps a bit later, including our
ability to parse it quickly when there are a lot of matches (>1000)
and how said byte offsets interact with different file encodings.
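To make the encoding concern concrete: a byte offset only equals a character offset for pure ASCII text. A minimal Python sketch, assuming the offset counts bytes of the text encoded with a known encoding (UTF-8 by default here; the helper name is hypothetical). In Emacs itself, `filepos-to-bufferpos` may be the more natural tool for this conversion.

```python
def byte_to_char_offset(text: str, byte_offset: int, encoding: str = "utf-8") -> int:
    """Return the character offset in TEXT corresponding to BYTE_OFFSET.

    Assumes BYTE_OFFSET counts bytes of TEXT encoded with ENCODING.
    An offset that falls inside a multibyte character raises UnicodeDecodeError.
    """
    prefix = text.encode(encoding)[:byte_offset]
    return len(prefix.decode(encoding))


line = "naïve file names\n"
start = line.encode("utf-8").index(b"file")
print(start, byte_to_char_offset(line, start))  # 7 6
```

The same text stored as Latin-1 would put "file" at byte 6, so the conversion depends on knowing which encoding the offsets were computed against.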

Also, its output is not one JSON document but a series of them
(including ones with just search statistics, which we'll want to skip),
but some re-search-forward followed by (json-parse-buffer) should do the
trick.

In the meantime, here's a smaller patch using the traditional output
format. I figure since there is a file name on each line anyway, --null
doesn't help much. So it can be simplified a little (see attached).

Unfortunately, xref-replace-in-matches is broken for such multiline
matches. And, of course, it merges together matches on adjacent lines,
whether they are one match or several (that hasn't changed from the
previous patch). So more investigation is needed.
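The merging problem can be reproduced with a small Python sketch of a line-oriented consumer of the traditional `PATH:LINE:TEXT` output that joins records on consecutive lines of the same file. The function name and parsing regexp are hypothetical, and a path containing `:<digits>:` would be misparsed. The point is that lines 3 to 5 below come out as one span whether they were one match or two.

```python
import re
from typing import List, Tuple

# Assumed record shape: PATH:LINE:TEXT (line-oriented output without --null).
RECORD_RE = re.compile(r"^(.*?):(\d+):(.*)$")

Span = Tuple[str, int, int, str]  # (path, first line, last line, joined text)


def group_adjacent(raw_lines: List[str]) -> List[Span]:
    """Merge records on consecutive lines of the same file into one span."""
    spans: List[Span] = []
    for raw in raw_lines:
        m = RECORD_RE.match(raw)
        if not m:
            continue
        path, line, text = m.group(1), int(m.group(2)), m.group(3)
        if spans and spans[-1][0] == path and spans[-1][2] == line - 1:
            # Adjacent line: extend the previous span (cannot tell one match from two).
            _, first, _, acc = spans[-1]
            spans[-1] = (path, first, line, acc + "\n" + text)
        else:
            spans.append((path, line, line, text))
    return spans


out = [
    "a.txt:3:file",
    "a.txt:4:  names",
    "a.txt:5:file names",
    "b.txt:1:x",
]
for span in group_adjacent(out):
    print(span)
```

Line 5 here could be a separate single-line match, yet it is folded into the multiline span above it; only output that marks where each match ends (such as the JSON format) can avoid that.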

ripgrep-multiline.diff
Description: Text Data
