--- Begin Message ---
Subject: |
30.0.50; [PATCH] 'project-find-regexp' passes Git submodules to the search program |
Date: |
Sat, 28 Oct 2023 22:36:07 -0700 |
X-Debbugs-Cc: dmitry@gutov.dev
To see this in action, you can do the following, starting from "emacs
-Q" inside of a Git repo that contains submodules:
M-x trace-function RET project-files RET
C-x p g some-string RET
If you look at the trace, you'll see that the files in your submodules
are returned from 'project-files', but so is the submodule directory.
That's not really correct, since 'project-files' is supposed to return
*files*, not directories. (There are already workarounds for this in
'project-search' and 'project-query-replace-regexp'.)
By default, this is just a theoretical problem, but if you customize
'xref-search-program-alist' and 'xref-search-program' to include some
other program, this can cause real issues. For example, I tried to add
"ag" to this list[1], and unfortunately, it just doesn't work in this
case. The results for submodules get duplicated, and there's no way I
can see with ag to search only the specified *files*, ignoring any
directories. (Looking at the definition for ripgrep, I'm guessing the
"-g '!*/'" is the trick for that program, but nothing similar works for ag.)
Attached are two possible patches for this: a minimal version that just
fixes 'project-find-regexp', and a maximal version that fixes this in
general, and should theoretically speed up 'project-search' and
'project-query-replace-regexp' since they no longer need to call
'file-regular-p' on every file.
Do either of these patches look ok?
[1] This is a long story, simplified for this message since the gory
details aren't especially relevant.
minimal_0001-Exclude-Git-submodules-when-getting-list-of-files-fo.patch
Description: Text document
maximal_0001-Exclude-Git-submodules-from-project-files.patch
Description: Text document
--- End Message ---
--- Begin Message ---
Subject: |
Re: bug#66806: 30.0.50; [PATCH] 'project-find-regexp' passes Git submodules to the search program |
Date: |
Mon, 30 Oct 2023 04:00:51 +0200 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.13.0 |
Version: 30.1
On 30/10/2023 02:58, Jim Porter wrote:
On 10/29/2023 2:41 PM, Dmitry Gutov wrote:
And that's not to mention usage over Tramp (which would be affected by
the +1 process call that you mentioned as well, but that seems
unavoidable).
Yeah, I don't see a way around that, unless we constructed a complex
command to run via Tramp that does it all in one go.
(I looked into using the "--stage" argument for "git ls-files", which
gets most of the way to fixing this, but you could break that logic with
evil file names not in the tree...)
Filtering out gitlinks based on the mode bits? Clever.
Would probably slow down the parsing of the output, though. Not sure by
how much.
Anyway, after recent experience micro-optimizing list operations, I
came up with this version where the impact seems minimal.
WDYT?
Thanks, that helped form the basis for the attached patch. It moves the
'member' call into the lambda for the original 'mapcar', and then we can
just 'delq' all the nil elements. Benchmarking against the Emacs repo,
this still seems to have a minimal perf impact.
Ah, nice. delq is pretty handy to abbreviate the relinking.
We could combine both steps to eliminate an extra list altogether, but
that only yields a further 2-3% improvement. Maybe later.
There might also be a benefit to using "git ls-files
--recurse-submodules" when we can (i.e. when not using "-o"), but maybe
we can leave that optimization for another day.
As long as it doesn't result in too much code duplication.
Anyway, I've pushed your latest patch. Thanks! And closing.
--- End Message ---