bug-gnu-emacs

bug#57245: 29.0.50; M-> in a large XML file (without long lines) is slow


From: Dmitry Gutov
Subject: bug#57245: 29.0.50; M-> in a large XML file (without long lines) is slow
Date: Wed, 17 Aug 2022 15:14:07 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.9.1

On 17.08.2022 14:24, Eli Zaretskii wrote:
> Date: Tue, 16 Aug 2022 22:32:23 +0300
> Cc: 57245@debbugs.gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
>
>> On 16.08.2022 19:54, Eli Zaretskii wrote:
>>> Stefan, can you see why syntax-related stuff in sgml-mode is so heavy
>>> here?
>>
>> nxml-syntax-propertize might well be heavier than average, but the delay
>> scales linearly with the size of the file.
>
> Which is generally not a good scaling factor, especially if the
> coefficient is quite large (as it seems to be in this case).

Someone can work on the coefficient, but any accurate parser has to scan the buffer from the beginning. At least once.

Migration to tree-sitter might give us a better coefficient later, but the principle will remain.
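
To illustrate why the first scan cannot be skipped, here is a minimal sketch in Python. It is not Emacs Lisp and not how nxml-syntax-propertize is implemented; the class name CommentScanner and its method are hypothetical. It answers "is this offset inside an XML comment?", which cannot be decided from local context alone, and it caches the scan so that only the first query for a far position pays the linear cost.

```python
class CommentScanner:
    """Answer "is offset POS inside an XML comment?" with a cached forward scan.

    Hypothetical illustration only: real XML syntax has more states
    (CDATA sections, attribute values, processing instructions).
    """

    def __init__(self, text):
        self.text = text
        self.flags = []          # flags[i] is True if offset i is inside a comment
        self.in_comment = False  # scanner state at offset len(self.flags)

    def inside_comment(self, pos):
        if not 0 <= pos < len(self.text):
            raise IndexError(pos)
        text = self.text
        # Resume from wherever the previous query stopped; never rescan.
        while len(self.flags) <= pos:
            i = len(self.flags)
            if not self.in_comment and text.startswith("<!--", i):
                self.in_comment, width, flag = True, 4, True
            elif self.in_comment and text.startswith("-->", i):
                self.in_comment, width, flag = False, 3, True
            else:
                width, flag = 1, self.in_comment
            self.flags.extend([flag] * width)
        return self.flags[pos]


# "<x/>" looks like a tag locally, but only the state carried from the
# start of the text tells us it sits inside a comment.
doc = "<r><!-- " + "<x/>" * 1000 + " --><y/></r>"
scanner = CommentScanner(doc)
x_pos = doc.index("<x/>", 2000)
y_pos = doc.index("<y/>")
x_in_comment = scanner.inside_comment(x_pos)   # scans offsets 0..x_pos once
y_in_comment = scanner.inside_comment(y_pos)   # scans only the remainder
```

The first query for a position near the end of the text is linear in that position. Later queries at or before the scanned frontier are list lookups, which is the "at least once" part.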

>> Which seems to be exactly the behavior the "font-lock narrowing" was
>> supposed to guard from?
>
> No.  It wasn't supposed to fix modes that foolishly scan the buffer
> from BOB to point.

You might want to choose words better.

> It was supposed to fix modes which scan from the
> beginning of line, and that is (a) only a problem when lines are very
> long, and (b) much harder to solve in the mode itself, because
> font-lock very frequently uses anchored regexps and otherwise likes to
> start from BOL, and syntax processing also likes starting from BOL.

syntax-wholelines-max handles that problem.

Though it might depend on what you mean by "anchored regexps".

> Btw, does nXML and/or sgml-mode use libxml for their analysis?  If
> not, why not? wouldn't that be faster (and possibly more accurate)?

Might be "a simple matter of coding".

But we do need syntax-propertize to run, so that the user commands can rely on proper syntax information in the buffer. It remains to be seen whether xml-parse-region is a good base for nxml-syntax-propertize, and how much of a performance improvement it can bring (with all the string marshaling around).

nxml also probably handles invalid documents better, which might or might not be important.
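
As a rough illustration of the invalid-document concern, the sketch below uses Python's standard-library XML parser as a stand-in for a strict parser. It is not libxml and says nothing about what xml-parse-region actually does; the helper name first_error is hypothetical. A strict parser stops at the first well-formedness error, so nothing after that point gets analyzed, whereas an editor mode still needs syntax information for the rest of the buffer.

```python
import xml.etree.ElementTree as ET


def first_error(text):
    """Return (line, column) of the first well-formedness error, or None."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return err.position
    return None


well_formed = "<a><b>ok</b></a>"
# <b> is never closed, so the parser gives up at the mismatched </a>
# on line 3 and never looks at anything after it.
broken = "<a>\n<b>unclosed\n</a>\n<c>never reached</c>"

ok_result = first_error(well_formed)
broken_result = first_error(broken)
```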




