bug#4175: 23.1; nxml-mode: Internal error in rng-validate-mode triggered
From: Mattias Engdegård
Subject: bug#4175: 23.1; nxml-mode: Internal error in rng-validate-mode triggered
Date: Sun, 10 Jul 2022 13:00:27 +0200
The bug is still very much there: I can reproduce it by reducing
emacs_re_max_failures from 40000 to 4000. It's just a matter of file size. The
failing regexp (used at xmltok.el:735), written in rx notation, is
(rx (group
     (| (group "xmlns")
        (: (in "_" alpha)
           (* (in "._-" alnum))))
     (? (group ":"
               (in "_" alpha)
               (* (in "._-" alnum)))))
    (* (in "\t\n\r "))
    "="
    (? (* (in "\t\n\r "))
       (group
        (| (: "'"
              (* (not (in "\t\n\r&'<")))
              (? (group
                  (in "\t\n\r&")
                  (* (not (in "'<")))))
              "'")
           (: "\""
              (* (not (in "\t\n\r\"&<")))    ;;
              (? (group                      ;;
                  (in "\t\n\r&")             ;;
                  (* (not (in "\"<")))))     ;;
              "\"")))
       (| (group
           (* (in "\t\n\r "))
           ">")
          (: (group
              (* (in "\t\n\r "))
              "/")
             (? (group ">")))
          (group
           (+ (in "\t\n\r "))))))
and the overflow likely occurs somewhere in the ;;-marked section above, while
parsing the big d="..." attribute value. That value isn't huge (55 KiB), and in
any case our parser clearly shouldn't need stack space proportional to the size
of an XML attribute value. (With the default stack limit, matching fails for
attribute values of around 300 KiB, which is not big for an SVG file.) Isolated
test case:
(let ((s (concat "'" (make-string 300000 ?a) "'")))
  (string-match
   (rx "'"
       (* (not (in "\t\n\r&'<")))
       (? (group
           (in "\t\n\r&")
           (* (not (in "'<")))))
       "'")
   s))
I suggest you rewrite the attribute parser so that it doesn't eat regexp stack.
For instance,
(rx "'" (* (not (in "'<"))) "'")
doesn't consume stack (thanks to the on_failure_keep_string_jump optimisation).
The parser needs to be a little more complex than that: it has to validate
entity references (the &xyz; things) and detect, and recover from, common errors
such as missing end quotes, so a single regexp isn't sufficient.
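Something along the following lines might work. This is only a rough sketch, assuming (as in xmltok.el) that the parser scans buffer text at point; the function name my-xml-scan-attribute-value and its return value are made up for illustration and are not part of xmltok.el. Runs of ordinary characters are skipped with skip-chars-forward, which is a plain C loop and uses no regexp stack. Only a single entity or character reference is matched with a regexp (reusing the name syntax from the regexp above), so stack use is bounded by the length of one reference rather than the whole value. A < character or the end of the buffer is reported as a missing end quote, with point left there so that the caller can recover.

```elisp
;; Hypothetical sketch, not part of xmltok.el: scan a quoted attribute
;; value at point without regexp stack use proportional to its length.
(defun my-xml-scan-attribute-value ()
  "Scan the quoted attribute value starting at point (on the quote).
Return (STATUS END REFS BAD-AMPS): STATUS is `ok' or
`missing-end-quote'; END is where scanning stopped (after the
closing quote, or at the `<' or buffer end that ended the value);
REFS lists the names of entity and character references in order;
BAD-AMPS lists positions of `&' not starting a valid reference."
  (let ((delim (char-after))
        (case-fold-search nil)          ; "#x" must be lower case
        (status nil)
        (refs nil)
        (bad-amps nil))
    (unless (memq delim '(?\' ?\"))
      (error "Point is not on an attribute value"))
    ;; skip-chars-forward set: everything except the delimiter, `<',
    ;; `&' and the whitespace that needs normalisation.
    (let ((ordinary (concat "^<&\t\n\r" (string delim))))
      (forward-char 1)
      (while (not status)
        ;; A C loop over ordinary characters: no regexp stack is used.
        (skip-chars-forward ordinary)
        (let ((c (char-after)))
          (cond
           ((null c) (setq status 'missing-end-quote))  ; end of buffer
           ((eq c delim) (forward-char 1) (setq status 'ok))
           ((eq c ?<) (setq status 'missing-end-quote)) ; recover here
           ((memq c '(?\t ?\n ?\r)) (forward-char 1))
           ;; C is `&': match one reference only, so stack use is
           ;; bounded by the reference, not by the whole value.
           ((looking-at (rx "&"
                            (group (or (: "#" (+ digit))
                                       (: "#x" (+ hex-digit))
                                       (: (in "_" alpha)
                                          (* (in "._-" alnum)))))
                            ";"))
            (push (match-string-no-properties 1) refs)
            (goto-char (match-end 0)))
           ;; A bare `&': remember where it was and keep going.
           (t (push (point) bad-amps) (forward-char 1))))))
    (list status (point) (nreverse refs) (nreverse bad-amps))))
```

With the 300000-character value from the isolated test case in a buffer, the function should return ok without any regexp stack growth. Whitespace characters are merely stepped over here, where a real parser would record them for attribute-value normalisation.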