bug#4175: 23.1; nxml-mode: Internal error in rng-validate-mode triggered


From: Mattias Engdegård
Subject: bug#4175: 23.1; nxml-mode: Internal error in rng-validate-mode triggered
Date: Sun, 10 Jul 2022 13:00:27 +0200

The bug is still very much there: I can reproduce it by reducing 
emacs_re_max_failures from 40000 to 4000. It's just a matter of file size. The 
failing regexp (used at xmltok.el:735) is, after rx conversion,

(rx (group
     (| (group "xmlns")
        (: (in "_" alpha)
           (* (in "._-" alnum))))
     (? (group ":"
               (in "_" alpha)
               (* (in "._-" alnum)))))
    (* (in "\t\n\r "))
    "="
    (? (* (in "\t\n\r "))
       (group
        (| (: "'"
              (* (not (in "\t\n\r&'<")))
              (? (group
                  (in "\t\n\r&")
                  (* (not (in "'<")))))
              "'")
           (: "\""
              (* (not (in "\t\n\r\"&<")))    ;;
              (? (group                      ;;
                  (in "\t\n\r&")             ;;
                  (* (not (in "\"<")))))     ;;
              "\"")))
       (| (group
           (* (in "\t\n\r "))
           ">")
          (: (group
              (* (in "\t\n\r "))
              "/")
             (? (group ">")))
          (group
           (+ (in "\t\n\r "))))))

and the overflow likely occurs somewhere in the ;;-marked section above, while 
parsing the big d="..." attribute value. That value isn't huge (55 KiB), and in 
any case our parser clearly shouldn't need stack space in proportion to the 
size of an XML attribute value. (The default stack limit fails with attributes 
around 300 KiB in size, which is not big for an SVG file.) Isolated test case:

(let ((s (concat "'" (make-string 300000 ?a) "'")))
  (string-match
   (rx "'"
       (* (not (in "\t\n\r&'<")))
       (? (group
           (in "\t\n\r&")
           (* (not (in "'<")))))
       "'")
   s))
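
To try the test case at several sizes without aborting the session, it can be 
wrapped as below. This is only a convenience sketch: the function name 
demo-attr-regexp-outcome is made up for illustration, and what it returns for 
a given size depends on the Emacs version and the stack limit in effect.

```elisp
;; Hypothetical helper, not part of Emacs: run the isolated test case on
;; an attribute value of SIZE characters and report what happened.
(defun demo-attr-regexp-outcome (size)
  "Match the single-quoted attribute regexp against a SIZE-character value.
Return the match position on success, or the error message as a string
if the matcher signals an error (such as a regexp stack overflow)."
  (let ((s (concat "'" (make-string size ?a) "'")))
    (condition-case err
        (string-match
         (rx "'"
             (* (not (in "\t\n\r&'<")))
             (? (group
                 (in "\t\n\r&")
                 (* (not (in "'<")))))
             "'")
         s)
      (error (error-message-string err)))))

;; Example: (demo-attr-regexp-outcome 300000)
```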

I suggest you rewrite the attribute parser so that it doesn't eat regexp stack. 
For instance,

(rx "'" (* (not (in "'<"))) "'")

doesn't consume stack (thanks to the on_failure_keep_string_jump optimisation). 
The parser needs to be a little more complex than that and validate entities 
(the &xyz; things) and detect (and recover from) common errors such as missing 
end quotes, so a single regexp isn't sufficient.
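
As a rough illustration of that direction, and not a proposed patch, the 
value could be scanned in a buffer with skip-chars-forward, which does not use 
the regexp failure stack at all. The function name demo-scan-attribute-value 
and its return convention are invented for this sketch; it only records where 
each & occurs and leaves entity validation and error recovery to the caller.

```elisp
;; Hypothetical sketch; this is not the code in xmltok.el.
(defun demo-scan-attribute-value ()
  "Scan a quoted XML attribute value starting at point.
Point must be on the opening quote.  On success, return a list
\(END REFS) where END is the position just after the closing quote
and REFS is the list of positions of `&' characters in the value.
Return nil if point is not on a quote, if the value contains `<',
or if the closing quote is missing."
  (let ((delim (char-after))
        (refs nil))
    (when (memq delim '(?\' ?\"))
      (forward-char)                    ; step over the opening quote
      (let ((stop-set (if (eq delim ?\') "^'<&" "^\"<&"))
            (result 'continue))
        (while (eq result 'continue)
          ;; Skip everything that cannot end or interrupt the value.
          (skip-chars-forward stop-set)
          (cond
           ((eobp)                      ; missing end quote
            (setq result nil))
           ((eq (char-after) ?&)        ; start of a reference
            (push (point) refs)
            (forward-char))
           ((eq (char-after) delim)     ; closing quote
            (forward-char)
            (setq result (list (point) (nreverse refs))))
           (t                           ; `<' is not allowed in a value
            (setq result nil))))
        result))))

;; Usage on a 300000-character value containing one entity reference:
(with-temp-buffer
  (insert "'" (make-string 300000 ?a) "&amp;'")
  (goto-char (point-min))
  (demo-scan-attribute-value))
```

A caller would then check each recorded & position separately (for example 
with a short anchored regexp for the entity name) and decide how to recover 
when nil is returned.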