emacs-pretest-bug
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Refactoring xml.el namespace handling


From: Mark A. Hershberger
Subject: Refactoring xml.el namespace handling
Date: Mon, 01 Mar 2004 09:59:10 -0600
User-agent: Gnus/5.110002 (No Gnus v0.2) Emacs/22.0.0 (gnu/linux)

(Sent this earlier to just emacs-devel...)

In late September, I proposed changing the way xml.el handles
namespaces.  That message is appended below.

Because of a variety of things that happened between then and now, I
never submitted a patch to make these changes.

This patch differs slightly from the proposal in that namespaces are
interned after being concatenated with `:'.  Thus,

  <top xmlns="urn"/>

Is parsed into:

  (((:urn . "top")
   (((:http://www\.w3\.org/2000/xmlns/ . "") . "urn"))))

Or, without namespace aware parsing:

  ((top ((xmlns . "urn"))))

2004-02-29  Mark A. Hershberger  <address@hidden>

        * xml.el (xml-maybe-do-ns): new function to handle namespace
        expansion of qnames.
        (xml-parse-region): Update to work with namespace parsing.
        (xml-parse-tag): Change namespace parsing so that it produces
        (:uri . "lname") instead of {uri}lname.  Avoid unneccessy
        interning.
        (xml-parse-attlist): Update to work with namespace parsing.


*** xml.el      14 Jul 2003 20:41:12 -0000      1.23
--- xml.el      1 Mar 2004 15:53:50 -0000
***************
*** 52,66 ****
  
  ;;; LIST FORMAT
  
! ;; The functions `xml-parse-file' and `xml-parse-tag' return a list with
! ;; the following format:
  ;;
  ;;    xml-list   ::= (node node ...)
! ;;    node       ::= (tag_name attribute-list . child_node_list)
  ;;    child_node_list ::= child_node child_node ...
  ;;    child_node ::= node | string
! ;;    tag_name   ::= string
! ;;    attribute_list ::= (("attribute" . "value") ("attribute" . "value") ...)
  ;;                       | nil
  ;;    string     ::= "..."
  ;;
--- 52,66 ----
  
  ;;; LIST FORMAT
  
! ;; The functions `xml-parse-file', `xml-parse-region' and
! ;; `xml-parse-tag' return a list with the following format:
  ;;
  ;;    xml-list   ::= (node node ...)
! ;;    node       ::= (qname attribute-list . child_node_list)
  ;;    child_node_list ::= child_node child_node ...
  ;;    child_node ::= node | string
! ;;    qname      ::= (:namespace-uri . "name") | "name"
! ;;    attribute_list ::= ((qname . "value") (qname . "value") ...)
  ;;                       | nil
  ;;    string     ::= "..."
  ;;
***************
*** 68,73 ****
--- 68,78 ----
  ;; Whitespace is preserved.  Fixme: There should be a tree-walker that
  ;; can remove it.
  
+ ;; TODO:
+ ;;  * xml:base, xml:space support
+ ;;  * more complete DOCTYPE parsing
+ ;;  * pi support
+ 
  ;;; Code:
  
  ;; Note that {buffer-substring,match-string}-no-properties were
***************
*** 214,220 ****
                    (setq result (xml-parse-tag parse-dtd parse-ns))
                    (cond
                     ((null result))
!                    ((listp (car result))
                      (setq dtd (car result))
                      (if (cdr result)  ; possible leading comment
                          (add-to-list 'xml (cdr result))))
--- 226,233 ----
                    (setq result (xml-parse-tag parse-dtd parse-ns))
                    (cond
                     ((null result))
!                    ((and (listp (car result))
!                            parse-dtd)
                      (setq dtd (car result))
                      (if (cdr result)  ; possible leading comment
                          (add-to-list 'xml (cdr result))))
***************
*** 225,230 ****
--- 238,274 ----
              (cons dtd (nreverse xml))
            (nreverse xml)))))))
  
+ (defun xml-maybe-do-ns (name default xml-ns)
+   "Perform any namespace expansion.  NAME is the name to perform the 
expansion on.
+ DEFAULT is the default namespace.  XML-NS is a cons of namespace
+ names to uris.  When namespace-aware parsing is off, then XML-NS
+ is nil.
+ 
+ During namespace-aware parsing, any name without a namespace is
+ put into the namespace identified by DEFAULT.  nil is used to
+ specify that the name shouldn't be given a namespace."
+   (if (consp xml-ns)
+       (let* ((splitup (split-string name ":"))
+              (lname (or (nth 1 splitup)
+                         (nth 0 splitup)))
+              (prefix (if (nth 1 splitup)
+                          (nth 0 splitup)
+                        default))
+              (ns  (progn 
+                      (if (and 
+                           (string-equal lname "xmlns")
+                           (not prefix))
+                          (cdr (assoc "xmlns" xml-ns))
+                        (cdr (assoc prefix xml-ns))))))
+         (if ns
+              (cons ns
+                    (if (and
+                         (string-equal lname "xmlns")
+                         (not prefix))
+                        ""
+                      lname))
+            lname))
+     (intern name)))
  
  (defun xml-parse-tag (&optional parse-dtd parse-ns)
    "Parse the tag at point.
***************
*** 237,248 ****
   - a pair : the first element is the DTD, the second is the node."
    (let ((xml-ns (if (consp parse-ns)
                    parse-ns
!                 (if parse-ns
                      (list
!                      ;; Default no namespace
!                      (cons "" "")
                       ;; We need to seed the xmlns namespace
!                      (cons "xmlns" "http://www.w3.org/2000/xmlns/";))))))
      (cond
       ;; Processing instructions (like the <?xml version="1.0"?> tag at the
       ;; beginning of a document).
--- 281,292 ----
   - a pair : the first element is the DTD, the second is the node."
    (let ((xml-ns (if (consp parse-ns)
                    parse-ns
!                 (when parse-ns
                      (list
!                      ;; "xml" namespace
!                      (cons "xml"   :http://www.w3.org/XML/1998/namespace)
                       ;; We need to seed the xmlns namespace
!                      (cons "xmlns" :http://www.w3.org/2000/xmlns/))))))
      (cond
       ;; Processing instructions (like the <?xml version="1.0"?> tag at the
       ;; beginning of a document).
***************
*** 278,338 ****
        (goto-char (match-end 1))
        (let* ((node-name (match-string 1))
             ;; Parse the attribute list.
!            (children (list (xml-parse-attlist) (intern node-name)))
             pos)
  
        ;; add the xmlns:* attrs to our cache
        (when (consp xml-ns)
          (mapcar
           (lambda (attr)
!            (let* ((splitup (split-string (symbol-name (car attr)) ":"))
!                   (prefix (nth 0 splitup))
!                   (lname (nth 1 splitup)))
!              (when (string= "xmlns" prefix)
!                (setq xml-ns (append (list (cons (if lname
!                                                     lname
!                                                   "")
!                                                 (cdr attr)))
!                                     xml-ns)))))
!          (car children))
! 
!         ;; expand element names
!         (let* ((splitup (split-string (symbol-name (cadr children)) ":"))
!                (lname (or (nth 1 splitup)
!                           (nth 0 splitup)))
!                (prefix (if (nth 1 splitup)
!                            (nth 0 splitup)
!                          "")))
!           (setcdr children (list
!                             (intern (concat "{"
!                                            (cdr (assoc-string prefix xml-ns))
!                                            "}" lname)))))
! 
!         ;; expand attribute names
!         (mapcar
!          (lambda (attr)
!            (let* ((splitup (split-string (symbol-name (car attr)) ":"))
!                   (lname (or (nth 1 splitup)
!                              (nth 0 splitup)))
!                   (prefix (if (nth 1 splitup)
!                               (nth 0 splitup)
!                             (caar xml-ns))))
! 
!              (setcar attr (intern (concat "{"
!                                           (cdr (assoc-string prefix xml-ns))
!                                           "}" lname)))))
           (car children)))
  
!       ;; is this an empty element ?
!       (if (looking-at "/>")
!       (progn
!         (forward-char 2)
!         (nreverse children))
  
!       ;; is this a valid start tag ?
!       (if (eq (char-after) ?>)
!           (progn
!             (forward-char 1)
              ;;  Now check that we have the right end-tag. Note that this
              ;;  one might contain spaces after the tag name
              (let ((end (concat "</" node-name "\\s-*>")))
--- 322,354 ----
        (goto-char (match-end 1))
        (let* ((node-name (match-string 1))
             ;; Parse the attribute list.
!            (children (list (xml-parse-attlist xml-ns) node-name))
             pos)
  
        ;; add the xmlns:* attrs to our cache
        (when (consp xml-ns)
          (mapcar
           (lambda (attr)
!              (when (and (listp (car attr))
!                         (eq :http://www.w3.org/2000/xmlns/
!                             (caar attr)))
!                (setq xml-ns (append (list (cons (cdar attr)
!                                                 (intern (concat ":" (cdr 
attr)))))
!                                     xml-ns))))
           (car children)))
  
!         ;; expand element names
!         (setcdr children (list (xml-maybe-do-ns (cadr children) "" xml-ns)))
  
!       ;; is this an empty element ?
!         (if (looking-at "/>")
!             (progn
!               (forward-char 2)
!               (nreverse children))
!         ;; is this a valid start tag ?
!         (if (eq (char-after) ?>)
!             (progn
!               (forward-char 1)
              ;;  Now check that we have the right end-tag. Note that this
              ;;  one might contain spaces after the tag name
              (let ((end (concat "</" node-name "\\s-*>")))
***************
*** 349,355 ****
                    (setq pos (point))
                    (search-forward "<")
                    (forward-char -1)
!                   (let ((string (buffer-substring pos (point)))
                          (pos 0))
  
                      ;; Clean up the string.  As per XML
--- 365,372 ----
                    (setq pos (point))
                    (search-forward "<")
                    (forward-char -1)
!                   (let ((string (buffer-substring
!                                    pos (point)))
                          (pos 0))
  
                      ;; Clean up the string.  As per XML
***************
*** 369,390 ****
                                      (cdr children))
                              (cons string children))))))))
  
!             (goto-char (match-end 0))
              (nreverse children))
!         ;;  This was an invalid start tag
!         (error "XML: Invalid attribute list")))))
!      (t       ;; This is not a tag.
        (error "XML: Invalid character")))))
  
! (defun xml-parse-attlist ()
!   "Return the attribute-list after point.Leave point at the first non-blank 
character after the tag."
    (let ((attlist ())
        start-pos name)
      (skip-syntax-forward " ")
      (while (looking-at (eval-when-compile
                         (concat "\\(" xml-name-regexp "\\)\\s-*=\\s-*")))
!       (setq name (intern (match-string 1)))
!       (goto-char (match-end 0))
  
        ;; See also: http://www.w3.org/TR/2000/REC-xml-20001006#AVNormalize
  
--- 386,410 ----
                                      (cdr children))
                              (cons string children))))))))
  
!               (goto-char (match-end 0))
              (nreverse children))
!           ;;  This was an invalid Start tag
!           (error "XML: Invalid attribute list")))))
!      (t       ;; This is not a tag
        (error "XML: Invalid character")))))
  
! 
! (defun xml-parse-attlist (&optional xml-ns)
!   "Return the attribute-list after point.  Leave point at the first non-blank 
character after the tag."
    (let ((attlist ())
        start-pos name)
      (skip-syntax-forward " ")
      (while (looking-at (eval-when-compile
                         (concat "\\(" xml-name-regexp "\\)\\s-*=\\s-*")))
!       (let ((name (match-string 1))
!             (end  (match-end 0)))
!         (setq qname (xml-maybe-do-ns name nil xml-ns))
!         (goto-char end))
  
        ;; See also: http://www.w3.org/TR/2000/REC-xml-20001006#AVNormalize
  
***************
*** 395,403 ****
        (if (looking-at "'\\([^']*\\)'")
            (setq start-pos (match-beginning 0))
          (error "XML: Attribute values must be given between quotes")))
- 
        ;; Each attribute must be unique within a given element
!       (if (assoc name attlist)
          (error "XML: each attribute must be unique within an element"))
  
        ;; Multiple whitespace characters should be replaced with a single one
--- 415,422 ----
        (if (looking-at "'\\([^']*\\)'")
            (setq start-pos (match-beginning 0))
          (error "XML: Attribute values must be given between quotes")))
        ;; Each attribute must be unique within a given element
!       (if (assoc qname attlist)
          (error "XML: each attribute must be unique within an element"))
  
        ;; Multiple whitespace characters should be replaced with a single one
***************
*** 405,415 ****
        (let ((string (match-string 1))
            (pos 0))
        (replace-regexp-in-string "\\s-\\{2,\\}" " " string)
!       (push (cons name (xml-substitute-special string)) attlist))
  
        (goto-char start-pos)
        (forward-sexp)                  ; we have string syntax
- 
        (skip-syntax-forward " "))
      (nreverse attlist)))
  
--- 424,433 ----
        (let ((string (match-string 1))
            (pos 0))
        (replace-regexp-in-string "\\s-\\{2,\\}" " " string)
!       (push (cons qname (xml-substitute-special string)) attlist))
  
        (goto-char start-pos)
        (forward-sexp)                  ; we have string syntax
        (skip-syntax-forward " "))
      (nreverse attlist)))
  
***************
*** 488,494 ****
  
           ;;  Translation of rule [45] of XML specifications
           ((looking-at
!            "<!ELEMENT\\s-+\\([[:alnum:].%;]+\\)\\s-+\\([^>]+\\)>")
  
            (setq element (intern (match-string 1))
                  type    (match-string-no-properties 2))
--- 506,512 ----
  
           ;;  Translation of rule [45] of XML specifications
           ((looking-at
!            "<!ELEMENT\\s-+\\([[:alnum:].%;-]+\\)\\s-+\\([^>]+\\)>")
  
            (setq element (intern (match-string 1))
                  type    (match-string-no-properties 2))
***************
*** 517,528 ****
            (goto-char end-pos))
           ((looking-at "<!--")
            (search-forward "-->"))
! 
           (t
!           (error "XML: Invalid DTD item")))
  
          ;;  Skip the end of the DTD
!         (search-forward ">"))))
      (nreverse dtd)))
  
  
--- 535,554 ----
            (goto-char end-pos))
           ((looking-at "<!--")
            (search-forward "-->"))
!            ((looking-at 
"<!ENTITY\\s-+\\([[:alnum:].%;-]+\\)\\s-+\\([^>]+\\)>")
!             ; Put the ENTITY in
!             (goto-char (match-end 0)))
!            ((looking-at "<!ATTLIST\\s+\\([[:alnum:].%;-]+\\)\\s+\\([^>]+\\)>")
!             ; Put in the ATTLIST
!             (goto-char (match-end 0)))
!            ((looking-at 
"<!NOTATION\\s+\\([[:alnum:].%;-]+\\)\\s+\\([^>]+\\)>")
!             ; Put in the NOTATION
!             (goto-char (match-end 0)))
           (t
!           (error "XML: Invalid DTD item"))))
  
          ;;  Skip the end of the DTD
!         (search-forward ">")))
      (nreverse dtd)))
  
  



-- 
Peace is only better than war if peace isn't hell, too.
    -- Walker Percy, "The Second Coming"


From:      Mark A. Hershberger
Subject:   More XML Parsing: Is anyone using CVS xml.el namespace processing?
Date:      Sun, 28 Sep 2003 12:31:48 -0500

Recently I had a short conversation with James Clark about the
structures xml.el produces.

I asked him what he thought about the current (CVS) namespace-aware
processing.  Based on his feedback, I plan to submit changes that
will return an incompatible structure to the one currently in CVS.

Currently, when xml.el encounters a bit of XML like:

    <ns:xml xmlns:ns="uri:namespace" ns:attr="value"/>

it produces:

    (({uri:namespace}xml 
     (({http://www\.w3\.org/2000/xmlns/}ns . "uri:namespace") 
      ({uri:namespace}attr . "value"))))

At the time that I wrote this, I saw some W3 docs where this style was
used and copied it.  Some people here asked me why I did this instead
of something like (uri:namespace . "xml"), but I forged ahead.

Now, after my conversation with Mr. Clark, I've been persuaded that I
was wrong.  At his suggestion, I'd like to change the above xml
representation produce the following:

    (((uri:namespace . "xml")
     ((((http://www\.w3\.org/2000/xmlns/ . "ns") . "uri:namespace") 
      ((uri:namespace . "attr") . "value")))))

As Mr. Clark said:

    ... there are typically not very many different namespace URIs, so
    keeping them in Emacs symbol table is not a problem; in the
    returned representation of the XML, the namespaces would be
    shared, but strings are mutable in Emacs, which is kind of ugly.

Where there is no namespace given:

    <xml attr="value">

It would produce the following:

    (("xml"
      (("attr" . "value"))))

Unless there are major objections, I'd like to repent of my previous
code and submit changes to produce the above.

Mark.





reply via email to

[Prev in Thread] Current Thread [Next in Thread]