bug#20339: sxml simple: sxml->xml mishandles namespaces?


From: tomas
Subject: bug#20339: sxml simple: sxml->xml mishandles namespaces?
Date: Wed, 13 Jul 2016 15:24:03 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

On Thu, Jun 23, 2016 at 09:32:16PM +0200, Andy Wingo wrote:
> See thread here as well:
> http://thread.gmane.org/gmane.lisp.guile.devel/17709
> 
> I like Ricardo's patch but have some comments here:
> http://article.gmane.org/gmane.lisp.guile.devel/18384

(sorry for cc'ing both of you, but I don't know whether you are
subscribed to the bug. Two copies seemed more polite than none).

Sorry folks for not coming back earlier. Real Life and things.

Since I'm going to be off the 'net for one month starting next Friday,
I thought I'd write a short note.

I'll be back on the 15th of August and am really willing to do whatever
it takes to bring this forward. OTOH, if any of you decides to pick
it up, I'm sure the results will be better :-)

Referring to Oleg Kiselyov's paper [1], there are actually three
things involved:

 - the namespace. This is an XML thing and will typically be
   a URI (I don't quite remember whether it *must* be a
   URI, but that's irrelevant). It may contain nasty characters
   (to XML: it isn't an XML "Name", and potentially to Scheme:
   there may be parentheses and things in there, so some
   Schemes won't make a symbol of that; Guile doesn't mind).

 - the namespace prefix. Again, an XML thing, basically giving
   a non-nasty abbreviation for the namespace, to stick it to
   the Name, making a "QName". The association prefix -> namespace
   is scoped to a node and its descendants, and can be shadowed
   at some node below

 - the namespace-id, an SXML thing. In [1], this is typically
   the namespace, but Oleg Kiselyov made provisions in [1] for a
   similar "abbreviation" (the user-ns-shortcut in [1], page 3),
   whose mapping can be attached to any node via the
   pseudo-attribute *NAMESPACES* [2], which can also carry the
   original (XML) namespace prefix.

   As far as I understand the paper, most of the time this
   namespace-id will be identical to the URI, but it is the
   namespace-id that gets prefixed to the tag name symbols
   in the SXML representation.
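
To make the three notions a bit more concrete, here is a minimal
sketch in Guile Scheme. The helper split-qualified-name is
hypothetical (it is not part of (sxml simple)), and the shortcut xh
is an arbitrary choice for illustration. It assumes that the local
name contains no colon, so the namespace-id is everything before the
last colon.

```scheme
;; The same element, written with the namespace URI itself as the
;; namespace-id, and with a short namespace-id ("xh", chosen arbitrarily).
(define tag-by-uri 'http://www.w3.org/1999/xhtml:body)
(define tag-by-shortcut 'xh:body)

;; Hypothetical helper: split an SXML tag symbol into
;; (namespace-id . local-name) at the last colon.
;; Unqualified names yield (#f . name).
(define (split-qualified-name sym)
  (let* ((str (symbol->string sym))
         (idx (string-rindex str #\:)))
    (if idx
        (cons (string->symbol (substring str 0 idx))
              (string->symbol (substring str (+ idx 1))))
        (cons #f sym))))
```

With this, (split-qualified-name tag-by-shortcut) gives (xh . body),
while for tag-by-uri the car is the whole URI as a symbol. Only the
namespace-id is visible in the tag; the URI behind a shortcut has to
come from a mapping kept elsewhere.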

What Ricardo's patch does is to conflate namespace prefix and
namespace-id and provide a mapping (namespace-id aka prefix) ->
namespace. This is actually quite elegant, since we don't need
the distinction between (XML) prefix and (SXML) namespace-id.

I think that we can, at least as far as (sxml simple) is concerned,
ignore this distinction.
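
For reference, the parser side of (sxml simple) already takes a
mapping of this shape. The sketch below uses only the existing
xml->sxml with its #:namespaces keyword; it does not show Ricardo's
patch, and the example URI is made up.

```scheme
(use-modules (sxml simple))

(define doc "<foo xmlns=\"http://example.org/ns1\">text</foo>")

;; Without a mapping, the namespace URI itself ends up as the
;; namespace-id in the tag symbol.
(define parsed-by-uri (xml->sxml doc))

;; With an alist of (namespace-id . uri), the parser uses the
;; shorter namespace-id instead.
(define parsed-by-shortcut
  (xml->sxml doc #:namespaces '((ns1 . "http://example.org/ns1"))))
```

Here parsed-by-uri is (*TOP* (http://example.org/ns1:foo "text")) and
parsed-by-shortcut is (*TOP* (ns1:foo "text")). Note that the second
form no longer says anywhere which URI ns1 stands for, which is why
the serializer needs the same mapping handed back to it.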

What is missing? From my point of view:

 - At xml->sxml time, the user doesn't know which namespaces
   are in the xml. So it would be nice if the XML parser
   could provide that.

 - It would be super-nice if the XML parser could put the
   declarations into the same nodes where it found them, as
   described in [1] (i.e. in the (*NAMESPACES* ...)
   pseudo-attribute). This way we wouldn't have a global mapping,
   but one that resembles the original XML, even with the same
   prefixes. Fewer surprises overall. The round trip
   xml -> sxml -> xml would be (nearly) the identity.

   With Ricardo's patch it would lump all the namespace
   declarations up in the top node, which formally is
   correct, but might scare XML people a bit :-)

 - At sxml->xml time there should be a way to somehow
   generate prefixes for "new" namespaces. I don't know
   at the moment how this would work, that depends on
   how the user is supposed to insert new nodes in the
   SXML. Does she specify the namespace? Both prefix
   (aka namespace-id, under my current assumption) *and*
   namespace? (note that the namespace-id/prefix alone
   wouldn't be sufficient).
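
One possible shape for the prefix generation in the last point, as a
rough sketch only: the helper ensure-prefix and the "nsN" naming
scheme are my assumptions, not something the patch or (sxml simple)
provides. It takes an alist of (prefix . uri) pairs and a URI, reuses
an existing prefix if the URI is already declared, and otherwise
invents one that is not yet taken.

```scheme
(use-modules (srfi srfi-1))

;; Hypothetical helper: return two values, the prefix to use for URI
;; and the (possibly extended) alist of (prefix-symbol . uri-string).
(define (ensure-prefix namespaces uri)
  (let ((known (find (lambda (entry) (string=? (cdr entry) uri))
                     namespaces)))
    (if known
        ;; URI already declared: reuse its prefix, alist unchanged.
        (values (car known) namespaces)
        ;; Otherwise try ns1, ns2, ... until an unused prefix turns up.
        (let loop ((n 1))
          (let ((candidate (string->symbol
                            (string-append "ns" (number->string n)))))
            (if (assq candidate namespaces)
                (loop (+ n 1))
                (values candidate
                        (cons (cons candidate uri) namespaces))))))))
```

For example, calling it with '((ns1 . "http://example.org/a")) and
"http://example.org/b" yields the prefix ns2 and an alist with the
new pair consed onto the front. This still leaves open the question
above of what the user has to supply when inserting new nodes.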

Sorry for this wall of text. I hope it makes some sense.

Regards

[1] http://okmij.org/ftp/papers/SXML-paper.pdf
[2] Actually, I'm cheating here: the thing is part of an
   "annotations" part, which according to the grammar comes
   *last*, after all the attributes. But it looks a bit
   like an attribute, with a strange name and a more
   complex value.

-- tomás