
[Gzz-commits] fenfire/docs/pegboard/canon3_file_format--benja...


From: Benja Fallenstein
Subject: [Gzz-commits] fenfire/docs/pegboard/canon3_file_format--benja...
Date: Wed, 02 Apr 2003 09:03:31 -0500

CVSROOT:        /cvsroot/fenfire
Module name:    fenfire
Changes by:     Benja Fallenstein <address@hidden>      03/04/02 09:03:31

Modified files:
        docs/pegboard/canon3_file_format--benja: peg.rst 

Log message:
        Resolve issues

CVSWeb URLs:
http://savannah.gnu.org/cgi-bin/viewcvs/fenfire/fenfire/docs/pegboard/canon3_file_format--benja/peg.rst.diff?tr1=1.1&tr2=1.2&r1=text&r2=text

Patches:
Index: fenfire/docs/pegboard/canon3_file_format--benja/peg.rst
diff -u fenfire/docs/pegboard/canon3_file_format--benja/peg.rst:1.1 fenfire/docs/pegboard/canon3_file_format--benja/peg.rst:1.2
--- fenfire/docs/pegboard/canon3_file_format--benja/peg.rst:1.1 Tue Apr  1 14:46:07 2003
+++ fenfire/docs/pegboard/canon3_file_format--benja/peg.rst     Wed Apr  2 09:03:30 2003
@@ -4,8 +4,8 @@
 
 :Author:       Benja Fallenstein
 :Date:         2003-04-01
-:Revision:     $Revision: 1.1 $
-:Last-Modified: $Date: 2003/04/01 19:46:07 $
+:Revision:     $Revision: 1.2 $
+:Last-Modified: $Date: 2003/04/02 14:03:30 $
 :Type:         Architecture
 :Scope:                Major
 :Status:       Current
@@ -21,6 +21,89 @@
 This PEG specifies such a format.
 
 
+Issues
+======
+
+- Does this also cover bags and sequences? Reification?
+
+   RESOLVED: Of course. All RDF structures (anything
+   that can be serialized as triples) can be
+   represented as Canon3.
+
+- Do we really need a new format?
+
+   RESOLVED: None of the existing formats are canonical.
+
+- How compatible is this with N3 and NTriples?
+  What are the differences?
+
+   RESOLVED: NTriples is encoded in US-ASCII and
+   doesn't allow for multi-line literals. N3 cannot
+   refer to anonymous nodes. An N3 processor
+   will be able to read any Canon3 file that does
+   not contain anonymous nodes (except if the
+   Unicode LINE SEPARATOR character is used,
+   which is not allowed by N3).
+
+   (Anonymous nodes in Canon3 are represented
+   as in NTriples.)
+   
+- Should the encoding be allowed to be different?
+  
+   RESOLVED: No, since that would lose both
+   canonicality and compatibility with N3.
+
+- Is UTF-8 always sufficient?
+
+   RESOLVED: UTF-8 can represent all of Unicode and
+   RDF uses Unicode only; therefore, yes.
+
+- Is quoting with three quotes really what we want?
+
+   RESOLVED: Multiline literals are really what
+   we want: imagine you have a 1K HTML document
+   as a literal and the encoder puts it all
+   in one line. (Also, with multiline literals,
+   CVS's diffs are more useful.)
+
+   Multiline literals are enclosed in three quotes in N3.
+
+- Does the specification need to talk about equal triples
+  occurring in the same graph? Can the same triple
+  occur twice, according to the RDF spec?
+
+   RESOLVED: There are tools which allow a single triple
+   to occur twice. Therefore, the spec should be clear
+   about the topic.
+
+- Why `Normalization Form C`_?
+
+   RESOLVED: Because it's required by N3, and because
+   it's the standard on the Web (http://www.w3.org/TR/charmod/).
+
+- Does it allow for the different newline conventions?
+
+   RESOLVED: Yes. (Normalization Form C only specifies that
+   composite characters like umlauts are stored in
+   composed, not decomposed, form. See the spec.)
+
+- Wouldn't it be easier to produce the serialization format 
+  for each triple, and then put those into lexical order? 
+  Or if the parts must be compared 
+  separately, could we compare serializations of those parts?
+
+   RESOLVED: We assume that a Canon3 writer usually operates 
+   on an in-memory representation of an RDF graph. That
+   makes it easy to sort triples in unencoded form, and hard
+   to sort them in encoded form. It's also more scalable:
+   Sorting on the serializations would mean having to
+   generate the whole serialization in memory first,
+   before writing anything to the disk.
+
+   (Also note that simply sorting the *lines*
+   wouldn't work anyway, because of multiline literals.)
+
+
 Specification
 =============
 
@@ -39,13 +122,35 @@
     URItoken ::= "<" URIref ">"
     anonNode ::= "_:" [A-Za-z][A-Za-z0-9]*
     literal ::= #x22 #x22 #x22 string #x22 #x22 #x22 qualifiers
-    qualifiers ::= ("@" language)? ("^^" URItoken)?
+    qualifiers ::= ("@" language)? ("^^" type)?
+    type ::= URItoken
 
-The ``NEWLINE`` token may be any of CR, LF, and CRLF.
-(This is necessary for CVS to be useful across platforms.)
+The ``NEWLINE`` token may be any of CR, LF, CRLF, and
+the Unicode LINE SEPARATOR (U+2028).
+This is necessary for CVS to be useful across platforms.
 In contexts where the specific form used matters,
 the newline character is LF. (In particular, when computing
 a content hash-- e.g., when creating a Canon3 Storm block.)
+It would be nicer to use LINE SEPARATOR, but that
+would be an incompatibility with N3.
+
+A ``string`` is any UTF-8 character sequence 
+encoded in the following way:
+
+- Double any backslash in the string.
+- Insert a backslash before the first of any three
+  consecutive double quotes (#x22) in the string.
+  (This means: In a sequence of three or more
+  double quote characters, insert a backslash
+  before all but the last two double quotes).
+
+For example, the string ``f\oo"""""ba"r`` becomes
+``f\\oo\"\"\"""ba"r``.
+
+Strings may contain newlines. Like all of Canon3,
+they are encoded in Normalization Form C.
+They are enclosed in triple double quotes
+(see production ``literal``).
 
 The triples must be ordered. Two triples are compared
 by comparing their subjects, properties, and objects
@@ -77,37 +182,30 @@
 equal triples in the graph to be serialized, this
 triple must occur only once in the serialization.
 
-``URIref`` is a URI reference as defined in [RFC 2396]. 
-Percent escapes (e.g. ``%2f``) should preferably
-be encoded in lower case. URIref may be either of the following:
-
-1. An absolute URI (e.g., ``http://example.org/``).
-2. An absolute URI plus a fragment identifier
-   (e.g., ``http://example.org/#foo``).
-3. The empty URI reference (which is a relative URI
-   referring to the current document).
-4. A standalone fragment identifier (e.g., ``#foo``),
-   referring to a fragment of the current document.
+``URIref`` is one of the following:
 
-``language`` is a Language-Tag as defined by [RFC 3066].
+1. An `RDF URI reference`_ encoded in UTF-8 (Normalization
+   Form C) like the rest of Canon3.
+2. An RDF URI reference with everything before the
+   fragment identifier (if any) omitted. This refers
+   to the current document (in the case of the empty
+   string) or to a fragment of it (e.g., ``#foo``).
 
-A ``string`` is any UTF-8 character sequence 
-encoded in the following way:
-
-- Double any backslash in the string.
-- Insert a backslash before the first of any three
-  consecutive double quotes (#x22) in the string.
-  (This means: In a sequence of three or more
-  double quote characters, insert a backslash
-  before all but the last two double quotes).
+``language`` is a Language-Tag as defined by [RFC 3066].
+If present, ``language`` and ``type`` indicate
+the `language tag and data type`_ of a literal.
 
-For example, the string ``f\oo"""""ba"r`` becomes
-``f\\oo\"\"\"""ba"r``.
+Here's an example Canon3 file::
 
-Strings may contain newlines. Like all of Canon3,
-they are encoded in Normalization Form C.
-They are enclosed in triple double quotes
-(see production ``literal``).
+    # Canon3 <http://fenfire.org/2003/Canon3/1.0/>
+    <> <http://example.org/isa> <http://example.org/document>.
+    <> <http://example.org/name> """Foobar
+    An example Canon3 "document\""""@en.
+    <> <http://example.org/name> """Foobar
+    Ein Beispiel eines Canon3-"Dokumentes\""""@de.
+    <#Foo> <http://example.org/name> """Foo fragment identifier""".
+    <http://example.org> <urn:x-foo:related> <urn:x-foo:rittlefricks>.
+    <http://example.org> <urn:x-files:rating> """7"""^^<http://www.w3.org/2001/XMLSchema#int>.
 
 We will register a MIME type for Canon3.
 
@@ -117,4 +215,6 @@
 .. _Normalization Form C: http://www.unicode.org/unicode/reports/tr15/
 .. _NTriples: http://www.w3.org/TR/rdf-testcases/#ntriples
 .. _Notation 3: http://www.w3.org/DesignIssues/Notation3.html
-
+.. _RDF URI reference: http://www.w3.org/TR/rdf-concepts/#section-Graph-URIref
+.. _language tag and data type: 
+   http://www.w3.org/TR/rdf-concepts/#section-Literals
\ No newline at end of file
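A sketch of how a writer might emit the `URItoken` and `anonNode` productions from the patch, in Python. It is not part of the PEG. The function name `format_node` and the convention that blank nodes arrive as `_:`-prefixed strings are assumptions for illustration. Rejecting a raw `>` inside a URIref is also this sketch's own choice, because such a character would end the token early.

```python
import re

# anonNode ::= "_:" [A-Za-z][A-Za-z0-9]*
ANON_NODE = re.compile(r"_:[A-Za-z][A-Za-z0-9]*")


def format_node(node: str) -> str:
    """Serialize a subject, property or URI object as a Canon3 token.

    Hypothetical convention: strings starting with "_:" are blank
    nodes; everything else is treated as a URIref.
    """
    if node.startswith("_:"):
        if not ANON_NODE.fullmatch(node):
            raise ValueError(f"invalid anonNode label: {node!r}")
        return node
    if ">" in node:
        # Assumption of this sketch: a raw ">" would terminate the token.
        raise ValueError(f"'>' not allowed in URIref: {node!r}")
    # URItoken ::= "<" URIref ">". The empty URIref refers to the
    # current document, "#foo" to a fragment of it.
    return "<" + node + ">"
```

For example, `format_node("")` gives `<>` and `format_node("#Foo")` gives `<#Foo>`, matching the example file, while `format_node("_:1a")` is rejected because a label must start with a letter.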

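The NEWLINE rule allows CR, LF, CRLF and U+2028 in a file, but it requires LF wherever the exact form matters, for example when computing a content hash. The following minimal Python sketch normalizes every newline to LF and the text to Normalization Form C before hashing. SHA-1 and the function names are assumptions: the patch does not say which hash a Canon3 Storm block uses.

```python
import hashlib
import re
import unicodedata

# CRLF is tried before CR so that it becomes a single LF.
_NEWLINE = re.compile("\r\n|\r|\u2028")


def canonical_bytes(text: str) -> bytes:
    """Return the bytes over which a content hash would be computed."""
    text = _NEWLINE.sub("\n", text)            # every NEWLINE becomes LF
    text = unicodedata.normalize("NFC", text)  # composed characters
    return text.encode("utf-8")                # Canon3 is always UTF-8


def content_hash(text: str) -> str:
    # SHA-1 is an assumption made for this sketch only.
    return hashlib.sha1(canonical_bytes(text)).hexdigest()
```

With this normalization, a file checked out with CRLF line endings hashes the same as its LF original, and a decomposed umlaut hashes the same as the precomposed one.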

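The escaping rules for `string` can be sketched in Python as follows. `encode_string` doubles backslashes and, in every run of three or more double quotes, escapes all but the last two. This reproduces the PEG's example, where `f\oo"""""ba"r` becomes `f\\oo\"\"\"""ba"r`. The example file also writes the trailing quote of `document"` as `\"`, so that it cannot merge with the closing `"""`. The stated rules do not cover that case, so escaping trailing quotes here is an assumption inferred from the example. `decode_string` and `format_literal` are hypothetical helpers, and the decoder assumes that a backslash always escapes the single character after it.

```python
import re
from typing import Optional


def encode_string(s: str) -> str:
    """Escape a string for use between Canon3 triple quotes."""
    # Assumption inferred from the example file: trailing double quotes
    # are all escaped so they cannot run into the closing delimiter.
    body = s.rstrip('"')
    trailing = len(s) - len(body)
    # Rule 1: double any backslash.
    body = body.replace("\\", "\\\\")
    # Rule 2: in a run of three or more double quotes, put a backslash
    # before all but the last two.
    body = re.sub(r'"{3,}',
                  lambda m: '\\"' * (len(m.group(0)) - 2) + '""',
                  body)
    return body + '\\"' * trailing


def decode_string(e: str) -> str:
    """Inverse of encode_string: a backslash escapes the next character."""
    out = []
    i = 0
    while i < len(e):
        if e[i] == "\\" and i + 1 < len(e):
            out.append(e[i + 1])
            i += 2
        else:
            out.append(e[i])
            i += 1
    return "".join(out)


def format_literal(s: str, language: Optional[str] = None,
                   datatype: Optional[str] = None) -> str:
    # literal ::= #x22 #x22 #x22 string #x22 #x22 #x22 qualifiers
    # qualifiers ::= ("@" language)? ("^^" type)?
    token = '"""' + encode_string(s) + '"""'
    if language is not None:
        token += "@" + language
    if datatype is not None:
        token += "^^<" + datatype + ">"
    return token
```

`format_literal("7", datatype="http://www.w3.org/2001/XMLSchema#int")` yields the last object in the example file, `"""7"""^^<http://www.w3.org/2001/XMLSchema#int>`.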

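The resolved issue on sorting argues for ordering triples by their in-memory, unencoded form, and the specification requires each distinct triple to be written only once. The patch only shows the start of the ordering rule (subjects, then properties, then objects). The key function below is therefore a hypothetical placeholder. It assumes each node is already a comparable Python string, which is not necessarily how the PEG orders URIs, anonymous nodes and literals. The sketch shows the shape of the approach, not the normative order.

```python
from typing import Iterable, Iterator, Tuple

Triple = Tuple[str, str, str]  # (subject, property, object), unencoded


def node_key(node: str) -> str:
    # Hypothetical placeholder: the real per-node order is defined in
    # the full PEG, not in this diff.
    return node


def canonical_order(triples: Iterable[Triple]) -> Iterator[Triple]:
    """Yield distinct triples sorted by subject, property, object."""
    unique = set(triples)  # equal triples are written only once
    ordered = sorted(unique, key=lambda t: (node_key(t[0]),
                                            node_key(t[1]),
                                            node_key(t[2])))
    for triple in ordered:
        yield triple
```

Because the triples are sorted before encoding, a writer can encode and write each triple as it is yielded. It does not have to build the whole serialization in memory first, which is the scalability argument made in the resolved issue.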
