[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Axiom-developer] No progress on notangle
From: |
Waldek Hebisch |
Subject: |
Re: [Axiom-developer] No progress on notangle |
Date: |
Sun, 18 Feb 2007 05:43:22 +0100 (CET) |
> I have tried to use read-sequence to load a file quickly, which works
> fine, but I'm afraid I haven't done too well with the problem of
> working with the resulting string. The code so far is actually slower
> than the earlier version, and from what I can tell the primary issue
> seems to be too many subseq operations on the big one-pamphlet string.
> I tried leaving the string intact and just identifying positions rather
> than chopping it off as I scanned, but the searches got progressively
> more expensive even though I was supplying new start positions.
> There's probably a better way to do it or an obvious mistake in the way
> I have done it, but I'm afraid I've been staring at it a bit too long.
Fast way to scan strings is to use finite automaton. The code below
shows how this can be done in Lisp.
Notes:
- This is a proof of concept (probably buggy) code, in particular I
only extract the main chunk (do not handle rescannig for other
chunks). Rescannig can be done by similar code, which simulatneously
could handle @-escapes ("@<<" etc). If escapes are not supported one
could do with a single scan, extending automaton to remember positions
of embedded tags. Alternative way to have single scan is to
copy content to new array during scan handling escapes during
copy.
- AFAICS this code is faster then Debian (pretty fast) notangle.
I belive that after adding needed features it can still be
fast.
- I use (unsigned-byte 8) as a type because Lisp implementation
may choose to have Unicode character and perfrom complicated
recoding during input and output. Also, Unicode characters
are likely to require more storage (so more time for memory
access and for garbage collection).
- The main function 'scan-for-chunks' is a rat nest of gotos.
But that is correct because finite automaton _is_ a rat nest
of transition and one of the simplest ways to implement
transitions is to use gotos. One can make it shorter by
using apropritate macros. Ideally the automaton code should
be mechanically generated -- writing in C I would use flex
(the task is trivial for flex, but no so trivial by hand).
- For ultimate speed pamplets should be read directly by Spad
scanner. This would avoid duplicate work. Namely, Spad have
to scan the file anyway. Recoginzing noweb markers adds
only little complexity to scanner and (if implemented via
finite automaton) almost no execution time. Spad performs
macro expansion and the same mechanizm can substitute chunks
if they respect pile rules. If one really wants unstructured
chunks one can handle them redirection parser input (like
include files).
- Re-doing Spad scanning/parsing is on my todo list. But ATM
parsing works no worse than the rest of compiler and while
quite slow it is still one of the fastest parts in the compiler.
So this task has very low priority.
Code follows:
;;; Usage:
;;; (untangle "input.file.name" "output.file.name")
;;;
(defconstant start-tag-code1 (char-code #\<))
(defconstant start-tag-code2 (char-code #\<))
(defconstant end-tag-code-1 (char-code #\>))
(defconstant end-tag-code-2 (char-code #\>))
(defconstant end-tag-code-3 (char-code #\=))
(defconstant chunk-end-code (char-code #\@))
(defconstant newline-code 10)
(defconstant space-code (char-code #\ ))
(defun read-file (f)
(let* ((b-len (file-length f))
(buff (make-array (list (+ b-len 1))
:element-type '(unsigned-byte 8))))
(read-sequence buff f)
(setf (aref buff b-len) newline-code)
buff))
(defun read-named-file (name)
(with-open-file (f name :element-type '(unsigned-byte 8))
(read-file f)))
(defclass chunk ()
((chunk-name
:initarg :chunk-name
:initform (error "Must supply a chunk name.")
:reader chunk-name
:documentation "Name of chunk.")
(chunk-contents
:initarg :chunk-contents
:initform '()
:accessor chunk-contents
:documentation "Text of chunk - may include references to other chunks")))
(defparameter *chunk-hash-table* (make-hash-table :test 'equalp))
(defun add-to-chunk-contents (name content)
(let ((chunk (gethash name *chunk-hash-table*)))
(if (not chunk)
(setf chunk (setf (gethash name *chunk-hash-table*)
(make-instance 'chunk :chunk-name name))))
(push (chunk-contents chunk) content)))
(defun view-all-chunks ()
(maphash #'(lambda (k v) (format t "~a => ~S~&" k v)) *chunk-hash-table*))
(defun print-chunk-name (name line-number)
(format t "Chunk start in line ~A: <<" line-number)
(dotimes (i (array-dimension name 0))
(format t "~A" (code-char (aref name i))))
(format t ">>~&"))
(defun scan-for-chunks (buff)
(prog ((pos -1)
(end-buff (- (array-dimension buff 0) 1))
(start-pos)
(code)
(chunk-name)
(line-number 0))
normal-start
(incf pos)
(incf line-number)
(if (>= pos end-buff)
(return-from scan-for-chunks))
(setf code (aref buff pos))
(if (equal code start-tag-code1)
(go chunk-start-tag-1))
(if (equal code newline-code)
(go normal-start))
(go normal)
normal
(incf pos)
(setf code (aref buff pos))
(if (equal code newline-code)
(go normal-start))
(go normal)
chunk-start-tag-1
(incf pos)
(setf code (aref buff pos))
(if (equal code start-tag-code2)
(progn
(setf start-pos (+ pos 1))
(format t "start-pos: ~A, char: ~A~&" start-pos
(aref buff start-pos))
(go in-chunk-start-tag)))
(if (equal code newline-code)
(go normal-start))
(go normal)
in-chunk-start-tag
(incf pos)
(setf code (aref buff pos))
(if (equal code end-tag-code-1)
(go in-chunk-start-tag-3))
(if (equal code newline-code)
(go normal-start))
(go in-chunk-start-tag)
in-chunk-start-tag-3
(incf pos)
(setf code (aref buff pos))
(if (equal code end-tag-code-2)
(go in-chunk-start-tag-4))
(if (equal code newline-code)
(go normal-start))
(go normal)
in-chunk-start-tag-4
(incf pos)
(setf code (aref buff pos))
(if (equal code end-tag-code-3)
(progn
(setf chunk-name (subseq buff start-pos (- pos 2)))
(go in-chunk-start-tag-trailing)))
(if (equal code newline-code)
(go normal-start))
(go normal)
in-chunk-start-tag-trailing
(incf pos)
(setf code (aref buff pos))
(if (equal code newline-code)
(progn
(print-chunk-name chunk-name line-number)
(setf chunk-start-pos (+ pos 1))
(go in-chunk)))
(if (equal code space-code)
(go in-chunk-start-tag-trailing))
(go normal)
chunk-start
(setf chunk-end-pos pos)
(incf pos)
(if (>= pos end-buff)
(break "unexpected end of file"))
(setf code (aref buff pos))
(incf line-number)
(if (equal code newline-code)
(go chunk-start))
(if (equal code chunk-end-code)
(progn
(add-to-chunk-contents
chunk-name
(list chunk-start-pos chunk-end-pos))
(go chunk-end-trailing)))
(go in-chunk)
chunk-end-trailing
(incf pos)
(setf code (aref buff pos))
(if (equal code newline-code)
(go normal-start))
(if (equal code space-code)
(go chunk-end-trailing))
(format t "pos: ~A, code: ~A~&" pos code)
(break "garbage after end of chunk marker")
in-chunk
(incf pos)
(setf code (aref buff pos))
(if (equal code newline-code)
(go chunk-start))
(go in-chunk)))
(defun untangle (in-file out-file)
(let ((buff (read-named-file in-file)) (chunk))
(scan-for-chunks buff)
(setf chunk (gethash (make-array (list 1)
:element-type '(unsigned-byte 8)
:initial-element (char-code #\*))
*chunk-hash-table*))
(if (not chunk)
(break "No main chunk"))
(with-open-file (of out-file :direction :output
:element-type '(unsigned-byte 8))
(dolist (seg (reverse (chunk-contents chunk)))
(write-sequence buff of :start (nth 0 seg)
:end (nth 1 seg))))))
--
Waldek Hebisch
address@hidden
- [Axiom-developer] No progress on notangle :-(, C Y, 2007/02/17
- [Axiom-developer] Minor cleanup on working version, C Y, 2007/02/17
- Re: [Axiom-developer] No progress on notangle,
Waldek Hebisch <=
- Re: [Axiom-developer] No progress on notangle, Waldek Hebisch, 2007/02/18
- Re: [Axiom-developer] No progress on notangle, Ralf Hemmecke, 2007/02/18
- Re: [Axiom-developer] No progress on notangle, Waldek Hebisch, 2007/02/18
- Re: [Axiom-developer] No progress on notangle, Ralf Hemmecke, 2007/02/18
- Re: [Axiom-developer] No progress on notangle, C Y, 2007/02/18
- Re: [Axiom-developer] No progress on notangle, Ralf Hemmecke, 2007/02/18