help-gnu-emacs
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: why are there [v e c t o r s] in Lisp?


From: Pascal J. Bourguignon
Subject: Re: why are there [v e c t o r s] in Lisp?
Date: Fri, 16 Oct 2015 03:51:42 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux)

Emanuel Berg <embe8573@student.uu.se> writes:

> Why is there a special syntax for vectors? 

To make it easy to introduce literal vectors in your programs.

Without this syntax, you would only have run-time constructors, and you
would have to write more complex code.

For example, the equivalent of:

    (defun permut-0213 (x)
       (aref [0 2 1 3] x))

would have to be written as:

   (defun permut-0213 (x)
      (aref (load-time-value (vector 0 2 1 3)) x))

or:
    
   (let ((permut (vector 0 2 1 3)))
     (defun permut-0213 (x)
        (aref permut x)))


> In linear algebra, an n-dimensional vector is a sequence of n numbers,
> and collectively they make for something that has direction and
> magnitude (in particular, it doesn't have a position). But whatever
> the math, isn't that (a sequence of numbers) something that the lists
> of Lisp can handle just as well, or actually better, as it will be
> more generic (a lot of stuff that doesn't work on "real" Lisp vectors
> will work on vectors that are also lists). And using lists doesn't
> mean nobody cannot write hundreds of "math vector" specific stuff to
> modify those list vectors! Right?

Right.  And why have strings too!


> Have a look:
>
>     (vectorp [1 2 3]) ; t
>
>     (vectorp '(1 2 3)) ; nil - really?

    (vectorp "123")       ; --> nil - why? :-( 

    (sequencep [1 2 3])  ; --> t
    (sequencep '(1 2 3)) ; --> t
    (sequencep "123")    ; --> t

In Common Lisp, strings are vectors are sequence.
And in CL, vectors are arrays, which may be multi-dimensional, but
there's no such thing in emacs lisp.


The reason why there is a vector data type distinct from the list data
type is purely an optimization reason.  In ACL2, for example, there are
only "lists". https://en.wikipedia.org/wiki/ACL2



In lisp, there are no lists.  There's no list abstract data type.  There
are only cons cells, which are used to build linked lists of cons cells.

Here, a cons cell is represented with a box split in two parts, the car
and the cdr:

    +---+---+
    | * | * |-->
    +---+---+
      |
      v


Therefore a list such as (1 2 3 4 5) will be actually a cons cell:
 (1 . (2 . (3 . (4 . (5 . nil))))), which can be drawn as:

+-----------------------------------------------------------+
| (1 2 3 4 5)                                               |
|                                                           |
| +---+---+   +---+---+   +---+---+   +---+---+   +---+---+ |
| | * | * |-->| * | * |-->| * | * |-->| * | * |-->| * |NIL| |
| +---+---+   +---+---+   +---+---+   +---+---+   +---+---+ |
|   |           |           |           |           |       |
|   v           v           v           v           v       |
| +---+       +---+       +---+       +---+       +---+     |
| | 1 |       | 2 |       | 3 |       | 4 |       | 5 |     |
| +---+       +---+       +---+       +---+       +---+     |
+-----------------------------------------------------------+

The problem is that to find the ith element in the list, we have to walk
i cons cells:

    (defun elt (s i)
       (etypecase s
         (null nil)
         (cons (if (zerop i)
                  (car s)
                  (elt (cdr s) (1- i))))))

elt is therefore O(n) for lists of length n, which is rather slow for
such a basic access.



The solution is to use internally an indexed data structure: contiguous
memory blocks, for which we can compute the address in O(1):


+-----------------------------------------------------------+
| [1 2 3 4 5]                                               |
|                                                           |
| +-----+-----+-----+-----+-----+                           |                   
       
| |  *  |  *  |  *  |  *  |  *  |                           |                   
      
| +-----+-----+-----+-----+-----+                           |                   
      
|    |     |     |     |     |                              |                   
   
|    v     v     v     v     v                              |
|  +---+ +---+ +---+ +---+ +---+                            |
|  | 1 | | 2 | | 3 | | 4 | | 5 |                            |
|  +---+ +---+ +---+ +---+ +---+                            |
+-----------------------------------------------------------+


To find the address of the ith slot, we only have to multiply i by the
size of the slots, and add the address of the start of the vector:

(defun elt (s i)
  (etypecase s
    (null nil)
    (cons (if (zerop i)
              (car s)
              (elt (cdr s) (1- i))))
    (vector (%deref (%address (+ (%address-of s)
                                 (* i (%size-of-element-of s))))))))

All the operations performed for the vector case are O(1), therefore elt
for vectors is O(1), which is much faster than for list, notably when
the index is not small.

Furthermore, notice that this kind of vector addressing often benefit
from addressing modes implemented in the processor, so you don't really
have to compile it from lisp; the implementation will have a native
vector indexing operator that will translate to a single inline
processor instruction:

;; in CCL:
cl-user> (disassemble (lambda (s i)
                        (declare (optimize (speed 3) (space 3) (safety 0) 
(debug 0)))
                        (declare (type simple-vector s) (type fixnum i))
                        (svref s i)))
L0
    (leaq (@ (:^ L0) (% rip)) (% fn))       ;     [0]
    (pushq (% rbp))                         ;     [7]
    (movq (% rsp) (% rbp))                  ;     [8]
    (pushq (% arg_y))                       ;    [11]
    (pushq (% arg_z))                       ;    [12]
    (movq (@ -5 (% arg_y) (% arg_z)) (% arg_z)) ;    [13]
    (leaveq)                                ;    [18]
    (retq)                                  ;    [19]
nil
cl-user> 

Notice how the access to the vector slot is reduced to the single ix86
instruction:

    (movq   (@ -5 (% arg_y) (% arg_z))    (% arg_z))

(The offset -5 is introduced to remove the type tag in the address of
the vector).




Now, of course, (car l) will be faster than (aref v 0)
and (cadr l) might also be faster than (aref v 1), but already, (cadr l)
requires two memory accesses (read the cdr of l, read the car of that
cdr), while (aref v 1) only needs one memory access (the movq above), 
so even if movq with it's implied multiply and add will probably be
faster than a memory access, more so nowadays when processor (register)
speed have become much faster than memory accesses.



Therefore the lesson is that if you have to use elt (nth) on a list, in
a loop, you definitely do not want to do that if the list is more than
one or two elements.


Instead, either process the elements in the list in order using car/cdr,
or use a vector for random access.




In the case of emacs lisp, the difference is hidden in the virtual
machine, you would have to look at the C code:

(disassemble (lambda (s i) (aref s i)))
byte code:
  args: (s i)
0       varref    s
1       varref    i
2       aref      
3       return    

(disassemble (lambda (l i) (nth i l)))
byte code:
  args: (l i)
0       varref    i
1       varref    l
2       nth       
3       return    

-- 
__Pascal Bourguignon__                 http://www.informatimago.com/
“The factory of the future will have only two employees, a man and a
dog. The man will be there to feed the dog. The dog will be there to
keep the man from touching the equipment.” -- Carl Bass CEO Autodesk


reply via email to

[Prev in Thread] Current Thread [Next in Thread]