Re: simple first emacs script

help-gnu-emacs

From:	Pascal J. Bourguignon
Subject:	Re: simple first emacs script
Date:	Wed, 15 Dec 2010 19:37:00 +0100
User-agent:	Gnus/5.13 (Gnus v5.13) Emacs/23.2 (gnu/linux)
Tom <tom@somewhere.com> writes:

> Wow Pascal that is quite an amazing response thanks.
>
> You introduced several new things I don't know about so I can't
> comment on them until I go away and learn them but in response to what
> I do understand.
>
> Yes the indentation was destroyed by newsreader, but thanks for
> pointing me to paredit as I was finding managing parenthesis a pain.
>
>> The (require 'csv-mode) form would be better placed on the toplevel
>> (ie. above the defun form).
>
> I don't get this.  If I understand you correctly you are suggesting
> something like this:
> (require 'csv-mode)
> (defun ...
>           )
>
> If I do this then wont the require mode cease to be part of the
> functions definition.  Normally it would not be required to set the
> mode the csv as the file extension would be .csv and csv-mode is
> called automatically, but the raw files I receive have random
> extensions - I suppose I could rename them all to overcome this but it
> seemed simpler to tell the function to go into csv mode otherwise it
> tries to process the file in fundamental mode.

require only loads a library (if it is not already loaded).  Setting a
mode for a buffer would be done by calling the specific mode command,
which may be defined in a library:  (csv-mode)

(By the way, these commands, like many rarely used commands, may not be
loaded initially, but defined as autoloaded functions: the library that
contains their real definition is loaded automatically the first time
they're called.  For example, if you start emacs again, then before
csv-mode.el is loaded, the documentation of the command csv-mode
(C-h f csv-mode RET) is:

    csv-mode is an interactive autoloaded Lisp function in `csv-mode.el'.

    [Arg list not available until function definition is loaded.]

    Major mode for editing comma-separated value files.
)

Loading csv-mode with (require 'csv-mode) doesn't change the mode of the
buffer, and csv-kill-fields still works even if the mode of the buffer
is not csv-mode.  In general, modes only establish the key bindings and
font-lock keywords that help with editing a specific kind of text, but
all the commands are applicable in all the modes.  There may be some
special modes that do some behind-the-scenes processing (e.g. building
data structures in parallel to the buffer, defining buffer-local
variables) that some of their commands need in order to work, but
that's rather rare.
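
To make the difference between loading a library and switching mode
concrete, here is a minimal sketch.  It uses outline-mode as a stand-in,
only because the outline library ships with emacs while csv-mode may not
be installed; the function name my-demo-require-vs-mode is made up for
the illustration.

```elisp
;; Stand-in example: `outline' is used instead of csv-mode only because
;; it is bundled with emacs.  The function name is hypothetical.
(defun my-demo-require-vs-mode ()
  "Return the major mode before and after calling the mode command."
  (with-temp-buffer
    (require 'outline)              ; loads the library, nothing more
    (let ((before major-mode))      ; the buffer's mode is unchanged here
      (outline-mode)                ; calling the command switches the mode
      (list before major-mode))))

(my-demo-require-vs-mode)
;; => (fundamental-mode outline-mode)
```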


But I would advise against changing the mode of the buffer in a
command such as nirs-data-clean.  If you want to open the .R14 files in
CSV mode, you can do it by adding an entry to the auto-mode-alist
variable (C-h v auto-mode-alist RET) in ~/.emacs:

    (push '("\\.R14$" . csv-mode) auto-mode-alist)
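
A possible variant, offered only as a sketch: add-to-list does not add
the entry a second time if ~/.emacs is evaluated again, and \\' anchors
the pattern at the end of the file name rather than at the end of a
line.  It still assumes that csv-mode is installed and autoloadable.

```elisp
;; Variant sketch: same association, but idempotent and anchored at the
;; end of the file name.  Assumes csv-mode is installed.
(add-to-list 'auto-mode-alist '("\\.R14\\'" . csv-mode))
```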

However, the R14 example file you gave is not a CSV formatted file.
See below.


>> Instead of push-mark (I don't see the matching pop-mark), you might use
>> the save-excursion macro.
>
> I did actually start with save-excursion but I have no interest in
> saving the point the mark, the whole point of pushing the mark and
> moving the point to the start of the buffer was to specify the region
> arguments in csv-kill-fields, i.e.
> (csv-kill-fields '(4 ...) (point) (mark))
>
> I guess this might be more logically done with
> (csv-kill-fields '(4 ...) (point-min) (point-max))
> would that be considered better form?

Yes, I noticed your use of (point) and (mark) later.  But indeed, if
you follow the documentation of push-mark to the documentation of
set-mark, you'll see that this mechanism is reserved for the user's
interactive use, and that commands should avoid tampering with it.

At first, I kept (point-min) and (point-max) in local variables start
and end, but once I added the replacement of spaces by commas, the size
of the buffer changed, and with it the value of (point-max).
Therefore I call these functions every time.

If you want to remember a buffer position while editing operations
(insertions and deletions) may change its absolute value, you can use
markers (see the functions make-marker and set-marker).  But markers
need to be 'freed' explicitly by resetting them to nil, so they're less
convenient to use than just calling (point-max) again (though to keep a
position in the middle of the buffer they'd be the right mechanism to
use).
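
A small sketch of the difference between a saved integer position and a
marker (my-demo-marker is a name made up for this example):

```elisp
;; Hypothetical demo: a plain integer position goes stale after an
;; insertion, a marker follows the text it was set on.
(defun my-demo-marker ()
  "Return (saved-integer marker-position point-max) after an insertion."
  (with-temp-buffer
    (insert "abc def")
    (let ((pos 5)                       ; integer position just before "d"
          (m (make-marker)))
      (set-marker m 5)                  ; marker at the same place
      (unwind-protect
          (progn
            (goto-char (point-min))
            (insert "XXX")              ; shifts all the text by 3
            (list pos (marker-position m) (point-max)))
        (set-marker m nil)))))          ; 'free' the marker

(my-demo-marker)
;; => (5 8 11)
```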


>> Ah, if you read the documentation of csv-kill-fields,  you will see that
>> it depends on the right setting of the variable csv-separators to know
>> what separator to use.  By default I have it set to a comma.  So you
>> want to bind this variable in your function:
>>
>>      (let ((csv-separators '(" ")))
>>        (csv-kill-fields ...))
>> Note however that it is a single character string, and that your fields
>> are separated by several.  When I try it, it fails with csv-kill-fields
>> complaining about the number of columns.  It is probably better to use
>> commas to separate the fields, ...
>
> I have read the documentation (that doesn't mean I understood it
> though).  I have the csv separators specified in my .emacs file (it
> seem it will accept both " " and "," so csv-modes seems to read my
> files correctly.  

Well, I'm not sure if it's a good idea to have both in the
csv-separators list.  One would have to check how csv functions deal
with csv-separators.  For example, I wonder what would happen if a
record contained both separators:

     data item,other data, item

Is it   ("data" "item,other" "data," "item")
or      ("data item"  "other data" " item")
or      ("data" "item" "other" "data" "item") 
?
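
To show that the three readings are all plausible, here they are
reproduced with split-string.  This is only an illustration of the
ambiguity; it says nothing about what the csv-mode functions actually do
with two separators.

```elisp
;; Illustration with the built-in split-string, NOT with csv-mode.
(split-string "data item,other data, item" " ")
;; => ("data" "item,other" "data," "item")

(split-string "data item,other data, item" ",")
;; => ("data item" "other data" " item")

(split-string "data item,other data, item" "[ ,]" t) ; t: drop empty strings
;; => ("data" "item" "other" "data" "item")
```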



> But I guess it is logical to specify these in the
> function in case I run it on a computer without these specified.  I
> don't seem to get problems  with csv-kill-fields complaining about
> number of columns but maybe I have just worked through it with trial
> and error and no real understanding.

I was surprised by this result too, given my reading of the
documentation of csv-kill-fields.



> Your final script seems more solid than mine, as in it behaves in the
> same way on either iteration even after undoing.  It doesn't seem to
> work perfectly with across all channel options in some of the files I
> ran it one, but there are lots of ideas in there (such as temporarily
> using commas) for me to incorporate into my script so thanks again.

Perhaps the problem comes from the fact that the files don't really look
like csv files.  They seem to have fixed-width columns, padded with
spaces.  In a csv file, if the separator character occurs twice in a
row, that means that there is an empty field in between.
Aligning data with a variable number of spaces is therefore
incompatible with the csv format.

Perhaps some of your files have fields with spaces in the middle, or
empty fields.  Then simply replacing sequences of spaces by a comma to
make it csv (or having the csv functions treat the space as a field
separator) will make the csv functions interpret the fields incorrectly.
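
Again with split-string as a stand-in for a csv parser (an assumption
made for the illustration, not a description of csv-mode internals), the
padding spaces show up as empty fields:

```elisp
;; Consecutive separators mean empty fields in between.
(split-string "a,,b" ",")
;; => ("a" "" "b")

;; Four padding spaces between two values read as three empty fields.
(split-string "67    0" " ")
;; => ("67" "" "" "" "0")
```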

I would advise checking the specification of the file format, and
perhaps using different code to convert it to csv.  For example,
assuming we just have records of fixed-width fields:

(defun spacep (ch)  (= ch ?\ )) ; one space character.


(let ((one-record "08.11.10 14:57:17  67    0   4      -2.9254      -2.3866  0   0  72    0   4      -3.3003      -2.7971  0   0  63    0   4      -2.8989      -2.2108  0   0  75    0   4      -3.6963      -3.3294  0   0  AB0912040885-0  AB0912040757-0  AB0912040628-0  AB0912040780-0"))
 
  (loop ; let's detect a data -> space transition
    with data = nil
    with fields = '()
    for pos from 0
    for ch across one-record
    do (if data
         (when (spacep ch)
           (setf data nil)
           (push pos fields))
         (unless (spacep ch)
           (setf data t)))
    finally (return (cons 0 (reverse (cons (length one-record) fields))))))

--> (0 8 17 21 26 30 43 56 59 63 67 72 76 89 102 105 109 113 118 122 135 148 
151 155 159 164 168 181 194 197 201 217 233 249 265)


So you could now split the record into fields, remove the spaces, and
concatenate them back into a csv record:




(defvar *r14-field-positions* 
        '(0 8 17 21 26 30 43 56 59 63 67 72 76 89 102 105 109 113 118
          122 135 148 151 155 159 164 168 181 194 197 201 217 233 249
          265)) 

(defun csvify-r14-record (record)
  (unsplit-string 
     (mapcar
        (lambda (field) ; if the field contains a comma, 
                        ; it needs to be quoted.
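           (if (find ?, field)
              (concat "\"" 
                      ;; The last argument (LITERAL) is t: otherwise the
                      ;; backslash in the replacement is treated specially
                      ;; and \" signals an error.
                      (replace-regexp-in-string  "\"" "\\\""  field t t)
                      "\"")
              field))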
        (loop
          for (start end) on *r14-field-positions*
          while end
          collect (string-trim " " (subseq record start end))))
      ","))



(let ((one-record "08.11.10 14:57:17  67    0   4      -2.9254      -2.3866  0   0  72    0   4      -3.3003      -2.7971  0   0  63    0   4      -2.8989      -2.2108  0   0  75    0   4      -3.6963      -3.3294  0   0  AB0912040885-0  AB0912040757-0  AB0912040628-0  AB0912040780-0"))

   (csvify-r14-record one-record))
--> 
"08.11.10,14:57:17,67,0,4,-2.9254,-2.3866,0,0,72,0,4,-3.3003,-2.7971,0,0,63,0,4,-2.8989,-2.2108,0,0,75,0,4,-3.6963,-3.3294,0,0,AB0912040885-0,AB0912040757-0,AB0912040628-0,AB0912040780-0"


So now we only have to call this function on each line of the buffer:


(defun csvify-r14-buffer ()
  (interactive)
  (dolines (start-line end-line)
     (let ((new-record (csvify-r14-record
                        (buffer-substring start-line end-line))))
       (delete-region start-line end-line)
       (insert new-record))))



With the following functions and macro (from my personal library):


(defun string-trim (character-bag string-designator)
  "Common-Lisp: returns a substring of string, with all characters in \
character-bag stripped off the beginning and end.
"
  (unless (sequencep character-bag)
    (signal 'type-error  "Expected a sequence for `character-bag'."))
  (let* ((string (string* string-designator))
         (margin (format "[%s]*" (regexp-quote
                                  (if (stringp character-bag)
                                      character-bag
                                      (map 'string 'identity character-bag)))))
         (trimer (format "\\`%s\\(\\(.\\|\n\\)*?\\)%s\\'" margin margin)))
    (replace-regexp-in-string  trimer "\\1" string)))


(defun unsplit-string (string-list &rest separator)
  "Does the inverse of split-string. If no separator is provided 
then a simple space is used."
  (if (null separator)
      (setq separator " ")
      (if (= 1 (length separator))
          (setq separator (car separator))
          (error "unsplit-string: Too many separator arguments.")))
  (if (not (char-or-string-p separator))
      (error "unsplit-string: separator must be a string or a char."))
  (apply 'concat (list-insert-separator string-list separator)))
       

(defmacro* with-marker ((var position) &body body)
  (let ((vposition (gensym))) ; so (eq var position) still works.
    `(let* ((,vposition ,position)
            (,var (make-marker)))
       (set-marker ,var ,vposition)
       (unwind-protect (progn ,@body)
         (set-marker ,var nil)))))


(defmacro* dolines (start-end &body body)
  "Executes the body with start-var and end-var bound to the start \
and the end of each line of the current buffer in turn."
  (let ((vline (gensym)))
    (destructuring-bind (start-var end-var) start-end
      `(let ((sm (make-marker))
             (em (make-marker)))
         (unwind-protect
              (progn
                (goto-char (point-min))
                (while (< (point) (point-max))
                  (let ((,vline (point)))
                    (set-marker sm (point))
                    (set-marker em (progn (end-of-line) (point)))
                    (let ((,start-var  (marker-position sm))
                          (,end-var    (marker-position em)))
                      ,@body)
                    (goto-char ,vline)
                    (forward-line 1))))
           (set-marker sm nil)
           (set-marker em nil))
         nil))))
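
The code above calls two helpers that are not shown in this message,
string* and list-insert-separator.  Their real definitions are not given
here, so what follows is only a guess at minimal stand-ins, written from
the way they are called; treat both bodies as assumptions.

```elisp
;; Hypothetical stand-ins; the original definitions are not shown above.

(defun string* (x)
  "Coerce the string designator X (string, symbol or character) to a string."
  (cond ((stringp x) x)
        ((symbolp x) (symbol-name x))
        ((characterp x) (char-to-string x))
        (t (signal 'wrong-type-argument (list 'stringp x)))))

(defun list-insert-separator (list separator)
  "Return a new list with SEPARATOR inserted between the items of LIST."
  (let ((result '()))
    (dolist (item list)
      (when result                    ; not before the first item
        (push separator result))
      (push item result))
    (nreverse result)))

(list-insert-separator '("a" "b" "c") ",")
;; => ("a" "," "b" "," "c")
```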





So instead of the replace-regexp, you can use (csvify-r14-buffer).
In the end, you didn't say what resulting file format you wanted.  You
could remove the last replace-regexp and keep the result in csv
format, with fields containing commas left quoted (but you don't seem
to have such data anyway), or write a more sophisticated command to
format the csv records into whatever format you want.
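
If you would rather not depend on the cl package or on a personal
library, the record conversion can also be sketched with built-in
functions only.  The my-r14- names are made up for this sketch, which
assumes that every record is at least 265 characters long and leaves out
the quoting of fields that contain commas.  It uses its own trimming
function because later emacs versions ship a string-trim with a
different argument order.

```elisp
;; Sketch with built-ins only; all my-r14- names are hypothetical.
(defvar my-r14-field-positions
  '(0 8 17 21 26 30 43 56 59 63 67 72 76 89 102 105 109 113 118
    122 135 148 151 155 159 164 168 181 194 197 201 217 233 249
    265)
  "Start positions of the fixed-width fields, plus the record length.")

(defun my-r14-trim (string)
  "Return STRING without its leading and trailing spaces."
  (replace-regexp-in-string "\\` +\\| +\\'" "" string))

(defun my-r14-csvify-record (record)
  "Cut RECORD at `my-r14-field-positions' and join the fields with commas.
Fields containing a comma are NOT quoted in this sketch."
  (let ((positions my-r14-field-positions)
        (fields '()))
    (while (cdr positions)            ; each pair of consecutive positions
      (push (my-r14-trim (substring record (car positions) (cadr positions)))
            fields)
      (setq positions (cdr positions)))
    (mapconcat 'identity (nreverse fields) ",")))
```

Called on the sample record above, (my-r14-csvify-record one-record)
should give the same csv line as csvify-r14-record.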


-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
A bad day in () is better than a good day in {}.