help-gnu-emacs

Re: How to convert .doc to plain text ascii in emacs.


From: Thomas Persson
Subject: Re: How to convert .doc to plain text ascii in emacs.
Date: Sun, 02 May 2004 21:26:45 +0200
User-agent: Gnus/5.1006 (Gnus v5.10.6) Emacs/21.3 (gnu/linux)

gebser@speakeasy.net writes:

> Thanks very much.  Your elisp works great.  There's one glitch (which I
> realize is from antiword):
>
> The three characters "\342\200\231" should be replaced by the single 
> apostrophe character (').

The fact that antiword and my code leave you with a buffer containing
numerical codes instead of the characters themselves is your first
problem. This doesn't happen for me at all. It's either a problem with
antiword or a problem with how Emacs displays characters. Try running
antiword from the command line to find out which.
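For what it's worth, the three codes are exactly the UTF-8 encoding of U+2019 RIGHT SINGLE QUOTATION MARK. That suggests antiword is writing UTF-8 and Emacs is showing the undecoded bytes. Below is a quick way to check. It is a sketch that assumes a Unicode-based Emacs (23 or later; `unibyte-string` does not exist in the Emacs 21.3 used here), and the two variable names are hypothetical:

```elisp
;; Assumes Emacs 23 or later.  The variable names are hypothetical.
(defconst antiword-demo-bytes (unibyte-string #o342 #o200 #o231)
  "The three raw bytes antiword left in the buffer.")

;; Decoding those bytes as UTF-8 should give one single character.
(defconst antiword-demo-decoded
  (decode-coding-string antiword-demo-bytes 'utf-8)
  "The same bytes decoded as UTF-8.")

;; Echoes: 1 char, U+2019
(message "%d char, U+%04X"
         (length antiword-demo-decoded)
         (aref antiword-demo-decoded 0))
```

Evaluating this with M-: or C-x C-e in *scratch* shows whether your Emacs reads the sequence as one quotation mark.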

> To do this by hand, I did M-x replace-regexp Return C-q 342 Return
> C-q 200 Return C-q 231 Return Return ' Return
>
> but this does not find the intended string.  The problem seems to be 
> that C-q 342 is immediately (in the minibuffer) converted into an 'a' 
> with a grave symbol over it.  Putting the point on the backslash (\) 
> preceding the 342 in the antiword-converted buffer and doing "C-u C-x =" 
> indeed shows this a-with-grave character to be (0342, 226, 0xe2).
>
> To create a simple test case, do the following:
>
> Open an empty *scratch* buffer.  Enter into it: C-q 342 Return C-q 200
> Return C-q 231 Return.  The first character that appears is the 
> a-with-grave; the second and third characters appear properly as 
> \200\231.  
>
> It is, I think, the failure of C-q 342 to be represented as \342 which 
> is the problem.  What is the solution?
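C-q 342 inserts the character with code 226 (as your C-u C-x = output shows), not the raw byte. One way around the minibuffer is to build the byte string in Lisp and run the replacement as a command. Here is a minimal sketch. It assumes Emacs 23 or later, and `antiword-fix-rsquo` is a hypothetical name that is not part of antiword or Emacs:

```elisp
;; Assumes Emacs 23 or later.  `antiword-fix-rsquo' is a hypothetical name.
(defun antiword-fix-rsquo ()
  "Replace each raw UTF-8 right single quote with an ASCII apostrophe.
Return the number of replacements made."
  (interactive)
  (let ((bytes (unibyte-string #o342 #o200 #o231))
        (count 0))
    (save-excursion
      (goto-char (point-min))
      ;; A unibyte search string matches the raw-byte characters in a
      ;; multibyte buffer, which typing C-q 342 cannot produce.
      (while (search-forward bytes nil t)
        (replace-match "'" nil t)
        (setq count (1+ count))))
    (when (called-interactively-p 'interactive)
      (message "Replaced %d occurrence(s)" count))
    count))
```

In the converted buffer you would then run M-x antiword-fix-rsquo instead of typing the codes into replace-regexp.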

The trouble you're having replacing the numerical character codes with
the character itself is, however, definitely an Emacs-related problem.
As far as I can tell, it should work to add the replacement to the end
of the antiword-buffer function, like this:


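(defun antiword-buffer ()
  "Run the current buffer through the external program antiword.

If the current buffer is an MS Word document, its contents are replaced
with the output from antiword and the extension `.doc' is replaced
with `.txt' in the buffer's file name."
  (let ((txt-buffer-file-name (concat (substring (buffer-file-name) 0 -4)
                                      ".txt")))
    ;; Replace the buffer contents with antiword's output.
    (shell-command-on-region (point-min) (point-max)
                             "cat | antiword -" nil t nil)
    (undo-start)
    (if (equal (buffer-string) "- is not a Word Document.\n")
        ;; Not a Word file: restore the original contents and say so.
        (progn
          (undo-more 1)
          (message "%s - is not a Word Document." (current-buffer)))
      ;; Turn the UTF-8 right single quote (raw bytes \342\200\231)
      ;; into a plain apostrophe.  `replace-regexp' is meant for
      ;; interactive use, so search and replace explicitly.
      (goto-char (point-min))
      (while (search-forward "\342\200\231" nil t)
        (replace-match "'" nil t))
      (goto-char (point-min))
      ;; Renaming marks the buffer modified, so clear the flag last.
      (set-visited-file-name txt-buffer-file-name)
      (set-buffer-modified-p nil))))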

;; The following expression makes sure that antiword-buffer is run when a
;; file with the .doc extension is opened.
(setq auto-mode-alist
      (append '(("\\.doc\\'" . antiword-buffer))
              auto-mode-alist))
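If antiword's output really is UTF-8, there is an alternative to patching one sequence at a time. You can decode the whole buffer once, so that every such sequence (not just the apostrophe) becomes a real character. Here is a sketch under that assumption. It needs Emacs 23 or later, and `antiword-decode-buffer` is a hypothetical name:

```elisp
;; Assumes antiword emits UTF-8 and Emacs 23 or later.
;; `antiword-decode-buffer' is a hypothetical name.
(defun antiword-decode-buffer ()
  "Decode the raw bytes in the current buffer as UTF-8, in place."
  (interactive)
  (let ((modified (buffer-modified-p)))
    ;; Sequences such as \342\200\231 become single characters.
    (decode-coding-region (point-min) (point-max) 'utf-8)
    ;; Keep the modification flag as it was before decoding.
    (set-buffer-modified-p modified)))
```

You could call this at the end of antiword-buffer instead of the replacement. The difference is that the buffer then contains U+2019 rather than an ASCII apostrophe.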


If that doesn't work, then perhaps wvWare or undoc.el, as previous
posters have suggested, might be a better solution for you.

