help-gnu-emacs

Re: how to scan file for non-ascii chars (eg cut-n-paste from ms-word)


From: harven
Subject: Re: how to scan file for non-ascii chars (eg cut-n-paste from ms-word)
Date: Tue, 18 Jan 2011 22:32:23 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (gnu/linux)

dkcombs@panix.com (David Combs) writes:

> FURTHER, and more importantly, how do I *search* for
> one of these funny things, a left-double-quote, say?
> It's so *easy* to just hit C-s "!

You can go to the next non-ascii character using
C-M-s [^[:ascii:]]
Pressing C-s again while the search is still active moves on to the next
non-ascii character; RET ends the search.
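
If you would rather get the list in one go than step through the matches,
something like the following might do. It is only a sketch: the name
my-non-ascii-chars is made up for this example and is not part of Emacs.
It reuses the same regexp and returns each distinct non-ascii character
once, in order of first appearance.

```elisp
;; Hypothetical helper (the name is an assumption, not an Emacs function).
(defun my-non-ascii-chars ()
  "Return a list of the distinct non-ascii characters in the buffer."
  (save-excursion
    (goto-char (point-min))
    (let (found)
      ;; Same regexp as in the C-M-s search above.
      (while (re-search-forward "[^[:ascii:]]" nil t)
        ;; Point is just after the match, so the matched
        ;; character is the one before point.
        (let ((ch (char-before)))
          (unless (memq ch found)
            (push ch found))))
      (nreverse found))))
```

Evaluate it, then try M-: (my-non-ascii-chars) RET in the buffer you want to
check. M-x occur RET [^[:ascii:]] RET is another option; it lists every line
that contains such a character.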

> You mean do a query-replace on each non-ascii char?  How do I 
> even know which ones are even *in* some buffer of text?

You can use the following command to list all characters in the buffer together
with their frequencies. The list is sorted by decreasing frequency, so the
non-ascii ones, being rare, should appear near the end.

(defun frequency ()
  "Compute the frequency of each character in the buffer.
The result appears in another buffer called *frequency*."
  (interactive)
  (save-excursion
    (goto-char (point-min))
    (let ((freq (make-hash-table :test 'equal)))
      ;; Count every character except newlines, which "." does not match.
      (while (re-search-forward "." nil t)
        (puthash (match-string 0)
                 (1+ (gethash (match-string 0) freq 0))
                 freq))
      (pop-to-buffer "*frequency*")
      (erase-buffer)
      (maphash
       (lambda (key value)
         (insert key "  " (number-to-string value) "\n"))
       freq))
    ;; Sort on the count (last field), then reverse: most frequent first.
    (sort-numeric-fields -1 (point-min) (point-max))
    (reverse-region (point-min) (point-max))
    (other-window 1)))

>
> What'd be nice is something that went through the whole
> buffer *once*, doing the "right thing" with each
> non-ascii char.
>
> Do I make any sense?  Or do I not really understand?

Yes it makes sense.

Have a look at iso-cvt.el. This package provides commands to handle iso8859-1
characters. There you will find a function called iso-translate-conventions,
which translates characters according to a translation table. I am not
aware of a table giving an ascii translation for all utf-8 characters, so you
will have to make your own, along the lines of

(defvar my-iso-trans-tab
  '(("à" "a")
    ("é" "e")
    ("ß" "s")
    ("ñ" "~n"))
  "Translation table for translating some characters to ascii.
This table is not exhaustive.")

Then, assuming iso-cvt.el is loaded so that iso-translate-conventions is
defined (for example with (require 'iso-cvt)), use the following command to
translate the selected region.

(defun my-iso-all2ascii (from to &optional buffer)
 "Translate to ascii characters.
Translate the region between FROM and TO using the table
`my-iso-trans-tab'.
Optional arg BUFFER is ignored (for use in `format-alist')."
 (interactive "*r")
 (iso-translate-conventions from to my-iso-trans-tab))
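
For the ms-word quotes you asked about, the same mechanism should work with
a table of its own. The following is a sketch only: my-demo-trans-tab and
my-demo-quotes-to-ascii are made-up names, and the three entries are my
guess at the usual offenders, so extend the table to whatever your files
actually contain.

```elisp
(require 'iso-cvt)  ; defines `iso-translate-conventions'

;; Hypothetical table, same shape as `my-iso-trans-tab':
;; each entry is (REGEXP REPLACEMENT).
(defvar my-demo-trans-tab
  '(("\u201c" "\"")   ; left double quotation mark
    ("\u201d" "\"")   ; right double quotation mark
    ("\u2019" "'"))   ; right single quotation mark (curly apostrophe)
  "Demo table mapping a few typographic quotes to ascii.
This table is not exhaustive.")

;; Hypothetical helper: apply the table to STRING in a scratch
;; buffer, so that the result can be inspected without touching
;; a real file.
(defun my-demo-quotes-to-ascii (string)
  "Return STRING with the quotes in `my-demo-trans-tab' made ascii."
  (with-temp-buffer
    (insert string)
    (iso-translate-conventions (point-min) (point-max) my-demo-trans-tab)
    (buffer-string)))
```

Calling my-demo-quotes-to-ascii on a string with curly quotes should give
back the same text with plain " and ' in their place. To run it on a region
instead, pass the table to iso-translate-conventions in the same way
my-iso-all2ascii does.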

Hope that helps


