Re: Gawk and non-ASCII characters
From: John Cowan
Subject: Re: Gawk and non-ASCII characters
Date: Sat, 16 Oct 2010 11:29:09 -0400
User-agent: Mutt/1.5.18 (2008-05-17)
Charles Kozierok scripsit:
> I am grabbing HTML code from a site that has some non-ASCII codes in
> it. Specifically, the code is "C2 A0". This shows up in ANSI as a
> capital "A" with a circumflex on top followed by a space. In ASCII it
> becomes a regular "A" followed by a space.
What it is, is a non-breaking space (U+00A0) encoded in UTF-8.
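A minimal Python sketch of where the two bytes come from and why an "ANSI" view shows them as a capital A with a circumflex followed by a space. It assumes that "ANSI" here means the Windows-1252 code page, the usual default on Western-European Windows systems; the variable names are illustrative only.

```python
# U+00A0 NO-BREAK SPACE needs two bytes in UTF-8.
nbsp = "\u00a0"
utf8_bytes = nbsp.encode("utf-8")
print(utf8_bytes.hex(" ").upper())  # C2 A0

# Decoding those same bytes as Windows-1252 (assumed to be what "ANSI" means)
# gives two characters: U+00C2 (A with circumflex) and U+00A0, which is
# displayed like an ordinary space.
misread = utf8_bytes.decode("cp1252")
print([f"U+{ord(c):04X}" for c in misread])  # ['U+00C2', 'U+00A0']
```

So the "A with a circumflex" is not extra data in the page; it is the first byte of a single UTF-8 character being read in the wrong encoding.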
> I need to be able to properly identify these so I can get rid of them,
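If you actually want to get rid of them, use "iconv -c -f UTF-8 -t ASCII".
(Without -c, GNU iconv stops with an error at the first character it
cannot convert; "-t ASCII//TRANSLIT" asks it to approximate such
characters instead.)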
Alternatively, leave them alone and switch to working in UTF-8. Notepad
can handle it, and so can many third-party editors.
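A rough Python sketch of both options, assuming the page has already been fetched as raw bytes. The helper names `strip_non_ascii` and `normalize_nbsp` are hypothetical. The first approximates GNU iconv's `-c` flag, which silently discards characters that cannot be converted to the target encoding; the second keeps the text in UTF-8 and only replaces each U+00A0 with a plain ASCII space.

```python
def strip_non_ascii(raw: bytes) -> str:
    """Decode UTF-8, then drop every character outside ASCII
    (roughly what `iconv -c -f UTF-8 -t ASCII` does)."""
    return raw.decode("utf-8").encode("ascii", errors="ignore").decode("ascii")


def normalize_nbsp(raw: bytes) -> str:
    """Keep working in UTF-8, but turn each NO-BREAK SPACE (U+00A0)
    into an ordinary space."""
    return raw.decode("utf-8").replace("\u00a0", " ")


html = b"Price:\xc2\xa0100"  # sample input containing the C2 A0 sequence
print(repr(strip_non_ascii(html)))  # 'Price:100'
print(repr(normalize_nbsp(html)))   # 'Price: 100'
```

Note the difference: dropping the character glues the neighbouring words together, while replacing it preserves the visual spacing, which is usually what HTML authors meant by a non-breaking space.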
--
John Cowan    address@hidden    http://www.ccil.org/~cowan
What is the sound of Perl? Is it not the sound of a [Ww]all that
people have stopped banging their head against?  --Larry