Re: Character encoding again.
From: John Darrington
Subject: Re: Character encoding again.
Date: Sun, 31 Oct 2010 12:44:43 +0000
User-agent: Mutt/1.5.18 (2008-05-17)
On Sat, Oct 30, 2010 at 10:01:37PM -0700, Ben Pfaff wrote:
Character set names and their aliases are listed by IANA:
http://www.iana.org/assignments/character-sets
Wouldn't it have been nice if SPSS had used IANA MIB numbers instead of
these "codepage" numbers whose definition is so elusive!
> Moreover, there are a lot of SPSS data files which I have seen
> which have this "character_code" set to 2, yet contain data
> which are clearly not 7 bit ascii.
It was only a few SPSS versions back that SPSS appeared to start
putting values other than 2 into that field, and there are still
many older SPSS system files on the web. I guess that we will
have to either guess the encoding or depend on the user to tell
us the encoding for these files.
Based on what users have reported, SPSS treats character_code 2 as windows-1252
(even on non-Windows OSes).
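
A rough sketch of that rule, for what it's worth. The function names and the
user_encoding parameter are made up for illustration and are not existing PSPP
code. It assumes we prefer whatever the user tells us, and otherwise only
trust a character_code of 2 to mean plain ASCII when no byte has its high bit
set:

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if none of the N bytes in DATA has its high bit set. */
static bool
is_7bit_ascii (const unsigned char *data, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (data[i] & 0x80)
      return false;
  return true;
}

/* Hypothetical helper: picks an encoding name for an older system file.
   USER_ENCODING, if non-null, always wins.  Otherwise a CHARACTER_CODE
   of 2 is reported as US-ASCII only if the data really is 7-bit clean,
   and as windows-1252 if it is not (an assumption based on user
   reports).  Returns NULL if no guess can be made. */
const char *
guess_legacy_encoding (int character_code,
                       const unsigned char *data, size_t n,
                       const char *user_encoding)
{
  if (user_encoding != NULL)
    return user_encoding;
  if (character_code == 2)
    return is_7bit_ascii (data, n) ? "US-ASCII" : "windows-1252";
  return NULL;
}
```

A NULL return leaves the decision to the caller, which could then ask the user
or fall through to some other heuristic.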
and here's the current output:
Your table seems to be the most comprehensive I've seen yet. I suggest we
hash it with gperf or something similar. For codepage numbers which we cannot
resolve, I suppose we'll have to devise some fallback heuristic. As for
encodings for which we cannot find a codepage number, we could just convert
everything to UTF-8.
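
To make the lookup-plus-fallback idea concrete, here is a minimal sketch. It
uses a sorted array and bsearch() as a stand-in for a gperf-generated hash, and
the table holds only a handful of illustrative rows, not the full table under
discussion. The struct and function names are hypothetical:

```c
#include <stddef.h>
#include <stdlib.h>

/* One row of a hypothetical codepage table. */
struct codepage_entry
  {
    int number;             /* Value of the character_code field. */
    const char *encoding;   /* IANA character set name. */
  };

/* Illustrative subset only.  Must stay sorted by NUMBER for bsearch(). */
static const struct codepage_entry codepage_table[] =
  {
    { 437, "IBM437" },
    { 850, "IBM850" },
    { 1250, "windows-1250" },
    { 1251, "windows-1251" },
    { 1252, "windows-1252" },
    { 65001, "UTF-8" },
  };

/* Orders two table rows by codepage number. */
static int
compare_entries (const void *a_, const void *b_)
{
  const struct codepage_entry *a = a_;
  const struct codepage_entry *b = b_;
  return (a->number > b->number) - (a->number < b->number);
}

/* Returns the encoding name for codepage NUMBER, or FALLBACK if the
   number is not in the table. */
const char *
codepage_to_encoding (int number, const char *fallback)
{
  struct codepage_entry key = { number, NULL };
  const struct codepage_entry *e
    = bsearch (&key, codepage_table,
               sizeof codepage_table / sizeof *codepage_table,
               sizeof *codepage_table, compare_entries);
  return e != NULL ? e->encoding : fallback;
}
```

Passing NULL as the fallback lets the caller tell an unresolved number apart
from a resolved one and apply whatever heuristic we settle on.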
J'
--
PGP Public key ID: 1024D/2DE827B3
fingerprint = 8797 A26D 0854 2EAB 0285 A290 8A67 719C 2DE8 27B3
See http://pgp.mit.edu or any PGP keyserver for public key.