Re: Detecting if a file is binary
From: tomas
Date: Tue, 24 Nov 2009 18:42:04 +0100
User-agent: Mutt/1.5.15+20070412 (2007-04-11)
On Tue, Nov 24, 2009 at 07:23:34AM -0800, Nordlöw wrote:
> Is there a way in emacs-lisp code to detect if a file is binary, that is
> it does *not* contain a correct multi-character coding.
> Or can every possible combination of bytes always be correctly decoded
> by some character coding?
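Yes, it can. In a one-byte encoding such as ISO-8859-1 (Latin-1), for
example, every byte value represents a valid code point, so any byte
sequence decodes. In UTF-8, by contrast, there are byte sequences which
can't (or shouldn't) occur, such as a lone continuation byte in the range
0x80 to 0xBF.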
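To make the contrast concrete, here is a minimal sketch in Python rather
than Emacs Lisp (the helper name `decodes_as` is hypothetical). It simply
tries a strict decode: Latin-1 accepts all 256 byte values, while UTF-8
rejects the stray continuation byte 0x80.

```python
def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if `data` decodes cleanly in `encoding` (hypothetical helper)."""
    try:
        data.decode(encoding)  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True


every_byte = bytes(range(256))
print(decodes_as(every_byte, "latin-1"))  # True: each byte maps to U+0000..U+00FF
print(decodes_as(every_byte, "utf-8"))    # False: 0x80 alone is not valid UTF-8
print(decodes_as("tomás".encode("utf-8"), "utf-8"))  # True
```

Passing such a check only rules encodings out; it says nothing about
whether the file was meant to be text.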
I think the only way to gain some confidence is by statistical analysis
of the text.
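One common heuristic along those lines, sketched in Python with names and
thresholds that are arbitrary assumptions: sample the first few kilobytes,
treat any NUL byte as a strong sign of binary data, and otherwise count
control characters that rarely appear in text. High bytes (0x80 and up)
are deliberately not counted, since in one-byte encodings like Latin-1
they are ordinary letters.

```python
# Hypothetical tuning knobs, not taken from any standard.
SAMPLE_SIZE = 8000          # only inspect the start of the file
MAX_CONTROL_RATIO = 0.30    # fraction of odd control bytes tolerated

# Control bytes that do show up in ordinary text files:
# backspace, tab, LF, form feed, CR, ESC.
TEXT_CONTROLS = {0x08, 0x09, 0x0A, 0x0C, 0x0D, 0x1B}


def looks_binary(data: bytes) -> bool:
    """Guess whether `data` is binary using simple byte statistics."""
    sample = data[:SAMPLE_SIZE]
    if not sample:
        return False  # an empty file is treated as text
    if b"\x00" in sample:
        return True   # NUL bytes are rare in text encodings other than UTF-16/32
    suspicious = sum(
        1 for b in sample
        if (b < 0x20 and b not in TEXT_CONTROLS) or b == 0x7F
    )
    return suspicious / len(sample) > MAX_CONTROL_RATIO


def file_looks_binary(path) -> bool:
    """Apply the heuristic to the first SAMPLE_SIZE bytes of a file."""
    with open(path, "rb") as f:
        return looks_binary(f.read(SAMPLE_SIZE))
```

This is only a guess: a UTF-16 text file, for instance, is full of NUL
bytes and would be classified as binary.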
Regards
-- tomás