We could rely on the good old 'file' utility for character set detection. From its man page:
"If a file does not match any of the entries in the magic file, it is examined
to see if it seems to be a text file. ASCII, ISO-8859-x, non-ISO 8-bit
extended-ASCII character sets (such as those used on Macintosh and IBM PC
systems), UTF-8-encoded Unicode, UTF-16-encoded Unicode, and EBCDIC character
sets can be distinguished by the different ranges and sequences of bytes that
constitute printable text in each set. If a file passes any of these tests,
its character set is reported. ASCII, ISO-8859-x, UTF-8, and extended-ASCII
files are identified as ``text'' because they will be mostly readable on
nearly any terminal; UTF-16 and EBCDIC are only ``character data'' because,
while they contain text, it is text that will require translation before it
can be read. In addition, file will attempt to determine other characteristics
of text-type files. If the lines of a file are terminated by CR, CRLF,
or NEL, instead of the Unix-standard LF, this will be reported. Files that
contain embedded escape sequences or overstriking will also be identified."
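
As a rough sketch of how this could be used from a script, the snippet below shells out to 'file' with its --brief and --mime-encoding options and returns whatever encoding name it prints (for example utf-8 or us-ascii). The helper name detect_charset is made up for illustration, and the sketch assumes a reasonably recent 'file' binary is on the PATH; it returns None when that assumption does not hold. Keep in mind that the detection is heuristic, so the result is a best guess rather than a guarantee.

```python
import os
import shutil
import subprocess


def detect_charset(path):
    """Return the encoding name guessed by file(1), or None if unavailable.

    Hypothetical helper: assumes the 'file' utility is installed and
    supports the --brief and --mime-encoding options.
    """
    # Bail out early if the file is missing or file(1) is not installed.
    if not os.path.isfile(path) or shutil.which("file") is None:
        return None

    result = subprocess.run(
        ["file", "--brief", "--mime-encoding", path],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None

    # With --brief, the output is just the encoding name plus a newline.
    return result.stdout.strip() or None


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        print(name, detect_charset(name))
```

Calling the 'file' binary keeps the sketch dependency-free; a binding to libmagic would avoid spawning a process per file, but that is a separate choice and not something the man page excerpt covers.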