Re: [gnu.org #1363250] ASCII maintain.txt is no longer ASCII
From: John Darrington
Subject: Re: [gnu.org #1363250] ASCII maintain.txt is no longer ASCII
Date: Wed, 27 Feb 2019 08:29:43 +0100
User-agent: NeoMutt/20170113 (1.7.2)

Just before we go off at too many tangents, a bit of background info for
this discussion.
* ASCII is a well-defined standard, and all ASCII is UTF-8 (but the
converse is not true).
* The command iconv -f UTF-8 -t ASCII file will fail unless all the
characters in file are already ASCII. Hence it isn't a very useful
command.
* The coding standards say that we should prefer ASCII wherever
possible. If it is not possible, then we should use UTF-8.
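
The first two points can be illustrated with a small Python sketch. It is an illustration only: the helper name is_pure_ascii and the sample strings are made up here, and Python's strict "ascii" codec stands in for iconv, which is assumed to fail in the same circumstances.

```python
def is_pure_ascii(data: bytes) -> bool:
    """Return True if every byte is in the 7-bit ASCII range."""
    return all(b < 0x80 for b in data)


# Hypothetical sample data: one plain string, one with curly quotes.
ascii_text = "plain 'quotes'".encode("ascii")
utf8_text = "curly \u2018quotes\u2019".encode("utf-8")

# All ASCII is valid UTF-8, so decoding ASCII bytes as UTF-8 always works.
assert ascii_text.decode("utf-8") == "plain 'quotes'"

# The converse does not hold: a strict conversion to ASCII fails as soon
# as a single character falls outside the ASCII range.
try:
    utf8_text.decode("utf-8").encode("ascii")
    converted = True
except UnicodeEncodeError:
    converted = False
```

A strict conversion therefore only succeeds on input that needed no conversion in the first place.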
I think that Therese is saying that there are some files which are using
UTF-8 when ASCII would have sufficed.
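
For the cases where ASCII would have sufficed, a replacement table is one possible approach. The sketch below is only an assumption about what such a table might contain: the mapping ASCII_FALLBACKS and the function name are hypothetical, and which substitutions are acceptable is a judgment call for the maintainers. Characters without an entry (such as an accented letter) are left alone.

```python
# Hypothetical fallback table: typographic characters and an ASCII stand-in.
ASCII_FALLBACKS = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2014": "--",   # em dash
    "\u2022": "*",    # bullet
    "\u00a9": "(C)",  # copyright sign
}


def to_ascii_where_possible(text: str) -> str:
    """Replace characters that have an entry in the table; keep the rest."""
    return text.translate(str.maketrans(ASCII_FALLBACKS))


sample = "\u201cmaintain\u201d \u2014 it\u2019s risqu\u00e9"
result = to_ascii_where_possible(sample)

# Whatever is still outside ASCII after the substitution.
leftover = [ch for ch in result if ord(ch) > 0x7F]
```

In this sample only the accented letter survives, which is the kind of character for which UTF-8 is genuinely needed.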
J'
On Tue, Feb 26, 2019 at 12:06:56PM -0500, Alfred M. Szmidt wrote:
> I have noticed that maintain.txt and maintain.info[1] are no longer in
> ASCII, but in UTF-8. In particular they contain lots of easily avoidable
> UTF-8 quoting characters (single and double quotes) that break
> displaying them in non-UTF-8 terminals. This is a pity because the main
> use of such simple formats is to be displayed in simple terminals.
I'm not sure what the definition of "ASCII" is here; are you talking
about "printable" characters? In that case, the Info format has
always contained non-printable characters, most notably #o37
for section splitting, the "#o0 #o10 [" sequence for images, etc. So
these files have never been very readable on "simple text terminals"
(what do you mean by that more exactly? VT100 dumb terminal?).
For the text files, I think it still makes more sense to use UTF-8:
the default locale these days on GNU/Linux is UTF-8, and many of the
command line tools will output UTF-8 style quoting characters in that
case.
Could you run your files through iconv and convert them from UTF-8 to
ASCII? Maybe,
iconv -f UTF-8 -t ASCII file...
> Given that there is just one letter out of the ASCII range in
> maintain.{txt,info} (the 'é' in 'risqué'), could it be possible to keep
> these files as pure ASCII? Thanks.
990 matches in 490 lines for "[^[:ascii:]]" in buffer: maintain.txt
988 matches in 489 lines for "[^[:ascii:]]" in buffer: maintain.info
These are mostly quotes, but there are bullets, copyright signs and
em-dashes as well.
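
A count of this kind can be approximated outside the editor. The following Python sketch is not the command that produced the figures above; the function name count_non_ascii and the sample text are invented for illustration, and the character class [^\x00-\x7f] is assumed to be equivalent to "[^[:ascii:]]".

```python
import re
from collections import Counter

# Any character outside the 7-bit ASCII range.
NON_ASCII = re.compile(r"[^\x00-\x7f]")


def count_non_ascii(text: str):
    """Return (matches, lines_with_matches, per-character tally)."""
    matches = 0
    lines = 0
    tally = Counter()
    for line in text.splitlines():
        found = NON_ASCII.findall(line)
        if found:
            lines += 1
            matches += len(found)
            tally.update(found)
    return matches, lines, tally


# Hypothetical three-line sample: curly quotes, a plain line, bullet and dash.
sample = "\u201cquoted\u201d text\nplain line\n\u2022 bullet \u2014 dash\n"
matches, lines, tally = count_non_ascii(sample)
```

The per-character tally shows at a glance whether the offenders are quotes, bullets, dashes or something else.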
--
Avoid eavesdropping. Send strong encrypted email.
PGP Public key ID: 1024D/2DE827B3
fingerprint = 8797 A26D 0854 2EAB 0285 A290 8A67 719C 2DE8 27B3
See http://sks-keyservers.net or any PGP keyserver for public key.