gzz-dev

[Gzz] PEG: utf8


From: Tuomas Lukka
Subject: [Gzz] PEG: utf8
Date: Wed, 14 May 2003 11:12:21 +0300
User-agent: Mutt/1.4.1i

=============================================================
PEG src_utf8--tjl: UTF8 as our global encoding in all sources
=============================================================

:Author:   Tuomas J. Lukka
:Last-Modified: $Date: 2003/05/14 08:08:11 $
:Revision: $Revision: 1.1 $
:Status:   Incomplete

We are having lots of issues with docutils because we use
Latin-1 encoding in our files to write e.g. "Jyväskylä".
We're forward-looking in most of the other stuff we do.
I suggest that we do the same in this matter: We should
do the right thing and never look back.

I propose that we agree that the days of Latin-1 are
past and move everything we do to UTF-8. 


Issues
======

- Will this be a problem with email? E.g. posting PEGs...

    RESOLVED: Maybe, but not an important one. It's
    easy enough to read the few garbled symbols if there are any,
    and the important thing is that in CVS, things will work.

    This OTOH gives some incentive to start thinking about UTF-8 
    mail.

- Isn't UTF-8 difficult to edit?

    RESOLVED: No, not any more.  Both emacs and vim support it.
    It's steadily gaining ground.

- Can we use UTF-8 with TeX? If not, what do we do?

    RESOLVED: Doesn't seem to be possible, but we can use
    the TeX escapes::

        \"a, \"o ...

    to handle this without breaking the high-bit rule.
    Besides, our use of TeX directly is on the way out.

- Are you serious about using smiley faces or other special
  unicode characters in identifiers?

    RESOLVED: Yes, occasionally, if they can help. However,
    much care is needed; never choose a character that looks
    like some other one. For instance, U+2133 (SCRIPT CAPITAL M)
    is useless here.
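
As an illustration of the TeX escapes mentioned in the issues above, the
following is a minimal sketch that rewrites the four letters named in this
PEG into their ASCII TeX escapes. The names ``TEX_ESCAPES`` and
``to_tex_ascii`` are hypothetical and not part of any existing tool, and
the table covers only ä, ö, Ä and Ö; other accented letters would need
their own entries.

```python
# Hypothetical helper: map the Finnish umlaut letters to ASCII TeX escapes.
# Only the four letters discussed in this PEG are covered (an assumption).
TEX_ESCAPES = {
    "ä": r'\"a',
    "ö": r'\"o',
    "Ä": r'\"A',
    "Ö": r'\"O',
}


def to_tex_ascii(text: str) -> str:
    """Replace each known umlaut letter by its TeX escape; keep the rest."""
    return "".join(TEX_ESCAPES.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(to_tex_ascii("Jyväskylä"))
```

Text that contains only ASCII and these four letters comes out as pure
ASCII, so the TeX sources stay free of high-bit characters.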

Changes
=======

In all ff subprojects, convert all files containing high-bit
characters (e.g. ä, ö) to UTF-8 encoding (including PEGs
like this one).
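
The conversion could be done along the following lines. This is only a
sketch under one loud assumption: any file that is not already valid UTF-8
is taken to be Latin-1. The function names ``to_utf8`` and ``convert_file``
are hypothetical.

```python
from pathlib import Path


def to_utf8(data: bytes) -> bytes:
    """Return data as UTF-8 bytes.

    Assumption: input that does not decode as UTF-8 is Latin-1.
    """
    try:
        data.decode("utf-8")
        return data  # already valid UTF-8 (pure ASCII included)
    except UnicodeDecodeError:
        return data.decode("latin-1").encode("utf-8")


def convert_file(path: Path) -> bool:
    """Rewrite the file in place if needed; return True if it changed."""
    original = path.read_bytes()
    converted = to_utf8(original)
    if converted != original:
        path.write_bytes(converted)
        return True
    return False
```

Because already-valid UTF-8 is left untouched, running the conversion twice
does no harm. The heuristic can be fooled by Latin-1 text whose bytes happen
to form valid UTF-8, so converted files are worth a quick visual check.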

Explain this in README, along with instructions on what to do
for the most popular editors.

Start using smiley faces as characters in Java identifiers ;)

Create a grep script which sniffs out Latin-1 ä, ö, Ä, Ö in new files.
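
A minimal sketch of such a sniffing script, written in Python rather than
with grep itself. It looks for the Latin-1 byte values of ä, ö, Ä and Ö
(0xE4, 0xF6, 0xC4, 0xD6), but only on lines that fail to decode as UTF-8,
since 0xE4, 0xC4 and 0xD6 can also occur legitimately as lead bytes in
UTF-8 text. The names ``LATIN1_SUSPECTS`` and ``latin1_hits`` are
hypothetical.

```python
import sys

# Latin-1 byte values of the four letters named in this PEG.
LATIN1_SUSPECTS = {0xE4: "ä", 0xF6: "ö", 0xC4: "Ä", 0xD6: "Ö"}


def latin1_hits(data: bytes):
    """Return (line_number, letter) pairs for suspected Latin-1 umlauts.

    Heuristic: only lines that are not valid UTF-8 are inspected.
    """
    hits = []
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
            continue  # valid UTF-8 line, nothing to report
        except UnicodeDecodeError:
            pass
        for byte in line:
            if byte in LATIN1_SUSPECTS:
                hits.append((lineno, LATIN1_SUSPECTS[byte]))
    return hits


if __name__ == "__main__":
    # Usage: pass file names on the command line; prints one line per hit.
    for name in sys.argv[1:]:
        with open(name, "rb") as f:
            for lineno, letter in latin1_hits(f.read()):
                print(f"{name}:{lineno}: Latin-1 {letter}")
```

A line that mixes broken bytes with multi-byte UTF-8 sequences can still
produce false positives, so the output is a hint for review rather than a
verdict.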





