gpsd-dev


Re: [gpsd-dev] Update on Python Version Compatibility


From: Fred Wright
Subject: Re: [gpsd-dev] Update on Python Version Compatibility
Date: Sun, 3 Apr 2016 13:16:26 -0700 (PDT)

On Fri, 1 Apr 2016, Eric S. Raymond wrote:
> Fred Wright <address@hidden>:
> > One complication on str vs. bytes is that the "latin-1 hack" doesn't
> > always work as well as it first appears.  In some contexts, a string is
> > assumed to have the default encoding, which is derived from the
> > environment.
>
> The Latin-1 hack has been extensively tested on about half a dozen
> serious applications (that I know of) at this point without springing
> a leak.
>
> Can you specify a context in which this happens so I can reproduce and
> examine the behavior?

I did some poking around at Python in general, and identified the problem.
It's not the overall default encoding that's the issue.  That remains
utf-8 regardless of the environment (and would choke on *some* arbitrary
binary data, anyway).  It's sys.stdout.encoding, which is derived from
LANG, and defaults to US-ASCII when LANG is unset (or empty).  This
behavior is the same in Python 2 and Python 3, but only Python 3 actually
pays attention to the encoding, so writing binary data to sys.stdout
manages to work in Python 2 but behaves in a LANG-dependent fashion (which
may or may not throw an exception) in Python 3.  Fortunately, there's also
a sys.stdout.buffer object, which accepts binary data (equivalent to
opening stdout as "wb").  It doesn't exist in Python 2, so a conditional
definition is needed to point at the proper destination for binary stdout,
but that's straightforward.
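Here is a minimal sketch of both halves of this, assuming Python 3 for the demonstration. `ascii_text_write_fails` is a hypothetical helper, not GPSD code. It simulates a stdout whose locale-derived encoding is US-ASCII by wrapping a `BytesIO` in an ASCII `TextIOWrapper`. `binary_stdout` is also a hypothetical name. It is the conditional definition: it picks `sys.stdout.buffer` when that exists and falls back to `sys.stdout` on Python 2.

```python
import io
import sys

# Hypothetical name (not from GPSD): the destination for binary output.
# Python 3's sys.stdout is a text stream whose underlying byte stream is
# sys.stdout.buffer; Python 2's sys.stdout accepts byte strings directly
# and has no .buffer attribute, so fall back to sys.stdout itself.
binary_stdout = getattr(sys.stdout, "buffer", sys.stdout)


def ascii_text_write_fails(data):
    """Return True if writing latin-1-decoded bytes to an ASCII text
    stream raises UnicodeEncodeError (hypothetical helper).

    The TextIOWrapper stands in for sys.stdout when the locale
    (e.g. LANG unset) yields a US-ASCII encoding.
    """
    stream = io.TextIOWrapper(io.BytesIO(), encoding="ascii")
    try:
        # The "latin-1 hack": turn arbitrary bytes into a str ...
        stream.write(data.decode("latin-1"))
        # ... which the text layer must then re-encode as ASCII.
        stream.flush()
    except UnicodeEncodeError:
        return True
    return False
```

With this in place, binary output would go through `binary_stdout.write(data)` instead of `sys.stdout.write(...)`. If text and binary writes are interleaved on the same stdout, the text layer may need an explicit flush first so the output stays in order.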

I've often thought that it was a bit kludgy to run binary data through
stdin/stdout streams, which were primarily intended to be terminal I/O
(sending binary data to an actual terminal is often a horrible mess), and
the advent of encoding awareness just makes this more so.  But as things
presently stand, gpsfake needs to be able to write binary data to stdout,
so it needs to work properly in that case.

Meanwhile, I came up with an improved approach to applying the "latin-1
hack".  The idea is that, instead of having conversion functions, one
defines a new 'strbytes' type, which is subclassed from 'bytes' but
overrides three methods so that conversions both to and from 'str' force
the 'latin-1' encoding.  Thus, this is actually a distinct data type, one
that reflects its content more accurately (to the extent that 'latin-1'
can be considered "accurate" :-)).
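Here is a rough sketch of the 'strbytes' idea. The message does not say which three methods are overridden. The choice below is an assumption: `__new__` handles str to bytes, and `__str__` and `decode` handle bytes to str. The rest of the class body is also illustrative.

```python
import sys

if sys.version_info[0] >= 3:
    class strbytes(bytes):
        """bytes whose conversions to and from str use latin-1 (sketch)."""

        def __new__(cls, value=b""):
            # str -> bytes: latin-1 maps code points 0-255 one-to-one onto
            # byte values (code points above 255 raise UnicodeEncodeError).
            if isinstance(value, str):
                value = value.encode("latin-1")
            return super().__new__(cls, value)

        def __str__(self):
            # bytes -> str via latin-1, instead of the b'...' repr that
            # plain bytes would produce.
            return bytes.decode(self, "latin-1")

        def decode(self, encoding="latin-1", errors="strict"):
            # Default to latin-1 rather than bytes' default of utf-8.
            return bytes.decode(self, encoding, errors)
else:
    # Python 2: bytes is already str, so no conversions are needed.
    strbytes = bytes
```

One caveat with this design: inherited operations such as slicing or concatenation return plain `bytes`, not `strbytes`. The latin-1 conversions therefore apply only to values explicitly constructed as `strbytes`.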

It would be even better if it could subclass *both* 'str' *and* 'bytes',
so that it could be used directly in contexts that expect 'str', but alas,
multiple inheritance from independent classes that use slots doesn't work.
So it's still necessary to wrap a str() around it for things that need a
'str'.
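The multiple-inheritance limitation is easy to check. In CPython, asking for a class with both `str` and `bytes` as bases raises `TypeError`, and the message mentions an instance lay-out conflict. The helper and class names used here are arbitrary.

```python
def try_str_and_bytes_subclass():
    """Try to build a class deriving from both str and bytes.

    Returns the TypeError message if creation fails, or None if it
    unexpectedly succeeds. The class name is arbitrary/hypothetical.
    """
    try:
        type("strbytes_both", (str, bytes), {})
    except TypeError as exc:
        return str(exc)
    return None
```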

This is all for Python 3; for Python 2 'strbytes' is just a synonym for
'bytes', which is already a synonym for 'str', and no conversions are
needed.

I've already tried this out in a separate test module; now I just need to
apply it to the GPSD code.  Since it's needed by multiple modules, misc.py
seems like the appropriate place for it (unlike the present
polystr/polybytes, which are in client.py).
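For comparison, here is a sketch of the conversion-function style that 'strbytes' is meant to replace. The names match the existing polystr/polybytes in client.py. The bodies are only a guess at the idea, not a copy of that code.

```python
import sys

if sys.version_info[0] >= 3:
    def polystr(o):
        """Return o as str, decoding bytes with latin-1 (sketch)."""
        if isinstance(o, bytes):
            return o.decode("latin-1")
        return o

    def polybytes(o):
        """Return o as bytes, encoding str with latin-1 (sketch)."""
        if isinstance(o, str):
            return o.encode("latin-1")
        return o
else:
    # Python 2: str is bytes, so both conversions are identities.
    def polystr(o):
        return o

    def polybytes(o):
        return o
```

The difference is where the knowledge lives. With functions, every call site must remember to convert. With a 'strbytes' type, the value itself carries the latin-1 convention.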

Fred Wright


