Re: [h5md-user] units module
From: Peter Colberg
Date: Tue, 5 Nov 2013 17:55:21 -0500
User-agent: Mutt/1.5.21 (2010-09-15)
Hi Felix,
On Mon, Nov 04, 2013 at 10:32:36AM +0100, Felix Höfling wrote:
> Writing UTF8 is easy as you pointed out, what about reading? I've never
> used it in practice. Can a reader store the raw string in char* and pass
> it to the udunits2 library? If this works with either encoding we may drop
> the "encoding" field of course.
UTF-8 is an encoding of the Unicode character set that uses one or
more bytes to represent a character. In C, a UTF-8 encoded string
can be stored in a char array.
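As a minimal sketch of what that means in practice (my own illustration,
not code from HDF5 or udunits2): the string is an ordinary NUL-terminated
char array, strlen() reports bytes rather than characters, and characters
can be counted by skipping continuation bytes. The names utf8_codepoints
and unit_micrometre are hypothetical, and the function assumes the input
is already valid UTF-8.

```c
#include <stddef.h>
#include <string.h>  /* strlen(), used in the usage note below */

/* Count Unicode code points in a NUL-terminated UTF-8 string.
 * Assumption: the input is valid UTF-8; no validation is performed. */
size_t utf8_codepoints(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        /* Continuation bytes have the bit pattern 10xxxxxx;
         * every other byte starts a new code point. */
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n;
    }
    return n;
}

/* Hypothetical unit string: micro sign (U+00B5) followed by 'm'.
 * The bytes are written as escapes (0xC2 0xB5 is U+00B5 in UTF-8),
 * so the result does not depend on the source file encoding. */
static const char unit_micrometre[] = "\xC2\xB5m";
```

For this string, strlen(unit_micrometre) counts the three bytes, while
utf8_codepoints(unit_micrometre) counts the two characters. A reader
that only passes the buffer on to another library can treat it as
opaque bytes and never needs the character count.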
The HDF5 library does not handle encodings at all; the encoding
property for string datatypes is only an indication for the user.
One can store Unicode strings containing multiple-byte characters
using H5T_CSET_ASCII, and the HDF5 library does not complain.
The downside of this lack of encoding support is that the encoding of
the memory datatype specified when reading or writing a dataset or
attribute must match the encoding of the file datatype. This is an
unfortunate design choice: for example, reading an attribute with file
datatype encoding H5T_CSET_ASCII using memory datatype encoding
H5T_CSET_UTF8 should work, but it doesn't. One can register datatype
conversion functions as a band-aid, but that must be repeated in every
application.
> Before we add something to the specification we should test it somehow.
> What about providing a code snippet in the implementation part of how to
> read UTF8 unit strings and how to interact with, e.g., udunits2?
Absolutely.
I would suggest adding an examples directory to the repository, and a
subdirectory for each library interface, e.g., HDF5 C, HDF5 Fortran,
and h5py. Sadly the HDF5 for LuaJIT module is not ready yet, otherwise
I would have written a few examples as well by now.
Regards,
Peter