From: Felix Höfling
Subject: Re: [h5md-user] units module
Date: Mon, 04 Nov 2013 10:32:36 +0100
User-agent: Opera Mail/12.15 (Linux)

On 01.11.2013 at 16:13, Peter Colberg <address@hidden> wrote:

> Hi Felix, hi all,
>
> On Thu, Oct 31, 2013 at 05:17:18PM +0100, Felix Höfling wrote:
>> I made an effort to write down a specification for the units module
>> to make progress. I took up Pierre's suggestion and added a list of
>> units inspired by Mosaic and udunits2.
>
> Thank you for working on the units module!
>
> For the encoding, can we just go with UTF8, instead of both ASCII and
> UTF8?
>
> The issue with encodings is that HDF5 does not support implicit
> datatype conversion between ASCII and UTF8. So the reader needs to
> specify the correct encoding when reading a "unit" attribute, which is
> addressed in commit c065ace by the module attribute "encoding".
>
> Since UTF8 is a superset of ASCII (characters 0-127), the only thing
> a C or Fortran writer has to do to use UTF8 encoding is call
> H5Tset_cset on the datatype, e.g.,
>
>   hid_t dtype = H5Tcopy(H5T_C_S1);
>   H5Tset_size(dtype, H5T_VARIABLE);
>   H5Tset_cset(dtype, H5T_CSET_UTF8);
>
> In Python the string needs to be in UTF8 encoding:
>
>   dataset.attrs["unit"] = u"nm"
>
> Peter

Hi Peter,

The field was mainly intended for the reader: a minimal reader may want to process only ASCII strings and thus ignore the units if they are not in ASCII. I thought that making a promise at the beginning would simplify things, compared with checking the encoding of each string as it is read.

Writing UTF8 is easy, as you pointed out, but what about reading? I've never used it in practice. Can a reader store the raw string in a char* and pass it to the udunits2 library? If this works with either encoding, we may of course drop the "encoding" field.

Before we add something to the specification, we should test it somehow. What about providing a code snippet in the implementation part that shows how to read UTF8 unit strings and how to interact with, e.g., udunits2?

Best regards,

Felix
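
As a rough starting point for such a test, the reading question can be sketched in Python without touching HDF5 or udunits2 at all. The sketch assumes the raw bytes of a "unit" attribute have already been read into `raw`; the helper name `decode_unit` is hypothetical and not part of any library. It only illustrates that an ASCII unit string is byte-identical in UTF8, and that a minimal reader could detect non-ASCII units per string rather than rely on a module-wide "encoding" field.

```python
def decode_unit(raw):
    """Decode the raw bytes of a unit string as UTF-8.

    Returns (text, is_ascii). `raw` is assumed to hold the bytes of a
    "unit" attribute that were already read from the file; this
    hypothetical helper makes no HDF5 or udunits2 calls.
    """
    # Raises UnicodeDecodeError if the bytes are not valid UTF-8.
    text = raw.decode("utf-8")
    # All ASCII characters are single bytes below 128 in UTF-8.
    is_ascii = all(byte < 128 for byte in raw)
    return text, is_ascii


# An ASCII-only unit has the same bytes in both encodings.
ascii_unit = "nm".encode("utf-8")
# A unit with a non-ASCII character (micro sign, U+00B5) needs two bytes for it.
micro_unit = "\u00b5m".encode("utf-8")

for raw in (ascii_unit, micro_unit):
    text, is_ascii = decode_unit(raw)
    # A minimal reader could skip units for which is_ascii is False.
    print(repr(raw), repr(text), is_ascii)
```

Whether udunits2 accepts such a buffer directly still needs to be checked against the library itself; the sketch only covers the byte-level side of the question.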