Re: [h5md-user] units module

From: Pierre de Buyl
Subject: Re: [h5md-user] units module
Date: Fri, 18 Oct 2013 09:47:03 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

On Thu, Oct 17, 2013 at 03:16:40PM +0200, Konrad Hinsen wrote:
> Pierre de Buyl writes:
>  > So, to get back to Peter's message:
>  > 
>  > I propose that we follow udunits grammar by restricting it similarly to
>  > Mosaic.
>  > For reference, Mosaic's definition is
>  > """
>  > The value of the units field is a text string in ASCII encoding. It
>  > contains a sequence of unit factors separated by a space. A unit factor
>  > is a unit symbol optionally followed by a non-zero integer which
>  > indicates the power to which this factor is taken.
>  > """
>  > 
>  > I would remove the constants defined ("c" and "Nav"), however.
> The current unit list is a first draft, to be revised before version
> 1.0 of Mosaic. You are completely right about "Nav", which is the same
> as "mol" and thus redundant. However, "c" frequently occurs in derived
> unit, e.g. "cm-1 c" for frequency, which is heavily used in
> spectroscopy.

In any case, this kind of constant would go in a module and not in the base

>  > We may want to add "a unit string must be parseable by udunits"?
> The problem with that statement is that we don't control udunits.  In
> general, it's not a good idea to define a data format by the
> capacities of a piece of software. It's fine to have such a comment as
> a statement of intention, of course.


> Felix Höfling writes:
>  > I find udunits' grouping into SI-base units, SI-derived units etc. very  
>  > reasonable. Let's keep it for H5MD rather than introducing a different  
>  > subset.
> That was my original idea for Mosaic, but I changed my mind for the
> following reasons:
> 1) The point of having a restricted set of units is to permit error
>    checking. Allowing a unit that is more likely to be a typo than
>    a choice is ultimately of no benefit. A general-purpose library
>    such as udunits can't limit the allowed units, but a domain-specific
>    format such as Mosaic can.
> 2) The distinction between SI-base and SI-derived is logical for a
>    metrologist, but irrelevant for practical use. I don't expect
>    SI-base to be sufficient for much of molecular data, if only
>    because of the lack of energy units.
> 3) Fewer units means a reduced risk of errors if automatic conversion
>    is attempted (see below).
>  > Actually, whether a reader can "understand" a small or large set of units  
>  > is mainly a matter of the database defining the units. Do I overlook  
>  > something here? Why not copying the full list from udunits?
> See 1) above.

Also, to get an idea of what's possible with udunits I had to play a bit.
Providing an explicit list seems simpler.

>  > BTW, a more advanced functionality that discriminates between "simple" and 
>  > "advanced" readers is automatic conversion between units ...
> Indeed, but conversion is a very tricky business. SI has two traps for
> unit converters:
>  - Dimensionless units: rad, sr, and mol
>    Is pi dimensionless or measured in rad? Both make sense, and automatic
>    conversion needs to know which convention was used.
>    I am actually considering to remove "rad" from the allowed units in
>    Mosaic, and make "deg" a dimensionless constant equal to 180/pi.
>    That's much closer to the reality of unit use in computational chemistry
>    than the SI system.
>  - Dimensionally equal but incompatible units: 1/s, Hz, Bq
>    It's OK to convert Hz and Bq to 1/s, but not among each other.
>    Converting 1/s to Hz or Bq is in general not allowed. The problem
>    disappears if Hz and Bq are not allowed.

Ok, so we need to settle on what can go into a unit.

(Most of this is copied from Mosaic, which means I should not forget to add a
license statement somewhere. BTW, Konrad, do you know if we can include your
CC-BY in our GPL "code"?)

"unit" is a scalar attribute of type variable-length string. It consists of a
sequence of unit factors separated by a space. A unit factor is either a number
(an integer or a decimal fraction) or a unit symbol optionally followed by a
non-zero integer which indicates the power to which this factor is taken. A
unit symbol may include an SI prefix.

Examples:

  - "nm3" stands for cubic nanometers

  - "nm ps-1" stands for nanometers per picosecond

  - "60 s" stands for a minute

Each unit symbol may occur only once in the "unit" string. There may also
be at most one numeric factor, which must be the first one.

"unit" may be encoded as ASCII or UTF-8.
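
To check that the rules above are unambiguous, here is a minimal sketch of a
parser for such a string, in Python. It is only an illustration under stated
assumptions: the symbol and prefix tables (BASE_SYMBOLS, PREFIXES) are
hypothetical placeholders since the actual list is not settled yet, and the
function names are made up for this example.

```python
import re

# Hypothetical tables: the real list of allowed symbols is still undecided.
BASE_SYMBOLS = {"m", "s", "g", "K", "mol"}
PREFIXES = {"", "k", "m", "u", "n", "p", "f"}  # illustrative subset of SI prefixes

NUMBER = re.compile(r"[0-9]+(\.[0-9]+)?")      # integer or decimal fraction
FACTOR = re.compile(r"([A-Za-z]+)(-?[0-9]+)?")  # symbol, optional integer power


def split_symbol(symbol):
    """Return (prefix, base) if symbol is a known, optionally prefixed unit."""
    # Shortest prefix first, so that "m" is read as metre and "mol" as mole.
    for prefix in sorted(PREFIXES, key=len):
        base = symbol[len(prefix):]
        if symbol.startswith(prefix) and base in BASE_SYMBOLS:
            return prefix, base
    return None


def parse_unit(text):
    """Parse a unit string into (numeric factor, {symbol: power})."""
    scale = 1.0
    powers = {}
    for index, factor in enumerate(text.split(" ")):
        if NUMBER.fullmatch(factor):
            # At most one numeric factor, and it must be the first factor.
            if index != 0:
                raise ValueError("numeric factor must come first: %r" % factor)
            scale = float(factor)
            continue
        match = FACTOR.fullmatch(factor)
        if match is None:
            raise ValueError("malformed unit factor: %r" % factor)
        symbol, power = match.group(1), match.group(2)
        if split_symbol(symbol) is None:
            raise ValueError("unknown unit symbol: %r" % symbol)
        if symbol in powers:
            raise ValueError("unit symbol occurs twice: %r" % symbol)
        exponent = 1 if power is None else int(power)
        if exponent == 0:
            raise ValueError("power must be a non-zero integer: %r" % factor)
        powers[symbol] = exponent
    return scale, powers
```

With these placeholder tables, parse_unit("nm ps-1") gives
(1.0, {"nm": 1, "ps": -1}) and parse_unit("60 s") gives (60.0, {"s": 1}),
while "s 60", "nm nm" and "nm0" are rejected. A typo such as "nn" is rejected
as well, which is the kind of error checking a restricted list makes possible.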

The list of available symbols, in the case where no "units" module is present,
is XXX.

