
Re: [h5md-user] [EXTERNAL] Re: Units Module Question

From: Felix Höfling
Subject: Re: [h5md-user] [EXTERNAL] Re: Units Module Question
Date: Tue, 13 May 2014 10:06:56 +0200
User-agent: Opera Mail/12.16 (Linux)

On 12.05.2014, 19:06, Hart, David Blaine <address@hidden> wrote:

On 10.05.2014, 19:10, Pierre de Buyl wrote:

> On Sat, May 10, 2014 at 02:46:04PM +0200, Konrad Hinsen wrote:
>> Hart, David Blaine writes:
>>  > I want to convert “Kcal mol-1 Å-1” into “J mol-1 m-1”, which leads me to want to
>>  > use the unit string “4.184 10+10 J mol-1 m-1”. But I’m not sure that’s valid. But
>>  > it seems weird that it would be invalid since there are so many numeric conversion
>>  > factors that are multiplied by a factor of ten, like “1.602e-19 C” written as
>>  > “1.602 10-19 C”.
>>  >
>>  > This isn’t a big deal, since the SI prefixes can usually make it so that the
>>  > decimal only has to be shifted one or two places in the numeric factor, but I
>>  > thought I’d mention it.
>>
>> How about “4.184e10 J mol-1 m-1” ?
> I did think of that but this possibility is only implicit in the
> specification (both Mosaic and H5MD). It seems however logical to
> accept the scientific notation for the numeric factor.
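The arithmetic behind the conversion factor quoted above can be checked in a few lines. This is only a sketch: it assumes the thermochemical calorie (1 kcal = 4184 J) and 1 Å = 1e-10 m, and the constant names are made up for illustration. Under those assumptions the factor for "J mol-1 m-1" comes out as 4.184e13, while 4.184e10 would be the factor for "kJ mol-1 m-1".

```python
# Assumptions (not stated in the thread): thermochemical calorie, exact integers.
JOULE_PER_KCAL = 4184                    # 1 kcal = 4184 J
INV_METRE_PER_INV_ANGSTROM = 10**10      # 1 Å-1 = 1e10 m-1

# Factor that expresses "kcal mol-1 Å-1" in "J mol-1 m-1".
factor_J = JOULE_PER_KCAL * INV_METRE_PER_INV_ANGSTROM

# Same quantity expressed in "kJ mol-1 m-1".
factor_kJ = factor_J // 1000

print(f"{factor_J:.3e} J mol-1 m-1")
print(f"{factor_kJ:.3e} kJ mol-1 m-1")
```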

An explicit syntax grammar would be very helpful here. I had one
interpretation of what a unit string should look like and I thought that
there was no doubt about it. But now I realise that many other
interpretations are possible:

The first (optional) part is a number in non-scientific notation (an integer
or a decimal fraction; what is the decimal sign? "." in English, "," in
German). Thus "4.184e10" is excluded although it seems very natural.

It is unclear from the wording of the spec whether the number can be
followed by an exponent, e.g., whether 4.184-10 would be allowed
(evaluating to pow(4.184, -10)).

It is probably intended that the string "1.602 10-19 C" is valid. On first
reading, however, I thought it has 2 numeric factors and is not covered.
It seems that the second part is a "numeric (unit) factor", to be
distinguished from the leading "number".

I actually think you are right, and that the string "1.602 10-19 C" is not valid. I would take this reading because it is a "unit" and not a conversion factor, which means I was misusing the "unit" metadata when I was trying to give a conversion factor in my units. But it would still make sense to have a single numeric unit factor, especially a power of ten, as it is descriptive of the units. It would then make sense not to allow scientific notation, since that should probably apply to the data itself, while the units module only defines the unit, not a conversion factor which should be applied to the data first, if so desired. I'm not sure if this was the intent with Mosaic and the Units module, but it would make sense to me, now.

Conversion factors were explicitly included in our considerations (e.g., "60 s"). It is the way the udunits2 library defines non-SI units, see the XML files there.

I spent a lot of time reading the English version of the SI units brochure last Friday, and I finally understood what they meant by "coherent units". For example, the coherent unit for dynamic viscosity is "Pa s". The brochure says that it is acceptable to use the SI prefixes, but the result is then no longer a coherent unit. That is, with "kPa s" it is no longer clear that you are talking about viscosity rather than something completely different, whereas "10+3 Pa s" specifies the magnitude of the measurements while keeping the coherent SI units.

I don't really understand what the purpose of such "coherent units" should be and why even SI prefixes are bad. Is it to avoid nonsense like "kPa ms"?

At the end of the day H5MD should reflect what people actually need. For example, a lot of sensible rheology research uses centipoise, 1 cP = "10-3 Pa s" or "mPa s", even if the numbers in front of the unit are large. If I got it right, this is a "coherent" unit? Anyway, it is a valid string for the H5MD units module.

If "1.602 10-19 C" were valid, then the reading of the spec would
also allow for "10-19 C 1.602" (because nothing is said about the
position of the "number").

To avoid this kind of confusion I suggest that we either include "10" as the
only possible numeric unit symbol (what about 2, or positive integers?), or
call "1.602 10-19" the (leading) number and make its format explicit.
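As a rough illustration of the second option, here is a minimal sketch of a parser in which the leading number has an explicit format: an optional decimal mantissa, then an optional power of ten written as "10" with a signed exponent, then the unit factors. The grammar, the regular expressions and the function name `parse_unit_string` are assumptions made for this example; they are not taken from the H5MD or Mosaic specification.

```python
import re

# Hypothetical grammar (an assumption, not the H5MD specification):
#   unit_string := [decimal] [power10] (symbol [signed_int])*
DECIMAL = re.compile(r"\d+(?:\.\d+)?")             # "60", "4.184" (no exponent form)
POWER10 = re.compile(r"10([+-]\d+)")               # "10-19", "10+3"
FACTOR = re.compile(r"([^\W\d_]+)([+-]?\d+)?")     # "J", "mol-1", "Å-1"


def parse_unit_string(text):
    """Return (numeric_factor, [(symbol, power), ...]) or raise ValueError."""
    tokens = text.split()
    mantissa, exponent = "1", "0"
    i = 0
    # The number may only appear in leading position.
    if i < len(tokens) and DECIMAL.fullmatch(tokens[i]):
        mantissa = tokens[i]
        i += 1
    if i < len(tokens) and (m := POWER10.fullmatch(tokens[i])):
        exponent = m.group(1)
        i += 1
    units = []
    for token in tokens[i:]:
        m = FACTOR.fullmatch(token)
        if m is None:
            raise ValueError(f"not a unit factor: {token!r}")
        units.append((m.group(1), int(m.group(2) or 1)))
    # Build the factor from its decimal text to avoid an extra rounding step.
    return float(f"{mantissa}e{exponent}"), units


print(parse_unit_string("1.602 10-19 C"))
print(parse_unit_string("10-3 Pa s"))
```

With these rules "60 s" and "10-3 Pa s" are accepted, whereas "4.184e10 J mol-1 m-1" and "10-19 C 1.602" raise a ValueError, because the number is only recognised at the front and only in non-scientific notation.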

@David: in the course of the development of the units module, there was
also support for non-SI units, which was dropped later on, I believe
because there was no consensus on what should be included. Your use case
involving "kcal" would probably be covered more naturally by such an
extension of the units module, see the udunits2 library:

I realized this as I was reading the SI Brochure, and it seems the units module has already accounted for this by defining the "system" metadata within the /h5md/modules/units/ group. If I wanted to propose a, for example, "CGS-ESU" system, that would be easily defined as a different "system". And if I want to convert my energy from Kcal to J, I should multiply through the 4.184 J / Kcal conversion factor on my data first, then set the unit to "J", rather than set the unit to "4.184 J", since that isn't a unit, it's an equation. :-)

You find such definitions in the Git history of h5md. I've always had in mind that the conversion factor is part of the unit string. This allows one to internally use non-SI units but express (not convert) them in SI units. The file format shall not enforce any kind of data conversion (in particular if it is lossy as for floating-point multiplication).
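To make the "express, not convert" idea concrete, here is a small sketch in which the stored values are never modified and the conversion factor lives only in the unit string. The class `StoredDataset`, its methods and the example unit string "4184 J mol-1" (energies kept internally in kcal mol-1, assuming 1 kcal = 4184 J) are hypothetical and only serve as illustration; they are not part of H5MD.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class StoredDataset:
    """Values as written by the simulation, plus a unit string with a factor."""
    values: Tuple[float, ...]
    unit: str  # hypothetical example: "4184 J mol-1"

    def split_unit(self) -> Tuple[float, str]:
        # Treat a leading number, if present, as the conversion factor.
        head, _, rest = self.unit.partition(" ")
        try:
            return float(head), rest
        except ValueError:
            return 1.0, self.unit

    def in_si(self) -> Tuple[List[float], str]:
        # Conversion happens only on request, on a copy of the data.
        factor, unit = self.split_unit()
        return [v * factor for v in self.values], unit


energies = StoredDataset(values=(1.5, -0.25), unit="4184 J mol-1")
print(energies.values)    # stored data, untouched
print(energies.in_si())   # converted copy for readers who want SI numbers
```

The point of the design is that the file keeps the numbers exactly as the program produced them; whether and when to multiply by the factor is left to the reader of the file.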

Thanks, and sorry for muddying the waters on what the "unit" was.


Stirring up the waters is actually helpful for pointing out possible deficiencies of the specification.


