Re: [h5md-user] Make "time" optional?


From: Pierre de Buyl
Subject: Re: [h5md-user] Make "time" optional?
Date: Tue, 15 Jul 2014 14:33:51 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

On Tue, Jul 15, 2014 at 12:15:49PM +0200, Konrad Hinsen wrote:
> Felix Höfling writes:
> 
>  > We have had some discussion on the time dataset in the context of Monte
>  > Carlo simulations. If I remember well the outcome was that in the case of
>  > no physical time, time is simply identical (=linked) to step.
> 
> That's not an option in H5MD 1.0.0 because "step" must be an integer and
> "time" a float.
> 
>  > Our intention was to have no optional parts in the core H5MD
>  > element---for the sake of making reading simple.
> 
> The whole "particles" and "observables" groups are optional!
> 
>  > Whether such a decision was wise or not, I don't know. But it has
>  > been fixed now for H5MD 1.x. Making step or time optional would
>  > break compatibility with 1.0 and would make 1.0 basically
>  > obsolete.
> 
> Right. But sooner or later, that will happen. As people start using
> H5MD for more and more applications, weaknesses in the definition will
> appear and need to be fixed in a new version. Otherwise people will
> simply bend the rules and create somewhat non-conforming files. That's
> what has happened with the PDB format.
> 
> I think it would be useful to collect feedback from H5MD users and
> compile a list of recommendations: How should I represent ??? in H5MD?
> Then, after a while, see which solutions are not really good ones,
> and take them into account in a revision of H5MD.

This is a very good point! A wiki was discussed some time ago without much
success (which I understand; I am not a fan myself). We need HEPs, H5MD
Enhancement Proposals :-)

Solutions:
- Use a repository of strictly formatted documents, like the PEP index
  http://legacy.python.org/dev/peps/ (this could go in the main repo).
- Use a wiki (this could go on the github h5md group).
- Use the discussion page http://nongnu.org/h5md/discussion.html 
  Felix actually already posted the constant time step increment there, in
  September (thanks, git blame).

>  > Thus I don't think it is a good idea. Nevertheless, I am
>  > open to extend the interpretation of step/time (but the fields must
>  > be present). For example, step could also just enumerate the
>  > snapshots stored, without reference to any simulation order.
> 
> I have fewer problems with "step" than with "time", although I do see
> situations where, like Olaf described, the step values are made up and
> meaningless. But at least numbering steps from 1 to N doesn't create
> any false illusions about what information is available. Making up
> time values is worse because it suggests to the reader that there is
> some meaningful time-like quantity in the simulation.


> Pierre de Buyl writes:
> 
>  > On a more general note about step/time, I have had an idea for a long time. I
>  > didn't want it in H5MD 1.0, to avoid any confusion. But storing step and time
>  > when step is simply step[i] = STEP_SIZE*i and time[i] = STEP_SIZE*DT*i is a bit
>  > of a waste. We could define a proper setup for regularly sampled data, for which
>  > step[0], STEP_SIZE, time[0] and DT should be given.
> 
> Good idea, and not just to avoid wasting space. It would also contain
> the message to the reader "this is regularly sampled data". For some
> analyses this makes a big difference. For example, computing time
> correlation functions of regularly sampled data is straightforward and
> efficient, whereas it is cumbersome, slow, and imprecise for irregular
> time series.

This is just good practice, actually!
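
To make the proposal concrete, here is a minimal sketch of how a reader could
rebuild the full step and time arrays from the four stored values. The function
name and its arguments are hypothetical; nothing like this is defined in
H5MD 1.0, and the step[0] and time[0] offsets are added to the formulas quoted
above as an assumption.

```python
def expand_regular_sampling(step0, step_size, time0, dt, n):
    """Rebuild explicit step/time lists for n regularly sampled records.

    Hypothetical helper: step0, step_size, time0 and dt stand for the
    step[0], STEP_SIZE, time[0] and DT values proposed in this thread.
    """
    # Integer arithmetic, so the step values are exact.
    steps = [step0 + step_size * i for i in range(n)]
    # Each time is computed from the index rather than accumulated,
    # which avoids round-off building up along the series.
    times = [time0 + step_size * dt * i for i in range(n)]
    return steps, times


steps, times = expand_regular_sampling(step0=0, step_size=10, time0=0.0, dt=0.5, n=5)
```

Only four scalars need to be stored, and their presence alone would tell the
reader that the data is regularly sampled.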

> Right now, the only way to check if a time series is regular is to
> check all the time labels. However, these are floats and thus subject
> to round-off error. I'll bet that in practice, analysis software will
> simply assume the time series to be equally spaced and not bother to
> check. I'll also bet that sooner or later this will lead to wrong
> results being published.

All the points you mentioned (abuse of the format, rule-bending) will cause
trouble if not handled properly.
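
As an illustration of the round-off problem Konrad describes, here is a small
sketch of what a regularity check has to look like today. The function names
and the tolerance are my own assumptions, not part of any H5MD tool. The
integer steps can be compared exactly, whereas the float times need a
tolerance.

```python
import math
from itertools import accumulate


def is_regular_steps(steps):
    """Exact check: integer steps have a constant increment."""
    if len(steps) < 2:
        return True
    inc = steps[1] - steps[0]
    return all(b - a == inc for a, b in zip(steps, steps[1:]))


def is_regular_times(times, rel_tol=1e-9):
    """Tolerant check: float time labels are equally spaced up to rel_tol.

    The default tolerance is an arbitrary illustrative choice.
    """
    if len(times) < 2:
        return True
    dt = times[1] - times[0]
    return all(math.isclose(b - a, dt, rel_tol=rel_tol)
               for a, b in zip(times, times[1:]))


# Time labels produced by repeatedly adding dt = 0.1, as an integrator might.
times = list(accumulate([0.0] + [0.1] * 9))

exact = is_regular_times(times, rel_tol=0.0)   # exact comparison of the gaps
tolerant = is_regular_times(times)             # comparison within rel_tol
```

With these accumulated labels the exact comparison fails while the tolerant one
succeeds, so any such check depends on a tolerance that the file itself does
not specify.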

This all might lead to H5MD 2.0 arriving sooner than expected but it is much
preferable to have that than to have H5MD fade into oblivion! Depending on the
backward-compatible character of the changes (as mentioned by Peter) it might be
either 1.1 or 2.0. As Peter wrote, nothing to worry about but we should indeed
respect the versioning that we've chosen.

P


