
Re: [h5md-user] box and observables


From: Felix Höfling
Subject: Re: [h5md-user] box and observables
Date: Mon, 23 Sep 2013 10:27:28 +0200
User-agent: Opera Mail/12.15 (Linux)

On 23.09.2013 at 09:47, Konrad Hinsen <address@hidden> wrote:

Peter Colberg writes:

 > On Fri, Sep 20, 2013 at 01:01:14PM +0200, Konrad Hinsen wrote:
 > > My understanding is that the "box" information in each subgroup under
 > > "particles" can be different. Otherwise, why have it in each subgroup?
 > > But if it can be different, then which one should be linked to
 > > "observables"?
 >
 > That would be a new interpretation.
 >
 > H5MD started with a single box group, located at /particles/box.

OK, so the idea is that there is a single dataset or time series with
box information for the whole trajectory. I didn't see this clearly
stated anywhere.

 > Then it was noted that the special box group is misplaced in
 > /particles, next to the subsystem groups.

The main inconvenience is that it prevents any subsystem from being
called "box". That's not much of a restriction semantically, but it
requires all software to treat "box" differently from any other group
name.

 > So box was moved and replicated to the subsystem groups. Then the
 > discussion about the observables group came up, and it was
 > replicated to observables…

At least it would have been consistent to move it to the subsystem
groups there as well.

 > Why don't we just move the box to the H5MD root?

Fine with me. If it's meant to be the one and only box information for
the trajectory, it might as well be at the root level.


A single box group for the whole file raises some practical issues,
however. Suppose I have two subsystems (in my typical case "protein"
and "solvent") sampled at different time steps: the protein very
frequently, the solvent much less so. I thought that each subsystem
could have its own box data with the same sampling, which makes
reading both together quite straightforward.

With a single box group, finding the right box step to go with a
specific subsystem position step can be very expensive, since H5MD
makes no guarantees about matching information in different subgroups.
It is reasonable to assume that data with the same step number goes
together, of course, but that step number can be in very different
positions in the data array. H5MD doesn't even guarantee that step
numbers are monotonically increasing, so finding the box step for a
given position step could at worst require reading the complete box
time series.

Konrad

Wow, there has been a lot of activity on the list during my vacation! Konrad, welcome back to the discussion. I appreciate your contributions very much, in particular the protein point of view. It makes H5MD relevant for a much larger simulation community.

The primary reason for having several box groups is that different subgroups have different sampling intervals. For trajectory subgroups, the writer has to output the box along with the particle data on the same time grid. The reader takes the box from within the respective subgroup; all other box information is irrelevant. From the point of view of a single subgroup, this is easy: the time grids are congruent, and the information is complete.

From the outside, the seemingly scattered box information may appear confusing and inconsistent. But the rule is simply that the "closest" box information is relevant. For this reason (and since the /particles root group may be absent), the box is also stored in /observables, where it is considered a physical observable (as in NPT simulations), not just an appendage to some real data. Assuming that there is only one simulation box, the box there is stored not inside each particle subgroup but at the main level. Admittedly, this latter point deviates from the structure in /particles and may be debated.
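One possible reading of the "closest box" rule, sketched in Python: start at the group being read and walk up towards the file root until a group containing "box" is found. This is an illustration only. The nested dictionaries stand in for HDF5 groups, and the function name closest_box, the layout, and the placeholder strings are hypothetical and not taken from the H5MD specification.

```python
def closest_box(root, path):
    """Return the 'box' entry nearest to the group at `path`.

    `root` is a nested dict standing in for an HDF5 file (an assumption
    made for this sketch); `path` is a slash-separated group path.
    """
    # Collect every group from the root down to the requested one.
    groups = [root]
    node = root
    for name in path.strip("/").split("/"):
        node = node[name]
        groups.append(node)

    # Search from the innermost group outwards.
    for group in reversed(groups):
        if isinstance(group, dict) and "box" in group:
            return group["box"]
    raise KeyError(f"no box information found for {path}")


# Hypothetical layout: one box per particle subgroup, one in /observables.
layout = {
    "particles": {
        "protein": {"box": "protein box", "position": {}},
        "solvent": {"box": "solvent box", "position": {}},
    },
    "observables": {"box": "observables box", "pressure": {}},
}

protein_box = closest_box(layout, "/particles/protein/position")
pressure_box = closest_box(layout, "/observables/pressure")
```

With this layout, a reader of the protein positions never looks at the solvent or observables box, which is the point made above about other box information being irrelevant.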

Your point about finding the matching step in the box data is very valid. (I hadn't even thought of non-monotonic steps; searching monotonic steps is bad enough.) To ensure simple matching via dataset indices, the H5MD spec says: "A specific requirement for box groups inside particles is that the step and time datasets exactly match those of the corresponding position groups."
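To make the difference concrete, here is a minimal Python sketch of the two lookup strategies. Plain lists stand in for the step and box datasets. The function names, the sample values, and the idea of storing three box edge lengths per frame are assumptions made for illustration, not part of the H5MD specification.

```python
def box_by_index(box_edges, frame):
    """Matching time grids: position frame i pairs with box frame i."""
    return box_edges[frame]


def box_by_step_search(box_step, box_edges, step):
    """No guarantees on the step dataset: scan it until the step is found.

    In the worst case this reads the complete box time series.
    """
    for i, s in enumerate(box_step):
        if s == step:
            return box_edges[i]
    raise KeyError(f"no box data stored for step {step}")


# Hypothetical data: positions sampled every 10 steps, one box per frame.
position_step = [0, 10, 20, 30]
box_step = [0, 10, 20, 30]
box_edges = [
    [10.0, 10.0, 10.0],
    [10.1, 10.1, 10.1],
    [10.2, 10.2, 10.2],
    [10.3, 10.3, 10.3],
]

# Case 1: the step datasets match exactly, so the frame index is enough.
assert box_step == position_step
edges_fast = box_by_index(box_edges, 2)

# Case 2: same data stored in a different, non-monotonic order.
shuffled_step = [30, 0, 20, 10]
shuffled_edges = [box_edges[3], box_edges[0], box_edges[2], box_edges[1]]
edges_slow = box_by_step_search(shuffled_step, shuffled_edges, 20)
```

Both calls return the box for step 20, but only the second one has to inspect the step dataset at all.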

Best regards,

Felix


