Re: [h5md-user] box data as part of trajectory/position
From: Felix Höfling
Subject: Re: [h5md-user] box data as part of trajectory/position
Date: Wed, 12 Sep 2012 10:53:41 +0200
User-agent: Opera Mail/12.01 (Linux)
Hi Pierre,
On 12.09.2012 at 09:52, Pierre de Buyl <address@hidden> wrote:
Hi Felix,
On Mon, Sep 10, 2012 at 09:14:55AM +0200, Felix Höfling wrote:
I thought about the box again since I do not feel really comfortable
with the current specification. I find it a bit awkward that the
observables group must be present if a file contains trajectory data
only. Further, the box information is only needed in conjunction with
position data. If only velocities are stored (for some reason), the box
is not needed. And maybe the strongest point last: for time-dependent
boxes, there should be a simple way to retrieve the corresponding box
size for a given entry in the position time series. (Currently, the box
may be stored at different intervals than the positions.)
My suggestion is to link the box much more tightly to the position data.
The box group in observables may still be present and can be realised by
appropriate hard links. The following suggestion ensures that the box
data are available within each position group, consistently using the
same time grid as the position data:
trajectory
\-- group1
...
One open point: how can we efficiently store the information for a fixed
box size (which is a pretty widespread case)? If the edges and offset
datasets always contain the same entries, they may pack well, but they
have to be unpacked for accessing any data point. An alternative would
be to indicate the non-changing box size transparently, e.g., by an
additional attribute and different dataset extents (with fixed size).
trajectory
\-- group1
| \-- position
| | \-- value
| | \-- step
| | \-- time
| \-- box
| +-- type
| \-- edges [D][D]
| \-- offset [D]
(Note that the extents of edges depend on the box type, either [D]
or [D][D].)
I prefer to turn your suggestion around, if you don't mind: keep the
data in observables, with the option to link from the trajectory groups
if needed.
The thing that I think you would like to avoid is to carry "observables"
even though all you want is a trajectory (with box information indeed).
On the other hand, if one wants to find the box information, it is in
"/trajectory/groupname/..." where "groupname" depends on the file...
Even if the data is linked, this seems more cumbersome to me. The
specification of several boxes seems to me to be more of an exceptional
event.
My suggestion is less cumbersome than you describe. The box is mostly
relevant for the interpretation of position data, and then all information
is contained in "/trajectory/group/position" without resorting to a
different root group. The position data is exceptional in this respect due
to the typically used periodic boundaries.
If the box information itself is needed, I agree. It should not be deduced
from some trajectory group. Therefore I suggested keeping it in
observables as it is. But not every piece of information needs to be
stored. If the box is not in observables, an H5MD reader refuses to
retrieve it from the file (although it could by looking up some strange
trajectory group).
Please consider the following example as a reason to keep that data in
observables. In the case of a varying volume simulation, one may want to
keep only the thermodynamic observables: energy, temperature, ..., box
size. That is: all "order 1 in storage" information as opposed to
"order N" information (particle information).
Finally, your scheme is compatible with the current draft as "additional
data" is not illegal for H5MD, while the reverse would not be true
(missing data in observables).
I would like to make /observables/box and thus /observables non-mandatory.
At the same time, my suggestion makes the box information mandatory if
position data are present (but stored in trajectory/group/position).
So far, the only mandatory root group should be /h5md. I thought about
providing the space dimension explicitly as an attribute in /parameters
(or /h5md). It is cumbersome to deduce it from dataset extents of, e.g.,
box/offset.
For your application, all you need to do is provide links from
observables/box to the position data. On the writer's side, this is not
much overhead, while the reader has to access only a single subgroup
(.../position) and the file format itself becomes more flexible.
As far as the time correspondence is concerned, in my mind this could be
done as follows: the box information is stored only when it changes, so
that what one would be looking for is the maximum time in
"/observables/box/edges/time" that is lower than or equal to the
requested time. That, or require that each timestep in the trajectory
matches one in the box information.
I have concerns whether "lower than or equal to a time/step" can be
implemented efficiently. For example, how would you do so using h5py?
numpy.where is an option, but inefficient (it requires the whole time
series of the box to be read in, and the comparison is done for each
access to a position item).
My suggestion works by indexing, which is simple and highly efficient.
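For illustration only, here is a minimal sketch of the two lookups under
discussion. Plain Python lists stand in for the step datasets; the names
box_index_at, box_steps, box_edges and position_steps and all values are
hypothetical and not part of the draft. A binary search finds the last
box entry at or before a requested step, but it still needs the box step
series in memory. With a shared time grid the frame index itself would
be the box index and no search is needed.

```python
from bisect import bisect_right


def box_index_at(box_steps, step):
    """Return the index of the last box entry whose step is <= `step`.

    `box_steps` must be sorted in ascending order.
    """
    i = bisect_right(box_steps, step) - 1
    if i < 0:
        raise ValueError("no box entry at or before step %d" % step)
    return i


# Hypothetical data: the box is stored only when it changes ...
box_steps = [0, 100, 250]
box_edges = [[10.0, 10.0, 10.0], [10.5, 10.5, 10.5], [11.0, 11.0, 11.0]]
# ... while positions are stored on a finer grid.
position_steps = [0, 50, 100, 150, 200, 250, 300]

# "Store on change" scheme: one search per position frame.
edges_per_frame = [box_edges[box_index_at(box_steps, s)]
                   for s in position_steps]

# Shared time grid scheme: if the box were stored once per position
# frame, frame i would simply use entry i of the box data.
```

Both schemes yield the same box for each frame; they differ in whether
the reader has to search a second step series.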
Now, for the fixed in time issue. From the current draft:
"""
For all box kinds, if the data for edges,offset is stored as a single
dataset, it is considered fixed in time. Else, it should comply to the
step, time and value organization.
"""
I think that this is good. It is simple to parse and does not involve
extra attributes.
I overlooked this passage. Am I reading it correctly as follows? Either,
for the static case:
observables
 \-- box
      +-- type
      \-- edges [D]
      \-- offset [D]
or as for the fluctuating box:
observables
 \-- box
      +-- type
      \-- edges
      |    \-- step [var]
      |    \-- time [var]
      |    \-- value [var][D]
      \-- offset
           \-- step [var]
           \-- time [var]
           \-- value [var][D]
Shall we make the static case explicit in the draft as well?
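As a rough sketch of how a reader might tell the two layouts apart,
assuming the reading above is correct: nested dicts stand in for HDF5
groups and lists for datasets, and the function name edges_at and all
values are hypothetical. With h5py, the analogous check would test
whether "edges" is an h5py.Group or an h5py.Dataset.

```python
def edges_at(box, index):
    """Return the box edges for trajectory entry `index`.

    If `edges` is a single dataset (here: a list), the box is fixed in
    time. If it is a group (here: a dict) with step, time and value,
    the entry at `index` of `value` is returned.
    """
    edges = box["edges"]
    if isinstance(edges, dict):      # group: time-dependent box
        return edges["value"][index]
    return edges                     # single dataset: fixed in time


# Hypothetical static box, D = 3
static_box = {"edges": [10.0, 10.0, 10.0], "offset": [0.0, 0.0, 0.0]}

# Hypothetical fluctuating box with two stored entries
fluctuating_box = {
    "edges": {
        "step": [0, 100],
        "time": [0.0, 1.0],
        "value": [[10.0, 10.0, 10.0], [10.5, 10.5, 10.5]],
    },
}
```

Calling edges_at(static_box, i) gives the same edges for every i, while
edges_at(fluctuating_box, i) picks the i-th stored entry.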
Cheers,
Felix