h5md-user

## Re: [h5md-user] tentative of synthesis: parameters, box size, particle number

From: Konrad Hinsen

Subject: Re: [h5md-user] tentative of synthesis: parameters, box size, particle number

Date: Wed, 12 Oct 2011 15:31:52 +0200

```
On 5 Oct, 2011, at 10:16, Felix Höfling wrote:

> I think the above scheme naturally includes triclinic boxes. What is then
> the benefit of the attribute "kind"? If it is set to "cubic", one could
> use scalars instead of matrices for the edge value. But such an approach
> could easily result in an endless distinction of cases for H5MD readers.
> On the other hand, knowing that the box is cuboid/orthorhombic would
> simplify the computation of, e.g., the box volume. (In the present scheme,
> one needs to compute the determinant of the edges matrix—which is not hard
> if an algorithm like numpy.linalg.det is available.)

For me the important difference between a cubic and a triclinic box is that a
cubic box is guaranteed to be cubic at all times, whereas a general triclinic
box may happen to be cubic at some instant and then change. That's why it is
not sufficient to say "store the general lattice vectors for a triclinic box,
and let the reader check the geometry to see if it is cubic". The reader would
have to do that check for all time steps. So yes, I consider it important to
store information about "cubicity" somehow, even if the box size and shape are
then always stored fully (three lattice vectors).

> If we find it really necessary to provide additional meta information such as
> box kind or time dependence, I would prefer a Boolean scheme that allows one
> to combine these features independently (time_dependence: true/false,
> orthogonal: true/false, internal_symmetries: true/false).

Some things are naturally boolean (e.g. time dependence), others aren't (e.g.
box shape). I'd prefer not to be dogmatic about how metadata is stored.

On 6 Oct, 2011, at 11:13, Pierre de Buyl wrote:

> Ok, I realized only recently (this week) that the set of symmetry
> transformations is not to be applied to the coordinates but that
> it is used to specify that one can take replicas of the box.
>
> I fear that including it right now would be premature. Is there
> experience by anyone of using that kind of box information?

I don't use it personally, but many people in my field do, either for crystal
simulations or for truncated octahedron simulations.

But I think this might be a good occasion to consider how future extensions
might be handled, because I agree that we can't define everything that needs to
be defined immediately if we ever want to get a first version ready. A feature
that I think every non-trivial file format should have is modularity. More
specifically, it should be possible to add information in such a way that
programs not aware of that addition can safely pretend it's not there.

It would be perfectly fine with me to leave stuff like symmetry transforms out
of the first release, but make sure that programs that need it can put it
somewhere in a safe way. Successful extensions can then after a while be

>> A file should always, in my opinion, contain the observables group.
>> I go on after point 3.
>>
> This is to be discussed. I prefer a separate parameters group which is
> mandatory; the trajectory and observables groups may each be present or not,
> independently of each other. An H5MD file may also be used as input for a
> simulation; then the trajectory group makes perfect sense, while
> 'observables' should contain the outcome of the simulation. And there are
> other parameters, like the space dimension, that would not fit well into the
> observables group.
>
> Konrad and Peter, what do you think?

I am always for modularity: don't make anything compulsory unless it is
required for the correct interpretation of all kinds of files you expect to
handle. I agree that it makes sense to store just the input for a simulation,
and in that case there is no point in having output-related groups.

> I'm not sure whether H5MD shall cover all types of simulations. I prefer
> to stick to particle-based methods.

Me too, at least initially. But it should be possible to store H5MD data plus
other data (perhaps following other data models) in the same file. Which is why
I think that H5MD should not define the structure of an HDF5 file, but the
structure of an HDF5 group, which would be labelled as H5MD-conforming by some
metadata attribute. In most cases that group would be a file's root group, but
that may well change one day.

As a motivation for allowing multiple data models in a single file, here's a
proposal to store complete scientific papers, with data and executable code, in
a single HDF5 file:

http://dirac.cnrs-orleans.fr/plone/software/activepapers

Such approaches are only possible if each data item is willing to be just one
inhabitant among others in a file.

--
---------------------------------------------------------------------
Centre de Biophysique Moléculaire, CNRS Orléans
Synchrotron Soleil - Division Expériences
Saint Aubin - BP 48
91192 Gif sur Yvette Cedex, France
Tel. +33-1 69 35 97 15
E-Mail: research AT khinsen DOT fastmail DOT net
---------------------------------------------------------------------

```
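Felix's question about the box "kind" attribute and Konrad's point that a reader would otherwise have to check cubicity at every time step can be illustrated with a small plain-Python sketch. It assumes the box is stored as a 3x3 matrix whose rows are the three edge vectors (an assumption; the thread does not fix a layout), computes the general volume as the absolute determinant of that matrix (what `numpy.linalg.det` would give), and shows the per-frame check a reader needs when no cubicity flag is stored.

```python
from typing import Sequence

# Assumed layout: a 3x3 matrix whose rows are the three box edge vectors.
Edges = Sequence[Sequence[float]]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def det3(m: Edges) -> float:
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    a, b, c = m
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))


def box_volume(edges: Edges) -> float:
    """General (triclinic) box volume: |det| of the edge matrix."""
    return abs(det3(edges))


def orthorhombic_volume(edges: Edges) -> float:
    """Shortcut that is only valid if the box is guaranteed to be orthorhombic
    and axis-aligned: the product of the diagonal entries."""
    return edges[0][0] * edges[1][1] * edges[2][2]


def is_cubic(edges: Edges, rtol: float = 1e-9) -> bool:
    """Check a single frame: three mutually orthogonal edges of equal length."""
    a, b, c = edges
    scale = dot(a, a)  # squared length of the first edge vector
    if scale <= 0:
        return False
    same_length = all(abs(dot(v, v) - scale) <= rtol * scale for v in (b, c))
    orthogonal = all(abs(dot(u, v)) <= rtol * scale
                     for u, v in ((a, b), (a, c), (b, c)))
    return same_length and orthogonal


def cubic_at_all_times(frames: Sequence[Edges], rtol: float = 1e-9) -> bool:
    """Without a stored 'cubic' flag, a reader has to test every frame."""
    return all(is_cubic(edges, rtol) for edges in frames)


cube = [[2, 0, 0], [0, 2, 0], [0, 0, 2]]
sheared = [[2, 0, 0], [0.5, 2, 0], [0, 0, 2]]
print(box_volume(cube))                     # 8
print(cubic_at_all_times([cube, cube]))     # True
print(cubic_at_all_times([cube, sheared]))  # False
```

The orthorhombic shortcut is only correct when the file guarantees the shape: for a sheared box such as `[[2, 1, 0], [1, 2, 0], [0, 0, 1]]` it returns 4, while the true volume is 3.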
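Konrad's proposal that H5MD define the structure of an HDF5 group labelled by a metadata attribute, rather than a whole file, and that readers safely ignore additions they don't know about, can be sketched in plain Python. To stay self-contained, the sketch models the HDF5 hierarchy as nested dicts instead of using an HDF5 library. The marker attribute name `h5md_version`, its value, and the reader's set of known subgroups are hypothetical and chosen only for illustration; the subgroup names themselves come from the thread.

```python
from typing import Any, Dict, Iterator

# Hypothetical in-memory stand-in for an HDF5 hierarchy: each group is a dict
# with "attrs" (attribute name -> value) and "children" (name -> subgroup).
# A real reader would walk the file with an HDF5 library instead.
Group = Dict[str, Any]

# Hypothetical marker attribute; the thread does not name one.
H5MD_MARKER = "h5md_version"

# Subgroups this illustrative reader understands; none of them is mandatory.
KNOWN_SUBGROUPS = {"parameters", "trajectory", "observables"}


def find_h5md_groups(group: Group, path: str = "/") -> Iterator[str]:
    """Yield the path of every group labelled as H5MD-conforming, depth-first."""
    if H5MD_MARKER in group.get("attrs", {}):
        yield path
    for name, child in sorted(group.get("children", {}).items()):
        yield from find_h5md_groups(child, path.rstrip("/") + "/" + name)


def known_subgroups(h5md_group: Group) -> Dict[str, Group]:
    """Keep the subgroups this reader understands and silently skip the rest,
    so unknown extensions do not break it."""
    return {name: child for name, child in h5md_group.get("children", {}).items()
            if name in KNOWN_SUBGROUPS}


def empty() -> Group:
    return {"attrs": {}, "children": {}}


# A file holding H5MD data next to data following another data model.
example: Group = {
    "attrs": {},
    "children": {
        "paper": empty(),
        "simulation": {
            "attrs": {H5MD_MARKER: "0.1"},  # hypothetical version value
            "children": {"trajectory": empty(), "symmetry_extension": empty()},
        },
    },
}

print(list(find_h5md_groups(example)))  # ['/simulation']
print(sorted(known_subgroups(example["children"]["simulation"])))  # ['trajectory']
```

Because the reader only looks for the marker and only interprets names it knows, the unrelated `paper` group and the experimental `symmetry_extension` subgroup are simply passed over.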