
Re: [h5md-user] Variable-size particle groups

From: Pierre de Buyl
Subject: Re: [h5md-user] Variable-size particle groups
Date: Tue, 29 May 2012 20:24:43 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

On Tue, May 29, 2012 at 06:34:04AM -0400, Peter Colberg wrote:
> On Tue, May 29, 2012 at 11:19:44AM +0200, Olaf Lenz wrote:
> > After Peter's mailing, I have had a first thorough look at H5MD. From
> > our point of view, Peter describes a very valid point - the number of
> > particles in a simulation can vary. This is true not only for
> > grand-canonical simulations; other state-of-the-art schemes also have a
> > varying particle number, e.g. the AdResS scheme, where the level of
> > detail varies between regions. As an example, think of a protein-water
> > simulation, where the protein and the surrounding nanometer of water
> > are simulated at an atomistic level of detail, while the water further
> > away is simulated at a coarse-grained level with fewer interaction
> > sites per water molecule. I believe that such schemes will become more
> > important in the future, so allowing trajectories with a varying
> > particle number to be stored may become important.
> You are spot on with coarse graining. This is exactly what I intend to do.
> > On 05/29/2012 10:15 AM, Felix Höfling wrote:
> > >> H5MD implements an optional dataset “range” inside each
> > >> trajectory subgroup, next to the other datasets groups “step” and
> > >> “time”.
> > 
> > Besides making the format more complex, as Felix remarked, I believe
> > that forcing Peter's definition upon the format would also have a
> > major impact on parallel I/O.
> > 
> > I think a relatively simple solution to avoid making the format more
> > complex while still allowing for varying particle number would be to
> > specify that if the subgroup "range" exists in a time-dependent dataset,
> > the subgroup "value" is to be interpreted in the way Peter described,
> > otherwise it uses the simple definition.
> Ok, then such a “range” dataset should be optional.
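The interpretation sketched above can be illustrated with a short, hypothetical reader-side example. This is not part of any spec; the dataset names follow the thread, and the (begin, end) convention for "range" entries is an assumption for illustration only.

```python
# Sketch (assumption, not the spec): reading a variable-size particle group
# where an optional "range" dataset maps each stored step to a slice of the
# flat "value" dataset. The (begin, end)-per-step layout is hypothetical.
import numpy as np

# Flat storage of all samples across time steps: here 2 steps with
# 3 and 2 particles respectively, positions in 3D.
value = np.array([[0., 0., 0.],
                  [1., 0., 0.],
                  [0., 1., 0.],   # step 0: rows 0..2
                  [2., 2., 2.],
                  [3., 3., 3.]])  # step 1: rows 3..4

# One (begin, end) pair per stored step; end is exclusive.
range_ds = np.array([[0, 3],
                     [3, 5]])

def read_step(value, range_ds, i):
    """Return the particle data belonging to stored step i."""
    begin, end = range_ds[i]
    return value[begin:end]

print(read_step(value, range_ds, 0).shape)  # (3, 3)
print(read_step(value, range_ds, 1).shape)  # (2, 3)
```

If "range" is absent, a reader would fall back to the simple definition, where "value" has one fixed-size row block per step.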
> The point of parallel I/O is very interesting: How would this be
> implemented in practice? To warn you, I have not used parallel
> HDF5 yet.
> I would assume that, e.g., for a parallel MPI simulation, one would need
> a designated process to extend the “value”, “step”, and “time” datasets
> on each time step, after which all processes write to their slice of the
> newly appended region of the “value” dataset.
> Then adding a “range” dataset should not change this requirement.
> There would still be a single process to extend the datasets. The
> designated process would further communicate to the other processes
> the new range with regard to “value”, after which all processes
> perform a write to their sub-range.
> Are my assumptions on parallel HDF5 I/O anything close to reality?
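The bookkeeping in the scheme above can be sketched without actual MPI or HDF5 calls. In a real run each rank would learn the other ranks' particle counts collectively (e.g. via an exclusive scan), and the writes would be HDF5 hyperslab selections; here the ranks are simply simulated in a loop, and all names are illustrative.

```python
# Sketch of the offset bookkeeping described above, simulated without MPI.
# In a real parallel run each rank would obtain its offset via an exclusive
# prefix sum (MPI_Exscan) over the per-rank particle counts; here we loop
# over hypothetical ranks instead.
local_counts = [4, 2, 3]   # particles owned by ranks 0, 1, 2 at this step
range_begin = 10           # start of the newly appended region in "value"

# Exclusive prefix sum gives each rank its write offset within the region.
offsets, total = [], 0
for n in local_counts:
    offsets.append(range_begin + total)
    total += n

# Each rank would now write its hyperslab [offset, offset + count) of "value".
slices = [(off, off + n) for off, n in zip(offsets, local_counts)]
print(slices)   # [(10, 14), (14, 16), (16, 19)]
```

The appended region spans [range_begin, range_begin + total), which is what the designated process would record in the "range" dataset for this step.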
If I recall correctly (but I cannot find the reference right now), the size and
access pattern of the data may severely impact HDF5 performance. This should
also be checked.

