
Re: [h5md-user] Dataset layouts


From: Felix Höfling
Subject: Re: [h5md-user] Dataset layouts
Date: Wed, 01 Jul 2015 10:43:37 +0200
User-agent: Opera Mail/12.16 (Linux)

On 24.06.2015 at 22:55, Peter Colberg <address@hidden> wrote:

On Tue, Jun 16, 2015 at 11:30:28AM +0200, Felix Höfling wrote:
Peter, did I understand correctly: parallel reading of a dataset must take into account whether the dataset is compact or not, otherwise the data are inconsistent between the MPI processes? (Actually, such a misuse of the HDF5 library should, in my opinion, raise an exception.)

Fortunately not: the inconsistency arises only when a single process writes to the compact dataset and all processes subsequently read from that dataset. Metadata writes go to the per-process cache, and metadata reads come from the per-process cache. To keep the per-process caches in sync, metadata writes must be collective.
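
For illustration, a minimal mpi4py/h5py sketch of this collective pattern might look as follows (the file name, dataset path and sizes are made up, and it assumes h5py built against a parallel HDF5):

    # Keep metadata writes collective, then let every rank read.
    from mpi4py import MPI
    import numpy as np
    import h5py

    comm = MPI.COMM_WORLD

    with h5py.File("dump.h5", "w", driver="mpio", comm=comm) as f:
        # metadata write (dataset creation): issued by all ranks collectively
        step = f.create_dataset("particles/all/position/step",
                                (100,), dtype="int64")
        # raw-data write: filling the values from a single rank is fine
        if comm.rank == 0:
            step[:] = np.arange(100)

    with h5py.File("dump.h5", "r", driver="mpio", comm=comm) as f:
        # every rank reads the same small dataset; because the metadata was
        # written collectively, the per-process caches agree
        steps = f["particles/all/position/step"][:]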

With respect to writing, I don't see any need to require compactness. The writer application "knows" whether it uses the MPI interface or not and can act accordingly. Besides, h5py does a great job of writing H5MD files so far, and I would not like to break this kind of support by making compactness mandatory.

I agree, though h5py should allow the compact layout for efficiency.
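
As a rough sketch of what that could look like, the compact layout can at least be requested through h5py's low-level interface (serial case, illustrative names; the 64 KiB cap on compact data is an HDF5 constraint):

    # Request the compact layout via the low-level h5py API; compact raw
    # data is stored in the object header and is limited to 64 KiB.
    import numpy as np
    import h5py

    f = h5py.File("dump.h5", "w")

    dcpl = h5py.h5p.create(h5py.h5p.DATASET_CREATE)  # dataset creation plist
    dcpl.set_layout(h5py.h5d.COMPACT)

    space = h5py.h5s.create_simple((100,))
    tid = h5py.h5t.py_create(np.dtype("int64"))
    dset_id = h5py.h5d.create(f.id, b"step", tid, space, dcpl=dcpl)

    step = h5py.Dataset(dset_id)  # wrap as a high-level Dataset
    step[:] = np.arange(100)
    f.close()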

For reading, on the other hand, the implementation of a reader has to be
simple for the sake of robustness. Querying the storage layout before
reading may be one complication that can be avoided by specifying the
layout. (This reminds me of an endless discussion about the string type.)

I tested what happens when all processes read a scalar dataset with
contiguous layout. It actually works fine. I get the same read times
as for the compact layout.

How about we include in the specification that the scalar "step"/"time"
dataset SHOULD use a compact layout?
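
For comparison, the layout query mentioned above, which a careful reader would otherwise have to perform, looks roughly like this in h5py (the dataset path is made up):

    # Query the storage layout before reading -- the complication that
    # specifying the layout in the specification would avoid.
    import h5py

    with h5py.File("dump.h5", "r") as f:
        dset = f["particles/all/position/step"]
        layout = dset.id.get_create_plist().get_layout()
        if layout == h5py.h5d.COMPACT:
            # small, header-resident data: reading from every rank is cheap
            data = dset[:]
        else:
            # contiguous or chunked: reading works too, but a parallel
            # reader might prefer to read on one rank and broadcast
            data = dset[:]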

Peter

Hi Peter,

(Sorry for the delayed reply.)

If I understand your examples correctly, parallel reading/writing is only an issue within a single application using MPI. If the programme (not the file format) is implemented consistently, there is no problem. For me, there is no need to put further restrictions into the H5MD specification. We should keep it as simple and minimal as possible, also to avoid conflicting specifications.

However, it is a good idea to share your experience. I think that the "implementation" section on the H5MD web page is the perfect place for this. Would this meet your concerns?
http://nongnu.org/h5md/implementation.html#compact-datasets

Best,

Felix


