Future of GPU support in Espresso
From: Rudolf Weeber
Subject: Future of GPU support in Espresso
Date: Tue, 12 May 2020 10:32:56 +0200
User-agent: Mutt/1.9.4 (2018-02-28)
Dear Espresso users,
When taking stock of how we, the Espresso core team, spend our development
time, it became clear that maintaining GPU support in Espresso makes up a
disproportionately large fraction of it.
Last month alone, the core team spent more than 50 hours dealing with new
Linux distributions, compiler versions, and library dependencies related to GPU
support. All of this time went into build and test infrastructure; none of it
improved Espresso's functionality or performance.
Not for the first time, we asked ourselves whether we should drop GPU support
from Espresso and spend the development time on other aspects of Espresso
where we see a larger benefit.
Before making a decision, we would like to discuss this with you, the users.
The affected methods for which a CPU alternative would have to be used are:
* GPU lattice Boltzmann and electrokinetics
* GPU charge P3M method
* GPU dipolar Barnes-Hut and direct summation method
To help us make an informed decision, please let us know if you are currently
using any of these methods, and roughly what kind of systems you are looking at:
* number of particles
* volume fraction
* active methods (electrostatics, magnetostatics, lattice Boltzmann,
electrokinetics, virtual sites, ...)
* how many time steps in a simulation
* how many simulations
* how important is the time to solution of a single simulation to you,
compared to the total time for all simulations in a project? (Note: dropping
GPU support will likely increase the time to solution of a single simulation.
On the other hand, compute time on GPUs is often not as readily available as
compute time on pure CPU systems. It may therefore be possible to run more
simulations in parallel if GPUs are not required.)
* what GPUs do you have access to, and how many?
We hope that gathering answers to these questions will help us figure out how
to proceed.
Below, please find some more technical notes.
Regards, Rudolf
Details on the high maintenance effort of GPU support:
* GPUs are not readily available in public continuous integration testing
services. Therefore, GPU testing has to be performed on infrastructure we
operate ourselves, both in terms of hardware and software.
* Nvidia places many restrictions on which versions of its software can be
used with which GCC and Clang compiler versions.
* There are subtle differences of opinion on correct C++ among the components
involved.
* The compiler versions Nvidia requires are not necessarily the defaults
installed with Linux distributions such as Ubuntu.
* Several of these issues have to be dealt with every time a new Ubuntu version
is released. (We use Ubuntu for testing GPU support.)
Notes on lattice Boltzmann (LB):
* The GPU LB, like the other GPU methods, uses single precision. This limits
its accuracy. For example, mass is not exactly conserved in our GPU LB due to
rounding errors. It is unclear how big an issue that is.
* The CPU LB implementation uses double precision.
* Switching the GPU LB to double precision would render it mostly unusable on
cheaper (<=500 Euro) gaming cards, such as those often found in desktops,
because these have very poor double precision performance. Cards with good
double precision performance cost 5-10 times that amount.
* The time to solution of the CPU LB (double precision) is currently 2x-3x
that of the GPU LB (single precision) for a Lennard-Jones+LB system with 10%
volume fraction and an LB lattice constant comparable to the Lennard-Jones
sigma. This ratio is expected to improve as we switch from our custom LB
implementation to the one provided by the Walberla package. (A sketch of the
GPU-to-CPU switch in a script follows this list.)
* The hardware configuration of modern compute clusters with GPUs is not well
suited for Espresso simulations with GPU LB. The ratio of CPU cores to GPUs is
typically 10:1 to 20:1. For systems with fewer than 100k particles, Espresso
will use neither the GPU nor the CPU cores efficiently.
* Compute capacity without GPUs is much more readily available. It is also
cheaper, unless one can fully load the GPU, which is typically not the case for
soft matter simulations.
Notes on electrokinetics:
* Currently, only a single-precision GPU implementation is available in
Espresso.
* Independently of the decision on GPU support, lattice Boltzmann and
electrokinetics will be provided by the Walberla package in the future. In a
first step, this will be a well-optimized CPU version in double precision.
Notes on electrostatics:
The CPU-based P3M method can be used instead of the GPU-based one.
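In a script, this is essentially a one-line change. A minimal sketch, assuming
the ESPResSo 4.1 Python interface; the charges and solver parameters are
illustrative only:

    import espressomd
    import espressomd.electrostatics

    system = espressomd.System(box_l=[10, 10, 10])
    system.time_step = 0.01
    system.cell_system.skin = 0.4

    # Two opposite charges, purely for illustration.
    system.part.add(pos=[2, 2, 2], q=1.0)
    system.part.add(pos=[7, 7, 7], q=-1.0)

    # GPU-based solver:
    # solver = espressomd.electrostatics.P3MGPU(prefactor=1.0, accuracy=1e-3)

    # CPU-based solver, same parameters:
    solver = espressomd.electrostatics.P3M(prefactor=1.0, accuracy=1e-3)
    system.actors.add(solver)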
Notes on magnetostatics:
* Due to the 1/r^3 decay of the dipolar interaction and the random summation
order, the use of single precision in the GPU code is a relevant limitation on
accuracy.
* The double-precision dipolar direct summation will be MPI-parallelized on the
CPU, allowing for better time to solution for larger systems.
* For systems with more than 10k particles and open boundaries, the dipolar
P2NFFT method from the ScaFaCoS library can be used. (A sketch of the
GPU-to-CPU switch follows this list.)
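Again a minimal sketch of the corresponding change, assuming the ESPResSo 4.1
Python interface; the particles, prefactor, and Barnes-Hut parameters are
illustrative only:

    import espressomd
    import espressomd.magnetostatics

    system = espressomd.System(box_l=[10, 10, 10])
    system.time_step = 0.01
    system.cell_system.skin = 0.4

    # A few dipolar particles, purely for illustration.
    for pos in ([2, 2, 2], [7, 7, 7]):
        system.part.add(pos=pos, rotation=(1, 1, 1), dip=(0, 0, 1))

    # GPU-based solvers (single precision):
    # solver = espressomd.magnetostatics.DipolarDirectSumGpu(prefactor=1.0)
    # solver = espressomd.magnetostatics.DipolarBarnesHutGpu(
    #     prefactor=1.0, epssq=100.0, itolsq=4.0)

    # CPU-based double-precision alternative:
    solver = espressomd.magnetostatics.DipolarDirectSumCpu(prefactor=1.0)
    system.actors.add(solver)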
--
Dr. Rudolf Weeber
Institute for Computational Physics
Universität Stuttgart
Allmandring 3
70569 Stuttgart
Germany
Phone: +49(0)711/685-67717
Email: address@hidden
http://www.icp.uni-stuttgart.de/~icp/Rudolf_Weeber