Motivation
We are three second-year graduate students in the astronomy department. For our research, we run simulations with massively parallel codes (e.g. the moving-mesh fluid dynamics code Arepo and the particle-in-cell code tristan-mp) to study various astrophysical systems.
Efficient analysis and visualization of the simulation data is crucial to our work.
With 3D simulations, visualizing the data is a challenge even when the
simulation box has evenly-spaced grid points, simply because the screen is 2D.
One common simplification technique is to just plot a couple of 2D slices
through the 3D box, which inevitably misses a lot of structures in the simulation.
A further challenge arises when the simulation grid is not uniform, which is
the case for many cosmological simulations, including simulations generated by
Arepo.
A lot of recent effort in the community has gone into developing volume-rendering software to visualize full 3D volumetric data sets. In particular, yt is a widely used, open-source package for this purpose, written primarily in Python with MPI. However, it lacks GPU support, which could potentially speed up the algorithm and make it more efficient.
In addition, yt and other visualization packages only support output files from a limited number of simulation codes and data structures, which unfortunately do not include our codes.
For our final project, we developed an algorithm that combines GPUs with MPI to rapidly render 3D data on arbitrary grid types.
We would like to visualize 3D scalar fields from large simulation outputs on both structured and unstructured grids. The goal is a fast method for rendering the whole 3D volume using raycasting with GPUs and MPI, so that we can visualize the outputs of the simulations we use in our research projects.
Our Data
The data comes from astrophysical fluid dynamics simulations used in our research. These simulations are run with a variety of codes, including the moving-mesh code Arepo and the particle-in-cell code tristan-mp. As mentioned above, the two codes produce different kinds of output. Arepo data lives on an unstructured Voronoi mesh, while tristan-mp generates structured data. The file size of the outputs depends strongly on the size of the grid/mesh and on the resolution employed in the simulations, ranging from a few MB up to a few GB.
Program Design
Raycasting is highly parallelizable, so we use GPUs and have each thread calculate a single pixel in our image. In addition, we further boost our method by using MPI to distribute the calculation of different frames in the movie across a GPU cluster. This is the most efficient way to use a cluster for generating movies, as the frame calculations are independent of each other and the algorithm scales strongly. Our code, Pavoreal, is run by importing the pavoreal package, writing a short driver script, and setting up a configuration file, making it easy to use for a variety of applications.
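To make the two levels of parallelism concrete, here is a minimal CPU-only sketch in plain Python. It is not the Pavoreal implementation: the function names (frames_for_rank, cast_ray, render), the nearest-neighbour sampling, the orthographic camera, and the front-to-back compositing rule are all assumptions chosen for illustration. The body of cast_ray corresponds to the work one GPU thread would do for one pixel, and frames_for_rank shows one simple way independent frames could be divided among MPI processes.

```python
import math


def frames_for_rank(n_frames, rank, size):
    """Round-robin split: rank r gets frames r, r + size, r + 2*size, ..."""
    return list(range(rank, n_frames, size))


def sample(volume, x, y, z):
    """Nearest-neighbour lookup in a nested list volume[i][j][k]; 0 outside."""
    i, j, k = int(math.floor(x)), int(math.floor(y)), int(math.floor(z))
    inside = (0 <= i < len(volume)
              and 0 <= j < len(volume[0])
              and 0 <= k < len(volume[0][0]))
    return volume[i][j][k] if inside else 0.0


def cast_ray(volume, origin, direction, n_steps, step, transfer):
    """March one ray through the volume and composite front to back.

    transfer(value) returns (colour, opacity), both in [0, 1].
    On a GPU, this loop would be executed by a single thread.
    """
    colour, alpha = 0.0, 0.0
    for s in range(n_steps):
        t = (s + 0.5) * step  # sample at the midpoint of each step
        x = origin[0] + t * direction[0]
        y = origin[1] + t * direction[1]
        z = origin[2] + t * direction[2]
        c, a = transfer(sample(volume, x, y, z))
        colour += (1.0 - alpha) * a * c
        alpha += (1.0 - alpha) * a
        if alpha > 0.99:  # early termination: the ray is effectively opaque
            break
    return colour


def render(volume, width, height, depth_steps, transfer):
    """Orthographic view along +z, one ray per pixel; returns image[py][px]."""
    return [[cast_ray(volume, (px + 0.5, py + 0.5, 0.0), (0.0, 0.0, 1.0),
                      depth_steps, 1.0, transfer)
             for px in range(width)]
            for py in range(height)]
```

In an MPI run (for example with mpi4py), rank and size would come from the communicator, and each process would render only the frames returned by frames_for_rank, with no communication needed between processes.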
Usage
To use the code, set up a configuration file that specifies the camera path/orientation, transfer functions, data, output directory, and gridding/smoothing parameters. Then import the pavoreal package and run a simple driver script (see the README or the download page for details and examples).
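As a rough illustration of what such a configuration might hold, the sketch below parses an INI-style file with Python's standard configparser module. Every section name, key, and value here (including the file name snapshot_000.hdf5) is a hypothetical placeholder and not the actual Pavoreal format, which is documented in the README.

```python
import configparser

# Hypothetical configuration text; the keys are placeholders, not Pavoreal's.
EXAMPLE_CONFIG = """
[data]
input_file = snapshot_000.hdf5
output_dir = frames/

[camera]
n_frames = 120
start_angle_deg = 0
end_angle_deg = 360
distance = 2.5

[transfer]
field = density
log_scale = yes
vmin = 1e-3
vmax = 1e2

[grid]
resolution = 256
smoothing_length = 1.5
"""


def load_config(text):
    """Parse the INI text into a flat dictionary with typed values."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "input_file": parser.get("data", "input_file"),
        "output_dir": parser.get("data", "output_dir"),
        "n_frames": parser.getint("camera", "n_frames"),
        "start_angle_deg": parser.getfloat("camera", "start_angle_deg"),
        "end_angle_deg": parser.getfloat("camera", "end_angle_deg"),
        "distance": parser.getfloat("camera", "distance"),
        "field": parser.get("transfer", "field"),
        "log_scale": parser.getboolean("transfer", "log_scale"),
        "vmin": parser.getfloat("transfer", "vmin"),
        "vmax": parser.getfloat("transfer", "vmax"),
        "resolution": parser.getint("grid", "resolution"),
        "smoothing_length": parser.getfloat("grid", "smoothing_length"),
    }
```

Keeping the camera, transfer function, and gridding settings in one file means the same driver script could be reused across data sets by changing only the configuration.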
Performance
The MPI version of our code scales strongly because the processes do not communicate with one another. Using a GPU rather than a CPU in each process brings a significant further advantage, reducing the total runtime of the code by a factor of 100-1000 for the various data sets. Some optimizations we implemented include:
Optimization
Insights
Future Work
We would like to extend this project by using PyOpenGL, or the GL module included in PyCUDA, to create an interactive version of pavoreal in which GPU memory feeds directly into the video output. Unfortunately, PyCUDA was compiled without GL support on resonance, so our current method generates an image for every frame (with matplotlib) and assembles the movies on the CPU. The data transfer between GPU and CPU and the file writing take a significant amount of time (on the order of a second per frame) compared to the rendering calculations, so an interactive mode would also let us visualize data even faster.
Reflections
We definitely enjoyed harnessing the raw power of GPUs and viewing the final movies we created, which were a nice reward. Some of the simulations had never been visualized before, and it was exciting to see what they looked like. One of the challenging aspects of the project was dealing with the unstructured data sets. Binning data on GPUs is not straightforward, since naive CPU-like implementations can lead to bank conflicts. Working out the geometry of the camera grid was frustrating, especially after we changed the data to be cast to a texture. Initially we fixed the camera and rotated the object, but textures cannot be rotated, so we had to rotate the camera instead and rework all of the code and conventions.
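The change from rotating the object to rotating the camera can be illustrated with a small sketch. It assumes a camera orbiting the volume centre in the x-y plane and always looking at that centre. The names orbit_camera and camera_path and the choice of rotation axis are ours for illustration and do not reflect the conventions used in the actual code.

```python
import math


def orbit_camera(center, radius, angle_deg):
    """Return (position, view_direction) for a camera circling `center`.

    The camera moves on a circle of the given radius about the z axis
    through `center`; the data volume itself never moves.
    """
    theta = math.radians(angle_deg)
    position = (center[0] + radius * math.cos(theta),
                center[1] + radius * math.sin(theta),
                center[2])
    # Unit vector from the camera toward the centre of the volume.
    direction = tuple((c - p) / radius for c, p in zip(center, position))
    return position, direction


def camera_path(center, radius, n_frames):
    """One camera pose per frame, evenly spaced over a full revolution."""
    return [orbit_camera(center, radius, 360.0 * f / n_frames)
            for f in range(n_frames)]
```

Each pose gives the ray origin plane and ray direction for one frame, so the texture holding the data can stay fixed while only the rays change from frame to frame.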