## Cosmological Calculations with GPUs

A while ago, I wrote a series of posts on how astronomers are learning to use GPUs: “The Use of Graphical Processing Units (GPU’s) In Astronomy,” “GPU’s in Astronomy: Critical Decisions for Early Adopters,” and “GPU’s vs. CPU’s: Apples vs. Oranges?” I hope they serve as a good introduction to GPUs for non-specialists. Broadly speaking, GPUs perform best (speed-ups of 100x or more over CPUs) on brute-force, highly compute-intensive algorithms.

This week, I have been reading a paper by Bard et al. in the inaugural edition of Astronomy and Computing, titled “Cosmological calculations on the GPU.” Their scientific motivation was to understand the processing power needed to extract cosmological information from new surveys such as LSST, the Dark Energy Survey and Pan-STARRS. Current methods of calculating cosmological quantities scale as O(N^2), where N is the number of data points, which is not feasible for datasets containing billions of points. Fortunately, these calculations are easy to parallelize, as they involve independent calculations of the same quantity. Bard et al. investigate how to use GPUs for these calculations, rather than clusters of CPUs, with input data drawn from published simulated data sets.
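To make the O(N^2) scaling concrete, here is a minimal pure-Python sketch of brute-force pair counting: it histograms the angular separation of every unique pair of points. It is an illustration only, not the paper's CUDA code; the function names (`angular_separation`, `count_pairs`, `n_unique_pairs`) and the bin layout are my own assumptions. Each pair is handled independently of every other pair, which is what makes the problem a natural fit for a GPU.

```python
import math


def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle separation in radians; all inputs are in radians."""
    # Haversine form, which stays accurate for small separations.
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    return 2.0 * math.asin(math.sqrt(min(1.0, a)))


def count_pairs(points, bin_edges):
    """Histogram the separations of all unique pairs of (ra, dec) points.

    The double loop visits N * (N - 1) / 2 pairs, hence the O(N^2) cost.
    """
    counts = [0] * (len(bin_edges) - 1)
    n = len(points)
    for i in range(n):
        ra_i, dec_i = points[i]
        for j in range(i + 1, n):
            ra_j, dec_j = points[j]
            theta = angular_separation(ra_i, dec_i, ra_j, dec_j)
            # Each pair is independent of all the others, so on a GPU
            # this work could be spread across many threads.
            for b in range(len(counts)):
                if bin_edges[b] <= theta < bin_edges[b + 1]:
                    counts[b] += 1
                    break
    return counts


def n_unique_pairs(n):
    """Number of unique pairs among n points."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.0), (0.3, 0.0)]
    print(count_pairs(pts, [0.0, 0.15, 0.25, 0.35]))
    print(n_unique_pairs(10 ** 9))
```

For a billion points, `n_unique_pairs` gives roughly 5 x 10^17 pairs, which is why a serial loop like this one is hopeless at survey scale.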

Specifically, the paper addresses the calculation of the two-point angular correlation function and the aperture mass statistic. The code takes advantage of the Compute Unified Device Architecture (CUDA) framework, developed by NVIDIA, which abstracts the details of the GPU architecture away from the developer. The paper gives full details of the algorithms and implementations. The code is available on GitHub at https://github.com/djbard/ccogs.
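For readers unfamiliar with the two-point angular correlation function, the sketch below shows how binned pair counts are typically turned into an estimate of it. It assumes the widely used Landy-Szalay estimator, w = (DD - 2DR + RR) / RR with normalized counts; this post does not say which estimator the paper uses, so treat that choice, and the function name `landy_szalay`, as my assumptions. DD, DR and RR are the data-data, data-random and random-random pair counts in each angular bin.

```python
def landy_szalay(dd, dr, rr, n_data, n_random):
    """Estimate w(theta) per bin from raw pair counts.

    dd, dr, rr : sequences of data-data, data-random and
                 random-random pair counts, one entry per angular bin
    n_data, n_random : number of points in the data and random catalogs
    """
    # Normalize each count by the total number of pairs of that type.
    dd_norm = n_data * (n_data - 1) / 2.0
    dr_norm = float(n_data * n_random)
    rr_norm = n_random * (n_random - 1) / 2.0

    w = []
    for dd_i, dr_i, rr_i in zip(dd, dr, rr):
        if rr_i == 0:
            # No random pairs in this bin: the estimator is undefined.
            w.append(float("nan"))
            continue
        dd_hat = dd_i / dd_norm
        dr_hat = dr_i / dr_norm
        rr_hat = rr_i / rr_norm
        w.append((dd_hat - 2.0 * dr_hat + rr_hat) / rr_hat)
    return w


if __name__ == "__main__":
    # Toy counts only, not real survey data.
    print(landy_szalay([2], [3], [1], n_data=3, n_random=3))
```

All three sets of pair counts need the same O(N^2) loop over pairs, so the counting step dominates the cost and the estimator itself is cheap.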

The calculations were all run on a Tesla M2070 GPU card, based on the Fermi architecture, with 5.25 GB of on-board global memory (GPUs used for science generally have a large on-board memory).

Here, I summarize their important results, as illustrated in their Tables 1 and 2 and Figure 1 below:

• GPUs can speed up calculations of the angular correlation function by more than 100x (Fig. 1).
• GPUs can speed up calculations of the aperture mass statistic by 100x to 300x, depending on the filter aperture and the size of the reconstruction grid (Table 2 and Fig. 1).

Finally, the fast processing times of the GPUs enable exact calculation of these two quantities and their uncertainties for all angular scales: the tree-algorithm approximations used on CPUs do not provide this benefit.