GPU-based acceleration of radio interferometry point source visibility simulations in the MeqTrees framework
Modern radio interferometer arrays are powerful tools for obtaining high-resolution images of low-frequency electromagnetic radiation from deep space. While a single-dish radio telescope converts the electromagnetic radiation it receives directly into an image of the sky (a sky intensity map), an interferometer converts the interference patterns between pairs of dishes in the array into samples of the Fourier plane (UV data, or visibilities). A subsequent Fourier transform of the visibilities yields the image of the sky. Conversely, a sky intensity map comprising a collection of point sources can be subjected to an inverse Fourier transform to simulate the corresponding Point Source Visibilities (PSV). Such simulated visibilities are important for testing models of external factors that affect the accuracy of observed data, such as radio frequency interference and interaction with the ionosphere.

MeqTrees is a widely used radio interferometry calibration and simulation software package that contains a Point Source Visibility module. Unfortunately, the calculation of visibilities is computationally intensive: the same Fourier equation must be applied to many point sources across multiple frequency bands and time slots. There is therefore great potential for this module to be accelerated by the highly parallel Single-Instruction-Multiple-Data (SIMD) architectures of modern commodity Graphics Processing Units (GPUs). Whereas many traditional high-performance computing techniques carry high entry and maintenance costs, GPUs have proven to be a cost-effective, high-performance parallelisation tool for SIMD problems such as PSV simulation.

This thesis presents a GPU/CUDA implementation of the Point Source Visibility calculation within the existing MeqTrees framework. For a large number of sources, this implementation achieves an 18x speed-up over the existing CPU module. With modifications to the MeqTrees memory management system that reduce overheads by incorporating GPU memory operations, speed-ups of 25x are theoretically achievable. Ignoring all serial overheads and considering only the parallelisable sections of code, speed-ups reach up to 120x.
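To illustrate why the PSV computation maps so naturally onto a SIMD architecture, the following is a minimal CUDA sketch; it is not the thesis implementation and is independent of the MeqTrees code base. It assumes the narrow-field form of the point-source Fourier kernel, exp(-2*pi*i*(u*l + v*m)), with UV coordinates already expressed in wavelengths for each (baseline, time slot, channel) sample; the kernel name psv_kernel, the flat array layout, and all array contents are illustrative assumptions. Each GPU thread computes one visibility sample by summing the contribution of every point source, which is the embarrassingly parallel structure the abstract refers to.

// Illustrative CUDA sketch of a point source visibility computation.
// One thread per visibility sample (one (baseline, time, channel) point);
// each thread loops over all point sources and accumulates their
// Fourier-kernel contributions. All names and values are hypothetical.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void psv_kernel(const float *u, const float *v,        // UV coordinates in wavelengths
                           const float *l, const float *m,        // source direction cosines
                           const float *flux,                     // source brightness
                           int n_vis, int n_src,
                           float *vis_re, float *vis_im)          // output visibilities
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // one visibility sample per thread
    if (i >= n_vis) return;

    float re = 0.0f, im = 0.0f;
    for (int s = 0; s < n_src; ++s) {
        // Phase of exp(-2*pi*i*(u*l + v*m)) for source s, weighted by its flux
        float phase = -2.0f * 3.14159265358979f * (u[i] * l[s] + v[i] * m[s]);
        re += flux[s] * cosf(phase);
        im += flux[s] * sinf(phase);
    }
    vis_re[i] = re;
    vis_im[i] = im;
}

int main()
{
    const int n_vis = 1 << 16;   // e.g. baselines x time slots x channels (illustrative)
    const int n_src = 1000;      // number of point sources (illustrative)

    // Unified (managed) memory keeps the host-side bookkeeping of the sketch short.
    float *u, *v, *l, *m, *flux, *vis_re, *vis_im;
    cudaMallocManaged(&u, n_vis * sizeof(float));
    cudaMallocManaged(&v, n_vis * sizeof(float));
    cudaMallocManaged(&l, n_src * sizeof(float));
    cudaMallocManaged(&m, n_src * sizeof(float));
    cudaMallocManaged(&flux, n_src * sizeof(float));
    cudaMallocManaged(&vis_re, n_vis * sizeof(float));
    cudaMallocManaged(&vis_im, n_vis * sizeof(float));

    // Dummy inputs: a single UV point repeated, and identical unit-flux sources.
    for (int i = 0; i < n_vis; ++i) { u[i] = 100.0f; v[i] = 50.0f; }
    for (int s = 0; s < n_src; ++s) { l[s] = 0.01f; m[s] = 0.02f; flux[s] = 1.0f; }

    // Launch one thread per visibility sample.
    psv_kernel<<<(n_vis + 255) / 256, 256>>>(u, v, l, m, flux, n_vis, n_src, vis_re, vis_im);
    cudaDeviceSynchronize();

    printf("V[0] = %f %+fi\n", vis_re[0], vis_im[0]);

    cudaFree(u); cudaFree(v); cudaFree(l); cudaFree(m);
    cudaFree(flux); cudaFree(vis_re); cudaFree(vis_im);
    return 0;
}

Because every visibility sample is computed from the same instruction sequence applied to different data, the work divides cleanly across GPU threads with no inter-thread communication; the serial cost that remains is the host-device memory traffic, which is the overhead the proposed MeqTrees memory management changes aim to reduce.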