Acceleration of the noise suppression component of the DUCHAMP source-finder
The next generation of radio interferometer arrays, namely the proposed Square Kilometre Array (SKA) and its precursor instruments, the Karoo Array Telescope (MeerKAT) and the Australian Square Kilometre Array Pathfinder (ASKAP), will produce radio survey data orders of magnitude larger than current surveys. The sheer volume of imaged data necessitates fully automated source extraction. Such a system must accurately locate radio sources, which are for the most part partially hidden within inherently noisy observations, and produce useful scientific data for them. Automated extraction solutions exist, but they are computationally expensive and do not yet scale to the performance required to process such large data sets in practical time-frames. The DUCHAMP software package is one of the most accurate source extraction packages for general source finding, where source shape is unknown. DUCHAMP's accuracy is primarily due to the à trous wavelet reconstruction algorithm, a multi-scale smoothing algorithm that suppresses erratic observation noise. This algorithm is the most computationally expensive and memory-intensive component of DUCHAMP, so accelerating it substantially improves DUCHAMP's overall performance. We present a high-performance, multithreaded implementation of the à trous algorithm. It is focused on 'desktop' computing hardware so that individual researchers can run their own accelerated searches. Our solution covers three main areas of improvement: single-core optimisation, multi-core parallelism, and efficient out-of-core computation of large data sets using memory management libraries. Out-of-core computation stores data partially on disk when primary memory is exhausted. Doing it efficiently accounts for the limited fast memory of 'desktop' machines by mitigating the performance bottleneck associated with frequent secondary storage access.
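To make "multi-scale smoothing" concrete, the following is a minimal one-dimensional sketch of an à trous decomposition with hard thresholding, written in pure Python. Several details are assumptions made for illustration and are not taken from DUCHAMP. These include the B3-spline kernel, the mirror boundary handling, the single-pass hard threshold, and the helper names `smooth`, `atrous_decompose` and `atrous_denoise`. DUCHAMP's own filter choices, dimensionality and thresholding procedure may differ. At scale `j` the kernel taps are spaced `2**j` samples apart (the "holes"). Each wavelet plane is the difference between successive smoothings, so summing the final smooth array and all planes recovers the input.

```python
from typing import List, Tuple

# B3-spline kernel: an assumed, commonly used choice for the a trous transform.
B3 = [1 / 16, 1 / 4, 3 / 8, 1 / 4, 1 / 16]


def _reflect(i: int, n: int) -> int:
    """Mirror an out-of-range index back into [0, n) without repeating the edge."""
    if n == 1:
        return 0
    period = 2 * (n - 1)
    i = abs(i) % period
    return period - i if i >= n else i


def smooth(signal: List[float], scale: int) -> List[float]:
    """Convolve with the B3 kernel whose taps are spaced 2**scale samples apart."""
    step = 2 ** scale
    n = len(signal)
    half = len(B3) // 2
    return [
        sum(h * signal[_reflect(i + (k - half) * step, n)] for k, h in enumerate(B3))
        for i in range(n)
    ]


def atrous_decompose(signal: List[float], n_scales: int) -> Tuple[List[List[float]], List[float]]:
    """Return the wavelet planes w_1..w_J and the final smoothed array c_J."""
    planes = []
    current = list(signal)
    for j in range(n_scales):
        smoother = smooth(current, j)
        planes.append([a - b for a, b in zip(current, smoother)])  # w_{j+1} = c_j - c_{j+1}
        current = smoother
    return planes, current


def atrous_denoise(signal: List[float], n_scales: int, threshold: float) -> List[float]:
    """Rebuild the signal from c_J plus only the wavelet coefficients above threshold."""
    planes, residual = atrous_decompose(signal, n_scales)
    out = list(residual)
    for w in planes:
        for i, coeff in enumerate(w):
            if abs(coeff) > threshold:  # hard threshold: keep significant structure only
                out[i] += coeff
    return out


if __name__ == "__main__":
    noisy = [0.0, 1.0, 4.0, 2.0, 7.0, 3.0, 0.5, 2.5]
    print(atrous_denoise(noisy, n_scales=3, threshold=1.0))
```

With a negative threshold every coefficient is kept and the input is reconstructed exactly, up to rounding. Raising the threshold discards small-amplitude, noise-like detail at each scale.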
Although this work focuses on 'desktop' hardware, most of the improvements developed are general enough to be used in other high-performance computing models. Single-core optimisations improved algorithm accuracy by reducing rounding error and achieved a 4X serial speed-up, which scales with the filter size used during reconstruction. Multithreading on a quad-core CPU raised the speed-up of the filtering operations within reconstruction to 22X, with performance scaling approximately linearly with the number of CPU cores, and achieved a 13X speed-up overall. All evaluated out-of-core memory management libraries performed poorly when combined with parallelism. Single-threaded memory management partially mitigated the slow disk access bottleneck. It achieved a 3.6X speed-up for filtering operations (uniform across all tested large data sets) and a 1.5X speed-up overall. Faster secondary storage, such as Solid State Drives or RAID arrays, is required to process large survey data on 'desktop' hardware in practical time-frames.
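The gap between the filtering speed-ups and the overall speed-ups can be read through Amdahl's law, S = 1 / ((1 - f) + f / s). Here s is the speed-up of the filtering stage, S is the overall speed-up and f is the fraction of the original runtime spent filtering. The short sketch below inverts this relation to estimate f. It rests on a simplifying assumption of ours, not on anything stated above: only the filtering stage is accelerated, and all other work runs at its original speed. Under that assumption the resulting f is an illustrative estimate, not a measurement. The function name `filtering_fraction` is hypothetical.

```python
def filtering_fraction(overall: float, part: float) -> float:
    """Solve Amdahl's law S = 1 / ((1 - f) + f / s) for f.

    Assumes (illustratively) that only the filtering stage, sped up by
    `part`, is accelerated; `overall` is the end-to-end speed-up.
    """
    return (1 - 1 / overall) / (1 - 1 / part)


if __name__ == "__main__":
    # Speed-up figures quoted in the abstract: (filtering, overall).
    for label, s, S in [("multithreaded", 22.0, 13.0), ("out-of-core", 3.6, 1.5)]:
        print(f"{label}: f ~ {filtering_fraction(S, s):.2f}")
```

This prints f ~ 0.97 for the multithreaded case and f ~ 0.46 for the out-of-core case. Under the stated assumption, roughly half of the out-of-core runtime would lie outside the accelerated filtering. That is one way to read why faster secondary storage is recommended.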