
Proceedings of ISP RAS, 2015 Volume 27, Issue 6, Pages 315–334 (Mi tisp200)

This article is cited in 1 paper

Processing of raw astronomical data of large volume by MapReduce model

S. V. Gerasimov (a), A. V. Mesheryakov (b), I. Yu. Kolosov (a), E. S. Glotov (a), I. S. Popov (a)

(a) Lomonosov Moscow State University Faculty CMC
(b) Space Research Institute of the Russian Academy of Sciences

Abstract: Exponential growth in data volume and improved data quality in current (SDSS, DES, PanSTARRS) and upcoming (LSST) sky surveys open new horizons for astrophysics but require new approaches to data processing, in particular big data technologies and cloud computing. This work presents a MapReduce-based approach to a major computational task in astrophysics: processing raw astronomical image data. We present the architecture of a Hadoop-based astrophysical pipeline that combines the following processing steps: background removal, projection, co-addition, PSF modelling, and extraction of sky object features from images. The architecture uses modern implementations of astrophysical image processing algorithms from the software packages SWarp, PSFEx, and SExtractor, which are integrated into MapReduce procedures. The pipeline steps are grouped into two phases. The first phase, "raw" data processing, includes background removal, projection, and image co-addition. Its output is a set of preprocessed images co-added into so-called cells. Neighbouring cells overlap at their borders, which allows the second phase to process correctly large sky objects that lie on cell borders. The second phase includes PSF modelling and creation of the sky catalogue by extracting sky object properties from the cells. Experiments showed linear scalability of all processing steps and only a small contribution of the Hadoop infrastructure to the overall processing cost. We used data from one filter (red) of the Stripe82 dataset. All experiments were run on the Microsoft Azure HDInsight cloud platform.
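
The first phase described above (assigning images to overlapping cells, then co-adding per cell) can be illustrated with a minimal sketch in plain Python. This is not the authors' pipeline: the one-dimensional sky coordinate, the names `CELL_SIZE`, `OVERLAP`, `map_image`, `reduce_cell` and `run`, and the use of a plain mean in place of SWarp co-addition are all assumptions made for illustration, and the shuffle step that Hadoop would perform is simulated with a dictionary.

```python
import math
from collections import defaultdict

CELL_SIZE = 10.0  # hypothetical cell width (degrees along one sky axis)
OVERLAP = 1.0     # hypothetical margin shared by neighbouring cells


def cells_for_interval(lo, hi, cell_size=CELL_SIZE, overlap=OVERLAP):
    """Indices of cells whose padded extent intersects [lo, hi].

    Cell k is assumed to cover [k*cell_size - overlap, (k+1)*cell_size + overlap).
    """
    first = math.floor((lo - overlap) / cell_size)
    last = math.floor((hi + overlap) / cell_size)
    return list(range(first, last + 1))


def map_image(image):
    """Map step: emit (cell_index, value) for every cell the image touches."""
    for cell in cells_for_interval(image["lo"], image["hi"]):
        yield cell, image["value"]


def reduce_cell(cell, values):
    """Reduce step: co-add the values of one cell (a plain mean stands in
    for real co-addition of reprojected pixels)."""
    return cell, sum(values) / len(values)


def run(images):
    """Simulate map -> shuffle -> reduce in a single process."""
    grouped = defaultdict(list)
    for image in images:
        for cell, value in map_image(image):
            grouped[cell].append(value)  # shuffle: group values by cell key
    return dict(reduce_cell(cell, vals) for cell, vals in grouped.items())


images = [
    {"id": "a", "lo": 2.0, "hi": 4.0, "value": 1.0},   # inside cell 0 only
    {"id": "b", "lo": 9.5, "hi": 10.5, "value": 3.0},  # straddles cells 0 and 1
]
coadded = run(images)
```

Because image "b" lies in the overlap region, it is emitted to both neighbouring cells, so an object on the border is fully present in at least one cell when the second phase extracts the catalogue.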

Keywords: MapReduce, Hadoop, sky survey, big data, cloud computing, image processing.

DOI: 10.15514/ISPRAS-2015-27(6)-20
