Abstract:
This survey analyzes global trends in the development of massive data collections and related infrastructures in order to evaluate the opportunities for shared use of such collections in research, decision making, and problem solving in various data intensive domains (DIDs) in Russia. The representative set of DIDs selected for the survey includes astronomy, genomics and proteomics, neuroscience (human brain investigation), materials science, and Earth sciences. For each of these DIDs, the survey briefly reviews the strategic initiatives (or large projects) in the USA and Europe aimed at creating big data collections and the respective infrastructures planned up to 2025. Information technology projects aimed at developing infrastructures that support access to and analysis of such data collections are also briefly reviewed. The set of large data collections included in the survey and expected to be created soon is intended to serve as a reference point for designing and developing research infrastructures for data management and analysis that are compatible with foreign open research infrastructures. In particular, the data collections considered in the survey, the goals of their creation, and the research planned on their basis make it possible to proceed to the design and implementation of advanced components of research infrastructures, such as facilities for conceptualizing the application domains to be investigated in data intensive research, the respective metamodels, and components intended for data reuse and for reproducing programs and workflows.
Keywords: fourth paradigm; data intensive domains; research infrastructures; data collections; big data.