Abstract:
Artificial intelligence (AI) technologies are becoming an important part of everyday life, and the shortage of data for training models is becoming a pressing problem. Limited access to real data, caused by privacy constraints and a scarcity of information, hinders the development of AI and machine learning-based systems. In recent years, so-called “trusted AI” systems, which focus on safety, reliability, and ethics, have also been developing actively. These systems aim to mitigate algorithmic bias and opacity, in part by providing explanations for their decisions. In response to data scarcity, the concept of synthetic data has emerged: AI models are trained on artificially generated yet realistic data. This approach helps to overcome the difficulties caused by the lack of real data and can contribute to more effective and less biased AI models. This paper examines whether quality indicators of the data generation process can serve as indicators of how well the generated data will perform in machine learning tasks.