P. V. Bezmaternyh, E. L. Pliskin, V. V. Farsobina, “Information system for structured documents OCR quality control”, Informatsionnye Tekhnologii i Vychislitel'nye Sistemy, 2018, Issue 2, Pages 94

APPLIED ASPECTS OF COMPUTER SCIENCE

Information system for structured documents OCR quality control

P. V. Bezmaternyh^a, E. L. Pliskin^b, V. V. Farsobina^b

^a Smart Engines Service, Moscow, Russia
^b Federal Research Center “Computer Science and Control” of Russian Academy of Sciences, Moscow, Russia

Abstract: Computational experiments remain a routine, everyday procedure in the development of machine learning (ML) based software such as optical character recognition (OCR). The well-known continuous integration (CI) approach is a natural choice for developing ML software. CI involves frequent, centralized program builds and bench test runs. This produces a large volume of test results, which must be readily available to developers for error analysis and for comparing software versions. This article proposes an architecture for an automatic quality control system for structured-document OCR that covers the collection, storage and display of bench test results. The results of all software tests are loaded into a database. Builds and bench tests can run on virtual servers with different operating systems (OS). For stability, the web server and the database run on hardware separate from the build server. Web technologies are used both to upload test results to the database automatically and to serve user queries.
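The abstract describes the data flow only at the architectural level: CI builds upload bench test results over HTTP to a database-backed web server, which also answers developers' queries such as comparing two software versions. The following is a minimal sketch of that flow using only the Python standard library. The schema (build, OS, test set, accuracy), the `/results` and `/compare` endpoints, the JSON payload format and the helper names `make_app` and `call` are hypothetical illustrations, not the authors' actual system; SQLite and `wsgiref` merely stand in for whatever database and web server the real system uses.

```python
import io
import json
import sqlite3
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults

# Hypothetical schema (not from the article): one row per (build, OS, bench test set).
SCHEMA = """
CREATE TABLE IF NOT EXISTS runs (
    build    TEXT NOT NULL,  -- software version under test
    os       TEXT NOT NULL,  -- OS of the virtual server that ran the bench test
    test_set TEXT NOT NULL,  -- name of the bench test (e.g. a document type)
    accuracy REAL NOT NULL,  -- fraction of correctly recognized fields
    UNIQUE (build, os, test_set)
);
"""

# Pair up results of two builds for the same test set and OS.
COMPARE_SQL = """
SELECT a.test_set, a.os, a.accuracy, b.accuracy
FROM runs AS a JOIN runs AS b ON a.test_set = b.test_set AND a.os = b.os
WHERE a.build = ? AND b.build = ?
ORDER BY a.test_set, a.os
"""


def make_app(conn):
    """Return a WSGI app serving both CI uploads and user queries (hypothetical API)."""
    conn.executescript(SCHEMA)

    def app(environ, start_response):
        method = environ["REQUEST_METHOD"]
        path = environ.get("PATH_INFO", "")
        if method == "POST" and path == "/results":
            # A CI job uploads all bench test results of one build as JSON.
            size = int(environ.get("CONTENT_LENGTH") or 0)
            payload = json.loads(environ["wsgi.input"].read(size))
            rows = [(payload["build"], payload["os"], r["test_set"], r["accuracy"])
                    for r in payload["results"]]
            # Re-running a bench test for the same build and OS replaces the old row.
            conn.executemany("INSERT OR REPLACE INTO runs VALUES (?, ?, ?, ?)", rows)
            conn.commit()
            status, body = "200 OK", {"stored": len(rows)}
        elif method == "GET" and path == "/compare":
            # A developer compares two software versions test by test.
            qs = parse_qs(environ.get("QUERY_STRING", ""))
            build_a, build_b = qs["a"][0], qs["b"][0]
            body = [
                {"test_set": t, "os": os_name, "a": acc_a, "b": acc_b,
                 "delta": round(acc_b - acc_a, 4)}
                for t, os_name, acc_a, acc_b in conn.execute(COMPARE_SQL, (build_a, build_b))
            ]
            status = "200 OK"
        else:
            status, body = "404 Not Found", {"error": "unknown endpoint"}
        start_response(status, [("Content-Type", "application/json")])
        return [json.dumps(body).encode()]

    return app


def call(app, method, path, query="", body=None):
    """Invoke the WSGI app in-process, the way a test client would."""
    data = json.dumps(body).encode() if body is not None else b""
    environ = {}
    setup_testing_defaults(environ)
    environ.update(REQUEST_METHOD=method, PATH_INFO=path,
                   QUERY_STRING=query, CONTENT_LENGTH=str(len(data)))
    environ["wsgi.input"] = io.BytesIO(data)
    statuses = []
    chunks = app(environ, lambda status, headers: statuses.append(status))
    return statuses[0], json.loads(b"".join(chunks))


if __name__ == "__main__":
    app = make_app(sqlite3.connect(":memory:"))
    for build, accs in [("1.0", (0.90, 0.80)), ("1.1", (0.93, 0.78))]:
        call(app, "POST", "/results", body={
            "build": build, "os": "linux",
            "results": [{"test_set": "passport", "accuracy": accs[0]},
                        {"test_set": "invoice", "accuracy": accs[1]}]})
    print(call(app, "GET", "/compare", query="a=1.0&b=1.1"))
```

Putting uploads and queries behind the same HTTP interface means a build server on any OS needs only an HTTP client to report results, which fits the abstract's setup of builds on heterogeneous virtual servers and a web server and database on separate hardware.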

Keywords: computer experiment, machine learning, data processing, web applications, regression testing, continuous integration, quality control.

DOI: 10.14357/20718632180208