Abstract:
Currently, large time series are used in a wide range of subject areas. Modern time series DBMSs (TSDBMS) offer, however, a modest set of built-in tools for data mining. The use of third-party time series mining systems to undesirable overhead costs for exporting data outside the TSDBMS, converting data and importing analysis results. At the same time, there is a topical issue of the embedding of data mining methods into relational DBMSs (RDBMS), which dominate the market of data management tools. However, there are still no developments of time series mining methods in RDBMS. The article proposes an approach to the management and mining of time series data within the RDBMS based on the matrix profile concept. A matrix profile is a data structure that, for each subsequence of a time series, stores the index of and the distance to its nearest neighbor. The matrix profile serves as the basis for detecting motifs, anomalies and other primitives of time series mining. The proposed approach is implemented in the PostgreSQL RDBMS. The experimental results showed a higher efficiency of the proposed approach compared to the TSDBMS InfluxDB and OpenTSDB.