|
SEMINARS |
Principle Seminar of the Department of Probability Theory, Moscow State University
|
|||
|
The epsilon-complexity of continuous functions and novel methodology of time series segmentation B. S. Darhovskya, A. Piryatinskab a Institute of Systems Analysis, Russian Academy of Sciences b San Francisco State University |
|||
Abstract: Off-line analysis of long time series should begin with testing of the hypothesis about its "homogeneity", that is, one should check whether the generating mechanism was not changed in the process of data collection. If this hypothesis is rejected, and the time series is concatenated from different ( long enough) pieces such that each of these pieces is generated by its own mechanism, then for carrying out the further analysis it is necessary to find points of "gluing together", i.e., those moments of time when the generating mechanism was changed. Without implementation of such procedure results of the analysis of a time series can't be adequate. In case when a time series is generated by probabilistic mechanisms, the described problem is well known change-point problem (in the off-line regime). However, not always a time series is generated by probabilistic mechanisms, but even if it is the case, to find the change-points without prior information of the process model is very difficult. The question arises: could we find an “intrinsic” characteristic of a time series such that: 1) it allows us to perform a segmentation into the homogeneous pieces 2) it does not depend on the type of the data generating mechanism (stochastic or not)? We believe that our novel concept - the epsilon- complexity of an individual continuous function - is such a characteristic. The concept of the epsilon- complexity is in line with general idea of A.N. Kolomogorov about complexity of an object. We show that for “almost any” function satisfying H¨older condition the epsilon-complexity can be characterized by a pair of real numbers which we call the epsilon-complexity coefficients. This fact allows to offer principally new methodology of time series segmentation, and the methodology does not depend of the nature (stochastic, deterministic or mixed) of a time series and does not use any prior information on generation mechanisms. The key hypothesis of this methodology is as follows: change of the generating mechanism involves change of mean values of the epsilon-complexity coefficients, and therefore detecting the moments of changes is reduced to the known problem of change-point detection in the mean value of random sequences . The results of computer simulations demonstrate the efficiency of the proposed methodology. |