Abstract:
This study addresses the problem of obtaining an aggregated forecast of railway freight transportation. To improve the quality of the aggregated forecast, the time series are clustered so that the series in each cluster are drawn from the same distribution. Solving this clustering problem requires estimating the distance between the empirical distributions of the time series. A two-sample test based on the Kullback–Leibler distance between histograms of the time series is introduced. The proposed test is studied both theoretically and experimentally. As a demonstration, a set of railway time series is clustered using the Kullback–Leibler distance between the series.
Keywords: empirical distribution function; distance between histograms; Kullback–Leibler distance; two-sample problem.
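
To make the central quantity concrete, the following minimal sketch computes a Kullback–Leibler distance between the histograms of two samples. It is an illustration under stated assumptions, not the test defined in the paper: the equal-width binning over the pooled range, the default of 10 bins, the additive smoothing constant `eps`, the one-sided (non-symmetrised) form of the divergence, and the function names `histogram` and `kl_histogram_distance` are all choices made here.

```python
import math


def histogram(sample, lo, hi, bins):
    """Count sample values in `bins` equal-width bins spanning [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in sample:
        # Clamp the index so that v == hi falls into the last bin.
        i = min(int((v - lo) / width), bins - 1) if width > 0 else 0
        counts[i] += 1
    return counts


def kl_histogram_distance(x, y, bins=10, eps=1e-9):
    """One-sided KL divergence D(P || Q) between histograms of x and y.

    Assumptions (not taken from the paper): both histograms share bin
    edges over the pooled range, and `eps` is added to every bin so
    that empty bins do not produce log(0) or division by zero.
    """
    lo = min(min(x), min(y))
    hi = max(max(x), max(y))
    p_counts = histogram(x, lo, hi, bins)
    q_counts = histogram(y, lo, hi, bins)
    p_total = sum(p_counts) + eps * bins
    q_total = sum(q_counts) + eps * bins
    distance = 0.0
    for pc, qc in zip(p_counts, q_counts):
        p = (pc + eps) / p_total  # smoothed relative frequency in x
        q = (qc + eps) / q_total  # smoothed relative frequency in y
        distance += p * math.log(p / q)
    return distance
```

Under these assumptions the value is zero when both samples produce the same histogram and grows as the histograms diverge, so pairwise values could serve as a dissimilarity measure for clustering.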