Abstract:
The paper addresses the classification problem in multidimensional spaces. The authors propose a supervised modification of the t-distributed Stochastic Neighbor Embedding Algorithm. Additional features of the proposed modification are that, unlike the original algorithm, it does not require retraining if new data are added to the training set and can be easily parallelized. The novel method was applied to detect intrinsic plagiarism in a collection of documents. The authors also tested the performance of their algorithm using synthetic data and showed that the quality of classification is higher with the algorithm than without or with other algorithms for dimension reduction.