Abstract:
Univariate decision trees have low computational efficiency when processing sparse high-dimensional data. Multivariate decision trees are more expressive classifiers, but they overfit on small datasets. This paper proposes a method for learning trees with multivariate nonlinear splitters, which improves classification accuracy on image and text datasets. This is achieved by jointly optimizing, at each node of the tree, the distance from the training objects to the separating hyperplane and the data impurity criterion. Experimental results confirm the effectiveness of the method.
Keywords: decision trees, kernel splits, kernel trees, slack re-scaling, random forests.