RUS  ENG
Full version
JOURNALS // Chelyabinskiy Fiziko-Matematicheskiy Zhurnal // Archive

Chelyab. Fiz.-Mat. Zh., 2025 Volume 10, Issue 1, Pages 53–69 (Mi chfmj422)

Mathematics

Surrogate modeling of combinatorial bounds of overfitting for threshold decision rules

Sh. Kh. Ishkina

Federal Research Center “Computer Science and Control” of RAS, Russia

Abstract: The problem of constructing a surrogate model for fast computation of generalizarion ability estimates for threshold decision rules is solved. The process of collecting a training set for a model is described. The sample consists of pairs (sample, target), and each sample is a family of threshold decision rules, the target is a value of generalization ability bound for the family. Based on existing studies of generalization ability bound estimation conducted within the combinatorial theory of overfitting, a list of features that describe the training set samples was developed. Various machine learning algorithms were considered, and the neural network model with a mean percentage error of $2.8\%$ was chosen as the best based on testing results. Based on the analysis of the significance of features, it is shown that when constructing overfitting estimates, it is not enough to use only the number of classifiers and the minimum accuracy of classifiers in the family. It is necessary to take into account the internal structure of the family (splitting by the number of errors) and the relationship between classifiers (connectivity). The resulting model can be used in feature selection problems when constructing decision trees, neural networks, and in boosting algorithms to control overfitting.

Keywords: surrogate modeling, combinatorial theory of overfitting, complete cross-validation, generalization ability, threshold classifier, computational complexity.

UDC: 519.25

Received: 19.07.2024
Revised: 21.12.2024

DOI: 10.47475/2500-0101-2025-10-1-53-69



© Steklov Math. Inst. of RAS, 2025