
Taurida Journal of Computer Science Theory and Mathematics, 2022 Issue 2, Pages 30–37 (Mi tvim142)

Overview of approaches to solving the hyperparameter optimization problem for machine learning algorithms

A. S. Anafieva, A. S. Karyuk

V. I. Vernadsky Crimean Federal University, Simferopol

Abstract: Hyperparameter tuning is critical for the correct functioning of machine learning models, and finding the best combination of hyperparameters lies at the heart of many machine learning applications. Let us consider some definitions to state the main problem. By an algorithm model we understand a parametric family of mappings $A = \{\varphi(x, \theta) \mid \theta \in \Theta\}$, where $\varphi: X \times \Theta \rightarrow Y$ is some fixed mapping, $X$ is the set of objects, and $Y$ is the set of outcomes of the unknown target function in the machine learning problem. The dependence of the algorithm model $A$ on hyperparameters is determined by a template of algorithm models, or simply an algorithm template: an operator $t:\Gamma \rightarrow \mathcal{A}$, where $\Gamma$ is the set of possible hyperparameter values and $\mathcal{A}$ is the set of possible algorithms. If, in this case, $\Gamma=\Gamma_1 \times \ldots \times \Gamma_k$, where $\Gamma_j$ is the set of permissible values for hyperparameter $\gamma_j$, $j=\overline{1,k}$, then the template is called $k$-parametric. In other words, a $k$-parametric template defines the form of the mapping $\varphi$ of the algorithm model $A\in\mathcal{A}$ from the hyperparameters $\gamma_1, \ldots, \gamma_k$.

Thus, when solving a machine learning problem, we can distinguish two stages: model selection and training. At the model selection stage, the algorithm model is chosen using the template and the hyperparameters; at the training stage, the training method [bibVoron] applied to the chosen model yields the optimal algorithm (decision rule). Hyperparameters may specify the number of neurons in the hidden layer of a neural network, the maximum depth of a binary decision tree, the number of nearest neighbors in metric classification algorithms, and so on. The number of hyperparameters of an algorithm template can be quite large, and the optimal hyperparameter values for the same algorithm may vary depending on the task being solved and the input data at hand. Optimal hyperparameter values are usually not obvious and must be searched for; since the search can take a long time, various approaches to it exist, each with its advantages and disadvantages.

The article surveys existing algorithms for solving the hyperparameter optimization problem for machine learning algorithms and proposes a new approach based on optimization with precedent initial information. Suppose that for some sets of hyperparameter values we already know the values of the algorithm's quality metric (for example, for randomly generated hyperparameter values). We are then dealing with supervised learning (a regression problem) in which the instances are sets of hyperparameters and the responses are the quality metric values. Therefore, the hyperparameter tuning problem can be treated as an optimization problem with precedent initial information, for which one can use either neural-network-based or tree-based approaches, as outlined by Prof. V. I. Donskoy. The quality of the proposed approach on real-world tasks is to be thoroughly investigated in future research.
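To make the precedent-based framing concrete, the following is a minimal sketch of the idea, not the authors' implementation: randomly generated hyperparameter sets with measured quality scores form a regression training set for a surrogate model, and the cheap-to-evaluate surrogate is then optimized in place of the expensive quality metric. The choice of Python with scikit-learn, a random-forest surrogate, and an SVM as the tuned model are all illustrative assumptions.

    # Sketch: hyperparameter tuning as regression over precedents
    # (hyperparameter set, quality metric value). Library and model
    # choices are illustrative assumptions, not the authors' method.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X, y = make_classification(n_samples=300, n_features=10, random_state=0)

    def quality(log_C, log_gamma):
        """Quality metric Q(gamma_1, gamma_2): CV accuracy of an SVM."""
        model = SVC(C=10.0 ** log_C, gamma=10.0 ** log_gamma)
        return cross_val_score(model, X, y, cv=3).mean()

    # Precedents: randomly generated hyperparameter sets whose quality
    # metric values are already known (computed here by direct evaluation).
    Gamma = rng.uniform(low=[-2, -4], high=[3, 1], size=(20, 2))
    Q = np.array([quality(c, g) for c, g in Gamma])

    # Supervised learning on the precedents: instances are hyperparameter
    # sets, responses are quality metric values.
    surrogate = RandomForestRegressor(n_estimators=200, random_state=0)
    surrogate.fit(Gamma, Q)

    # Optimize the surrogate instead of the expensive metric: evaluate it
    # on a dense random sample of candidates and take the argmax.
    candidates = rng.uniform(low=[-2, -4], high=[3, 1], size=(5000, 2))
    best = candidates[np.argmax(surrogate.predict(candidates))]
    print("suggested (log10 C, log10 gamma):", best)
    print("true quality at suggestion:", quality(*best))

In this sketch a random forest plays the role of the tree-based regressor; a neural-network regressor could be substituted without changing the rest of the loop, which mirrors the two alternatives mentioned in the abstract.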

Keywords: machine learning, hyperparameter tuning, neural networks.

UDC: 519.7

MSC: 65K10, 68T05


