Abstract:
This article addresses feature selection in regression models estimated by ordinary least squares. Models obtained through such selection are often inadequate and hard to interpret. Definitions of “quite interpretable” and “RTF-adequate” regression models are formulated for the first time. A previously proposed efficient algorithm for solving the feature selection problem is reviewed, and on its basis an algorithm is developed for constructing quite interpretable and RTF-adequate linear regression models. For each regression, this algorithm sequentially checks the “informativeness” of the variables, multicollinearity, the agreement of coefficient signs with the physical meaning of the factors, the adequacy of the model in terms of the coefficient of determination and its overall significance by Fisher's F-test, and the significance of the coefficients by Student's t-test. The proposed algorithm is implemented as a program for the Gretl econometric package. The program is general-purpose and can be applied to a wide range of data analysis tasks.
Keywords: feature selection, ordinary least squares, quite interpretable and RTF-adequate regression, variable “informativeness” criterion, multicollinearity, Fisher's F-test, Student's t-test.
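
The sequence of checks listed in the abstract can be illustrated with a minimal sketch. This is not the authors' Gretl program: the function names (`solve`, `ols`, `corr`, `screen`), the thresholds and the critical values are hypothetical placeholders chosen for illustration. The “informativeness” check is omitted because the abstract does not define it, and a pairwise correlation threshold stands in for the multicollinearity test, which is an assumption. In practice the F and t critical values would be taken from the corresponding distributions for the chosen significance level.

```python
import math
from itertools import combinations


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(cols, y):
    """OLS with intercept. Returns (coefficients, t-statistics, R^2, F)."""
    n, k = len(y), len(cols)
    p = k + 1  # number of estimated parameters, including the intercept
    X = [[1.0] + [col[i] for col in cols] for i in range(n)]
    xtx = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
           for r in range(p)]
    xty = [sum(X[i][r] * y[i] for i in range(n)) for r in range(p)]
    beta = solve(xtx, xty)
    resid = [y[i] - sum(b * x for b, x in zip(beta, X[i])) for i in range(n)]
    rss = sum(e * e for e in resid)
    ybar = sum(y) / n
    tss = sum((v - ybar) ** 2 for v in y)
    r2 = 1.0 - rss / tss
    s2 = rss / (n - p)  # residual variance estimate
    # Diagonal of (X'X)^-1, obtained by solving against each unit vector.
    diag = [solve(xtx, [1.0 if i == j else 0.0 for i in range(p)])[j]
            for j in range(p)]
    t = [beta[j] / math.sqrt(s2 * diag[j]) for j in range(p)]
    f = (r2 / k) / ((1.0 - r2) / (n - p))
    return beta, t, r2, f


def corr(u, v):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def screen(cols, y, signs, r_max=0.8, r2_min=0.7, f_crit=4.0, t_crit=2.0):
    """Run the checks in sequence; return the first failed one, or 'ok'.

    signs holds the expected sign (+1 or -1) of each slope coefficient.
    All default thresholds are illustrative assumptions.
    """
    if any(abs(corr(a, b)) > r_max for a, b in combinations(cols, 2)):
        return "multicollinearity"
    beta, t, r2, f = ols(cols, y)
    if any(b * s <= 0 for b, s in zip(beta[1:], signs)):
        return "signs"
    if r2 < r2_min:
        return "r2"
    if f < f_crit:
        return "f-test"
    if any(abs(v) < t_crit for v in t[1:]):
        return "t-test"
    return "ok"


# Small synthetic example: y = 2*x1 + 3*x2 + noise.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0]
noise = [0.1, -0.1, 0.1, -0.1, -0.1, 0.1, -0.1, 0.1]
y = [2 * a + 3 * b + e for a, b, e in zip(x1, x2, noise)]
print(screen([x1, x2], y, signs=[+1, +1]))
```

Returning at the first failed check mirrors the sequential order described in the abstract, so a candidate regression is discarded as early as possible and the later, costlier checks run only on models that passed the earlier ones.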