
Principal Seminar of the Department of Probability Theory, Moscow State University
October 30, 2019 16:45, Moscow, MSU, auditorium 12-24


Probabilistic methods of feature selection

A. Kozhevin

Lomonosov Moscow State University

Abstract: The thesis is devoted to several methods of variable (feature)
selection. This problem is of theoretical interest and also has a wide range of
applications; see, for example, Bühlmann, van de Geer (2011) and Bolón-Canedo,
Alonso-Betanzos (2018).
Chapter 1 presents a modification of the MDR (multifactor dimensionality
reduction) method proposed by Ritchie et al. (2001) and further developed by
Velez et al. (2007), Gui et al. (2011), Bulinsky (2012), Gola et al. (2015) and
others. The chapter focuses mainly on the analysis of stratified samples.
Strong consistency is proved for the constructed estimates of the error
functional used.
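To make the MDR idea concrete, here is a minimal sketch of the *classic* rule of Ritchie et al. (2001). It is not the modification studied in Chapter 1: there is no stratification, no cross-validation and no penalty function in the error functional. Each cell of the multi-factor table is labelled "high risk" when its case-to-control ratio exceeds the overall ratio in the sample. The factor subset with the smallest training misclassification error is then selected. The function names `mdr_error` and `mdr_select` and the synthetic data are hypothetical.

```python
from collections import Counter
from itertools import combinations

import numpy as np


def mdr_error(X, y, factors):
    """Training error of the classic MDR rule on the given factor subset.

    X: (n, p) integer array of factor levels (e.g. genotypes coded 0, 1, 2).
    y: (n,) array of 0/1 responses (1 = case, 0 = control).
    Classic rule only (an assumption); not the thesis's modified functional.
    """
    cells = [tuple(row) for row in X[:, list(factors)]]
    cases = Counter(c for c, label in zip(cells, y) if label == 1)
    controls = Counter(c for c, label in zip(cells, y) if label == 0)
    # Threshold: overall ratio of cases to controls in the whole sample.
    threshold = y.sum() / max(len(y) - y.sum(), 1)
    # A cell is "high risk" if cases / controls > threshold; comparing
    # cases > threshold * controls avoids division by zero for empty cells.
    high_risk = {c for c in cases if cases[c] > threshold * controls[c]}
    predictions = np.array([1 if c in high_risk else 0 for c in cells])
    return float(np.mean(predictions != y))


def mdr_select(X, y, k):
    """Return the k-factor subset with the smallest MDR training error."""
    return min(combinations(range(X.shape[1]), k),
               key=lambda f: mdr_error(X, y, f))


# Usage on synthetic data: the response is driven by an interaction of factors 0 and 1.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(500, 5))      # five factors with levels 0, 1, 2
y = ((X[:, 0] + X[:, 1]) % 2).astype(int)  # depends on factors 0 and 1 jointly
print(mdr_select(X, y, 2))                 # → (0, 1)
```

Only the pair (0, 1) makes every cell of its table pure, so it attains zero training error. In practice MDR compares subsets by cross-validated error to guard against overfitting.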
Chapters 2 and 3 develop information-theoretic approaches to identifying
significant factors; see, for example, Bennasar et al. (2014), Vergara, Estévez
(2014). Chapter 2 introduces a new estimate of conditional entropy in a mixed
model (which includes, in particular, logistic regression), where the vector
of predictors has an absolutely continuous distribution and the response is a
discrete random variable. Asymptotic unbiasedness and $L^2$-consistency of the
proposed estimate are proved under very weak conditions.
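For orientation, the target quantities can be written in standard notation as follows. The symbols $p$, $M$, $w$ and $b$ are introduced here for illustration and are not taken from the thesis. Let $X \in \mathbb{R}^d$ have density $p$ and let $Y$ take values in a finite set $M$. Logistic regression is shown as one example of such a mixed model.

```latex
H(Y \mid X) = -\sum_{y \in M} \int_{\mathbb{R}^d} p(x)\,
  \mathsf{P}(Y = y \mid X = x) \log \mathsf{P}(Y = y \mid X = x)\, dx,

H(Y) = -\sum_{y \in M} \mathsf{P}(Y = y) \log \mathsf{P}(Y = y), \qquad
I(X; Y) = H(Y) - H(Y \mid X),

% example of a mixed model: logistic regression, M = \{0, 1\}
\mathsf{P}(Y = 1 \mid X = x) = \bigl(1 + e^{-\langle w, x \rangle - b}\bigr)^{-1}.
```

Because $H(Y)$ involves only the discrete response, an estimate of $H(Y \mid X)$ combined with the empirical entropy of $Y$ gives an estimate of $I(X;Y)$, and vice versa. The specific estimates constructed in the thesis are not reproduced here.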
Chapter 3 constructs an estimate of mutual information in a mixed model; its
asymptotic unbiasedness and $L^2$-consistency are proved as well. The variable
selection procedure based on this estimate of mutual information is shown to
be consistent when the number of significant factors is known. The proofs use
conditional expectations, probabilistic inequalities, estimates of the rate of
convergence in the central limit theorem, and other techniques.
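The abstract describes the Chapter 3 estimator and selection procedure only in words. The sketch below illustrates them roughly and substitutes a different, published technique. It uses the k-nearest-neighbour estimator of $I(X;Y)$ for continuous $X$ and discrete $Y$ due to Ross (2014), the estimator cited by scikit-learn's `mutual_info_classif`. It also assumes the selection procedure is an exhaustive search for the $r$-subset of predictors with the largest estimated mutual information, where $r$ is the known number of significant factors. The function names, the default `k=3`, the Euclidean metric and the search strategy are all assumptions, not the thesis's construction.

```python
from itertools import combinations

import numpy as np

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma at positive integers: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    n = np.asarray(n, dtype=int)
    harmonic = np.concatenate(([0.0], np.cumsum(1.0 / np.arange(1, n.max()))))
    return harmonic[n - 1] - EULER_GAMMA


def mi_continuous_discrete(X, y, k=3):
    """kNN estimate of I(X; Y) for continuous X (n, d) and discrete y (n,).

    Sketch of the Ross (2014) estimator (an assumption), NOT the thesis's estimate:
    I = psi(N) + <psi(k)> - <psi(N_y)> - <psi(m)>.
    """
    y = np.asarray(y)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    # Brute-force Euclidean distance matrix: fine for small n only.
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    n = len(y)
    k_used = np.zeros(n, dtype=int)
    label_counts = np.zeros(n, dtype=int)
    radius = np.zeros(n)
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        label_counts[idx] = len(idx)
        if len(idx) < 2:
            continue
        k_lab = min(k, len(idx) - 1)
        # Column 0 of each sorted row is the point itself (distance 0),
        # so column k_lab is the distance to the k_lab-th same-label neighbour.
        radius[idx] = np.sort(dist[np.ix_(idx, idx)], axis=1)[:, k_lab]
        k_used[idx] = k_lab
    keep = label_counts > 1  # drop points whose label occurs only once
    # m_i: points of any label (self included) strictly inside the radius.
    inner = np.nextafter(radius[keep], 0)[:, None]
    m = (dist[np.ix_(keep, keep)] <= inner).sum(axis=1)
    mi = (digamma_int(keep.sum()) + digamma_int(k_used[keep]).mean()
          - digamma_int(label_counts[keep]).mean() - digamma_int(m).mean())
    return max(0.0, float(mi))


def select_features(X, y, r, k=3):
    """Exhaustive search for the r columns maximizing estimated I(X_S; Y)."""
    return max(combinations(range(X.shape[1]), r),
               key=lambda s: mi_continuous_discrete(X[:, list(s)], y, k))


# Usage: the response depends on columns 0 and 2 only, and r = 2 is assumed known.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + X[:, 2] > 0).astype(int)
print(select_features(X, y, r=2))
```

Exhaustive search needs $\binom{p}{r}$ estimator calls, and the distance matrix costs $O(n^2)$ memory. This sketch is therefore practical only for small $p$ and $n$. Greedy forward selection is the usual cheaper, though not equivalent, alternative.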
The theoretical results are supplemented by computer simulations. A comparison
with recent papers by Coelho et al. (2016), Gao et al. (2017) and Macedo et al.
(2019) is also provided.
The thesis consists of 118 pages; the bibliography contains more than 100 items.


© Steklov Math. Inst. of RAS, 2024