Abstract:
The investigation of many complex phenomena involves high-dimensional data sets. This is typical of many medical and biological studies, especially in genetics and pharmacology. We consider a binary response variable (describing, e.g., the state of a patient's health) that depends on $n$ discrete factors (explanatory variables). An important problem is to identify the most significant of these factors. The aim of the paper is to establish necessary and sufficient conditions for strong consistency of specified estimates, based on cross-validation, of the error of a prediction algorithm for the response variable. The impact of the choice of penalty function is discussed as well. The results obtained provide a basis for the well-known MDR-method, which is widely used in genetic data analysis.
Key words and phrases: binary response variable, significant factors, penalty function, cross-validation, MDR-method, SLLN for arrays, strong consistency of estimates.