SEMINARS
Seminar on Probability Theory and Mathematical Statistics

Novel Ensembling Techniques for Mining Complex High Dimensional Data
Susmita Datta

Abstract: High-throughput technologies in genomics and proteomics have created a need for novel statistical methods to handle and analyze the enormous amounts of high-dimensional data produced daily in laboratories around the world. In this work, we propose novel methods that summarize the information in such data through clustering and classification techniques. In particular, we identify the optimal clustering algorithm for a given dataset from a collection of candidate algorithms with respect to multiple performance criteria. To do this, we rank the algorithms under each criterion, which yields multiple ordered lists, and then use the cross-entropy method of stochastic optimization to aggregate these lists into the single ranking that minimizes its total distance to them. We also combine rank aggregation with boosting and bagging to build an ensemble classifier that always performs best with respect to all the classification performance measures considered. We illustrate the methods on simulated data and on real microarray and mass spectrometry data.

Language: English
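The rank-aggregation step can be illustrated with a minimal sketch. It assumes the Spearman footrule distance between rankings and a standard cross-entropy scheme in which a matrix of item-by-position probabilities is sampled, scored, and refit to the best ("elite") samples. The function names, parameters, and example rankings are hypothetical, and this is not the speaker's implementation.

```python
import random
from typing import List, Optional, Sequence, Tuple


def footrule(a: Sequence[str], b: Sequence[str]) -> int:
    """Spearman footrule: sum over items of |position in a - position in b|."""
    pos_b = {item: i for i, item in enumerate(b)}
    return sum(abs(i - pos_b[item]) for i, item in enumerate(a))


def weighted_distance(cand: Sequence[str], lists: Sequence[Sequence[str]],
                      weights: Sequence[float]) -> float:
    """Objective to minimize: weighted sum of footrule distances to every input list."""
    return sum(w * footrule(cand, lst) for lst, w in zip(lists, weights))


def ce_rank_aggregate(lists: Sequence[Sequence[str]],
                      weights: Optional[Sequence[float]] = None,
                      n_samples: int = 200, elite_frac: float = 0.1,
                      smooth: float = 0.7, n_iter: int = 50,
                      seed: int = 0) -> Tuple[List[str], float]:
    """Hypothetical cross-entropy rank aggregation (a sketch, not a reference implementation)."""
    rng = random.Random(seed)
    items = list(lists[0])
    k = len(items)
    weights = list(weights) if weights else [1.0] * len(lists)
    # p[i][j] = probability of placing items[i] at position j; start uniform.
    p = [[1.0 / k] * k for _ in range(k)]
    n_elite = max(1, int(elite_frac * n_samples))
    best: List[str] = []
    best_score = float("inf")

    for _ in range(n_iter):
        samples = []
        for _ in range(n_samples):
            remaining = list(range(k))
            perm = []
            for j in range(k):
                # Sample an unused item for position j; the tiny constant keeps weights positive.
                w = [p[i][j] + 1e-12 for i in remaining]
                i = rng.choices(remaining, weights=w)[0]
                perm.append(i)
                remaining.remove(i)
            cand = [items[i] for i in perm]
            samples.append((weighted_distance(cand, lists, weights), cand, perm))

        # Keep the elite: the lowest-distance fraction of the samples.
        samples.sort(key=lambda s: s[0])
        elite = samples[:n_elite]
        if elite[0][0] < best_score:
            best_score, best = elite[0][0], elite[0][1]

        # Refit p to item-position frequencies in the elite, smoothed with the old p.
        freq = [[0.0] * k for _ in range(k)]
        for _, _, perm in elite:
            for j, i in enumerate(perm):
                freq[i][j] += 1.0 / n_elite
        p = [[smooth * freq[i][j] + (1 - smooth) * p[i][j] for j in range(k)]
             for i in range(k)]

    return best, best_score


if __name__ == "__main__":
    # Hypothetical rankings of four clustering algorithms under three validation criteria.
    rankings = [
        ["pam", "kmeans", "hierarchical", "som"],
        ["kmeans", "pam", "som", "hierarchical"],
        ["pam", "kmeans", "som", "hierarchical"],
    ]
    order, score = ce_rank_aggregate(rankings)
    print(order, score)
```

Weights let some validation criteria count more than others in the aggregate. With equal weights, each algorithm's median position across the three hypothetical lists is unique, so the best aggregate here is pam, kmeans, som, hierarchical, at a total footrule distance of 4.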