Chelyabinskiy Fiziko-Matematicheskiy Zhurnal

Chelyab. Fiz.-Mat. Zh., 2018 Volume 3, Issue 2, Pages 227–236 (Mi chfmj102)

Informatics, Computer Science and Control

Analysis of the texts for predicting the churn of ISP

A. A. Karyakina, D. S. Botov

Chelyabinsk State University, Chelyabinsk, Russia

Abstract: The possibility of forecasting customer churn from the data of a Russian ISP is considered. The basic stages of, and approaches to, preprocessing the texts of operators’ comments are determined. Classification algorithms such as logistic regression, the $k$-nearest neighbors method, gradient boosting, and the naive Bayes algorithm are proposed. The sample is an input data set of 23 features for 380 000 subscribers. Typos in the textual information are corrected using the Damerau-Levenshtein distance and the text is lemmatized; it is then converted into a feature vector with the TF-IDF method and added to the model. The main approaches to encoding categorical features are determined. Forecast models are constructed, the results obtained with the different classifiers are compared, and conclusions are drawn.
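
The typo-correction step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the restricted (optimal string alignment) variant of the Damerau-Levenshtein distance, and the helper `correct_typo`, its `max_distance` threshold, and the sample vocabulary are hypothetical names introduced only for illustration.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Counts insertions, deletions, substitutions, and transpositions of
    two adjacent characters. Assumption: the paper does not state which
    variant of the distance it uses.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def correct_typo(word, vocabulary, max_distance=1):
    """Hypothetical helper: replace `word` with the closest vocabulary
    entry if it lies within `max_distance`; otherwise keep the word."""
    if not vocabulary:
        return word
    best = min(vocabulary, key=lambda v: osa_distance(word, v))
    return best if osa_distance(word, best) <= max_distance else word


if __name__ == "__main__":
    vocabulary = ["tariff", "router", "internet"]  # illustrative only
    print(correct_typo("intrenet", vocabulary))
```

In a pipeline like the one the abstract describes, such a correction would presumably run before lemmatization and TF-IDF vectorization, so that misspelled variants of a word collapse into one feature.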

Keywords: prediction, customer churn, ISP, Python, customer calls, classification, text analysis, TF-IDF.

UDC: 004.855.5

Received: 31.12.2017
Revised: 04.05.2018

DOI: 10.24411/2500-0101-2018-13209


