Predicting customer churn in the financial industry
Fagerholm, Fredrik (2022)
The permanent address of the publication is
https://urn.fi/URN:NBN:fi-fe2022032825697
Abstract
This thesis investigates how customer churn can be modelled at a company in the financial industry by exploring computational means of predicting it. The goal is to build a working model for predicting the churn of borrowers and, at the same time, to explore the main drivers of churn. The work is carried out as a single-case study, and the thesis presents the theory and methods used and documents the process and findings.
The problem is defined as a classification task, and a random forest model is evaluated through cross-validation. Because the problem is framed in this way, the definition of what constitutes a churn event has a major impact on the applicability of the results, so great attention is given to this issue. The choice of evaluation metrics for the classifier's performance is especially important due to the class imbalance in the data. The metrics AUC and cumulative gain are chosen because they are less sensitive to class imbalance than metrics such as accuracy.
The model appears to capture the churn phenomenon quite well for the prepared data and is able to accurately predict which customers will churn during cross-validation. The model has an AUC score of 0.74.
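The point about class imbalance can be illustrated with a minimal sketch. The function names, the toy labels and the scores below are hypothetical and are not taken from the thesis or its data. The sketch only shows why accuracy can look good for a model that never predicts churn, while AUC and cumulative gain depend on how well churners are ranked.

```python
def accuracy(y_true, y_pred):
    """Fraction of customers whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a randomly chosen churner receives a
    higher score than a randomly chosen non-churner (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cumulative_gain(y_true, scores, fraction):
    """Share of all churners found among the top `fraction` of customers
    when customers are ranked by descending score."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    k = round(fraction * len(ranked))
    found = sum(t for _, t in ranked[:k])
    return found / sum(y_true)


# Hypothetical toy data: 10 customers, of whom 2 churn (label 1).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.10, 0.20, 0.15, 0.30, 0.60, 0.25, 0.70, 0.40]

# A "model" that never predicts churn: constant label, constant score.
always_stay = [0] * len(y_true)
constant_scores = [0.0] * len(y_true)

print("accuracy, never predicts churn:", accuracy(y_true, always_stay))
print("AUC, constant scores:", roc_auc(y_true, constant_scores))
print("AUC, toy scores:", roc_auc(y_true, scores))
print("gain at top 20%:", cumulative_gain(y_true, scores, 0.2))
print("gain at top 30%:", cumulative_gain(y_true, scores, 0.3))
```

On this toy data the constant predictor reaches an accuracy of 0.8 simply because 80% of the customers do not churn, yet its AUC is 0.5, the value expected from an uninformative ranking. The cumulative gain reads as the share of churners reached when only the highest-scored fraction of customers is contacted.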