Churn Prediction with Text and Interpretability

Customer churn, the loss of current customers, is a problem faced by a wide range of companies. When trying to retain customers, it is in a company’s best interest to focus its efforts on the customers most likely to leave, which means the company needs a way to detect those customers before they have actually decided to leave. Users prone to churn often leave clues to their disposition in their behavior and in customer support chat logs, and these clues can be detected and understood with Natural Language Processing (NLP) tools.

Here, we demonstrate how to build a churn prediction model that leverages both text and structured data (numerical and categorical), which we call a bi-modal model architecture. We use Amazon SageMaker to prepare, build, and train the model. Detecting customers who are likely to churn is only part of the battle; finding the root cause is essential to actually solving the issue. Since we are interested not only in the likelihood of a customer churning but also in its driving factors, we complement the prediction model with an analysis of feature importance for both the text and non-text inputs.
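To make the bi-modal idea concrete, the snippet below is a minimal PyTorch sketch of one way such a model can be wired up: a pretrained text encoder produces an embedding that is concatenated with the structured features before a small classification head. The class name, encoder choice, and layer sizes are illustrative assumptions, not necessarily the exact architecture used in this repository.

    import torch
    import torch.nn as nn
    from transformers import AutoModel

    class BiModalChurnModel(nn.Module):
        """Text encoder + structured features, fused by concatenation (illustrative)."""

        def __init__(self, num_structured_features, text_model_name="distilbert-base-uncased"):
            super().__init__()
            self.text_encoder = AutoModel.from_pretrained(text_model_name)
            text_dim = self.text_encoder.config.hidden_size  # e.g. 768 for DistilBERT
            self.classifier = nn.Sequential(
                nn.Linear(text_dim + num_structured_features, 128),
                nn.ReLU(),
                nn.Dropout(0.2),
                nn.Linear(128, 1),  # single churn logit
            )

        def forward(self, input_ids, attention_mask, structured_features):
            # Use the first-token hidden state as a fixed-size embedding of the chat log.
            text_out = self.text_encoder(input_ids=input_ids, attention_mask=attention_mask)
            text_emb = text_out.last_hidden_state[:, 0, :]
            # Concatenate the text embedding with the numerical/categorical features.
            combined = torch.cat([text_emb, structured_features], dim=1)
            return self.classifier(combined)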

The categorical and numerical data come from the Kaggle competition Customer Churn Prediction 2020 and are combined with a synthetic text dataset that we created using GPT-2.
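The synthetic text shipped with this repository was generated ahead of time, but as a rough illustration of the approach, a prompt-based generation call with the Hugging Face pipeline API might look like the following. The prompt and sampling parameters here are examples only, not the settings used to build the dataset.

    from transformers import pipeline

    # Load a GPT-2 text-generation pipeline (downloads the model on first use).
    generator = pipeline("text-generation", model="gpt2")

    # Example prompt in the style of a customer support chat turn.
    prompt = "Customer: I have been having problems with my plan and"
    samples = generator(prompt, max_new_tokens=40, num_return_sequences=3, do_sample=True)
    for s in samples:
        print(s["generated_text"])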

Blog Post

Medium / Towards Data Science blog post

Installation

git clone https://github.com/aws-samples/churn-prediction-with-text-and-interpretability.git
conda create -n py39 python=3.9
conda activate py39
cd churn-prediction-with-text-and-interpretability
pip install -r requirements.txt

Download categorical/numerical data and combine with synthetic text data

  1. Download the categorical/numerical data from Customer Churn Prediction 2020 (this may require a Kaggle account). Download train.csv and store it in the data folder.

  2. Run the script in ../scripts to combine the categorical/numerical data with the synthetic text data (a hypothetical sketch of this step follows the list):

    python create_dataset.py
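For orientation, the combination step essentially joins the structured Kaggle features with the synthetic chat-log text, one row per customer. The snippet below is a hypothetical sketch only; the actual logic and file names live in scripts/create_dataset.py, and the synthetic-text file name used here is an assumption.

    import pandas as pd

    structured = pd.read_csv("data/train.csv")        # Kaggle categorical/numerical features
    text = pd.read_csv("data/synthetic_text.csv")     # assumed name for the synthetic chat logs
    combined = pd.concat([structured, text], axis=1)  # row-wise alignment, one row per customer
    combined.to_csv("data/combined.csv", index=False) # assumed output path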
    

Run in Notebook

An example notebook that runs the entire pipeline and prints/visualizes the results is included in ../notebook.

Run in Terminal

The Python scripts to prepare the data, train and evaluate the model, and interpret the model are stored in ../scripts. The parameters used for training and interpreting the model are stored in ../model/params.yaml.

  1. Prepare the data:
    python preprocess.py
    
  2. Train and evaluate the model:
    python train.py
    
  3. Interpret the trained model (text); see the attribution sketch below:
    python interpret.py --churn 1 --speaker Customer
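For reference, token-level attributions of the kind interpret.py produces can be computed with Captum's LayerIntegratedGradients over the text encoder's embedding layer. The sketch below assumes the hypothetical BiModalChurnModel from the architecture sketch above, ideally loaded from a trained checkpoint; it is one possible approach, not necessarily the method implemented in this repository.

    import torch
    from captum.attr import LayerIntegratedGradients
    from transformers import AutoTokenizer

    # Assumes the BiModalChurnModel sketched earlier (here with random weights for brevity).
    tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
    model = BiModalChurnModel(num_structured_features=10)
    model.eval()

    def forward_fn(input_ids, attention_mask, structured_features):
        # Return the churn logit as a 1-D tensor so attributions explain the churn score.
        return model(input_ids, attention_mask, structured_features).squeeze(-1)

    lig = LayerIntegratedGradients(forward_fn, model.text_encoder.embeddings)

    text = "Customer: I am unhappy with the service and thinking of cancelling."
    enc = tokenizer(text, return_tensors="pt")
    structured = torch.zeros(1, 10)  # placeholder structured features

    attributions, delta = lig.attribute(
        inputs=enc["input_ids"],
        additional_forward_args=(enc["attention_mask"], structured),
        return_convergence_delta=True,
    )

    # Sum over the embedding dimension to get one attribution score per token.
    token_scores = attributions.sum(dim=-1).squeeze(0)
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    for tok, score in zip(tokens, token_scores.tolist()):
        print(f"{tok:>12s}  {score:+.4f}")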
    

Credits

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.