22 Jun Predict Customer Churn with TabPy
Customer churn is a common problem across businesses in many sectors. If you want to grow as a company, you have to invest in acquiring new customers. So, whenever a customer leaves, it’s a significant loss. This means as a business owner you have to invest time and effort in replacing them.
The ability to predict when a customer is likely to leave is therefore very important. Churn analytics enables organizations to identify and analyze the factors that influence customer churn.
Here, we will use the Telco Customer Churn dataset to build a machine learning model, which we will then use in TabPy to make our predictions.
Let’s get started.
Step 1: Installation
You need to have Anaconda on your machine. If you don’t, you can download it with the steps below:
- Visit Anaconda.com/downloads
- Select Windows
- Download the .exe installer
- Open and run the .exe installer
- Open the Anaconda Prompt and run some Python code to test
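As a quick test from the Anaconda Prompt, the short sketch below checks that the packages used later in this tutorial can be imported. The package list is an assumption based on the imports in the model-building code; adjust it to your own setup.

```python
import importlib.util
import sys

# Packages imported later in this tutorial (an assumed list; edit as needed).
required = ["pandas", "numpy", "sklearn"]

# find_spec returns None when a top-level package is not installed.
missing = [name for name in required if importlib.util.find_spec(name) is None]

print("Python version:", sys.version.split()[0])
print("Missing packages:", missing if missing else "none")
```

If anything is reported as missing, it can usually be installed from the same prompt with conda or pip.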
Step 2: Install TabPy & Connect with Tableau Desktop
Install TabPy on your machine and connect it with Tableau Desktop; you can visit our previous tutorial on how to do that.
Step 3: Building the Model
Here, we build our model with a supervised learning approach: given our training data, the model learns the relationship between our features and the target. We trained our model using the Logistic Regression algorithm.
The code is shown below:
# Load the libraries and the dataset
import pandas as pd
import numpy as np
import seaborn as sns
from matplotlib import pyplot as plt
df = pd.read_csv('WA_Fn-UseC_-Telco-Customer-Churn.csv')
# Inspect the data
df.head()
df.info()
df.describe()
# Drop the identifier column, which carries no predictive information
df.drop('customerID', axis=1, inplace=True)
df.head()
# Normalize the column names and encode the target as 0/1
df.columns = df.columns.str.lower().str.replace(' ', '_')
df.churn = (df.churn == 'Yes').astype(int)
# One-hot encode the non-numeric columns.
# Note: if totalcharges is read as text, it is one-hot encoded here as well;
# converting it with pd.to_numeric first would change the scores reported below.
df_encoding = pd.get_dummies(df, drop_first=True)
categorical = ['gender', 'seniorcitizen', 'partner', 'dependents',
               'phoneservice', 'multiplelines', 'internetservice',
               'onlinesecurity', 'onlinebackup', 'deviceprotection',
               'techsupport', 'streamingtv', 'streamingmovies',
               'contract', 'paperlessbilling', 'paymentmethod']
numerical = ['tenure', 'monthlycharges', 'totalcharges']
# Features
X = df_encoding.drop('churn', axis=1)
# Target
y = df_encoding['churn']
# Split into training (64%), validation (16%) and testing (20%) sets
from sklearn.model_selection import train_test_split
X_train_full, X_test, y_train_full, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
X_train, X_valid, y_train, y_valid = train_test_split(X_train_full, y_train_full, test_size=0.2, random_state=1)
print("Training Data Size: ", len(y_train))
print("Validation Data Size: ", len(y_valid))
print("Testing Data Size: ", len(y_test))
# Train the logistic regression model
from sklearn.linear_model import LogisticRegression
model = LogisticRegression(solver='liblinear', random_state=1)
model.fit(X_train, y_train)
# Predicted class probabilities for the validation and testing sets
y_val_pred = model.predict_proba(X_valid)
y_test_pred = model.predict_proba(X_test)
y_test_pred
# Accuracy on each split
print('LogisticRegression Training Accuracy: ', round(model.score(X_train, y_train), 2))
print('LogisticRegression Validation Accuracy: ', round(model.score(X_valid, y_valid), 2))
print('LogisticRegression Testing Accuracy: ', round(model.score(X_test, y_test), 2))
We were able to generate these scores:
array([[0.93961448, 0.06038552],
       [0.92559415, 0.07440585],
       [0.69169312, 0.30830688],
       ...,
       [0.99088494, 0.00911506],
       [0.81732224, 0.18267776],
       [0.35925132, 0.64074868]])
LogisticRegression Training Accuracy: 0.87
LogisticRegression Validation Accuracy: 0.81
LogisticRegression Testing Accuracy: 0.81
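The output of logistic regression is a probability. Each row of the array above holds two values for one customer: the probability that the observation is negative (y = 0) and the probability that it is positive (y = 1). For our case, the second value is the probability that the customer will churn.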
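To make the reading of these scores concrete, here is a minimal sketch that takes the six rows visible in the printed output, keeps the churn probability, and turns it into a 0/1 flag. The 0.5 cut-off and the variable names are assumptions for illustration; a business may prefer a different threshold.

```python
import numpy as np

# The six rows visible in the printed output (the elided rows are left out).
proba = np.array([
    [0.93961448, 0.06038552],
    [0.92559415, 0.07440585],
    [0.69169312, 0.30830688],
    [0.99088494, 0.00911506],
    [0.81732224, 0.18267776],
    [0.35925132, 0.64074868],
])

# Column 0 is P(y = 0) and column 1 is P(y = 1), i.e. the churn probability.
churn_proba = proba[:, 1]

# Assumed cut-off: flag a customer as churning at a probability of 0.5 or more.
threshold = 0.5
churn_flag = (churn_proba >= threshold).astype(int)

print(churn_flag.tolist())  # [0, 0, 0, 0, 0, 1]
```

Only the last customer, with a churn probability of about 0.64, is flagged at this threshold.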
It is also possible to look at which features in our dataset are most responsible for this churn. We can do that by carrying out a feature importance analysis.
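One common way to approximate feature importance for logistic regression is to rank the fitted coefficients by absolute value. The sketch below illustrates the idea on a small synthetic dataset; the data, the column names and the names X_demo, y_demo and demo_model are hypothetical stand-ins, not the Telco data or the model trained above. The features are standardized first, because coefficient sizes are only comparable when the features share a scale.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (hypothetical, not the Telco dataset).
rng = np.random.default_rng(1)
n = 500
X_demo = pd.DataFrame({
    "tenure": rng.integers(0, 72, n),
    "monthlycharges": rng.uniform(20, 120, n),
    "contract_two_year": rng.integers(0, 2, n),
})

# Simulate churn that is more likely for short tenure and no two-year contract.
logit = 1.0 - 0.08 * X_demo["tenure"] - 1.5 * X_demo["contract_two_year"]
y_demo = (rng.uniform(0, 1, n) < 1 / (1 + np.exp(-logit))).astype(int)

# Standardize so that the coefficient magnitudes are comparable.
X_scaled = (X_demo - X_demo.mean()) / X_demo.std()

demo_model = LogisticRegression(solver="liblinear", random_state=1)
demo_model.fit(X_scaled, y_demo)

# coef_ has shape (1, n_features) for a binary problem; rank by absolute size.
importance = pd.Series(demo_model.coef_[0], index=X_demo.columns)
importance = importance.sort_values(key=lambda s: s.abs(), ascending=False)

print(importance)
```

A positive coefficient pushes the prediction towards churn and a negative one away from it, so the sign is as informative as the size.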
Now that we’ve trained our model, we will save it on our machine with pycaret.classification.save_model, so that we can load it inside Tableau by passing the path to the location of the file.
Step 4: Working with Tableau
Here, we ensure TabPy is connected with Tableau Desktop. We can confirm that the TabPy server is running by opening http://localhost:9004/ in a browser.
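The same check can be done from Python. The sketch below queries TabPy's /info endpoint, assuming the server runs on the default port 9004 without authentication; the helper name tabpy_info is ours, not part of TabPy.

```python
import http.client
import json
import urllib.error
import urllib.request


def tabpy_info(url="http://localhost:9004/info", timeout=3):
    """Return the TabPy server info as parsed JSON, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, bad response or non-JSON body.
        return None


info = tabpy_info()
print("TabPy is running" if info is not None else "TabPy is not reachable")
```

If the server is not reachable, start TabPy first and then retry the connection from Tableau Desktop.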
After that, create a calculated field and put in the code below. Drag this field into the pane to see your predicted values.
SCRIPT_REAL("import pandas as pd
import pycaret.classification
the_model = pycaret.classification.load_model('C:/Users/Cndro/Downloads/churn_model')
X_pred = pd.DataFrame({'gender':_arg1,
    'SeniorCitizen':_arg2, 'Partner':_arg3,
    'Dependents':_arg4, 'tenure':_arg5,
    'PhoneService':_arg6,
    'MultipleLines':_arg7, 'InternetService':_arg8,
    'OnlineSecurity':_arg9, 'OnlineBackup':_arg10,
    'DeviceProtection':_arg11,
    'TechSupport':_arg12, 'StreamingTV':_arg13,
    'StreamingMovies':_arg14, 'Contract':_arg15,
    'PaperlessBilling':_arg16, 'PaymentMethod':_arg17,
    'MonthlyCharges':_arg18, 'TotalCharges':_arg19})
pred = pycaret.classification.predict_model(the_model, X_pred)
return pred['Label'].tolist()",
ATTR([gender]), ATTR([SeniorCitizen]), ATTR([Partner]), ATTR([Dependents]),
ATTR([Tenure]), ATTR([PhoneService]), ATTR([MultipleLines]), ATTR([InternetService]),
ATTR([OnlineSecurity]), ATTR([OnlineBackup]), ATTR([DeviceProtection]), ATTR([TechSupport]),
ATTR([StreamingTV]), ATTR([StreamingMovies]), ATTR([Contract]), ATTR([PaperlessBilling]),
ATTR([PaymentMethod]), ATTR([MonthlyCharges]), ATTR([TotalCharges]))
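It can help to dry-run the script logic outside Tableau first. TabPy hands each Tableau argument to the script as a list (_arg1, _arg2, and so on) with one element per row, and expects a list of the same length back. The sketch below mimics that contract with only two columns; stub_model is a hypothetical stand-in for the saved model, and its rule is made up for illustration.

```python
import pandas as pd


def stub_model(frame):
    # Hypothetical rule standing in for the trained model:
    # flag customers with less than 12 months of tenure.
    return (frame["tenure"] < 12).astype(int)


def tabpy_style_script(_arg1, _arg2, model):
    # Build one DataFrame row per Tableau row, as the calculated field does.
    X_pred = pd.DataFrame({"tenure": _arg1, "MonthlyCharges": _arg2})
    preds = model(X_pred)
    # SCRIPT_REAL expects numeric values, returned as a plain Python list.
    return [float(p) for p in preds]


# Three example rows, passed the way TabPy would pass them: one list per column.
result = tabpy_style_script([1, 40, 5], [70.0, 20.5, 99.9], stub_model)
print(result)  # [1.0, 0.0, 1.0]
```

If the list that comes back has a different length from the input lists, Tableau cannot line the predictions up with the rows, so that is worth checking before wiring in the real model.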
Hope you found this article helpful. Thanks for reading.