Using Logistic Regression to Predict Customer Retention

Introduction

The aim of this project is to generate insights into customer preferences and to build a logistic regression machine learning model. These insights equip both primary and secondary stakeholders to make informed decisions on optimizing marketing strategies, sales targeting efforts and the user experience on the firm's web platforms, and to innovate services and products, in support of business goals and increased revenue.

The logistic regression model is created directly in the database using BigQuery ML.


Project Structure

Exploratory Data Analysis

Perform data wrangling using SQL to assess the number of site visits, measure the conversion rate, and select a machine learning model to make predictions.
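As a minimal sketch of the conversion-rate measure (written in Python for illustration, not the SQL used in the project), the snippet below treats the conversion rate as the share of unique visitors with at least one purchase. The field names `visitor_id` and `transactions`, the sample records and that definition itself are assumptions.

```python
def conversion_rate(sessions):
    """Return the share of unique visitors who made at least one purchase."""
    visitors = {s["visitor_id"] for s in sessions}
    purchasers = {s["visitor_id"] for s in sessions if s["transactions"] > 0}
    if not visitors:
        return 0.0  # avoid dividing by zero when there is no traffic
    return len(purchasers) / len(visitors)


# Hypothetical session records: one row per web session.
sample_sessions = [
    {"visitor_id": "a", "transactions": 0},
    {"visitor_id": "a", "transactions": 1},
    {"visitor_id": "b", "transactions": 0},
    {"visitor_id": "c", "transactions": 0},
    {"visitor_id": "d", "transactions": 2},
]

rate = conversion_rate(sample_sessions)
print(f"Conversion rate: {rate:.1%}")
```

Counting unique visitors rather than sessions keeps a returning customer from being counted several times; a per-session rate would be an equally valid choice depending on the business question.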


With the results of these queries, we can formulate a hypothesis on how our customers are attracted to our web platform and what exactly our customers expect from our service. 

This allows us to complement our marketing efforts by optimising the call to action (CTA) along the customer journey and by creating effective remarketing and customer win-back strategies.

To learn how relevant the brand is to prospective customers, I need to understand how these web users find us. This answers the business question of which marketing channels and strategies are most efficient.
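One way to compare channels, sketched here in Python under assumed field names (`channel`, `transactions`) and made-up records, is to rank each channel by the share of its sessions that end in a purchase. This is an illustration of the idea, not the project's query.

```python
from collections import defaultdict


def conversion_by_channel(sessions):
    """Rank channels by the share of their sessions that include a purchase."""
    totals = defaultdict(lambda: [0, 0])  # channel -> [sessions, converting sessions]
    for s in sessions:
        totals[s["channel"]][0] += 1
        if s["transactions"] > 0:
            totals[s["channel"]][1] += 1
    rates = {ch: conv / n for ch, (n, conv) in totals.items()}
    # Highest-converting channel first.
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical sessions tagged with the channel that brought the visitor in.
channel_sessions = [
    {"channel": "Organic Search", "transactions": 1},
    {"channel": "Organic Search", "transactions": 0},
    {"channel": "Referral", "transactions": 1},
    {"channel": "Referral", "transactions": 1},
    {"channel": "Social", "transactions": 0},
]

ranked = conversion_by_channel(channel_sessions)
for channel, share in ranked:
    print(f"{channel}: {share:.0%}")
```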

Exploring further, I wrote a query to identify which of our products are in top demand, which helps us better understand our brand identity among our customers. This query also shows the revenue generated by specific categories and products.

Model Selection

To make these strategies more accurate, I selected a logistic regression machine learning algorithm. This algorithm estimates the likelihood that current customers will make subsequent purchases from the firm's web platform.

Next, I reviewed the training specifications of the regression model.

The schema of the ML model lists the features I selected for it. The expected output is a single column named 'predicted_will_buy_on_return_visits', which predicts the customer segment that will return to our web platform and make a purchase from the site.

Having explored the data and created a regression model to make predictions, I then evaluated the model to understand how well it performed.
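To show what such a model computes, here is a minimal Python sketch of logistic regression scoring: a weighted sum of the features passed through the sigmoid function to give a probability between 0 and 1. The weights, bias and feature values are invented for illustration and are not the coefficients of the project's trained model.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function mapping any real z into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def purchase_probability(features, weights, bias):
    """Logistic regression score: sigmoid of the weighted feature sum."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical coefficients for [bounced (0 or 1), minutes on site].
weights = [-2.0, 0.3]
bias = -1.0

p = purchase_probability([0, 10], weights, bias)  # z = -1.0 + 0 + 3.0 = 2.0
will_buy = 1 if p >= 0.5 else 0  # 0.5 is an assumed decision threshold
print(f"probability={p:.3f}, predicted label={will_buy}")
```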
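For readers unfamiliar with how a binary classifier is scored, the Python sketch below computes accuracy, precision, recall and F1 from a confusion matrix. The labels and predictions are made up to demonstrate the arithmetic; they are not the project's evaluation output.

```python
def classification_metrics(actual, predicted):
    """Compute accuracy, precision, recall and F1 for binary 0/1 labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical ground truth (1 = bought on a return visit) and model output.
actual = [1, 0, 0, 1, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 0, 0, 1, 0]

metrics = classification_metrics(actual, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

When buyers are rare, accuracy alone can look high even for a model that never predicts a purchase, so precision and recall are worth reading alongside it.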

Feature Engineering

The features selected for the algorithm are a combination of the time spent by each customer on the firm's web platform, the bounce rate for each web session and the unique visitor identification number. The goal of this algorithm is to predict which customers will make subsequent purchases from the web platform.

This means re-engineering the model to be more accurate and to capture the relationship between the customer's first visit, the total time spent on the site and how far the customer progressed into the conversion funnel.
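The per-visitor features described in this section could be assembled along the following lines. This is a Python sketch under assumptions: the field names `visitor_id`, `bounces` and `time_on_site` and the sample rows are hypothetical, and the project itself builds its features in SQL.

```python
from collections import defaultdict


def build_visitor_features(sessions):
    """Aggregate session rows into one feature row per visitor."""
    features = defaultdict(lambda: {"bounces": 0, "time_on_site": 0})
    for s in sessions:
        row = features[s["visitor_id"]]
        row["bounces"] += s["bounces"]            # sessions that bounced
        row["time_on_site"] += s["time_on_site"]  # total seconds on site
    return dict(features)


# Hypothetical session log: visitor "a" returns for a second, longer visit.
session_log = [
    {"visitor_id": "a", "bounces": 1, "time_on_site": 0},
    {"visitor_id": "a", "bounces": 0, "time_on_site": 420},
    {"visitor_id": "b", "bounces": 1, "time_on_site": 0},
]

visitor_features = build_visitor_features(session_log)
for visitor_id, row in visitor_features.items():
    print(visitor_id, row)
```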

Summary

The goal of this data analytics project is to estimate the likelihood that a customer segment will make subsequent purchases from the firm's web platform.
To achieve this, I performed exploratory data analysis.

After exploring the data, I chose features for the logistic regression machine learning model. Based on the available data and these chosen features, the model gave us insights into which customer cohort to focus our marketing and customer success efforts on. To ensure the prediction model described this cohort accurately, I selected additional features and re-created the model to evaluate its performance. The results of this analysis indicate a 98% chance that customers whose time on site fits the model's description will make recurrent purchases.

Duration

From understanding the data requirements through to analysis and implementation, this project took two weeks. However, similar projects might take up to two months, depending on the business requirements, data requirements and business domain.

During this project I collaborated with stakeholders (senior executives, marketing team, sales team, developers and customer success teams), empowering them to make informed decisions with insights obtained from my analysis to achieve business goals.