Using Funk SVD to Effectively Target Customers with Coupons

Reid
8 min read · Feb 16, 2021

Introduction

Businesses often use coupons to entice customers to come into their stores, make purchases, and, ideally, become loyal customers. Bringing customers into a store is often difficult, but data analytics and data science can help companies create effective and engaging ways to do it.

One way a company can accomplish this is by creating a coupon program that is personalized to each customer's tastes, profile, and purchase history. This project demonstrates how to use Funk Singular Value Decomposition (Funk SVD) to build a recommendation system that can effectively recommend individualized coupons to customers. The data used in this project is from Starbucks and mimics their coupon, customer, and transaction data. It was provided by Starbucks for use in capstone projects through Udacity's Data Science program.

Data Exploration, Cleaning, & Transformation:

There are three datasets provided by Starbucks:

1. Coupon Data

2. Customer Profile Data

3. Transaction History

Coupon Data: The coupon dataset contains all characteristics of the coupons offered by Starbucks.

It contains 5 coupons in total and includes information about how each coupon is offered, the type of coupon (BOGO, informational, or discount), the coupon's value, and how long the coupon lasts (in days).

I came away with a few observations after quickly looking at the data:

- The most common types of coupons are BOGO deals

- All offers are distributed through the email channel, while only 2 are distributed through social media

- The coupon with the longest duration and highest difficulty does not have the highest reward, which seems counterintuitive.

- The reward-to-difficulty ratio is 1 for three of the coupons. Setting aside the coupon with no reward and no difficulty, only one coupon has a ratio less than one.

The categorical variables should be transformed using one-hot encoding. I performed this transformation for the “offer_type” and “channel” columns, so each category now has its own column and is represented by a 1 or 0.
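The one-hot encoding step above can be sketched as follows. The tiny DataFrame here is hypothetical stand-in data (the real dataset's column contents are not reproduced), but the pattern is the same: one 0/1 column per category.

```python
import pandas as pd

# Hypothetical mini version of the coupon dataset; the "offer_type" and
# "channels" column names mirror the fields described in the article.
portfolio = pd.DataFrame({
    "offer_type": ["bogo", "discount", "informational"],
    "channels": [["email", "social"], ["email", "web"], ["email"]],
})

# One-hot encode offer_type: each category becomes its own 0/1 column.
offer_dummies = pd.get_dummies(portfolio["offer_type"], prefix="offer")

# The channels column holds lists, so explode it to one channel per row,
# encode, then collapse back to one row per coupon.
channel_dummies = (
    portfolio["channels"].explode().str.get_dummies().groupby(level=0).max()
)

portfolio = pd.concat(
    [portfolio.drop(columns=["offer_type", "channels"]),
     offer_dummies, channel_dummies],
    axis=1,
)
print(portfolio)
```

After this transformation, every coupon row is fully numeric and ready for modeling.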

Customer Data: There are 17,000 customers in this dataset and 4 columns. The columns and their data types are illustrated in this summary table:

First each variable could be looked at individually:

Looking at these variables individually, there are a few things that caught my attention:

1) Males make up the largest of the three gender groups.

2) There is a large concentration of customers just below 120 years, which is odd.

3) Most customers joined the program recently, and there appear to be three stair steps where enrollment jumped.

4) Income is concentrated on the lower end of the spectrum, as shown by the income distribution’s long right tail.

The age distribution seemed odd, so I explored it further. A large concentration of customers over 100 years old is implausible, since few people live to be that old.

Based on the first five rows, my first impression was that these customers did not want to input any personal information: none had gender or income values, and all had an age of 118. I checked what proportion of these rows contained null values for salary and gender to see how accurate my hunch was:

It turned out that very few of these records had gender and salary information. Although these records may not appear useful, I did not remove them from the dataset; their purchase histories can still help make predictions for future users who match these customer profiles. I did decide to make one change: I replaced the age of 118 in these records with NaN, since 118 is still a valid integer and keeping it could skew the models.
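The null-check and age replacement described above can be sketched like this, using a hypothetical slice of the profile data (the real dataset has 17,000 rows):

```python
import numpy as np
import pandas as pd

# Hypothetical slice of the customer profile data; 118 is the placeholder
# age observed for customers who left gender and income blank.
profile = pd.DataFrame({
    "gender": [None, "F", "M", None],
    "age": [118, 35, 42, 118],
    "income": [np.nan, 72000.0, 55000.0, np.nan],
})

# What fraction of the age-118 rows also lack gender and income?
placeholder = profile["age"] == 118
print(profile.loc[placeholder, ["gender", "income"]].isna().mean())

# Replace the placeholder age with NaN so models don't treat 118 as real.
profile["age"] = profile["age"].replace(118, np.nan)
```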

After looking at each variable individually, I wanted to explore relationships between these features and see if there is anything that sticks out. Some things I hypothesized were possible were that purchasing preferences and habits may be a result of differences in characteristics such as age, gender, income, and how long a customer has been in the loyalty program.

Distributions of income and ages between genders of Starbucks customers:

The age distribution appears to be fairly similar between gender groups. However, for the income distribution it is interesting to see the male category is more heavily concentrated in lower income brackets when compared to the others. The age distribution seems to skew younger for the male category, so this may contribute to, or explain, the concentration of lower incomes.

Based on the previous distributions, I wanted to see if generational differences mattered. Many studies show differences in preferences and spending habits between generations; for example, Millennials are often cited as a group that spends heavily on eating out, so it seemed worthwhile to add generation to the dataset as a categorical variable. I derived it from ages and plotted the generation data against income data to get a sense of it:
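Deriving generations from ages can be done with `pd.cut`. The birth-year cutoffs below are illustrative assumptions (the article does not state its exact boundaries), and the reference year 2021 matches the article's date:

```python
import pandas as pd

# Hypothetical ages; birth years are computed relative to 2021.
ages = pd.Series([22, 30, 45, 60, 80], name="age")
birth_year = 2021 - ages

# Assumed generation boundaries (right-inclusive birth-year intervals).
bins = [1927, 1945, 1964, 1980, 1996, 2012]
labels = ["Silent", "Baby Boomer", "Generation X", "Millennial", "Generation Z"]
generation = pd.cut(birth_year, bins=bins, labels=labels)
print(generation.tolist())
```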

This was interesting since Generation X, Baby Boomers, and the Silent generation have similar widely distributed incomes. Meanwhile, Millennials and Generation Z both have incomes that are heavily concentrated below $80,000, since they are younger and just starting their careers.

Transaction Data: The data contains four columns with approximately 306,000 rows.

The event column includes three different categories:

1) offer received: Indicates a customer received the coupon offer through the channels they were targeted in.

2) offer viewed: Indicates when a customer views the coupon that was sent to them.

3) offer completed: Indicates a customer met the terms of the coupon and redeemed the offer.

The most frequent “event” value is “offer received”. This aligns with my expectations, since everyone should receive a coupon offer even if they never look at it or never complete the offer’s terms. One thing to note about this column is that a customer can complete an offer without ever viewing it.

Because of this, I made a few assumptions when creating my model:

- When a customer completes an offer without viewing it, this means that the coupon was not a driving factor for the purchase

- The only time a coupon is successful is when a customer first views the offer and then completes that same offer.
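The success logic above can be sketched per customer-and-offer pair. The transcript rows here are hypothetical, but the rule is the one stated: a "viewed" event must precede a "completed" event for the same offer.

```python
import pandas as pd

# Hypothetical slice of the transcript data (times in hours).
transcript = pd.DataFrame({
    "person": ["a", "a", "a", "b", "b"],
    "offer_id": ["o1", "o1", "o1", "o1", "o1"],
    "event": ["offer received", "offer viewed", "offer completed",
              "offer received", "offer completed"],
    "time": [0, 6, 24, 0, 24],
})

def offer_successful(group):
    """Return 1 only if the offer was viewed before it was completed."""
    viewed = group.loc[group["event"] == "offer viewed", "time"]
    completed = group.loc[group["event"] == "offer completed", "time"]
    if viewed.empty or completed.empty:
        return 0
    return int(viewed.min() < completed.max())

success = (
    transcript.groupby(["person", "offer_id"])[["event", "time"]]
    .apply(offer_successful)
)
print(success)
```

Customer "b" completed the offer without ever viewing it, so under these assumptions the coupon gets no credit for that purchase.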

Finally, the “value” column contains a dictionary whose keys indicate extra information for the row. The keys were “offer id”, “amount”, and “reward”, and their values held the corresponding information: the coupon’s offer id, the transaction amount, or the reward amount.

This format may have been convenient for data collection, but it was difficult to work with for analysis. I transformed the data so each dictionary key had its own column, with row values taken from the corresponding dictionary entries.
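One way to perform that flattening is with `pd.json_normalize`; the rows below are hypothetical examples of the dictionary format described above (the offer id value is made up):

```python
import pandas as pd

# Hypothetical rows mirroring the raw "value" column.
transcript = pd.DataFrame({
    "event": ["offer received", "transaction", "offer completed"],
    "value": [
        {"offer id": "abc123"},
        {"amount": 12.50},
        {"offer id": "abc123", "reward": 5},
    ],
})

# Expand each dictionary key into its own column; keys missing from a
# given row become NaN in that row.
expanded = pd.json_normalize(transcript["value"].tolist())
transcript = pd.concat([transcript.drop(columns="value"), expanded], axis=1)
print(transcript)
```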

Model Selection & Data Transformation:

After looking at the data, I decided to use a Funk SVD model. Funk SVD is useful in this situation since it uses collaborative filtering and can make recommendations even if a user has not interacted with an item before. This model will take in a customer as input and output a coupon recommendation. I performed a few data transformation steps in order to use this model:

1) Created a binary target variable: The target variable indicates if the coupon was successful. The target variable is 1 if the coupon was successful and a 0 in all other cases.

2) I joined the transcript data, customer data, and offer data into one dataset.

3) Created a user-item matrix.

4) Determined how many latent features to use.

5) Split the data into test and training sets.

6) Train the model.

7) Assess the model.

These steps are detailed in my Python files. I used sum of squared errors (SSE) as my metric to assess the performance of the model; it measures the difference between predicted and actual values. This allows for comparison between models and lets us see whether predictions are improving. Minimizing this metric is ideal, and my target is a model with an error rate of less than 4%.
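As a quick illustration of the metric (with made-up numbers, not the project's results):

```python
import numpy as np

# SSE sums the squared gaps between predicted and actual values.
actual = np.array([1.0, 0.0, 1.0, 1.0])
predicted = np.array([0.9, 0.1, 0.8, 1.0])
sse = np.sum((actual - predicted) ** 2)
print(round(sse, 2))  # → 0.06
```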

Conclusion:

Using Funk SVD, I was able to use previous customer, coupon, and transaction data to build a recommendation system that presents customers with coupons that are effective in bringing them into the store to make purchases. The model takes a customer as input and outputs a single coupon suggestion to present to that customer.

The biggest challenge I had was the performance of my code when calculating the target variable using the effectiveness logic. This calculation took just under 2 hours, and I believe I can either refactor my code or use distributed computing to improve that time. Although slow code was acceptable during this prototyping phase, it likely will not be in a live production environment.
