Multi-Armed Bandits
Step 1: Simulate the Dataset
We’ll create a dataset where each potential client has a set of features (context) and can receive one of several possible messages.
The goal is to learn which message works best for different segments based on their context.
Key Components:
Contextual features: These could be behavioral data such as the number of SMSs read, the average time taken to respond, etc.
Actions: The different messages that can be sent.
Rewards: A binary reply indicator (1 if the client responded to the message, 0 otherwise). Averaged over clients, it gives the reply rate.
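The simulation code that follows can be summarized as a logistic reward model. The notation here is introduced for illustration only and is not part of the original text: $x_i$ is the feature vector of customer $i$, $\theta_a$ is the coefficient vector of message $a$, and $\varepsilon_{ia}$ is Gaussian noise with standard deviation 0.1.

$$
p_{ia} = \sigma\left(x_i^\top \theta_a + \varepsilon_{ia}\right), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad y_{ia} \sim \mathrm{Bernoulli}(p_{ia})
$$

Here $p_{ia}$ corresponds to `reply_probabilities` and $y_{ia}$ to the binary reply matrix `y` in the code.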
import numpy as np
import pandas as pd
# Set seed for reproducibility
np.random.seed(42)
# Parameters
n_customers = 1000 # Number of customers
n_messages = 5 # Number of different messages
n_features = 4 # Number of contextual features
# Simulate customer features (contexts)
X = np.random.rand(n_customers, n_features)
# Simulate rewards for each message (action)
# We assume that different contexts have different optimal messages
true_coefficients = np.random.rand(n_messages, n_features)
noise = np.random.randn(n_customers, n_messages) * 0.1
rewards = X @ true_coefficients.T + noise
# Convert rewards to probabilities (between 0 and 1)
reply_probabilities = 1 / (1 + np.exp(-rewards))
# Generate actual replies (binary rewards) based on probabilities
y = np.random.binomial(1, reply_probabilities)
# Create a DataFrame to store the dataset
columns = [f'feature_{i+1}' for i in range(n_features)] + [f'message_{i+1}' for i in range(n_messages)]
data = np.hstack((X, y))
df = pd.DataFrame(data, columns=columns)
# Display the first few rows of the dataset
df.head()
| | feature_1 | feature_2 | feature_3 | feature_4 | message_1 | message_2 | message_3 | message_4 | message_5 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.374540 | 0.950714 | 0.731994 | 0.598658 | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 |
| 1 | 0.156019 | 0.155995 | 0.058084 | 0.866176 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| 2 | 0.601115 | 0.708073 | 0.020584 | 0.969910 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 |
| 3 | 0.832443 | 0.212339 | 0.181825 | 0.183405 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 |
| 4 | 0.304242 | 0.524756 | 0.431945 | 0.291229 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 |
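As a quick sanity check on the simulated data, the sketch below regenerates the dataset with the same seed and parameters and reports, for each message, its empirical reply rate and the share of customers for whom it has the highest reply probability. The variable names `reply_rate_per_message`, `best_message`, and `share_best` are introduced here for illustration.

```python
import numpy as np

# Recreate the simulated dataset with the same seed and call order as above
np.random.seed(42)
n_customers, n_messages, n_features = 1000, 5, 4

X = np.random.rand(n_customers, n_features)
true_coefficients = np.random.rand(n_messages, n_features)
noise = np.random.randn(n_customers, n_messages) * 0.1
reply_probabilities = 1 / (1 + np.exp(-(X @ true_coefficients.T + noise)))
y = np.random.binomial(1, reply_probabilities)

# Empirical reply rate of each message across all customers
reply_rate_per_message = y.mean(axis=0)

# For each customer, the message with the highest simulated reply probability
best_message = reply_probabilities.argmax(axis=1)
share_best = np.bincount(best_message, minlength=n_messages) / n_customers

for m in range(n_messages):
    print(
        f"message_{m + 1}: reply rate {reply_rate_per_message[m]:.3f}, "
        f"best for {share_best[m]:.1%} of customers"
    )
```

If one message is best for nearly every customer, the context carries little information and a non-contextual bandit would do about as well, so this check is worth running before evaluating a contextual model.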
Step 2: Test the Contextual Bandit Model
Assuming your team’s model is implemented as a function contextual_bandit_predict(X), which takes customer features and predicts the best message, we can simulate running the model on this dataset.
def contextual_bandit_predict(X):
    """
    Dummy implementation for the sake of testing.
    Replace this with your team's actual model.
    """
    # For simplicity, pick the message with the highest predicted reward.
    # Note: this dummy reuses the true simulation coefficients, so it acts
    # as an oracle rather than a learned model.
    predicted_rewards = X @ true_coefficients.T
    return np.argmax(predicted_rewards, axis=1)
# Simulate running the bandit model
predicted_messages = contextual_bandit_predict(X)
# Look up the observed reply for each customer's predicted message
actual_rewards = y[np.arange(n_customers), predicted_messages]
# Evaluate the performance: average reward
average_reward = np.mean(actual_rewards)
print(f'Average Reward: {average_reward:.4f}')
Average Reward: 0.7440
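The higher the average reward, the better, but the relevant baseline here is not 0.5. A policy that picks messages uniformly at random earns the mean reply rate over all customer-message pairs (y.mean()). In this simulation that rate is well above 0.5, because the features and coefficients are non-negative and most reply probabilities therefore exceed 0.5. The model's average reward should be compared against that random baseline.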
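To put the average reward in context, the following sketch computes a few reference points on the same simulated data: a uniformly random policy, the best single fixed message, and the expected reward of always choosing the message with the highest simulated reply probability. The names `random_reward`, `best_fixed_reward`, and `oracle_expected` are introduced here for illustration, and the policy under test is the dummy model from Step 2, not a learned one.

```python
import numpy as np

# Recreate the simulated dataset with the same seed and call order as above
np.random.seed(42)
n_customers, n_messages, n_features = 1000, 5, 4

X = np.random.rand(n_customers, n_features)
true_coefficients = np.random.rand(n_messages, n_features)
noise = np.random.randn(n_customers, n_messages) * 0.1
reply_probabilities = 1 / (1 + np.exp(-(X @ true_coefficients.T + noise)))
y = np.random.binomial(1, reply_probabilities)

rows = np.arange(n_customers)

# Policy under test: the dummy model (argmax of the noise-free linear score)
model_choice = np.argmax(X @ true_coefficients.T, axis=1)
model_reward = y[rows, model_choice].mean()

# Random baseline: the expected reward of a uniformly random message
# equals the mean over all customer-message entries of y
random_reward = y.mean()

# Best fixed message: always send the message with the highest overall reply rate
best_fixed_reward = y.mean(axis=0).max()

# Upper reference: expected reward when always choosing the message
# with the highest simulated reply probability for each customer
oracle_expected = reply_probabilities.max(axis=1).mean()

print(f"Model average reward:      {model_reward:.4f}")
print(f"Random baseline:           {random_reward:.4f}")
print(f"Best fixed message:        {best_fixed_reward:.4f}")
print(f"Oracle expected reward:    {oracle_expected:.4f}")
```

A contextual model is only adding value if it beats the best fixed message, since that baseline uses no customer features at all.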