Part IX: Advanced Topics

Chapter 26: Recommender Systems

Reading Time: 3.5 hours | Prerequisites: Ch 15 (Neural Networks), Ch 12 (Dimensionality Reduction)

1. Learning Objectives

Welcome to Chapter 26! Recommender systems are the unseen engines driving the modern digital economy. By the end of this comprehensive chapter, you will be able to:

📚 Exam Tip

When studying for university exams or interviews, focus heavily on the difference between Explicit Feedback (ratings, reviews) and Implicit Feedback (clicks, watch time, purchases). Formulating matrix factorization for implicit feedback is a highly tested concept!

2. Introduction: The Era of Information Overload

Imagine walking into a library that contains every book ever written, but there are no shelves, no catalogs, and no librarians. You just see a mountain of paper. How do you find a book you'd like? This is the digital dilemma. The internet has infinite shelf space, leading to Information Overload.

A Recommender System (RecSys) is an information filtering system that predicts the "rating" or "preference" a user would give to an item. Such systems are a major driver of user retention and revenue at modern tech companies. Widely cited industry estimates attribute about 35% of Amazon's sales and over 75% of what people watch on Netflix to recommendations.

The Long Tail Phenomenon

In traditional retail, physical shelf space is expensive. Stores only stock "blockbuster" items—the head of the distribution. However, the internet allows for the stocking of millions of niche items. Recommender systems help users discover these niche items, which collectively make up the "Long Tail," often yielding more total sales than the blockbusters.

💡 Professor's Insight

Recommender systems fundamentally shift the economy from a scarcity mindset to an abundance mindset. Without ML, platforms with millions of items would collapse under their own weight. The algorithm becomes the digital curator.

3. Historical Background

The evolution of recommender systems closely mirrors the evolution of the internet itself, transitioning from simple manual curation to complex deep learning pipelines.

4. Conceptual Explanation

At a high level, recommenders predict the missing entries in a massive User-Item interaction matrix. Let's explore the core paradigms used to solve this.

4.1. Content-Based Filtering (CBF)

Content-based systems recommend items similar to those a user has liked in the past, based on item attributes. If you watch a lot of Sci-Fi movies directed by Christopher Nolan, the system will recommend other Sci-Fi movies or Nolan films.

4.2. Collaborative Filtering (CF)

CF relies entirely on past user-item interactions. It assumes that if users agreed in the past, they will agree in the future.

4.3. Matrix Factorization

A sophisticated form of CF. It decomposes the large, sparse user-item matrix into two smaller, dense matrices: a User Latent Matrix and an Item Latent Matrix. These latent factors automatically discover abstract concepts (like "action-packed" or "comedy") without explicit labels.

4.4. The Cold Start Problem

What happens when a brand new user joins, or a new movie is uploaded? CF fails because there are no interactions.

⚠️ Industry Alert

In production, nobody uses just one approach. Modern systems are Hybrids. A standard pipeline uses Collaborative Filtering for candidate generation (fetching the top 1000 items), and a complex Deep Learning model involving Content features for the final Ranking (sorting those 1000 items for the UI).

5. Mathematical Foundation

Let $R$ be the user-item interaction matrix of size $m \times n$ (where $m$ is users, $n$ is items). $r_{ui}$ is the rating given by user $u$ to item $i$.

TF-IDF (Content-Based)

To represent items as vectors based on textual content (e.g., plot summaries), we use Term Frequency-Inverse Document Frequency.

$$ \text{TF}(t, d) = \frac{\text{Count of term } t \text{ in document } d}{\text{Total terms in document } d} $$ $$ \text{IDF}(t) = \log \left( \frac{N}{\text{Number of documents containing term } t} \right) $$ $$ \text{TF-IDF}(t, d) = \text{TF}(t, d) \times \text{IDF}(t) $$
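The three formulas above can be computed directly with the standard library. The following is a minimal sketch, assuming whitespace tokenization and the unsmoothed IDF exactly as written above; the function name `tf_idf` and the sample plots are illustrative, not taken from any library. Note that scikit-learn's `TfidfVectorizer` uses a smoothed IDF and L2-normalizes each row by default, so its numbers will differ from this sketch.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document (hypothetical helper)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Illustrative plots (assumed data): "space" occurs in two of three documents
plots = ["space travel black hole", "space hacker simulation", "romance drama"]
w = tf_idf(plots)
print(w[0])
```

A term that appears in fewer documents ("travel") receives a larger weight than one that appears in more ("space"), and a term that appears in every document would receive a weight of zero.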

Similarity Metrics

Once users or items are vectors, we measure how similar they are. Cosine similarity, which compares the angle between two vectors and ignores their lengths, is the most common default.

$$ \text{Cosine Similarity}(A, B) = \frac{A \cdot B}{||A|| \times ||B||} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \sqrt{\sum_{i=1}^{n} B_i^2}} $$

Matrix Factorization Objective

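We want to find a user matrix $P$ ($m \times k$) and an item matrix $Q$ ($n \times k$) such that the product $P Q^T$ approximates the observed ratings in $R$. Writing $p_u$ for row $u$ of $P$ and $q_i$ for row $i$ of $Q$, the prediction for a single rating is $p_u \cdot q_i^T$. We minimize the squared error over the observed ratings, with L2 regularization to prevent overfitting.

$$ \min_{P, Q} \sum_{(u,i) \in K} \left[ (r_{ui} - p_u \cdot q_i^T)^2 + \lambda (||p_u||^2 + ||q_i||^2) \right] $$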

Where $K$ is the set of observed ratings, $k$ is the number of latent dimensions, and $\lambda$ is the regularization penalty.

6. Formula Derivations

How do we actually find the matrices $P$ and $Q$ from Section 5? We cannot apply a standard closed-form factorization such as SVD directly, because it requires a complete matrix and most entries of $R$ are missing (often more than 99%). Instead, we use iterative optimization algorithms that only touch the observed ratings.

6.1. Stochastic Gradient Descent (FunkSVD)

Simon Funk famously used this approach during the Netflix Prize. We calculate the prediction error for a specific rating:

$$ e_{ui} = r_{ui} - p_u \cdot q_i^T $$

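We want to minimize the regularized loss. Let $L$ be the loss contributed by this single rating. We take the partial derivative of $L$ with respect to a single user parameter $p_{uk}$ and item parameter $q_{ik}$ (here the subscript $k$ indexes one latent factor, not the total number of factors):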

$$ \frac{\partial L}{\partial p_{uk}} = -2e_{ui}q_{ik} + 2\lambda p_{uk} $$ $$ \frac{\partial L}{\partial q_{ik}} = -2e_{ui}p_{uk} + 2\lambda q_{ik} $$

We then update the parameters in the opposite direction of the gradient (where $\gamma$ is the learning rate):

$$ p_{uk} \leftarrow p_{uk} + \gamma (e_{ui}q_{ik} - \lambda p_{uk}) $$ $$ q_{ik} \leftarrow q_{ik} + \gamma (e_{ui}p_{uk} - \lambda q_{ik}) $$
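The update rules above translate almost line for line into NumPy. The following is a minimal sketch under stated assumptions: the function names `funk_svd` and `rmse`, the hyperparameter values, and the nine sample ratings are illustrative choices, not Simon Funk's original code. Both updates use the value of $p_u$ from before the step, matching the simultaneous update written above.

```python
import numpy as np

def funk_svd(ratings, n_users, n_items, k=2, gamma=0.02, lam=0.02, epochs=500, seed=0):
    """Learn P (n_users x k) and Q (n_items x k) from (user, item, rating) triples."""
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(n_users, k))
    Q = rng.normal(scale=0.1, size=(n_items, k))
    for _ in range(epochs):
        # Visit the observed ratings in a fresh random order each epoch
        for idx in rng.permutation(len(ratings)):
            u, i, r = ratings[idx]
            e = r - P[u] @ Q[i]            # prediction error e_ui
            p_old = P[u].copy()            # keep the old p_u for the q_i update
            P[u] += gamma * (e * Q[i] - lam * P[u])
            Q[i] += gamma * (e * p_old - lam * Q[i])
    return P, Q

def rmse(ratings, P, Q):
    """Root mean squared error over the given (user, item, rating) triples."""
    errors = [r - P[u] @ Q[i] for u, i, r in ratings]
    return float(np.sqrt(np.mean(np.square(errors))))

# Assumed toy data: 4 users, 5 items, 9 observed ratings (0-based indices)
observed = [(0, 0, 5.0), (0, 2, 4.0), (0, 4, 1.0), (1, 3, 2.0), (1, 4, 5.0),
            (2, 0, 3.0), (2, 1, 1.0), (3, 1, 5.0), (3, 3, 4.0)]

P0, Q0 = funk_svd(observed, 4, 5, epochs=0)   # untrained factors
P, Q = funk_svd(observed, 4, 5)
print("RMSE before training:", rmse(observed, P0, Q0))
print("RMSE after training: ", rmse(observed, P, Q))
```

This measures training error only; on real data you would hold out ratings to choose `k`, `gamma`, and `lam`.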

6.2. Alternating Least Squares (ALS)

ALS is preferred when we have implicit feedback (like clicks) and the data is massively distributed (e.g., using Apache Spark). Since both $P$ and $Q$ are unknown, the loss function is non-convex. But if we fix $Q$ as a constant, the function becomes convex quadratic with respect to $P$, and vice versa.

Step 1: Fix $Q$, take the derivative with respect to $p_u$, and set it to zero. Solve analytically for $p_u$.

$$ p_u = (Q^T Q + \lambda I)^{-1} Q^T R_u $$

Step 2: Fix $P$, solve analytically for $q_i$.

$$ q_i = (P^T P + \lambda I)^{-1} P^T R_i $$

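Here $R_u$ is the vector of user $u$'s ratings and $R_i$ is the vector of ratings received by item $i$. With explicit ratings, each solve uses only the observed entries, so $Q$ (or $P$) is restricted to the matching rows. We alternate between Step 1 and Step 2 until convergence.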
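The two alternating steps can be sketched in NumPy as follows. This is a minimal explicit-rating version, assuming a dense array `R` with a boolean `mask` marking observed entries; the names `als` and `objective`, the value of `lam`, and the sample matrix are illustrative. The confidence-weighted variant used for implicit feedback is not shown.

```python
import numpy as np

def als(R, mask, k=2, lam=0.1, iters=20, seed=0):
    """Alternate ridge-regression solves for the rows of P and Q."""
    m, n = R.shape
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(m, k))
    Q = rng.normal(scale=0.1, size=(n, k))
    eye = np.eye(k)
    for _ in range(iters):
        for u in range(m):                 # Step 1: fix Q, solve for each p_u
            Qu = Q[mask[u]]                # factors of the items user u rated
            P[u] = np.linalg.solve(Qu.T @ Qu + lam * eye, Qu.T @ R[u, mask[u]])
        for i in range(n):                 # Step 2: fix P, solve for each q_i
            Pi = P[mask[:, i]]             # factors of the users who rated item i
            Q[i] = np.linalg.solve(Pi.T @ Pi + lam * eye, Pi.T @ R[mask[:, i], i])
    return P, Q

def objective(R, mask, P, Q, lam):
    """Squared error on observed entries plus an L2 penalty on all factors."""
    err = (R - P @ Q.T)[mask]
    return float(err @ err + lam * (np.sum(P ** 2) + np.sum(Q ** 2)))

# Assumed toy data: 0 marks a missing rating
R = np.array([[5, 0, 4, 0, 1],
              [0, 0, 0, 2, 5],
              [3, 1, 0, 0, 0],
              [0, 5, 0, 4, 0]], dtype=float)
mask = R > 0

P1, Q1 = als(R, mask, iters=1)
P20, Q20 = als(R, mask, iters=20)
print("objective after 1 iteration:  ", objective(R, mask, P1, Q1, 0.1))
print("objective after 20 iterations:", objective(R, mask, P20, Q20, 0.1))
```

Because each step solves its subproblem exactly, the objective computed here never increases from one iteration to the next, which is why no learning rate is needed.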

7. Worked Numerical Examples

Let's manually compute a User-Based Collaborative Filtering prediction. Suppose we have 3 users and 3 movies. Ratings are out of 5.

User    | Movie A (Sci-Fi) | Movie B (Action) | Movie C (Romance)
--------+------------------+------------------+------------------
Alice   | 5                | 4                | ?
Bob     | 4                | 5                | 1
Charlie | 1                | 2                | 5

Goal: Predict Alice's rating for Movie C.

Step 1: Compute User Similarities (Cosine Similarity) between Alice and others based on common items (A and B).

Cosine(Alice, Bob) = $(5 \times 4 + 4 \times 5) / (\sqrt{5^2 + 4^2} \times \sqrt{4^2 + 5^2}) = 40 / (6.4 \times 6.4) \approx 0.98$

Cosine(Alice, Charlie) = $(5 \times 1 + 4 \times 2) / (\sqrt{41} \times \sqrt{5}) = 13 / (6.4 \times 2.24) \approx 0.91$

Step 2: Predict using weighted average of ratings for Movie C.

$$ \text{Prediction} = \frac{\sum (\text{Similarity} \times \text{Rating})}{\sum |\text{Similarity}|} $$ $$ \text{Pred(Alice, C)} = \frac{(0.98 \times 1) + (0.91 \times 5)}{0.98 + 0.91} = \frac{0.98 + 4.55}{1.89} = \frac{5.53}{1.89} \approx 2.93 $$

Alice is predicted to give Movie C a ~2.9 rating. She is more similar to Bob (who hated it) than to Charlie (who loved it), so the prediction falls just below the midpoint of their two ratings. Notice, however, that the two weights are nearly equal (0.98 vs 0.91) even though Charlie's tastes are the opposite of Alice's: raw cosine similarity on non-negative ratings is high for almost any pair of users, which is one reason practitioners often mean-center ratings first (see Exercise 3).
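The hand calculation above can be checked with a few lines of NumPy. This is a small sketch using the ratings from the table; the variable names are illustrative, and similarities are computed on the two movies Alice has rated (A and B), as in Step 1.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D arrays."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Ratings for Movies A, B, C; np.nan marks Alice's unknown rating
ratings = {
    "Alice":   [5, 4, np.nan],
    "Bob":     [4, 5, 1],
    "Charlie": [1, 2, 5],
}

alice_common = np.array(ratings["Alice"][:2], dtype=float)
sims = {
    name: cosine(alice_common, np.array(r[:2], dtype=float))
    for name, r in ratings.items() if name != "Alice"
}

# Similarity-weighted average of the neighbors' ratings for Movie C
numerator = sum(sims[name] * ratings[name][2] for name in sims)
denominator = sum(abs(s) for s in sims.values())
pred = numerator / denominator

print({name: round(s, 2) for name, s in sims.items()})
print(round(pred, 2))
```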

8. Visual Diagrams (ASCII Art)

Visualizing Matrix Factorization. We break a sparse matrix $R$ into dense $P$ and $Q^T$.


   [ Users × Items Matrix (R) ]          [ User Matrix (P) ]          [ Item Matrix Transposed (Q^T) ]
   (e.g., 4 users, 5 items)           (4 users, 2 latent factors)         (2 factors, 5 items)

        i1  i2  i3  i4  i5                     k1    k2                     i1   i2   i3   i4   i5
      +--------------------+                +------------+              +--------------------------+
   u1 | 5   ?   4   ?   1  |             u1 |  1.2   0.8 |           k1 |  2.1  0.4 -1.1  0.9  0.2 |
   u2 | ?   ?   ?   2   5  |      ≈      u2 | -0.5   2.1 |     ×     k2 |  0.8  1.5  2.2  0.3  1.9 |
   u3 | 3   1   ?   ?   ?  |             u3 |  0.9  -0.2 |              +--------------------------+
   u4 | ?   5   ?   4   ?  |             u4 |  1.1   1.5 |
      +--------------------+                +------------+

The prediction for user 1, item 2 (u1, i2) is computed as:
Pred(u1, i2) = (1.2 * 0.4) + (0.8 * 1.5) = 0.48 + 1.20 = 1.68

💻 Code Challenge

Try to mentally calculate the predicted rating for User 3 and Item 1 (u3, i1) using the matrices above. Answer: (0.9 * 2.1) + (-0.2 * 0.8) = 1.89 - 0.16 = 1.73.
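The same predictions can be reproduced with one matrix product. This is a small sketch that copies the numbers from the diagram; the variable names are illustrative. The factor values in the diagram are for illustration rather than a fitted model, so the reconstructed entries do not have to match the known ratings (for example, the entry for u1, i1 comes out near 3.2 while the observed rating is 5).

```python
import numpy as np

# User factors P (4 users x 2 factors) and item factors Q^T (2 factors x 5 items)
P = np.array([[ 1.2,  0.8],
              [-0.5,  2.1],
              [ 0.9, -0.2],
              [ 1.1,  1.5]])
Qt = np.array([[2.1, 0.4, -1.1, 0.9, 0.2],
               [0.8, 1.5,  2.2, 0.3, 1.9]])

R_hat = P @ Qt          # every predicted rating at once, shape (4, 5)

print(round(R_hat[0, 1], 2))   # u1, i2 -> 1.68
print(round(R_hat[2, 0], 2))   # u3, i1 -> 1.73
```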

9. Flowcharts (ASCII Art)

Modern Large-Scale Recommender Architecture (the multi-stage approach; a Two-Tower model is one common choice for the candidate generation stage):


 +------------------+
 |   User Request   | (User ID, Context, Time)
 +--------+---------+
          |
          v
 +------------------+      Millions of items in Database
 | 1. Candidate     | <--- Filter down to ~1,000 items
 |    Generation    |      (Uses Fast CF, SVD, or Two-Tower ANN)
 +--------+---------+
          |
          v
 +------------------+      ~1,000 candidate items
 | 2. Feature       | <--- Add heavy features (User Demographics,
 |    Engineering   |      Item Text, Real-time engagement stats)
 +--------+---------+
          |
          v
 +------------------+
 | 3. Scoring /     | <--- Heavy Deep Learning Model (NCF, DLRM)
 |    Ranking       |      Assigns probability/score to each item
 +--------+---------+
          |
          v
 +------------------+      Top 10-50 Items
 | 4. Re-Ranking /  | <--- Apply Business Logic, Diversity Filters,
 |    Filtering     |      Remove previously watched items
 +--------+---------+
          |
          v
 +------------------+
 | Final UI Render  |
 +------------------+
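The stages in the flowchart can be sketched end to end with random embeddings standing in for trained models. Everything here is an assumption made for illustration: the function names `generate_candidates`, `rank`, and `rerank`, the embedding sizes, and the use of cosine similarity as a placeholder for the heavy ranking model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, dim = 10_000, 16
item_emb = rng.normal(size=(n_items, dim))   # stand-in for learned item embeddings
user_emb = rng.normal(size=dim)              # stand-in for one user's embedding
watched = {3, 17, 256}                       # items the user has already seen

def generate_candidates(user_vec, items, n=1000):
    """Stage 1: cheap dot-product retrieval of the n highest-scoring items."""
    scores = items @ user_vec
    # argpartition finds the top n without fully sorting the whole catalog
    return np.argpartition(-scores, n)[:n]

def rank(candidates, user_vec, items):
    """Stages 2-3: placeholder scorer (cosine similarity) applied to candidates only."""
    cand = items[candidates]
    scores = cand @ user_vec / (np.linalg.norm(cand, axis=1) * np.linalg.norm(user_vec))
    order = np.argsort(-scores)
    return candidates[order], scores[order]

def rerank(ranked, seen, k=10):
    """Stage 4: drop already-watched items and keep the top k."""
    return [int(i) for i in ranked if int(i) not in seen][:k]

candidates = generate_candidates(user_emb, item_emb)
ranked, ranked_scores = rank(candidates, user_emb, item_emb)
final = rerank(ranked, watched)
print(final)
```

The design point is that the expensive scorer only ever sees the 1,000 candidates, never the full catalog. In production the first stage would typically use an approximate nearest-neighbor index rather than a full dot product.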

10. Python Implementation (From Scratch)

Let's build a simple Content-Based and Collaborative Filtering system using Pandas and NumPy, with scikit-learn supplying the TF-IDF vectorizer and cosine similarity.


import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import TfidfVectorizer

# --- 1. Content-Based Filtering ---
print("--- Content-Based Filtering ---")
movies = pd.DataFrame({
    'movie_id': [1, 2, 3],
    'title': ['Interstellar', 'The Matrix', 'The Notebook'],
    'plot': ['Space travel black hole', 'Hacker discovers reality simulation', 'Poor boy rich girl romance']
})

# Calculate TF-IDF
tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(movies['plot'])

# Compute Cosine Similarity between movies
cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)

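def recommend_cbf(title, cosine_sim_matrix, df):
    # Positional index of the query movie (assumes titles are unique)
    idx = int(np.flatnonzero((df['title'] == title).to_numpy())[0])
    sim_scores = list(enumerate(cosine_sim_matrix[idx]))
    # Drop the query movie explicitly rather than assuming it sorts first
    sim_scores = [(i, s) for i, s in sim_scores if i != idx]
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    # Get top recommendation
    top_idx = sim_scores[0][0]
    return df['title'].iloc[top_idx]

# Note: these three toy plots share no words, so every similarity between
# different movies is 0 and the result is simply the first remaining title.
print(f"If you liked 'Interstellar', you might like: {recommend_cbf('Interstellar', cosine_sim, movies)}")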


# --- 2. User-Based Collaborative Filtering ---
print("\n--- Collaborative Filtering ---")
ratings_dict = {
    'User': ['Alice', 'Alice', 'Bob', 'Bob', 'Charlie', 'Charlie'],
    'Movie': ['M1', 'M2', 'M1', 'M2', 'M2', 'M3'],
    'Rating': [5, 4, 4, 5, 2, 5]
}
df = pd.DataFrame(ratings_dict)
# Caveat: fillna(0) treats "not rated" as a rating of 0, which biases cosine similarity
user_item_matrix = df.pivot_table(index='User', columns='Movie', values='Rating').fillna(0)
print("User-Item Matrix:\n", user_item_matrix)

# Compute User Similarity
user_sim = cosine_similarity(user_item_matrix)
user_sim_df = pd.DataFrame(user_sim, index=user_item_matrix.index, columns=user_item_matrix.index)

print("\nUser Similarity Matrix:\n", user_sim_df)

11. TensorFlow Implementation (NCF)

Neural Collaborative Filtering replaces the inner product of Matrix Factorization with a neural architecture that can learn arbitrary non-linear interactions.


import tensorflow as tf
from tensorflow.keras.layers import Input, Embedding, Flatten, Dense, Concatenate
from tensorflow.keras.models import Model

def build_ncf_model(num_users, num_items, latent_dim=8):
    # Inputs
    user_input = Input(shape=(1,), name='user_input')
    item_input = Input(shape=(1,), name='item_input')

    # Embeddings (equivalent to Latent Factors P and Q)
    user_embedding = Embedding(num_users, latent_dim, name='user_emb')(user_input)
    item_embedding = Embedding(num_items, latent_dim, name='item_emb')(item_input)

    # Flatten embeddings
    user_vec = Flatten()(user_embedding)
    item_vec = Flatten()(item_embedding)

    # Concatenate user and item vectors
    concat = Concatenate()([user_vec, item_vec])

    # Deep Neural Network Layers
    fc1 = Dense(32, activation='relu')(concat)
    fc2 = Dense(16, activation='relu')(fc1)
    fc3 = Dense(8, activation='relu')(fc2)

    # Output layer (1 neuron predicting rating)
    output = Dense(1, activation='linear', name='rating_prediction')(fc3)

    model = Model(inputs=[user_input, item_input], outputs=output)
    model.compile(optimizer='adam', loss='mse', metrics=['mae'])
    
    return model

# Assume we have 1000 users and 5000 items
model = build_ncf_model(num_users=1000, num_items=5000)
model.summary()

# Training would look like:
# model.fit([train_user_ids, train_item_ids], train_ratings, epochs=5, batch_size=64)

12. Scikit-Learn and Surprise Pipeline

In practice, building CF algorithms from scratch is inefficient. The scikit-surprise library is a widely used Python package for classical recommender systems.


# pip install scikit-surprise
from surprise import Dataset, Reader, SVD
from surprise.model_selection import cross_validate

# 1. Load built-in MovieLens 100K dataset
data = Dataset.load_builtin('ml-100k')

# 2. Initialize the SVD algorithm (Matrix Factorization)
algo = SVD(n_factors=100, n_epochs=20, lr_all=0.005, reg_all=0.02)

# 3. Run 5-fold cross-validation and print results
results = cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

# 4. Train on full dataset and predict
trainset = data.build_full_trainset()
algo.fit(trainset)

# Predict rating for User '196' and Item '302'
pred = algo.predict('196', '302')
print(f"\nPredicted rating: {pred.est:.2f}")

13. Indian Case Studies

🇮🇳 India Spotlight: Localizing Recommendations at Scale

India presents unique challenges for recommender systems due to its vast demographic diversity, linguistic variations, and varying internet speeds.

13.1. Flipkart: Cross-Lingual Product Recommendations

Flipkart caters to millions of users in Tier 2 and Tier 3 cities who often search in vernacular languages or "Hinglish". Their recommendation engine relies heavily on Knowledge Graphs and Multilingual Embeddings (like mBERT). If a user searches for "jute bags", the system maps it to "bori" or "thaila" in local contexts. Moreover, Flipkart adjusts recommendations based on the user's phone model and network speed, suggesting lighter apps or fewer images for low-bandwidth users.

13.2. Hotstar: Handling IPL Traffic Spikes

During the Indian Premier League (IPL), Disney+ Hotstar experiences unprecedented concurrency (often over 25 million simultaneous viewers). Their recommendation system for VOD (Video on Demand) must gracefully degrade. They pre-compute a massive set of item-item similarities using ALS (Alternating Least Squares) offline and serve these pre-computed recommendations during high-traffic windows via fast Redis caches, rather than evaluating deep neural networks in real-time.

13.3. Spotify India: Hyper-Local Music Discovery

When Spotify entered India, they had to tackle the cold start problem for regional music (Punjabi, Tamil, Telugu). They created hybrid models combining acoustic features of the songs (Content-Based) with the listening habits of early adopters (Collaborative). Their "Punjabi 101" and "Bollywood Mush" playlists are curated using a mix of editorial insight and heavy algorithmic collaborative filtering.

14. Global Case Studies

14.1. The Netflix Prize

In 2006, Netflix released about 100 million anonymized movie ratings and offered a $1 million prize to anyone who could improve the RMSE of their algorithm (Cinematch) by 10%. The competition ran for 3 years. The winning team, BellKor's Pragmatic Chaos, blended a very large ensemble of models; the often-quoted figure of 107 models comes from BellKor's 2007 Progress Prize entry, and the final solution combined considerably more. One key ingredient was the inclusion of Temporal Dynamics: recognizing that user ratings shift over time (e.g., someone's taste in 2005 vs 2009) and that the baseline rating of a movie can change depending on when it was rated.

14.2. Amazon: Item-to-Item Collaborative Filtering

Amazon realized early on that computing user-user similarities on a matrix of millions of users was computationally infeasible in real time. Instead, they pre-computed item-item similarity offline. Because items (a toaster) don't change their characteristics rapidly, this matrix is stable. When you view a toaster, Amazon simply looks up the pre-computed row for that toaster and recommends the top items. The online lookup cost depends only on how many items the user has viewed or purchased, not on the number of users or the size of the catalog, and the approach has been a backbone of modern e-commerce.
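The offline/online split described above can be sketched with a tiny purchase matrix. This is a minimal illustration, not Amazon's implementation: the function name `build_item_neighbors`, the 4x4 matrix, and the choice of cosine similarity between item columns are assumptions.

```python
import numpy as np

def build_item_neighbors(R, top_n=2):
    """Offline step: cosine similarity between item columns, keep top_n neighbors per item."""
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0                   # avoid dividing by zero for unseen items
    sim = (R.T @ R) / np.outer(norms, norms)
    np.fill_diagonal(sim, -np.inf)            # an item is never its own neighbor
    return np.argsort(-sim, axis=1)[:, :top_n]

# Assumed toy purchase matrix: rows are users, columns are items (1 = bought)
R = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 0, 1, 1],
              [0, 1, 1, 0]], dtype=float)

neighbors = build_item_neighbors(R)   # computed once, offline

# Online step: a single row lookup for the item being viewed
viewed_item = 0
print(neighbors[viewed_item])         # items most often co-purchased with item 0
```

Serving a request touches one precomputed row, so the online cost does not grow with the number of users; the expensive similarity computation is refreshed offline on whatever schedule the catalog requires.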

14.3. YouTube: Deep Neural Networks for Recommendations (2016)

Roughly 500 hours of video are uploaded to YouTube every minute. YouTube's 2016 paper described a Two-Stage Recommender Pipeline. The first stage (Candidate Generation) takes the user's history and context and, treating recommendation as an extreme multi-class classification problem, narrows a corpus of millions of videos down to a few hundred candidates very quickly. The second stage (Ranking) uses a heavier Deep Neural Network with rich features (time since last watch, language, demographic) to assign a score and rank the final few dozen videos shown to the user.

15. Startup Applications

Many modern startups pivot their entire business model around superior recommendation engines.

16. Government Applications

Recommender systems are increasingly used in e-governance to improve citizen engagement and resource allocation.

17. Industry Applications Matrix

Here is a summary of how different industries leverage RecSys:

Industry                         | Primary Algorithm Used                     | Key Metric / Goal
---------------------------------+--------------------------------------------+-------------------------------------------
E-Commerce (Amazon)              | Item-Item CF, Association Rules            | Conversion Rate, Average Order Value (AOV)
Streaming (Netflix, Spotify)     | Matrix Factorization, Deep Learning        | Watch Time, Monthly Active Users (MAU)
Social Media (Instagram, TikTok) | Session-Based RNNs, Reinforcement Learning | Session Length, Engagement (Likes/Shares)
Job Portals (LinkedIn, Naukri)   | Content-Based, Knowledge Graphs            | Click-Through Rate (CTR) on Job Posts

18. Mini Projects

🚀 Career Path: Portfolio Builders

To land a role as a Machine Learning Engineer specializing in personalization, implement these projects and host them on Streamlit.

Project 1: The Classic Movie Recommender

Goal: Build a web app that takes a user's favorite movies and returns 5 recommendations.

Project 2: E-commerce "Frequently Bought Together" Engine

Goal: Mine transaction logs to find item associations.

Project 3: Session-Based News Recommender

Goal: Recommend the next news article a user will click in their current session, without knowing their long-term history.

19. Exercises

Test your practical understanding. Try to solve these on paper or in a Jupyter Notebook.

  1. Compute the Cosine Similarity between Item A [1, 0, 1, 1] and Item B [0, 1, 1, 1].
  2. Write a Python function to compute the Pearson Correlation Coefficient between two arrays.
  3. Explain why Pearson Correlation is sometimes preferred over Cosine Similarity for user ratings. (Hint: Mean centering).
  4. Given a 5x5 rating matrix with 10 known ratings, calculate the sparsity of the matrix.
  5. Perform 1 iteration of Alternating Least Squares (ALS) by hand for a 2x2 matrix.
  6. Implement the TF-IDF formula from scratch without using Scikit-Learn.
  7. How does adding an L2 regularization term to FunkSVD prevent overfitting?
  8. Design a database schema for an e-commerce catalog to support fast Content-Based Filtering.
  9. What is the Cold Start Problem, and name two ways to overcome it for a new user.
  10. What is the Cold Start Problem for a new item? How does Multi-Armed Bandit help?
  11. Implement a basic $\epsilon$-greedy algorithm for recommending 5 new articles.
  12. What is the difference between Explicit Feedback and Implicit Feedback? Give 3 examples of each.
  13. Why is Root Mean Squared Error (RMSE) misleading when evaluating implicit feedback models?
  14. Define Precision@K and Recall@K. Calculate them for a system that recommended 10 items, 3 of which the user actually clicked.
  15. Explain NDCG (Normalized Discounted Cumulative Gain). Why is order important?
  16. If a recommender only suggests popular items, what problem does it create in the ecosystem?
  17. Design an A/B testing framework to compare two different recommendation models on a website.
  18. What is a Two-Tower Neural Network architecture? Why is it efficient for Candidate Generation?
  19. Explain the concept of 'Filter Bubble' in social media feeds.
  20. How can Knowledge Graphs enhance recommendations over standard Collaborative Filtering?

20. Multiple Choice Questions

1. Which algorithm is best suited when you only have implicit feedback (clicks, views) and large, sparse datasets?

  • A) K-Nearest Neighbors
  • B) Alternating Least Squares (ALS)
  • C) Pearson Correlation
  • D) TF-IDF
Correct Answer: B. ALS, in its confidence-weighted implicit-feedback formulation, handles this setting efficiently and parallelizes well in distributed environments like Spark.

2. In Content-Based Filtering, what does IDF stand for?

  • A) Inverse Document Frequency
  • B) Internal Data Format
  • C) Item-Document Frequency
  • D) Inverse Distribution Factor
Correct Answer: A. Inverse Document Frequency penalizes words that appear in too many documents.

3. The Netflix Prize famously utilized which specific variation of Matrix Factorization?

  • A) Principal Component Analysis
  • B) Singular Value Decomposition (FunkSVD)
  • C) Non-Negative Matrix Factorization
  • D) Independent Component Analysis
Correct Answer: B. FunkSVD uses stochastic gradient descent to approximate the matrix ignoring missing values.

4. Which metric heavily penalizes a system if a highly relevant item is placed at rank 10 instead of rank 1?

  • A) Precision@K
  • B) RMSE
  • C) NDCG
  • D) Recall
Correct Answer: C. NDCG (Normalized Discounted Cumulative Gain) uses a logarithmic discount factor based on the rank position.

5. What is the primary disadvantage of User-Based Collaborative Filtering compared to Item-Based CF in large e-commerce sites?

  • A) It doesn't use ratings.
  • B) Users change preferences faster than items change features, making the matrix unstable.
  • C) It requires deep learning.
  • D) It cannot recommend new items.
Correct Answer: B. Item-item matrices are much more stable and can be pre-computed offline.

6. A user just signed up and hasn't rated anything. This is known as:

  • A) The Sparse Matrix Problem
  • B) The Filter Bubble
  • C) The Cold Start Problem
  • D) The Long Tail
Correct Answer: C. The Cold Start Problem.

7. In Neural Collaborative Filtering (NCF), what replaces the standard dot product used in Matrix Factorization?

  • A) Convolutional Layers
  • B) A Multi-Layer Perceptron (MLP)
  • C) Recurrent Neural Networks
  • D) TF-IDF Vectors
Correct Answer: B. An MLP is used to learn arbitrary non-linear interactions between user and item embeddings.

8. What is "Serendipity" in the context of recommender systems?

  • A) Recommending the most popular items.
  • B) Recommending items exactly similar to past history.
  • C) The ability of the system to recommend surprising, yet appealing items.
  • D) The speed of the recommendation algorithm.
Correct Answer: C. Serendipity helps break the filter bubble and improves long-term user satisfaction.

9. In a Two-Stage architecture (like YouTube's), what is the goal of the first stage?

  • A) To perfectly rank 10 items.
  • B) To quickly reduce the corpus from millions to a few hundred candidates.
  • C) To extract image features from videos.
  • D) To compute the final NDCG score.
Correct Answer: B. Candidate Generation focuses on high recall and extremely fast inference.

10. Which technique is used to evaluate a recommender system's impact on actual business metrics (like revenue) in production?

  • A) Cross-validation
  • B) RMSE calculation
  • C) A/B Testing
  • D) Leave-one-out Evaluation
Correct Answer: C. A/B testing randomly routes users to different models to measure real-world performance.
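The ranking metrics that appear in Question 4 and in Exercises 14 and 15 can be written in a few lines. This is a minimal sketch assuming binary relevance (an item is either relevant or not); the function names and the sample item IDs are illustrative.

```python
import math

def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that are relevant."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k

def recall_at_k(recommended, relevant, k):
    """Fraction of all relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)

def ndcg_at_k(recommended, relevant, k):
    """Binary-relevance NDCG: a hit at 0-based rank r is discounted by log2(r + 2)."""
    dcg = sum(1 / math.log2(rank + 2)
              for rank, item in enumerate(recommended[:k]) if item in relevant)
    ideal = sum(1 / math.log2(rank + 2) for rank in range(min(k, len(relevant))))
    return dcg / ideal if ideal > 0 else 0.0

# 10 recommendations, 3 of which the user clicked (the setup in Exercise 14)
recs = list("abcdefghij")
clicked = {"a", "d", "j"}
print(precision_at_k(recs, clicked, 10))   # 0.3
print(recall_at_k(recs, clicked, 10))      # 1.0

# The same single relevant item placed at rank 1 versus rank 10
others = list("abcdefghi")
print(ndcg_at_k(["x"] + others, {"x"}, 10))   # 1.0
print(ndcg_at_k(others + ["x"], {"x"}, 10))   # about 0.289
```

Precision@K is identical for the last two lists (one hit in ten), while NDCG drops from 1.0 to about 0.29, which is the order sensitivity Question 4 refers to.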

21. Interview Questions

Top product companies (Amazon, Meta, Netflix) frequently ask these conceptual and architectural questions.

  1. Design Amazon's recommender system: Walk through candidate generation, ranking, and the use of item-item collaborative filtering.
  2. Handling implicit feedback: If a user watches 5 minutes of a 2-hour movie, is that a positive or negative signal? How do you model this mathematically?
  3. Cold Start Mitigation: You are launching a brand new music app. You have 0 users. How do you generate the very first recommendations?
  4. Matrix Factorization vs Deep Learning: When would you choose standard ALS over Neural Collaborative Filtering? Explain the trade-offs in latency and accuracy.
  5. Diversity vs Relevance: How do you modify an algorithm to ensure that the top 5 recommended news articles are not all about the exact same topic?
  6. Metrics interpretation: Your offline NDCG went up by 5%, but in online A/B testing, the click-through rate dropped. What could have caused this?
  7. Real-time updates: How do you update user embeddings in real-time as they are actively clicking items in their current session?
  8. Explain the Math: Write down the loss function for Matrix Factorization with L2 regularization and derive the gradient update rule on the whiteboard.
  9. Scale: How do you compute the nearest neighbors for a user when there are 100 million items? (Hint: Approximate Nearest Neighbors, FAISS, ScaNN).
  10. Bias and Fairness: How do you ensure your job recommendation algorithm doesn't inadvertently discriminate based on gender or geography?

22. Research Problems

For those looking to pursue a Master's or PhD in Recommender Systems, here are some cutting-edge open problems:

23. Key Takeaways

24. References & Further Reading

Appendix A: Complete Hybrid Recommender Source Code

For students who want a more complete starting point, here is an extended Python implementation of a Hybrid Recommender System combining Content-Based and Collaborative Filtering using object-oriented principles. It is meant to serve as a teaching reference; a production system would also need evaluation, persistence, and approximate nearest-neighbor search.


# ==========================================
# HYBRID RECOMMENDER SYSTEM PIPELINE
# ==========================================
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class HybridRecommender:
    """
    A robust Hybrid Recommender combining Item-Item Collaborative Filtering
    with Content-Based Filtering using TF-IDF on item metadata.
    """
    def __init__(self, cf_weight=0.7, cb_weight=0.3):
        self.cf_weight = cf_weight
        self.cb_weight = cb_weight
        self.item_factors = None
        self.user_factors = None
        self.tfidf_matrix = None
        self.cosine_sim = None
        self.model_knn = NearestNeighbors(metric='cosine', algorithm='brute', n_neighbors=20, n_jobs=-1)
        self.item_mapper = {}
        self.item_inv_mapper = {}
        self.user_mapper = {}
        self.user_inv_mapper = {}
        
    def fit_collaborative(self, ratings_df, user_col='userId', item_col='movieId', rating_col='rating'):
        """ Fits the collaborative filtering model using k-NN on the sparse user-item matrix. """
        logger.info("Fitting Collaborative Filtering model...")
        
        # Mapping IDs to continuous indices
        unique_users = ratings_df[user_col].unique()
        unique_items = ratings_df[item_col].unique()
        
        self.user_mapper = {user_id: i for i, user_id in enumerate(unique_users)}
        self.item_mapper = {item_id: i for i, item_id in enumerate(unique_items)}
        self.user_inv_mapper = {i: user_id for i, user_id in enumerate(unique_users)}
        self.item_inv_mapper = {i: item_id for i, item_id in enumerate(unique_items)}
        
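        # csr_matrix sums duplicate (row, col) entries, so average repeated
        # (user, item) ratings first to keep values on the original rating scale
        ratings_df = ratings_df.groupby([user_col, item_col], as_index=False)[rating_col].mean()

        user_indices = [self.user_mapper[i] for i in ratings_df[user_col]]
        item_indices = [self.item_mapper[i] for i in ratings_df[item_col]]
        
        self.user_item_matrix = csr_matrix((ratings_df[rating_col].to_numpy(), (user_indices, item_indices)),
                                           shape=(len(unique_users), len(unique_items)))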
        
        self.model_knn.fit(self.user_item_matrix.T)
        logger.info("Collaborative Filtering model fitted successfully.")

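    def fit_content_based(self, items_df, item_col='movieId', text_col='description'):
        """ Fits the content-based model using TF-IDF on item descriptions. """
        logger.info("Fitting Content-Based Filtering model...")
        tfidf = TfidfVectorizer(stop_words='english', max_features=5000)
        
        # Work on a copy so the caller's DataFrame is not modified
        items_df = items_df.copy()
        items_df['mapped_id'] = items_df[item_col].map(self.item_mapper)
        items_df = items_df.dropna(subset=['mapped_id']).drop_duplicates('mapped_id')
        
        # Row i of the TF-IDF matrix must describe the item with CF index i, because
        # get_cb_recommendations indexes cosine_sim with item_mapper values. Items
        # without a description get an empty string (zero similarity to everything).
        texts = pd.Series(items_df[text_col].to_numpy(),
                          index=items_df['mapped_id'].astype(int).to_numpy())
        texts = texts.reindex(range(len(self.item_mapper)), fill_value='').fillna('')
        
        self.tfidf_matrix = tfidf.fit_transform(texts)
        # TF-IDF rows are L2-normalized, so the linear kernel equals cosine similarity
        self.cosine_sim = linear_kernel(self.tfidf_matrix, self.tfidf_matrix)
        logger.info("Content-Based Filtering model fitted successfully.")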
        
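    def get_cf_recommendations(self, item_id, n_recommendations=10):
        if item_id not in self.item_mapper:
            return {}
        idx = self.item_mapper[item_id]
        item_user = self.user_item_matrix.T.tocsr()
        # Ask for one extra neighbor because the query item is normally returned
        # as its own nearest neighbor; never ask for more items than exist
        k = min(n_recommendations + 1, item_user.shape[0])
        distances, indices = self.model_knn.kneighbors(item_user[idx], n_neighbors=k)
        
        # Drop the query item explicitly instead of assuming it is ranked first
        neighbors = [(i, d) for i, d in zip(indices.ravel().tolist(), distances.ravel().tolist())
                     if i != idx][:n_recommendations]
        
        # Convert cosine distances to similarity scores (1 - distance)
        return {self.item_inv_mapper[i]: (1 - d) for i, d in neighbors}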
        
    def get_cb_recommendations(self, item_id, n_recommendations=10):
        if item_id not in self.item_mapper:
            return {}
        idx = self.item_mapper[item_id]
        sim_scores = list(enumerate(self.cosine_sim[idx]))
        sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
        sim_scores = sim_scores[1:n_recommendations+1]
        
        return {self.item_inv_mapper[i]: score for i, score in sim_scores}

    def recommend(self, item_id, n_recommendations=10):
        logger.info(f"Generating hybrid recommendations for item {item_id}")
        cf_preds = self.get_cf_recommendations(item_id, n_recommendations * 2)
        cb_preds = self.get_cb_recommendations(item_id, n_recommendations * 2)
        
        hybrid_scores = {}
        all_items = set(cf_preds.keys()).union(set(cb_preds.keys()))
        
        for item in all_items:
            cf_score = cf_preds.get(item, 0.0)
            cb_score = cb_preds.get(item, 0.0)
            hybrid_scores[item] = (cf_score * self.cf_weight) + (cb_score * self.cb_weight)
            
        sorted_hybrid = sorted(hybrid_scores.items(), key=lambda x: x[1], reverse=True)
        return sorted_hybrid[:n_recommendations]

# Example Usage Block
if __name__ == "__main__":
    # Dummy data generation for testing the HybridRecommender
    np.random.seed(42)
    users = np.random.randint(1, 1000, size=5000)
    items = np.random.randint(1, 500, size=5000)
    ratings = np.random.randint(1, 6, size=5000)
    
    ratings_df = pd.DataFrame({'userId': users, 'movieId': items, 'rating': ratings})
    
    # Generate dummy item metadata
    unique_items_df = pd.DataFrame({'movieId': np.unique(items)})
    vocab = ["action", "romance", "space", "alien", "comedy", "drama", "thriller", "heist", "magic", "historical"]
    descriptions = [" ".join(np.random.choice(vocab, size=5)) for _ in range(len(unique_items_df))]
    unique_items_df['description'] = descriptions
    
    recommender = HybridRecommender(cf_weight=0.6, cb_weight=0.4)
    recommender.fit_collaborative(ratings_df)
    recommender.fit_content_based(unique_items_df)
    
    sample_item = unique_items_df['movieId'].iloc[0]
    recs = recommender.recommend(sample_item, n_recommendations=5)
    print(f"Top 5 Hybrid Recommendations for item {sample_item}:")
    for item, score in recs:
        print(f"Item ID: {item}, Score: {score:.4f}")
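
The weighted blend inside `recommend` is easy to verify by hand on a toy case. The sketch below is a standalone illustration, not part of the `HybridRecommender` class: the helper name `blend_scores` and the toy item IDs and scores are assumptions made up for this example. It shows that an item returned by only one model receives 0.0 from the other, so items that both models agree on tend to rise to the top.

```python
def blend_scores(cf_preds, cb_preds, cf_weight=0.6, cb_weight=0.4):
    """Weighted sum of two score dicts; a missing score counts as 0.0."""
    items = set(cf_preds) | set(cb_preds)
    blended = {
        item: cf_weight * cf_preds.get(item, 0.0) + cb_weight * cb_preds.get(item, 0.0)
        for item in items
    }
    # Highest blended score first
    return sorted(blended.items(), key=lambda kv: kv[1], reverse=True)


# Toy scores (hypothetical): "B" is the only item both models return
cf = {"A": 0.9, "B": 0.5}
cb = {"B": 0.8, "C": 0.7}

ranked = blend_scores(cf, cb)
for item, score in ranked:
    print(item, round(score, 2))
```

With these toy numbers, B scores 0.6 × 0.5 + 0.4 × 0.8 = 0.62, A scores 0.6 × 0.9 = 0.54, and C scores 0.4 × 0.7 = 0.28, so B ranks first even though neither model rated it highest on its own.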

Appendix B: Sample Clickstream Dataset (JSON)

Below is a simulated dataset of 203 user interaction logs. It represents the kind of raw implicit feedback that a production recommender system typically ingests through a streaming platform such as Apache Kafka. You can copy this data to test your algorithms.


[
  {"user_id": "U721", "session_id": "S1001", "item_id": "I042", "timestamp": "2023-10-01T08:12:34Z", "event_type": "view", "duration_sec": 45},
  {"user_id": "U721", "session_id": "S1001", "item_id": "I089", "timestamp": "2023-10-01T08:14:02Z", "event_type": "click", "duration_sec": 120},
  {"user_id": "U314", "session_id": "S1002", "item_id": "I042", "timestamp": "2023-10-01T08:15:10Z", "event_type": "purchase", "duration_sec": 300},
  {"user_id": "U089", "session_id": "S1003", "item_id": "I112", "timestamp": "2023-10-01T08:16:05Z", "event_type": "view", "duration_sec": 12},
  {"user_id": "U555", "session_id": "S1004", "item_id": "I999", "timestamp": "2023-10-01T08:20:00Z", "event_type": "add_to_cart", "duration_sec": 40},
  {"user_id": "U721", "session_id": "S1001", "item_id": "I042", "timestamp": "2023-10-01T08:21:34Z", "event_type": "view", "duration_sec": 145},
  {"user_id": "U721", "session_id": "S1001", "item_id": "I089", "timestamp": "2023-10-01T08:24:02Z", "event_type": "click", "duration_sec": 20},
  {"user_id": "U314", "session_id": "S1002", "item_id": "I042", "timestamp": "2023-10-01T08:25:10Z", "event_type": "purchase", "duration_sec": 350},
  {"user_id": "U089", "session_id": "S1003", "item_id": "I112", "timestamp": "2023-10-01T08:26:05Z", "event_type": "view", "duration_sec": 10},
  {"user_id": "U555", "session_id": "S1004", "item_id": "I999", "timestamp": "2023-10-01T08:30:00Z", "event_type": "add_to_cart", "duration_sec": 60},
  {"user_id": "U123", "session_id": "S1005", "item_id": "I101", "timestamp": "2023-10-01T08:31:00Z", "event_type": "view", "duration_sec": 5},
  {"user_id": "U123", "session_id": "S1005", "item_id": "I102", "timestamp": "2023-10-01T08:32:00Z", "event_type": "view", "duration_sec": 8},
  {"user_id": "U123", "session_id": "S1005", "item_id": "I103", "timestamp": "2023-10-01T08:35:00Z", "event_type": "click", "duration_sec": 45},
  {"user_id": "U444", "session_id": "S1006", "item_id": "I201", "timestamp": "2023-10-01T08:40:00Z", "event_type": "purchase", "duration_sec": 120},
  {"user_id": "U444", "session_id": "S1006", "item_id": "I202", "timestamp": "2023-10-01T08:42:00Z", "event_type": "view", "duration_sec": 15},
  {"user_id": "U999", "session_id": "S1007", "item_id": "I301", "timestamp": "2023-10-01T08:45:00Z", "event_type": "add_to_cart", "duration_sec": 90},
  {"user_id": "U999", "session_id": "S1007", "item_id": "I302", "timestamp": "2023-10-01T08:48:00Z", "event_type": "view", "duration_sec": 22},
  {"user_id": "U888", "session_id": "S1008", "item_id": "I401", "timestamp": "2023-10-01T08:50:00Z", "event_type": "click", "duration_sec": 33},
  {"user_id": "U888", "session_id": "S1008", "item_id": "I402", "timestamp": "2023-10-01T08:55:00Z", "event_type": "purchase", "duration_sec": 210},
  {"user_id": "U777", "session_id": "S1009", "item_id": "I501", "timestamp": "2023-10-01T09:00:00Z", "event_type": "view", "duration_sec": 7},
  {"user_id": "U777", "session_id": "S1009", "item_id": "I502", "timestamp": "2023-10-01T09:05:00Z", "event_type": "add_to_cart", "duration_sec": 55},
  {"user_id": "U666", "session_id": "S1010", "item_id": "I601", "timestamp": "2023-10-01T09:10:00Z", "event_type": "click", "duration_sec": 110},
  {"user_id": "U666", "session_id": "S1010", "item_id": "I602", "timestamp": "2023-10-01T09:15:00Z", "event_type": "view", "duration_sec": 18},
  {"user_id": "U555", "session_id": "S1011", "item_id": "I701", "timestamp": "2023-10-01T09:20:00Z", "event_type": "purchase", "duration_sec": 400},
  {"user_id": "U555", "session_id": "S1011", "item_id": "I702", "timestamp": "2023-10-01T09:25:00Z", "event_type": "view", "duration_sec": 25},
  {"user_id": "U444", "session_id": "S1012", "item_id": "I801", "timestamp": "2023-10-01T09:30:00Z", "event_type": "add_to_cart", "duration_sec": 70},
  {"user_id": "U444", "session_id": "S1012", "item_id": "I802", "timestamp": "2023-10-01T09:35:00Z", "event_type": "view", "duration_sec": 12},
  {"user_id": "U333", "session_id": "S1013", "item_id": "I901", "timestamp": "2023-10-01T09:40:00Z", "event_type": "click", "duration_sec": 65},
  {"user_id": "U333", "session_id": "S1013", "item_id": "I902", "timestamp": "2023-10-01T09:45:00Z", "event_type": "purchase", "duration_sec": 280},
  {"user_id": "U222", "session_id": "S1014", "item_id": "I011", "timestamp": "2023-10-01T09:50:00Z", "event_type": "view", "duration_sec": 9},
  {"user_id": "U222", "session_id": "S1014", "item_id": "I012", "timestamp": "2023-10-01T09:55:00Z", "event_type": "add_to_cart", "duration_sec": 45},
  {"user_id": "U111", "session_id": "S1015", "item_id": "I021", "timestamp": "2023-10-01T10:00:00Z", "event_type": "click", "duration_sec": 130},
  {"user_id": "U111", "session_id": "S1015", "item_id": "I022", "timestamp": "2023-10-01T10:05:00Z", "event_type": "view", "duration_sec": 20},
  {"user_id": "U000", "session_id": "S1016", "item_id": "I031", "timestamp": "2023-10-01T10:10:00Z", "event_type": "purchase", "duration_sec": 500},
  {"user_id": "U000", "session_id": "S1016", "item_id": "I032", "timestamp": "2023-10-01T10:15:00Z", "event_type": "view", "duration_sec": 30},
  {"user_id": "U123", "session_id": "S1017", "item_id": "I041", "timestamp": "2023-10-01T10:20:00Z", "event_type": "add_to_cart", "duration_sec": 85},
  {"user_id": "U123", "session_id": "S1017", "item_id": "I042", "timestamp": "2023-10-01T10:25:00Z", "event_type": "view", "duration_sec": 14},
  {"user_id": "U234", "session_id": "S1018", "item_id": "I051", "timestamp": "2023-10-01T10:30:00Z", "event_type": "click", "duration_sec": 75},
  {"user_id": "U234", "session_id": "S1018", "item_id": "I052", "timestamp": "2023-10-01T10:35:00Z", "event_type": "purchase", "duration_sec": 320},
  {"user_id": "U345", "session_id": "S1019", "item_id": "I061", "timestamp": "2023-10-01T10:40:00Z", "event_type": "view", "duration_sec": 11},
  {"user_id": "U345", "session_id": "S1019", "item_id": "I062", "timestamp": "2023-10-01T10:45:00Z", "event_type": "add_to_cart", "duration_sec": 50},
  {"user_id": "U456", "session_id": "S1020", "item_id": "I071", "timestamp": "2023-10-01T10:50:00Z", "event_type": "click", "duration_sec": 140},
  {"user_id": "U456", "session_id": "S1020", "item_id": "I072", "timestamp": "2023-10-01T10:55:00Z", "event_type": "view", "duration_sec": 22},
  {"user_id": "U567", "session_id": "S1021", "item_id": "I081", "timestamp": "2023-10-01T11:00:00Z", "event_type": "purchase", "duration_sec": 600},
  {"user_id": "U567", "session_id": "S1021", "item_id": "I082", "timestamp": "2023-10-01T11:05:00Z", "event_type": "view", "duration_sec": 35},
  {"user_id": "U678", "session_id": "S1022", "item_id": "I091", "timestamp": "2023-10-01T11:10:00Z", "event_type": "add_to_cart", "duration_sec": 95},
  {"user_id": "U678", "session_id": "S1022", "item_id": "I092", "timestamp": "2023-10-01T11:15:00Z", "event_type": "view", "duration_sec": 16},
  {"user_id": "U789", "session_id": "S1023", "item_id": "I101", "timestamp": "2023-10-01T11:20:00Z", "event_type": "click", "duration_sec": 85},
  {"user_id": "U789", "session_id": "S1023", "item_id": "I102", "timestamp": "2023-10-01T11:25:00Z", "event_type": "purchase", "duration_sec": 360},
  {"user_id": "U890", "session_id": "S1024", "item_id": "I111", "timestamp": "2023-10-01T11:30:00Z", "event_type": "view", "duration_sec": 13},
  {"user_id": "U890", "session_id": "S1024", "item_id": "I112", "timestamp": "2023-10-01T11:35:00Z", "event_type": "add_to_cart", "duration_sec": 60},
  {"user_id": "U901", "session_id": "S1025", "item_id": "I121", "timestamp": "2023-10-01T11:40:00Z", "event_type": "click", "duration_sec": 150},
  {"user_id": "U901", "session_id": "S1025", "item_id": "I122", "timestamp": "2023-10-01T11:45:00Z", "event_type": "view", "duration_sec": 24},
  {"user_id": "U012", "session_id": "S1026", "item_id": "I131", "timestamp": "2023-10-01T11:50:00Z", "event_type": "purchase", "duration_sec": 700},
  {"user_id": "U012", "session_id": "S1026", "item_id": "I132", "timestamp": "2023-10-01T11:55:00Z", "event_type": "view", "duration_sec": 40},
  {"user_id": "U123", "session_id": "S1027", "item_id": "I141", "timestamp": "2023-10-01T12:00:00Z", "event_type": "add_to_cart", "duration_sec": 105},
  {"user_id": "U123", "session_id": "S1027", "item_id": "I142", "timestamp": "2023-10-01T12:05:00Z", "event_type": "view", "duration_sec": 18},
  {"user_id": "U234", "session_id": "S1028", "item_id": "I151", "timestamp": "2023-10-01T12:10:00Z", "event_type": "click", "duration_sec": 95},
  {"user_id": "U234", "session_id": "S1028", "item_id": "I152", "timestamp": "2023-10-01T12:15:00Z", "event_type": "purchase", "duration_sec": 400},
  {"user_id": "U345", "session_id": "S1029", "item_id": "I161", "timestamp": "2023-10-01T12:20:00Z", "event_type": "view", "duration_sec": 15},
  {"user_id": "U345", "session_id": "S1029", "item_id": "I162", "timestamp": "2023-10-01T12:25:00Z", "event_type": "add_to_cart", "duration_sec": 70},
  {"user_id": "U456", "session_id": "S1030", "item_id": "I171", "timestamp": "2023-10-01T12:30:00Z", "event_type": "click", "duration_sec": 160},
  {"user_id": "U456", "session_id": "S1030", "item_id": "I172", "timestamp": "2023-10-01T12:35:00Z", "event_type": "view", "duration_sec": 26},
  {"user_id": "U567", "session_id": "S1031", "item_id": "I181", "timestamp": "2023-10-01T12:40:00Z", "event_type": "purchase", "duration_sec": 800},
  {"user_id": "U567", "session_id": "S1031", "item_id": "I182", "timestamp": "2023-10-01T12:45:00Z", "event_type": "view", "duration_sec": 45},
  {"user_id": "U678", "session_id": "S1032", "item_id": "I191", "timestamp": "2023-10-01T12:50:00Z", "event_type": "add_to_cart", "duration_sec": 115},
  {"user_id": "U678", "session_id": "S1032", "item_id": "I192", "timestamp": "2023-10-01T12:55:00Z", "event_type": "view", "duration_sec": 20},
  {"user_id": "U789", "session_id": "S1033", "item_id": "I201", "timestamp": "2023-10-01T13:00:00Z", "event_type": "click", "duration_sec": 105},
  {"user_id": "U789", "session_id": "S1033", "item_id": "I202", "timestamp": "2023-10-01T13:05:00Z", "event_type": "purchase", "duration_sec": 440},
  {"user_id": "U890", "session_id": "S1034", "item_id": "I211", "timestamp": "2023-10-01T13:10:00Z", "event_type": "view", "duration_sec": 17},
  {"user_id": "U890", "session_id": "S1034", "item_id": "I212", "timestamp": "2023-10-01T13:15:00Z", "event_type": "add_to_cart", "duration_sec": 80},
  {"user_id": "U901", "session_id": "S1035", "item_id": "I221", "timestamp": "2023-10-01T13:20:00Z", "event_type": "click", "duration_sec": 170},
  {"user_id": "U901", "session_id": "S1035", "item_id": "I222", "timestamp": "2023-10-01T13:25:00Z", "event_type": "view", "duration_sec": 28},
  {"user_id": "U012", "session_id": "S1036", "item_id": "I231", "timestamp": "2023-10-01T13:30:00Z", "event_type": "purchase", "duration_sec": 900},
  {"user_id": "U012", "session_id": "S1036", "item_id": "I232", "timestamp": "2023-10-01T13:35:00Z", "event_type": "view", "duration_sec": 50},
  {"user_id": "U123", "session_id": "S1037", "item_id": "I241", "timestamp": "2023-10-01T13:40:00Z", "event_type": "add_to_cart", "duration_sec": 125},
  {"user_id": "U123", "session_id": "S1037", "item_id": "I242", "timestamp": "2023-10-01T13:45:00Z", "event_type": "view", "duration_sec": 22},
  {"user_id": "U234", "session_id": "S1038", "item_id": "I251", "timestamp": "2023-10-01T13:50:00Z", "event_type": "click", "duration_sec": 115},
  {"user_id": "U234", "session_id": "S1038", "item_id": "I252", "timestamp": "2023-10-01T13:55:00Z", "event_type": "purchase", "duration_sec": 480},
  {"user_id": "U345", "session_id": "S1039", "item_id": "I261", "timestamp": "2023-10-01T14:00:00Z", "event_type": "view", "duration_sec": 19},
  {"user_id": "U345", "session_id": "S1039", "item_id": "I262", "timestamp": "2023-10-01T14:05:00Z", "event_type": "add_to_cart", "duration_sec": 90},
  {"user_id": "U456", "session_id": "S1040", "item_id": "I271", "timestamp": "2023-10-01T14:10:00Z", "event_type": "click", "duration_sec": 180},
  {"user_id": "U456", "session_id": "S1040", "item_id": "I272", "timestamp": "2023-10-01T14:15:00Z", "event_type": "view", "duration_sec": 30},
  {"user_id": "U567", "session_id": "S1041", "item_id": "I281", "timestamp": "2023-10-01T14:20:00Z", "event_type": "purchase", "duration_sec": 1000},
  {"user_id": "U567", "session_id": "S1041", "item_id": "I282", "timestamp": "2023-10-01T14:25:00Z", "event_type": "view", "duration_sec": 55},
  {"user_id": "U678", "session_id": "S1042", "item_id": "I291", "timestamp": "2023-10-01T14:30:00Z", "event_type": "add_to_cart", "duration_sec": 135},
  {"user_id": "U678", "session_id": "S1042", "item_id": "I292", "timestamp": "2023-10-01T14:35:00Z", "event_type": "view", "duration_sec": 24},
  {"user_id": "U789", "session_id": "S1043", "item_id": "I301", "timestamp": "2023-10-01T14:40:00Z", "event_type": "click", "duration_sec": 125},
  {"user_id": "U789", "session_id": "S1043", "item_id": "I302", "timestamp": "2023-10-01T14:45:00Z", "event_type": "purchase", "duration_sec": 520},
  {"user_id": "U890", "session_id": "S1044", "item_id": "I311", "timestamp": "2023-10-01T14:50:00Z", "event_type": "view", "duration_sec": 21},
  {"user_id": "U890", "session_id": "S1044", "item_id": "I312", "timestamp": "2023-10-01T14:55:00Z", "event_type": "add_to_cart", "duration_sec": 100},
  {"user_id": "U901", "session_id": "S1045", "item_id": "I321", "timestamp": "2023-10-01T15:00:00Z", "event_type": "click", "duration_sec": 190},
  {"user_id": "U901", "session_id": "S1045", "item_id": "I322", "timestamp": "2023-10-01T15:05:00Z", "event_type": "view", "duration_sec": 32},
  {"user_id": "U012", "session_id": "S1046", "item_id": "I331", "timestamp": "2023-10-01T15:10:00Z", "event_type": "purchase", "duration_sec": 1100},
  {"user_id": "U012", "session_id": "S1046", "item_id": "I332", "timestamp": "2023-10-01T15:15:00Z", "event_type": "view", "duration_sec": 60},
  {"user_id": "U123", "session_id": "S1047", "item_id": "I341", "timestamp": "2023-10-01T15:20:00Z", "event_type": "add_to_cart", "duration_sec": 145},
  {"user_id": "U123", "session_id": "S1047", "item_id": "I342", "timestamp": "2023-10-01T15:25:00Z", "event_type": "view", "duration_sec": 26},
  {"user_id": "U234", "session_id": "S1048", "item_id": "I351", "timestamp": "2023-10-01T15:30:00Z", "event_type": "click", "duration_sec": 135},
  {"user_id": "U234", "session_id": "S1048", "item_id": "I352", "timestamp": "2023-10-01T15:35:00Z", "event_type": "purchase", "duration_sec": 560},
  {"user_id": "U345", "session_id": "S1049", "item_id": "I361", "timestamp": "2023-10-01T15:40:00Z", "event_type": "view", "duration_sec": 23},
  {"user_id": "U345", "session_id": "S1049", "item_id": "I362", "timestamp": "2023-10-01T15:45:00Z", "event_type": "add_to_cart", "duration_sec": 110},
  {"user_id": "U456", "session_id": "S1050", "item_id": "I371", "timestamp": "2023-10-01T15:50:00Z", "event_type": "click", "duration_sec": 200},
  {"user_id": "U456", "session_id": "S1050", "item_id": "I372", "timestamp": "2023-10-01T15:55:00Z", "event_type": "view", "duration_sec": 34},
  {"user_id": "U567", "session_id": "S1051", "item_id": "I381", "timestamp": "2023-10-01T16:00:00Z", "event_type": "purchase", "duration_sec": 1200},
  {"user_id": "U567", "session_id": "S1051", "item_id": "I382", "timestamp": "2023-10-01T16:05:00Z", "event_type": "view", "duration_sec": 65},
  {"user_id": "U678", "session_id": "S1052", "item_id": "I391", "timestamp": "2023-10-01T16:10:00Z", "event_type": "add_to_cart", "duration_sec": 155},
  {"user_id": "U678", "session_id": "S1052", "item_id": "I392", "timestamp": "2023-10-01T16:15:00Z", "event_type": "view", "duration_sec": 28},
  {"user_id": "U789", "session_id": "S1053", "item_id": "I401", "timestamp": "2023-10-01T16:20:00Z", "event_type": "click", "duration_sec": 145},
  {"user_id": "U789", "session_id": "S1053", "item_id": "I402", "timestamp": "2023-10-01T16:25:00Z", "event_type": "purchase", "duration_sec": 600},
  {"user_id": "U890", "session_id": "S1054", "item_id": "I411", "timestamp": "2023-10-01T16:30:00Z", "event_type": "view", "duration_sec": 25},
  {"user_id": "U890", "session_id": "S1054", "item_id": "I412", "timestamp": "2023-10-01T16:35:00Z", "event_type": "add_to_cart", "duration_sec": 120},
  {"user_id": "U901", "session_id": "S1055", "item_id": "I421", "timestamp": "2023-10-01T16:40:00Z", "event_type": "click", "duration_sec": 210},
  {"user_id": "U901", "session_id": "S1055", "item_id": "I422", "timestamp": "2023-10-01T16:45:00Z", "event_type": "view", "duration_sec": 36},
  {"user_id": "U012", "session_id": "S1056", "item_id": "I431", "timestamp": "2023-10-01T16:50:00Z", "event_type": "purchase", "duration_sec": 1300},
  {"user_id": "U012", "session_id": "S1056", "item_id": "I432", "timestamp": "2023-10-01T16:55:00Z", "event_type": "view", "duration_sec": 70},
  {"user_id": "U123", "session_id": "S1057", "item_id": "I441", "timestamp": "2023-10-01T17:00:00Z", "event_type": "add_to_cart", "duration_sec": 165},
  {"user_id": "U123", "session_id": "S1057", "item_id": "I442", "timestamp": "2023-10-01T17:05:00Z", "event_type": "view", "duration_sec": 30},
  {"user_id": "U234", "session_id": "S1058", "item_id": "I451", "timestamp": "2023-10-01T17:10:00Z", "event_type": "click", "duration_sec": 155},
  {"user_id": "U234", "session_id": "S1058", "item_id": "I452", "timestamp": "2023-10-01T17:15:00Z", "event_type": "purchase", "duration_sec": 640},
  {"user_id": "U345", "session_id": "S1059", "item_id": "I461", "timestamp": "2023-10-01T17:20:00Z", "event_type": "view", "duration_sec": 27},
  {"user_id": "U345", "session_id": "S1059", "item_id": "I462", "timestamp": "2023-10-01T17:25:00Z", "event_type": "add_to_cart", "duration_sec": 130},
  {"user_id": "U456", "session_id": "S1060", "item_id": "I471", "timestamp": "2023-10-01T17:30:00Z", "event_type": "click", "duration_sec": 220},
  {"user_id": "U456", "session_id": "S1060", "item_id": "I472", "timestamp": "2023-10-01T17:35:00Z", "event_type": "view", "duration_sec": 38},
  {"user_id": "U567", "session_id": "S1061", "item_id": "I481", "timestamp": "2023-10-01T17:40:00Z", "event_type": "purchase", "duration_sec": 1400},
  {"user_id": "U567", "session_id": "S1061", "item_id": "I482", "timestamp": "2023-10-01T17:45:00Z", "event_type": "view", "duration_sec": 75},
  {"user_id": "U678", "session_id": "S1062", "item_id": "I491", "timestamp": "2023-10-01T17:50:00Z", "event_type": "add_to_cart", "duration_sec": 175},
  {"user_id": "U678", "session_id": "S1062", "item_id": "I492", "timestamp": "2023-10-01T17:55:00Z", "event_type": "view", "duration_sec": 32},
  {"user_id": "U789", "session_id": "S1063", "item_id": "I501", "timestamp": "2023-10-01T18:00:00Z", "event_type": "click", "duration_sec": 165},
  {"user_id": "U789", "session_id": "S1063", "item_id": "I502", "timestamp": "2023-10-01T18:05:00Z", "event_type": "purchase", "duration_sec": 680},
  {"user_id": "U890", "session_id": "S1064", "item_id": "I511", "timestamp": "2023-10-01T18:10:00Z", "event_type": "view", "duration_sec": 29},
  {"user_id": "U890", "session_id": "S1064", "item_id": "I512", "timestamp": "2023-10-01T18:15:00Z", "event_type": "add_to_cart", "duration_sec": 140},
  {"user_id": "U901", "session_id": "S1065", "item_id": "I521", "timestamp": "2023-10-01T18:20:00Z", "event_type": "click", "duration_sec": 230},
  {"user_id": "U901", "session_id": "S1065", "item_id": "I522", "timestamp": "2023-10-01T18:25:00Z", "event_type": "view", "duration_sec": 40},
  {"user_id": "U012", "session_id": "S1066", "item_id": "I531", "timestamp": "2023-10-01T18:30:00Z", "event_type": "purchase", "duration_sec": 1500},
  {"user_id": "U012", "session_id": "S1066", "item_id": "I532", "timestamp": "2023-10-01T18:35:00Z", "event_type": "view", "duration_sec": 80},
  {"user_id": "U123", "session_id": "S1067", "item_id": "I541", "timestamp": "2023-10-01T18:40:00Z", "event_type": "add_to_cart", "duration_sec": 185},
  {"user_id": "U123", "session_id": "S1067", "item_id": "I542", "timestamp": "2023-10-01T18:45:00Z", "event_type": "view", "duration_sec": 34},
  {"user_id": "U234", "session_id": "S1068", "item_id": "I551", "timestamp": "2023-10-01T18:50:00Z", "event_type": "click", "duration_sec": 175},
  {"user_id": "U234", "session_id": "S1068", "item_id": "I552", "timestamp": "2023-10-01T18:55:00Z", "event_type": "purchase", "duration_sec": 720},
  {"user_id": "U345", "session_id": "S1069", "item_id": "I561", "timestamp": "2023-10-01T19:00:00Z", "event_type": "view", "duration_sec": 31},
  {"user_id": "U345", "session_id": "S1069", "item_id": "I562", "timestamp": "2023-10-01T19:05:00Z", "event_type": "add_to_cart", "duration_sec": 150},
  {"user_id": "U456", "session_id": "S1070", "item_id": "I571", "timestamp": "2023-10-01T19:10:00Z", "event_type": "click", "duration_sec": 240},
  {"user_id": "U456", "session_id": "S1070", "item_id": "I572", "timestamp": "2023-10-01T19:15:00Z", "event_type": "view", "duration_sec": 42},
  {"user_id": "U567", "session_id": "S1071", "item_id": "I581", "timestamp": "2023-10-01T19:20:00Z", "event_type": "purchase", "duration_sec": 1600},
  {"user_id": "U567", "session_id": "S1071", "item_id": "I582", "timestamp": "2023-10-01T19:25:00Z", "event_type": "view", "duration_sec": 85},
  {"user_id": "U678", "session_id": "S1072", "item_id": "I591", "timestamp": "2023-10-01T19:30:00Z", "event_type": "add_to_cart", "duration_sec": 195},
  {"user_id": "U678", "session_id": "S1072", "item_id": "I592", "timestamp": "2023-10-01T19:35:00Z", "event_type": "view", "duration_sec": 36},
  {"user_id": "U789", "session_id": "S1073", "item_id": "I601", "timestamp": "2023-10-01T19:40:00Z", "event_type": "click", "duration_sec": 185},
  {"user_id": "U789", "session_id": "S1073", "item_id": "I602", "timestamp": "2023-10-01T19:45:00Z", "event_type": "purchase", "duration_sec": 760},
  {"user_id": "U890", "session_id": "S1074", "item_id": "I611", "timestamp": "2023-10-01T19:50:00Z", "event_type": "view", "duration_sec": 33},
  {"user_id": "U890", "session_id": "S1074", "item_id": "I612", "timestamp": "2023-10-01T19:55:00Z", "event_type": "add_to_cart", "duration_sec": 160},
  {"user_id": "U901", "session_id": "S1075", "item_id": "I621", "timestamp": "2023-10-01T20:00:00Z", "event_type": "click", "duration_sec": 250},
  {"user_id": "U901", "session_id": "S1075", "item_id": "I622", "timestamp": "2023-10-01T20:05:00Z", "event_type": "view", "duration_sec": 44},
  {"user_id": "U012", "session_id": "S1076", "item_id": "I631", "timestamp": "2023-10-01T20:10:00Z", "event_type": "purchase", "duration_sec": 1700},
  {"user_id": "U012", "session_id": "S1076", "item_id": "I632", "timestamp": "2023-10-01T20:15:00Z", "event_type": "view", "duration_sec": 90},
  {"user_id": "U123", "session_id": "S1077", "item_id": "I641", "timestamp": "2023-10-01T20:20:00Z", "event_type": "add_to_cart", "duration_sec": 205},
  {"user_id": "U123", "session_id": "S1077", "item_id": "I642", "timestamp": "2023-10-01T20:25:00Z", "event_type": "view", "duration_sec": 38},
  {"user_id": "U234", "session_id": "S1078", "item_id": "I651", "timestamp": "2023-10-01T20:30:00Z", "event_type": "click", "duration_sec": 195},
  {"user_id": "U234", "session_id": "S1078", "item_id": "I652", "timestamp": "2023-10-01T20:35:00Z", "event_type": "purchase", "duration_sec": 800},
  {"user_id": "U345", "session_id": "S1079", "item_id": "I661", "timestamp": "2023-10-01T20:40:00Z", "event_type": "view", "duration_sec": 35},
  {"user_id": "U345", "session_id": "S1079", "item_id": "I662", "timestamp": "2023-10-01T20:45:00Z", "event_type": "add_to_cart", "duration_sec": 170},
  {"user_id": "U456", "session_id": "S1080", "item_id": "I671", "timestamp": "2023-10-01T20:50:00Z", "event_type": "click", "duration_sec": 260},
  {"user_id": "U456", "session_id": "S1080", "item_id": "I672", "timestamp": "2023-10-01T20:55:00Z", "event_type": "view", "duration_sec": 46},
  {"user_id": "U567", "session_id": "S1081", "item_id": "I681", "timestamp": "2023-10-01T21:00:00Z", "event_type": "purchase", "duration_sec": 1800},
  {"user_id": "U567", "session_id": "S1081", "item_id": "I682", "timestamp": "2023-10-01T21:05:00Z", "event_type": "view", "duration_sec": 95},
  {"user_id": "U678", "session_id": "S1082", "item_id": "I691", "timestamp": "2023-10-01T21:10:00Z", "event_type": "add_to_cart", "duration_sec": 215},
  {"user_id": "U678", "session_id": "S1082", "item_id": "I692", "timestamp": "2023-10-01T21:15:00Z", "event_type": "view", "duration_sec": 40},
  {"user_id": "U789", "session_id": "S1083", "item_id": "I701", "timestamp": "2023-10-01T21:20:00Z", "event_type": "click", "duration_sec": 205},
  {"user_id": "U789", "session_id": "S1083", "item_id": "I702", "timestamp": "2023-10-01T21:25:00Z", "event_type": "purchase", "duration_sec": 840},
  {"user_id": "U890", "session_id": "S1084", "item_id": "I711", "timestamp": "2023-10-01T21:30:00Z", "event_type": "view", "duration_sec": 37},
  {"user_id": "U890", "session_id": "S1084", "item_id": "I712", "timestamp": "2023-10-01T21:35:00Z", "event_type": "add_to_cart", "duration_sec": 180},
  {"user_id": "U901", "session_id": "S1085", "item_id": "I721", "timestamp": "2023-10-01T21:40:00Z", "event_type": "click", "duration_sec": 270},
  {"user_id": "U901", "session_id": "S1085", "item_id": "I722", "timestamp": "2023-10-01T21:45:00Z", "event_type": "view", "duration_sec": 48},
  {"user_id": "U012", "session_id": "S1086", "item_id": "I731", "timestamp": "2023-10-01T21:50:00Z", "event_type": "purchase", "duration_sec": 1900},
  {"user_id": "U012", "session_id": "S1086", "item_id": "I732", "timestamp": "2023-10-01T21:55:00Z", "event_type": "view", "duration_sec": 100},
  {"user_id": "U123", "session_id": "S1087", "item_id": "I741", "timestamp": "2023-10-01T22:00:00Z", "event_type": "add_to_cart", "duration_sec": 225},
  {"user_id": "U123", "session_id": "S1087", "item_id": "I742", "timestamp": "2023-10-01T22:05:00Z", "event_type": "view", "duration_sec": 42},
  {"user_id": "U234", "session_id": "S1088", "item_id": "I751", "timestamp": "2023-10-01T22:10:00Z", "event_type": "click", "duration_sec": 215},
  {"user_id": "U234", "session_id": "S1088", "item_id": "I752", "timestamp": "2023-10-01T22:15:00Z", "event_type": "purchase", "duration_sec": 880},
  {"user_id": "U345", "session_id": "S1089", "item_id": "I761", "timestamp": "2023-10-01T22:20:00Z", "event_type": "view", "duration_sec": 39},
  {"user_id": "U345", "session_id": "S1089", "item_id": "I762", "timestamp": "2023-10-01T22:25:00Z", "event_type": "add_to_cart", "duration_sec": 190},
  {"user_id": "U456", "session_id": "S1090", "item_id": "I771", "timestamp": "2023-10-01T22:30:00Z", "event_type": "click", "duration_sec": 280},
  {"user_id": "U456", "session_id": "S1090", "item_id": "I772", "timestamp": "2023-10-01T22:35:00Z", "event_type": "view", "duration_sec": 50},
  {"user_id": "U567", "session_id": "S1091", "item_id": "I781", "timestamp": "2023-10-01T22:40:00Z", "event_type": "purchase", "duration_sec": 2000},
  {"user_id": "U567", "session_id": "S1091", "item_id": "I782", "timestamp": "2023-10-01T22:45:00Z", "event_type": "view", "duration_sec": 105},
  {"user_id": "U678", "session_id": "S1092", "item_id": "I791", "timestamp": "2023-10-01T22:50:00Z", "event_type": "add_to_cart", "duration_sec": 235},
  {"user_id": "U678", "session_id": "S1092", "item_id": "I792", "timestamp": "2023-10-01T22:55:00Z", "event_type": "view", "duration_sec": 44},
  {"user_id": "U789", "session_id": "S1093", "item_id": "I801", "timestamp": "2023-10-01T23:00:00Z", "event_type": "click", "duration_sec": 225},
  {"user_id": "U789", "session_id": "S1093", "item_id": "I802", "timestamp": "2023-10-01T23:05:00Z", "event_type": "purchase", "duration_sec": 920},
  {"user_id": "U890", "session_id": "S1094", "item_id": "I811", "timestamp": "2023-10-01T23:10:00Z", "event_type": "view", "duration_sec": 41},
  {"user_id": "U890", "session_id": "S1094", "item_id": "I812", "timestamp": "2023-10-01T23:15:00Z", "event_type": "add_to_cart", "duration_sec": 200},
  {"user_id": "U901", "session_id": "S1095", "item_id": "I821", "timestamp": "2023-10-01T23:20:00Z", "event_type": "click", "duration_sec": 290},
  {"user_id": "U901", "session_id": "S1095", "item_id": "I822", "timestamp": "2023-10-01T23:25:00Z", "event_type": "view", "duration_sec": 52},
  {"user_id": "U012", "session_id": "S1096", "item_id": "I831", "timestamp": "2023-10-01T23:30:00Z", "event_type": "purchase", "duration_sec": 2100},
  {"user_id": "U012", "session_id": "S1096", "item_id": "I832", "timestamp": "2023-10-01T23:35:00Z", "event_type": "view", "duration_sec": 110},
  {"user_id": "U123", "session_id": "S1097", "item_id": "I841", "timestamp": "2023-10-01T23:40:00Z", "event_type": "add_to_cart", "duration_sec": 245},
  {"user_id": "U123", "session_id": "S1097", "item_id": "I842", "timestamp": "2023-10-01T23:45:00Z", "event_type": "view", "duration_sec": 46},
  {"user_id": "U234", "session_id": "S1098", "item_id": "I851", "timestamp": "2023-10-01T23:50:00Z", "event_type": "click", "duration_sec": 235},
  {"user_id": "U234", "session_id": "S1098", "item_id": "I852", "timestamp": "2023-10-01T23:55:00Z", "event_type": "purchase", "duration_sec": 960},
  {"user_id": "U345", "session_id": "S1099", "item_id": "I861", "timestamp": "2023-10-02T00:00:00Z", "event_type": "view", "duration_sec": 43},
  {"user_id": "U345", "session_id": "S1099", "item_id": "I862", "timestamp": "2023-10-02T00:05:00Z", "event_type": "add_to_cart", "duration_sec": 210},
  {"user_id": "U456", "session_id": "S1100", "item_id": "I871", "timestamp": "2023-10-02T00:10:00Z", "event_type": "click", "duration_sec": 300},
  {"user_id": "U456", "session_id": "S1100", "item_id": "I872", "timestamp": "2023-10-02T00:15:00Z", "event_type": "view", "duration_sec": 54}
]
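
Before logs like these can feed a collaborative filtering model, they are usually collapsed into one implicit-feedback score per user-item pair. The following is a minimal sketch of that step using only the standard library. The event weights in `EVENT_WEIGHTS` and the helper name `build_implicit_scores` are assumptions chosen for illustration; in practice the weights would be tuned for the platform. The sample uses four records copied from the dataset above.

```python
import json
from collections import defaultdict

# Hypothetical weights: stronger purchase intent gets a larger weight
EVENT_WEIGHTS = {"view": 1.0, "click": 2.0, "add_to_cart": 3.0, "purchase": 5.0}


def build_implicit_scores(events, weights=EVENT_WEIGHTS):
    """Sum event weights for each (user_id, item_id) pair."""
    scores = defaultdict(float)
    for event in events:
        key = (event["user_id"], event["item_id"])
        # Unknown event types contribute nothing
        scores[key] += weights.get(event["event_type"], 0.0)
    return dict(scores)


sample = json.loads("""[
  {"user_id": "U721", "session_id": "S1001", "item_id": "I042", "timestamp": "2023-10-01T08:12:34Z", "event_type": "view", "duration_sec": 45},
  {"user_id": "U721", "session_id": "S1001", "item_id": "I089", "timestamp": "2023-10-01T08:14:02Z", "event_type": "click", "duration_sec": 120},
  {"user_id": "U314", "session_id": "S1002", "item_id": "I042", "timestamp": "2023-10-01T08:15:10Z", "event_type": "purchase", "duration_sec": 300},
  {"user_id": "U721", "session_id": "S1001", "item_id": "I042", "timestamp": "2023-10-01T08:21:34Z", "event_type": "view", "duration_sec": 145}
]""")

scores = build_implicit_scores(sample)
for (user, item), score in sorted(scores.items()):
    print(user, item, score)
```

The resulting dictionary maps each `(user_id, item_id)` pair to a single number, which can then be loaded into a sparse user-item matrix. Using `duration_sec` as an additional signal is a natural extension but is left out here to keep the sketch short.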
            

Appendix C: Sample Movie Metadata Dataset (JSON)

To complement the clickstream data in Appendix B, here is the simulated item metadata dataset used for Content-Based Filtering.


[
  {"item_id": "I042", "title": "The Quantum Paradox", "genres": ["Sci-Fi", "Thriller"], "release_year": 2021, "rating": 4.5},
  {"item_id": "I089", "title": "Romantic Echoes", "genres": ["Romance", "Drama"], "release_year": 2019, "rating": 3.8},
  {"item_id": "I112", "title": "Operation Alpha", "genres": ["Action", "War"], "release_year": 2020, "rating": 4.1},
  {"item_id": "I999", "title": "Comedy Central Live", "genres": ["Comedy", "Stand-Up"], "release_year": 2022, "rating": 4.8},
  {"item_id": "I101", "title": "Deep Ocean Secrets", "genres": ["Documentary", "Nature"], "release_year": 2018, "rating": 4.9},
  {"item_id": "I102", "title": "Space Colonists", "genres": ["Sci-Fi", "Adventure"], "release_year": 2023, "rating": 4.2},
  {"item_id": "I103", "title": "Historical Battles", "genres": ["History", "Documentary"], "release_year": 2015, "rating": 4.6},
  {"item_id": "I201", "title": "The Last Samurai", "genres": ["Action", "History"], "release_year": 2003, "rating": 4.7},
  {"item_id": "I202", "title": "Medieval Knights", "genres": ["Action", "Drama"], "release_year": 2010, "rating": 3.9},
  {"item_id": "I301", "title": "Future Tech Review", "genres": ["Technology", "News"], "release_year": 2024, "rating": 4.0},
  {"item_id": "I302", "title": "AI Revolution", "genres": ["Documentary", "Tech"], "release_year": 2022, "rating": 4.4},
  {"item_id": "I401", "title": "Culinary Journey", "genres": ["Cooking", "Travel"], "release_year": 2019, "rating": 4.3},
  {"item_id": "I402", "title": "Street Food Masters", "genres": ["Cooking", "Reality"], "release_year": 2021, "rating": 4.5},
  {"item_id": "I501", "title": "Guitar for Beginners", "genres": ["Education", "Music"], "release_year": 2020, "rating": 4.8},
  {"item_id": "I502", "title": "Advanced Piano", "genres": ["Education", "Music"], "release_year": 2018, "rating": 4.6},
  {"item_id": "I601", "title": "Yoga Daily", "genres": ["Health", "Fitness"], "release_year": 2022, "rating": 4.9},
  {"item_id": "I602", "title": "HIIT Workout", "genres": ["Health", "Fitness"], "release_year": 2021, "rating": 4.7},
  {"item_id": "I701", "title": "Mystery in the Alps", "genres": ["Mystery", "Thriller"], "release_year": 2017, "rating": 4.1},
  {"item_id": "I702", "title": "Detective Noir", "genres": ["Mystery", "Crime"], "release_year": 2014, "rating": 4.3},
  {"item_id": "I801", "title": "Fantasy Worlds", "genres": ["Fantasy", "Adventure"], "release_year": 2022, "rating": 4.5},
  {"item_id": "I802", "title": "Dragon Riders", "genres": ["Fantasy", "Action"], "release_year": 2019, "rating": 4.2},
  {"item_id": "I901", "title": "Horror House", "genres": ["Horror", "Thriller"], "release_year": 2018, "rating": 3.7},
  {"item_id": "I902", "title": "Vampire Diaries: Extended", "genres": ["Horror", "Romance"], "release_year": 2015, "rating": 4.0},
  {"item_id": "I011", "title": "Anime Classics", "genres": ["Anime", "Action"], "release_year": 2010, "rating": 4.8},
  {"item_id": "I012", "title": "Mecha Warriors", "genres": ["Anime", "Sci-Fi"], "release_year": 2016, "rating": 4.4},
  {"item_id": "I021", "title": "K-Drama Hits", "genres": ["Romance", "Drama"], "release_year": 2021, "rating": 4.7},
  {"item_id": "I022", "title": "Seoul Nights", "genres": ["Drama", "Thriller"], "release_year": 2022, "rating": 4.5},
  {"item_id": "I031", "title": "Bollywood Blockbuster", "genres": ["Action", "Romance"], "release_year": 2019, "rating": 4.2},
  {"item_id": "I032", "title": "Indian Indie Films", "genres": ["Drama", "Art"], "release_year": 2020, "rating": 4.6},
  {"item_id": "I041", "title": "French Cinema", "genres": ["Drama", "Romance"], "release_year": 2017, "rating": 4.3},
  {"item_id": "I051", "title": "Spanish Telenovela", "genres": ["Drama", "Soap"], "release_year": 2018, "rating": 3.9},
  {"item_id": "I052", "title": "Mexican Cartel Docs", "genres": ["Documentary", "Crime"], "release_year": 2021, "rating": 4.5},
  {"item_id": "I061", "title": "Wildlife Africa", "genres": ["Nature", "Documentary"], "release_year": 2019, "rating": 4.9},
  {"item_id": "I062", "title": "Amazon Rainforest", "genres": ["Nature", "Documentary"], "release_year": 2020, "rating": 4.8},
  {"item_id": "I071", "title": "Cars & Coffee", "genres": ["Automotive", "Reality"], "release_year": 2022, "rating": 4.1},
  {"item_id": "I072", "title": "Supercar Showdown", "genres": ["Automotive", "Action"], "release_year": 2023, "rating": 4.4},
  {"item_id": "I081", "title": "Home Improvement", "genres": ["DIY", "Reality"], "release_year": 2015, "rating": 4.0},
  {"item_id": "I082", "title": "Gardening Tips", "genres": ["DIY", "Education"], "release_year": 2018, "rating": 4.2},
  {"item_id": "I091", "title": "Pet Care 101", "genres": ["Pets", "Education"], "release_year": 2021, "rating": 4.6},
  {"item_id": "I092", "title": "Funny Dogs Compilation", "genres": ["Pets", "Comedy"], "release_year": 2022, "rating": 4.9},
  {"item_id": "I111", "title": "Stock Market Basics", "genres": ["Finance", "Education"], "release_year": 2020, "rating": 4.4},
  {"item_id": "I121", "title": "Crypto Trends", "genres": ["Finance", "News"], "release_year": 2023, "rating": 3.8},
  {"item_id": "I122", "title": "Real Estate Investing", "genres": ["Finance", "Education"], "release_year": 2021, "rating": 4.5},
  {"item_id": "I131", "title": "Learn Spanish", "genres": ["Language", "Education"], "release_year": 2019, "rating": 4.7},
  {"item_id": "I132", "title": "Learn Japanese", "genres": ["Language", "Education"], "release_year": 2020, "rating": 4.8},
  {"item_id": "I141", "title": "World Geography", "genres": ["Education", "Documentary"], "release_year": 2017, "rating": 4.3},
  {"item_id": "I142", "title": "Space Exploration", "genres": ["Science", "Documentary"], "release_year": 2022, "rating": 4.9},
  {"item_id": "I151", "title": "Physics for Kids", "genres": ["Kids", "Education"], "release_year": 2021, "rating": 4.5},
  {"item_id": "I152", "title": "Chemistry Experiments", "genres": ["Science", "Education"], "release_year": 2018, "rating": 4.6},
  {"item_id": "I161", "title": "Magic Tricks Revealed", "genres": ["Entertainment", "Reality"], "release_year": 2016, "rating": 4.1},
  {"item_id": "I162", "title": "Got Talent Highlights", "genres": ["Entertainment", "Reality"], "release_year": 2023, "rating": 4.4},
  {"item_id": "I171", "title": "Chess Masterclass", "genres": ["Gaming", "Education"], "release_year": 2020, "rating": 4.9},
  {"item_id": "I172", "title": "Esports Finals 2023", "genres": ["Gaming", "Action"], "release_year": 2023, "rating": 4.7},
  {"item_id": "I181", "title": "Speedrunning Zelda", "genres": ["Gaming", "Entertainment"], "release_year": 2022, "rating": 4.6},
  {"item_id": "I182", "title": "Minecraft Builds", "genres": ["Gaming", "Creative"], "release_year": 2021, "rating": 4.8},
  {"item_id": "I191", "title": "Digital Art Tutorial", "genres": ["Art", "Education"], "release_year": 2020, "rating": 4.7},
  {"item_id": "I192", "title": "Oil Painting Basics", "genres": ["Art", "Education"], "release_year": 2019, "rating": 4.5},
  {"item_id": "I211", "title": "Fashion Week 2022", "genres": ["Fashion", "News"], "release_year": 2022, "rating": 4.0},
  {"item_id": "I212", "title": "Makeup Trends", "genres": ["Beauty", "Fashion"], "release_year": 2023, "rating": 4.3},
  {"item_id": "I221", "title": "Skincare Routines", "genres": ["Beauty", "Health"], "release_year": 2021, "rating": 4.6},
  {"item_id": "I222", "title": "Hairstyle Hacks", "genres": ["Beauty", "Fashion"], "release_year": 2020, "rating": 4.4},
  {"item_id": "I231", "title": "Tech Gadgets 2024", "genres": ["Technology", "Review"], "release_year": 2024, "rating": 4.8},
  {"item_id": "I232", "title": "Smartphone Showdown", "genres": ["Technology", "Review"], "release_year": 2023, "rating": 4.5},
  {"item_id": "I241", "title": "Laptop Buying Guide", "genres": ["Technology", "Education"], "release_year": 2022, "rating": 4.7},
  {"item_id": "I242", "title": "PC Building Tutorial", "genres": ["Technology", "DIY"], "release_year": 2021, "rating": 4.9},
  {"item_id": "I251", "title": "Coding in Python", "genres": ["Technology", "Education"], "release_year": 2020, "rating": 4.9},
  {"item_id": "I252", "title": "Web Dev Bootcamp", "genres": ["Technology", "Education"], "release_year": 2021, "rating": 4.8},
  {"item_id": "I261", "title": "App Development", "genres": ["Technology", "Education"], "release_year": 2022, "rating": 4.7},
  {"item_id": "I262", "title": "Cloud Architecture", "genres": ["Technology", "Education"], "release_year": 2023, "rating": 4.6},
  {"item_id": "I271", "title": "Cybersecurity 101", "genres": ["Technology", "Security"], "release_year": 2021, "rating": 4.8},
  {"item_id": "I272", "title": "Ethical Hacking", "genres": ["Technology", "Security"], "release_year": 2022, "rating": 4.7},
  {"item_id": "I281", "title": "Blockchain Explained", "genres": ["Technology", "Finance"], "release_year": 2020, "rating": 4.4},
  {"item_id": "I282", "title": "Quantum Computing", "genres": ["Technology", "Science"], "release_year": 2023, "rating": 4.5},
  {"item_id": "I291", "title": "Machine Learning", "genres": ["Technology", "AI"], "release_year": 2021, "rating": 4.9},
  {"item_id": "I292", "title": "Deep Learning Models", "genres": ["Technology", "AI"], "release_year": 2022, "rating": 4.8},
  {"item_id": "I311", "title": "Data Visualization", "genres": ["Data Science", "Education"], "release_year": 2020, "rating": 4.6},
  {"item_id": "I312", "title": "Big Data Engineering", "genres": ["Data Science", "Education"], "release_year": 2021, "rating": 4.7},
  {"item_id": "I321", "title": "Statistics for DS", "genres": ["Data Science", "Math"], "release_year": 2019, "rating": 4.8},
  {"item_id": "I322", "title": "SQL Mastery", "genres": ["Data Science", "Database"], "release_year": 2020, "rating": 4.9},
  {"item_id": "I331", "title": "Agile Methodologies", "genres": ["Management", "Education"], "release_year": 2018, "rating": 4.5},
  {"item_id": "I332", "title": "Project Management", "genres": ["Management", "Education"], "release_year": 2019, "rating": 4.6},
  {"item_id": "I341", "title": "Leadership Skills", "genres": ["Business", "Education"], "release_year": 2020, "rating": 4.7},
  {"item_id": "I342", "title": "Public Speaking", "genres": ["Business", "Education"], "release_year": 2021, "rating": 4.8},
  {"item_id": "I351", "title": "Negotiation Tactics", "genres": ["Business", "Education"], "release_year": 2022, "rating": 4.6},
  {"item_id": "I352", "title": "Sales Strategies", "genres": ["Business", "Education"], "release_year": 2023, "rating": 4.5},
  {"item_id": "I361", "title": "Marketing 101", "genres": ["Marketing", "Education"], "release_year": 2020, "rating": 4.7},
  {"item_id": "I362", "title": "Digital Marketing", "genres": ["Marketing", "Education"], "release_year": 2021, "rating": 4.8},
  {"item_id": "I371", "title": "SEO Basics", "genres": ["Marketing", "Tech"], "release_year": 2019, "rating": 4.6},
  {"item_id": "I372", "title": "Social Media Ads", "genres": ["Marketing", "Tech"], "release_year": 2020, "rating": 4.5},
  {"item_id": "I381", "title": "Content Creation", "genres": ["Creative", "Marketing"], "release_year": 2021, "rating": 4.7},
  {"item_id": "I382", "title": "Video Editing", "genres": ["Creative", "Tech"], "release_year": 2022, "rating": 4.8},
  {"item_id": "I391", "title": "Photography Tips", "genres": ["Creative", "Art"], "release_year": 2020, "rating": 4.9},
  {"item_id": "I392", "title": "Lighting Mastery", "genres": ["Creative", "Art"], "release_year": 2021, "rating": 4.7},
  {"item_id": "I411", "title": "Music Production", "genres": ["Music", "Tech"], "release_year": 2019, "rating": 4.8},
  {"item_id": "I412", "title": "Mixing & Mastering", "genres": ["Music", "Tech"], "release_year": 2020, "rating": 4.7},
  {"item_id": "I421", "title": "Singing Techniques", "genres": ["Music", "Art"], "release_year": 2021, "rating": 4.6},
  {"item_id": "I422", "title": "Songwriting", "genres": ["Music", "Creative"], "release_year": 2022, "rating": 4.8},
  {"item_id": "I431", "title": "Acting Basics", "genres": ["Acting", "Art"], "release_year": 2018, "rating": 4.5},
  {"item_id": "I432", "title": "Improv Comedy", "genres": ["Acting", "Comedy"], "release_year": 2019, "rating": 4.7},
  {"item_id": "I441", "title": "Standup Specials", "genres": ["Comedy", "Entertainment"], "release_year": 2020, "rating": 4.9},
  {"item_id": "I442", "title": "Sketch Shows", "genres": ["Comedy", "Entertainment"], "release_year": 2021, "rating": 4.6},
  {"item_id": "I451", "title": "Late Night Highlights", "genres": ["Comedy", "Talk Show"], "release_year": 2022, "rating": 4.5},
  {"item_id": "I452", "title": "Podcast Clips", "genres": ["Entertainment", "Talk Show"], "release_year": 2023, "rating": 4.7},
  {"item_id": "I461", "title": "True Crime Stories", "genres": ["Crime", "Documentary"], "release_year": 2020, "rating": 4.8},
  {"item_id": "I462", "title": "Unsolved Mysteries", "genres": ["Mystery", "Documentary"], "release_year": 2021, "rating": 4.7},
  {"item_id": "I471", "title": "Paranormal Investigations", "genres": ["Horror", "Reality"], "release_year": 2019, "rating": 4.4},
  {"item_id": "I472", "title": "Ghost Hunters", "genres": ["Horror", "Reality"], "release_year": 2020, "rating": 4.3},
  {"item_id": "I481", "title": "Survival Skills", "genres": ["Adventure", "Reality"], "release_year": 2021, "rating": 4.6},
  {"item_id": "I482", "title": "Extreme Sports", "genres": ["Sports", "Action"], "release_year": 2022, "rating": 4.8},
  {"item_id": "I491", "title": "Football Highlights", "genres": ["Sports", "News"], "release_year": 2023, "rating": 4.9},
  {"item_id": "I492", "title": "Basketball Finals", "genres": ["Sports", "Action"], "release_year": 2022, "rating": 4.8},
  {"item_id": "I511", "title": "Olympics Recap", "genres": ["Sports", "Documentary"], "release_year": 2021, "rating": 4.7},
  {"item_id": "I512", "title": "Tennis Grand Slams", "genres": ["Sports", "Action"], "release_year": 2020, "rating": 4.6},
  {"item_id": "I521", "title": "Golf Masters", "genres": ["Sports", "Action"], "release_year": 2019, "rating": 4.5},
  {"item_id": "I522", "title": "F1 Racing", "genres": ["Sports", "Action"], "release_year": 2023, "rating": 4.8},
  {"item_id": "I531", "title": "Boxing Legends", "genres": ["Sports", "Documentary"], "release_year": 2018, "rating": 4.7},
  {"item_id": "I532", "title": "MMA Knockouts", "genres": ["Sports", "Action"], "release_year": 2022, "rating": 4.9},
  {"item_id": "I541", "title": "Skateboarding Tricks", "genres": ["Sports", "Action"], "release_year": 2021, "rating": 4.6},
  {"item_id": "I542", "title": "Surfing Giants", "genres": ["Sports", "Adventure"], "release_year": 2020, "rating": 4.8},
  {"item_id": "I551", "title": "Mountain Biking", "genres": ["Sports", "Adventure"], "release_year": 2019, "rating": 4.7},
  {"item_id": "I552", "title": "Snowboarding Jumps", "genres": ["Sports", "Action"], "release_year": 2022, "rating": 4.6},
  {"item_id": "I561", "title": "Travel Vlogs", "genres": ["Travel", "Reality"], "release_year": 2021, "rating": 4.8},
  {"item_id": "I562", "title": "Europe Backpacking", "genres": ["Travel", "Documentary"], "release_year": 2020, "rating": 4.7},
  {"item_id": "I571", "title": "Asia Street Markets", "genres": ["Travel", "Culture"], "release_year": 2019, "rating": 4.9},
  {"item_id": "I572", "title": "Americas Road Trip", "genres": ["Travel", "Adventure"], "release_year": 2022, "rating": 4.8},
  {"item_id": "I581", "title": "Luxury Hotels", "genres": ["Travel", "Review"], "release_year": 2021, "rating": 4.5},
  {"item_id": "I582", "title": "Budget Travel", "genres": ["Travel", "Tips"], "release_year": 2020, "rating": 4.6},
  {"item_id": "I591", "title": "Cruise Ship Tours", "genres": ["Travel", "Review"], "release_year": 2018, "rating": 4.4},
  {"item_id": "I592", "title": "Desert Safaris", "genres": ["Travel", "Adventure"], "release_year": 2023, "rating": 4.7},
  {"item_id": "I611", "title": "Ancient Egypt", "genres": ["History", "Documentary"], "release_year": 2017, "rating": 4.8},
  {"item_id": "I612", "title": "Roman Empire", "genres": ["History", "Documentary"], "release_year": 2016, "rating": 4.7},
  {"item_id": "I621", "title": "World War II", "genres": ["History", "War"], "release_year": 2015, "rating": 4.9},
  {"item_id": "I622", "title": "Cold War Secrets", "genres": ["History", "Documentary"], "release_year": 2018, "rating": 4.6},
  {"item_id": "I631", "title": "Industrial Revolution", "genres": ["History", "Documentary"], "release_year": 2019, "rating": 4.5},
  {"item_id": "I632", "title": "Space Race", "genres": ["History", "Science"], "release_year": 2020, "rating": 4.8},
  {"item_id": "I641", "title": "Dinosaurs Alive", "genres": ["Science", "Documentary"], "release_year": 2021, "rating": 4.7},
  {"item_id": "I642", "title": "Evolution Theory", "genres": ["Science", "Education"], "release_year": 2019, "rating": 4.6},
  {"item_id": "I651", "title": "Human Body Facts", "genres": ["Science", "Health"], "release_year": 2022, "rating": 4.8},
  {"item_id": "I652", "title": "Brain Functions", "genres": ["Science", "Health"], "release_year": 2023, "rating": 4.9},
  {"item_id": "I661", "title": "Philosophy Basics", "genres": ["Education", "Philosophy"], "release_year": 2020, "rating": 4.5},
  {"item_id": "I662", "title": "Psychology 101", "genres": ["Education", "Psychology"], "release_year": 2021, "rating": 4.7},
  {"item_id": "I671", "title": "Sociology Concepts", "genres": ["Education", "Sociology"], "release_year": 2019, "rating": 4.6},
  {"item_id": "I672", "title": "Political Science", "genres": ["Education", "Politics"], "release_year": 2022, "rating": 4.4},
  {"item_id": "I681", "title": "Economics for Beginners", "genres": ["Education", "Economics"], "release_year": 2020, "rating": 4.8},
  {"item_id": "I682", "title": "Macroeconomics", "genres": ["Education", "Economics"], "release_year": 2021, "rating": 4.7},
  {"item_id": "I691", "title": "Law and Order", "genres": ["Education", "Law"], "release_year": 2018, "rating": 4.5},
  {"item_id": "I692", "title": "Criminal Justice", "genres": ["Education", "Law"], "release_year": 2019, "rating": 4.6},
  {"item_id": "I711", "title": "Medical Anomalies", "genres": ["Health", "Documentary"], "release_year": 2020, "rating": 4.8},
  {"item_id": "I712", "title": "ER Stories", "genres": ["Health", "Reality"], "release_year": 2021, "rating": 4.7},
  {"item_id": "I721", "title": "Healthy Eating", "genres": ["Health", "Diet"], "release_year": 2019, "rating": 4.6},
  {"item_id": "I722", "title": "Vegan Recipes", "genres": ["Health", "Diet"], "release_year": 2022, "rating": 4.5},
  {"item_id": "I731", "title": "Mental Health Awareness", "genres": ["Health", "Psychology"], "release_year": 2023, "rating": 4.9},
  {"item_id": "I732", "title": "Meditation Guide", "genres": ["Health", "Wellness"], "release_year": 2021, "rating": 4.8},
  {"item_id": "I741", "title": "Sleep Science", "genres": ["Health", "Science"], "release_year": 2020, "rating": 4.7},
  {"item_id": "I742", "title": "Posture Correction", "genres": ["Health", "Fitness"], "release_year": 2018, "rating": 4.4},
  {"item_id": "I751", "title": "Marathon Training", "genres": ["Fitness", "Sports"], "release_year": 2019, "rating": 4.6},
  {"item_id": "I752", "title": "Bodybuilding Diet", "genres": ["Fitness", "Diet"], "release_year": 2021, "rating": 4.7},
  {"item_id": "I761", "title": "CrossFit Games", "genres": ["Fitness", "Action"], "release_year": 2022, "rating": 4.8},
  {"item_id": "I762", "title": "Home Workouts", "genres": ["Fitness", "Health"], "release_year": 2020, "rating": 4.9},
  {"item_id": "I771", "title": "Pilates for Core", "genres": ["Fitness", "Health"], "release_year": 2021, "rating": 4.6},
  {"item_id": "I772", "title": "Zumba Dance", "genres": ["Fitness", "Dance"], "release_year": 2019, "rating": 4.5},
  {"item_id": "I781", "title": "Martial Arts Basics", "genres": ["Fitness", "Sports"], "release_year": 2018, "rating": 4.7},
  {"item_id": "I782", "title": "Self Defense", "genres": ["Fitness", "Education"], "release_year": 2020, "rating": 4.8},
  {"item_id": "I791", "title": "Archery Techniques", "genres": ["Sports", "Education"], "release_year": 2021, "rating": 4.5},
  {"item_id": "I792", "title": "Fencing Rules", "genres": ["Sports", "Education"], "release_year": 2019, "rating": 4.4},
  {"item_id": "I811", "title": "Sailing the World", "genres": ["Adventure", "Travel"], "release_year": 2022, "rating": 4.9},
  {"item_id": "I812", "title": "Deep Sea Diving", "genres": ["Adventure", "Nature"], "release_year": 2020, "rating": 4.8},
  {"item_id": "I821", "title": "Everest Expeditions", "genres": ["Adventure", "Documentary"], "release_year": 2021, "rating": 4.7},
  {"item_id": "I822", "title": "Arctic Explorers", "genres": ["Adventure", "Documentary"], "release_year": 2018, "rating": 4.6},
  {"item_id": "I831", "title": "Jungle Survival", "genres": ["Adventure", "Reality"], "release_year": 2019, "rating": 4.5},
  {"item_id": "I832", "title": "Desert Nomads", "genres": ["Adventure", "Culture"], "release_year": 2023, "rating": 4.8},
  {"item_id": "I841", "title": "RV Living", "genres": ["Lifestyle", "Travel"], "release_year": 2021, "rating": 4.6},
  {"item_id": "I842", "title": "Tiny House Nation", "genres": ["Lifestyle", "Reality"], "release_year": 2020, "rating": 4.7},
  {"item_id": "I851", "title": "Minimalism Doc", "genres": ["Lifestyle", "Documentary"], "release_year": 2019, "rating": 4.5},
  {"item_id": "I852", "title": "Zero Waste Living", "genres": ["Lifestyle", "Environment"], "release_year": 2022, "rating": 4.8},
  {"item_id": "I861", "title": "Sustainable Farming", "genres": ["Environment", "Agriculture"], "release_year": 2021, "rating": 4.9},
  {"item_id": "I862", "title": "Climate Change Facts", "genres": ["Environment", "Science"], "release_year": 2023, "rating": 4.7},
  {"item_id": "I871", "title": "Ocean Cleanups", "genres": ["Environment", "Documentary"], "release_year": 2020, "rating": 4.8},
  {"item_id": "I872", "title": "Renewable Energy", "genres": ["Environment", "Tech"], "release_year": 2019, "rating": 4.6}
]
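
As a minimal sketch of how this metadata could feed Content-Based Filtering, the snippet below scores items by the Jaccard similarity of their genre sets. The five records are copied from the listing above. The helper names `jaccard` and `similar_items` are hypothetical and used only for illustration, and Jaccard is just one simple choice of similarity; TF-IDF or learned embeddings could be swapped in.

```python
import json

# A few records copied from Appendix C. In practice the full array
# would be loaded from a file instead of an inline string.
ITEMS_JSON = """
[
  {"item_id": "I042", "title": "The Quantum Paradox", "genres": ["Sci-Fi", "Thriller"], "release_year": 2021, "rating": 4.5},
  {"item_id": "I102", "title": "Space Colonists", "genres": ["Sci-Fi", "Adventure"], "release_year": 2023, "rating": 4.2},
  {"item_id": "I012", "title": "Mecha Warriors", "genres": ["Anime", "Sci-Fi"], "release_year": 2016, "rating": 4.4},
  {"item_id": "I701", "title": "Mystery in the Alps", "genres": ["Mystery", "Thriller"], "release_year": 2017, "rating": 4.1},
  {"item_id": "I089", "title": "Romantic Echoes", "genres": ["Romance", "Drama"], "release_year": 2019, "rating": 3.8}
]
"""


def jaccard(a, b):
    """Jaccard similarity of two label collections: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_items(items, target_id, top_k=3):
    """Return up to top_k (score, item_id) pairs most similar to target_id.

    Hypothetical helper: ranks by genre overlap only and breaks ties
    by item_id so the ordering is deterministic.
    """
    by_id = {it["item_id"]: it for it in items}
    target = by_id[target_id]
    scored = [
        (jaccard(target["genres"], it["genres"]), it["item_id"])
        for it in items
        if it["item_id"] != target_id
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:top_k]


items = json.loads(ITEMS_JSON)
recs = similar_items(items, "I042")
for score, item_id in recs:
    print(item_id, round(score, 3))
```

Each of the three returned items shares exactly one of the two genres with "The Quantum Paradox", so they tie at a similarity of 1/3, while "Romantic Echoes" shares none and scores 0. With only two genres per item the scores are coarse, which is one reason production systems tend to combine genres with richer features such as text descriptions.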
            

Appendix D: Sample User Demographics Dataset (JSON)

Context-aware recommenders often use user demographics and device information as side features alongside interaction data. Here is a simulated demographics dataset whose user_id values map to the users in Appendix B.


[
  {"user_id": "U721", "age": 25, "gender": "F", "location": "New York", "device": "mobile", "premium": true, "signup_date": "2021-05-12"},
  {"user_id": "U314", "age": 34, "gender": "M", "location": "Chicago", "device": "desktop", "premium": false, "signup_date": "2022-01-20"},
  {"user_id": "U089", "age": 19, "gender": "M", "location": "Los Angeles", "device": "mobile", "premium": false, "signup_date": "2023-08-15"},
  {"user_id": "U555", "age": 42, "gender": "F", "location": "Houston", "device": "tablet", "premium": true, "signup_date": "2020-11-05"},
  {"user_id": "U123", "age": 28, "gender": "M", "location": "Seattle", "device": "mobile", "premium": true, "signup_date": "2021-03-30"},
  {"user_id": "U444", "age": 55, "gender": "F", "location": "Boston", "device": "desktop", "premium": false, "signup_date": "2019-07-22"},
  {"user_id": "U999", "age": 22, "gender": "M", "location": "Austin", "device": "mobile", "premium": true, "signup_date": "2022-09-10"},
  {"user_id": "U888", "age": 31, "gender": "F", "location": "Denver", "device": "tablet", "premium": false, "signup_date": "2021-12-01"},
  {"user_id": "U777", "age": 27, "gender": "M", "location": "Miami", "device": "mobile", "premium": true, "signup_date": "2020-04-18"},
  {"user_id": "U666", "age": 48, "gender": "F", "location": "Atlanta", "device": "desktop", "premium": false, "signup_date": "2018-02-14"},
  {"user_id": "U333", "age": 29, "gender": "M", "location": "Portland", "device": "mobile", "premium": true, "signup_date": "2022-05-05"},
  {"user_id": "U222", "age": 38, "gender": "F", "location": "Dallas", "device": "desktop", "premium": false, "signup_date": "2020-10-10"},
  {"user_id": "U111", "age": 24, "gender": "M", "location": "San Francisco", "device": "mobile", "premium": true, "signup_date": "2023-01-15"},
  {"user_id": "U000", "age": 50, "gender": "F", "location": "Washington", "device": "tablet", "premium": false, "signup_date": "2019-09-25"},
  {"user_id": "U234", "age": 33, "gender": "M", "location": "Phoenix", "device": "desktop", "premium": true, "signup_date": "2021-06-20"},
  {"user_id": "U345", "age": 26, "gender": "F", "location": "San Diego", "device": "mobile", "premium": false, "signup_date": "2022-11-08"},
  {"user_id": "U456", "age": 45, "gender": "M", "location": "Detroit", "device": "desktop", "premium": true, "signup_date": "2017-03-12"},
  {"user_id": "U567", "age": 21, "gender": "F", "location": "Minneapolis", "device": "mobile", "premium": false, "signup_date": "2023-04-04"},
  {"user_id": "U678", "age": 37, "gender": "M", "location": "Tampa", "device": "tablet", "premium": true, "signup_date": "2020-08-30"},
  {"user_id": "U789", "age": 52, "gender": "F", "location": "Charlotte", "device": "desktop", "premium": false, "signup_date": "2018-11-17"},
  {"user_id": "U890", "age": 30, "gender": "M", "location": "Orlando", "device": "mobile", "premium": true, "signup_date": "2021-02-28"},
  {"user_id": "U901", "age": 23, "gender": "F", "location": "Raleigh", "device": "mobile", "premium": false, "signup_date": "2022-07-07"},
  {"user_id": "U012", "age": 41, "gender": "M", "location": "Columbus", "device": "desktop", "premium": true, "signup_date": "2019-01-09"},
  {"user_id": "U135", "age": 28, "gender": "F", "location": "Indianapolis", "device": "tablet", "premium": false, "signup_date": "2021-09-14"}
]
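
One way these profiles might be attached to the clickstream is sketched below: each event is looked up by user_id and extended with a few context features. The records are copied from Appendices B and D. The helper names `age_bucket` and `enrich`, the bucket boundaries, and the choice of features are assumptions made for illustration, not part of the datasets.

```python
import json

# Three events copied from the end of Appendix B.
EVENTS_JSON = """
[
  {"user_id": "U123", "session_id": "S1097", "item_id": "I841", "timestamp": "2023-10-01T23:40:00Z", "event_type": "add_to_cart", "duration_sec": 245},
  {"user_id": "U234", "session_id": "S1098", "item_id": "I852", "timestamp": "2023-10-01T23:55:00Z", "event_type": "purchase", "duration_sec": 960},
  {"user_id": "U456", "session_id": "S1100", "item_id": "I872", "timestamp": "2023-10-02T00:15:00Z", "event_type": "view", "duration_sec": 54}
]
"""

# The matching profiles copied from Appendix D.
USERS_JSON = """
[
  {"user_id": "U123", "age": 28, "gender": "M", "location": "Seattle", "device": "mobile", "premium": true, "signup_date": "2021-03-30"},
  {"user_id": "U234", "age": 33, "gender": "M", "location": "Phoenix", "device": "desktop", "premium": true, "signup_date": "2021-06-20"},
  {"user_id": "U456", "age": 45, "gender": "M", "location": "Detroit", "device": "desktop", "premium": true, "signup_date": "2017-03-12"}
]
"""


def age_bucket(age):
    """Map an age to a coarse bucket (boundaries are an assumption)."""
    if age < 25:
        return "under-25"
    if age < 35:
        return "25-34"
    if age < 45:
        return "35-44"
    return "45+"


def enrich(events, users):
    """Join each event with its user's profile on user_id.

    Events whose user has no profile are skipped in this sketch; a
    real pipeline might keep them with default feature values instead.
    """
    users_by_id = {u["user_id"]: u for u in users}
    rows = []
    for event in events:
        user = users_by_id.get(event["user_id"])
        if user is None:
            continue
        rows.append({
            "user_id": event["user_id"],
            "item_id": event["item_id"],
            "event_type": event["event_type"],
            "device": user["device"],
            "premium": user["premium"],
            "age_bucket": age_bucket(user["age"]),
        })
    return rows


rows = enrich(json.loads(EVENTS_JSON), json.loads(USERS_JSON))
for row in rows:
    print(row)
```

Bucketing age rather than using the raw value keeps the feature categorical, which suits one-hot encoding or embedding lookups. Whether demographic features such as gender should be used at all is a separate fairness and privacy question for a real deployment.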