Part IX: Advanced Topics

Chapter 26: Recommender Systems

Reading Time: 3.5 hours | Prerequisites: Ch 15 (Neural Networks), Ch 12 (Dimensionality Reduction)

1. Learning Objectives

Welcome to Chapter 26! Recommender systems are the unseen engines driving the modern digital economy. By the end of this comprehensive chapter, you will be able to:

Understand the Motivation: Explain the necessity of recommender systems in combating information overload in the era of infinite choices.
Distinguish Core Approaches: Differentiate between Content-Based Filtering, Collaborative Filtering (User-Based and Item-Based), and Hybrid Models.
Master Matrix Factorization: Grasp the mathematics behind Singular Value Decomposition (SVD), Non-Negative Matrix Factorization (NMF), and Alternating Least Squares (ALS).
Implement Deep Learning Recommenders: Build and train Neural Collaborative Filtering (NCF) models and Session-based RNN recommenders using TensorFlow.
Tackle the Cold Start Problem: Apply heuristics and multi-armed bandits to recommend items to brand new users.
Evaluate Rigorously: Calculate and interpret RMSE, MAE, Precision@K, Recall@K, and NDCG to benchmark model performance.
Analyze Real-World Architectures: Dissect how giants like YouTube, Netflix, and Indian platforms like Flipkart and Hotstar scale their recommendation pipelines.

📚 Exam Tip

When studying for university exams or interviews, focus heavily on the difference between Explicit Feedback (ratings, reviews) and Implicit Feedback (clicks, watch time, purchases). Formulating matrix factorization for implicit feedback is a highly tested concept!

2. Introduction: The Era of Information Overload

Imagine walking into a library that contains every book ever written, but there are no shelves, no catalogs, and no librarians. You just see a mountain of paper. How do you find a book you'd like? This is the digital dilemma. The internet has infinite shelf space, leading to Information Overload.

A Recommender System (RecSys) is an information filtering system that predicts the "rating" or "preference" a user would give to an item. Such systems are a primary driver of user retention and revenue at modern tech companies. According to widely cited industry estimates, recommendations drive about 35% of Amazon's sales and over 75% of what people watch on Netflix.

The Long Tail Phenomenon

In traditional retail, physical shelf space is expensive. Stores only stock "blockbuster" items, the head of the distribution. However, the internet allows for the stocking of millions of niche items. Recommender systems help users discover these niche items, which together make up the "Long Tail" and can collectively rival the blockbusters in total sales.

💡 Professor's Insight

Recommender systems fundamentally shift the economy from a scarcity mindset to an abundance mindset. Without ML, platforms with millions of items would collapse under their own weight. The algorithm becomes the digital curator.

3. Historical Background

The evolution of recommender systems closely mirrors the evolution of the internet itself, transitioning from simple manual curation to complex deep learning pipelines.

1992 - The Tapestry System: The term "collaborative filtering" was coined by researchers at Xerox PARC. They built Tapestry, an experimental system for filtering electronic mail and documents in which users could annotate what they read. The collaboration was manual: a user could write a filter query that referenced other people's annotations (e.g., "show me documents that User A marked as interesting"), rather than the system computing which users were similar.
1998 to 2003 - Amazon's Item-to-Item CF: Amazon patented item-to-item collaborative filtering (filed in 1998) and described it in a seminal 2003 paper by Linden, Smith, and York in IEEE Internet Computing. Instead of finding similar users (which was computationally expensive), they found similar items based on co-purchases. This revolutionized e-commerce.
2006 to 2009 - The Netflix Prize: Netflix offered $1 Million to anyone who could improve the accuracy (RMSE) of its own algorithm, Cinematch, by 10%. This competition popularized Matrix Factorization (specifically FunkSVD) and ensemble methods. The prize was won by "BellKor's Pragmatic Chaos".
2016 - Deep Learning Takes Over: YouTube published a landmark paper on using Deep Neural Networks for recommendations, splitting the architecture into Candidate Generation and Ranking phases. This became the industry standard blueprint.

4. Conceptual Explanation

At a high level, recommenders predict the missing entries in a massive User-Item interaction matrix. Let's explore the core paradigms used to solve this.

4.1. Content-Based Filtering (CBF)

Content-based systems recommend items similar to those a user has liked in the past, based on item attributes. If you watch a lot of Sci-Fi movies directed by Christopher Nolan, the system will recommend other Sci-Fi movies or Nolan films.

Pros: No need for data from other users. No cold-start problem for new items.
Cons: Over-specialization (the "filter bubble"). It will never recommend a romantic comedy if you've only watched Sci-Fi.

4.2. Collaborative Filtering (CF)

CF relies entirely on past user-item interactions. It assumes that if users agreed in the past, they will agree in the future.

User-Based CF: "Users who are similar to you also liked..." It computes similarities between rows in the user-item matrix.
Item-Based CF: "Customers who liked this item also liked..." It computes similarities between columns (items). Item-based is generally preferred in e-commerce because items don't change their nature as quickly as users change their preferences, so item similarities can be pre-computed and stay valid for longer.
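To make the column-wise idea concrete, here is a minimal item-based sketch. It assumes a tiny dense NumPy rating matrix in which 0 means "not rated" (a simplification; real systems store sparse matrices and often mean-center ratings), and the function names are illustrative. Item similarities are cosine similarities between columns, and a prediction is the similarity-weighted average of the ratings the user gave to other items.

```python
import numpy as np

# Toy user-item matrix (rows = users, columns = items); 0 marks a missing rating.
R = np.array([
    [5, 4, 0],   # Alice
    [4, 5, 1],   # Bob
    [1, 2, 5],   # Charlie
], dtype=float)

def item_similarity(R):
    """Cosine similarity between item columns (missing ratings treated as 0)."""
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0               # avoid division by zero for empty columns
    return (R.T @ R) / np.outer(norms, norms)

def predict_item_based(R, sim, user, item):
    """Similarity-weighted average of the user's ratings on the other items they rated."""
    rated = np.flatnonzero(R[user] > 0)
    rated = rated[rated != item]
    weights = sim[item, rated]
    if np.abs(weights).sum() == 0:
        return np.nan                     # no overlap: cannot predict
    return float(weights @ R[user, rated] / np.abs(weights).sum())

S = item_similarity(R)
print(round(predict_item_based(R, S, user=0, item=2), 2))
```

With non-negative similarities, as here, the prediction is a weighted average of the user's own ratings, so Alice's predicted rating for the third item must land between the 4 and 5 she gave elsewhere.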

4.3. Matrix Factorization

A sophisticated form of CF. It decomposes the large, sparse user-item matrix into two smaller, dense matrices: a User Latent Matrix and an Item Latent Matrix. These latent factors often capture abstract concepts (like "action-packed" or "comedy") without explicit labels, although an individual factor is not guaranteed to be human-interpretable.

4.4. The Cold Start Problem

What happens when a brand new user joins, or a new movie is uploaded? CF fails because there are no interactions.

New User: Solved using popular items, demographic targeting, or an onboarding questionnaire (e.g., "Select 3 genres you like").
New Item: Solved using Content-Based features or Multi-Armed Bandits (giving the new item a temporary visibility boost to gather initial clicks).
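The bandit idea for new items can be sketched with ε-greedy: with probability ε the system shows a random item (exploration), otherwise the item with the best observed click-through rate so far (exploitation). This is a minimal sketch; the class name, article IDs, and simulated click rates are hypothetical, and unseen items get an optimistic estimate so each one is tried.

```python
import random

class EpsilonGreedy:
    """ε-greedy bandit over a fixed set of items (hypothetical helper)."""

    def __init__(self, items, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.shows = {item: 0 for item in items}
        self.clicks = {item: 0 for item in items}

    def ctr(self, item):
        # Unseen items get an optimistic estimate of 1.0 so each gets tried.
        return self.clicks[item] / self.shows[item] if self.shows[item] else 1.0

    def select(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.shows))     # explore
        return max(self.shows, key=self.ctr)             # exploit

    def update(self, item, clicked):
        self.shows[item] += 1
        self.clicks[item] += int(clicked)

# Simulate 5 new articles whose true click probabilities the bandit does not know.
true_ctr = {"a1": 0.02, "a2": 0.05, "a3": 0.10, "a4": 0.03, "a5": 0.01}
bandit = EpsilonGreedy(true_ctr, epsilon=0.1, seed=42)
sim_rng = random.Random(0)
for _ in range(5000):
    item = bandit.select()
    bandit.update(item, sim_rng.random() < true_ctr[item])

print(max(bandit.shows, key=bandit.shows.get))  # the item shown most often
```

A larger ε explores more (fairer to new items, more low-quality impressions); a smaller ε exploits more. Decaying ε over time is a common compromise.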

⚠️ Industry Alert

In production, systems rarely rely on a single approach; modern systems are Hybrids. A standard pipeline uses Collaborative Filtering for candidate generation (fetching the top ~1,000 items) and a complex Deep Learning model that also uses content features for the final Ranking (sorting those ~1,000 items for the UI).

5. Mathematical Foundation

Let $R$ be the user-item interaction matrix of size $m \times n$, where $m$ is the number of users and $n$ the number of items. $r_{ui}$ is the rating given by user $u$ to item $i$.

TF-IDF (Content-Based)

To represent items as vectors based on textual content (e.g., plot summaries), we use Term Frequency-Inverse Document Frequency.

\text{TF}(t, d) = \frac{\text{Count of term } t \text{ in document } d}{\text{Total terms in document } d}

\text{IDF}(t) = \log \left( \frac{N}{\text{Number of documents containing term } t} \right)

\text{TF-IDF}(t, d) = \text{TF}(t, d) \times \text{IDF}(t)

Here $N$ is the total number of documents.
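The three formulas translate almost line for line into plain Python. This sketch assumes whitespace tokenization and the natural logarithm; library implementations such as scikit-learn's TfidfVectorizer add IDF smoothing and vector normalization by default, so their numbers will differ.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: tf-idf} dict per document, using the plain formulas above."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({t: (c / total) * idf[t] for t, c in counts.items()})
    return weights

plots = ["space travel black hole", "black hole physics", "romance in paris"]
for w in tf_idf(plots):
    print({t: round(v, 3) for t, v in w.items()})
```

Terms shared by two of the three plots ("black", "hole") get a lower IDF than terms unique to one plot, which is exactly the down-weighting IDF is meant to provide.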

Similarity Metrics

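Once users or items are represented as vectors, we measure how similar they are. Cosine similarity, which depends only on the angle between two vectors and not on their lengths, is the most common choice.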

\text{Cosine Similarity}(A, B) = \frac{A \cdot B}{||A|| \times ||B||} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \sqrt{\sum_{i=1}^{n} B_i^2}}

Matrix Factorization Objective

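We want to find a user matrix $P$ ($m \times k$) and an item matrix $Q$ ($n \times k$) such that the product $PQ^T$ approximates the observed ratings in $R$; each prediction is the dot product of a user row $p_u$ and an item row $q_i$. We minimize the squared error over the observed ratings, with L2 regularization to reduce overfitting.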

\min_{P, Q} \sum_{(u,i) \in K} (r_{ui} - p_u \cdot q_i^T)^2 + \lambda (||p_u||^2 + ||q_i||^2)

Where $K$ is the set of observed ratings, $k$ is the number of latent dimensions, and $\lambda$ is the regularization penalty.

6. Formula Derivations

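How do we actually find the matrices $P$ and $Q$ from Section 5? Classical SVD needs a fully observed matrix, but $R$ is extremely sparse (often more than 99% of its entries are missing), and filling the gaps with zeros would treat "unrated" as "disliked". Instead, we fit $P$ and $Q$ to the observed ratings only, using iterative optimization algorithms.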

6.1. Stochastic Gradient Descent (FunkSVD)

Simon Funk famously used this approach during the Netflix Prize. We calculate the prediction error for a specific rating:

e_{ui} = r_{ui} - p_u \cdot q_i^T

We want to minimize the regularized loss $L$. For a single observed rating $r_{ui}$, we take the partial derivatives of its term in $L$ with respect to the $k$-th latent component of the user vector, $p_{uk}$, and of the item vector, $q_{ik}$:

\frac{\partial L}{\partial p_{uk}} = -2e_{ui}q_{ik} + 2\lambda p_{uk}

\frac{\partial L}{\partial q_{ik}} = -2e_{ui}p_{uk} + 2\lambda q_{ik}

We then update the parameters in the opposite direction of the gradient, where $\gamma$ is the learning rate (the constant factor 2 is absorbed into $\gamma$):

p_{uk} \leftarrow p_{uk} + \gamma (e_{ui}q_{ik} - \lambda p_{uk})

q_{ik} \leftarrow q_{ik} + \gamma (e_{ui}p_{uk} - \lambda q_{ik})
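A minimal NumPy sketch of this SGD loop, assuming ratings arrive as (user, item, rating) triples with 0-based integer IDs. It omits the user and item bias terms that practical FunkSVD variants usually add, and the hyperparameters (k, γ, λ, epochs) are illustrative, not tuned.

```python
import numpy as np

def funk_svd(triples, n_users, n_items, k=2, lr=0.01, reg=0.02, epochs=1000, seed=0):
    """Fit P (n_users x k) and Q (n_items x k) by SGD on the observed ratings only."""
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(n_users, k))
    Q = rng.normal(scale=0.1, size=(n_items, k))
    for _ in range(epochs):
        for idx in rng.permutation(len(triples)):   # visit ratings in random order
            u, i, r = triples[idx]
            e = r - P[u] @ Q[i]                      # e_ui = r_ui - p_u . q_i
            p_old = P[u].copy()                      # q_i's step uses the pre-update p_u
            P[u] += lr * (e * Q[i] - reg * P[u])
            Q[i] += lr * (e * p_old - reg * Q[i])
    return P, Q

def rmse(P, Q, triples):
    return float(np.sqrt(np.mean([(r - P[u] @ Q[i]) ** 2 for u, i, r in triples])))

# Observed ratings from the worked example in Section 7
# (Alice=0, Bob=1, Charlie=2; Movie A=0, B=1, C=2; Alice's rating for C is missing).
ratings = [(0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 1, 5), (1, 2, 1),
           (2, 0, 1), (2, 1, 2), (2, 2, 5)]
P, Q = funk_svd(ratings, n_users=3, n_items=3)
print("training RMSE:", round(rmse(P, Q, ratings), 3))
```

On a matrix this tiny, the training RMSE is only a sanity check that the updates work; predictions for the missing cell are not meaningful until there is far more data and a held-out evaluation.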

6.2. Alternating Least Squares (ALS)

ALS is often preferred when the feedback is implicit (like clicks) and the data is distributed across a cluster (e.g., using Apache Spark), because each user's or item's update is independent of the others and can run in parallel. Since both $P$ and $Q$ are unknown, the loss function is non-convex. But if we fix $Q$ as a constant, the loss becomes a convex quadratic in $P$, and vice versa.

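Step 1: Fix $Q$. For each user $u$, take the derivative with respect to $p_u$, set it to zero, and solve analytically. Only the items $u$ has rated enter the solution: $Q_u$ stacks their item vectors as rows, and $R_u$ is the vector of $u$'s observed ratings for them.

p_u = (Q_u^T Q_u + \lambda I)^{-1} Q_u^T R_u

Step 2: Fix $P$ and solve analytically for each $q_i$ in the same way, where $P_i$ stacks the vectors of the users who rated item $i$ and $R_i$ holds their ratings.

q_i = (P_i^T P_i + \lambda I)^{-1} P_i^T R_i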

We alternate between Step 1 and Step 2 until convergence.
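A minimal NumPy sketch of explicit-feedback ALS, assuming a dense rating matrix plus a boolean mask of observed entries, and restricting each solve to those observed entries. Each update solves a small $k \times k$ linear system with `np.linalg.solve` rather than forming the inverse explicitly, which is numerically safer. Implicit-feedback ALS adds per-entry confidence weights and is not shown.

```python
import numpy as np

def als(R, mask, k=2, reg=0.1, iters=20, seed=0):
    """Alternate closed-form ridge solves for user rows P and item rows Q."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = rng.normal(scale=0.1, size=(n_users, k))
    Q = rng.normal(scale=0.1, size=(n_items, k))
    I = np.eye(k)
    for _ in range(iters):
        # Step 1: fix Q, solve for each p_u using only the items user u rated.
        for u in range(n_users):
            Qu = Q[mask[u]]
            P[u] = np.linalg.solve(Qu.T @ Qu + reg * I, Qu.T @ R[u, mask[u]])
        # Step 2: fix P, solve for each q_i using only the users who rated item i.
        for i in range(n_items):
            Pi = P[mask[:, i]]
            Q[i] = np.linalg.solve(Pi.T @ Pi + reg * I, Pi.T @ R[mask[:, i], i])
    return P, Q

R = np.array([[5, 4, 0], [4, 5, 1], [1, 2, 5]], dtype=float)
mask = R > 0                      # in this toy matrix, 0 means "not rated"
P, Q = als(R, mask)
errors = (R - P @ Q.T)[mask]
print("training RMSE:", round(float(np.sqrt(np.mean(errors ** 2))), 3))
```

Because every inner solve is independent, a distributed implementation can partition users (then items) across workers, which is why ALS suits cluster frameworks.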

7. Worked Numerical Examples

Let's manually compute a User-Based Collaborative Filtering prediction. Suppose we have 3 users and 3 movies. Ratings are out of 5.

User	Movie A (Sci-Fi)	Movie B (Action)	Movie C (Romance)
Alice	5	4	?
Bob	4	5	1
Charlie	1	2	5

Goal: Predict Alice's rating for Movie C.

Step 1: Compute User Similarities (Cosine Similarity) between Alice and others based on common items (A and B).

Alice vector = [5, 4]
Bob vector = [4, 5]
Charlie vector = [1, 2]

Cosine(Alice, Bob) = $(5 \times 4 + 4 \times 5) / (\sqrt{5^2 + 4^2} \times \sqrt{4^2 + 5^2}) = 40 / 41 \approx 0.98$

Cosine(Alice, Charlie) = $(5 \times 1 + 4 \times 2) / (\sqrt{41} \times \sqrt{5}) = 13 / (6.40 \times 2.24) \approx 0.91$

Step 2: Predict using weighted average of ratings for Movie C.

\text{Prediction} = \frac{\sum (\text{Similarity} \times \text{Rating})}{\sum |\text{Similarity}|}

\text{Pred(Alice, C)} = \frac{(0.98 \times 1) + (0.91 \times 5)}{0.98 + 0.91} = \frac{0.98 + 4.55}{1.89} = \frac{5.53}{1.89} \approx 2.93

Alice is predicted to give Movie C about 2.9. She is slightly more similar to Bob (who hated it) than to Charlie (who loved it), yet the prediction lands well above Bob's rating. This exposes a weakness of raw cosine similarity on ratings: because every rating is positive, even Charlie, whose tastes run opposite to Alice's, scores a high 0.91, so his 5 pulls the average up substantially. Mean-centering the ratings first (as Pearson correlation does) is the usual remedy.
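The hand calculation can be checked with a few lines of NumPy. The arrays simply restate the table, the similarities use only the co-rated Movies A and B as in Step 1, and the exact Alice-Bob similarity is 40/41 ≈ 0.976, which rounds to 0.98.

```python
import numpy as np

alice = np.array([5, 4])  # Alice's ratings for Movies A and B
# Each neighbor: (ratings for A and B, rating for Movie C)
neighbors = {"Bob": (np.array([4, 5]), 1), "Charlie": (np.array([1, 2]), 5)}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sims = {name: cosine(alice, vec) for name, (vec, _) in neighbors.items()}
pred = sum(sims[n] * c for n, (_, c) in neighbors.items()) / sum(abs(s) for s in sims.values())
print({n: round(s, 2) for n, s in sims.items()}, round(pred, 2))
# {'Bob': 0.98, 'Charlie': 0.91} 2.93
```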

8. Visual Diagrams (ASCII Art)

Visualizing Matrix Factorization. We break a sparse matrix $R$ into dense $P$ and $Q^T$.


  [ Users × Items Matrix (R) ]           [ User Matrix (P) ]          [ Item Matrix Transposed (Q^T) ]
    (e.g., 4 users, 5 items)          (4 users, 2 latent factors)           (2 factors, 5 items)

        i1  i2  i3  i4  i5                       k1    k2                 i1    i2    i3    i4    i5
      +---------------------+                +-------------+          +-------------------------------+
   u1 |  5   ?   4   ?   1  |             u1 |   1.2   0.8 |       k1 |   2.1   0.4  -1.1   0.9   0.2 |
   u2 |  ?   ?   ?   2   5  |      ≈      u2 |  -0.5   2.1 |   ×   k2 |   0.8   1.5   2.2   0.3   1.9 |
   u3 |  3   1   ?   ?   ?  |             u3 |   0.9  -0.2 |          +-------------------------------+
   u4 |  ?   5   ?   4   ?  |             u4 |   1.1   1.5 |
      +---------------------+                +-------------+

The prediction for user 1, item 2 (u1, i2) is computed as:
Pred(u1, i2) = (1.2 * 0.4) + (0.8 * 1.5) = 0.48 + 1.20 = 1.68

💻 Code Challenge

Try to mentally calculate the predicted rating for User 3 and Item 1 (u3, i1) using the matrices above. Answer: (0.9 * 2.1) + (-0.2 * 0.8) = 1.89 - 0.16 = 1.73.
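Every cell of the approximation is one such dot product, so the whole predicted matrix is the single product $PQ^T$. The factor values below are the illustrative ones from the diagram, not fitted factors, so the reconstructed cells do not reproduce the observed ratings in $R$.

```python
import numpy as np

P = np.array([[1.2, 0.8], [-0.5, 2.1], [0.9, -0.2], [1.1, 1.5]])   # 4 users x 2 factors
Q_T = np.array([[2.1, 0.4, -1.1, 0.9, 0.2],
                [0.8, 1.5, 2.2, 0.3, 1.9]])                         # 2 factors x 5 items
R_hat = P @ Q_T                      # 4 x 5 matrix of predicted ratings
print(np.round(R_hat, 2))
print(round(R_hat[0, 1], 2), round(R_hat[2, 0], 2))   # (u1, i2) and (u3, i1), 0-based
# 1.68 1.73
```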

9. Flowcharts (ASCII Art)

Modern Large-Scale Recommender Architecture (the Multi-Stage approach, typically with a Two-Tower model for candidate retrieval):


 +------------------+
 |   User Request   | (User ID, Context, Time)
 +--------+---------+
          |
          v
 +------------------+      Millions of items in Database
 | 1. Candidate     | <--- Filter down to ~1,000 items
 |    Generation    |      (Uses Fast CF, SVD, or Two-Tower ANN)
 +--------+---------+
          |
          v
 +------------------+      Hundreds of items
 | 2. Feature       | <--- Add heavy features (User Demographics,
 |    Engineering   |      Item Text, Real-time engagement stats)
 +--------+---------+
          |
          v
 +------------------+
 | 3. Scoring /     | <--- Heavy Deep Learning Model (NCF, DLRM)
 |    Ranking       |      Assigns probability/score to each item
 +--------+---------+
          |
          v
 +------------------+      Top 10-50 Items
 | 4. Re-Ranking /  | <--- Apply Business Logic, Diversity Filters,
 |    Filtering     |      Remove previously watched items
 +--------+---------+
          |
          v
 +------------------+
 | Final UI Render  |
 +------------------+
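The control flow of these stages can be sketched in a few lines. Everything here is a stand-in chosen for illustration: random embeddings play the role of a trained two-tower model, an exact dot-product top-n plays candidate generation (production systems use an approximate nearest-neighbor index), the hypothetical `rank_score` function stands in for the heavy ranking model, and re-ranking only removes already-watched items.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, dim = 100_000, 32
item_emb = rng.normal(size=(n_items, dim)).astype(np.float32)   # stand-in item tower output
user_emb = rng.normal(size=dim).astype(np.float32)              # stand-in user tower output
popularity = rng.random(n_items)                                # hypothetical ranking feature
watched = {int(i) for i in rng.choice(n_items, size=50, replace=False)}

def generate_candidates(user_vec, n=1000):
    """Stage 1: cheap dot-product retrieval of the top-n items (unordered)."""
    scores = item_emb @ user_vec
    return np.argpartition(-scores, n)[:n]       # top-n without a full sort

def rank_score(user_vec, items):
    """Stages 2-3 (hypothetical): a heavier scorer that mixes in extra features."""
    return item_emb[items] @ user_vec + 2.0 * popularity[items]

def recommend(user_vec, k=10):
    cands = generate_candidates(user_vec)
    ordered = cands[np.argsort(-rank_score(user_vec, cands))]
    # Stage 4: business-rule re-ranking, here just filtering watched items.
    return [int(i) for i in ordered if int(i) not in watched][:k]

print(recommend(user_emb))
```

The design point is that the expensive scorer only ever sees ~1,000 items, never the full catalog.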

10. Python Implementation (From Scratch)

Let's build a simple Content-Based and Collaborative Filtering system using Pandas, NumPy, and a few scikit-learn helpers.


import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import TfidfVectorizer

# --- 1. Content-Based Filtering ---
print("--- Content-Based Filtering ---")
movies = pd.DataFrame({
    'movie_id': [1, 2, 3],
    'title': ['Interstellar', 'The Matrix', 'The Notebook'],
    'plot': ['Space travel black hole', 'Hacker discovers reality simulation', 'Poor boy rich girl romance']
})

# Calculate TF-IDF
tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(movies['plot'])

# Compute Cosine Similarity between movies
cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)

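def recommend_cbf(title, cosine_sim_matrix, df):
    # Row position of the query movie (assumes a default 0..n-1 index)
    idx = df.index[df['title'] == title].tolist()[0]
    sim_scores = list(enumerate(cosine_sim_matrix[idx]))
    # Exclude the query movie explicitly instead of assuming it sorts first
    sim_scores = [s for s in sim_scores if s[0] != idx]
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    # Note: these toy plots share no terms, so every score here is 0 and the
    # "top" pick is simply the first tie; richer plot text gives meaningful results.
    top_idx = sim_scores[0][0]
    return df['title'].iloc[top_idx]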

print(f"If you liked 'Interstellar', you might like: {recommend_cbf('Interstellar', cosine_sim, movies)}")


# --- 2. User-Based Collaborative Filtering ---
print("\n--- Collaborative Filtering ---")
ratings_dict = {
    'User': ['Alice', 'Alice', 'Bob', 'Bob', 'Charlie', 'Charlie'],
    'Movie': ['M1', 'M2', 'M1', 'M2', 'M2', 'M3'],
    'Rating': [5, 4, 4, 5, 2, 5]
}
df = pd.DataFrame(ratings_dict)
# Missing ratings become 0 for simplicity, which cosine similarity treats like a very low rating
user_item_matrix = df.pivot_table(index='User', columns='Movie', values='Rating').fillna(0)
print("User-Item Matrix:\n", user_item_matrix)

# Compute User Similarity
user_sim = cosine_similarity(user_item_matrix)
user_sim_df = pd.DataFrame(user_sim, index=user_item_matrix.index, columns=user_item_matrix.index)

print("\nUser Similarity Matrix:\n", user_sim_df)

11. TensorFlow Implementation (NCF)

Neural Collaborative Filtering replaces the inner product of Matrix Factorization with a neural architecture that can learn arbitrary non-linear interactions.


import tensorflow as tf
from tensorflow.keras.layers import Input, Embedding, Flatten, Dense, Concatenate
from tensorflow.keras.models import Model

def build_ncf_model(num_users, num_items, latent_dim=8):
    # Inputs
    user_input = Input(shape=(1,), name='user_input')
    item_input = Input(shape=(1,), name='item_input')

    # Embeddings (equivalent to Latent Factors P and Q)
    user_embedding = Embedding(num_users, latent_dim, name='user_emb')(user_input)
    item_embedding = Embedding(num_items, latent_dim, name='item_emb')(item_input)

    # Flatten embeddings
    user_vec = Flatten()(user_embedding)
    item_vec = Flatten()(item_embedding)

    # Concatenate user and item vectors
    concat = Concatenate()([user_vec, item_vec])

    # Deep Neural Network Layers
    fc1 = Dense(32, activation='relu')(concat)
    fc2 = Dense(16, activation='relu')(fc1)
    fc3 = Dense(8, activation='relu')(fc2)

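    # Output layer (1 neuron predicting an explicit rating, trained with MSE).
    # The original NCF paper (He et al., 2017) targets implicit feedback with a sigmoid output and log loss.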
    output = Dense(1, activation='linear', name='rating_prediction')(fc3)

    model = Model(inputs=[user_input, item_input], outputs=output)
    model.compile(optimizer='adam', loss='mse', metrics=['mae'])
    
    return model

# Assume we have 1000 users and 5000 items
model = build_ncf_model(num_users=1000, num_items=5000)
model.summary()

# Training would look like:
# model.fit([train_user_ids, train_item_ids], train_ratings, epochs=5, batch_size=64)

12. Scikit-Learn and Surprise Pipeline

In practice, building classical CF algorithms from scratch is rarely worth it. The scikit-surprise library is a widely used Python package for classical recommender systems.


# pip install scikit-surprise
from surprise import Dataset, Reader, SVD
from surprise.model_selection import cross_validate

# 1. Load built-in MovieLens 100K dataset
data = Dataset.load_builtin('ml-100k')

# 2. Initialize the SVD algorithm (Surprise's SVD is FunkSVD-style biased matrix factorization trained with SGD)
algo = SVD(n_factors=100, n_epochs=20, lr_all=0.005, reg_all=0.02)

# 3. Run 5-fold cross-validation and print results
results = cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

# 4. Train on full dataset and predict
trainset = data.build_full_trainset()
algo.fit(trainset)

# Predict rating for User '196' and Item '302'
pred = algo.predict('196', '302')
print(f"\nPredicted rating: {pred.est:.2f}")

13. Indian Case Studies

🇮🇳 India Spotlight: Localizing Recommendations at Scale

India presents unique challenges for recommender systems due to its vast demographic diversity, linguistic variations, and varying internet speeds.

13.1. Flipkart: Cross-Lingual Product Recommendations

Flipkart caters to millions of users in Tier 2 and Tier 3 cities who often search in vernacular languages or "Hinglish". Their recommendation engine relies heavily on Knowledge Graphs and Multilingual Embeddings (like mBERT). If a user searches for "bori" or "thaila", the system can map the query to "jute bags" and recommend accordingly. Moreover, Flipkart adjusts recommendations based on the user's phone model and network speed, suggesting lighter apps or fewer images for low-bandwidth users.

13.2. Hotstar: Handling IPL Traffic Spikes

During the Indian Premier League (IPL), Disney+ Hotstar experiences enormous concurrency, with tens of millions of simultaneous viewers. Their recommendation system for VOD (Video on Demand) must degrade gracefully. They pre-compute a massive set of item-item similarities using ALS (Alternating Least Squares) offline and serve these pre-computed recommendations during high-traffic windows via fast Redis caches, rather than evaluating deep neural networks in real time.

13.3. Spotify India: Hyper-Local Music Discovery

When Spotify entered India, they had to tackle the cold start problem for regional music (Punjabi, Tamil, Telugu). They created hybrid models combining acoustic features of the songs (Content-Based) with the listening habits of early adopters (Collaborative). Their "Punjabi 101" and "Bollywood Mush" playlists are curated using a mix of editorial insight and heavy algorithmic collaborative filtering.

14. Global Case Studies

14.1. The Netflix Prize

In 2006, Netflix released about 100 million anonymized movie ratings and offered $1M to anyone who could improve the RMSE of its algorithm (Cinematch) by 10%. The competition ran for 3 years. The winning team, BellKor's Pragmatic Chaos, used a large ensemble of models (the 2007 Progress Prize solution alone blended 107 of them). A key ingredient was Temporal Dynamics: recognizing that user ratings shift over time (e.g., someone's taste in 2005 vs 2009) and that the baseline rating of a movie can change depending on when it was rated.

14.2. Amazon: Item-to-Item Collaborative Filtering

Amazon realized early on that computing user-user similarities on a matrix of millions of users was computationally infeasible in real time. Instead, they pre-computed item-item similarities offline. Because items (a toaster) don't change their characteristics rapidly, this matrix is stable. When you view a toaster, Amazon simply looks up the pre-computed row for that toaster and recommends the top items. Because the expensive similarity computation happens offline and the online lookup depends only on the items a customer is viewing or has bought, not on the total number of customers or products, the approach scales to very large catalogs and has been a backbone of modern e-commerce.
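The offline/online split can be sketched as follows, assuming a small binary purchase matrix (rows are customers, columns are products) and cosine similarity between product columns. The offline step stores each product's top neighbors in a plain dict, a stand-in for whatever key-value store a real system would use, so the online step is just a lookup. Product names and purchases are invented for illustration.

```python
import numpy as np

products = ["toaster", "bread_knife", "butter_dish", "yoga_mat", "water_bottle"]
# Hypothetical purchase matrix: 1 = this customer bought this product.
purchases = np.array([
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
], dtype=float)

def precompute_neighbors(M, names, top_n=2):
    """Offline: cosine similarity between product columns, keep top_n neighbors each."""
    norms = np.linalg.norm(M, axis=0)
    sim = (M.T @ M) / np.outer(norms, norms)
    np.fill_diagonal(sim, -np.inf)                 # never recommend the item itself
    return {names[j]: [names[i] for i in np.argsort(-sim[j])[:top_n]]
            for j in range(len(names))}

related = precompute_neighbors(purchases, products)   # computed once, offline

def also_bought(product):
    """Online: a dictionary lookup, independent of the number of customers."""
    return related.get(product, [])

print(also_bought("toaster"))
# ['butter_dish', 'bread_knife']
```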

14.3. YouTube: Deep Neural Networks for Recommendations (2016)

YouTube processes 500 hours of video uploaded every minute. They formalized the Two-Stage Recommender Pipeline. The first stage (Candidate Generation) takes the user's history and context, and uses an extremely fast multi-class classifier to select hundreds of videos from a corpus of billions. The second stage (Ranking) uses a heavier Deep Neural Network with rich features (time since last watch, language, demographic) to assign a score and rank the final few dozen videos shown to the user.

15. Startup Applications

Many modern startups pivot their entire business model around superior recommendation engines.

EdTech (e.g., Coursera, Unacademy): Knowledge Tracing models. If a student fails a quiz on Backpropagation, the system recommends a prerequisite video on the Chain Rule. Such systems can use Graph Neural Networks (GNNs) over curriculum ontologies.
FoodTech (e.g., Zomato, Swiggy): Context-aware recommenders. The system knows the time of day, weather, and your location. A rainy Sunday morning triggers recommendations for "Hot Samosas and Chai" from nearby highly-rated vendors, factoring in delivery partner availability.
FashionTech (e.g., Myntra): Visual search and recommendation. Using Convolutional Neural Networks (CNNs), the system extracts visual embeddings from a dress you liked and recommends similar patterns, cuts, and colors, completely bypassing textual descriptions.

16. Government Applications

Recommender systems are increasingly used in e-governance to improve citizen engagement and resource allocation.

MyGov Platform: By analyzing a citizen's profile (age, income bracket, state, occupation), the portal can proactively recommend relevant government schemes (Yojanas), subsidies, or tax benefits, drastically reducing the friction in discovering government aid.
Tourism (Incredible India): Personalized itinerary generators. Based on a user's past travel history and explicit preferences (e.g., "spiritual", "adventure"), the platform recommends specific tourist circuits.
Agriculture (Kisan Suvidha): Recommending optimal crop varieties, fertilizers, and sowing times based on the farmer's localized soil data and real-time weather forecasts.

17. Industry Applications Matrix

Here is a summary of how different industries leverage RecSys:

Industry	Primary Algorithm Used	Key Metric / Goal
E-Commerce (Amazon)	Item-Item CF, Association Rules	Conversion Rate, Average Order Value (AOV)
Streaming (Netflix, Spotify)	Matrix Factorization, Deep Learning	Watch Time, Monthly Active Users (MAU)
Social Media (Instagram, TikTok)	Session-Based RNNs, Reinforcement Learning	Session Length, Engagement (Likes/Shares)
Job Portals (LinkedIn, Naukri)	Content-Based, Knowledge Graphs	Click-Through Rate (CTR) on Job Posts

18. Mini Projects

🚀 Career Path: Portfolio Builders

To land a role as a Machine Learning Engineer specializing in personalization, implement these projects and host them on Streamlit.

Project 1: The Classic Movie Recommender

Goal: Build a web app that takes a user's favorite movies and returns 5 recommendations.

Dataset: MovieLens (100K or 1M).
Tech Stack: Pandas, Scikit-Learn (Cosine Similarity for Content-Based on genres/tags), Scikit-Surprise (SVD for Collaborative).
Challenge: Implement a hybrid function that weights Content-Based score 30% and CF score 70%.
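One way to approach the weighting, assuming each model returns a score per candidate movie: the two score types live on different scales (cosine similarities vs. predicted ratings), so this sketch min-max normalizes each to [0, 1] before applying the 30/70 weights. The movie names and scores are made up, and min-max scaling is only one of several reasonable choices.

```python
def min_max(scores):
    """Rescale a {item: score} dict to [0, 1]; constant scores map to 0."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {k: (v - lo) / span if span else 0.0 for k, v in scores.items()}

def hybrid_scores(cb_scores, cf_scores, cb_weight=0.3, cf_weight=0.7):
    """Weighted blend over the movies both models scored."""
    cb, cf = min_max(cb_scores), min_max(cf_scores)
    common = cb.keys() & cf.keys()
    return {m: cb_weight * cb[m] + cf_weight * cf[m] for m in common}

cb_scores = {"Inception": 0.82, "Arrival": 0.64, "Notting Hill": 0.05}   # cosine similarities
cf_scores = {"Inception": 4.1, "Arrival": 4.4, "Notting Hill": 3.2}      # predicted ratings (1-5)
blended = hybrid_scores(cb_scores, cf_scores)
print(sorted(blended, key=blended.get, reverse=True))
# ['Arrival', 'Inception', 'Notting Hill']
```

Without the normalization step, the 1-5 rating scale would swamp the 0-1 cosine scale regardless of the 30/70 weights.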

Project 2: E-commerce "Frequently Bought Together" Engine

Goal: Mine transaction logs to find item associations.

Dataset: Instacart Market Basket Analysis (Kaggle).
Tech Stack: Python, Apriori Algorithm, FP-Growth.
Output: When a user adds "Diapers" to the cart, the system suggests "Baby Wipes" and "Beer" (a classic data mining anecdote).
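Apriori and FP-Growth exist to find frequent itemsets efficiently, but the scores used to rank a "Frequently Bought Together" pair are simple to state. This sketch brute-forces support, confidence, and lift for a rule over a handful of invented baskets; it is fine for illustration but not a substitute for Apriori on a dataset the size of Instacart's.

```python
from collections import Counter
from itertools import combinations

baskets = [
    {"diapers", "baby_wipes", "beer"},
    {"diapers", "baby_wipes"},
    {"diapers", "beer"},
    {"bread", "butter"},
    {"diapers", "baby_wipes", "milk"},
]
n = len(baskets)
item_counts = Counter(item for b in baskets for item in b)
pair_counts = Counter(pair for b in baskets for pair in combinations(sorted(b), 2))

def rule_stats(a, c):
    """Support, confidence and lift of the rule a -> c."""
    both = pair_counts[tuple(sorted((a, c)))]
    support = both / n                          # P(a and c)
    confidence = both / item_counts[a]          # P(c | a)
    lift = confidence / (item_counts[c] / n)    # P(c | a) / P(c)
    return support, confidence, lift

for consequent in ("baby_wipes", "beer"):
    s, conf, lift = rule_stats("diapers", consequent)
    print(f"diapers -> {consequent}: support={s:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

A lift above 1 means the items co-occur more often than they would if purchases were independent.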

Project 3: Session-Based News Recommender

Goal: Recommend the next news article a user will click in their current session, without knowing their long-term history.

Dataset: MIND (Microsoft News Dataset).
Tech Stack: PyTorch or TensorFlow, GRU/LSTM networks.
Architecture: Feed the sequence of clicked article embeddings into an RNN, and use the final hidden state to predict the next click.
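The data flow of this architecture can be sketched with an untrained plain (Elman) RNN in NumPy, assuming articles are already represented by embedding vectors. All weights are random, so the scores mean nothing until the model is trained; the point is the shapes: a session of clicked-article embeddings goes in, the final hidden state is scored against every article, and already-read articles are masked out. A GRU or LSTM, as the project suggests, replaces the single tanh update with gated updates.

```python
import numpy as np

rng = np.random.default_rng(0)
n_articles, emb_dim, hidden = 500, 16, 32
article_emb = rng.normal(scale=0.1, size=(n_articles, emb_dim))   # stand-in article embeddings
W_x = rng.normal(scale=0.1, size=(hidden, emb_dim))                # input-to-hidden weights
W_h = rng.normal(scale=0.1, size=(hidden, hidden))                 # hidden-to-hidden weights
W_out = rng.normal(scale=0.1, size=(n_articles, hidden))           # hidden state -> article scores

def next_click_scores(session):
    """Run the RNN over the clicked article ids and score every article."""
    h = np.zeros(hidden)
    for article_id in session:
        h = np.tanh(W_x @ article_emb[article_id] + W_h @ h)
    scores = W_out @ h
    scores[list(session)] = -np.inf        # do not re-recommend articles already read
    return scores

session = [12, 87, 3]
top5 = np.argsort(-next_click_scores(session))[:5]
print(top5)
```

Training would fit these weights with a cross-entropy loss on the article actually clicked next.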

19. Exercises

Test your practical understanding. Try to solve these on paper or in a Jupyter Notebook.

Compute the Cosine Similarity between Item A [1, 0, 1, 1] and Item B [0, 1, 1, 1].
Write a Python function to compute the Pearson Correlation Coefficient between two arrays.
Explain why Pearson Correlation is sometimes preferred over Cosine Similarity for user ratings. (Hint: Mean centering).
Given a 5x5 rating matrix with 10 known ratings, calculate the sparsity of the matrix.
Perform 1 iteration of Alternating Least Squares (ALS) by hand for a 2x2 matrix.
Implement the TF-IDF formula from scratch without using Scikit-Learn.
How does adding an L2 regularization term to FunkSVD prevent overfitting?
Design a database schema for an e-commerce catalog to support fast Content-Based Filtering.
What is the Cold Start Problem, and name two ways to overcome it for a new user.
What is the Cold Start Problem for a new item? How does Multi-Armed Bandit help?
Implement a basic $\epsilon$-greedy algorithm for recommending 5 new articles.
What is the difference between Explicit Feedback and Implicit Feedback? Give 3 examples of each.
Why is Root Mean Squared Error (RMSE) misleading when evaluating implicit feedback models?
Define Precision@K and Recall@K. Calculate Precision@10 for a system that recommended 10 items, 3 of which the user actually clicked. What extra information do you need to calculate Recall@10?
Explain NDCG (Normalized Discounted Cumulative Gain). Why is order important?
If a recommender only suggests popular items, what problem does it create in the ecosystem?
Design an A/B testing framework to compare two different recommendation models on a website.
What is a Two-Tower Neural Network architecture? Why is it efficient for Candidate Generation?
Explain the concept of 'Filter Bubble' in social media feeds.
How can Knowledge Graphs enhance recommendations over standard Collaborative Filtering?
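For checking answers to the evaluation exercises, here is a minimal sketch of the metrics named in this chapter. It assumes binary relevance for the ranking metrics (an item is either relevant or not) and the common $\log_2(\text{rank} + 1)$ discount for NDCG; graded-relevance variants of NDCG also exist.

```python
import math

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that are relevant."""
    return sum(1 for item in recommended[:k] if item in relevant) / k

def recall_at_k(recommended, relevant, k):
    """Fraction of all relevant items that appear in the top-k."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant) if relevant else 0.0

def ndcg_at_k(recommended, relevant, k):
    """Binary-relevance NDCG: DCG of the list divided by the DCG of an ideal ordering."""
    dcg = sum(1 / math.log2(rank + 1)
              for rank, item in enumerate(recommended[:k], start=1) if item in relevant)
    ideal = sum(1 / math.log2(rank + 1) for rank in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0

recs = ["a", "b", "c", "d", "e"]
relevant = {"a", "d", "z"}
print(precision_at_k(recs, relevant, 5), recall_at_k(recs, relevant, 5))
print(round(ndcg_at_k(recs, relevant, 5), 3))
```

Moving relevant item "d" from rank 4 to rank 2 would leave Precision@5 and Recall@5 unchanged but raise NDCG@5, which is why NDCG is the metric of choice when order matters.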

20. Multiple Choice Questions

1. Which algorithm is best suited when you only have implicit feedback (clicks, views) and large, sparse datasets?

A) K-Nearest Neighbors
B) Alternating Least Squares (ALS)
C) Pearson Correlation
D) TF-IDF

Correct Answer: B. The weighted matrix factorization objective for implicit feedback is typically solved with ALS, which handles it efficiently, especially in distributed environments like Spark.

2. In Content-Based Filtering, what does IDF stand for?

A) Inverse Document Frequency
B) Internal Data Format
C) Item-Document Frequency
D) Inverse Distribution Factor

Correct Answer: A. Inverse Document Frequency penalizes words that appear in too many documents.

3. The Netflix Prize famously utilized which specific variation of Matrix Factorization?

A) Principal Component Analysis
B) Singular Value Decomposition (FunkSVD)
C) Non-Negative Matrix Factorization
D) Independent Component Analysis

Correct Answer: B. FunkSVD uses stochastic gradient descent to fit the factors to the observed ratings only, ignoring missing values. Despite the name, it is not a classical SVD.

4. Which metric heavily penalizes a system if a highly relevant item is placed at rank 10 instead of rank 1?

A) Precision@K
B) RMSE
C) NDCG
D) Recall

Correct Answer: C. NDCG (Normalized Discounted Cumulative Gain) uses a logarithmic discount factor based on the rank position.

5. What is the primary disadvantage of User-Based Collaborative Filtering compared to Item-Based CF in large e-commerce sites?

A) It doesn't use ratings.
B) Users change preferences faster than items change features, making the matrix unstable.
C) It requires deep learning.
D) It cannot recommend new items.

Correct Answer: B. Item-item matrices are much more stable and can be pre-computed offline.

6. A user just signed up and hasn't rated anything. This is known as:

A) The Sparse Matrix Problem
B) The Filter Bubble
C) The Cold Start Problem
D) The Long Tail

Correct Answer: C. The Cold Start Problem.

7. In Neural Collaborative Filtering (NCF), what replaces the standard dot product used in Matrix Factorization?

A) Convolutional Layers
B) A Multi-Layer Perceptron (MLP)
C) Recurrent Neural Networks
D) TF-IDF Vectors

Correct Answer: B. An MLP is used to learn arbitrary non-linear interactions between user and item embeddings.

8. What is "Serendipity" in the context of recommender systems?

A) Recommending the most popular items.
B) Recommending items exactly similar to past history.
C) The ability of the system to recommend surprising, yet appealing items.
D) The speed of the recommendation algorithm.

Correct Answer: C. Serendipity helps break the filter bubble and improves long-term user satisfaction.

9. In a Two-Stage architecture (like YouTube's), what is the goal of the first stage?

A) To perfectly rank 10 items.
B) To quickly reduce the corpus from millions to a few hundred candidates.
C) To extract image features from videos.
D) To compute the final NDCG score.

Correct Answer: B. Candidate Generation focuses on high recall and extremely fast inference.

10. Which technique is used to evaluate a recommender system's impact on actual business metrics (like revenue) in production?

A) Cross-validation
B) RMSE calculation
C) A/B Testing
D) Leave-one-out Evaluation

Correct Answer: C. A/B testing randomly routes users to different models to measure real-world performance.

21. Interview Questions

Top product companies (Amazon, Meta, Netflix) frequently ask these conceptual and architectural questions.

Design Amazon's recommender system: Walk through candidate generation, ranking, and the use of item-item collaborative filtering.
Handling implicit feedback: If a user watches 5 minutes of a 2-hour movie, is that a positive or negative signal? How do you model this mathematically?
Cold Start Mitigation: You are launching a brand new music app. You have 0 users. How do you generate the very first recommendations?
Matrix Factorization vs Deep Learning: When would you choose standard ALS over Neural Collaborative Filtering? Explain the trade-offs in latency and accuracy.
Diversity vs Relevance: How do you modify an algorithm to ensure that the top 5 recommended news articles are not all about the exact same topic?
Metrics interpretation: Your offline NDCG went up by 5%, but in online A/B testing, the click-through rate dropped. What could have caused this?
Real-time updates: How do you update user embeddings in real-time as they are actively clicking items in their current session?
Explain the Math: Write down the loss function for Matrix Factorization with L2 regularization and derive the gradient update rule on the whiteboard.
Scale: How do you compute the nearest neighbors for a user when there are 100 million items? (Hint: Approximate Nearest Neighbors, FAISS, ScaNN).
Bias and Fairness: How do you ensure your job recommendation algorithm doesn't inadvertently discriminate based on gender or geography?

22. Research Problems

For those looking to pursue a Master's or PhD in Recommender Systems, here are some cutting-edge open problems:

Causal Inference in Recommendations: Most recommenders suffer from selection bias (they only learn from items users chose to interact with). How can we use Inverse Probability Weighting (IPW) to estimate unbiased causal effects?
Continual / Lifelong Learning: Neural recommenders that are updated incrementally suffer from "catastrophic forgetting." How do we update them with today's streaming data without forgetting long-term patterns, and without retraining from scratch?
LLMs for RecSys: Can Large Language Models (like GPT-4) be used directly as zero-shot recommenders by feeding a user's history into the prompt? How do we handle users whose years of history exceed the context window?
Federated Recommendations: Designing algorithms in which raw user data never leaves the user's mobile device, yet each device still contributes to a global collaborative model. Keeping data on-device does not by itself guarantee absolute privacy, since shared model updates can leak information, so strengthening these guarantees is part of the open problem.
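To make the IPW idea concrete, the sketch below simulates ratings that are missing not at random: under an assumed selection model, higher ratings are more likely to be observed. A naive MSE computed only on observed entries then underestimates the error of a constant predictor, while weighting each observed error by $1/P(\text{observed})$ recovers the full-matrix MSE in expectation. The propensities here are known by construction; in real systems they must be estimated, which is where much of the research difficulty lies. `naive_mse` and `ips_mse` are illustrative names.

```python
import numpy as np


def naive_mse(r, r_hat, observed):
    """MSE over observed entries only; biased when data is missing not at random."""
    return float(np.mean((r[observed] - r_hat[observed]) ** 2))


def ips_mse(r, r_hat, observed, propensity):
    """Inverse-propensity-weighted estimate of the MSE over ALL user-item pairs.

    Each observed squared error is weighted by 1 / P(observed); dividing by the
    full matrix size makes the estimate unbiased when the propensities are correct.
    """
    weights = np.where(observed, 1.0 / propensity, 0.0)
    return float(np.sum(weights * (r - r_hat) ** 2) / r.size)


rng = np.random.default_rng(0)
n_users, n_items = 2000, 300
r = rng.integers(1, 6, size=(n_users, n_items)).astype(float)   # full "true" rating matrix
# Assumed selection model: a rating of 1 is observed with prob 0.02, a rating of 5 with prob 0.18
propensity = 0.02 + 0.04 * (r - 1)
observed = rng.random(r.shape) < propensity
r_hat = np.full_like(r, 4.0)                                       # constant predictor

true_mse = float(np.mean((r - r_hat) ** 2))
print(f"true={true_mse:.3f} naive={naive_mse(r, r_hat, observed):.3f} "
      f"ips={ips_mse(r, r_hat, observed, propensity):.3f}")
```

Under these assumed propensities the expected naive estimate is 1.4, while both the true MSE and the expected IPS estimate are 3.0, so the printed IPS value should land near the true one.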

23. Key Takeaways

Recommender systems solve information overload by predicting user preferences.
Content-Based Filtering relies on item features; Collaborative Filtering relies on past user-item interactions.
Matrix Factorization (SVD, ALS) is the mathematical bedrock of discovering latent factors.
Modern systems at scale use a Multi-Stage architecture: Candidate Generation (fast retrieval of hundreds to thousands of candidates from millions of items) $\rightarrow$ Ranking (a heavier deep-learning model that scores only those candidates).
Offline metrics (NDCG, Recall@K) are proxies; true success is measured via online A/B Testing.
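The two-stage pattern can be sketched with NumPy. Random vectors stand in for learned user and item embeddings. Stage 1 uses `np.argpartition` to retrieve the top candidates by dot product without sorting the whole catalog (a simple stand-in for an ANN index such as FAISS or ScaNN), and stage 2 re-scores only those candidates. The popularity feature and the 0.5 weight in `rank` are hypothetical placeholders for a trained deep ranking model.

```python
import numpy as np

rng = np.random.default_rng(42)
n_items, dim = 100_000, 32
item_emb = rng.normal(size=(n_items, dim)).astype(np.float32)   # stand-in for learned item embeddings
user_vec = rng.normal(size=dim).astype(np.float32)              # stand-in for a learned user embedding
popularity = rng.integers(0, 10_000, size=n_items)              # hypothetical interaction counts


def generate_candidates(user_vec, item_emb, n_candidates=500):
    """Stage 1: cheap retrieval of the n_candidates items with the highest dot product.

    argpartition runs in linear time and does not sort the candidates it returns.
    """
    scores = item_emb @ user_vec
    return np.argpartition(-scores, n_candidates)[:n_candidates]


def rank(user_vec, candidate_ids, item_emb, popularity, k=10):
    """Stage 2: a 'heavier' scorer applied only to the retrieved candidates.

    Hypothetical scoring: relevance plus a damped popularity feature.
    """
    relevance = item_emb[candidate_ids] @ user_vec
    final = relevance + 0.5 * np.log1p(popularity[candidate_ids])
    order = np.argsort(-final)[:k]
    return candidate_ids[order]


candidates = generate_candidates(user_vec, item_emb)
top_k = rank(user_vec, candidates, item_emb, popularity)
print(len(candidates), top_k)
```

The expensive scorer runs on 500 items instead of 100,000, which is the whole point of the split: the ranking stage can afford richer features because the retrieval stage has already shrunk the search space.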

24. References & Further Reading

Aggarwal, C. C. (2016). Recommender Systems: The Textbook. Springer.
Koren, Y., Bell, R., & Volinsky, C. (2009). Matrix Factorization Techniques for Recommender Systems. IEEE Computer.
Covington, P., Adams, J., & Sargin, E. (2016). Deep Neural Networks for YouTube Recommendations. RecSys '16.
He, X., et al. (2017). Neural Collaborative Filtering. WWW '17.
Surprise Documentation: http://surpriselib.com/
Leskovec, J., Rajaraman, A., & Ullman, J. D. Mining of Massive Datasets (textbook for Stanford CS246), Chapter 9: Recommendation Systems. Cambridge University Press.

Appendix A: Complete Hybrid Recommender Source Code

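For students who want to go beyond the chapter's snippets, here is an extended Python implementation of a weighted Hybrid Recommender System that combines Item-Item Collaborative Filtering (k-NN over the user-item rating matrix) with Content-Based Filtering (TF-IDF over item descriptions), using object-oriented principles. It is meant as a comprehensive teaching reference rather than a production-grade system: it computes a dense item-item similarity matrix and brute-force nearest neighbors, neither of which scales to catalogs with millions of items.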

```python
# ==========================================
# HYBRID RECOMMENDER SYSTEM PIPELINE
# ==========================================
import logging

import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel
from sklearn.neighbors import NearestNeighbors

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class HybridRecommender:
    """
    A weighted Hybrid Recommender combining Item-Item Collaborative Filtering
    with Content-Based Filtering using TF-IDF on item metadata.
    """

    def __init__(self, cf_weight=0.7, cb_weight=0.3):
        self.cf_weight = cf_weight
        self.cb_weight = cb_weight
        self.item_user_matrix = None   # items x users rating matrix used by the CF model
        self.tfidf_matrix = None
        self.cosine_sim = None         # dense item x item TF-IDF similarity
        self.cb_index = {}             # item_id -> row in cosine_sim
        self.cb_items = []             # row in cosine_sim -> item_id
        self.model_knn = NearestNeighbors(metric='cosine', algorithm='brute', n_neighbors=20, n_jobs=-1)
        self.item_mapper = {}
        self.item_inv_mapper = {}
        self.user_mapper = {}
        self.user_inv_mapper = {}

    def fit_collaborative(self, ratings_df, user_col='userId', item_col='movieId', rating_col='rating'):
        """Fits the collaborative filtering model using k-NN on the sparse user-item matrix."""
        logger.info("Fitting Collaborative Filtering model...")
        # Map raw IDs to contiguous matrix indices
        unique_users = ratings_df[user_col].unique()
        unique_items = ratings_df[item_col].unique()
        self.user_mapper = {user_id: i for i, user_id in enumerate(unique_users)}
        self.item_mapper = {item_id: i for i, item_id in enumerate(unique_items)}
        self.user_inv_mapper = dict(enumerate(unique_users))
        self.item_inv_mapper = dict(enumerate(unique_items))
        user_indices = ratings_df[user_col].map(self.user_mapper).to_numpy()
        item_indices = ratings_df[item_col].map(self.item_mapper).to_numpy()
        # Duplicate (user, item) pairs are summed when the sparse matrix is built
        user_item_matrix = csr_matrix(
            (ratings_df[rating_col].to_numpy(dtype=float), (user_indices, item_indices)),
            shape=(len(unique_users), len(unique_items)))
        # Store the transpose in CSR form so that each row is one item's rating vector
        self.item_user_matrix = user_item_matrix.T.tocsr()
        self.model_knn.fit(self.item_user_matrix)
        logger.info("Collaborative Filtering model fitted successfully.")

    def fit_content_based(self, items_df, item_col='movieId', text_col='description'):
        """Fits the content-based model using TF-IDF on item descriptions.
        Call fit_collaborative first: only items known to the CF model are kept."""
        logger.info("Fitting Content-Based Filtering model...")
        # Filter without mutating the caller's DataFrame, and record row <-> item_id explicitly
        items = items_df[items_df[item_col].isin(list(self.item_mapper))].reset_index(drop=True)
        self.cb_items = items[item_col].tolist()
        self.cb_index = {item_id: row for row, item_id in enumerate(self.cb_items)}
        tfidf = TfidfVectorizer(stop_words='english', max_features=5000)
        self.tfidf_matrix = tfidf.fit_transform(items[text_col].fillna(''))
        # TF-IDF rows are L2-normalized by default, so the linear kernel equals cosine similarity
        self.cosine_sim = linear_kernel(self.tfidf_matrix, self.tfidf_matrix)
        logger.info("Content-Based Filtering model fitted successfully.")

    def get_cf_recommendations(self, item_id, n_recommendations=10):
        """Returns {item_id: cosine similarity} for items with the most similar rating vectors."""
        if item_id not in self.item_mapper:
            return {}
        idx = self.item_mapper[item_id]
        n_neighbors = min(n_recommendations + 1, self.item_user_matrix.shape[0])
        distances, indices = self.model_knn.kneighbors(self.item_user_matrix[idx], n_neighbors=n_neighbors)
        # Neighbors come sorted by distance; convert to similarity (1 - distance) and skip the query item
        neighbors = [(self.item_inv_mapper[i], 1.0 - d)
                     for i, d in zip(indices[0], distances[0]) if i != idx]
        return dict(neighbors[:n_recommendations])

    def get_cb_recommendations(self, item_id, n_recommendations=10):
        """Returns {item_id: cosine similarity} for items with the most similar descriptions."""
        if item_id not in self.cb_index:
            return {}
        row = self.cb_index[item_id]
        sims = self.cosine_sim[row]
        order = np.argsort(-sims)                          # most similar first
        order = order[order != row][:n_recommendations]    # drop the query item itself
        return {self.cb_items[i]: float(sims[i]) for i in order}

    def recommend(self, item_id, n_recommendations=10):
        """Blends scores: cf_weight * CF similarity + cb_weight * CB similarity."""
        logger.info(f"Generating hybrid recommendations for item {item_id}")
        # Over-fetch from each model so the merged list still has enough items
        cf_preds = self.get_cf_recommendations(item_id, n_recommendations * 2)
        cb_preds = self.get_cb_recommendations(item_id, n_recommendations * 2)
        hybrid_scores = {}
        for item in set(cf_preds) | set(cb_preds):
            # An item missing from one model's list contributes 0 from that model
            hybrid_scores[item] = (cf_preds.get(item, 0.0) * self.cf_weight) + (cb_preds.get(item, 0.0) * self.cb_weight)
        sorted_hybrid = sorted(hybrid_scores.items(), key=lambda x: x[1], reverse=True)
        return sorted_hybrid[:n_recommendations]


# Example Usage Block
if __name__ == "__main__":
    # Dummy data generation for testing the HybridRecommender
    np.random.seed(42)
    users = np.random.randint(1, 1000, size=5000)
    items = np.random.randint(1, 500, size=5000)
    ratings = np.random.randint(1, 6, size=5000)
    ratings_df = pd.DataFrame({'userId': users, 'movieId': items, 'rating': ratings})

    # Generate dummy item metadata
    unique_items_df = pd.DataFrame({'movieId': np.unique(items)})
    vocab = ["action", "romance", "space", "alien", "comedy",
             "drama", "thriller", "heist", "magic", "historical"]
    unique_items_df['description'] = [" ".join(np.random.choice(vocab, size=5))
                                      for _ in range(len(unique_items_df))]

    recommender = HybridRecommender(cf_weight=0.6, cb_weight=0.4)
    recommender.fit_collaborative(ratings_df)       # builds the item ID mappings
    recommender.fit_content_based(unique_items_df)

    sample_item = unique_items_df['movieId'].iloc[0]
    recs = recommender.recommend(sample_item, n_recommendations=5)
    print(f"Top 5 Hybrid Recommendations for item {sample_item}:")
    for item, score in recs:
        print(f"Item ID: {item}, Score: {score:.4f}")
```

Appendix B: Sample Clickstream Dataset (JSON)

Below is a simulated dataset of roughly 200 user interaction events. It represents the raw implicit feedback (views, clicks, add-to-cart events, and purchases) that a production recommender typically ingests through a streaming platform such as Apache Kafka. You can copy this data to test your algorithms.

[ {"user_id": "U721", "session_id": "S1001", "item_id": "I042", "timestamp": "2023-10-01T08:12:34Z", "event_type": "view", "duration_sec": 45}, {"user_id": "U721", "session_id": "S1001", "item_id": "I089", "timestamp": "2023-10-01T08:14:02Z", "event_type": "click", "duration_sec": 120}, {"user_id": "U314", "session_id": "S1002", "item_id": "I042", "timestamp": "2023-10-01T08:15:10Z", "event_type": "purchase", "duration_sec": 300}, {"user_id": "U089", "session_id": "S1003", "item_id": "I112", "timestamp": "2023-10-01T08:16:05Z", "event_type": "view", "duration_sec": 12}, {"user_id": "U555", "session_id": "S1004", "item_id": "I999", "timestamp": "2023-10-01T08:20:00Z", "event_type": "add_to_cart", "duration_sec": 40}, {"user_id": "U721", "session_id": "S1001", "item_id": "I042", "timestamp": "2023-10-01T08:21:34Z", "event_type": "view", "duration_sec": 145}, {"user_id": "U721", "session_id": "S1001", "item_id": "I089", "timestamp": "2023-10-01T08:24:02Z", "event_type": "click", "duration_sec": 20}, {"user_id": "U314", "session_id": "S1002", "item_id": "I042", "timestamp": "2023-10-01T08:25:10Z", "event_type": "purchase", "duration_sec": 350}, {"user_id": "U089", "session_id": "S1003", "item_id": "I112", "timestamp": "2023-10-01T08:26:05Z", "event_type": "view", "duration_sec": 10}, {"user_id": "U555", "session_id": "S1004", "item_id": "I999", "timestamp": "2023-10-01T08:30:00Z", "event_type": "add_to_cart", "duration_sec": 60}, {"user_id": "U123", "session_id": "S1005", "item_id": "I101", "timestamp": "2023-10-01T08:31:00Z", "event_type": "view", "duration_sec": 5}, {"user_id": "U123", "session_id": "S1005", "item_id": "I102", "timestamp": "2023-10-01T08:32:00Z", "event_type": "view", "duration_sec": 8}, {"user_id": "U123", "session_id": "S1005", "item_id": "I103", "timestamp": "2023-10-01T08:35:00Z", "event_type": "click", "duration_sec": 45}, {"user_id": "U444", "session_id": "S1006", "item_id": "I201", "timestamp": "2023-10-01T08:40:00Z", "event_type": "purchase", 
"duration_sec": 120}, {"user_id": "U444", "session_id": "S1006", "item_id": "I202", "timestamp": "2023-10-01T08:42:00Z", "event_type": "view", "duration_sec": 15}, {"user_id": "U999", "session_id": "S1007", "item_id": "I301", "timestamp": "2023-10-01T08:45:00Z", "event_type": "add_to_cart", "duration_sec": 90}, {"user_id": "U999", "session_id": "S1007", "item_id": "I302", "timestamp": "2023-10-01T08:48:00Z", "event_type": "view", "duration_sec": 22}, {"user_id": "U888", "session_id": "S1008", "item_id": "I401", "timestamp": "2023-10-01T08:50:00Z", "event_type": "click", "duration_sec": 33}, {"user_id": "U888", "session_id": "S1008", "item_id": "I402", "timestamp": "2023-10-01T08:55:00Z", "event_type": "purchase", "duration_sec": 210}, {"user_id": "U777", "session_id": "S1009", "item_id": "I501", "timestamp": "2023-10-01T09:00:00Z", "event_type": "view", "duration_sec": 7}, {"user_id": "U777", "session_id": "S1009", "item_id": "I502", "timestamp": "2023-10-01T09:05:00Z", "event_type": "add_to_cart", "duration_sec": 55}, {"user_id": "U666", "session_id": "S1010", "item_id": "I601", "timestamp": "2023-10-01T09:10:00Z", "event_type": "click", "duration_sec": 110}, {"user_id": "U666", "session_id": "S1010", "item_id": "I602", "timestamp": "2023-10-01T09:15:00Z", "event_type": "view", "duration_sec": 18}, {"user_id": "U555", "session_id": "S1011", "item_id": "I701", "timestamp": "2023-10-01T09:20:00Z", "event_type": "purchase", "duration_sec": 400}, {"user_id": "U555", "session_id": "S1011", "item_id": "I702", "timestamp": "2023-10-01T09:25:00Z", "event_type": "view", "duration_sec": 25}, {"user_id": "U444", "session_id": "S1012", "item_id": "I801", "timestamp": "2023-10-01T09:30:00Z", "event_type": "add_to_cart", "duration_sec": 70}, {"user_id": "U444", "session_id": "S1012", "item_id": "I802", "timestamp": "2023-10-01T09:35:00Z", "event_type": "view", "duration_sec": 12}, {"user_id": "U333", "session_id": "S1013", "item_id": "I901", "timestamp": "2023-10-01T09:40:00Z", 
"event_type": "click", "duration_sec": 65}, {"user_id": "U333", "session_id": "S1013", "item_id": "I902", "timestamp": "2023-10-01T09:45:00Z", "event_type": "purchase", "duration_sec": 280}, {"user_id": "U222", "session_id": "S1014", "item_id": "I011", "timestamp": "2023-10-01T09:50:00Z", "event_type": "view", "duration_sec": 9}, {"user_id": "U222", "session_id": "S1014", "item_id": "I012", "timestamp": "2023-10-01T09:55:00Z", "event_type": "add_to_cart", "duration_sec": 45}, {"user_id": "U111", "session_id": "S1015", "item_id": "I021", "timestamp": "2023-10-01T10:00:00Z", "event_type": "click", "duration_sec": 130}, {"user_id": "U111", "session_id": "S1015", "item_id": "I022", "timestamp": "2023-10-01T10:05:00Z", "event_type": "view", "duration_sec": 20}, {"user_id": "U000", "session_id": "S1016", "item_id": "I031", "timestamp": "2023-10-01T10:10:00Z", "event_type": "purchase", "duration_sec": 500}, {"user_id": "U000", "session_id": "S1016", "item_id": "I032", "timestamp": "2023-10-01T10:15:00Z", "event_type": "view", "duration_sec": 30}, {"user_id": "U123", "session_id": "S1017", "item_id": "I041", "timestamp": "2023-10-01T10:20:00Z", "event_type": "add_to_cart", "duration_sec": 85}, {"user_id": "U123", "session_id": "S1017", "item_id": "I042", "timestamp": "2023-10-01T10:25:00Z", "event_type": "view", "duration_sec": 14}, {"user_id": "U234", "session_id": "S1018", "item_id": "I051", "timestamp": "2023-10-01T10:30:00Z", "event_type": "click", "duration_sec": 75}, {"user_id": "U234", "session_id": "S1018", "item_id": "I052", "timestamp": "2023-10-01T10:35:00Z", "event_type": "purchase", "duration_sec": 320}, {"user_id": "U345", "session_id": "S1019", "item_id": "I061", "timestamp": "2023-10-01T10:40:00Z", "event_type": "view", "duration_sec": 11}, {"user_id": "U345", "session_id": "S1019", "item_id": "I062", "timestamp": "2023-10-01T10:45:00Z", "event_type": "add_to_cart", "duration_sec": 50}, {"user_id": "U456", "session_id": "S1020", "item_id": "I071", 
"timestamp": "2023-10-01T10:50:00Z", "event_type": "click", "duration_sec": 140}, {"user_id": "U456", "session_id": "S1020", "item_id": "I072", "timestamp": "2023-10-01T10:55:00Z", "event_type": "view", "duration_sec": 22}, {"user_id": "U567", "session_id": "S1021", "item_id": "I081", "timestamp": "2023-10-01T11:00:00Z", "event_type": "purchase", "duration_sec": 600}, {"user_id": "U567", "session_id": "S1021", "item_id": "I082", "timestamp": "2023-10-01T11:05:00Z", "event_type": "view", "duration_sec": 35}, {"user_id": "U678", "session_id": "S1022", "item_id": "I091", "timestamp": "2023-10-01T11:10:00Z", "event_type": "add_to_cart", "duration_sec": 95}, {"user_id": "U678", "session_id": "S1022", "item_id": "I092", "timestamp": "2023-10-01T11:15:00Z", "event_type": "view", "duration_sec": 16}, {"user_id": "U789", "session_id": "S1023", "item_id": "I101", "timestamp": "2023-10-01T11:20:00Z", "event_type": "click", "duration_sec": 85}, {"user_id": "U789", "session_id": "S1023", "item_id": "I102", "timestamp": "2023-10-01T11:25:00Z", "event_type": "purchase", "duration_sec": 360}, {"user_id": "U890", "session_id": "S1024", "item_id": "I111", "timestamp": "2023-10-01T11:30:00Z", "event_type": "view", "duration_sec": 13}, {"user_id": "U890", "session_id": "S1024", "item_id": "I112", "timestamp": "2023-10-01T11:35:00Z", "event_type": "add_to_cart", "duration_sec": 60}, {"user_id": "U901", "session_id": "S1025", "item_id": "I121", "timestamp": "2023-10-01T11:40:00Z", "event_type": "click", "duration_sec": 150}, {"user_id": "U901", "session_id": "S1025", "item_id": "I122", "timestamp": "2023-10-01T11:45:00Z", "event_type": "view", "duration_sec": 24}, {"user_id": "U012", "session_id": "S1026", "item_id": "I131", "timestamp": "2023-10-01T11:50:00Z", "event_type": "purchase", "duration_sec": 700}, {"user_id": "U012", "session_id": "S1026", "item_id": "I132", "timestamp": "2023-10-01T11:55:00Z", "event_type": "view", "duration_sec": 40}, {"user_id": "U123", "session_id": 
"S1027", "item_id": "I141", "timestamp": "2023-10-01T12:00:00Z", "event_type": "add_to_cart", "duration_sec": 105}, {"user_id": "U123", "session_id": "S1027", "item_id": "I142", "timestamp": "2023-10-01T12:05:00Z", "event_type": "view", "duration_sec": 18}, {"user_id": "U234", "session_id": "S1028", "item_id": "I151", "timestamp": "2023-10-01T12:10:00Z", "event_type": "click", "duration_sec": 95}, {"user_id": "U234", "session_id": "S1028", "item_id": "I152", "timestamp": "2023-10-01T12:15:00Z", "event_type": "purchase", "duration_sec": 400}, {"user_id": "U345", "session_id": "S1029", "item_id": "I161", "timestamp": "2023-10-01T12:20:00Z", "event_type": "view", "duration_sec": 15}, {"user_id": "U345", "session_id": "S1029", "item_id": "I162", "timestamp": "2023-10-01T12:25:00Z", "event_type": "add_to_cart", "duration_sec": 70}, {"user_id": "U456", "session_id": "S1030", "item_id": "I171", "timestamp": "2023-10-01T12:30:00Z", "event_type": "click", "duration_sec": 160}, {"user_id": "U456", "session_id": "S1030", "item_id": "I172", "timestamp": "2023-10-01T12:35:00Z", "event_type": "view", "duration_sec": 26}, {"user_id": "U567", "session_id": "S1031", "item_id": "I181", "timestamp": "2023-10-01T12:40:00Z", "event_type": "purchase", "duration_sec": 800}, {"user_id": "U567", "session_id": "S1031", "item_id": "I182", "timestamp": "2023-10-01T12:45:00Z", "event_type": "view", "duration_sec": 45}, {"user_id": "U678", "session_id": "S1032", "item_id": "I191", "timestamp": "2023-10-01T12:50:00Z", "event_type": "add_to_cart", "duration_sec": 115}, {"user_id": "U678", "session_id": "S1032", "item_id": "I192", "timestamp": "2023-10-01T12:55:00Z", "event_type": "view", "duration_sec": 20}, {"user_id": "U789", "session_id": "S1033", "item_id": "I201", "timestamp": "2023-10-01T13:00:00Z", "event_type": "click", "duration_sec": 105}, {"user_id": "U789", "session_id": "S1033", "item_id": "I202", "timestamp": "2023-10-01T13:05:00Z", "event_type": "purchase", "duration_sec": 440}, 
{"user_id": "U890", "session_id": "S1034", "item_id": "I211", "timestamp": "2023-10-01T13:10:00Z", "event_type": "view", "duration_sec": 17}, {"user_id": "U890", "session_id": "S1034", "item_id": "I212", "timestamp": "2023-10-01T13:15:00Z", "event_type": "add_to_cart", "duration_sec": 80}, {"user_id": "U901", "session_id": "S1035", "item_id": "I221", "timestamp": "2023-10-01T13:20:00Z", "event_type": "click", "duration_sec": 170}, {"user_id": "U901", "session_id": "S1035", "item_id": "I222", "timestamp": "2023-10-01T13:25:00Z", "event_type": "view", "duration_sec": 28}, {"user_id": "U012", "session_id": "S1036", "item_id": "I231", "timestamp": "2023-10-01T13:30:00Z", "event_type": "purchase", "duration_sec": 900}, {"user_id": "U012", "session_id": "S1036", "item_id": "I232", "timestamp": "2023-10-01T13:35:00Z", "event_type": "view", "duration_sec": 50}, {"user_id": "U123", "session_id": "S1037", "item_id": "I241", "timestamp": "2023-10-01T13:40:00Z", "event_type": "add_to_cart", "duration_sec": 125}, {"user_id": "U123", "session_id": "S1037", "item_id": "I242", "timestamp": "2023-10-01T13:45:00Z", "event_type": "view", "duration_sec": 22}, {"user_id": "U234", "session_id": "S1038", "item_id": "I251", "timestamp": "2023-10-01T13:50:00Z", "event_type": "click", "duration_sec": 115}, {"user_id": "U234", "session_id": "S1038", "item_id": "I252", "timestamp": "2023-10-01T13:55:00Z", "event_type": "purchase", "duration_sec": 480}, {"user_id": "U345", "session_id": "S1039", "item_id": "I261", "timestamp": "2023-10-01T14:00:00Z", "event_type": "view", "duration_sec": 19}, {"user_id": "U345", "session_id": "S1039", "item_id": "I262", "timestamp": "2023-10-01T14:05:00Z", "event_type": "add_to_cart", "duration_sec": 90}, {"user_id": "U456", "session_id": "S1040", "item_id": "I271", "timestamp": "2023-10-01T14:10:00Z", "event_type": "click", "duration_sec": 180}, {"user_id": "U456", "session_id": "S1040", "item_id": "I272", "timestamp": "2023-10-01T14:15:00Z", "event_type": 
"view", "duration_sec": 30}, {"user_id": "U567", "session_id": "S1041", "item_id": "I281", "timestamp": "2023-10-01T14:20:00Z", "event_type": "purchase", "duration_sec": 1000}, {"user_id": "U567", "session_id": "S1041", "item_id": "I282", "timestamp": "2023-10-01T14:25:00Z", "event_type": "view", "duration_sec": 55}, {"user_id": "U678", "session_id": "S1042", "item_id": "I291", "timestamp": "2023-10-01T14:30:00Z", "event_type": "add_to_cart", "duration_sec": 135}, {"user_id": "U678", "session_id": "S1042", "item_id": "I292", "timestamp": "2023-10-01T14:35:00Z", "event_type": "view", "duration_sec": 24}, {"user_id": "U789", "session_id": "S1043", "item_id": "I301", "timestamp": "2023-10-01T14:40:00Z", "event_type": "click", "duration_sec": 125}, {"user_id": "U789", "session_id": "S1043", "item_id": "I302", "timestamp": "2023-10-01T14:45:00Z", "event_type": "purchase", "duration_sec": 520}, {"user_id": "U890", "session_id": "S1044", "item_id": "I311", "timestamp": "2023-10-01T14:50:00Z", "event_type": "view", "duration_sec": 21}, {"user_id": "U890", "session_id": "S1044", "item_id": "I312", "timestamp": "2023-10-01T14:55:00Z", "event_type": "add_to_cart", "duration_sec": 100}, {"user_id": "U901", "session_id": "S1045", "item_id": "I321", "timestamp": "2023-10-01T15:00:00Z", "event_type": "click", "duration_sec": 190}, {"user_id": "U901", "session_id": "S1045", "item_id": "I322", "timestamp": "2023-10-01T15:05:00Z", "event_type": "view", "duration_sec": 32}, {"user_id": "U012", "session_id": "S1046", "item_id": "I331", "timestamp": "2023-10-01T15:10:00Z", "event_type": "purchase", "duration_sec": 1100}, {"user_id": "U012", "session_id": "S1046", "item_id": "I332", "timestamp": "2023-10-01T15:15:00Z", "event_type": "view", "duration_sec": 60}, {"user_id": "U123", "session_id": "S1047", "item_id": "I341", "timestamp": "2023-10-01T15:20:00Z", "event_type": "add_to_cart", "duration_sec": 145}, {"user_id": "U123", "session_id": "S1047", "item_id": "I342", "timestamp": 
"2023-10-01T15:25:00Z", "event_type": "view", "duration_sec": 26}, {"user_id": "U234", "session_id": "S1048", "item_id": "I351", "timestamp": "2023-10-01T15:30:00Z", "event_type": "click", "duration_sec": 135}, {"user_id": "U234", "session_id": "S1048", "item_id": "I352", "timestamp": "2023-10-01T15:35:00Z", "event_type": "purchase", "duration_sec": 560}, {"user_id": "U345", "session_id": "S1049", "item_id": "I361", "timestamp": "2023-10-01T15:40:00Z", "event_type": "view", "duration_sec": 23}, {"user_id": "U345", "session_id": "S1049", "item_id": "I362", "timestamp": "2023-10-01T15:45:00Z", "event_type": "add_to_cart", "duration_sec": 110}, {"user_id": "U456", "session_id": "S1050", "item_id": "I371", "timestamp": "2023-10-01T15:50:00Z", "event_type": "click", "duration_sec": 200}, {"user_id": "U456", "session_id": "S1050", "item_id": "I372", "timestamp": "2023-10-01T15:55:00Z", "event_type": "view", "duration_sec": 34}, {"user_id": "U567", "session_id": "S1051", "item_id": "I381", "timestamp": "2023-10-01T16:00:00Z", "event_type": "purchase", "duration_sec": 1200}, {"user_id": "U567", "session_id": "S1051", "item_id": "I382", "timestamp": "2023-10-01T16:05:00Z", "event_type": "view", "duration_sec": 65}, {"user_id": "U678", "session_id": "S1052", "item_id": "I391", "timestamp": "2023-10-01T16:10:00Z", "event_type": "add_to_cart", "duration_sec": 155}, {"user_id": "U678", "session_id": "S1052", "item_id": "I392", "timestamp": "2023-10-01T16:15:00Z", "event_type": "view", "duration_sec": 28}, {"user_id": "U789", "session_id": "S1053", "item_id": "I401", "timestamp": "2023-10-01T16:20:00Z", "event_type": "click", "duration_sec": 145}, {"user_id": "U789", "session_id": "S1053", "item_id": "I402", "timestamp": "2023-10-01T16:25:00Z", "event_type": "purchase", "duration_sec": 600}, {"user_id": "U890", "session_id": "S1054", "item_id": "I411", "timestamp": "2023-10-01T16:30:00Z", "event_type": "view", "duration_sec": 25}, {"user_id": "U890", "session_id": "S1054", 
"item_id": "I412", "timestamp": "2023-10-01T16:35:00Z", "event_type": "add_to_cart", "duration_sec": 120}, {"user_id": "U901", "session_id": "S1055", "item_id": "I421", "timestamp": "2023-10-01T16:40:00Z", "event_type": "click", "duration_sec": 210}, {"user_id": "U901", "session_id": "S1055", "item_id": "I422", "timestamp": "2023-10-01T16:45:00Z", "event_type": "view", "duration_sec": 36}, {"user_id": "U012", "session_id": "S1056", "item_id": "I431", "timestamp": "2023-10-01T16:50:00Z", "event_type": "purchase", "duration_sec": 1300}, {"user_id": "U012", "session_id": "S1056", "item_id": "I432", "timestamp": "2023-10-01T16:55:00Z", "event_type": "view", "duration_sec": 70}, {"user_id": "U123", "session_id": "S1057", "item_id": "I441", "timestamp": "2023-10-01T17:00:00Z", "event_type": "add_to_cart", "duration_sec": 165}, {"user_id": "U123", "session_id": "S1057", "item_id": "I442", "timestamp": "2023-10-01T17:05:00Z", "event_type": "view", "duration_sec": 30}, {"user_id": "U234", "session_id": "S1058", "item_id": "I451", "timestamp": "2023-10-01T17:10:00Z", "event_type": "click", "duration_sec": 155}, {"user_id": "U234", "session_id": "S1058", "item_id": "I452", "timestamp": "2023-10-01T17:15:00Z", "event_type": "purchase", "duration_sec": 640}, {"user_id": "U345", "session_id": "S1059", "item_id": "I461", "timestamp": "2023-10-01T17:20:00Z", "event_type": "view", "duration_sec": 27}, {"user_id": "U345", "session_id": "S1059", "item_id": "I462", "timestamp": "2023-10-01T17:25:00Z", "event_type": "add_to_cart", "duration_sec": 130}, {"user_id": "U456", "session_id": "S1060", "item_id": "I471", "timestamp": "2023-10-01T17:30:00Z", "event_type": "click", "duration_sec": 220}, {"user_id": "U456", "session_id": "S1060", "item_id": "I472", "timestamp": "2023-10-01T17:35:00Z", "event_type": "view", "duration_sec": 38}, {"user_id": "U567", "session_id": "S1061", "item_id": "I481", "timestamp": "2023-10-01T17:40:00Z", "event_type": "purchase", "duration_sec": 1400}, 
{"user_id": "U567", "session_id": "S1061", "item_id": "I482", "timestamp": "2023-10-01T17:45:00Z", "event_type": "view", "duration_sec": 75}, {"user_id": "U678", "session_id": "S1062", "item_id": "I491", "timestamp": "2023-10-01T17:50:00Z", "event_type": "add_to_cart", "duration_sec": 175}, {"user_id": "U678", "session_id": "S1062", "item_id": "I492", "timestamp": "2023-10-01T17:55:00Z", "event_type": "view", "duration_sec": 32}, {"user_id": "U789", "session_id": "S1063", "item_id": "I501", "timestamp": "2023-10-01T18:00:00Z", "event_type": "click", "duration_sec": 165}, {"user_id": "U789", "session_id": "S1063", "item_id": "I502", "timestamp": "2023-10-01T18:05:00Z", "event_type": "purchase", "duration_sec": 680}, {"user_id": "U890", "session_id": "S1064", "item_id": "I511", "timestamp": "2023-10-01T18:10:00Z", "event_type": "view", "duration_sec": 29}, {"user_id": "U890", "session_id": "S1064", "item_id": "I512", "timestamp": "2023-10-01T18:15:00Z", "event_type": "add_to_cart", "duration_sec": 140}, {"user_id": "U901", "session_id": "S1065", "item_id": "I521", "timestamp": "2023-10-01T18:20:00Z", "event_type": "click", "duration_sec": 230}, {"user_id": "U901", "session_id": "S1065", "item_id": "I522", "timestamp": "2023-10-01T18:25:00Z", "event_type": "view", "duration_sec": 40}, {"user_id": "U012", "session_id": "S1066", "item_id": "I531", "timestamp": "2023-10-01T18:30:00Z", "event_type": "purchase", "duration_sec": 1500}, {"user_id": "U012", "session_id": "S1066", "item_id": "I532", "timestamp": "2023-10-01T18:35:00Z", "event_type": "view", "duration_sec": 80}, {"user_id": "U123", "session_id": "S1067", "item_id": "I541", "timestamp": "2023-10-01T18:40:00Z", "event_type": "add_to_cart", "duration_sec": 185}, {"user_id": "U123", "session_id": "S1067", "item_id": "I542", "timestamp": "2023-10-01T18:45:00Z", "event_type": "view", "duration_sec": 34}, {"user_id": "U234", "session_id": "S1068", "item_id": "I551", "timestamp": "2023-10-01T18:50:00Z", "event_type": 
"click", "duration_sec": 175}, {"user_id": "U234", "session_id": "S1068", "item_id": "I552", "timestamp": "2023-10-01T18:55:00Z", "event_type": "purchase", "duration_sec": 720}, {"user_id": "U345", "session_id": "S1069", "item_id": "I561", "timestamp": "2023-10-01T19:00:00Z", "event_type": "view", "duration_sec": 31}, {"user_id": "U345", "session_id": "S1069", "item_id": "I562", "timestamp": "2023-10-01T19:05:00Z", "event_type": "add_to_cart", "duration_sec": 150}, {"user_id": "U456", "session_id": "S1070", "item_id": "I571", "timestamp": "2023-10-01T19:10:00Z", "event_type": "click", "duration_sec": 240}, {"user_id": "U456", "session_id": "S1070", "item_id": "I572", "timestamp": "2023-10-01T19:15:00Z", "event_type": "view", "duration_sec": 42}, {"user_id": "U567", "session_id": "S1071", "item_id": "I581", "timestamp": "2023-10-01T19:20:00Z", "event_type": "purchase", "duration_sec": 1600}, {"user_id": "U567", "session_id": "S1071", "item_id": "I582", "timestamp": "2023-10-01T19:25:00Z", "event_type": "view", "duration_sec": 85}, {"user_id": "U678", "session_id": "S1072", "item_id": "I591", "timestamp": "2023-10-01T19:30:00Z", "event_type": "add_to_cart", "duration_sec": 195}, {"user_id": "U678", "session_id": "S1072", "item_id": "I592", "timestamp": "2023-10-01T19:35:00Z", "event_type": "view", "duration_sec": 36}, {"user_id": "U789", "session_id": "S1073", "item_id": "I601", "timestamp": "2023-10-01T19:40:00Z", "event_type": "click", "duration_sec": 185}, {"user_id": "U789", "session_id": "S1073", "item_id": "I602", "timestamp": "2023-10-01T19:45:00Z", "event_type": "purchase", "duration_sec": 760}, {"user_id": "U890", "session_id": "S1074", "item_id": "I611", "timestamp": "2023-10-01T19:50:00Z", "event_type": "view", "duration_sec": 33}, {"user_id": "U890", "session_id": "S1074", "item_id": "I612", "timestamp": "2023-10-01T19:55:00Z", "event_type": "add_to_cart", "duration_sec": 160}, {"user_id": "U901", "session_id": "S1075", "item_id": "I621", "timestamp": 
"2023-10-01T20:00:00Z", "event_type": "click", "duration_sec": 250}, {"user_id": "U901", "session_id": "S1075", "item_id": "I622", "timestamp": "2023-10-01T20:05:00Z", "event_type": "view", "duration_sec": 44}, {"user_id": "U012", "session_id": "S1076", "item_id": "I631", "timestamp": "2023-10-01T20:10:00Z", "event_type": "purchase", "duration_sec": 1700}, {"user_id": "U012", "session_id": "S1076", "item_id": "I632", "timestamp": "2023-10-01T20:15:00Z", "event_type": "view", "duration_sec": 90}, {"user_id": "U123", "session_id": "S1077", "item_id": "I641", "timestamp": "2023-10-01T20:20:00Z", "event_type": "add_to_cart", "duration_sec": 205}, {"user_id": "U123", "session_id": "S1077", "item_id": "I642", "timestamp": "2023-10-01T20:25:00Z", "event_type": "view", "duration_sec": 38}, {"user_id": "U234", "session_id": "S1078", "item_id": "I651", "timestamp": "2023-10-01T20:30:00Z", "event_type": "click", "duration_sec": 195}, {"user_id": "U234", "session_id": "S1078", "item_id": "I652", "timestamp": "2023-10-01T20:35:00Z", "event_type": "purchase", "duration_sec": 800}, {"user_id": "U345", "session_id": "S1079", "item_id": "I661", "timestamp": "2023-10-01T20:40:00Z", "event_type": "view", "duration_sec": 35}, {"user_id": "U345", "session_id": "S1079", "item_id": "I662", "timestamp": "2023-10-01T20:45:00Z", "event_type": "add_to_cart", "duration_sec": 170}, {"user_id": "U456", "session_id": "S1080", "item_id": "I671", "timestamp": "2023-10-01T20:50:00Z", "event_type": "click", "duration_sec": 260}, {"user_id": "U456", "session_id": "S1080", "item_id": "I672", "timestamp": "2023-10-01T20:55:00Z", "event_type": "view", "duration_sec": 46}, {"user_id": "U567", "session_id": "S1081", "item_id": "I681", "timestamp": "2023-10-01T21:00:00Z", "event_type": "purchase", "duration_sec": 1800}, {"user_id": "U567", "session_id": "S1081", "item_id": "I682", "timestamp": "2023-10-01T21:05:00Z", "event_type": "view", "duration_sec": 95}, {"user_id": "U678", "session_id": "S1082", 
"item_id": "I691", "timestamp": "2023-10-01T21:10:00Z", "event_type": "add_to_cart", "duration_sec": 215}, {"user_id": "U678", "session_id": "S1082", "item_id": "I692", "timestamp": "2023-10-01T21:15:00Z", "event_type": "view", "duration_sec": 40}, {"user_id": "U789", "session_id": "S1083", "item_id": "I701", "timestamp": "2023-10-01T21:20:00Z", "event_type": "click", "duration_sec": 205}, {"user_id": "U789", "session_id": "S1083", "item_id": "I702", "timestamp": "2023-10-01T21:25:00Z", "event_type": "purchase", "duration_sec": 840}, {"user_id": "U890", "session_id": "S1084", "item_id": "I711", "timestamp": "2023-10-01T21:30:00Z", "event_type": "view", "duration_sec": 37}, {"user_id": "U890", "session_id": "S1084", "item_id": "I712", "timestamp": "2023-10-01T21:35:00Z", "event_type": "add_to_cart", "duration_sec": 180}, {"user_id": "U901", "session_id": "S1085", "item_id": "I721", "timestamp": "2023-10-01T21:40:00Z", "event_type": "click", "duration_sec": 270}, {"user_id": "U901", "session_id": "S1085", "item_id": "I722", "timestamp": "2023-10-01T21:45:00Z", "event_type": "view", "duration_sec": 48}, {"user_id": "U012", "session_id": "S1086", "item_id": "I731", "timestamp": "2023-10-01T21:50:00Z", "event_type": "purchase", "duration_sec": 1900}, {"user_id": "U012", "session_id": "S1086", "item_id": "I732", "timestamp": "2023-10-01T21:55:00Z", "event_type": "view", "duration_sec": 100}, {"user_id": "U123", "session_id": "S1087", "item_id": "I741", "timestamp": "2023-10-01T22:00:00Z", "event_type": "add_to_cart", "duration_sec": 225}, {"user_id": "U123", "session_id": "S1087", "item_id": "I742", "timestamp": "2023-10-01T22:05:00Z", "event_type": "view", "duration_sec": 42}, {"user_id": "U234", "session_id": "S1088", "item_id": "I751", "timestamp": "2023-10-01T22:10:00Z", "event_type": "click", "duration_sec": 215}, {"user_id": "U234", "session_id": "S1088", "item_id": "I752", "timestamp": "2023-10-01T22:15:00Z", "event_type": "purchase", "duration_sec": 880}, 
{"user_id": "U345", "session_id": "S1089", "item_id": "I761", "timestamp": "2023-10-01T22:20:00Z", "event_type": "view", "duration_sec": 39}, {"user_id": "U345", "session_id": "S1089", "item_id": "I762", "timestamp": "2023-10-01T22:25:00Z", "event_type": "add_to_cart", "duration_sec": 190}, {"user_id": "U456", "session_id": "S1090", "item_id": "I771", "timestamp": "2023-10-01T22:30:00Z", "event_type": "click", "duration_sec": 280}, {"user_id": "U456", "session_id": "S1090", "item_id": "I772", "timestamp": "2023-10-01T22:35:00Z", "event_type": "view", "duration_sec": 50}, {"user_id": "U567", "session_id": "S1091", "item_id": "I781", "timestamp": "2023-10-01T22:40:00Z", "event_type": "purchase", "duration_sec": 2000}, {"user_id": "U567", "session_id": "S1091", "item_id": "I782", "timestamp": "2023-10-01T22:45:00Z", "event_type": "view", "duration_sec": 105}, {"user_id": "U678", "session_id": "S1092", "item_id": "I791", "timestamp": "2023-10-01T22:50:00Z", "event_type": "add_to_cart", "duration_sec": 235}, {"user_id": "U678", "session_id": "S1092", "item_id": "I792", "timestamp": "2023-10-01T22:55:00Z", "event_type": "view", "duration_sec": 44}, {"user_id": "U789", "session_id": "S1093", "item_id": "I801", "timestamp": "2023-10-01T23:00:00Z", "event_type": "click", "duration_sec": 225}, {"user_id": "U789", "session_id": "S1093", "item_id": "I802", "timestamp": "2023-10-01T23:05:00Z", "event_type": "purchase", "duration_sec": 920}, {"user_id": "U890", "session_id": "S1094", "item_id": "I811", "timestamp": "2023-10-01T23:10:00Z", "event_type": "view", "duration_sec": 41}, {"user_id": "U890", "session_id": "S1094", "item_id": "I812", "timestamp": "2023-10-01T23:15:00Z", "event_type": "add_to_cart", "duration_sec": 200}, {"user_id": "U901", "session_id": "S1095", "item_id": "I821", "timestamp": "2023-10-01T23:20:00Z", "event_type": "click", "duration_sec": 290}, {"user_id": "U901", "session_id": "S1095", "item_id": "I822", "timestamp": "2023-10-01T23:25:00Z", 
"event_type": "view", "duration_sec": 52}, {"user_id": "U012", "session_id": "S1096", "item_id": "I831", "timestamp": "2023-10-01T23:30:00Z", "event_type": "purchase", "duration_sec": 2100}, {"user_id": "U012", "session_id": "S1096", "item_id": "I832", "timestamp": "2023-10-01T23:35:00Z", "event_type": "view", "duration_sec": 110}, {"user_id": "U123", "session_id": "S1097", "item_id": "I841", "timestamp": "2023-10-01T23:40:00Z", "event_type": "add_to_cart", "duration_sec": 245}, {"user_id": "U123", "session_id": "S1097", "item_id": "I842", "timestamp": "2023-10-01T23:45:00Z", "event_type": "view", "duration_sec": 46}, {"user_id": "U234", "session_id": "S1098", "item_id": "I851", "timestamp": "2023-10-01T23:50:00Z", "event_type": "click", "duration_sec": 235}, {"user_id": "U234", "session_id": "S1098", "item_id": "I852", "timestamp": "2023-10-01T23:55:00Z", "event_type": "purchase", "duration_sec": 960}, {"user_id": "U345", "session_id": "S1099", "item_id": "I861", "timestamp": "2023-10-02T00:00:00Z", "event_type": "view", "duration_sec": 43}, {"user_id": "U345", "session_id": "S1099", "item_id": "I862", "timestamp": "2023-10-02T00:05:00Z", "event_type": "add_to_cart", "duration_sec": 210}, {"user_id": "U456", "session_id": "S1100", "item_id": "I871", "timestamp": "2023-10-02T00:10:00Z", "event_type": "click", "duration_sec": 300}, {"user_id": "U456", "session_id": "S1100", "item_id": "I872", "timestamp": "2023-10-02T00:15:00Z", "event_type": "view", "duration_sec": 54} ]
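As a rough sketch of how this clickstream can feed the models in this chapter, the snippet below parses six events copied from the end of the dataset. It groups them into time-ordered item sequences per session, the kind of input a session-based RNN consumes, and sums per-event weights into implicit-feedback confidence scores per (user, item) pair. The `EVENT_WEIGHTS` values and the helper names `sessionize` and `implicit_confidence` are hypothetical choices for illustration. The dataset itself does not define how much a purchase should count relative to a view.

```python
import json
from collections import defaultdict

# Six events copied verbatim from the end of Appendix B (same field names).
RAW_EVENTS = """[
  {"user_id": "U012", "session_id": "S1096", "item_id": "I831", "timestamp": "2023-10-01T23:30:00Z", "event_type": "purchase", "duration_sec": 2100},
  {"user_id": "U012", "session_id": "S1096", "item_id": "I832", "timestamp": "2023-10-01T23:35:00Z", "event_type": "view", "duration_sec": 110},
  {"user_id": "U123", "session_id": "S1097", "item_id": "I841", "timestamp": "2023-10-01T23:40:00Z", "event_type": "add_to_cart", "duration_sec": 245},
  {"user_id": "U123", "session_id": "S1097", "item_id": "I842", "timestamp": "2023-10-01T23:45:00Z", "event_type": "view", "duration_sec": 46},
  {"user_id": "U234", "session_id": "S1098", "item_id": "I851", "timestamp": "2023-10-01T23:50:00Z", "event_type": "click", "duration_sec": 235},
  {"user_id": "U234", "session_id": "S1098", "item_id": "I852", "timestamp": "2023-10-01T23:55:00Z", "event_type": "purchase", "duration_sec": 960}
]"""

# Hypothetical confidence weight per event type (an assumption, not part of the data).
EVENT_WEIGHTS = {"view": 1.0, "click": 2.0, "add_to_cart": 3.0, "purchase": 5.0}


def sessionize(events):
    """Group events by session_id and return each session's item_ids in time order."""
    sessions = defaultdict(list)
    for e in events:
        sessions[e["session_id"]].append(e)
    # ISO-8601 UTC strings in one fixed format sort chronologically as plain strings.
    return {
        sid: [e["item_id"] for e in sorted(evts, key=lambda e: e["timestamp"])]
        for sid, evts in sessions.items()
    }


def implicit_confidence(events, weights=EVENT_WEIGHTS):
    """Sum event weights into a {(user_id, item_id): confidence} dictionary."""
    conf = defaultdict(float)
    for e in events:
        conf[(e["user_id"], e["item_id"])] += weights.get(e["event_type"], 0.0)
    return dict(conf)


events = json.loads(RAW_EVENTS)
sessions = sessionize(events)
confidence = implicit_confidence(events)
print(sessions["S1096"])             # ['I831', 'I832']
print(confidence[("U234", "I852")])  # 5.0
```

The session sequences are what a next-item model would train on. The confidence dictionary is the sparse input an implicit-feedback matrix factorization (such as ALS) would use in place of explicit ratings.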

Appendix C: Sample Item Metadata Dataset (JSON)

To complement the clickstream data in Appendix B, here is the simulated item metadata dataset used for Content-Based Filtering.

[ {"item_id": "I042", "title": "The Quantum Paradox", "genres": ["Sci-Fi", "Thriller"], "release_year": 2021, "rating": 4.5}, {"item_id": "I089", "title": "Romantic Echoes", "genres": ["Romance", "Drama"], "release_year": 2019, "rating": 3.8}, {"item_id": "I112", "title": "Operation Alpha", "genres": ["Action", "War"], "release_year": 2020, "rating": 4.1}, {"item_id": "I999", "title": "Comedy Central Live", "genres": ["Comedy", "Stand-Up"], "release_year": 2022, "rating": 4.8}, {"item_id": "I101", "title": "Deep Ocean Secrets", "genres": ["Documentary", "Nature"], "release_year": 2018, "rating": 4.9}, {"item_id": "I102", "title": "Space Colonists", "genres": ["Sci-Fi", "Adventure"], "release_year": 2023, "rating": 4.2}, {"item_id": "I103", "title": "Historical Battles", "genres": ["History", "Documentary"], "release_year": 2015, "rating": 4.6}, {"item_id": "I201", "title": "The Last Samurai", "genres": ["Action", "History"], "release_year": 2003, "rating": 4.7}, {"item_id": "I202", "title": "Medieval Knights", "genres": ["Action", "Drama"], "release_year": 2010, "rating": 3.9}, {"item_id": "I301", "title": "Future Tech Review", "genres": ["Technology", "News"], "release_year": 2024, "rating": 4.0}, {"item_id": "I302", "title": "AI Revolution", "genres": ["Documentary", "Tech"], "release_year": 2022, "rating": 4.4}, {"item_id": "I401", "title": "Culinary Journey", "genres": ["Cooking", "Travel"], "release_year": 2019, "rating": 4.3}, {"item_id": "I402", "title": "Street Food Masters", "genres": ["Cooking", "Reality"], "release_year": 2021, "rating": 4.5}, {"item_id": "I501", "title": "Guitar for Beginners", "genres": ["Education", "Music"], "release_year": 2020, "rating": 4.8}, {"item_id": "I502", "title": "Advanced Piano", "genres": ["Education", "Music"], "release_year": 2018, "rating": 4.6}, {"item_id": "I601", "title": "Yoga Daily", "genres": ["Health", "Fitness"], "release_year": 2022, "rating": 4.9}, {"item_id": "I602", "title": "HIIT Workout", "genres": 
["Health", "Fitness"], "release_year": 2021, "rating": 4.7}, {"item_id": "I701", "title": "Mystery in the Alps", "genres": ["Mystery", "Thriller"], "release_year": 2017, "rating": 4.1}, {"item_id": "I702", "title": "Detective Noir", "genres": ["Mystery", "Crime"], "release_year": 2014, "rating": 4.3}, {"item_id": "I801", "title": "Fantasy Worlds", "genres": ["Fantasy", "Adventure"], "release_year": 2022, "rating": 4.5}, {"item_id": "I802", "title": "Dragon Riders", "genres": ["Fantasy", "Action"], "release_year": 2019, "rating": 4.2}, {"item_id": "I901", "title": "Horror House", "genres": ["Horror", "Thriller"], "release_year": 2018, "rating": 3.7}, {"item_id": "I902", "title": "Vampire Diaries: Extended", "genres": ["Horror", "Romance"], "release_year": 2015, "rating": 4.0}, {"item_id": "I011", "title": "Anime Classics", "genres": ["Anime", "Action"], "release_year": 2010, "rating": 4.8}, {"item_id": "I012", "title": "Mecha Warriors", "genres": ["Anime", "Sci-Fi"], "release_year": 2016, "rating": 4.4}, {"item_id": "I021", "title": "K-Drama Hits", "genres": ["Romance", "Drama"], "release_year": 2021, "rating": 4.7}, {"item_id": "I022", "title": "Seoul Nights", "genres": ["Drama", "Thriller"], "release_year": 2022, "rating": 4.5}, {"item_id": "I031", "title": "Bollywood Blockbuster", "genres": ["Action", "Romance"], "release_year": 2019, "rating": 4.2}, {"item_id": "I032", "title": "Indian Indie Films", "genres": ["Drama", "Art"], "release_year": 2020, "rating": 4.6}, {"item_id": "I041", "title": "French Cinema", "genres": ["Drama", "Romance"], "release_year": 2017, "rating": 4.3}, {"item_id": "I051", "title": "Spanish Telenovela", "genres": ["Drama", "Soap"], "release_year": 2018, "rating": 3.9}, {"item_id": "I052", "title": "Mexican Cartel Docs", "genres": ["Documentary", "Crime"], "release_year": 2021, "rating": 4.5}, {"item_id": "I061", "title": "Wildlife Africa", "genres": ["Nature", "Documentary"], "release_year": 2019, "rating": 4.9}, {"item_id": "I062", 
"title": "Amazon Rainforest", "genres": ["Nature", "Documentary"], "release_year": 2020, "rating": 4.8}, {"item_id": "I071", "title": "Cars & Coffee", "genres": ["Automotive", "Reality"], "release_year": 2022, "rating": 4.1}, {"item_id": "I072", "title": "Supercar Showdown", "genres": ["Automotive", "Action"], "release_year": 2023, "rating": 4.4}, {"item_id": "I081", "title": "Home Improvement", "genres": ["DIY", "Reality"], "release_year": 2015, "rating": 4.0}, {"item_id": "I082", "title": "Gardening Tips", "genres": ["DIY", "Education"], "release_year": 2018, "rating": 4.2}, {"item_id": "I091", "title": "Pet Care 101", "genres": ["Pets", "Education"], "release_year": 2021, "rating": 4.6}, {"item_id": "I092", "title": "Funny Dogs Compilation", "genres": ["Pets", "Comedy"], "release_year": 2022, "rating": 4.9}, {"item_id": "I111", "title": "Stock Market Basics", "genres": ["Finance", "Education"], "release_year": 2020, "rating": 4.4}, {"item_id": "I121", "title": "Crypto Trends", "genres": ["Finance", "News"], "release_year": 2023, "rating": 3.8}, {"item_id": "I122", "title": "Real Estate Investing", "genres": ["Finance", "Education"], "release_year": 2021, "rating": 4.5}, {"item_id": "I131", "title": "Learn Spanish", "genres": ["Language", "Education"], "release_year": 2019, "rating": 4.7}, {"item_id": "I132", "title": "Learn Japanese", "genres": ["Language", "Education"], "release_year": 2020, "rating": 4.8}, {"item_id": "I141", "title": "World Geography", "genres": ["Education", "Documentary"], "release_year": 2017, "rating": 4.3}, {"item_id": "I142", "title": "Space Exploration", "genres": ["Science", "Documentary"], "release_year": 2022, "rating": 4.9}, {"item_id": "I151", "title": "Physics for Kids", "genres": ["Kids", "Education"], "release_year": 2021, "rating": 4.5}, {"item_id": "I152", "title": "Chemistry Experiments", "genres": ["Science", "Education"], "release_year": 2018, "rating": 4.6}, {"item_id": "I161", "title": "Magic Tricks Revealed", "genres": 
["Entertainment", "Reality"], "release_year": 2016, "rating": 4.1}, {"item_id": "I162", "title": "Got Talent Highlights", "genres": ["Entertainment", "Reality"], "release_year": 2023, "rating": 4.4}, {"item_id": "I171", "title": "Chess Masterclass", "genres": ["Gaming", "Education"], "release_year": 2020, "rating": 4.9}, {"item_id": "I172", "title": "Esports Finals 2023", "genres": ["Gaming", "Action"], "release_year": 2023, "rating": 4.7}, {"item_id": "I181", "title": "Speedrunning Zelda", "genres": ["Gaming", "Entertainment"], "release_year": 2022, "rating": 4.6}, {"item_id": "I182", "title": "Minecraft Builds", "genres": ["Gaming", "Creative"], "release_year": 2021, "rating": 4.8}, {"item_id": "I191", "title": "Digital Art Tutorial", "genres": ["Art", "Education"], "release_year": 2020, "rating": 4.7}, {"item_id": "I192", "title": "Oil Painting Basics", "genres": ["Art", "Education"], "release_year": 2019, "rating": 4.5}, {"item_id": "I211", "title": "Fashion Week 2022", "genres": ["Fashion", "News"], "release_year": 2022, "rating": 4.0}, {"item_id": "I212", "title": "Makeup Trends", "genres": ["Beauty", "Fashion"], "release_year": 2023, "rating": 4.3}, {"item_id": "I221", "title": "Skincare Routines", "genres": ["Beauty", "Health"], "release_year": 2021, "rating": 4.6}, {"item_id": "I222", "title": "Hairstyle Hacks", "genres": ["Beauty", "Fashion"], "release_year": 2020, "rating": 4.4}, {"item_id": "I231", "title": "Tech Gadgets 2024", "genres": ["Technology", "Review"], "release_year": 2024, "rating": 4.8}, {"item_id": "I232", "title": "Smartphone Showdown", "genres": ["Technology", "Review"], "release_year": 2023, "rating": 4.5}, {"item_id": "I241", "title": "Laptop Buying Guide", "genres": ["Technology", "Education"], "release_year": 2022, "rating": 4.7}, {"item_id": "I242", "title": "PC Building Tutorial", "genres": ["Technology", "DIY"], "release_year": 2021, "rating": 4.9}, {"item_id": "I251", "title": "Coding in Python", "genres": ["Technology", 
"Education"], "release_year": 2020, "rating": 4.9}, {"item_id": "I252", "title": "Web Dev Bootcamp", "genres": ["Technology", "Education"], "release_year": 2021, "rating": 4.8}, {"item_id": "I261", "title": "App Development", "genres": ["Technology", "Education"], "release_year": 2022, "rating": 4.7}, {"item_id": "I262", "title": "Cloud Architecture", "genres": ["Technology", "Education"], "release_year": 2023, "rating": 4.6}, {"item_id": "I271", "title": "Cybersecurity 101", "genres": ["Technology", "Security"], "release_year": 2021, "rating": 4.8}, {"item_id": "I272", "title": "Ethical Hacking", "genres": ["Technology", "Security"], "release_year": 2022, "rating": 4.7}, {"item_id": "I281", "title": "Blockchain Explained", "genres": ["Technology", "Finance"], "release_year": 2020, "rating": 4.4}, {"item_id": "I282", "title": "Quantum Computing", "genres": ["Technology", "Science"], "release_year": 2023, "rating": 4.5}, {"item_id": "I291", "title": "Machine Learning", "genres": ["Technology", "AI"], "release_year": 2021, "rating": 4.9}, {"item_id": "I292", "title": "Deep Learning Models", "genres": ["Technology", "AI"], "release_year": 2022, "rating": 4.8}, {"item_id": "I311", "title": "Data Visualization", "genres": ["Data Science", "Education"], "release_year": 2020, "rating": 4.6}, {"item_id": "I312", "title": "Big Data Engineering", "genres": ["Data Science", "Education"], "release_year": 2021, "rating": 4.7}, {"item_id": "I321", "title": "Statistics for DS", "genres": ["Data Science", "Math"], "release_year": 2019, "rating": 4.8}, {"item_id": "I322", "title": "SQL Mastery", "genres": ["Data Science", "Database"], "release_year": 2020, "rating": 4.9}, {"item_id": "I331", "title": "Agile Methodologies", "genres": ["Management", "Education"], "release_year": 2018, "rating": 4.5}, {"item_id": "I332", "title": "Project Management", "genres": ["Management", "Education"], "release_year": 2019, "rating": 4.6}, {"item_id": "I341", "title": "Leadership Skills", 
"genres": ["Business", "Education"], "release_year": 2020, "rating": 4.7}, {"item_id": "I342", "title": "Public Speaking", "genres": ["Business", "Education"], "release_year": 2021, "rating": 4.8}, {"item_id": "I351", "title": "Negotiation Tactics", "genres": ["Business", "Education"], "release_year": 2022, "rating": 4.6}, {"item_id": "I352", "title": "Sales Strategies", "genres": ["Business", "Education"], "release_year": 2023, "rating": 4.5}, {"item_id": "I361", "title": "Marketing 101", "genres": ["Marketing", "Education"], "release_year": 2020, "rating": 4.7}, {"item_id": "I362", "title": "Digital Marketing", "genres": ["Marketing", "Education"], "release_year": 2021, "rating": 4.8}, {"item_id": "I371", "title": "SEO Basics", "genres": ["Marketing", "Tech"], "release_year": 2019, "rating": 4.6}, {"item_id": "I372", "title": "Social Media Ads", "genres": ["Marketing", "Tech"], "release_year": 2020, "rating": 4.5}, {"item_id": "I381", "title": "Content Creation", "genres": ["Creative", "Marketing"], "release_year": 2021, "rating": 4.7}, {"item_id": "I382", "title": "Video Editing", "genres": ["Creative", "Tech"], "release_year": 2022, "rating": 4.8}, {"item_id": "I391", "title": "Photography Tips", "genres": ["Creative", "Art"], "release_year": 2020, "rating": 4.9}, {"item_id": "I392", "title": "Lighting Mastery", "genres": ["Creative", "Art"], "release_year": 2021, "rating": 4.7}, {"item_id": "I411", "title": "Music Production", "genres": ["Music", "Tech"], "release_year": 2019, "rating": 4.8}, {"item_id": "I412", "title": "Mixing & Mastering", "genres": ["Music", "Tech"], "release_year": 2020, "rating": 4.7}, {"item_id": "I421", "title": "Singing Techniques", "genres": ["Music", "Art"], "release_year": 2021, "rating": 4.6}, {"item_id": "I422", "title": "Songwriting", "genres": ["Music", "Creative"], "release_year": 2022, "rating": 4.8}, {"item_id": "I431", "title": "Acting Basics", "genres": ["Acting", "Art"], "release_year": 2018, "rating": 4.5}, {"item_id": 
"I432", "title": "Improv Comedy", "genres": ["Acting", "Comedy"], "release_year": 2019, "rating": 4.7}, {"item_id": "I441", "title": "Standup Specials", "genres": ["Comedy", "Entertainment"], "release_year": 2020, "rating": 4.9}, {"item_id": "I442", "title": "Sketch Shows", "genres": ["Comedy", "Entertainment"], "release_year": 2021, "rating": 4.6}, {"item_id": "I451", "title": "Late Night Highlights", "genres": ["Comedy", "Talk Show"], "release_year": 2022, "rating": 4.5}, {"item_id": "I452", "title": "Podcast Clips", "genres": ["Entertainment", "Talk Show"], "release_year": 2023, "rating": 4.7}, {"item_id": "I461", "title": "True Crime Stories", "genres": ["Crime", "Documentary"], "release_year": 2020, "rating": 4.8}, {"item_id": "I462", "title": "Unsolved Mysteries", "genres": ["Mystery", "Documentary"], "release_year": 2021, "rating": 4.7}, {"item_id": "I471", "title": "Paranormal Investigations", "genres": ["Horror", "Reality"], "release_year": 2019, "rating": 4.4}, {"item_id": "I472", "title": "Ghost Hunters", "genres": ["Horror", "Reality"], "release_year": 2020, "rating": 4.3}, {"item_id": "I481", "title": "Survival Skills", "genres": ["Adventure", "Reality"], "release_year": 2021, "rating": 4.6}, {"item_id": "I482", "title": "Extreme Sports", "genres": ["Sports", "Action"], "release_year": 2022, "rating": 4.8}, {"item_id": "I491", "title": "Football Highlights", "genres": ["Sports", "News"], "release_year": 2023, "rating": 4.9}, {"item_id": "I492", "title": "Basketball Finals", "genres": ["Sports", "Action"], "release_year": 2022, "rating": 4.8}, {"item_id": "I511", "title": "Olympics Recap", "genres": ["Sports", "Documentary"], "release_year": 2021, "rating": 4.7}, {"item_id": "I512", "title": "Tennis Grand Slams", "genres": ["Sports", "Action"], "release_year": 2020, "rating": 4.6}, {"item_id": "I521", "title": "Golf Masters", "genres": ["Sports", "Action"], "release_year": 2019, "rating": 4.5}, {"item_id": "I522", "title": "F1 Racing", "genres": 
["Sports", "Action"], "release_year": 2023, "rating": 4.8}, {"item_id": "I531", "title": "Boxing Legends", "genres": ["Sports", "Documentary"], "release_year": 2018, "rating": 4.7}, {"item_id": "I532", "title": "MMA Knockouts", "genres": ["Sports", "Action"], "release_year": 2022, "rating": 4.9}, {"item_id": "I541", "title": "Skateboarding Tricks", "genres": ["Sports", "Action"], "release_year": 2021, "rating": 4.6}, {"item_id": "I542", "title": "Surfing Giants", "genres": ["Sports", "Adventure"], "release_year": 2020, "rating": 4.8}, {"item_id": "I551", "title": "Mountain Biking", "genres": ["Sports", "Adventure"], "release_year": 2019, "rating": 4.7}, {"item_id": "I552", "title": "Snowboarding Jumps", "genres": ["Sports", "Action"], "release_year": 2022, "rating": 4.6}, {"item_id": "I561", "title": "Travel Vlogs", "genres": ["Travel", "Reality"], "release_year": 2021, "rating": 4.8}, {"item_id": "I562", "title": "Europe Backpacking", "genres": ["Travel", "Documentary"], "release_year": 2020, "rating": 4.7}, {"item_id": "I571", "title": "Asia Street Markets", "genres": ["Travel", "Culture"], "release_year": 2019, "rating": 4.9}, {"item_id": "I572", "title": "Americas Road Trip", "genres": ["Travel", "Adventure"], "release_year": 2022, "rating": 4.8}, {"item_id": "I581", "title": "Luxury Hotels", "genres": ["Travel", "Review"], "release_year": 2021, "rating": 4.5}, {"item_id": "I582", "title": "Budget Travel", "genres": ["Travel", "Tips"], "release_year": 2020, "rating": 4.6}, {"item_id": "I591", "title": "Cruise Ship Tours", "genres": ["Travel", "Review"], "release_year": 2018, "rating": 4.4}, {"item_id": "I592", "title": "Desert Safaris", "genres": ["Travel", "Adventure"], "release_year": 2023, "rating": 4.7}, {"item_id": "I611", "title": "Ancient Egypt", "genres": ["History", "Documentary"], "release_year": 2017, "rating": 4.8}, {"item_id": "I612", "title": "Roman Empire", "genres": ["History", "Documentary"], "release_year": 2016, "rating": 4.7}, {"item_id": 
"I621", "title": "World War II", "genres": ["History", "War"], "release_year": 2015, "rating": 4.9}, {"item_id": "I622", "title": "Cold War Secrets", "genres": ["History", "Documentary"], "release_year": 2018, "rating": 4.6}, {"item_id": "I631", "title": "Industrial Revolution", "genres": ["History", "Documentary"], "release_year": 2019, "rating": 4.5}, {"item_id": "I632", "title": "Space Race", "genres": ["History", "Science"], "release_year": 2020, "rating": 4.8}, {"item_id": "I641", "title": "Dinosaurs Alive", "genres": ["Science", "Documentary"], "release_year": 2021, "rating": 4.7}, {"item_id": "I642", "title": "Evolution Theory", "genres": ["Science", "Education"], "release_year": 2019, "rating": 4.6}, {"item_id": "I651", "title": "Human Body Facts", "genres": ["Science", "Health"], "release_year": 2022, "rating": 4.8}, {"item_id": "I652", "title": "Brain Functions", "genres": ["Science", "Health"], "release_year": 2023, "rating": 4.9}, {"item_id": "I661", "title": "Philosophy Basics", "genres": ["Education", "Philosophy"], "release_year": 2020, "rating": 4.5}, {"item_id": "I662", "title": "Psychology 101", "genres": ["Education", "Psychology"], "release_year": 2021, "rating": 4.7}, {"item_id": "I671", "title": "Sociology Concepts", "genres": ["Education", "Sociology"], "release_year": 2019, "rating": 4.6}, {"item_id": "I672", "title": "Political Science", "genres": ["Education", "Politics"], "release_year": 2022, "rating": 4.4}, {"item_id": "I681", "title": "Economics for Beginners", "genres": ["Education", "Economics"], "release_year": 2020, "rating": 4.8}, {"item_id": "I682", "title": "Macroeconomics", "genres": ["Education", "Economics"], "release_year": 2021, "rating": 4.7}, {"item_id": "I691", "title": "Law and Order", "genres": ["Education", "Law"], "release_year": 2018, "rating": 4.5}, {"item_id": "I692", "title": "Criminal Justice", "genres": ["Education", "Law"], "release_year": 2019, "rating": 4.6}, {"item_id": "I711", "title": "Medical Anomalies", 
"genres": ["Health", "Documentary"], "release_year": 2020, "rating": 4.8}, {"item_id": "I712", "title": "ER Stories", "genres": ["Health", "Reality"], "release_year": 2021, "rating": 4.7}, {"item_id": "I721", "title": "Healthy Eating", "genres": ["Health", "Diet"], "release_year": 2019, "rating": 4.6}, {"item_id": "I722", "title": "Vegan Recipes", "genres": ["Health", "Diet"], "release_year": 2022, "rating": 4.5}, {"item_id": "I731", "title": "Mental Health Awareness", "genres": ["Health", "Psychology"], "release_year": 2023, "rating": 4.9}, {"item_id": "I732", "title": "Meditation Guide", "genres": ["Health", "Wellness"], "release_year": 2021, "rating": 4.8}, {"item_id": "I741", "title": "Sleep Science", "genres": ["Health", "Science"], "release_year": 2020, "rating": 4.7}, {"item_id": "I742", "title": "Posture Correction", "genres": ["Health", "Fitness"], "release_year": 2018, "rating": 4.4}, {"item_id": "I751", "title": "Marathon Training", "genres": ["Fitness", "Sports"], "release_year": 2019, "rating": 4.6}, {"item_id": "I752", "title": "Bodybuilding Diet", "genres": ["Fitness", "Diet"], "release_year": 2021, "rating": 4.7}, {"item_id": "I761", "title": "CrossFit Games", "genres": ["Fitness", "Action"], "release_year": 2022, "rating": 4.8}, {"item_id": "I762", "title": "Home Workouts", "genres": ["Fitness", "Health"], "release_year": 2020, "rating": 4.9}, {"item_id": "I771", "title": "Pilates for Core", "genres": ["Fitness", "Health"], "release_year": 2021, "rating": 4.6}, {"item_id": "I772", "title": "Zumba Dance", "genres": ["Fitness", "Dance"], "release_year": 2019, "rating": 4.5}, {"item_id": "I781", "title": "Martial Arts Basics", "genres": ["Fitness", "Sports"], "release_year": 2018, "rating": 4.7}, {"item_id": "I782", "title": "Self Defense", "genres": ["Fitness", "Education"], "release_year": 2020, "rating": 4.8}, {"item_id": "I791", "title": "Archery Techniques", "genres": ["Sports", "Education"], "release_year": 2021, "rating": 4.5}, {"item_id": 
"I792", "title": "Fencing Rules", "genres": ["Sports", "Education"], "release_year": 2019, "rating": 4.4}, {"item_id": "I811", "title": "Sailing the World", "genres": ["Adventure", "Travel"], "release_year": 2022, "rating": 4.9}, {"item_id": "I812", "title": "Deep Sea Diving", "genres": ["Adventure", "Nature"], "release_year": 2020, "rating": 4.8}, {"item_id": "I821", "title": "Everest Expeditions", "genres": ["Adventure", "Documentary"], "release_year": 2021, "rating": 4.7}, {"item_id": "I822", "title": "Arctic Explorers", "genres": ["Adventure", "Documentary"], "release_year": 2018, "rating": 4.6}, {"item_id": "I831", "title": "Jungle Survival", "genres": ["Adventure", "Reality"], "release_year": 2019, "rating": 4.5}, {"item_id": "I832", "title": "Desert Nomads", "genres": ["Adventure", "Culture"], "release_year": 2023, "rating": 4.8}, {"item_id": "I841", "title": "RV Living", "genres": ["Lifestyle", "Travel"], "release_year": 2021, "rating": 4.6}, {"item_id": "I842", "title": "Tiny House Nation", "genres": ["Lifestyle", "Reality"], "release_year": 2020, "rating": 4.7}, {"item_id": "I851", "title": "Minimalism Doc", "genres": ["Lifestyle", "Documentary"], "release_year": 2019, "rating": 4.5}, {"item_id": "I852", "title": "Zero Waste Living", "genres": ["Lifestyle", "Environment"], "release_year": 2022, "rating": 4.8}, {"item_id": "I861", "title": "Sustainable Farming", "genres": ["Environment", "Agriculture"], "release_year": 2021, "rating": 4.9}, {"item_id": "I862", "title": "Climate Change Facts", "genres": ["Environment", "Science"], "release_year": 2023, "rating": 4.7}, {"item_id": "I871", "title": "Ocean Cleanups", "genres": ["Environment", "Documentary"], "release_year": 2020, "rating": 4.8}, {"item_id": "I872", "title": "Renewable Energy", "genres": ["Environment", "Tech"], "release_year": 2019, "rating": 4.6} ]
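The following is a minimal content-based filtering sketch over a five-item excerpt of this catalog. Each item's `genres` list becomes a multi-hot vector, and candidate items are ranked by cosine similarity to a query item. Using only genres is a simplifying assumption; `release_year` and `rating` could be appended as extra (scaled) features. Genre labels in this dataset are not fully normalized ("Tech" and "Technology" both occur, for example). In practice you would map such synonyms to one label before encoding, or they become separate dimensions. The function names are hypothetical.

```python
import math

# Five records copied from Appendix C.
ITEMS = [
    {"item_id": "I501", "title": "Guitar for Beginners", "genres": ["Education", "Music"], "release_year": 2020, "rating": 4.8},
    {"item_id": "I502", "title": "Advanced Piano", "genres": ["Education", "Music"], "release_year": 2018, "rating": 4.6},
    {"item_id": "I411", "title": "Music Production", "genres": ["Music", "Tech"], "release_year": 2019, "rating": 4.8},
    {"item_id": "I082", "title": "Gardening Tips", "genres": ["DIY", "Education"], "release_year": 2018, "rating": 4.2},
    {"item_id": "I901", "title": "Horror House", "genres": ["Horror", "Thriller"], "release_year": 2018, "rating": 3.7},
]


def multi_hot(items):
    """Build a sorted genre vocabulary and a 0/1 genre vector for every item."""
    vocab = sorted({g for it in items for g in it["genres"]})
    index = {g: i for i, g in enumerate(vocab)}
    vectors = {}
    for it in items:
        v = [0.0] * len(vocab)
        for g in it["genres"]:
            v[index[g]] = 1.0
        vectors[it["item_id"]] = v
    return vocab, vectors


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similar_items(query_id, vectors, k=3):
    """Top-k (score, item_id) pairs, excluding the query; ties broken by item_id."""
    q = vectors[query_id]
    scores = [(cosine(q, v), iid) for iid, v in vectors.items() if iid != query_id]
    scores.sort(key=lambda s: (-s[0], s[1]))
    return scores[:k]


vocab, vectors = multi_hot(ITEMS)
print(similar_items("I501", vectors))  # [(1.0, 'I502'), (0.5, 'I082'), (0.5, 'I411')]
```

For binary vectors the cosine reduces to (number of shared genres) / sqrt(|genres A| x |genres B|). So "Advanced Piano" scores 1.0 against "Guitar for Beginners", items sharing one of two genres score 0.5, and "Horror House" scores 0.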

Appendix D: Sample User Demographics Dataset (JSON)

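User demographics supply side features (age, gender, location, device, subscription status, and signup date) that context-aware and cold-start recommenders can condition on. Here is a simulated demographics dataset mapping to the user IDs in Appendix B.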

[ {"user_id": "U721", "age": 25, "gender": "F", "location": "New York", "device": "mobile", "premium": true, "signup_date": "2021-05-12"}, {"user_id": "U314", "age": 34, "gender": "M", "location": "Chicago", "device": "desktop", "premium": false, "signup_date": "2022-01-20"}, {"user_id": "U089", "age": 19, "gender": "M", "location": "Los Angeles", "device": "mobile", "premium": false, "signup_date": "2023-08-15"}, {"user_id": "U555", "age": 42, "gender": "F", "location": "Houston", "device": "tablet", "premium": true, "signup_date": "2020-11-05"}, {"user_id": "U123", "age": 28, "gender": "M", "location": "Seattle", "device": "mobile", "premium": true, "signup_date": "2021-03-30"}, {"user_id": "U444", "age": 55, "gender": "F", "location": "Boston", "device": "desktop", "premium": false, "signup_date": "2019-07-22"}, {"user_id": "U999", "age": 22, "gender": "M", "location": "Austin", "device": "mobile", "premium": true, "signup_date": "2022-09-10"}, {"user_id": "U888", "age": 31, "gender": "F", "location": "Denver", "device": "tablet", "premium": false, "signup_date": "2021-12-01"}, {"user_id": "U777", "age": 27, "gender": "M", "location": "Miami", "device": "mobile", "premium": true, "signup_date": "2020-04-18"}, {"user_id": "U666", "age": 48, "gender": "F", "location": "Atlanta", "device": "desktop", "premium": false, "signup_date": "2018-02-14"}, {"user_id": "U333", "age": 29, "gender": "M", "location": "Portland", "device": "mobile", "premium": true, "signup_date": "2022-05-05"}, {"user_id": "U222", "age": 38, "gender": "F", "location": "Dallas", "device": "desktop", "premium": false, "signup_date": "2020-10-10"}, {"user_id": "U111", "age": 24, "gender": "M", "location": "San Francisco", "device": "mobile", "premium": true, "signup_date": "2023-01-15"}, {"user_id": "U000", "age": 50, "gender": "F", "location": "Washington", "device": "tablet", "premium": false, "signup_date": "2019-09-25"}, {"user_id": "U234", "age": 33, "gender": "M", "location": "Phoenix", 
"device": "desktop", "premium": true, "signup_date": "2021-06-20"}, {"user_id": "U345", "age": 26, "gender": "F", "location": "San Diego", "device": "mobile", "premium": false, "signup_date": "2022-11-08"}, {"user_id": "U456", "age": 45, "gender": "M", "location": "Detroit", "device": "desktop", "premium": true, "signup_date": "2017-03-12"}, {"user_id": "U567", "age": 21, "gender": "F", "location": "Minneapolis", "device": "mobile", "premium": false, "signup_date": "2023-04-04"}, {"user_id": "U678", "age": 37, "gender": "M", "location": "Tampa", "device": "tablet", "premium": true, "signup_date": "2020-08-30"}, {"user_id": "U789", "age": 52, "gender": "F", "location": "Charlotte", "device": "desktop", "premium": false, "signup_date": "2018-11-17"}, {"user_id": "U890", "age": 30, "gender": "M", "location": "Orlando", "device": "mobile", "premium": true, "signup_date": "2021-02-28"}, {"user_id": "U901", "age": 23, "gender": "F", "location": "Raleigh", "device": "mobile", "premium": false, "signup_date": "2022-07-07"}, {"user_id": "U012", "age": 41, "gender": "M", "location": "Columbus", "device": "desktop", "premium": true, "signup_date": "2019-01-09"}, {"user_id": "U135", "age": 28, "gender": "F", "location": "Indianapolis", "device": "tablet", "premium": false, "signup_date": "2021-09-14"} ]
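One possible way to turn these records into model inputs is a fixed-length user feature vector that is joined onto each clickstream event. The sketch below parses three records from the dataset and encodes age as a one-hot bucket, device as a one-hot over the three values that appear here, premium as a 0/1 flag, and tenure in days. Several choices are assumptions for illustration: the age bucket boundaries, the reference date (2023-10-01, the day of the Appendix B clickstream), the decision to drop gender and location, and the all-zero fallback for users without a profile. The helper names are hypothetical.

```python
import json
from datetime import date

# Three records copied from Appendix D.
RAW_USERS = """[
  {"user_id": "U012", "age": 41, "gender": "M", "location": "Columbus", "device": "desktop", "premium": true, "signup_date": "2019-01-09"},
  {"user_id": "U567", "age": 21, "gender": "F", "location": "Minneapolis", "device": "mobile", "premium": false, "signup_date": "2023-04-04"},
  {"user_id": "U678", "age": 37, "gender": "M", "location": "Tampa", "device": "tablet", "premium": true, "signup_date": "2020-08-30"}
]"""

DEVICES = ["desktop", "mobile", "tablet"]                  # device values seen in Appendix D
AGE_BUCKETS = [(0, 24), (25, 34), (35, 44), (45, 200)]     # hypothetical inclusive age ranges
REFERENCE_DATE = date(2023, 10, 1)                         # assumption: clickstream date
FEATURE_DIM = len(AGE_BUCKETS) + len(DEVICES) + 2          # + premium flag + tenure


def user_features(user, ref=REFERENCE_DATE):
    """Encode one demographics record as [age one-hot, device one-hot, premium, tenure_days]."""
    age_onehot = [1.0 if lo <= user["age"] <= hi else 0.0 for lo, hi in AGE_BUCKETS]
    device_onehot = [1.0 if user["device"] == d else 0.0 for d in DEVICES]
    tenure_days = (ref - date.fromisoformat(user["signup_date"])).days
    return age_onehot + device_onehot + [1.0 if user["premium"] else 0.0, float(tenure_days)]


def enrich(event, features):
    """Attach the user's feature vector to an event; unknown users get all zeros."""
    return {**event, "user_features": features.get(event["user_id"], [0.0] * FEATURE_DIM)}


users = {u["user_id"]: u for u in json.loads(RAW_USERS)}
features = {uid: user_features(u) for uid, u in users.items()}
print(features["U567"])  # [1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 180.0]
print(enrich({"user_id": "U_NEW", "item_id": "I042"}, features)["user_features"])
```

The all-zero fallback for a user with no profile (the hypothetical `U_NEW` above) is the crudest possible cold-start treatment. The heuristics and multi-armed bandit approaches covered in this chapter are more principled alternatives. In a real pipeline the tenure value would also be scaled before being fed to a neural model.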