Francis Burnet – AI Engineering Portfolio

Capstone portfolio spanning AI engineering, applied data science, machine learning, and deep learning.

Francis Burnet headshot

Capstone 8 Evidence Map

Capstone 8 evidence image
Capstone Summary

This documentation details Capstone 8 of the Microsoft AI Engineering Program 2026, which focuses on building a comprehensive movie recommendation system. Using Python and datasets containing film titles and viewer ratings, the project demonstrates three collaborative-filtering methodologies: user-based, item-based, and model-based. The technical workflow includes merging dataframes, creating a user-item pivot table, and calculating Pearson correlations to predict ratings and identify similar content. Performance is measured through 5-fold cross-validation, comparing the accuracy of SVD, NMF, and KNN models by Root Mean Square Error (RMSE). Ultimately, the project provides a structured portfolio of notebook evidence, statistical charts, and JSON summaries that validate the execution of these machine learning techniques.

Capstone 8 Scope

Capstone 8 turns the copied recommendation assignment into an executed notebook with user-based, item-based, and model-based recommendation outputs staged for the site workflow.

Primary staged datasets: movies.csv and ratings.csv.

The notebook exports recommendation outputs, model-cross-validation results, and a structured summary JSON.

Original Project PDF

The copied project directions are embedded here for direct comparison against the notebook and output artifacts.

Requirement Checklist

1a

Study the recommendation techniques for recommending movies using `movies.csv` and `ratings.csv`.

Source mapping: Requirements file

1b

Load `movies.csv` and `ratings.csv`.

Source mapping: Requirements file

1c

Merge both dataframes on `movieId`.

Source mapping: Requirements file

1d

Create the user-item matrix using `pivot_table` with index `userId`, columns `title`, and values `rating`.

Source mapping: Requirements file

1e

Perform user-based collaborative filtering.

Source mapping: Requirements file

1f

Fill row-wise NaN values in the user-item matrix with the corresponding user's mean ratings.

Source mapping: Requirements file
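As a hedged illustration of this step (toy data, not the staged MovieLens files), each user's missing ratings can be filled with that user's own mean via a row-wise apply:

```python
import numpy as np
import pandas as pd

# Toy user-item matrix: rows are users, columns are movie titles.
user_item = pd.DataFrame(
    {'Movie A': [4.0, np.nan, 2.0], 'Movie B': [np.nan, 3.0, 4.0]},
    index=[1, 2, 3],
)

# Fill each user's NaN entries with that same user's mean rating.
user_filled = user_item.apply(lambda row: row.fillna(row.mean()), axis=1)
print(user_filled)
```

User 1's only rating is 4.0, so their missing entry is filled with 4.0; User 2's is filled with 3.0.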

1g

Find the Pearson correlation between users.

Source mapping: Requirements file

1h

Choose the correlation of all users with only User 1.

Source mapping: Requirements file

1i

Sort the User 1 correlation in descending order.

Source mapping: Requirements file

1j

Drop the NaN values generated in the correlation matrix.

Source mapping: Requirements file

1k

Choose the top 50 users that are highly correlated to User 1.

Source mapping: Requirements file

1l

Predict the rating that User 1 might give for the movie with `movieId 32` based on the top 50 user correlation matrix.

Source mapping: Requirements file
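One common way to turn the top-50 correlations into a prediction is a correlation-weighted average of those users' ratings for the target movie. This is a minimal sketch with made-up correlations and ratings (three hypothetical neighbours, not the notebook's actual values):

```python
import numpy as np
import pandas as pd

# Hypothetical correlations of three neighbours with User 1 (index = userId).
top_corr = pd.Series({5: 0.9, 12: 0.8, 33: 0.6})
# Those neighbours' ratings for the target movie (movieId 32 in the capstone).
neighbour_ratings = pd.Series({5: 5.0, 12: 4.0, 33: 3.0})

# Correlation-weighted average: sum(corr * rating) / sum(|corr|).
predicted = float(np.dot(top_corr.values, neighbour_ratings.values) / np.abs(top_corr).sum())
print(round(predicted, 4))
```

Higher-correlation neighbours pull the prediction toward their own ratings; neighbours who never rated the movie would simply be excluded before weighting.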

1m

Perform item-based collaborative filtering.

Source mapping: Requirements file

1n

Fill column-wise NaN values in the user-item matrix with the corresponding movie mean ratings.

Source mapping: Requirements file

1o

Find the Pearson correlation between movies.

Source mapping: Requirements file

1p

Choose the correlation of all movies with `Jurassic Park (1993)` only.

Source mapping: Requirements file

1q

Sort the `Jurassic Park (1993)` movie correlation in descending order.

Source mapping: Requirements file

1r

Drop the NaN values generated in the movie correlation matrix.

Source mapping: Requirements file

1s

Find 10 movies similar to `Jurassic Park (1993)`.

Source mapping: Requirements file
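The item-based search works by correlating movie columns of the user-item matrix and ranking the results. A hedged toy version (synthetic ratings with one deliberately correlated column, not the real data) looks like:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Toy user-item matrix with 20 users and 3 movie columns.
base = rng.normal(3.5, 1.0, 20)
user_item = pd.DataFrame({
    'Jurassic Park (1993)': base,
    'Similar Movie': base + rng.normal(0, 0.1, 20),   # strongly correlated
    'Unrelated Movie': rng.normal(3.5, 1.0, 20),      # roughly independent
})

# Pearson correlation between movie columns, then rank by similarity.
movie_corr = user_item.corr()
target = 'Jurassic Park (1993)'
similar = movie_corr[target].drop(index=target).dropna().sort_values(ascending=False)
print(similar.head(10))
```

On the staged data this same pattern returns the 10 most correlated titles; here the deliberately correlated column ranks first.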

1t

Perform KNNBasic model-based collaborative filtering.

Source mapping: Requirements file

1u

Initialize KNNBasic with Mean Squared Difference similarity (`msd`), 20 neighbors, and 5-fold cross-validation against RMSE.

Source mapping: Requirements file

1v

Initialize Singular Value Decomposition (SVD) and cross-validate 5 folds against RMSE.

Source mapping: Requirements file

1w

Initialize Non-Negative Matrix Factorization (NMF) and cross-validate 5 folds against RMSE.

Source mapping: Requirements file

1x

Print the best score and best parameters from cross-validation for all built models.

Source mapping: Requirements file

Requirement Walkthrough

Each walkthrough block maps the copied PDF requirements to the executed notebook cells, exported outputs, and reviewable evidence staged with this capstone.

8a

Merge The Ratings Data And Build The User-Item Matrix

Notebook section: Load, merge, and pivot-table cells

Requirement: Load both CSV files, merge on movieId, and create the user-item matrix required for collaborative filtering.

The notebook merges ratings with movie titles and creates the full user-item matrix that anchors the user-based, item-based, and model-based recommendation steps.

Results Capture
  • The staged movie with movieId 32 is Twelve Monkeys (a.k.a. 12 Monkeys) (1995).
  • User-based and item-based recommendation steps both start from the same merged pivot-table structure.
merged = ratings.merge(movies[['movieId', 'title']], on='movieId', how='left')
user_item = merged.pivot_table(index='userId', columns='title', values='rating')
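The two staged lines above follow the standard pandas merge-then-pivot pattern; a self-contained toy version (synthetic frames standing in for movies.csv and ratings.csv) behaves the same way:

```python
import pandas as pd

# Synthetic stand-ins for movies.csv and ratings.csv.
movies = pd.DataFrame({
    'movieId': [1, 2],
    'title': ['Toy Story (1995)', 'Heat (1995)'],
    'genres': ['Animation', 'Crime'],
})
ratings = pd.DataFrame({
    'userId': [1, 1, 2],
    'movieId': [1, 2, 1],
    'rating': [4.0, 3.5, 5.0],
})

# Merge titles onto ratings, then pivot into the user-item matrix.
merged = ratings.merge(movies[['movieId', 'title']], on='movieId', how='left')
user_item = merged.pivot_table(index='userId', columns='title', values='rating')
print(user_item)
```

Users become rows, titles become columns, and any movie a user never rated stays NaN, which is exactly what the later fill steps operate on.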
8b

Run User-Based And Item-Based Collaborative Filtering

Notebook section: Correlation and recommendation cells

Requirement: Compute user correlations for User 1, predict the rating for movieId 32, and find 10 movies similar to Jurassic Park (1993).

The notebook fills row-wise and column-wise NaN values, computes the required correlation views, predicts User 1's rating for movieId 32, and exports the top similar movies for Jurassic Park (1993).

Results Capture
  • Predicted User 1 rating for movieId 32: 4.1369.
  • Similar-movie results for Jurassic Park are exported as CSV.
user_corr = user_filled.T.corr()
jurassic_similar = movie_corr['Jurassic Park (1993)'].drop(index='Jurassic Park (1993)').dropna().sort_values(ascending=False).head(10)
Associated Artifact

Model-Based RMSE Comparison

Saved comparison chart for the model-based recommendation workflows.

8c

Evaluate The Model-Based Recommendation Workflows

Notebook section: KNN-style MSD, SVD, and NMF evaluation cells

Requirement: Evaluate the model-based recommendation approaches and compare the best RMSE result.

The notebook records an environment-note fallback for runtimes where the scikit-surprise wheel cannot be built, then evaluates KNN-style MSD, SVD, and NMF over 5-fold RMSE using the staged ratings matrix.

Results Capture
  • Current best model by average RMSE: SVD.
  • The environment note explains why scikit-surprise could not be built in the current Windows Python 3.12 environment.
kf = KFold(n_splits=5, shuffle=True, random_state=42)
fold_results = pd.DataFrame(fold_records)
summary_results = fold_results.groupby(['model', 'parameters'], as_index=False)['rmse'].mean()
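As a hedged sketch of the 5-fold RMSE bookkeeping shown above (synthetic ratings and a trivial global-mean predictor standing in for KNN/SVD/NMF, with a hand-rolled shuffle-and-split instead of the notebook's KFold object):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
# Synthetic ratings clipped to the 0.5-5.0 MovieLens scale.
ratings = np.clip(rng.normal(3.5, 1.0, 200), 0.5, 5.0)

# Manual 5-fold split: shuffle indices, cut into 5 folds.
indices = rng.permutation(len(ratings))
folds = np.array_split(indices, 5)

fold_records = []
for fold_index, test_idx in enumerate(folds, start=1):
    train_idx = np.setdiff1d(indices, test_idx)
    # Dummy "model": predict the training-set mean for every held-out rating.
    prediction = ratings[train_idx].mean()
    rmse = float(np.sqrt(np.mean((ratings[test_idx] - prediction) ** 2)))
    fold_records.append({'fold': fold_index, 'model': 'global-mean', 'rmse': rmse})

fold_results = pd.DataFrame(fold_records)
summary_results = fold_results.groupby('model', as_index=False)['rmse'].mean()
print(summary_results)
```

The real notebook swaps the dummy predictor for each fitted model and averages per-fold RMSE the same way before ranking the models.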

Colab Notebook

This section provides the notebook preview, launch link, and project file links.

The notebook opens in Google Colab when a launch URL is configured, and the project files and outputs remain available here on the site.

Capstone 8 Notebook Workspace
Launch Colab
Embedded Notebook Preview
Cell 1 Markdown

Capstone Session 8

This notebook is generated from the copied Capstone_Session_8.pdf directions and the staged movies.csv and ratings.csv datasets.

Cell 2 Markdown

Objective

Demonstrate user-based, item-based, and model-based recommendation techniques using the staged movie ratings data.

Cell 3 Markdown

Environment Note

This notebook uses scikit-surprise directly for the model-based recommendation tasks required by the PDF. In Google Colab, the setup cell installs any missing build dependency and then installs scikit-surprise before running KNNBasic, SVD, and NMF.

Cell 4 Code · python
from pathlib import Path
import importlib
from importlib import metadata as importlib_metadata
import json
import subprocess
import sys
from urllib.parse import quote

IS_COLAB = 'google.colab' in sys.modules
GITHUB_REPO_OWNER = 'FrancisBurnet'
GITHUB_REPO_NAME = 'francisburnet'
GITHUB_REPO_BRANCH = 'main'
CAPSTONE_ROOT = Path('Incremental Capstones/Machine Learning Using Python/Capstone Session 8')
MOVIES_FILENAME = 'movies.csv'
RATINGS_FILENAME = 'ratings.csv'


def build_raw_github_url(relative_path: Path) -> str:
    encoded_path = quote(relative_path.as_posix(), safe='/')
    return (
        f"https://raw.githubusercontent.com/{GITHUB_REPO_OWNER}/{GITHUB_REPO_NAME}/"
        f"{GITHUB_REPO_BRANCH}/{encoded_path}"
    )


def resolve_capstone_dir() -> Path | None:
    current = Path.cwd().resolve()
    capstone_parts = CAPSTONE_ROOT.parts
    for candidate in [current, *current.parents]:
        if len(candidate.parts) >= len(capstone_parts) and candidate.parts[-len(capstone_parts):] == capstone_parts:
            return candidate
        nested_candidate = candidate / CAPSTONE_ROOT
        if nested_candidate.exists():
            return nested_candidate
    return None


CAPSTONE_DIR = resolve_capstone_dir()
MOVIES_URL = build_raw_github_url(CAPSTONE_ROOT / MOVIES_FILENAME)
RATINGS_URL = build_raw_github_url(CAPSTONE_ROOT / RATINGS_FILENAME)

if CAPSTONE_DIR is not None:
    OUTPUT_ROOT = CAPSTONE_DIR
    OUTPUT_MODE = 'permanent capstone outputs'
    OUTPUT_DISPLAY = (CAPSTONE_ROOT / 'outputs').as_posix()
else:
    runtime_root = Path('/content/capstone-session-8-runtime') if IS_COLAB else Path.cwd().resolve() / 'capstone-session-8-runtime'
    OUTPUT_ROOT = runtime_root
    OUTPUT_MODE = 'runtime scratch outputs; export final artifacts back into the capstone outputs folder'
    OUTPUT_DISPLAY = 'capstone-session-8-runtime/outputs'

OUTPUTS_DIR = (OUTPUT_ROOT / 'outputs').resolve()
PLOTS_DIR = OUTPUTS_DIR / 'plots'
OUTPUTS_DIR.mkdir(parents=True, exist_ok=True)
PLOTS_DIR.mkdir(parents=True, exist_ok=True)


def installed_version(package_name: str) -> str | None:
    try:
        return importlib_metadata.version(package_name)
    except importlib_metadata.PackageNotFoundError:
        return None


def surprise_import_ready() -> bool:
    try:
        importlib.import_module('surprise')
        return True
    except Exception:
        return False


numpy_version = installed_version('numpy')
needs_numpy_pin = numpy_version is None or int(numpy_version.split('.')[0]) >= 2
needs_surprise_setup = needs_numpy_pin or not surprise_import_ready()

if needs_surprise_setup:
    try:
        if IS_COLAB:
            subprocess.run(['apt-get', 'update', '-qq'], check=True)
            subprocess.run(['apt-get', 'install', '-y', 'build-essential'], check=True)
        subprocess.run([sys.executable, '-m', 'pip', 'install', '--force-reinstall', 'numpy<2'], check=True)
        subprocess.run([sys.executable, '-m', 'pip', 'install', '--force-reinstall', '--no-deps', 'scikit-surprise'], check=True)
        importlib.invalidate_caches()
    except subprocess.CalledProcessError as exc:
        if not IS_COLAB:
            raise RuntimeError(
                'Session 8 requires Microsoft Visual C++ Build Tools and a NumPy 1.x runtime for scikit-surprise. '
                'Install the Visual Studio C++ workload, then rerun this cell.'
            ) from exc
        raise

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from IPython.display import display
from surprise import Dataset, KNNBasic, NMF as SurpriseNMF, Reader, SVD
from surprise.model_selection import KFold as SurpriseKFold, cross_validate

sns.set_theme(style='whitegrid')
pd.set_option('display.max_columns', 100)

print('Runtime:', 'Google Colab' if IS_COLAB else 'Notebook runtime')
print('Capstone artifact path:', CAPSTONE_ROOT.as_posix())
print('Movies source:', MOVIES_URL)
print('Ratings source:', RATINGS_URL)
print('Output mode:', OUTPUT_MODE)
print('Output target:', OUTPUT_DISPLAY)
print('NumPy version:', np.__version__)
print('scikit-surprise import ready')
Output
Runtime: Local / notebook runtime
Capstone directory: X:\SIMPLILEARN\FrancisBurnetCom\Incremental Capstones\Machine Learning Using Python\Capstone Session 8
Movies source: https://raw.githubusercontent.com/FrancisBurnet/francisburnet/main/Incremental%20Capstones/Machine%20Learning%20Using%20Python/Capstone%20Session%208/movies.csv
Ratings source: https://raw.githubusercontent.com/FrancisBurnet/francisburnet/main/Incremental%20Capstones/Machine%20Learning%20Using%20Python/Capstone%20Session%208/ratings.csv
Output mode: permanent capstone outputs
Outputs directory: X:\SIMPLILEARN\FrancisBurnetCom\Incremental Capstones\Machine Learning Using Python\Capstone Session 8\outputs
NumPy version: 1.26.4
scikit-surprise import ready
Cell 5 Code · python
movies = pd.read_csv(MOVIES_URL)
ratings = pd.read_csv(RATINGS_URL)
merged = ratings.merge(movies[['movieId', 'title']], on='movieId', how='left')
user_item = merged.pivot_table(index='userId', columns='title', values='rating')
display(merged.head())
print('Movies source used:', MOVIES_URL)
print('Ratings source used:', RATINGS_URL)
print('Movies shape:', movies.shape)
print('Ratings shape:', ratings.shape)
print('Merged shape:', merged.shape)
print('User-item shape:', user_item.shape)
Output
   userId  movieId  rating  timestamp                        title
0       1        1     4.0  964982703             Toy Story (1995)
1       1        3     4.0  964981247      Grumpier Old Men (1995)
2       1        6     4.0  964982224                  Heat (1995)
3       1       47     5.0  964983815  Seven (a.k.a. Se7en) (1995)
4       1       50     5.0  964982931   Usual Suspects, The (1995)
Movies source used: https://raw.githubusercontent.com/FrancisBurnet/francisburnet/main/Incremental%20Capstones/Machine%20Learning%20Using%20Python/Capstone%20Session%208/movies.csv
Ratings source used: https://raw.githubusercontent.com/FrancisBurnet/francisburnet/main/Incremental%20Capstones/Machine%20Learning%20Using%20Python/Capstone%20Session%208/ratings.csv
Movies shape: (9742, 3)
Ratings shape: (100836, 4)
Merged shape: (100836, 5)
User-item shape: (610, 9719)
Cell 6 Code · python
user_filled = user_item.apply(lambda row: row.fillna(row.mean()), axis=1)
user_corr = user_filled.T.corr()
user_1_corr = user_corr.loc[1].drop(index=1).dropna().sort_values(ascending=False)
top_50_users = user_1_corr.head(50)
movie_32_title = movies.loc[movies['movieId'] == 32, 'title'].iloc[0]
movie_32_ratings = merged.loc[merged['movieId'] == 32, ['userId', 'rating']].set_index('userId')
eligible = top_50_users[top_50_users.index.isin(movie_32_ratings.index)]
if eligible.empty:
    predicted_user_1_rating = float(merged.loc[merged['movieId'] == 32, 'rating'].mean())
else:
    weighted_ratings = movie_32_ratings.loc[eligible.index, 'rating']
    denominator = float(np.abs(eligible).sum())
    predicted_user_1_rating = float(np.dot(eligible.values, weighted_ratings.values) / denominator) if denominator else float(weighted_ratings.mean())

top_50_df = top_50_users.reset_index()
top_50_df.columns = ['userId', 'correlation']
top_50_df.to_csv(OUTPUTS_DIR / 'session_8_top_50_user_correlations.csv', index=False)
display(top_50_df.head(10))
{'movieId_32_title': movie_32_title, 'predicted_user_1_rating_for_movie_32': round(predicted_user_1_rating, 4)}
Cell 7 Code · python
item_filled = user_item.apply(lambda column: column.fillna(column.mean()), axis=0)
movie_corr = item_filled.corr()
jurassic_title = 'Jurassic Park (1993)'
jurassic_similar = movie_corr[jurassic_title].drop(index=jurassic_title).dropna().sort_values(ascending=False).head(10)
similar_movies_df = jurassic_similar.reset_index()
similar_movies_df.columns = ['title', 'correlation']
similar_movies_df.to_csv(OUTPUTS_DIR / 'session_8_similar_movies.csv', index=False)
display(similar_movies_df)
Cell 8 Code · python
reader = Reader(rating_scale=(float(ratings['rating'].min()), float(ratings['rating'].max())))
surprise_data = Dataset.load_from_df(ratings[['userId', 'movieId', 'rating']], reader)
surprise_cv = SurpriseKFold(n_splits=5, random_state=42, shuffle=True)

model_specs = [
    (
        'KNNBasic',
        KNNBasic(k=20, sim_options={'name': 'msd', 'user_based': True}),
        {'k': 20, 'sim_options': {'name': 'msd', 'user_based': True}},
    ),
    (
        'SVD',
        SVD(random_state=42),
        {'random_state': 42},
    ),
    (
        'NMF',
        SurpriseNMF(random_state=42),
        {'random_state': 42},
    ),
]

fold_records = []
model_summaries = []
for model_name, algorithm, parameters in model_specs:
    cv_result = cross_validate(
        algorithm,
        surprise_data,
        measures=['RMSE'],
        cv=surprise_cv,
        verbose=False,
        n_jobs=1,
    )
    rmse_scores = [float(score) for score in cv_result['test_rmse']]
    for fold_index, rmse_score in enumerate(rmse_scores, start=1):
        fold_records.append(
            {
                'fold': fold_index,
                'model': model_name,
                'rmse': rmse_score,
                'parameters': json.dumps(parameters, sort_keys=True),
            }
        )
    model_summaries.append(
        {
            'model': model_name,
            'parameters': parameters,
            'rmse': float(np.mean(rmse_scores)),
            'best_score': float(np.min(rmse_scores)),
        }
    )
Cell 9 Code · python
fold_results = pd.DataFrame(fold_records)
display(fold_results.head(9))
summary_results = pd.DataFrame(model_summaries).sort_values('rmse').reset_index(drop=True)
display(summary_results)
best_model = summary_results.iloc[0].to_dict()
best_model
Cell 10 Code · python
fig, ax = plt.subplots(figsize=(10, 5))
bar_colors = ['#1f77b4', '#ff7f0e', '#2ca02c']
bars = ax.bar(summary_results['model'], summary_results['rmse'], color=bar_colors)
ax.set_title('Session 8 Model-Based RMSE Comparison')
ax.set_ylabel('Average 5-Fold RMSE')
ax.set_xlabel('Model')
ax.bar_label(bars, fmt='%.3f', padding=3)
ax.set_ylim(0, summary_results['rmse'].max() + 0.08)
fig.tight_layout()
fig.savefig(PLOTS_DIR / 'model_based_rmse.png', dpi=150)
plt.show()
plt.close(fig)

fold_results.to_csv(OUTPUTS_DIR / 'session_8_model_cv_results.csv', index=False)
summary = {
    'movie_id_32_title': movie_32_title,
    'predicted_user_1_rating_for_movie_32': round(predicted_user_1_rating, 4),
    'top_50_user_correlations_saved': 'session_8_top_50_user_correlations.csv',
    'similar_movies_for_jurassic_park': similar_movies_df.to_dict(orient='records'),
    'model_cv_results': summary_results.to_dict(orient='records'),
    'best_model': best_model,
    'environment_note': 'Model-based recommendation is executed with scikit-surprise using KNNBasic, SVD, and NMF.',
}
with open(OUTPUTS_DIR / 'session_8_summary.json', 'w', encoding='utf-8') as handle:
    json.dump(summary, handle, indent=2)
summary
Project Notes
  • Merged ratings matrix and pivot-table setup.
  • User-based rating prediction for movieId 32.
  • Item-based similar-movie search for Jurassic Park (1993).
  • Model-based RMSE comparison with environment-note fallback.
Launch Controls

Notebook Launch

Open the matching notebook in Google Colab or review the tracked notebook source in GitHub.

Project File Links

Outputs And Results

Key Outputs
  • Executed notebook artifact saved as capstone_session_8.ipynb.
  • CSV exports capture the top-50 user correlations, the Jurassic Park similar-movie list, and the model-CV RMSE table.
  • The site now has a saved comparison chart for the model-based recommendation workflows.
Key Findings
  • MovieId 32 maps to Twelve Monkeys (a.k.a. 12 Monkeys) (1995).
  • The current best model-based recommendation result is SVD.
  • The environment note is preserved in the outputs so the site explains the scikit-surprise build constraint directly instead of hiding it.