Movie Recommendation System — 1st way— with source code

Let’s do it…

Step 1 — Importing packages required for Movie Recommendation System.

import pandas as pd

Step 2 — Reading input data.

df1 = pd.read_csv('',sep='\t')
df1.columns = ['user_id','item_id','rating','timestamp']
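
A side note on this step, as a sketch under an assumption: if the ratings file has no header row (the article does not show the file, so this is a guess), read_csv would treat the first rating as column names, and assigning df1.columns afterwards would silently drop that row. Passing names= avoids this. The three sample rows below are made up for illustration and stand in for the real file.

```python
import io

import pandas as pd

# Hypothetical three-row sample standing in for the real tab-separated ratings file.
sample = "196\t242\t3\t881250949\n186\t302\t3\t891717742\n22\t377\t1\t878887116\n"

# names= supplies the column labels and tells pandas there is no header row,
# so the first rating is kept as data instead of being consumed as a header.
ratings = pd.read_csv(
    io.StringIO(sample),
    sep="\t",
    names=["user_id", "item_id", "rating", "timestamp"],
)

print(ratings.shape)
```

If your file does start with a header row, the two-line version in the article is fine as it is.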

Step 3 — Reading Movie titles.

df2 = pd.read_csv('Movie_Id_Titles')

Step 4 — Merging movie data and movie titles.

df = pd.merge(df1,df2,on='item_id')
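
To see what the merge does, here is a minimal sketch on made-up data. The two small frames are hypothetical stand-ins for df1 and df2; only the column names come from the article.

```python
import pandas as pd

# Toy ratings table (stand-in for df1).
ratings = pd.DataFrame({
    "user_id": [1, 1, 2],
    "item_id": [50, 172, 50],
    "rating": [5, 4, 4],
})

# Toy titles table (stand-in for df2).
titles = pd.DataFrame({
    "item_id": [50, 172],
    "title": ["Star Wars (1977)", "Empire Strikes Back, The (1980)"],
})

# Every rating row picks up the title whose item_id matches.
merged = pd.merge(ratings, titles, on="item_id")
print(merged)
```

Each rating keeps its own row, and the title column is repeated for every rating of the same movie.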

Step 5 — Group the same movie entries.

rating_and_no_of_rating = pd.DataFrame(df.groupby('title')['rating'].mean().sort_values(ascending=False))
  • We group the ratings by movie title, take the mean rating for each movie, and then sort the movies by that mean, highest first.
  • The result is misleading, though: a crap movie like 'They Made a Criminal' shows up at the top, probably because it got a rating from only one person, and that rating was 5 stars. That's why its mean is also 5.
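
The problem with ranking by mean alone can be reproduced on a tiny made-up example. The titles and ratings below are hypothetical, not taken from the dataset.

```python
import pandas as pd

# One film with four ratings, one film with a single 5-star rating.
toy = pd.DataFrame({
    "title": ["Popular Film"] * 4 + ["Obscure Film"],
    "rating": [5, 4, 4, 5, 5],
})

# Same grouping as in the article: mean rating per title, highest first.
mean_rating = toy.groupby("title")["rating"].mean().sort_values(ascending=False)
print(mean_rating)
```

The film with a single rating comes out on top, even though four people rated the other film highly.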

Step 6 — Adding a column of no. of ratings.

rating_and_no_of_rating['no_of_ratings'] = df.groupby('title')['rating'].count()
  • Calling count() on the grouped ratings gives the no. of ratings each movie received, and we store it in a new no_of_ratings column.

Step 7 — Sorting on no. of ratings.

rating_and_no_of_rating = rating_and_no_of_rating.sort_values('no_of_ratings',ascending=False)
  • Simply sort by no. of ratings, most-rated movies first.
  • And now we see some genuine results.
  • Star Wars, which is a very famous movie, has a mean rating of 4.35 from 583 users.
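
Steps 5 to 7 can also be done in one pass. This is only an alternative sketch on made-up data, not the article's code: agg() computes the mean and the count together, and the renamed column names are chosen here to match the ones used above.

```python
import pandas as pd

# Hypothetical toy data: one film with four ratings, one with a single rating.
toy = pd.DataFrame({
    "title": ["Popular Film"] * 4 + ["Obscure Film"],
    "rating": [5, 4, 4, 5, 5],
})

summary = (
    toy.groupby("title")["rating"]
    .agg(["mean", "count"])                                    # both statistics at once
    .rename(columns={"mean": "rating", "count": "no_of_ratings"})
    .sort_values("no_of_ratings", ascending=False)             # most-rated first
)
print(summary)
```

Sorting by no_of_ratings pushes the single-rating film down, which is the same effect Step 7 has on the real data.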

Step 8 — Creating a pivot table.

pt = df.pivot_table(index='user_id',columns='title',values='rating')
  • In this pivot table, users go along rows and movies go along columns.
  • NaN means that the user has not given any rating to that movie.
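
Here is a small sketch of the same pivot on made-up data, so the NaN cells are easy to spot. The users, titles and ratings are hypothetical.

```python
import pandas as pd

# User 2 never rated Film B, and user 3 never rated Film A.
toy = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "title": ["Film A", "Film B", "Film A", "Film B"],
    "rating": [5, 3, 4, 2],
})

# Rows are users, columns are movie titles, cells are ratings.
pt_toy = toy.pivot_table(index="user_id", columns="title", values="rating")
print(pt_toy)
```

The missing user and movie combinations show up as NaN, exactly as in the full table.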

Step 9 — Checking movie names.

  • Simply print all the movie names we have. The name entered in the next step has to match one of them exactly.
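
The code for this step is not shown, so the following is only a guess at what it might look like. The movie names are the column labels of the pivot table; the small pt_toy frame below is a hypothetical stand-in for pt, and the search at the end is an extra convenience, not part of the article.

```python
import pandas as pd

# Hypothetical stand-in for the pivot table pt from Step 8.
pt_toy = pd.DataFrame(
    {"Star Wars (1977)": [5.0, 4.0], "Fargo (1996)": [None, 5.0]},
    index=pd.Index([1, 2], name="user_id"),
)

# The movie names are simply the column labels.
movie_names = pt_toy.columns.tolist()
print(movie_names)

# Optional: find titles containing a search term, ignoring case.
matches = [name for name in movie_names if "star" in name.lower()]
print(matches)
```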

Step 10 — Live Prediction.

test_movie = input('Enter movie name --> ')

movie_vector = pt[test_movie].dropna()
similar_movies = pt.corrwith(movie_vector)

corr_df = pd.DataFrame(similar_movies,columns=['Correlation'])
corr_df = corr_df.join(rating_and_no_of_rating['no_of_ratings'])

corr_df = corr_df[corr_df['no_of_ratings']>100].sort_values('Correlation',ascending=False).dropna()
  • While testing, I entered 'Star Wars (1977)' as the test movie.
  • Pick its vector (its column of user ratings) from the pivot table above.
  • Correlate that vector with the other movies using corrwith(). This gives its correlation with every movie based on user ratings, which we store in similar_movies.
  • After that, create a data frame from the correlations and join no_of_ratings onto it, so that we can keep only movies with more than 100 ratings.
  • Sort the results by correlation, highest first, and BOOM, here are the results.
  • All the top movies are somehow related to Star Wars: Empire Strikes Back and Return of the Jedi, for example.
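
The steps above can be wrapped into one function. This is a minimal sketch, not the article's code: the function name recommend_similar, its min_ratings parameter and the toy pivot table are all hypothetical, and the check for an unknown title is an addition so that a typo gives a readable error.

```python
import pandas as pd


def recommend_similar(pt, counts, title, min_ratings=100):
    """Rank movies by how strongly their ratings correlate with `title`."""
    if title not in pt.columns:
        raise KeyError(f"Unknown movie title: {title!r}")

    # Correlate the chosen movie's ratings with every column of the pivot table.
    corr = pt.corrwith(pt[title])

    corr_df = pd.DataFrame({"Correlation": corr})
    corr_df = corr_df.join(counts.rename("no_of_ratings"))

    # Keep movies with more than min_ratings ratings, best match first.
    corr_df = corr_df[corr_df["no_of_ratings"] > min_ratings]
    return corr_df.dropna().sort_values("Correlation", ascending=False)


# Toy pivot table: four users, three made-up movies.
pt_toy = pd.DataFrame(
    {"A": [5, 4, 2, 1], "B": [4, 5, 1, 2], "C": [1, 2, 4, 5]},
    index=pd.Index([1, 2, 3, 4], name="user_id"),
)
counts = pt_toy.count()  # number of ratings per movie

result = recommend_similar(pt_toy, counts, "A", min_ratings=3)
print(result)
```

As in the article's version, the test movie itself appears in the output with a correlation of 1. With the real data you would pass pt and rating_and_no_of_rating['no_of_ratings'] instead of the toy objects.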


