Movie Recommendation System — 2nd way — with source code

Abhishek Sharma
5 min read · Dec 9, 2021


In this blog, we will see one more way of implementing the Movie Recommendation System. It is going to be an interesting one, so let's begin without further ado.

The intuition behind this 2nd way is simple: we combine the main features of each movie (cast, director, genres, etc.) into one string and measure how similar those strings are. This works because a director tends to make similar movies, and actors tend to appear in similar types of movies.

Read the full article with source code here — https://machinelearningprojects.net/movie-recommendation-system-2nd-way/

Let’s do it…

Step 1 — Importing libraries required for Movie Recommendation System.

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

Step 2 — Reading input data.

org_movies = pd.read_csv('movie_dataset.csv')
org_movies.head(3)

Step 3 — Checking columns of our data.

org_movies.columns

Step 4 — Just keeping important columns.

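# .copy() gives an independent DataFrame, so later edits do not trigger SettingWithCopyWarning
movies = org_movies[['genres', 'keywords', 'cast', 'title', 'director']].copy()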
movies.head()
  • We will remove all the unnecessary columns/features and just keep these 5 columns.

Step 5 — Checking info. of our data.

movies.info()
  • The output of info() shows that some columns contain NULL values.
  • We will fill these NULL values in the next step.
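
If you want the exact number of missing values per column rather than reading it off info(), something like the following should work. It is only a small sketch on a made-up toy DataFrame (the name toy and its values are assumptions for illustration, not the real dataset).

```python
import pandas as pd

# Toy stand-in for the movie data; the values are made up for illustration.
toy = pd.DataFrame({
    "title": ["A", "B", "C"],
    "director": ["X", None, "Z"],
    "keywords": [None, None, "space"],
})

# isnull() marks each missing cell as True; sum() counts them per column.
null_counts = toy.isnull().sum()
print(null_counts)
```

On the real data, the same call would be movies.isnull().sum().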

Step 6 — Filling Null values.

movies = movies.fillna('')
  • We are simply filling the NULL values with an empty string.

Step 7 — Again checking info.

movies.info()
  • Checking again confirms that no NULL values remain.

Step 8 — Making a column called combined features.

movies['combined_features'] = movies['genres'] + ' ' + movies['keywords'] + ' ' + movies['cast'] + ' ' + movies['title'] + ' ' + movies['director']
movies.head()
  • Here we have made a new column called combined_features, which holds all five strings concatenated and separated by spaces.
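
To see why the NULL values had to be filled before this step, here is a small sketch on a made-up two-row DataFrame (toy and its values are assumptions for illustration). Concatenating a string with a missing value gives a missing result, so one empty field would wipe out the whole combined string.

```python
import pandas as pd

# Toy data: the second movie has no director recorded.
toy = pd.DataFrame({
    "genres": ["Action", "Drama"],
    "director": ["Jane Doe", None],
})

# Without filling, a missing value makes the whole concatenated string missing.
unfilled = toy["genres"] + " " + toy["director"]

# After filling with empty strings, every row yields a usable string.
filled_toy = toy.fillna("")
filled = filled_toy["genres"] + " " + filled_toy["director"]

print(unfilled.isna().tolist())
print(filled.tolist())
```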

Step 9 — Observe the first entry in the combined feature column.

movies.iloc[0]['combined_features']
  • This is how the first combined_feature looks.

Step 10 — Initializing CountVectorizer.

cv = CountVectorizer()
count_matrix = cv.fit_transform(movies['combined_features'])

Here we are using CountVectorizer() to convert the combined features into a bag-of-words count matrix, because similarity can only be computed on numbers, not on raw strings.
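
To get a feel for what the bag of words looks like, here is a minimal sketch on three made-up strings (docs and cv_toy are names assumed for illustration). Each row of the matrix is one string and each column is one word of the vocabulary.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Three made-up "combined feature" strings.
docs = ["action space hero", "action hero hero", "romance drama"]

cv_toy = CountVectorizer()
toy_matrix = cv_toy.fit_transform(docs)

# vocabulary_ maps each word to its column; sort the words by column position.
vocab = sorted(cv_toy.vocabulary_, key=cv_toy.vocabulary_.get)
print(vocab)
print(toy_matrix.toarray())
```

With its default settings, CountVectorizer lowercases the text and splits it into single words, so a name like "James Cameron" is counted as two separate tokens.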

Step 11 — Finding similarities between different entries.

cs = cosine_similarity(count_matrix)
cs.shape
  • Here we are using cosine_similarity to calculate the similarity between every pair of combined features.
  • For example, it calculates the similarity between the 1st and 2nd combined features, between the 2nd and 3rd, between the 1st and 3rd, etc.
  • The result is a 4803 × 4803 matrix, with one row and one column per movie, that contains these similarities.
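
For two count vectors, cosine similarity is their dot product divided by the product of their lengths. The sketch below checks this by hand against scikit-learn on a tiny made-up count matrix (counts and the other names are assumptions for illustration).

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Made-up word counts: 3 "movies" over a 5-word vocabulary.
counts = np.array([
    [1, 0, 1, 0, 1],
    [1, 0, 2, 0, 0],
    [0, 1, 0, 1, 0],
])

# Pairwise similarities: one row and one column per movie.
sim = cosine_similarity(counts)

# The same value for movies 0 and 1, computed by hand.
a, b = counts[0], counts[1]
manual = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(sim.shape)
print(sim[0, 1], manual)
```

Movies that share no words at all, like the first and third rows here, get a similarity of 0, and every movie has a similarity of 1 with itself.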

Step 12 — Two utility functions.

def get_movie_name_from_index(index):
    return org_movies[org_movies['index']==index]['title'].values[0]

def get_index_from_movie_name(name):
    return org_movies[org_movies['title']==name]['index'].values[0]
  • Just 2 utility functions.
  • The first function helps in extracting names from the index.
  • The second function helps in extracting the index from the name.
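
Note that the lookup by name needs an exact match, and .values[0] raises an IndexError when the title is not found. If you want something more forgiving, one possible variant is sketched below. The helper find_index and the toy DataFrame are hypothetical names used only for illustration, not part of the original code.

```python
import pandas as pd

# Toy frame with the same two columns the utility functions rely on.
toy_movies = pd.DataFrame({
    "index": [0, 1, 2],
    "title": ["Avatar", "Spectre", "Tangled"],
})

def find_index(name, frame):
    """Return the 'index' value for a title, or None if it is absent."""
    # Compare case-insensitively and ignore surrounding spaces.
    wanted = name.strip().lower()
    matches = frame.loc[frame["title"].str.lower() == wanted, "index"]
    return None if matches.empty else int(matches.iloc[0])

print(find_index(" spectre ", toy_movies))
print(find_index("Missing Movie", toy_movies))
```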

Step 13 — Printing all movie names.

print(list(movies['title']))

Step 14 — Live predictor.

test_movie_name = input('Enter Movie name --> ')
test_movie_index = get_index_from_movie_name(test_movie_name)
movie_corrs = cs[test_movie_index]
movie_corrs = enumerate(movie_corrs)
sorted_similar_movies = sorted(movie_corrs, key=lambda x: x[1], reverse=True)
for i in range(10):
    print(get_movie_name_from_index(sorted_similar_movies[i][0]))
  • Simply enter the movie name, e.g. 'The Avengers'.
  • Get its index.
  • Get its similarities with all other movies from the cosine_similarity matrix.
  • Enumerate the similarities. This step turns a list like [0.001, 0.2, 0.65, 0.02…] into [(0,0.001), (1,0.2), (2,0.65), (3,0.02)…]. It just adds each movie's index in front of its similarity.
  • Then sort the results in descending order by the 2nd element of each pair, which is the similarity (the 0th element is the index and the 1st element is the similarity).
  • And then print the first 10. The input movie itself normally comes first, because its similarity with itself is 1.
  • We can see that it is giving pretty good results: if someone likes 'The Avengers', they will most likely enjoy Avengers: Age of Ultron, Iron Man 2, Captain America, etc.
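
The steps above can also be wrapped into one reusable function. The sketch below is one way to do it, run on four made-up movies so it works without the dataset. The function recommend, the toy titles, and the toy feature strings are all assumptions for illustration. Unlike the loop above, it leaves the input movie out of the results.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up titles and combined feature strings.
titles = ["Hero One", "Hero Two", "Love Story", "Space Hero"]
features = [
    "action hero alice",
    "action hero alice bob",
    "romance drama carol",
    "action space dave",
]

# Bag of words, then pairwise cosine similarity.
sim = cosine_similarity(CountVectorizer().fit_transform(features))

def recommend(title, top_n=2):
    """Return the top_n titles most similar to the given one."""
    idx = titles.index(title)
    # Pair each movie index with its similarity, most similar first.
    ranked = sorted(enumerate(sim[idx]), key=lambda pair: pair[1], reverse=True)
    # Skip the query movie itself, then keep the first top_n titles.
    return [titles[i] for i, _ in ranked if i != idx][:top_n]

print(recommend("Hero One"))
```

Here "Hero Two" ranks first because it shares three words with "Hero One", while "Space Hero" shares only one.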

Do let me know if there’s any query regarding Movie Recommendation System by contacting me on email or LinkedIn.

To explore more Machine Learning, Deep Learning, Computer Vision, NLP, Flask Projects visit my blog.

For further code explanation and source code visit here

So this is all for this blog folks, thanks for reading it and I hope you are taking something with you after reading this and till the next time 👋…
