House price prediction is often called the "Hello World" of machine learning. It is a very easy project that simply uses Linear Regression to predict house prices. This is going to be a very short blog, so without further ado.

Let’s do it…

Step 1 — Importing required libraries.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

%matplotlib inline

Step 2 — Reading our input data for House Price Prediction.

customers = pd.read_csv('USA_Housing.csv')

Step 3 — Describing our data.
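
A minimal sketch of this step, assuming "describing" means pandas' describe(). A small made-up frame stands in for USA_Housing.csv so the snippet runs on its own; only the Price and Avg. Area Income column names come from this post, and the numbers are invented.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for USA_Housing.csv (values are invented for illustration).
rng = np.random.default_rng(0)
income = rng.normal(68_000, 10_000, size=200)
customers = pd.DataFrame({
    "Avg. Area Income": income,
    "Price": 20 * income + rng.normal(0, 50_000, size=200),
})

# count, mean, std, min, quartiles and max for every numeric column
summary = customers.describe()
print(summary)
```

With the real file, the same call would simply be customers.describe() on the frame loaded in Step 2.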


Step 4 — Analyzing information from our data.
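
A minimal sketch of this step, assuming "analyzing information" means checking column types and missing values. The tiny frame below is made up (including the deliberate missing value) so the snippet is self-contained.

```python
import numpy as np
import pandas as pd

# Made-up stand-in frame with one deliberately missing income value.
customers = pd.DataFrame({
    "Avg. Area Income": [61_000.0, 72_500.0, np.nan, 80_250.0],
    "Price": [1_050_000.0, 1_310_000.0, 990_000.0, 1_480_000.0],
})

customers.info()                     # column names, dtypes and non-null counts
missing = customers.isnull().sum()   # number of missing values per column
print(missing)
```

If any column reports missing values, they need to be filled or dropped before fitting, because LinearRegression does not accept NaN inputs.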

Step 5 — Plots to visualize data of House Price Prediction.

  • We use sns.pairplot(customers) to plot every pairwise combination of the numerical columns in the dataset.
  • From the resulting plots we can infer that Price is strongly correlated with Avg. Area Income.
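
The visual impression from a pairplot can be backed up with numbers. Here is a hedged sketch that ranks features by their correlation with Price; the data is synthetic and the second feature is a made-up stand-in, so the printed values say nothing about the real dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data: Price is built to depend mostly on income.
rng = np.random.default_rng(1)
n = 500
income = rng.normal(68_000, 10_000, size=n)
population = rng.normal(36_000, 9_000, size=n)  # hypothetical second feature
price = 21 * income + 5 * population + rng.normal(0, 100_000, size=n)

customers = pd.DataFrame({
    "Avg. Area Income": income,
    "Area Population (made up)": population,
    "Price": price,
})

# Pearson correlation of every feature with the target, strongest first
corr_with_price = (
    customers.corr()["Price"].drop("Price").sort_values(ascending=False)
)
print(corr_with_price)
```

On the real frame, customers.corr(numeric_only=True)["Price"] would give the same kind of ranking when non-numeric columns are present (the numeric_only argument needs a reasonably recent pandas).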

Step 6 — Scaling our data.

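# X and y are not defined anywhere in this post; assumed here:
# every numeric column except Price is a feature, Price is the target
X = customers.drop('Price', axis=1).select_dtypes(include='number')
y = customers['Price']

scaler = StandardScaler()

cols = X.columns  # keep the names; fit_transform returns a plain NumPy array

X = scaler.fit_transform(X)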
  • We scale our data to bring every feature onto a comparable range, so that no column dominates just because of its units.
  • We are using StandardScaler here, which rescales each column to zero mean and unit variance.
  • Look at the raw input data and see how the different columns sit on very different scales.
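
To make the scaling concrete, here is a small sketch that reproduces StandardScaler by hand on two made-up columns with very different scales. The numbers are invented; the point is only that the scaler subtracts each column's mean and divides by its standard deviation.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two made-up columns on very different scales (think income and house age)
X = np.array([
    [60_000.0, 5.0],
    [70_000.0, 6.0],
    [80_000.0, 7.0],
    [90_000.0, 8.0],
])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# The same thing by hand: subtract the column mean, divide by the column
# standard deviation (population std, ddof=0, which StandardScaler uses)
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_scaled.mean(axis=0))  # approximately 0 for each column
print(X_scaled.std(axis=0))   # approximately 1 for each column
```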

Step 7 — Splitting our data for train and test purposes.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)
  • We use train_test_split() to split the data 70% for training and 30% for testing; random_state=101 makes the split reproducible.
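
A quick sketch to confirm what the split produces, on made-up random data. It also shows a common variant, which is not what this post does: splitting first and fitting the scaler on the training rows only, so that no statistics from the test rows leak into training.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# 100 made-up rows with 3 made-up features
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
y = rng.normal(size=100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101
)
print(X_train.shape, X_test.shape)  # (70, 3) (30, 3)

# Variant (not this post's approach): learn mean/std from the training rows,
# then apply the same transformation to both parts
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
```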

Step 8 — Training our Linear Regression model for House Price Prediction.

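lr = LinearRegression()
lr.fit(X_train, y_train)  # learn the coefficients from the training split

pred = lr.predict(X_test)

r2_score(y_test, pred)  # R^2 on the held-out test split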

  • We are using r2_score here to measure the performance of our regression model.
  • Our model gives an r2_score of 0.91 out of a maximum of 1, which is a very decent score.
  • I also tried Lasso and Ridge regression, but they performed nearly the same as Linear Regression.
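
For intuition about what r2_score measures, here is a sketch that computes it by hand as 1 - SS_res / SS_tot and compares it with scikit-learn. The data is synthetic, so the value it prints is unrelated to the 0.91 reported above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data: a linear signal in two made-up features plus noise
rng = np.random.default_rng(3)
X = rng.normal(size=(300, 2))
y = 4.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=1.0, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101
)
lr = LinearRegression().fit(X_train, y_train)
pred = lr.predict(X_test)

ss_res = np.sum((y_test - pred) ** 2)           # error left after the model
ss_tot = np.sum((y_test - y_test.mean()) ** 2)  # error of always guessing the mean
r2_manual = 1 - ss_res / ss_tot

print(r2_manual, r2_score(y_test, pred))  # the two values agree
```

A score of 1 means the residual error is zero, and a score of 0 means the model does no better than predicting the mean of y_test.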

Step 9 — Let's visualize our predictions of House Price Prediction.

sns.scatterplot(x=y_test, y=pred)
  • For a 100% accurate model, every point would lie on a straight line where predicted equals actual.
  • Our points follow a roughly straight-line trend, which is not bad.
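
One optional refinement is to draw the "perfect prediction" line on the same axes, so the spread around it is easier to judge. This is a sketch with made-up y_test and pred values and plain matplotlib rather than seaborn; in a notebook the figure renders inline.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up actual prices and noisy "predictions" standing in for y_test and pred
rng = np.random.default_rng(4)
y_test = rng.normal(1_200_000, 350_000, size=200)
pred = y_test + rng.normal(0, 100_000, size=200)

fig, ax = plt.subplots()
ax.scatter(y_test, pred, s=10, alpha=0.6)

# Reference line y = x: where every point would sit for a perfect model
lo = min(y_test.min(), pred.min())
hi = max(y_test.max(), pred.max())
ax.plot([lo, hi], [lo, hi], color="red", linestyle="--",
        label="perfect prediction (y = x)")

ax.set_xlabel("Actual price")
ax.set_ylabel("Predicted price")
ax.legend()
```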

Step 10 — Plotting the residuals of our House Price Prediction model.

  • Here we plot a histogram of the residuals.
  • A residual is the error term in a regression, that is, the difference between the real value and our predicted value.
  • Most of the residuals are centered around 0, which means our predictions are close to the real values, so the model fits well.
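
A minimal sketch of the residual computation, using made-up y_test and pred values. It bins the residuals with NumPy so the result can be inspected as numbers; in a notebook, sns.histplot(residuals) would draw the corresponding picture.

```python
import numpy as np

# Made-up actual prices and noisy "predictions" standing in for y_test and pred
rng = np.random.default_rng(5)
y_test = rng.normal(1_200_000, 350_000, size=1_000)
pred = y_test + rng.normal(0, 100_000, size=1_000)

residuals = y_test - pred  # real value minus predicted value

# Histogram as numbers: how many residuals fall in each of 20 bins
counts, bin_edges = np.histogram(residuals, bins=20)

print(residuals.mean(), residuals.std())
peak = counts.argmax()  # index of the most populated bin
print(bin_edges[peak], bin_edges[peak + 1])
```

A roughly bell-shaped histogram centered near 0 suggests the model is not systematically over- or under-predicting.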

Step 11 — Observing the coefficients.

cdf = pd.DataFrame(lr.coef_, cols, ['coefficients']).sort_values('coefficients', ascending=False)
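  • These are the coefficients learned by Linear Regression.
  • Because the features were standardized in Step 6, each coefficient is the change in price for a one-standard-deviation increase in that feature: a one-standard-deviation increase in Avg. Area Income leads to an increase of about $230,377.52 in the price of the house, assuming all other factors are kept constant.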
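
Since the model was fitted on scaled features, a coefficient can be converted back to original units by dividing it by that feature's standard deviation, which StandardScaler stores in scale_. Here is a sketch on synthetic data where the true effect is made up to be $21.50 of price per $1 of income.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data: price rises by a made-up $21.50 per $1 of income, plus noise
rng = np.random.default_rng(6)
income = rng.normal(68_000, 10_000, size=1_000)
price = 21.5 * income + rng.normal(0, 50_000, size=1_000)

X = income.reshape(-1, 1)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
lr = LinearRegression().fit(X_scaled, price)

per_std = lr.coef_[0]                         # price change per one std of income
per_dollar = lr.coef_[0] / scaler.scale_[0]   # price change per one dollar of income

print(per_std, per_dollar)
```

With the objects from this post, the same conversion would be lr.coef_ / scaler.scale_, giving one per-unit effect for each column in cols.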

So that is all for this blog, folks. Thanks for reading, and I hope you are taking something away with you.



