House Price Prediction — USA Housing Data — with source code — Easy Project

Abhishek Sharma
4 min read · Dec 6, 2021


The House Price Prediction project is often called the Hello World of the Machine Learning world. It is a very easy project that simply uses Linear Regression to predict house prices. This is going to be a very short blog, so without any further ado.

Read the full article with source code here — https://machinelearningprojects.net/house-price-prediction/

Let’s do it…

Step 1 — Importing required libraries.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

%matplotlib inline

Step 2 — Reading our input data for House Price Prediction.

customers = pd.read_csv('USA_Housing.csv')
customers.head()

Step 3 — Describing our data.

customers.describe()

Step 4 — Analyzing information from our data.

customers.info()

Step 5 — Plots to visualize data of House Price Prediction.

sns.pairplot(customers)
  • We use sns.pairplot(data) to plot all possible pairwise combinations of the numerical columns in the dataset.
  • From these plots we can infer that Price is most strongly correlated with Avg. Area Income; the correlation heatmap sketched below puts a number on that relationship.
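If you want to quantify that relationship rather than eyeball it from the scatter plots, here is a small optional sketch (not part of the original walkthrough) that draws a correlation heatmap of the numeric columns:

# Drop the non-numeric Address column and plot pairwise correlations.
corr = customers.drop('Address', axis=1).corr()
sns.heatmap(corr, annot=True, cmap='coolwarm')

The Price row of the heatmap should show its strongest value against Avg. Area Income.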

Step 6 — Scaling our data.

scaler = StandardScaler()

X=customers.drop(['Price','Address'],axis=1)
y=customers['Price']

cols = X.columns

X = scaler.fit_transform(X)
  • We need to scale our data so that every feature sits on a comparable scale.
  • We are using StandardScaler here, which transforms each column to zero mean and unit variance.
  • Just look back at the first preview of the input data (customers.head()) and notice how different columns sit on very different scales; the quick sanity check below verifies the result of scaling.
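To verify that the scaling did what we expect, here is a minimal optional sanity check (assuming X is the scaled NumPy array produced above):

# Every column should now have roughly zero mean and unit standard deviation.
print(X.mean(axis=0).round(3))
print(X.std(axis=0).round(3))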

Step 7 — Splitting our data for train and test purposes.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)
  • Using train_test_split() to split our data into 70% training and 30% test sets; random_state=101 makes the split reproducible. The quick shape check below confirms the proportions.
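A quick optional check on the shapes of the resulting splits:

# The training arrays should hold roughly 70% of the rows, the test arrays roughly 30%.
print(X_train.shape, X_test.shape)
print(y_train.shape, y_test.shape)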

Step 8 — Training our Linear Regression model for House Price Prediction.

lr = LinearRegression()
lr.fit(X_train,y_train)

pred = lr.predict(X_test)

r2_score(y_test,pred)
  • We are using r2_score here to measure the performance of our regression model.
  • Our model gives an r2_score of about 0.91 out of 1, which is a very decent score.
  • I also tried Lasso and Ridge regression, but they performed nearly the same as Linear Regression; a sketch of that comparison follows below.
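If you want to reproduce that comparison, here is a minimal sketch using scikit-learn's Ridge and Lasso with default hyperparameters (the exact settings used originally are not shown, so treat the numbers as indicative):

from sklearn.linear_model import Ridge, Lasso

# Fit each model on the same split and compare test-set r2 scores.
for model in (Ridge(), Lasso()):
    model.fit(X_train, y_train)
    print(model.__class__.__name__, round(r2_score(y_test, model.predict(X_test)), 3))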

Step 9 — Let's visualize our predictions of House Price Prediction.

sns.scatterplot(x=y_test, y=pred)
  • For a 100% accurate model, every point would fall on the straight line y = x (predicted equals actual).
  • Our points do form a roughly straight-line trend, which is not bad at all; the sketch below overlays that ideal line for reference.
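As an optional tweak, you can overlay that ideal line on the scatter plot so the comparison is easier to read:

# Actual vs. predicted prices with the ideal y = x line drawn on top.
sns.scatterplot(x=y_test, y=pred)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--')
plt.xlabel('Actual price')
plt.ylabel('Predicted price')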

Step 10 — Plotting the residuals of our House Price Prediction model.

sns.histplot((y_test-pred),bins=50,kde=True)
  • Here we are plotting a histogram of the residuals.
  • A residual is the error term in a regression: the difference between the real value and our predicted value.
  • Most of the residuals are centered around 0, which means our predictions are close to the real values, so this is a very good model; the error metrics sketched below tell the same story in numbers.
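If you also want summary error metrics to go with the histogram, a small optional sketch is:

from sklearn.metrics import mean_absolute_error, mean_squared_error

# MAE and RMSE express the typical prediction error in the same units as Price.
print('MAE :', mean_absolute_error(y_test, pred))
print('RMSE:', np.sqrt(mean_squared_error(y_test, pred)))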

Step 11 — Observing the coefficients.

cdf=pd.DataFrame(lr.coef_, cols, ['coefficients']).sort_values('coefficients',ascending=False)
cdf
  • These are the coefficients learned by the Linear Regression model.
  • Keep in mind that the features were standardized in Step 6, so each coefficient is the expected change in Price for a one-standard-deviation increase in that feature, with all other factors held constant. For Avg. Area Income that is roughly $230,377 per standard deviation of income, not per single dollar; the sketch below converts the coefficients back to original units.
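To get per-original-unit coefficients instead, here is a minimal sketch (assuming the scaler and cols variables from Step 6 are still in scope):

# StandardScaler stored each column's standard deviation in scaler.scale_,
# so dividing converts each coefficient to the price change per one original unit.
cdf_units = pd.DataFrame(lr.coef_ / scaler.scale_, cols, ['coef_per_unit'])
cdf_units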

To explore more Machine Learning, Deep Learning, Computer Vision, NLP, and Flask projects, visit my blog.

For further code explanation and source code visit here

So this is all for this blog, folks. Thanks for reading, I hope you take something away with you, and till the next time 👋…
