IPL Score Prediction Flask App— with source code

Let’s do it…

Step 1 — Importing libraries required for IPL Score Prediction.

import joblibimport numpy as npimport pandas as pdimport seaborn as snsfrom datetime import datetimefrom sklearn.linear_model import Ridgefrom sklearn.preprocessing import StandardScalerfrom sklearn.model_selection import RandomizedSearchCVfrom sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error

Step 2 — Reading the data for IPL Score Prediction.

df = pd.read_csv(‘ipl.csv’)df.head()

Step 3 — Dropping unnecessary columns.

cols_to_drop = [‘mid’,’batsman’,’bowler’,’striker’,’non-striker’]df.drop(cols_to_drop,axis=1,inplace=True)df.head()
  • In our specific use case features like mid, batsman, bowler, striker, and non-striker would not play a great role so it’s better to drop them.
  • I know that batsmen can play a role in changing scores, but the problem is that there are tonnes of batsmen that have played in IPL so we can’t operate on these many categories, so it’s better to drop them.

Step 4 — Preprocessing our data for IPL Score Prediction.

df[‘date’] = df[‘date’].apply(lambda x: datetime.strptime(x,’%Y-%m-%d’))# we have to remove temporary teams or the teams which are not available nowconsistent_teams = [‘Chennai Super Kings’, ‘Delhi Daredevils’,‘Kings XI Punjab’, ‘Kolkata Knight Riders’,‘Mumbai Indians’, ‘Rajasthan Royals’,‘Royal Challengers Bangalore’, ‘Sunrisers Hyderabad’]df = df[(df[‘bat_team’].isin(consistent_teams)) & (df[‘bowl_team’].isin(consistent_teams))]# we don’t want first five overs datadf = df[df[‘overs’]>=5.0]df.head()
  • Convert the date column to pandas DateTime column.
  • Then we have to remove teams that are not playing today in IPL and we just have to keep consistent teams.
  • Also, we will take data that is after the 5 overs because the initial stages of the match do not play that important part in deciding the score.

Step 5 — Checking unique venues.

  • Checking unique venues.
  • We can see that there are some foreign grounds also, so we will remove them in the next step.

Step 6 — Correct the names of the venues.

def f(x):  if x==’M Chinnaswamy Stadium’:   return ‘M Chinnaswamy Stadium, Bangalore’  elif x==’Feroz Shah Kotla’:   return ‘Feroz Shah Kotla, Delhi’  elif x==’Wankhede Stadium’:   return ‘Wankhede Stadium, Mumbai’  elif x==’Sawai Mansingh Stadium’:   return ‘Sawai Mansingh Stadium, Jaipur’  elif x==’Eden Gardens’:   return ‘Eden Gardens, Kolkata’  elif x==’Dr DY Patil Sports Academy’:   return ‘Dr DY Patil Sports Academy, Mumbai’  elif x==’Himachal Pradesh Cricket Association Stadium’:   return ‘Himachal Pradesh Cricket Association Stadium, Dharamshala’  elif x==’Subrata Roy Sahara Stadium’:   return ‘Maharashtra Cricket Association Stadium, Pune’  elif x==’Shaheed Veer Narayan Singh International Stadium’:   return ‘Raipur International Cricket Stadium, Raipur’  elif x==’JSCA International Stadium Complex’:   return ‘JSCA International Stadium Complex, Ranchi’  elif x==’Maharashtra Cricket Association Stadium’:   return ‘Maharashtra Cricket Association Stadium, Pune’  elif x==’Dr. Y.S. Rajasekhara Reddy ACA-VDCA Cricket Stadium’:   return ‘ACA-VDCA Stadium, Visakhapatnam’  elif x==’Punjab Cricket Association IS Bindra Stadium, Mohali’:   return ‘Punjab Cricket Association Stadium, Mohali’  elif x==’Holkar Cricket Stadium’:   return ‘Holkar Cricket Stadium, Indore’  elif x==’Sheikh Zayed Stadium’:   return ‘Sheikh Zayed Stadium, Abu-Dhabi’  elif x==’Sharjah Cricket Stadium’:   return ‘Sharjah Cricket Stadium, Sharjah’  elif x==’Dubai International Cricket Stadium’:   return ‘Dubai International Cricket Stadium, Dubai’  elif x==’Barabati Stadium’:   return ‘Barabati Stadium, Cuttack’  else:   return x ignored_stadiums = [‘Newlands’, “St George’s Park”,‘Kingsmead’, ‘SuperSport Park’, ‘Buffalo Park’,‘New Wanderers Stadium’, ‘De Beers Diamond Oval’,‘OUTsurance Oval’, ‘Brabourne Stadium’]df = df[True^(df[‘venue’].isin(ignored_stadiums))]df[‘venue’] = df[‘venue’].apply(f)df.head()
  • Here we are just using this function to correct the venue names.
  • After that, we are removing entries in which we have foreign grounds like Newlands, St George’s Park, etc.
  • In the third last line, we are using XOR operation to remove these grounds. As we know that True xor True is False, that’s what we are doing here. If we come across any entry whose venue is in ignored stadiums that entry will be true and true XOR true will become false and we will not take that in df.

Step 7 — Converting categorical columns to dummy variables.

df_new = pd.get_dummies(data=df,columns=[‘venue’,’bat_team’,’bowl_team’])df_new.head()
  • Creating dummy variables out of categorical variables like venue, bat_team, and bowl_team.

Step 8 — Checking columns.


Step 9 — Just change the positions of columns.

df_new = df_new[[‘date’,’venue_ACA-VDCA Stadium, Visakhapatnam’,‘venue_Barabati Stadium, Cuttack’, ‘venue_Dr DY Patil Sports Academy, Mumbai’,‘venue_Dubai International Cricket Stadium, Dubai’,‘venue_Eden Gardens, Kolkata’, ‘venue_Feroz Shah Kotla, Delhi’,‘venue_Himachal Pradesh Cricket Association Stadium, Dharamshala’,‘venue_Holkar Cricket Stadium, Indore’,‘venue_JSCA International Stadium Complex, Ranchi’,‘venue_M Chinnaswamy Stadium, Bangalore’,‘venue_MA Chidambaram Stadium, Chepauk’,‘venue_Maharashtra Cricket Association Stadium, Pune’,‘venue_Punjab Cricket Association Stadium, Mohali’,‘venue_Raipur International Cricket Stadium, Raipur’,‘venue_Rajiv Gandhi International Stadium, Uppal’,‘venue_Sardar Patel Stadium, Motera’,‘venue_Sawai Mansingh Stadium, Jaipur’,‘venue_Sharjah Cricket Stadium, Sharjah’,‘venue_Sheikh Zayed Stadium, Abu-Dhabi’,‘venue_Wankhede Stadium, Mumbai’,’bat_team_Chennai Super Kings’,‘bat_team_Delhi Daredevils’, ‘bat_team_Kings XI Punjab’,‘bat_team_Kolkata Knight Riders’, ‘bat_team_Mumbai Indians’,‘bat_team_Rajasthan Royals’, ‘bat_team_Royal Challengers Bangalore’,‘bat_team_Sunrisers Hyderabad’,’bowl_team_Chennai Super Kings’,‘bowl_team_Delhi Daredevils’, ‘bowl_team_Kings XI Punjab’,‘bowl_team_Kolkata Knight Riders’, ‘bowl_team_Mumbai Indians’,‘bowl_team_Rajasthan Royals’, ‘bowl_team_Royal Challengers Bangalore’,‘bowl_team_Sunrisers Hyderabad’,’runs’, ‘wickets’, ‘overs’, ‘runs_last_5’, ‘wickets_last_5’,‘total’]]df_new.head()

Step 10 — Resetting index.

  • Just see in the image above that indexes are not proper because we dropped many entries.
  • So we are just resetting indexes in this step and just deleting the index column that it will make of previous indexes.

Step 11 — Scaling our numerical data for IPL Score Prediction model.

scaler = StandardScaler()scaled_cols = scaler.fit_transform(df_new[[‘runs’, ‘wickets’, ‘overs’, ‘runs_last_5’, ‘wickets_last_5’]])scaled_cols = pd.DataFrame(scaled_cols,columns=[‘runs’, ‘wickets’, ‘overs’, ‘runs_last_5’, ‘wickets_last_5’])df_new.drop([‘runs’, ‘wickets’, ‘overs’, ‘runs_last_5’, ‘wickets_last_5’],axis=1,inplace=True)df_new = pd.concat([df_new,scaled_cols],axis=1)df_new.head()
  • Scaling our columns like ‘runs’, ‘wickets’, ‘overs’, ‘runs_last_5’, ‘wickets_last_5’ to bring them all down to same scale.
  • We will not scale the whole data because other columns are just 1s and 0s and this represents categorical columns and as we know scaling is just done on numerical values.

Step 12 — Splitting data for training and testing.

X_train = df_new.drop(‘total’,axis=1)[df_new[‘date’].dt.year<=2016]X_test = df_new.drop(‘total’,axis=1)[df_new[‘date’].dt.year>=2017]X_train.drop(‘date’,inplace=True,axis=1)X_test.drop(‘date’,inplace=True,axis=1)y_train = df_new[df_new[‘date’].dt.year<=2016][‘total’].valuesy_test = df_new[df_new[‘date’].dt.year>=2017][‘total’].values
  • We are splitting using the date column.
  • All the data from 2007–2017 is for training.
  • Data from and after 2017 is for testing.

Step 13 — Checking our X_train.


Step 14 — Training our Ridge model for IPL Score Prediction.

ridge = Ridge()parameters={‘alpha’:[1e-15,1e-10,1e-8,1e-3,1e-2,1,5,10,20,30,35,40]}ridge_regressor = RandomizedSearchCV(ridge,parameters,cv=10,scoring=’neg_mean_squared_error’)ridge_regressor.fit(X_train,y_train)print(ridge_regressor.best_params_)print(ridge_regressor.best_score_)print(‘\n’)# IPL Score Predictionprediction_r = ridge_regressor.predict(X_test)print(‘MAE:’, mean_absolute_error(y_test, prediction_r))print(‘MSE:’, mean_squared_error(y_test, prediction_r))print(‘RMSE:’, np.sqrt(mean_squared_error(y_test, prediction_r)))print(‘\n’)print(f’r2 score of ridge : {r2_score(y_test,prediction_r)}’)sns.distplot(y_test-prediction_r)
  • I also tried Linear regression but that was overfitting to the next level, that’s why I went with Ridge Regression because it prevents our model from overfitting.
  • Lasso was also giving similar results.

Step 15 — Saving our IPL Score Prediction model.





Data Scientist || Blogger || machinelearningprojects.net

Love podcasts or audiobooks? Learn on the go with our new app.

Recommended from Medium

12 common JMP charts you can make in Python with Plotly for free

Top Machine Learning & Data Science podcasts

Data Analytics for Design Thinking

Alexander Zverev — Daniil Medvedev “liveStream”(live)

Online live stream search engine

Support Vector Machines Explained

READ/DOWNLOAD*] Introduction to Mediation, Moderat

Japan vs Saudi Arabia “liveStream”(live)

You want to travel and you don’t know where it’s worth it? The data has that answer

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Abhishek Sharma

Abhishek Sharma

Data Scientist || Blogger || machinelearningprojects.net

More from Medium


fire and smoke detection using CNN

Artificial Intelligence Project: Pose Detection

Hindi Character Recognition

A Simple Introduction to Object Detection