How to train your first XGBoost model in Python
In this post, we will train a first XGBoost model in Python in the simplest way possible.
XGBoost is an implementation of gradient-boosted decision trees designed for performance and speed.
After reading this post you will know:
- How to install XGBoost on your system for use in Python.
- How to prepare data and train your first XGBoost model.
- How to make predictions using your XGBoost model.
Step 0 — Installing XGBoost
- Refer to this XGBoost Installation guide.
Windows
pip install xgboost
Linux
sudo pip install xgboost
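To confirm the install worked before moving on, one option is to ask Python which version it can see. This is a minimal sketch using only the standard library (Python 3.8+); the helper name installed_version is my own and not part of XGBoost.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Prints the version string if xgboost is installed, otherwise None.
print("xgboost:", installed_version("xgboost"))
```

If this prints None, the pip command above most likely ran against a different Python interpreter than the one you are using now.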
Step 1 — Importing Required Libraries
- Import pandas to read the CSV file.
- Import XGBClassifier from the xgboost module to build the model.
- Import train_test_split and accuracy_score from sklearn to split the data and calculate the accuracy, respectively.
import pandas as pd
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
Step 2 — Loading the Data
- In this tutorial, we are going to use the Pima Indians onset of diabetes dataset.
- This dataset consists of 8 input variables that describe the medical details of patients and one output variable that indicates whether the patient had an onset of diabetes within 5 years.
- Download Data from this link.
df = pd.read_csv('pima-indians-diabetes.data.csv',header=None)
df.head()
Step 3 — Splitting the Data
- Here we keep the first 8 columns as features and name them X.
- For X we use df.iloc[:,0:8], which takes all the rows and the columns at positions 0 through 7 (the end of the slice, 8, is excluded).
- The last column is the target column and we name it Y.
- For Y we use df.iloc[:,8], which takes all the rows and only the column at position 8, that is, the 9th and last column (the target).
- Let’s split the data into a 67:33 train:test ratio using sklearn's train_test_split function. Its two main arguments are the features and the targets; here X holds the features and Y holds the targets. Setting random_state=7 makes the split reproducible.
# split data into X and y
X = df.iloc[:,0:8]
Y = df.iloc[:,8]
# split data into train and test sets
test_size = 0.33
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=test_size, random_state=7)
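To make the idea of a seeded 67:33 split concrete, here is a hand-rolled sketch in plain Python. It is not how scikit-learn implements train_test_split, and the function name simple_train_test_split is hypothetical; the exact rows it picks (and how it rounds the test-set size) may differ from sklearn's.

```python
import random


def simple_train_test_split(rows, test_size=0.33, seed=7):
    """Shuffle row indices and hold out a fraction of them for testing.

    Illustrative only: scikit-learn shuffles differently, so the rows
    chosen here will not match the output of train_test_split.
    """
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # seeded, so the split is repeatable
    n_test = round(len(rows) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]


rows = list(range(100))  # stand-in for 100 dataset rows
train, test = simple_train_test_split(rows)
print(len(train), len(test))
```

Every row ends up in exactly one of the two lists, and running it again with the same seed gives the same split, which is the role random_state plays in the sklearn call.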
Step 4 — Training the XGBoost Model
- Create an XGBClassifier object and name it model.
- Now let’s train this model on the training data.
model = XGBClassifier()
model.fit(X_train, y_train)
Step 5 — Making predictions on the Test Data
- Use the model.predict method to make predictions on the test data.
- Then let’s look at the predictions that our model made.
# make predictions for test data
y_pred = model.predict(X_test)
predictions = [round(value) for value in y_pred]
predictions
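In current XGBoost releases, XGBClassifier.predict already returns class labels, so the round call above acts as a harmless safeguard. If you start from predicted probabilities instead (for example, from predict_proba), you would apply a threshold yourself. A minimal sketch, using made-up probabilities and a hypothetical helper named to_labels:

```python
def to_labels(probabilities, threshold=0.5):
    """Map each probability of the positive class to a 0/1 label."""
    return [1 if p >= threshold else 0 for p in probabilities]


# Made-up probabilities, not real model output.
example_probs = [0.10, 0.50, 0.73, 0.49]
print(to_labels(example_probs))  # [0, 1, 1, 0]
```

An explicit threshold also sidesteps Python's round-half-to-even behaviour, where round(0.5) gives 0.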
Step 6 — Testing the XGBoost Model Performance
- Let’s see the accuracy of our model.
- Here we have used the accuracy_score function of sklearn to find the accuracy of our model.
- We can see that our model reaches 74% accuracy, which is not very impressive :)
# evaluate predictions
accuracy = accuracy_score(y_test, predictions)
print("Accuracy: %.2f%%" % (accuracy * 100.0))
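For intuition, accuracy is simply the fraction of predictions that match the true labels. The sketch below computes it by hand on made-up labels; simple_accuracy is a hypothetical name, and in practice you would keep using sklearn's accuracy_score as above.

```python
def simple_accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


# Made-up labels: 3 of the 4 predictions match.
acc = simple_accuracy([1, 0, 1, 1], [1, 0, 0, 1])
print("Accuracy: %.2f%%" % (acc * 100.0))  # Accuracy: 75.00%
```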
Let’s see the whole code in one place…
import pandas as pd
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# load data
df = pd.read_csv('pima-indians-diabetes.data.csv',header=None)

# split data into X and y
X = df.iloc[:,0:8]
Y = df.iloc[:,8]

# split data into train and test sets
test_size = 0.33
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=test_size, random_state=7)

# fit model on training data
model = XGBClassifier()
model.fit(X_train, y_train)

# make predictions for test data
y_pred = model.predict(X_test)
predictions = [round(value) for value in y_pred]

# evaluate predictions
accuracy = accuracy_score(y_test, predictions)
print("Accuracy: %.2f%%" % (accuracy * 100.0))
Do let me know if you have any questions while training your first XGBoost model.
That’s all for this post, folks. Thanks for reading, and I hope you took something away from it. Until next time…