How to train your first XGBoost model in Python

Abhishek Sharma
4 min read · Aug 2, 2022


In this blog, we will see how you can train your first XGBoost model in Python in the simplest way possible.

XGBoost is an implementation of gradient-boosted decision trees designed for performance and speed.

After reading this post you will know:

  • How to install XGBoost on your system for use in Python.
  • How to prepare data and train your first XGBoost model.
  • How to make predictions using your XGBoost model.

Step 0 — Installing XGBoost

Windows

pip install xgboost

Linux

sudo pip install xgboost
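
To confirm the install worked before moving on, a quick check like the one below may help. It is a minimal sketch using only the standard library; the helper name is_installed is made up for this example and is not part of XGBoost.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found on the import path."""
    return importlib.util.find_spec(package_name) is not None


# "xgboost" is the import name used in the rest of this post.
if is_installed("xgboost"):
    print("xgboost is available")
else:
    print("xgboost was not found; try: pip install xgboost")
```

If the second message appears, the package was probably installed into a different Python environment than the one running the script.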

Step 1 — Importing Required Libraries

  • Importing Pandas for reading the CSV file.
  • Importing XGBClassifier from the xgboost module to build the model.
  • Importing accuracy_score and train_test_split from sklearn to calculate the accuracy and split the data respectively.
import pandas as pd
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

Step 2 — Loading the Data

  • In this tutorial, we are going to use the Pima Indians onset of diabetes dataset.
  • The dataset has 8 input variables that describe the medical details of patients and one output variable that indicates whether the patient will have an onset of diabetes within 5 years.
  • Download Data from this link.
df = pd.read_csv('pima-indians-diabetes.data.csv',header=None)
df.head()
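
The header=None argument tells pandas that the file has no header row, so the first line is treated as data and the columns are labelled by position (0 to 8). The sketch below shows the same idea with Python's built-in csv module. The two rows are made-up values in the same 9-column layout, not real records from the dataset.

```python
import csv
import io

# Two made-up rows (hypothetical values): 8 inputs followed by 1 label.
sample = io.StringIO(
    "1,100,70,20,80,25.0,0.3,30,0\n"
    "2,150,80,30,120,32.5,0.6,45,1\n"
)

# No header line, so every line is parsed as a data row.
rows = [[float(value) for value in row] for row in csv.reader(sample)]

n_rows = len(rows)
n_cols = len(rows[0])
print(n_rows, n_cols)
```

On the real file, df.shape gives the same kind of sanity check: it should report 9 columns.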

Step 3 — Splitting the Data

  • We keep the first 8 columns as features and name them X.
  • For X we use df.iloc[:,0:8], which selects all the rows and the columns at positions 0 through 7. The stop index 8 is excluded.
  • The last column is the target column and we name it Y.
  • For Y we use df.iloc[:,8], which selects all the rows and only the column at position 8. Counting from zero, that is the ninth and last column (the target column).
  • Let’s split the data into a 67:33 train:test ratio using the train_test_split method of sklearn. It mainly takes two arguments: the features and the targets. Here X represents features and Y represents targets.
# split data into X and y
X = df.iloc[:,0:8]
Y = df.iloc[:,8]
# split data into train and test sets
test_size = 0.33
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=test_size, random_state=7)
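
To make the slicing and the 67:33 split concrete, here is a rough pure-Python sketch of the same two steps. It is not scikit-learn's implementation: the function names are made up for this example, the toy rows are invented, and the shuffle will not select the same rows as train_test_split even with the same seed. It assumes the test set size is test_size times the row count, rounded up.

```python
import math
import random


def split_features_target(rows, n_features=8):
    """Columns 0..n_features-1 become X; the column at n_features becomes y."""
    X = [row[0:n_features] for row in rows]  # half-open slice: stop index excluded
    y = [row[n_features] for row in rows]
    return X, y


def shuffle_split(X, y, test_size=0.33, seed=7):
    """Shuffle row indices, then hold out ceil(test_size * n) rows for testing."""
    n = len(X)
    n_test = math.ceil(test_size * n)
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # a fixed seed makes the split repeatable
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# 100 made-up rows with 9 columns each (8 features + 1 label).
toy_rows = [[i] * 8 + [i % 2] for i in range(100)]
toy_X, toy_y = split_features_target(toy_rows)
toy_X_train, toy_X_test, toy_y_train, toy_y_test = shuffle_split(toy_X, toy_y)
print(len(toy_X_train), len(toy_X_test))
```

Fixing the seed plays the same role as random_state=7 in the call above: rerunning the script gives the same split, so accuracy numbers stay comparable between runs.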

Step 4 — Training the XGBoost Model

  • Create an XGBClassifier object and name it model.
  • Now let’s train this model using the training Data.
model = XGBClassifier()
model.fit(X_train, y_train)
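
For intuition about what fit does with "gradient-boosted decision trees", here is a toy sketch of boosting in plain Python. It is deliberately simplified and is not XGBoost's algorithm: it uses one feature, depth-1 trees (stumps), and squared error, whereas XGBClassifier defaults to a logistic objective and adds regularisation and other refinements. All names and data below are made up for illustration.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that minimises squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1], best[2], best[3]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def boost(xs, ys, n_rounds=20, learning_rate=0.3):
    """Each round fits a stump to the current residuals and adds a scaled copy of it."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps, losses = [], []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
        losses.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))
    return base, stumps, losses


# Made-up 1-D data: small x values are class 0, large x values are class 1.
toy_xs = [1, 2, 3, 4, 5, 6, 7, 8]
toy_ys = [0, 0, 0, 0, 1, 1, 1, 1]
base, stumps, losses = boost(toy_xs, toy_ys)
print(losses[0], losses[-1])
```

The training loss shrinks round after round because each new stump corrects part of the error left by the ones before it. That is the core idea behind boosting.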

Step 5 — Making predictions on the Test Data

  • Let’s make the predictions now.
  • Use the model.predict method to make predictions on the test data.
  • Let’s see the predictions that our model made.
# make predictions for test data
y_pred = model.predict(X_test)
predictions = [round(value) for value in y_pred]
predictions
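
A note on the rounding step: XGBClassifier's predict already returns class labels, so rounding leaves them unchanged. Rounding only matters if you start from probabilities, for example from predict_proba. In that case an explicit threshold is usually clearer, because Python's round sends 0.5 to 0. The small sketch below uses made-up values to show both cases.

```python
# Hard class labels, like the values predict() returns.
labels = [0, 1, 1, 0]
rounded = [round(value) for value in labels]  # unchanged: already 0 or 1

# Made-up positive-class probabilities (hypothetical values).
probabilities = [0.12, 0.81, 0.50, 0.49]
thresholded = [int(p >= 0.5) for p in probabilities]  # explicit 0.5 cut-off

print(rounded)
print(thresholded)
print(round(0.5))  # Python rounds halves to the nearest even integer
```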

Step 6 — Testing the XGBoost Model Performance

  • Let’s see the accuracy of our model.
  • Here we have used the accuracy_score function of sklearn to find the accuracy of our model.
  • We can see that our model gives about 74% accuracy, which is not very impressive :)
# evaluate predictions
accuracy = accuracy_score(y_test, predictions)
print("Accuracy: %.2f%%" % (accuracy * 100.0))
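
Accuracy here is the fraction of test rows where the predicted label equals the true label. The sketch below computes it by hand on a few made-up labels. The function name accuracy_by_hand is invented for this example and should give the same number as accuracy_score for plain label lists like these.

```python
def accuracy_by_hand(y_true, y_pred):
    """Fraction of positions where the prediction matches the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Made-up labels: 6 of the 8 predictions match.
toy_true = [1, 0, 1, 1, 0, 0, 1, 0]
toy_pred = [1, 0, 0, 1, 0, 1, 1, 0]

toy_score = accuracy_by_hand(toy_true, toy_pred)
print("Accuracy: %.2f%%" % (toy_score * 100.0))  # prints "Accuracy: 75.00%"
```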

Let’s see the whole code in one place…

import pandas as pd
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# load data
df = pd.read_csv('pima-indians-diabetes.data.csv',header=None)
# split data into X and y
X = df.iloc[:,0:8]
Y = df.iloc[:,8]
# split data into train and test sets
test_size = 0.33
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=test_size, random_state=7)
# fit model on training data
model = XGBClassifier()
model.fit(X_train, y_train)
# make predictions for test data
y_pred = model.predict(X_test)
predictions = [round(value) for value in y_pred]
# evaluate predictions
accuracy = accuracy_score(y_test, predictions)
print("Accuracy: %.2f%%" % (accuracy * 100.0))

Do let me know if you have any questions while training your first XGBoost model.

That is all for this blog, folks. Thanks for reading, and I hope you are taking something away with you. Till next time …

Read my previous post: 4 Easiest ways to visualize Decision Trees using Scikit-Learn and Python

Check out my other machine learning projects, deep learning projects, computer vision projects, NLP projects, and Flask projects at machinelearningprojects.net.
