How to Extract Tables from PDF files and save them as CSV using Python

Abhishek Sharma
3 min read · Aug 8, 2022

So guys, in today’s blog we will see how to extract tables from PDF files and save them as CSV files using just 3–4 lines of Python code.

This use case is handy when you need to extract any number of tables from a PDF file. So without further ado, let’s do it…

Snapshot of our Final CSV…

Step 1 — Install Camelot

  • To install the Camelot library, run the following command in your terminal.
pip install "camelot-py[cv]"

Step 2 — Importing required libraries

  • For today’s use case, we just need to import the Camelot library.
import camelot

Step 3 — Reading the PDF file.

  • Download the PDF file.
  • Here we simply use the camelot.read_pdf function to read our PDF file and extract tables from it automatically.
  • If our PDF has more than one page, we can also specify the page numbers from which to read the tables, as a comma-separated string such as '1,2,3,5-7,8'. By default, only the first page is read.
  • If our PDF file is password protected, we can pass the password of the file as the password parameter to the read_pdf function.
tables = camelot.read_pdf('table.pdf')
# tables = camelot.read_pdf('table.pdf', pages='1,2,3,5-7,8')
# tables = camelot.read_pdf('table.pdf', password='*******')
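If your page numbers come from a Python list rather than a hand-typed string, a small helper can build the pages string for you. This is only a sketch: pages_spec and read_tables are hypothetical names made up for this post, not part of Camelot, and the only Camelot call used is camelot.read_pdf with its pages parameter.

```python
def pages_spec(page_numbers):
    """Collapse page numbers into a pages string such as '1-3,5-8'."""
    pages = sorted(set(page_numbers))
    if not pages:
        raise ValueError("at least one page number is required")

    ranges = []
    start = prev = pages[0]
    for page in pages[1:]:
        if page == prev + 1:
            # Still inside a run of consecutive pages.
            prev = page
            continue
        ranges.append((start, prev))
        start = prev = page
    ranges.append((start, prev))

    # A single page becomes '4'; a run becomes '5-8'.
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


def read_tables(pdf_path, page_numbers):
    """Read tables from the given pages of a PDF (hypothetical wrapper)."""
    # Imported here so pages_spec can be used without Camelot installed.
    import camelot

    return camelot.read_pdf(pdf_path, pages=pages_spec(page_numbers))
```

For example, pages_spec([1, 2, 3, 5, 6, 7, 8]) gives '1-3,5-8', which selects the same pages as the '1,2,3,5-7,8' string in the commented line above.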

Step 4 — Let’s extract tables from PDF files

  • We already know that our PDF file has just one table, so we simply use tables[0].df, which returns the 0th element (table) in our tables as a pandas DataFrame.
  • When you are working with multiple tables, simply run a for-loop over tables.
# Access the first table (index 0) as a pandas DataFrame
tables[0].df
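For the multiple-tables case, the for-loop could look something like the sketch below. The table_shapes name is hypothetical (made up for this post); it only assumes that each table exposes a .df DataFrame, as tables[0].df does above.

```python
def table_shapes(tables):
    """Return (index, rows, columns) for every extracted table."""
    shapes = []
    for i, table in enumerate(tables):
        # table.df is the pandas DataFrame for the i-th table.
        rows, cols = table.df.shape
        shapes.append((i, rows, cols))
    return shapes
```

Calling table_shapes(tables) on the object returned by camelot.read_pdf is a quick way to check how many tables were found and how big each one is before saving anything.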

Step 5 — Save the table in CSV format

tables.export('found_table.csv', f='csv')
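As far as I know, tables.export writes one CSV per table and adds the page and table number to the file name (something like found_table-page-1-table-1.csv), so you may not get a file named exactly found_table.csv. If you want full control over the file names, one option is to call to_csv on each table yourself. The sketch below assumes exactly that; save_each_table and the prefix argument are hypothetical names made up for this post.

```python
from pathlib import Path


def save_each_table(tables, out_dir, prefix="table"):
    """Write every table to its own CSV file and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    written = []
    for i, table in enumerate(tables, start=1):
        path = out_dir / f"{prefix}_{i}.csv"
        # Each Camelot table has its own to_csv method.
        table.to_csv(str(path))
        written.append(path)
    return written
```

With the tables object from Step 3, save_each_table(tables, 'csv_out') would create csv_out/table_1.csv, csv_out/table_2.csv, and so on.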

Step 6 — Visualizing the conversion metrics

tables[0].parsing_report
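  • The parsing report is a dictionary with the table’s accuracy, whitespace, order, and page number.
  • Read more about the advanced usage of the Camelot library in its documentation.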
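The parsing report is also handy for filtering out badly parsed tables before saving them. Here is a minimal sketch, assuming the report contains an 'accuracy' entry on a 0 to 100 scale; the accurate_tables name and the threshold of 90 are my own choices, not Camelot defaults.

```python
def accurate_tables(tables, min_accuracy=90.0):
    """Keep only the tables whose reported accuracy meets the threshold."""
    return [
        table
        for table in tables
        if table.parsing_report["accuracy"] >= min_accuracy
    ]
```

You could then loop over accurate_tables(tables) instead of tables when saving the CSV files.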

And this is how you extract tables from PDF files…

So this is all for this blog, folks. Thanks for reading, and I hope you are taking something away with you. Till next time…

Read my previous post: How to Deploy a Flask app online using Pythonanywhere

Check out my other machine learning projects, deep learning projects, computer vision projects, NLP projects, and Flask projects at machinelearningprojects.net.
