Microsoft Excel is a popular application for Windows that is used to work with spreadsheets, tables and charts. Spreadsheets are used extensively in the business world to record budgets, create reports, manage inventory or perform calculations.
Working on Excel tasks in your day to day work can be boring, mind-numbing and even daunting especially if you have to go through thousands of rows of data to make small edits or to compare figures. Python can be used to automate working with Excel documents and in this blog post, I will show you how.
The first thing to do, is to download a python library called OpenPyXL. OpenPyXL can be installed easily using pip. To learn more about how to install 3rd party python packages, read this.
pip install openpyxl
Verify that it has been successfully installed by starting the python interpreter and running:
If the import does not print any errors it means you have successfully installed OpenPyXL.
Opening Excel Files
Excel files are made up of numbered rows and columns with letters starting at A that can contain either numbers or text. Rows and columns in a single grid make up a work sheet or sheet for short. Multiple work sheets make up a workbook. In other words, an Excel spreadsheet is a workbook.
To open an Excel workbook, we need to import the OpenPyXL module and use its
load_workbook() function. For the purpose of this article, let’s assume we want to open a file named example.xlsx.
import openpyxl excel_doc = openpyxl.load_workbook('example.xlsx')
Here, we called the
openpyxl.load_workbook() function and passed in the name of the excel document we want it to open. The function returns a Workbook object that represents the actual Excel file.
Getting the sheets
The next step in retrieving data from an Excel file is to specify which sheet we are interested in. To find out the names of the worksheets in the workbook, call the
>>> import openpyxl >>> excel_doc = openpyxl.load_workbook('example.xlsx') >>> excel_doc.get_sheet_names() [u'Sheet1', u'Sheet2', u'Sheet3']
We have three sheets in this document, namely
Sheet3. The next step is to select a sheet to get data from.
>>> import openpyxl >>> excel_doc = openpyxl.load_workbook('example.xlsx') >>> excel_doc.get_sheet_names() [u'Sheet1', u'Sheet2', u'Sheet3'] work_sheet1 = excel_doc.get_sheet_by_name('Sheet1') <Worksheet "Sheet1">
'Sheet1' to the
get_sheet_by_name() method creates a
Worksheet object that represents Sheet1 in the Excel file. The worksheet object allows you to get data from the the individual cells or from a range of them.
To get the value of a single cell, such as the cell in the first row and first column(cell A1) do the following:
To access many cells, use slicing in the same way you would do it in Excel:
An alternative way of retrieving cell values is to use the sheet’s
cell() method. The
cell() method allows you to specify the cell’s column and row as integer keyword arguments. This is useful when working with excel files that have a lot of columns and rows.
>>> sheet.cell(row=1, column=2).value u'Apples'
How to get data from a range of cells
To read the data from a range of cells, you can use the methods above in a for loop. Let’s suppose you want to get the contents of a single column.
>>> for i in range(1, 8): print(sheet.cell(row=i, column=2).value) Apples Cherries Pears Oranges Apples Bananas Strawberries
The code above will go down column 2, starting at row 1 and print the values of each cell in column 7 until it gets to row 7.
Writing data to an Excel file
It is possible to use OpenPyXL to change or modify the spreadsheet data. For example, the code below shows how to change the title of the active worksheet.
>>> active_worksheet = excel_doc.active >>> active_worksheet.title u'Sheet1' active_worksheet.title = "End Of Year Report" >>> active_worksheet.title u'End Of Year Report' >>> active_worksheet['A1'].value = "2017" >>> active_worksheet['A1'].value u'2017'
Changing sheet titles or cell data is simple to do, it is much like assigning data to a normal Python data structure.
OpenPyXL allows you to do more than read and write data to excel documents. Other OpenPyXL features I did not discuss here include changing font styles in cells, adding formulas, charts and even freezing panes.
For more information on how to work with Excel Spreadsheets using Python, have a look at the following resources:
- OpenPyXL documentation here
- Chapter 12 of Al Sweigart’s Automate The Boring Stuff With Python book.
- Pyxll’s Post on Tools for Working with Python and Excel