Slicing with Pandas iloc and loc

Vinoo | Aug 1, 2024 min read

Introduction

Pandas is an incredibly powerful library for data analysis. We can use Pandas to effectively get an initial snapshot of our data and explore it for features, and insights. It is sometimes confusing to understand the use of 2 key methods on a pandas DataFrame - iloc and loc. This article will clear this up so you can use them easily on to select, filter and sort your data.

Alt text

Why use these methods?

When we learn SQL, our most basic method to query data is the SELECT statement. This allows us to query for specific data in a table. In essence, Pandas DataFrame acts like a table and we use loc and iloc to query the table.

Lets review a very simple 3 X 2 pandas dataframe of customers and ratings called df_customers:

Let’s create the dataframe:

python import pandas as pd

customers = ['John', 'Alice', "Harriet"] 
rating = [56, 23, 89] df_customers = pd.DataFrame({'first_name':customers, 'rating': rating})

Alt text

Key points about loc and iloc:

iloc takes 2 parameters - [rows, columns]

  1. Rows is a single integer or slice representing the rows of our query
  2. Columns represent the columns we want to include
  3. iloc uses number indexes for both rows and columns - let’s look at some examples:
df_customers.iloc[0:10, :] # returns the first 10 rows, from index 0 to 9 df_customers.iloc[0,:] # returns first row, all columns - John, 56
df_customers.iloc[0,0] # returns first row, only first column - John
df_customers.iloc[:,:] # returns all rows and all columns - this will be the original source dataframe
df_customers.iloc[0:1, 0] # returns record 0, first column - John	

Bear in mind that iloc uses standard indexing scheme - So 0:10 - will include records from index 0 to 9, but not the record at index 10

loc also takes 2 parameters - [rows, columns]

  • The key difference with loc is that it enables a few different use cases:
  1. Filtering columns and rows with label names of the dataframe (i.e. ‘first_name’, ‘rating’)
  2. Filtering rows with labels/attributes or indexes - These labels can be of various types, including integers (which might look like positions but are actually treated as labels), strings, or other objects. .loc does not use integer positions in the same way that .iloc does.
  3. Selecting rows using conditions
df_customers.loc[:, 'rating']  # all rows, only the rating column
df_customers.loc[df_customers['first_name'] == 'John', ['first_name', 'rating']]  # rows where first_name is 'John', columns 'first_name' and 'rating'
df_customers.loc[0, :]  # first row (assuming default integer index), all columns
df_customers.loc[df_customers['rating'] > 50, :]  # rows where rating > 50, all columns

Bear in mind that loc does not use a standard indexing scheme - So 0:10 - will include records from index 0 to 10, but will include index 10, resulting in 11 records returned

python df_customers.loc[0:10, :] # returns rows from index 0 to 10, inclusively

Conclusion

Understanding the difference between loc and iloc is crucial for effective data manipulation with Pandas. When using iloc think i = integer based indexing while loc is label-based indexing. I highly recommend spinning up a Kaggle Notebook and trying out these examples!