Skip to content

The State of Iowa set a goal to replace 25% of petroleum in Iowa with biofuel by 2020. Did they succeed? The answer, as well as the analysis of fuel sales in the State in 2012-2022 years, are here in my Python project!

Notifications You must be signed in to change notification settings

lanavirsen/Iowa-Motor-Fuel-Sales

Repository files navigation

⛽ Iowa Motor Fuel Sales Analysis

Iowa logo

Introduction

In this project, I explore the dataset "Iowa Motor Fuel Sales by County and Year".

About the dataset

From data.iowa.gov:

Iowa motor fuel retailers are businesses that offer a range of fuel options, including gasoline, diesel, ethanol, and biodiesel. Iowa Code section 452A.33 requires all Iowa fuel retailers to report motor fuel and diesel gallons to the Iowa Department of Revenue and for the Department to prepare and submit an annual report of fuel gallons to the Iowa Governor and Legislature. The full annual reports related to this dataset are published on the Iowa Department of Revenue web page.

The State of Iowa set a goal to replace 25.0 percent of petroleum in Iowa with biofuel by 2020, and the Biofuel Distribution Percentage measures how the State is doing toward meeting that goal.

Within the scope of this project, I aim to answer the following questions:

  • How have sales of different types of fuel changed over the 2012-2022 period?
  • How do fuel sales and the biofuel distribution percentage correlate with the number of retail locations?
  • What is the biofuel distribution percentage across different counties, and how does it compare to the state's goal?
  • How does the biofuel distribution percentage trend over the years, and does the state meet its 2020 goal?

The tools I use:

  • Python: Primary programming language for data manipulation and analysis.
  • Pandas: A Python library utilized for cleaning and transforming datasets.
  • Matplotlib/Seaborn: These visualization libraries generate plots and graphs to explore data trends.
  • GeoPandas: An extension of Pandas for working with geospatial data.
  • Scikit-learn: A toolkit for predictive modeling and machine learning analysis.
  • NumPy: A fundamental package for scientific computing in Python.
  • Jupyter Notebook: An interactive environment for documenting the data analysis process.

The files

Data sources

  1. The dataset "Iowa Motor Fuel Sales by County and Year" has been provided by: Iowa Department of Revenue.
  1. The Shapefile "Iowa County Boundaries", used for geographical visualization, has been provided by: Iowa Geospatial Data Clearinghouse.

The dataset's description

The dataset has 9 columns and 1075 rows.

Data.iowa.gov provided the following description of the dataset's columns:

Column name Description Type
Calendar Year Calendar year when fuel was sold Number
County County where fuel was sold Plain text
Number of Retail Locations Number of retail fuel locations within the county Number
Non-Ethanol Gasoline Sales (in gallons) Gallons of non-ethanol gasoline sold Number
Ethanol Gasoline Sales (in gallons) Gallons of ethanol gasoline sold Number
Clear and Dyed Diesel Sales (in gallons) Gallons of clear and dyed diesel sold Number
Clear and Dyed Biodiesel Sales (in gallons) Gallons of clear and dyed biodiesel sold Number
Pure Biodiesel Sales (in gallons) Gallons of pure biodiesel sold Number
Biofuel Distribution Percentage Biofuel Distribution Percentage = (Pure Ethanol Gallons + Pure Biodiesel Gallons) / Total Gasoline Gallons Number

Steps done

1. Data cleaning

1.1. Renaming columns

I needed to rename one column due to a formatting issue.

1.2. Handling missing values

The dataset contained missing values in various columns

After closer inspection, I decided to keep the missing values for now, as most of them wouldn't affect the analysis I was conducting in the scope of this project. The ones that later proved to affect the integrity of the geographical visualization got imputed (through interpolation and extrapolation) in one of the following steps.

1.3. Considerations regarding data types

Three of the columns that were expected to have integer values had their data type set to float due to the missing values present.

As I decided to keep the missing values, the data type of the columns remained set to float.

1.4. Checking string case

To avoid possible future issues when I would be merging the dataset with geographical data, I reinforced title case in the County column.

1.5. Handling duplicates

There were no duplicates in the dataset.

1.6. Identifying outliers

The dataset revealed a great number of outliers in different columns.

After a thorough investigation, I replaced some outliers that appeared to be errors, while retaining most values identified as outliers.

2. Analysis

2.1. Trend analysis

I analyzed sales trends over the years to understand the market dynamics for different types of fuels.

2.2. Impact of retail locations

I evaluated the impact of the Number of Retail Locations on fuel sales, as well as the Biofuel Distribution Percentage.

To explore the relationship, I used scatter plots and correlation analysis.

2.3. Regional insights

I examined the Biofuel Distribution Percentage across different counties through geographical visualization.

For that, I used geopandas library and a shapefile defining the boundaries of Iowa's counties.

As the visualization revealed missing values for the Biofuel Distribution Percentage for some counties for certain years, these values got imputed (through interpolation and extrapolation).

2.4. Biofuel adoption

  • I assessed Iowa's progress towards the state's goal of replacing 25% of petroleum with biofuel by 2020.
  • I identified counties with Biofuel Distribution equal or above 25% in 2020.
  • Assuming a linear trend, I predicted when the state could achieve its goal.

Conclusion

Answering the questions posed at the beginning on this project.

How have sales of different types of fuel changed over the 2012-2022 period?

Figure 1

Figure 1: Annual sales trends for various fuel types.

  • Total Ethanol Gasoline Sales generally remained high. They were increasing in the period between 2012 and 2016, peaking at over 1.37 billion gallons in 2016, then plateaued between 2016 and 2019 before starting to decrease from nearly 1.35 billion gallons in 2019 to slightly over 1.11 billion gallons in 2022.
  • Total Clear and Dyed Biodiesel Sales were on the rise from about 286 million gallons in 2012 to approximately 495 million gallons in 2019, then entering a potential plateau.
  • Total Clear and Dyed Diesel Sales showed fluctuations over the years with a peak in 2015 (over 481 million gallons) before beginning to decline, with a slight increase to nearly 350 million gallons noted in 2022.
  • Total Pure Biodiesel Sales, introduced in 2018, remain relatively stable with a slight decrease over the registered years from nearly 210 million gallons in 2018 to about 198 million gallons in 2022.
  • Total Non-Ethanol Gasoline Sales consistently decreased throughout the entire period from approximately 272 million gallons in 2012 to around 163 million gallons in 2022.

How do fuel sales and the biofuel distribution percentage correlate with the number of retail locations?

The number of retail locations has a strong positive relationship with fuel sales across all types (in other words, fuel sales increase proportionally with the number of retail outlets). Figure 2 below shows the heatmap of the relationships sorted by their strength.

Figure 2

Figure 2: Correlation of the number of retail locations with fuel sales and the biofuel distribution percentage.

A correlation coefficient close to 1 indicates a strong positive relationship, while a coefficient near 0 suggests no linear relationship.

No relationship between the Number of Retail Locations and the Biofuel Distribution Percentage was found.

What is the biofuel distribution percentage across different counties, and how does it compare to the state's goal?

The State of Iowa set a goal to replace 25% of petroleum in the state with biofuel by 2020.

The data shows that by 2020, the majority (75%) of Iowa's counties had the Biofuel Distribution Percentage below 13.6%, falling significantly short of this target.

There were 6 counties with Biofuel Distribution equal or above 25% in 2020, or 6.1% of all the counties of state Iowa. These counties are listed below on Figure 3.

Figure 3

Figure 3: The list of counties with biofuel distribution equal or above 25% in 2020.

The following animation on Figure 4 illustrates the yearly changes in biofuel distribution percentages across Iowa counties over a decade.

A color scale is used where deep red signifies a low Biofuel Distribution Percentage (0%), transitioning to an off-white for percentages around 25% (the state's goal), and culminating in deep blue for the highest percentages (up to 50%).

Figure 4

Figure 4: Animated biofuel distribution percentage by county (2012-2022).

How does the biofuel distribution percentage trend over the years, and does the state meet its 2020 goal?

There has been a gradual increase in the biofuel distribution percentage from 9.4% in 2012 to 15.1% in 2022, but the state did not achieve its goal of replacing 25% of petroleum with biofuel by 2020.

Figure 5

Figure 5: Annual average biofuel distribution percentage and forecast to 25% goal.

Assuming a linear trend, the Biofuel Distribution Percentage is predicted to reach 25% in the year 2042.

About

The State of Iowa set a goal to replace 25% of petroleum in Iowa with biofuel by 2020. Did they succeed? The answer, as well as the analysis of fuel sales in the State in 2012-2022 years, are here in my Python project!

Resources

Stars

Watchers

Forks