This project focuses on analyzing the relationship between payment methods and fare amounts through statistical analysis, hypothesis testing, linear regression and finding actionable insights that can help taxi drivers increase their earnings based on preferred payment methods.

SnPreethi/Payment_Analysis_for_Maximizing_Taxi_Revenue

MAXIMIZING REVENUE FOR TAXI CAB DRIVERS THROUGH PAYMENT TYPE ANALYSIS

INTRODUCTION

Maximizing revenue in the taxi service industry is essential for both sustained business success and driver satisfaction. This project analyzes the impact of payment methods on taxi fare amounts and draws business insights to optimize revenue. Using statistical techniques and Python, the goal is to determine whether fare amounts differ significantly between payment methods, and to explore whether that information can be used to encourage the payment methods that lead to higher revenue for drivers while still ensuring a positive customer experience.

DATASET

The dataset used for this analysis consists of Yellow Taxi trip records. It contains 18 columns:

  1. Vendor ID - Code 1 stands for Creative Mobile Technologies, LLC. Code 2 for VeriFone Inc.
  2. Pickup Datetime - The date and time when the meter was engaged.
  3. Dropoff Datetime - The date and time when the meter was disengaged.
  4. Passenger Count - The number of passengers in the vehicle (value entered by the driver).
  5. Trip Distance - The elapsed trip distance in miles reported by the taximeter.
  6. Pickup Location ID - TLC Taxi Zone in which the taximeter was engaged.
  7. Dropoff Location ID - TLC Taxi Zone in which the taximeter was disengaged.
  8. Rate Code ID - The final rate code in effect at the end of the trip.
    • 1 = Standard rate
    • 2 = JFK
    • 3 = Newark Airport trips
    • 4 = Nassau or Westchester
    • 5 = Negotiated fare
    • 6 = Group ride
  9. Store and Forward Flag - This flag indicates whether the trip record was held in vehicle memory before being sent to the vendor because the vehicle did not have a connection to the server.
  10. Payment Type - A numeric code signifying how the passenger paid for the trip.
    • 1 = Credit Card
    • 2 = Cash
    • 3 = No charge
    • 4 = Dispute
    • 5 = Unknown
    • 6 = Avoided Trip
  11. Fare Amount - The time and distance fare calculated by the meter.
  12. Extras and Surcharges - Miscellaneous extras and surcharges. Currently, this only includes the $0.50 and $1 rush hour and overnight charges.
  13. MTA Tax - $0.50 MTA tax that is automatically triggered based on the metered rate in use.
  14. Improvement Surcharge - $0.50 improvement surcharge that is automatically triggered based on the metered rate in use.
  15. Tip Amount - Automatically populated for credit card tips. Cash tips are not included.
  16. Tolls Amount - Total amount of all tolls paid in trip.
  17. Total Amount - The total amount charged to passengers. Does not include cash tips.
  18. Congestion Surcharge - Total amount collected in trip for NYS congestion surcharge.

Download Dataset - https://data.world/vizwiz/nyc-taxi-jan-2020/workspace/file?filename=yellow_tripdata_2020-01.csv
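
As a rough illustration of how the coded fields above can be made readable, the sketch below decodes Payment Type and derives a trip duration from the pickup and dropoff timestamps. The column names (tpep_pickup_datetime, tpep_dropoff_datetime, payment_type, fare_amount), the timestamp format and the two sample rows are assumptions made for this example and should be checked against the downloaded file. The helper parse_trip is hypothetical and is not part of the project code.

```python
import csv
import io
from datetime import datetime

# Payment type codes as listed in the field descriptions above.
PAYMENT_TYPES = {
    1: "Credit Card",
    2: "Cash",
    3: "No charge",
    4: "Dispute",
    5: "Unknown",
    6: "Avoided Trip",
}

# Made-up sample rows; column names and timestamp format are assumptions.
SAMPLE_CSV = """tpep_pickup_datetime,tpep_dropoff_datetime,payment_type,fare_amount
2020-01-01 00:28:15,2020-01-01 00:33:03,1,6.0
2020-01-01 00:35:39,2020-01-01 00:43:04,2,7.0
"""

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_trip(row):
    """Convert one raw CSV row into typed, human-readable fields."""
    pickup = datetime.strptime(row["tpep_pickup_datetime"], TIME_FORMAT)
    dropoff = datetime.strptime(row["tpep_dropoff_datetime"], TIME_FORMAT)
    code = int(float(row["payment_type"]))  # tolerates values such as "1.0"
    return {
        "payment": PAYMENT_TYPES.get(code, "Unknown"),
        "fare_amount": float(row["fare_amount"]),
        "duration_min": (dropoff - pickup).total_seconds() / 60,
    }


trips = [parse_trip(row) for row in csv.DictReader(io.StringIO(SAMPLE_CSV))]
for trip in trips:
    print(trip)
```

To run this against the real file, replace io.StringIO(SAMPLE_CSV) with an open file handle. Rows with missing values would need handling first.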

METHODOLOGY

The statistical analysis involved the following steps.

  1. Data Cleaning - Removing inconsistencies from the raw data. This involved dropping unnecessary columns, handling missing values, correcting data types, dealing with duplicates, and removing outliers.
  2. Distribution Analysis - Examining how the important features are distributed and studying their characteristics.
  3. Visualizations & Interpretation of Results - Plotting graphs to visually confirm the interpretations from the distribution analysis.
  4. Hypothesis Testing - Testing the claims made from the distribution analysis and visualizations by formulating hypotheses about what revenue depends on.
  5. Key Business Insights - Drawing conclusions from various tests to improve the revenue of taxi drivers.
  6. Story Telling
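
The outlier-removal and hypothesis-testing steps above can be sketched as follows. This is a minimal illustration, not the project's actual procedure: the README does not say which outlier rule or test was used, so the 1.5 x IQR rule and Welch's two-sample test are assumptions. The fare values are made up, and the helpers iqr_filter and welch_test are hypothetical names. The p-value uses a normal approximation, which is only reasonable for large samples such as the full trip data. For small samples, a t-distribution (for example scipy.stats.ttest_ind with equal_var=False) is the better choice.

```python
import math
import statistics


def iqr_filter(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def welch_test(a, b):
    """Welch's t statistic with a two-sided p-value from the normal
    approximation (an assumption that suits large samples only)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    t = (statistics.mean(a) - statistics.mean(b)) / se
    p = 2 * (1 - statistics.NormalDist().cdf(abs(t)))
    return t, p


# Made-up fare amounts per payment type, with one obvious outlier.
card_fares = [12.5, 14.0, 13.5, 15.0, 12.0, 16.5, 14.5, 13.0, 250.0]
cash_fares = [10.0, 11.5, 9.5, 12.0, 10.5, 11.0, 9.0, 12.5, 10.0]

card_clean = iqr_filter(card_fares)
cash_clean = iqr_filter(cash_fares)

# Null hypothesis: mean fare is the same for card and cash payments.
t_stat, p_value = welch_test(card_clean, cash_clean)
print(f"t = {t_stat:.2f}, p = {p_value:.4g}")
```

A small p-value here would only say that the two sample means differ. It does not show that the payment method causes the difference.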

FURTHER INVESTIGATIONS

  1. During the regression analysis of the relationship between fare amount and trip duration, 24% of non-linearity was observed. This can be addressed further using polynomial or other non-linear models.
  2. Analysing how adding more features to the dataset affects model performance.
  3. Investigating heteroscedasticity (non-constant variance) across trip durations, and handling it by transforming variables or using a different model.
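
As a starting point for the first and third items, the sketch below fits a straight line of fare against duration, reports R squared, and compares the residual variance of longer trips with that of shorter trips. It is a crude split-sample check, loosely in the spirit of a Goldfeld-Quandt test, and not a formal one. The data points are made up so that the spread grows with duration, and the helpers fit_line, r_squared and residual_variance_ratio are hypothetical names. Reading the non-linearity figure as the variance left unexplained by a linear fit (1 minus R squared) is an interpretation, not something the project states.

```python
import statistics


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(x, y, slope, intercept):
    """Share of the variance in y explained by the fitted line."""
    my = statistics.mean(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def residual_variance_ratio(x, y, slope, intercept):
    """Residual variance of the longer half of trips divided by that of
    the shorter half. A ratio far from 1 hints at heteroscedasticity."""
    pairs = sorted(zip(x, y))
    residuals = [yi - (slope * xi + intercept) for xi, yi in pairs]
    half = len(residuals) // 2
    return statistics.pvariance(residuals[-half:]) / statistics.pvariance(residuals[:half])


# Made-up trips: duration in minutes and fare, with spread growing over time.
durations = [5, 10, 15, 20, 25, 30, 35, 40]
fares = [5.2, 7.3, 10.5, 12.0, 16.5, 16.0, 23.0, 19.5]

slope, intercept = fit_line(durations, fares)
r2 = r_squared(durations, fares, slope, intercept)
ratio = residual_variance_ratio(durations, fares, slope, intercept)
print(f"R^2 = {r2:.3f}, residual variance ratio = {ratio:.1f}")
```

If the ratio stays well above 1 on the real data, refitting on a transformed fare (for example its logarithm) and recomputing the ratio is one way to see whether the transformation helps.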

NOTE

  • The dataset is a sample of 2020 Yellow Taxi Trip Data, January-June. The findings are based on historical data, and results may vary with different datasets.
  • This analysis does not account for external factors such as traffic conditions or time of day; the primary focus is the relationship between payment type and fare amount. A few points for further study are noted under "Further Investigations".
  • Further enhancements such as incorporating real-time data (e.g., traffic patterns, peak hours) or customer demographics could improve the prediction accuracy.
