Maximizing revenue in the taxi service industry is essential for both sustained business success and driver satisfaction. This project analyzes the impact of different payment methods on taxi fare pricing and draws business insights to optimize revenue. Using statistical techniques and Python, the goal is to determine whether fare amounts differ significantly between payment methods and to explore whether this information can be used to encourage payment methods that lead to higher revenue for drivers while still ensuring a positive customer experience.
The dataset used for this analysis consists of Yellow Taxi trip record submissions and contains 18 columns. It consists of the following fields:
- Vendor ID - Code 1 stands for Creative Mobile Technologies, LLC. Code 2 for VeriFone Inc.
- Pickup Datetime - The date and time when the meter was engaged.
- Dropoff Datetime - The date and time when the meter was disengaged.
- Passenger Count - The number of passengers in the vehicle (value entered by the driver).
- Trip Distance - The elapsed trip distance in miles reported by the taximeter.
- Pickup Location ID - TLC Taxi Zone in which the taximeter was engaged.
- Dropoff Location ID - TLC Taxi Zone in which the taximeter was disengaged.
- Rate Code ID - The final rate code in effect at the end of the trip.
  - 1 = Standard rate
  - 2 = JFK
  - 3 = Newark Airport trips
  - 4 = Nassau or Westchester
  - 5 = Negotiated fare
  - 6 = Group ride
- Store and Forward Flag - This flag indicates whether the trip record was held in vehicle memory before being sent to the vendor because the vehicle did not have a connection to the server.
- Payment Type - A numeric code signifying how the passenger paid for the trip.
  - 1 = Credit Card
  - 2 = Cash
  - 3 = No charge
  - 4 = Dispute
  - 5 = Unknown
  - 6 = Voided trip
- Fare Amount - The time and distance fare calculated by the meter.
- Extras and Surcharges - Miscellaneous extras and surcharges. Currently, this only includes the $0.50 and $1 rush hour and overnight charges.
- MTA Tax - $0.50 MTA tax that is automatically triggered based on the metered rate in use.
- Improvement Surcharge - $0.30 improvement surcharge assessed on trips at the flag drop.
- Tip Amount - Automatically populated for credit card tips. Cash tips are not included.
- Tolls Amount - Total amount of all tolls paid in trip.
- Total Amount - The total amount charged to passengers. Does not include cash tips.
- Congestion Surcharge - Total amount collected in trip for NYS congestion surcharge.
Download Dataset - https://data.world/vizwiz/nyc-taxi-jan-2020/workspace/file?filename=yellow_tripdata_2020-01.csv
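As an illustration of how the fields above might be read, here is a minimal sketch using only the Python standard library. The column names (`tpep_pickup_datetime`, `payment_type`, and so on) and the timestamp format are assumptions about the CSV header and should be checked against the downloaded file; the two sample rows are made up, and `parse_trip` and `load_trips` are hypothetical helper names.

```python
import csv
import io
from datetime import datetime

# Only the two payment types of main interest are labelled here;
# every other code is grouped under "Other".
PAYMENT_LABELS = {1: "Credit Card", 2: "Cash"}

# Assumed timestamp layout; verify against the real file.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"

# Made-up sample rows with assumed column names (the real file has 18 columns).
SAMPLE_CSV = """tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,payment_type,fare_amount
2020-01-01 00:28:15,2020-01-01 00:33:03,1,1.2,1,6.0
2020-01-01 00:35:39,2020-01-01 00:43:04,1,1.2,2,7.0
"""


def parse_trip(row):
    """Convert one raw CSV row (a dict of strings) into typed values."""
    pickup = datetime.strptime(row["tpep_pickup_datetime"], TIMESTAMP_FORMAT)
    dropoff = datetime.strptime(row["tpep_dropoff_datetime"], TIMESTAMP_FORMAT)
    return {
        # Trip duration in minutes, derived from the two meter timestamps.
        "duration_min": (dropoff - pickup).total_seconds() / 60,
        "passenger_count": int(row["passenger_count"]),
        "trip_distance": float(row["trip_distance"]),
        "payment_type": PAYMENT_LABELS.get(int(row["payment_type"]), "Other"),
        "fare_amount": float(row["fare_amount"]),
    }


def load_trips(file_obj):
    """Read every row from an open CSV file object."""
    return [parse_trip(row) for row in csv.DictReader(file_obj)]


trips = load_trips(io.StringIO(SAMPLE_CSV))
for trip in trips:
    print(trip)
```

To run this against the downloaded file, pass `open("yellow_tripdata_2020-01.csv", newline="")` to `load_trips` instead of the in-memory sample. The sketch does not handle missing values, so rows with empty fields would need to be filtered or defaulted first.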
The statistical analysis involved the following steps:
- Data Cleaning - Removing inconsistencies from the raw data. This involved dropping unnecessary columns, handling missing values, correcting data types, dealing with duplicates, and removing outliers.
- Distribution Analysis - Examining how the important features are distributed and studying their characteristics.
- Visualizations & Interpretation of Results - Plotting graphs to visually confirm the interpretations drawn from the distribution analysis.
- Hypothesis Testing - Testing the claims made from the distribution analysis and visualizations by formulating hypotheses about what revenue depends on.
- Key Business Insights - Drawing conclusions from various tests to improve the revenue of taxi drivers.
- Story Telling
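The outlier-removal and hypothesis-testing steps above can be sketched as follows. This is an illustration under stated assumptions, not the project's actual code: the IQR rule and Welch's t-test are common choices swapped in here because the text does not name the methods used, the fare values are made up, and the p-value uses a normal approximation that is only reasonable for large samples such as the full dataset.

```python
import math
import statistics


def remove_outliers_iqr(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR] (assumed outlier rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def welch_t_test(sample_a, sample_b):
    """Return (t statistic, degrees of freedom, approximate two-sided p-value)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each sample mean.
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    t_stat = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    # Normal approximation to the t distribution; adequate only when df is large.
    p_value = 2 * (1 - statistics.NormalDist().cdf(abs(t_stat)))
    return t_stat, df, p_value


# Made-up fare amounts, for illustration only.
card_fares = [12.0, 14.5, 13.0, 15.5, 11.5, 14.0, 13.5, 250.0]
cash_fares = [10.0, 11.5, 9.5, 12.0, 10.5, 11.0, 9.0, 10.0]

card_clean = remove_outliers_iqr(card_fares)
cash_clean = remove_outliers_iqr(cash_fares)

# H0: mean fare is the same for card and cash payments.
t_stat, df, p_value = welch_t_test(card_clean, cash_clean)
print(f"t = {t_stat:.2f}, df = {df:.1f}, approx. p = {p_value:.4g}")
```

Welch's variant is used in this sketch because it does not assume equal variances in the two groups. For small samples, an exact p-value from the t distribution (for example via a statistics library) would be more appropriate than the normal approximation.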
- While investigating the relationship between fare amount and trip duration using regression analysis, about 24% non-linearity was observed. This can be addressed further using polynomial or other non-linear models.
- Analysing how model performance changes when more features are added to the dataset.
- Investigating heteroscedasticity (non-constant variance) across trip durations and handling it, for example by transforming variables or using a different model.
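One way the heteroscedasticity point above could be explored is sketched below. It fits a straight line, then compares the residual variance in the upper half of trip durations with that in the lower half, in the spirit of a Goldfeld-Quandt check but not a formal test. The data is synthetic and generated with multiplicative noise purely for illustration; `fit_line` and `residual_variance_ratio` are hypothetical helper names, and the log transform is just one of several possible remedies.

```python
import math
import random
import statistics


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residual_variance_ratio(x, y):
    """Residual variance in the upper half of x divided by that in the lower half."""
    intercept, slope = fit_line(x, y)
    # Sort residuals by x so the list can be split at the median of x.
    pairs = sorted((xi, yi - (intercept + slope * xi)) for xi, yi in zip(x, y))
    half = len(pairs) // 2
    lower = [residual for _, residual in pairs[:half]]
    upper = [residual for _, residual in pairs[half:]]
    return statistics.variance(upper) / statistics.variance(lower)


# Synthetic data: fare noise grows with duration (multiplicative error).
rng = random.Random(0)
durations = [rng.uniform(2, 60) for _ in range(400)]
fares = [0.9 * d * math.exp(rng.gauss(0, 0.2)) for d in durations]

raw_ratio = residual_variance_ratio(durations, fares)
log_ratio = residual_variance_ratio(
    [math.log(d) for d in durations],
    [math.log(f) for f in fares],
)
print(f"variance ratio, raw scale: {raw_ratio:.2f}")
print(f"variance ratio, log-log scale: {log_ratio:.2f}")
```

A ratio far from 1 suggests the residual spread changes with duration; a ratio near 1 after the transform suggests the transform has stabilised it. Whether a log transform suits the real taxi data is an open question that would need checking on the cleaned dataset.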
- The dataset is a sample of 2020 Yellow Taxi Trip Data, January-June. The findings are based on historical data, and results may vary with different datasets.
- This analysis does not consider external factors such as traffic conditions or time of day; the primary focus is on the relationship between payment type and fare amount. For further study, a few points are noted under the section "Further Investigations".
- Further enhancements such as incorporating real-time data (e.g., traffic patterns, peak hours) or customer demographics could improve the prediction accuracy.