- Project Reference & Resource: Google Data Analytics Capstone: Complete a Case Study
- I'll be acting as a junior data analyst at Cyclistic, a fictional company.
- I'll encounter various characters and team members throughout the study.
- My goal is to address critical business questions by following the data analysis process: ask, prepare, process, analyze, share, and act.
- Tools used: MySQL, Tableau.
- I am a junior data analyst at Cyclistic in the marketing analyst team.
- The director of marketing thinks annual memberships are crucial for the company's success.
- My team's goal is to analyze how casual riders and annual members use Cyclistic bikes differently.
- We aim to use these insights to create a strategy for converting casual riders into annual members.
- Our recommendations need approval from Cyclistic executives and require strong data insights and professional data visualizations.
-
Cyclistic is a bike-share program that features more than 5,800 bicycles and 600 docking stations. Cyclistic sets itself apart by also offering reclining bikes, hand tricycles, and cargo bikes, making bike-share more inclusive to people with disabilities and riders who can't use a standard two-wheeled bike. The majority of riders opt for traditional bikes; about 8% of riders use the assistive options. Cyclistic users are more likely to ride for leisure, but about 30% use them to commute to work each day.
-
Lily Moreno is the director of marketing and my manager. Moreno is responsible for the development of campaigns and initiatives to promote the bike-share program. These may include email, social media, and other channels.
-
The Cyclistic marketing analytics team is a group of data analysts who are responsible for collecting, analyzing, and reporting data that helps guide Cyclistic marketing strategy. I joined this team six months ago and have been busy learning about Cyclistic's mission and business goals — as well as how I, as a junior data analyst, can help Cyclistic achieve them.
-
The Cyclistic executive team is known for being notoriously detail-oriented. They will decide whether to approve the recommended marketing program.
- Three questions will guide the future marketing program:
- How do annual members and casual riders use Cyclistic bikes differently?
- Why would casual riders buy Cyclistic annual memberships?
- How can Cyclistic use digital media to influence casual riders to become members?
Moreno has assigned me the first question to answer: How do annual members and casual riders use Cyclistic bikes differently?
I plan to utilize Cyclistic's archived trip information to conduct an analysis and detect patterns. This data can be obtained through the divvy_tripdata source, with permission granted by Motivate International Inc. pursuant to their licensing agreement.
Note - This is public data that you can use to explore how different customer types are using Cyclistic bikes. But note that data-privacy issues prohibit you from using riders’ personally identifiable information. This means that you won’t be able to connect pass purchases to credit card numbers to determine if casual riders live in the Cyclistic service area or if they have purchased multiple single passes.
- Data Timeframe: June 2022 to May 2023 (Past 1 year data)
- Source Data Description and Structure:
- The data is organized into individual .csv files, each corresponding to a specific month.
- These files share common fields relevant to our analysis.
- After our initial examination, we can confidently assert that the source data is Reliable, Original, Comprehensive, Current and Cited.
- Tool for Data Cleaning, Transformation, and Processing: SQL
- We chose SQL for these tasks because the dataset exceeded 5.8 million rows, making it impractical to manage with Excel (which is typically suitable for datasets of less than 1 million rows). SQL's capacity to handle large volumes of data makes it the preferred choice for data wrangling.
- Tool for Analysis and Visualization: Tableau
- While there are several tools available for data visualization, including Excel and Power BI, our selection of Tableau is deliberate. Tableau offers dynamic capabilities and a diverse range of visualization options, making it the ideal choice for our analytical and visualization needs.
- File Naming Convention: Files are named in the format of YYYYMM-divvy.csv.
- Data Integration into 'combined_Table': The data has been uploaded and consolidated into a unified table named 'combined_Table'. (To see the process of data upload and consolidation into a single table, please click here.)
There are 5829030 no of rows and 13 no of columns observed in 'combined_Table' table
-
Total inconsistent values count
Total count of rows including 'Total no of blank or null values', 'Outlier or false value of ride length longer than a day' and 'Outlier or false value of ride length less than a minute'.
-
Total inconsistent values percentage
Percent of rows including 'Total no of blank or null values', 'Outlier or false value of ride length longer than a day' and 'Outlier or false value of ride length less than a minute'.
-
Data type: VARCHAR(255)
-
Constraints applied:
- Primary Key
- Unique
- Not Null
-
Primary Key constraints added in this field
- Data type: DATETIME -- (YYYY-MM-DD hh:mm:ss)
Characteristics and data format of Field 'start_station_name', 'end_station_name', 'start_station_id' & 'end_station_id
- Data type: MEDIUMTEXT
- Blank values were found in all four columns.
- Data type: Double
- Blank values were found in 'end_lat' & 'end_lng' columns
Data Cleaning & Transformation SQL Queries
- Rows with missing values are removed.
- We excluded trips that were too short (less than a minute) or too long (more than a day).
- A total of 1426785 rows were deleted.
- Created new Columns:
- ride_length
- month
- day_of_week
- hour_in_day.
- The data has been tidied and formatted for analysis.
- I loaded it into Tableau Public and used it to create charts and graphs to visualize the data.
- I then created a Story in Tableau to share the insights I gained from the data.
Tableau Story Link: Cyclistic Bike Share Analysis | June 22 to May 23