
Use Python and the Pandas library to analyze school district data


batty2021/School_District_Analysis


School_District_Analysis


Overview of the analysis:

Purpose:

Meria and her supervisor asked me to replace the math and reading scores of Thomas High School's ninth graders with NaN while keeping the rest of the data intact, because the students_complete.csv file shows evidence of academic dishonesty.
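A minimal sketch of the NaN replacement, assuming the student file has `school_name`, `grade`, `reading_score`, and `math_score` columns and labels ninth graders as `"9th"` (these names are assumptions, not confirmed by this README). It uses a boolean mask with `DataFrame.loc`, so only the matching rows change; a tiny in-memory DataFrame stands in for the CSV file.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Resources/students_complete.csv; the column names
# and the "9th" grade label are assumptions.
student_data_df = pd.DataFrame({
    "student_name": ["Student A", "Student B", "Student C", "Student D"],
    "grade": ["9th", "10th", "9th", "12th"],
    "school_name": ["Thomas High School", "Thomas High School",
                    "Other High School", "Thomas High School"],
    "reading_score": [80.0, 90.0, 70.0, 85.0],
    "math_score": [75.0, 95.0, 65.0, 88.0],
})

# Boolean mask: True only for ninth graders at Thomas High School.
ths_ninth_mask = (student_data_df["school_name"] == "Thomas High School") & (
    student_data_df["grade"] == "9th"
)

# Overwrite only the two score columns on the masked rows; all other data is kept.
student_data_df.loc[ths_ninth_mask, ["reading_score", "math_score"]] = np.nan
print(student_data_df)
```

With the real file, the same mask would be applied to the DataFrame returned by `pd.read_csv("Resources/students_complete.csv")`.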

Resources:

Resources/schools_complete.csv, Resources/students_complete.csv

Software:

Python 3.7, Anaconda, Jupyter Notebook, and the basics of the Pandas library.

Results:

Because of academic dishonesty by the ninth-grade students of Thomas High School, this analysis was conducted twice.
The first run included the full set of student data, and the second replaced their scores with NaN. The DataFrame
shown below summarizes the district after the ninth graders' scores were replaced with NaN.

District_Summary_DataFrame
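One way a district summary like this could be computed, sketched under two assumptions: a passing score of 70, and a total student count that still includes the rows whose scores are NaN. The sample rows are hypothetical. The sketch also shows a mechanism by which NaN scores pull passing percentages down: a comparison such as `NaN >= 70` evaluates to `False`, so those students stay in the denominator but never count as passing.

```python
import numpy as np
import pandas as pd

# Hypothetical sample in which one ninth grader's scores are already NaN.
students = pd.DataFrame({
    "school_name": ["Thomas High School"] * 3 + ["Other High School"],
    "grade": ["9th", "10th", "11th", "9th"],
    "math_score": [np.nan, 90.0, 60.0, 80.0],
    "reading_score": [np.nan, 85.0, 75.0, 65.0],
})

PASSING = 70  # assumed passing threshold

# The total still counts every row, including rows whose scores are NaN.
total_students = len(students)

# Series.mean() skips NaN, so averages use only the remaining scores.
avg_math = students["math_score"].mean()
avg_reading = students["reading_score"].mean()

# NaN >= PASSING is False, so NaN rows count as "not passing".
passes_math = students["math_score"] >= PASSING
passes_reading = students["reading_score"] >= PASSING
pass_math = passes_math.sum() / total_students * 100
pass_reading = passes_reading.sum() / total_students * 100
pass_both = (passes_math & passes_reading).sum() / total_students * 100

district_summary_df = pd.DataFrame([{
    "Total Students": total_students,
    "Average Math Score": avg_math,
    "Average Reading Score": avg_reading,
    "% Passing Math": pass_math,
    "% Passing Reading": pass_reading,
    "% Overall Passing": pass_both,
}])
print(district_summary_df.round(2))
```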

  • Replacing the ninth graders' math and reading scores with NaN resulted in the following changes:
    • the overall passing percentage for Thomas High School fell to 65%
    • Thomas High School is no longer included in the list of top five schools
  • When the ninth graders of Thomas High School had their scores excluded from the calculations, the following changes happened:
    • the overall passing percentage of THS decreased by 0.11%
    • the average math and reading scores of THS increased by 0.06%
    • for schools spending $630 to $644 per student, the overall passing percentage decreased by 0.1%
    • school rankings are unchanged: Thomas High School (THS) is still the second-best performing school in the district, with an overall passing rate of 90.63% among its tenth through twelfth graders.
    • updated_Metrics_THS
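A sketch of an updated Thomas High School metric restricted to tenth through twelfth graders, followed by a simple ranking of schools. The column names, the threshold of 70, and the sample rows are assumptions; substituting the recomputed rate into the per-school series is one possible approach, not necessarily the notebook's.

```python
import pandas as pd

# Hypothetical per-student sample; column names are assumed.
students = pd.DataFrame({
    "school_name": ["Thomas High School"] * 4 + ["Other High School"] * 2,
    "grade": ["9th", "10th", "11th", "12th", "10th", "11th"],
    "math_score": [50.0, 90.0, 80.0, 72.0, 60.0, 95.0],
    "reading_score": [55.0, 88.0, 71.0, 65.0, 85.0, 90.0],
})
PASSING = 70  # assumed passing threshold

# Keep only Thomas High School's tenth through twelfth graders.
ths_10_12 = students[
    (students["school_name"] == "Thomas High School") & (students["grade"] != "9th")
]

# The denominator is the number of 10th-12th graders, not full enrollment.
ths_overall_passing = (
    ((ths_10_12["math_score"] >= PASSING) & (ths_10_12["reading_score"] >= PASSING))
    .sum() / len(ths_10_12) * 100
)

# Per-school overall passing rate: the mean of a boolean column is the pass share.
students["passing_both"] = (students["math_score"] >= PASSING) & (
    students["reading_score"] >= PASSING
)
per_school = students.groupby("school_name")["passing_both"].mean() * 100

# Swap in the 10th-12th grade figure for THS, then rank.
per_school.loc["Thomas High School"] = ths_overall_passing
top_five = per_school.sort_values(ascending=False).head(5)
print(top_five)
```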

The Effects of School Budget and School Size

  • Average scores and passing percentages do not increase as spending per student increases, which suggests that factors other than funding matter more in determining average student scores.

Spending_ranges
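Grouping schools into spending ranges can be sketched with `pd.cut` and `groupby`. The bin edges, labels, column names, and budget figures below are hypothetical placeholders, not values taken from the notebook.

```python
import pandas as pd

# Hypothetical per-school summary; budgets and rates are made-up examples.
per_school_summary_df = pd.DataFrame({
    "Per Student Budget": [578.0, 600.0, 638.0, 644.0, 652.0],
    "% Overall Passing": [90.0, 91.0, 70.0, 66.0, 54.0],
}, index=["S1", "S2", "S3", "S4", "S5"])

# Assumed bin edges and labels; pd.cut bins are right-inclusive by default,
# so 638 falls in (630, 645].
spending_bins = [0, 585, 630, 645, 675]
group_names = ["<$586", "$586-630", "$631-645", "$646-675"]
per_school_summary_df["Spending Ranges (Per Student)"] = pd.cut(
    per_school_summary_df["Per Student Budget"], spending_bins, labels=group_names
)

# Average the metric within each spending range (keep empty bins, if any).
spending_summary = per_school_summary_df.groupby(
    "Spending Ranges (Per Student)", observed=False
)["% Overall Passing"].mean()
print(spending_summary)
```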

Considering school size, "Large" schools with over 2,000 students have the lowest average scores and passing percentages. The difference in performance between "Small" and "Medium" schools is negligible, which suggests that students at smaller schools perform better, possibly because of a more personal setting.

School_size
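The school-size comparison can follow the same binning pattern. Only the rule that "Large" means over 2,000 students comes from the text above; the 1,000-student cutoff, the 5,000 upper edge, the column names, and the sample figures are assumptions.

```python
import pandas as pd

# Hypothetical per-school figures.
schools = pd.DataFrame({
    "Total Students": [430, 1800, 2900, 4600],
    "Average Math Score": [80.0, 82.0, 76.0, 74.0],
    "% Overall Passing": [90.0, 88.0, 56.0, 52.0],
})

# Right-inclusive bins: exactly 2000 students lands in Medium, so Large is "over 2000".
size_bins = [0, 1000, 2000, 5000]
size_labels = ["Small (<1000)", "Medium (1000-2000)", "Large (2000-5000)"]
schools["School Size"] = pd.cut(schools["Total Students"], size_bins, labels=size_labels)

# Average several metrics at once for each size category.
size_summary = schools.groupby("School Size", observed=False)[
    ["Average Math Score", "% Overall Passing"]
].mean()
print(size_summary)
```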

District vs. Charter Schools

In this analysis, charter schools perform better than district schools, and the five schools with the highest overall passing percentages are all charter schools.

School_type
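A sketch of the charter versus district comparison using `groupby` for the type averages and `nlargest` for the top five schools. The school rows and column names are hypothetical.

```python
import pandas as pd

# Hypothetical per-school rows; names, types, and rates are illustrative only.
schools = pd.DataFrame({
    "School Name": ["A", "B", "C", "D", "E", "F", "G"],
    "School Type": ["Charter"] * 5 + ["District"] * 2,
    "% Overall Passing": [90.0, 92.0, 89.0, 91.0, 88.0, 55.0, 53.0],
})

# Average overall passing rate per school type.
type_summary = schools.groupby("School Type")["% Overall Passing"].mean()

# The five highest-passing schools, and whether each is a charter school.
top_five = schools.nlargest(5, "% Overall Passing")
top_five_all_charter = (top_five["School Type"] == "Charter").all()
print(type_summary)
print(top_five)
```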

Math and reading scores by grade

Analyzing the average math and reading scores by grade level for each school shows that a student's grade level affects their scores less than the school they attend. The detailed breakdown of math scores by grade is shown below:

math_score_by_grade

Reading scores by grade:

Reading_score_by_grade
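The by-grade breakdown can be sketched with `pivot_table`, giving one row per school and one column per grade. The column names, grade labels, and scores are hypothetical, and this is one possible approach rather than the notebook's own code. `reindex` puts the grades in school order, since plain string sorting would place "10th" before "9th".

```python
import pandas as pd

# Hypothetical per-student sample with assumed column names.
students = pd.DataFrame({
    "school_name": ["X"] * 5 + ["Y"] * 4,
    "grade": ["9th", "10th", "10th", "11th", "12th", "9th", "10th", "11th", "12th"],
    "math_score": [70.0, 80.0, 90.0, 75.0, 85.0, 60.0, 65.0, 70.0, 62.0],
    "reading_score": [72.0, 78.0, 88.0, 80.0, 90.0, 66.0, 70.0, 74.0, 68.0],
})

GRADE_ORDER = ["9th", "10th", "11th", "12th"]


def scores_by_grade(df: pd.DataFrame, subject: str) -> pd.DataFrame:
    """Average `subject` score per school (rows) and grade (columns)."""
    table = df.pivot_table(
        index="school_name", columns="grade", values=subject, aggfunc="mean"
    )
    # Reorder the columns by grade level instead of alphabetically.
    return table.reindex(columns=GRADE_ORDER)


math_by_grade = scores_by_grade(students, "math_score")
reading_by_grade = scores_by_grade(students, "reading_score")
print(math_by_grade)
print(reading_by_grade)
```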

Summary

Omitting the ninth graders from Thomas High School is not ideal, because a full set of data produces the most accurate results. Replacing their scores with NaN caused the overall passing percentage and average scores of THS (Thomas High School) to drop sharply, and as a result THS lost its place among the top five schools in the district. However, after the total student count was updated to exclude the THS ninth graders and their scores were omitted from the dataset, THS regained its high scores and its position as the second-best school in the district.
