The data-raw directory contains the clean_data.R script, which parses and combines three datasets provided by the State Water Boards:
- MonthlyPostingMarch.xlsx: the results of lead testing
- SchoolsUnsampled.xlsx: the schools that have not been sampled yet
- exemption_forms.xlsx: the schools that are exempt due to independent testing

For schools with sites that tested greater than 5 ppb lead, the median value (of values >5 ppb) at the school is used to represent the lead level.
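
The median rule above can be illustrated with a minimal base R sketch. This is not the code from clean_data.R: the `results` data frame, its `result` column, and the sample values are hypothetical and chosen only to show the calculation.

```r
# Hypothetical per-site results; these column names and values are assumptions,
# not the actual contents of MonthlyPostingMarch.xlsx.
results <- data.frame(
  schoolName = c("School A", "School A", "School A", "School B", "School B", "School C"),
  result     = c(3, 8, 12, 6, 20, 2),
  stringsAsFactors = FALSE
)

# Keep only the site results above the 5 ppb threshold.
above <- results[results$result > 5, ]

# Take the median of the above-threshold values for each school.
median_by_school <- aggregate(result ~ schoolName, data = above, FUN = median)
names(median_by_school)[2] <- "medianResult"

print(median_by_school)
```

In this sketch School A gets a median of 10 (from 8 and 12; the 3 ppb site is ignored), School B gets 13, and School C does not appear because none of its sites exceeded 5 ppb.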
The cleaned and combined dataset is named ca_schools_lead_testing_data.csv and contains the following columns:
- district: name of district
- schoolName: name of school
- schoolAddress: school address
- medianResult: median lead level among the school's test results above 5 ppb
- unit: unit of median result (ppb = parts per billion)
- lead: was lead detected above 5 ppb (TRUE, FALSE, or NA if not tested yet or exempt from testing)
- status: testing status (tested, not tested, or NA if exempt)
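
Because `lead` and `status` both use NA to carry meaning, summaries need to handle missing values explicitly. The sketch below uses a small made-up data frame with the columns listed above (the rows are illustrative, not real data); with the actual file, `schools` would instead be loaded from ca_schools_lead_testing_data.csv, for example with `read.csv()`.

```r
# Made-up rows that mimic the documented columns; not real schools or results.
schools <- data.frame(
  district     = c("District 1", "District 1", "District 2", "District 2"),
  schoolName   = c("School A", "School B", "School C", "School D"),
  medianResult = c(10, NA, NA, NA),
  lead         = c(TRUE, FALSE, NA, NA),
  status       = c("tested", "tested", "not tested", NA),
  stringsAsFactors = FALSE
)

# Count schools by testing status, keeping NA (exempt) as its own category.
status_counts <- table(schools$status, useNA = "ifany")

# Number of schools with lead detected above 5 ppb, ignoring untested and exempt schools.
n_lead <- sum(schools$lead, na.rm = TRUE)

# Share of tested schools with lead detected above 5 ppb.
tested <- schools[!is.na(schools$status) & schools$status == "tested", ]
share_lead <- mean(tested$lead)

print(status_counts)
```

Filtering on `!is.na(status)` before comparing avoids NA rows leaking into the subset, which is a common pitfall when a column uses NA as a category.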