Water Quality Analysis Project

Overview

This repository contains a coursework project on analyzing and predicting water quality with machine learning. The aim is to compare how well different classifiers predict water quality from several input parameters.

Project Structure

Data Collection: Identification and extraction of relevant water quality data from external sources.
Literature Review: Examination of existing research and literature to understand the domain and the applicability of various machine learning methods.
Method Justification: Justification for the selection of specific data analysis and machine learning techniques.
Model Implementation and Evaluation: Application of different machine learning models, including K-Nearest Neighbors (KNN), Decision Tree Classifier (DTC), and Random Forest Classifier (RFC), and comparison of their effectiveness.
Documentation: Preparation of the explanatory note detailing the methodology, analysis, and findings.
Submission and Defense: Submission of the project for review and defense of the coursework.
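
As an illustration of the Model Implementation and Evaluation step, the following is a minimal sketch of how the three classifiers could be compared on equal terms. It assumes scikit-learn is available and uses a synthetic dataset from `make_classification` as a stand-in for the real water quality data. The hyperparameters, the 5-fold cross-validation, and the helper name `compare_models` are illustrative assumptions, not the settings used in the coursework.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): NOT the project's water quality dataset.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)

# KNN is distance-based, so it is wrapped in a pipeline that scales features first.
models = {
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "DTC": DecisionTreeClassifier(random_state=0),
    "RFC": RandomForestClassifier(n_estimators=100, random_state=0),
}


def compare_models(models, X, y, cv=5):
    """Return the mean cross-validated accuracy for each named model."""
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
        for name, model in models.items()
    }


scores = compare_models(models, X, y)

# Print models from most to least accurate on the synthetic data.
for name, accuracy in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {accuracy:.3f}")
```

Cross-validation is used here instead of a single train/test split so that each model is scored on the same folds. The numbers it prints describe the synthetic data only and say nothing about the project's results.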

Functionality

The software developed for this project includes the following functionality:

Loading the dataset.
Data preprocessing and exploration.
Applying machine learning models.
Predicting water quality.
Analyzing factors affecting water quality.
Visualizing and analyzing the results.
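
The loading and preprocessing steps listed above might look roughly like the sketch below. It assumes pandas and scikit-learn. The inline CSV sample, the label column name `Target`, and the helper `load_and_prepare` are hypothetical placeholders, since the real dataset file and its column names are not given here. Median imputation and standard scaling are common choices, not confirmed details of the project.

```python
import io

import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Tiny made-up sample (assumption): values are illustrative, not real measurements.
CSV_TEXT = """pH,Turbidity,Chloride,Target
7.1,0.5,120.0,0
6.4,,210.5,1
8.0,1.2,95.3,0
5.9,3.4,300.1,1
7.4,0.8,,0
6.1,2.9,280.7,1
7.0,0.6,110.2,0
6.0,3.1,295.0,1
"""


def load_and_prepare(source, target="Target", test_size=0.25, seed=0):
    """Load a CSV, split it, then impute and scale using training statistics only."""
    df = pd.read_csv(source)
    X = df.drop(columns=[target])
    y = df[target]

    # Split before fitting any transformer so test data does not leak into training.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )

    imputer = SimpleImputer(strategy="median")  # fill missing values per column
    scaler = StandardScaler()                   # zero mean, unit variance per column

    X_train = scaler.fit_transform(imputer.fit_transform(X_train))
    X_test = scaler.transform(imputer.transform(X_test))
    return X_train, X_test, y_train, y_test


X_train, X_test, y_train, y_test = load_and_prepare(io.StringIO(CSV_TEXT))
print(X_train.shape, X_test.shape)
```

To use a real file, a path can be passed in place of the `io.StringIO` object, with `target` set to the dataset's actual label column.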

Models and Results

K-Nearest Neighbors (KNN)
- High impact: Total Dissolved Solids, Chloride, Sulfate.
- Low impact: Lead, Manganese, Iron.
Random Forest Classifier (RFC)
- High impact: pH, Turbidity, Manganese.
- Low impact: Lead, Total Dissolved Solids.
Decision Tree Classifier (DTC)
- High impact: pH, Chloride, Turbidity, Manganese.
- Low impact: Lead, Zinc, Sulfate.
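
Rankings like the ones above can be produced in more than one way, and this README does not state which method was used. The sketch below shows two common options in scikit-learn: the built-in `feature_importances_` attribute of a fitted random forest, and permutation importance for KNN, which has no built-in importance measure. The data is synthetic, and the feature names are borrowed from the lists above purely as labels, so the resulting order is not expected to match the project's findings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Labels taken from the parameter names in this README; the data itself is synthetic.
FEATURES = ["pH", "Turbidity", "Chloride", "Sulfate", "Lead", "Manganese"]

X, y = make_classification(
    n_samples=400,
    n_features=len(FEATURES),
    n_informative=3,
    n_redundant=0,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Random forest: impurity-based importances, normalized to sum to 1.
rfc = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
rfc_importance = dict(zip(FEATURES, rfc.feature_importances_))

# KNN: permutation importance, i.e. the mean drop in held-out accuracy
# when one feature column is shuffled.
knn = make_pipeline(StandardScaler(), KNeighborsClassifier()).fit(X_train, y_train)
perm = permutation_importance(knn, X_test, y_test, n_repeats=10, random_state=0)
knn_importance = dict(zip(FEATURES, perm.importances_mean))


def ranked(importance):
    """Return feature names ordered from highest to lowest importance."""
    return sorted(importance, key=importance.get, reverse=True)


print("RFC ranking:", ranked(rfc_importance))
print("KNN ranking:", ranked(knn_importance))
```

The two measures are not directly comparable: impurity-based importances reflect how often a feature is used for splits during training, while permutation importance reflects the effect on held-out predictions. This is one possible reason the models above rank the same parameter differently.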

Feature importance for KNN

Feature importance for RFC

Feature importance for DTC

Conclusion

The analysis of water quality using the KNN, RFC, and DTC models identified the parameters that most influence water quality. Of the three, the Random Forest Classifier achieved the highest accuracy in classifying water quality, making it the most suitable method for this analysis.

Repository files

.gitattributes
README.md
Report Course Work Skrypets Olga.docx
Report Course Work Skrypets Olga.pdf
water_Quality_Prediction.ipynb
water_quality_prediction.py

Repository: uhpoler/WaterQualityPrediction-coursework
