This script employs K-Means clustering on the Iris dataset, utilizing Pandas, Matplotlib, and scikit-learn for efficient data handling, visualization, and machine learning.

MPranav1/Prediction_USL

K-Means Clustering on Iris Dataset

This Python script performs K-Means clustering on the Iris dataset, using Pandas for data handling, Matplotlib for plotting, and scikit-learn for the clustering itself.

Steps:

Load Data: The script starts by loading the Iris dataset from a CSV file into a Pandas DataFrame.

import pandas as pd
df = pd.read_csv('Iris.csv')

Select Features: The features 'SepalLengthCm' and 'SepalWidthCm' are selected for clustering.

X = df[['SepalLengthCm', 'SepalWidthCm']]

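Determine Optimal Clusters: The Elbow Method is used to choose the number of clusters. The script fits K-Means for each cluster count from 1 to 10, records the within-cluster sum of squares (WCSS, exposed by scikit-learn as inertia_), and plots WCSS against the cluster count. The "elbow" is the point after which adding more clusters reduces WCSS only slightly.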

# Elbow Method
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

wcss = []  # WCSS (inertia) for each candidate cluster count
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, init='k-means++', max_iter=300, n_init=10, random_state=0)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)
plt.plot(range(1, 11), wcss)
plt.title('Elbow Method')
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')
plt.show()
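
Reading the elbow off the plot is a judgment call. As a rough cross-check, the sketch below picks the cluster count where the WCSS curve bends most sharply, measured by the largest second difference. The function name pick_elbow and the sample WCSS values are made up for illustration and are not part of the script or of scikit-learn. This rule of thumb can favor small cluster counts on real data, so treat its answer as a hint rather than a decision.

```python
def pick_elbow(wcss):
    """Return the cluster count (1-based) where the WCSS curve bends most sharply.

    Heuristic only: the bend at k is wcss[k-1] - 2*wcss[k] + wcss[k+1]
    (a discrete second difference). Hypothetical helper, not a scikit-learn API.
    """
    if len(wcss) < 3:
        raise ValueError("need WCSS values for at least three cluster counts")
    best_k, best_bend = None, float("-inf")
    # wcss[i] belongs to k = i + 1, so only interior indices have both neighbours
    for i in range(1, len(wcss) - 1):
        bend = wcss[i - 1] - 2 * wcss[i] + wcss[i + 1]
        if bend > best_bend:
            best_k, best_bend = i + 1, bend
    return best_k


# Made-up WCSS values for k = 1..6, shaped like a typical elbow curve
example_wcss = [100.0, 60.0, 25.0, 20.0, 16.0, 13.0]
optimal_num_clusters = pick_elbow(example_wcss)
```

With the script's own wcss list, the call would be pick_elbow(wcss); comparing the result against the plot is still worthwhile.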

Perform K-Means Clustering: K-Means clustering is then performed with the optimal number of clusters determined from the elbow method.

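optimal_num_clusters = 3  # assumed for illustration; use the elbow from your own plot
kmeans = KMeans(n_clusters=optimal_num_clusters, init='k-means++', max_iter=300, n_init=10, random_state=0)
df['cluster'] = kmeans.fit_predict(X)  # store each row's cluster label for plotting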
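
To make the fitting step less of a black box, here is a simplified pure-Python sketch of the assign-and-update loop (Lloyd's algorithm) that K-Means is built on. It is not scikit-learn's implementation: it uses plain random initialization instead of k-means++, runs a single start instead of n_init restarts, and the names simple_kmeans and squared_distance are hypothetical.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def simple_kmeans(points, k, max_iter=300, seed=0):
    """Plain Lloyd iterations; returns (labels, centroids, wcss)."""
    rng = random.Random(seed)
    # Random initialization for brevity; scikit-learn's default is k-means++
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids would too
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    # WCSS: the quantity scikit-learn reports as inertia_
    wcss = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, wcss


# Toy (sepal length, sepal width) pairs, invented for this example
toy_points = [(5.0, 3.4), (4.9, 3.1), (5.1, 3.5), (6.7, 3.0), (6.9, 3.1), (6.5, 2.8)]
toy_labels, toy_centroids, toy_wcss = simple_kmeans(toy_points, k=2)
```

Because the result depends on the starting centroids, a single run can settle in a poor local optimum. That is the reason the script passes n_init=10 and init='k-means++' to KMeans.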

Visualize Clusters: The script visualizes the clusters by assigning different colors to each cluster and plotting the data points.

# Visualize Clusters
# Assumes df['cluster'] holds each row's label, e.g. df['cluster'] = kmeans.labels_
colors = ['red', 'green', 'blue']  # add more colors if you use more than three clusters
for i in range(optimal_num_clusters):
    members = df[df['cluster'] == i]  # rows assigned to cluster i
    plt.scatter(members['SepalLengthCm'], members['SepalWidthCm'], color=colors[i])
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Sepal Width (cm)')
plt.show()

Usage: Ensure you have the required libraries installed: pip install pandas matplotlib scikit-learn. Replace 'Iris.csv' with the path to your dataset. Feel free to experiment with different features and cluster numbers for a deeper understanding of the data's underlying patterns.
