This Python script performs K-Means clustering on the Iris dataset, using pandas to load the data, Matplotlib to plot it, and scikit-learn to run the clustering.
Load Data: The script starts by loading the Iris dataset from a CSV file into a Pandas DataFrame.
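import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
df = pd.read_csv('Iris.csv')  # path to the Iris CSV file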
Select Features: The features 'SepalLengthCm' and 'SepalWidthCm' are selected for clustering.
X = df[['SepalLengthCm', 'SepalWidthCm']]
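Determine Optimal Clusters: The Elbow Method is used to choose the number of clusters. The script fits K-Means for 1 to 10 clusters, records the within-cluster sum of squares (WCSS, which scikit-learn exposes as inertia_) for each fit, and plots WCSS against the number of clusters. The "elbow", where the curve stops dropping sharply, suggests a reasonable cluster count.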
wcss = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, init='k-means++', max_iter=300, n_init=10, random_state=0)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)  # inertia_ is the WCSS of this fit
plt.plot(range(1, 11), wcss)
plt.title('Elbow Method')
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')
plt.show()
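To make the WCSS on the plot's y-axis concrete, here is a minimal sketch that recomputes it by hand and compares it with inertia_. The data is synthetic (three made-up blobs standing in for the two sepal columns), and names such as X_demo and wcss_manual are illustrative, not part of the script.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the two sepal columns: three loose blobs (assumed data).
rng = np.random.default_rng(0)
blob_centers = np.array([[5.0, 3.4], [5.9, 2.8], [6.6, 3.0]])
X_demo = np.vstack([c + 0.2 * rng.standard_normal((50, 2)) for c in blob_centers])

kmeans = KMeans(n_clusters=3, init='k-means++', n_init=10, random_state=0).fit(X_demo)

# WCSS by hand: squared distance from every point to the centroid of its own cluster.
assigned_centroids = kmeans.cluster_centers_[kmeans.labels_]
wcss_manual = float(((X_demo - assigned_centroids) ** 2).sum())

# The two values should agree up to floating-point error.
print(wcss_manual, kmeans.inertia_)
```

Because WCSS can only shrink as more clusters are added, the elbow plot is read for the point of diminishing returns rather than for a minimum.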
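Perform K-Means Clustering: K-Means is then fitted with the number of clusters read off the elbow plot. Set optimal_num_clusters to the value you observe there; the 3 below is only an example value.
optimal_num_clusters = 3  # assumed example value; replace with the elbow from your plot
kmeans = KMeans(n_clusters=optimal_num_clusters, init='k-means++', max_iter=300, n_init=10, random_state=0)
kmeans.fit(X)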
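Once the model is fitted, the results live on the estimator itself. The sketch below, again on synthetic data with illustrative names (X_demo, new_point), shows the per-row labels in labels_, the centroids in cluster_centers_, and how predict assigns a new observation to its nearest centroid.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic two-feature data in place of the sepal columns (assumed data).
rng = np.random.default_rng(0)
blob_centers = np.array([[5.0, 3.4], [5.9, 2.8], [6.8, 3.1]])
X_demo = np.vstack([c + 0.15 * rng.standard_normal((40, 2)) for c in blob_centers])

kmeans = KMeans(n_clusters=3, init='k-means++', max_iter=300, n_init=10, random_state=0)
kmeans.fit(X_demo)

print(kmeans.labels_[:10])        # cluster index assigned to each of the first 10 rows
print(kmeans.cluster_centers_)    # one (x, y) centroid per cluster

# A new measurement is assigned to the cluster with the closest centroid.
new_point = np.array([[5.1, 3.5]])
predicted = int(kmeans.predict(new_point)[0])
nearest = int(np.argmin(((kmeans.cluster_centers_ - new_point) ** 2).sum(axis=1)))
print(predicted, nearest)         # both name the same cluster
```

The cluster indices themselves are arbitrary: rerunning with a different random_state may permute them without changing the grouping.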
Visualize Clusters: The script visualizes the clusters by assigning different colors to each cluster and plotting the data points.
colors = ['red', 'green', 'blue', ...] # add more colors as needed
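df['cluster'] = kmeans.labels_  # store each row's cluster label so it can be filtered on
for i in range(optimal_num_clusters):
    members = df[df['cluster'] == i]  # rows assigned to cluster i
    plt.scatter(members['SepalLengthCm'], members['SepalWidthCm'], color=colors[i])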
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Sepal Width (cm)')
plt.show()
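A common extension of this plot is to overlay the fitted centroids and add a legend. The following is a sketch under assumptions rather than part of the script: it uses synthetic data, a fixed list of three colors, and illustrative names such as X_demo.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans

# Synthetic two-feature data in place of the sepal columns (assumed data).
rng = np.random.default_rng(0)
blob_centers = np.array([[5.0, 3.4], [5.9, 2.8], [6.8, 3.1]])
X_demo = np.vstack([c + 0.15 * rng.standard_normal((40, 2)) for c in blob_centers])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_demo)

colors = ['red', 'green', 'blue']  # one color per cluster
fig, ax = plt.subplots()
for i, color in enumerate(colors):
    members = X_demo[kmeans.labels_ == i]  # points assigned to cluster i
    ax.scatter(members[:, 0], members[:, 1], color=color, label=f'Cluster {i}')

# Mark each centroid with a black cross on top of the points.
ax.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
           color='black', marker='x', s=100, label='Centroids')
ax.set_xlabel('Sepal Length (cm)')
ax.set_ylabel('Sepal Width (cm)')
ax.legend()
plt.show()
```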
Usage: Install the required libraries with pip install pandas matplotlib scikit-learn. Replace 'Iris.csv' with the path to your dataset, and make sure it contains the 'SepalLengthCm' and 'SepalWidthCm' columns (or change the selected features to match your column names). Try different feature pairs and cluster counts to see how the groupings change.