Real Estate Price Prediction with Streamlit

This project analyzes real estate data and builds a model to predict house prices. The dataset contains features such as house size, number of bedrooms, and other key attributes.

Project Overview

The goal is to explore the dataset, prepare it for analysis, visualize key trends, and build a predictive model to estimate house prices.

Key Steps

1. Data Preparation

  • Loaded the real estate data from a CSV file (real_state_dataset.csv).
    import pandas as pd
    df = pd.read_csv('real_state_dataset.csv')
  • Displayed basic dataset information including shape, column names, and data types.
  • Inspected the first few rows using head().
  • Checked for missing values using df.isnull().sum().
  • Dropped unnecessary columns: brokered_by, zip_code, and prev_sold_date.
    df.drop(columns=['brokered_by', 'zip_code', 'prev_sold_date'], inplace=True)
  • Removed rows with missing values using dropna().
  • Checked for duplicate entries and removed them using drop_duplicates().
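
The cleaning steps above can be sketched end to end on a small stand-in table. This is a minimal sketch, not the project's actual notebook: the `clean` helper and the sample values are hypothetical, and only the column names come from the steps listed above.

```python
import pandas as pd

# Tiny stand-in for the real CSV; the values are made up for illustration.
raw = pd.DataFrame({
    "brokered_by": [1.0, 2.0, 2.0, 3.0],
    "zip_code": [601.0, 602.0, 602.0, 603.0],
    "prev_sold_date": [None, "2020-01-01", "2020-01-01", None],
    "bed": [3.0, 2.0, 2.0, None],
    "bath": [2.0, 1.0, 1.0, 1.0],
    "house_size": [1200.0, 800.0, 800.0, 950.0],
    "price": [250000.0, 150000.0, 150000.0, 180000.0],
})


def clean(df):
    """Drop unused columns, then rows with missing values, then duplicates."""
    return (
        df.drop(columns=["brokered_by", "zip_code", "prev_sold_date"])
        .dropna()
        .drop_duplicates()
        .reset_index(drop=True)
    )


print(raw.isnull().sum())  # missing values per column before cleaning
cleaned = clean(raw)
print(cleaned.shape)
```

Dropping the unused columns before calling dropna() matters: a row whose only missing value is in prev_sold_date would otherwise be discarded even though that column is not used later.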

2. Exploratory Data Analysis (EDA)

  • Calculated descriptive statistics (count, mean, min, max) for numerical columns using describe().

  • Analyzed the distribution of key features.

  • Visualized the top 10 states with the most houses using a bar plot.

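    import matplotlib.pyplot as plt
    # Count listings per state and plot the ten largest counts
    df['state'].value_counts().head(10).plot(kind='bar')
    plt.title('Top 10 States with Most Houses')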


  • Calculated average house prices by state and city.

    avg_price_by_state = df.groupby('state')['price'].mean()
  • Displayed the correlation between numerical features and the target variable (price).
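
One way the correlation step could look is shown below. It is a sketch under assumptions: the sample rows are invented, and the choice of select_dtypes followed by corr() is an illustration rather than the code used in the project.

```python
import pandas as pd

# Made-up sample rows; only the column names follow the README.
df = pd.DataFrame({
    "bed": [2, 3, 3, 4, 5],
    "bath": [1, 2, 2, 3, 3],
    "house_size": [800, 1200, 1500, 2000, 2600],
    "price": [150000, 220000, 260000, 340000, 450000],
    "state": ["A", "A", "B", "B", "C"],
})

# Keep numeric columns only, then read off the 'price' column of the
# Pearson correlation matrix and sort the remaining features by strength.
numeric = df.select_dtypes(include="number")
corr_with_price = (
    numeric.corr()["price"]
    .drop("price")
    .sort_values(ascending=False)
)
print(corr_with_price)
```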


3. Feature Engineering and Selection

  • Selected relevant features (bed, bath, house_size) for model building.
    X = df[['bed', 'bath', 'house_size']]
    y = df['price']
  • No additional feature engineering was performed.

4. Model Building and Evaluation

  • Split the dataset into training and testing sets using train_test_split.
    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
  • Standardized the numerical features using StandardScaler (fit on the training set only, then applied to the test set) so that all features share a common scale.
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LinearRegression
    import joblib
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)
    joblib.dump(scaler, 'scaler.pkl')
  • Trained a Linear Regression model using the training data.
    lr = LinearRegression()
    lr.fit(X_train, y_train)
  • Made predictions on the test data and evaluated the model using Mean Absolute Error (MAE).
    from sklearn.metrics import mean_absolute_error
    lr_pred = lr.predict(X_test)
    mae = mean_absolute_error(y_test, lr_pred)
    print(f'Mean Absolute Error: {mae}')
  • Saved the trained model using joblib.dump() (the scaler was already saved in the scaling step).
    joblib.dump(lr, 'model.pkl')
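
For reference, MAE is the average of the absolute differences between actual and predicted values: MAE = (1/n) Σ |y_i − ŷ_i|. The sketch below computes it by hand on made-up prices; `mean_absolute_error_manual` is a hypothetical helper written for illustration, not part of scikit-learn.

```python
def mean_absolute_error_manual(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Invented prices for illustration only.
actual = [200000, 350000, 500000]
predicted = [210000, 330000, 530000]

mae = mean_absolute_error_manual(actual, predicted)
print(f"Mean Absolute Error: {mae}")  # errors are 10000, 20000, 30000 -> 20000.0
```

Because MAE is expressed in the same unit as the target, a value of 20000 here would mean the predictions are off by 20,000 price units on average.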

5. Streamlit Application

  • A Streamlit app was developed to allow users to input house features and get a predicted price.
    import streamlit as st
    import joblib
    import numpy as np
    scaler = joblib.load('scaler.pkl')
    model = joblib.load('model.pkl')
    st.title('House Price Prediction')
    bed = st.number_input('Bedrooms', value=2, step=1)
    bath = st.number_input('Bathrooms', value=1, step=1)
    house_size = st.number_input('House Size', value=1000, step=50)
    X = [bed, bath, house_size]
    predict_btn = st.button('Predict')
    if predict_btn:
        X1 = np.array(X)
        X_array = scaler.transform([X1])
        prediction = model.predict(X_array)[0]
        st.write(f'Predicted Price: {prediction:.2f}')
    else:
        st.write('Click the button to predict the price')


  • The model was evaluated using Mean Absolute Error (MAE), the average absolute difference between predicted and actual prices.
  • The Streamlit app provides an interactive interface for predicting house prices.
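
The prediction step of the app can also be pulled into a plain function, which makes it testable without running Streamlit. This is a sketch under assumptions: `predict_price` and its input checks are hypothetical additions, and it only relies on the loaded objects exposing `transform` and `predict`, as the scaler and model saved earlier do.

```python
def predict_price(model, scaler, bed, bath, house_size):
    """Return the predicted price for one house as a float.

    `model` and `scaler` are expected to be the objects loaded from
    model.pkl and scaler.pkl; any objects with compatible `predict`
    and `transform` methods will work.
    """
    if bed < 0 or bath < 0 or house_size <= 0:
        raise ValueError("bed and bath must be >= 0 and house_size must be > 0")
    # Feature order must match the order used in training:
    # bed, bath, house_size.
    features = [[bed, bath, house_size]]  # one row, three columns
    scaled = scaler.transform(features)
    return float(model.predict(scaled)[0])
```

Inside the app, the body of the button handler would then reduce to a single call such as `st.write(f'Predicted Price: {predict_price(model, scaler, bed, bath, house_size):.2f}')`.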

Result Image


Future Work

  • Handle outliers (if present) and revisit feature scaling (for example, with a scaler that is less sensitive to outliers).
  • Compare the Linear Regression model with other machine learning models (e.g., Decision Trees, Random Forests).
  • Tune hyperparameters to improve model performance.
  • Deploy the final model as a web application.
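
As a starting point for the outlier item, one common approach is to filter on the interquartile range (IQR). The sketch below is an assumption about how that could be done, not something the project currently implements; `remove_price_outliers`, the multiplier `k=1.5`, and the sample prices are all illustrative.

```python
import pandas as pd


def remove_price_outliers(df, column="price", k=1.5):
    """Keep rows whose `column` value lies within k * IQR of the quartiles."""
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return df[df[column].between(lower, upper)]


# Made-up prices with one extreme value.
df = pd.DataFrame(
    {"price": [100000, 120000, 130000, 150000, 160000, 5000000]}
)
filtered = remove_price_outliers(df)
print(len(df), len(filtered))
```

Whether a row flagged this way is a data error or a genuine luxury listing is a judgment call, so it is worth inspecting the removed rows before dropping them.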


Contributions are welcome! Please fork the repository and create a pull request for any enhancements or bug fixes.


This project is licensed under the MIT License.