A group project @ BeCode.org as part of the AI Bootcamp in Gent
This is the first stage of a larger project to create a Machine Learning (ML) model that predicts sale prices of real estate properties in Belgium.
The current task is to gather real data (at least 10,000 entries) from the Belgian real estate market. This data will be used to train and test the ML prediction model.
The dataset is delivered as a `csv` file and covers the following attributes:
- ID number
- Source URL
- Price
- Property type
- Locality and address (if available)
- Number of bedrooms
- Livable surface
- Building information (construction year, facade count, floor count, etc.)
- Property condition
- Kitchen type
- Garden and its surface (if any)
- Terrace and its surface (if any)
- The surface of land (for houses)
- Availability of some extras:
  - Open fire
  - Swimming pool
  - Air conditioner
- Available facilities:
  - Number of bathrooms, showers, and/or toilets
  - Number of parking spaces
- Energy consumption information
- Sale type
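As a rough illustration of how the attributes above could map onto one record, here is a minimal sketch. All key names and values are hypothetical and are not taken from the project's code; `None` stands for information that a listing does not provide.

```python
from typing import Optional, TypedDict


class PropertyRecord(TypedDict):
    """One scraped listing. All key names here are hypothetical."""
    id: int
    url: str
    price: Optional[int]
    property_type: Optional[str]
    locality: Optional[str]
    bedrooms: Optional[int]
    living_area_m2: Optional[float]
    garden: Optional[bool]
    garden_area_m2: Optional[float]
    swimming_pool: Optional[bool]
    sale_type: Optional[str]


# Made-up example values; None marks a field missing from the listing.
example: PropertyRecord = {
    "id": 1,
    "url": "https://example.com/listing/1",
    "price": 250000,
    "property_type": "house",
    "locality": "Gent",
    "bedrooms": 3,
    "living_area_m2": 140.0,
    "garden": False,
    "garden_area_m2": None,
    "swimming_pool": False,
    "sale_type": None,
}
```

Only a subset of the attributes is shown; the remaining ones would follow the same pattern.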
The Python-based tool scrapes the required information from ImmoWeb, the leading real estate website in Belgium, stores it in dictionaries, and later writes it to a `csv` file.
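The dictionary-to-csv step can be sketched with the standard library alone. This is an assumption about one possible approach, not the project's actual implementation; the function name `records_to_csv` and the column names are made up for the example.

```python
import csv
import io


def records_to_csv(records, fieldnames):
    """Serialize a list of dicts to CSV text, leaving missing values empty."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fieldnames,
        restval="",             # value written when a key is absent
        extrasaction="ignore",  # skip keys that are not in fieldnames
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Made-up listings; the second one has no bedroom count.
listings = [
    {"id": 1, "price": 250000, "bedrooms": 3},
    {"id": 2, "price": 180000},
]
csv_text = records_to_csv(listings, ["id", "price", "bedrooms"])
print(csv_text)
```

Using `restval` keeps every row the same width even when a listing lacks some attributes, which matters because many fields are only present "if available".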
- Clone the immoweb-scraper repository
- Navigate to the root of the repository
- Install the required libraries by running `pip install -r requirements.txt`
- Execute the script by running the command `python main.py` in the terminal.
- This will scrape the property listing information from ImmoWeb and store it in the `data` directory in both `json` and `csv` formats.
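Once the script has finished, the output can be sanity-checked against the 10,000-entry target. The sketch below is an assumption: the ID column name (`id`) and the output file name are hypothetical, so adjust them to whatever the scraper actually writes.

```python
import csv
from pathlib import Path


def summarize_dataset(csv_path, id_column="id", minimum_rows=10_000):
    """Count rows and unique IDs in a csv file and compare against a target."""
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        rows = list(csv.DictReader(handle))
    ids = [row[id_column] for row in rows]
    return {
        "rows": len(rows),
        "unique_ids": len(set(ids)),  # fewer than rows means duplicate listings
        "meets_target": len(rows) >= minimum_rows,
    }


# Hypothetical usage; replace the path with the real file in the data directory:
# print(summarize_dataset("data/properties.csv"))
```

A gap between `rows` and `unique_ids` would point to listings that were scraped more than once.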
This stage of the project lasted 4 days in the week of June 26-30, 2023.
The project was made by a group of Junior AI & Data Scientists (in alphabetical order):
- Félicien De Hertogh LinkedIn | GitHub
- César E. Mendoza V. LinkedIn | GitHub
- Mykola Senko LinkedIn | GitHub
- Vitaly Shalem LinkedIn | GitHub
The project was completed under the supervision of Vanessa Rivera Quiñones and Samuel Borms.
Gent | June 30, 2023