The stock market is highly volatile and unpredictable, which makes stock price forecasting and portfolio optimization challenging tasks. Investors therefore seek strategies that can deliver risk-adjusted returns efficiently.
This project aims to leverage machine learning algorithms to predict future stock prices of the S&P-500 Market Index and subsequently apply optimization techniques to identify the optimal set of stocks for daily investment. The stock selection process focuses on maximizing returns and minimizing risks, addressing real-world financial challenges.
At our professor's request, this project was developed using a Notebook. If you want to try it out yourself, use either an Anaconda Distribution or third-party software that lets you inspect and execute it.
For more information about the virtual environment used in Anaconda, see the DEPENDENCIES.md file.
To effectively develop this project, we have divided it into the following phases:
- Data Preprocessing and Feature Engineering
  - Extract and process historical stock market data. Engineer relevant features such as moving averages and volatility measures.
- Data Cleaning
  - Identify and remove incongruent or invalid entries from the stock market dataset. Handle missing values, outliers, and inconsistencies to ensure high-quality input data.
- Exploratory Data Analysis (EDA)
  - Conduct in-depth analysis to understand data distributions and trends. Derive actionable insights to inform feature selection and modeling strategies.
- Model Development and Evaluation
  - Develop and evaluate predictive models, including LSTMs (Long Short-Term Memory networks) and LightGBM (Light Gradient Boosting Machine). Implement a sliding window approach for training, in which the model is iteratively trained on a data window that moves forward by N days until it reaches the end of the dataset.
- Portfolio Optimization
  - Apply optimization techniques such as Monte Carlo simulations, Min-Max strategies, and genetic algorithms. Optimize portfolio selection to balance the trade-off between maximizing returns and minimizing risks.
- Results Analysis
  - Analyze optimization outcomes in the context of financial performance metrics.
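
The feature engineering and sliding window steps above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the code used in the Notebook: the function names, the `train_size`/`test_size`/`step` parameters, and the choice of a simple moving average and a rolling standard deviation of daily returns are all assumptions made for the example.

```python
from statistics import mean, pstdev
from typing import Iterator, List, Optional, Sequence, Tuple


def moving_average(prices: Sequence[float], window: int) -> List[Optional[float]]:
    """Simple moving average; None until a full window of prices is available."""
    return [
        mean(prices[i - window + 1 : i + 1]) if i >= window - 1 else None
        for i in range(len(prices))
    ]


def rolling_volatility(prices: Sequence[float], window: int) -> List[Optional[float]]:
    """Rolling standard deviation of daily returns (one possible volatility measure)."""
    # returns[j] is the return realised on day j + 1
    returns = [prices[i] / prices[i - 1] - 1 for i in range(1, len(prices))]
    out: List[Optional[float]] = [None] * len(prices)
    for i in range(window, len(prices)):
        # the last `window` returns ending on day i
        out[i] = pstdev(returns[i - window : i])
    return out


def sliding_windows(
    n_samples: int, train_size: int, test_size: int, step: int
) -> Iterator[Tuple[range, range]]:
    """Yield (train_indices, test_indices) pairs for walk-forward training.

    `step` plays the role of the N-day shift; all names here are hypothetical.
    """
    start = 0
    while start + train_size + test_size <= n_samples:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += step


if __name__ == "__main__":
    for train, test in sliding_windows(n_samples=10, train_size=4, test_size=2, step=2):
        print(list(train), list(test))
```

Each test window always lies after its training window, so a model fitted on one window is only ever evaluated on later data.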
If you're interested in inspecting and executing this project yourself, you'll need access to all the datasets we've created.
Since GitHub has file size limits, we've made them all available in cloud storage on Google Drive, which you can access here.
We began by examining the key characteristics of the S&P-500 Market Index, focusing specifically on:
- The distribution of stocks across different industries.
- The trends in closing prices over time.
| Stock's Industry Distribution | Closing Prices |
|---|---|
To illustrate the methodology applied to the chosen stocks, we highlight NVDA as an example. By examining NVDA's data, we can more clearly demonstrate the steps involved in analyzing and processing the information.
We conducted additional exploratory data analysis of the stock's market trends through an in-depth examination of key financial metrics.
Using a 20-day rolling window methodology, we prepared the data to train several machine learning models, achieving the following performance results:
Finally, leveraging a genetic algorithm, we carried out portfolio optimization to devise an asset allocation plan. This approach resulted in a profit of approximately $30, as demonstrated through various financial metrics.
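
To give a rough idea of how a genetic algorithm can search for portfolio weights, here is a minimal sketch using only the standard library. It is not the implementation from the Notebook: the fitness function (mean daily return minus a variance penalty), the `risk_aversion` parameter, the elitist selection, the averaging crossover, and the Gaussian mutation are all assumptions chosen for illustration.

```python
import random
from statistics import mean, pvariance
from typing import List, Sequence


def normalize(weights: Sequence[float]) -> List[float]:
    """Clip negative weights to zero and rescale so the weights sum to 1."""
    clipped = [max(w, 0.0) for w in weights]
    total = sum(clipped)
    if total == 0:
        return [1.0 / len(clipped)] * len(clipped)
    return [w / total for w in clipped]


def fitness(
    weights: Sequence[float],
    returns: Sequence[Sequence[float]],
    risk_aversion: float = 1.0,
) -> float:
    """Mean portfolio return minus a variance penalty (a hypothetical objective)."""
    daily = [sum(w * r for w, r in zip(weights, day)) for day in returns]
    return mean(daily) - risk_aversion * pvariance(daily)


def evolve(
    returns: Sequence[Sequence[float]],
    pop_size: int = 30,
    generations: int = 50,
    mutation_scale: float = 0.1,
    risk_aversion: float = 1.0,
    seed: int = 0,
) -> List[float]:
    """Evolve long-only portfolio weights; `returns` holds one row of asset returns per day."""
    rng = random.Random(seed)
    n_assets = len(returns[0])

    def score(w: Sequence[float]) -> float:
        return fitness(w, returns, risk_aversion)

    population = [
        normalize([rng.random() for _ in range(n_assets)]) for _ in range(pop_size)
    ]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]  # keep the fitter half (elitism)
        children: List[List[float]] = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [(x + y) / 2 for x, y in zip(a, b)]  # averaging crossover
            child = [w + rng.gauss(0, mutation_scale) for w in child]  # mutation
            children.append(normalize(child))
        population = parents + children
    return max(population, key=score)
```

Calling `evolve(returns)` with a list of daily return rows (one value per asset) returns a weight vector that is non-negative and sums to 1. Raising `risk_aversion` shifts the search toward lower-variance allocations.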
Overall, we have developed a tool designed to assist investors in effectively managing their assets, aiming to support them in making informed investment decisions.
- Authors → Francisco Macieira, Gonçalo Esteves and Nuno Gomes
- Course → Laboratory of AI and DS [CC3044]
- University → Faculty of Sciences, University of Porto
README.md by Gonçalo Esteves