Extensive study on regime identification in financial markets.
This project investigates statistical, machine learning, and deep learning methodologies for regime identification in financial markets. The research was conducted in collaboration with Insight Investment and involved a series of experiments and extensive ablation studies. We explore four unsupervised models that identify patterns and trends across multiple indices from different asset classes, providing insights into the design of regime identification frameworks and into time series representation learning with a focus on local structures. Notably, the model that uses a pure CNN AutoEncoder without recurrent connections effectively detects the 2008 Global Financial Crisis and the COVID-19 period.
These models can be used to identify the current regime, which gives an overall indication of market behaviour in the near future. The market regime can also serve as a crucial feature for predicting price changes of financial products.
The overview of the four models can be seen in the table below:
Model | Model Structure | Deep Module Structure |
CFM (C-H) | two-stage | N/A |
DFM | two-stage | CNN AutoEncoder |
DCFM | two-stage | CNN AutoEncoder |
EEM | end-to-end | Siamese CNN |
The project report was awarded a distinction as part of my MSc Data Science and Machine Learning programme at UCL. The report is available at https://drive.google.com/file/d/1rvDxbqDT0iZz-Ybv-Ej64zos2VQEr_em/view?usp=sharing
The Correlation Feature-Based Model (CFM) uses correlations between price changes of financial indices as features, with variants based on meta and cophenetic similarity measures, combined with k-means++ and hierarchical clustering algorithms. CFM effectively detects stable and predictable regime structures and successfully identifies significant events such as the 2008 Global Financial Crisis (GFC). However, it struggles to capture deeper, more complex market patterns because correlation features alone are insufficient to capture market trends.
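As a rough illustration of correlation features, the sketch below computes a rolling correlation matrix over daily price changes and keeps its upper triangle as one feature vector per window. The function name, the window length, and the synthetic data are assumptions made for this example. The repository's correlation.py may differ, and the meta and cophenetic similarity measures are not reproduced here.

```python
import numpy as np


def rolling_correlation_features(returns: np.ndarray, window: int) -> np.ndarray:
    """Return one feature vector per window (hypothetical helper).

    returns: array of shape (n_days, n_indices) holding daily price changes.
    Each feature vector is the upper triangle (i < j) of the correlation
    matrix computed over the `window` most recent rows.
    """
    n_days, n_indices = returns.shape
    rows, cols = np.triu_indices(n_indices, k=1)  # index pairs with i < j
    features = []
    for end in range(window, n_days + 1):
        corr = np.corrcoef(returns[end - window:end], rowvar=False)
        features.append(corr[rows, cols])
    return np.asarray(features)


# Synthetic example: 300 days of changes for 4 hypothetical indices.
rng = np.random.default_rng(0)
returns = rng.normal(size=(300, 4))
features = rolling_correlation_features(returns, window=60)
print(features.shape)  # (241, 6): 241 windows, 6 index pairs
```

Vectors of this kind could then be passed to a clustering algorithm such as k-means++ or hierarchical clustering.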
The model structure can be seen in the figure below:
The Deep Feature-Based Model (DFM) uses a CNN AutoEncoder to extract deep features from multivariate financial time series data. DFM excels at detecting specific market states and capturing complex behaviours, and it identifies events such as the GFC and the COVID-19 period more effectively than CFM. It identifies shorter and more frequent regime transitions, which highlights its sensitivity to short-term market signals. Despite its effectiveness, DFM may introduce more noise and instability than CFM, as reflected by higher entropy and shorter regime durations.
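A convolutional encoder needs fixed-size 2-D inputs. One plausible way to prepare them, sketched below, is to cut the multivariate series into overlapping windows and treat each window as a single-channel image. The helper name, the per-window standardisation, and the window and stride values are assumptions for illustration, not necessarily what data.py does.

```python
import numpy as np


def to_window_images(series: np.ndarray, window: int, stride: int = 1) -> np.ndarray:
    """Stack overlapping windows into single-channel 'images' (hypothetical helper).

    series: array of shape (n_days, n_indices).
    Returns an array of shape (n_windows, 1, window, n_indices).
    """
    starts = range(0, series.shape[0] - window + 1, stride)
    images = np.stack([series[s:s + window] for s in starts])
    # Standardise each index within each window so that scale differences
    # between indices do not dominate (an assumption of this sketch).
    mean = images.mean(axis=1, keepdims=True)
    std = images.std(axis=1, keepdims=True)
    images = (images - mean) / (std + 1e-8)
    return images[:, np.newaxis, :, :]  # add the channel dimension


# Synthetic example: 100 days, 8 indices, 20-day windows every 5 days.
rng = np.random.default_rng(0)
series = rng.normal(size=(100, 8))
images = to_window_images(series, window=20, stride=5)
print(images.shape)  # (17, 1, 20, 8)
```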
The model structure can be seen in the figure below:
The Deep-Correlation Feature-Based Model (DCFM) combines deep features from the CNN AutoEncoder with correlation features, aiming for a trade-off between short-term and long-term trends while reducing noise and improving the detection and differentiation of market states. However, it does not exhibit the expected superior performance. Its regime structure resembles that of CFM more than that of DFM, which indicates that the correlation features dominate and that the model fails to exploit the richer view of market structure that the deep features provide.
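When two feature blocks are concatenated, the block with more columns or a larger scale can dominate distance-based clustering. The sketch below shows one way to control this: standardise each block, scale it by the square root of its width, and apply an explicit weight. The function and the deep_weight parameter are hypothetical and are not taken from feature_concat_flow.py.

```python
import numpy as np


def concat_features(deep: np.ndarray, corr: np.ndarray, deep_weight: float = 0.5) -> np.ndarray:
    """Concatenate two feature blocks with a controlled balance (hypothetical helper).

    deep: (n_samples, d_deep) deep features.
    corr: (n_samples, d_corr) correlation features.
    deep_weight: share of the total squared norm given to the deep block.
    """
    def scale_block(x: np.ndarray) -> np.ndarray:
        # Standardise every column, then divide by sqrt(width) so that the
        # block's total squared norm equals n_samples regardless of its width.
        z = (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8)
        return z / np.sqrt(x.shape[1])

    return np.hstack([
        np.sqrt(deep_weight) * scale_block(deep),
        np.sqrt(1.0 - deep_weight) * scale_block(corr),
    ])


# Synthetic example: 50 samples, 16 deep features, 6 correlation features.
rng = np.random.default_rng(0)
deep = rng.normal(size=(50, 16))
corr = rng.uniform(-1, 1, size=(50, 6))
combined = concat_features(deep, corr, deep_weight=0.5)
print(combined.shape)  # (50, 22)
```

Raising deep_weight would shift squared Euclidean distances towards the deep features. Whether that changes the resulting regime structure would have to be tested empirically.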
The model structure can be seen in the figure below:
The End-to-End Model (EEM) uses a Siamese CNN to identify market regimes directly, without a separate clustering stage. However, EEM suffers from a significant class collapse problem: it favours only a subset of regimes because it cannot distinguish market states effectively. It overfits the input data while underfitting the underlying patterns, leading to less meaningful regime structures. Incorporating an inverse entropy loss helps to mitigate the issue, but challenges persist.
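To show why an entropy-based term can counter class collapse, the sketch below computes a penalty equal to the inverse of the entropy of the batch-averaged regime assignment. It is large when most samples fall into a few regimes and small when assignments are spread evenly. This is one plausible form, written in NumPy for clarity. The exact definition used in loss.py is not stated here, and the function name is an assumption.

```python
import numpy as np


def inverse_entropy_penalty(probs: np.ndarray, eps: float = 1e-8) -> float:
    """Inverse entropy of the mean soft assignment (hypothetical form).

    probs: (batch, n_regimes) soft regime assignments whose rows sum to 1.
    """
    mean_probs = probs.mean(axis=0)  # how the batch is spread over regimes
    entropy = -np.sum(mean_probs * np.log(mean_probs + eps))
    return float(1.0 / (entropy + eps))


# A collapsed batch (almost everything in regime 0) versus a balanced batch.
collapsed = np.array([[0.98, 0.01, 0.01]] * 4)
balanced = np.eye(3)[[0, 1, 2, 0, 1, 2]]
print(inverse_entropy_penalty(collapsed) > inverse_entropy_penalty(balanced))  # True
```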
The model structure can be seen in the figure below:
The data used in this project was sourced from Bloomberg at daily frequency and covers the period from the earliest available record to June 11, 2024. The indices and columns we collected are shown in the table below:
Index | Price | Daily Total Return | Description | Asset Class |
CSI BARC Index | ✓ | | Barclays Credit Spread Index | Credit Spread |
DXY Curncy | ✓ | | US Dollar Index | Currency |
MXWO Index | ✓ | ✓ | MSCI World Equity Index | Equities |
SPGSIN Index | ✓ | | S&P GS Industrial Metals Index | Commodities |
SPX Index | ✓ | ✓ | S&P 500 Index | Equities |
USGG10YR Index | ✓ | | US 10 year Gov Bond Yield | Interest Rates |
VIX Index | ✓ | | VIX Index | Volatility |
XAU Curncy | ✓ | | Gold Price | Commodities |
Of the four models we explored, DFM yields relatively favourable results, so its results are shown here as an example.
We identified 8 regimes in this model, represented by different colours in the following figure. The VIX is also drawn in blue as an indicator.
The price changes of different indices under different regimes are visualized. Specifically, we focused on annualized percentage price changes and on a risk-adjusted measure, ann_change divided by ann_std, which takes risk into account.
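The ratio of ann_change to ann_std can be sketched per regime as follows. The annualization convention (252 trading days, mean daily change times 252, daily standard deviation times the square root of 252) and the function name are assumptions for illustration. The repository's stats.py may use a different convention.

```python
import numpy as np

TRADING_DAYS = 252  # assumed number of trading days per year


def risk_adjusted_change(daily_changes, regimes):
    """Return {regime: (ann_change, ann_std, ann_change / ann_std)} (hypothetical helper).

    daily_changes: daily percentage price changes of one index.
    regimes: regime label of each day, same length as daily_changes.
    """
    daily_changes = np.asarray(daily_changes, dtype=float)
    regimes = np.asarray(regimes)
    result = {}
    for label in np.unique(regimes):
        changes = daily_changes[regimes == label]
        ann_change = changes.mean() * TRADING_DAYS
        ann_std = changes.std(ddof=1) * np.sqrt(TRADING_DAYS)
        result[int(label)] = (ann_change, ann_std, ann_change / ann_std)
    return result


# Toy example: six days split across two regimes.
changes = [0.01, -0.02, 0.03, 0.00, 0.01, 0.02]
regimes = [0, 0, 0, 1, 1, 1]
stats = risk_adjusted_change(changes, regimes)
for label, (ann_change, ann_std, ratio) in stats.items():
    print(label, round(ann_change, 2), round(ratio, 2))
```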
We also used the normalized entropy of the transition matrix to assess the predictability and regime duration median to assess the stability.
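Both measures can be computed from a sequence of regime labels. The sketch below assumes that the normalized entropy is the unweighted mean of the row entropies of the transition matrix, divided by the logarithm of the number of regimes so that it lies in [0, 1]. The aggregation and the function names are assumptions and may not match the repository's implementation.

```python
import numpy as np


def transition_matrix(labels, n_regimes: int) -> np.ndarray:
    """Row-normalised counts of transitions between consecutive labels."""
    counts = np.zeros((n_regimes, n_regimes))
    for a, b in zip(labels[:-1], labels[1:]):
        counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no observed transitions stay all-zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)


def normalized_entropy(matrix: np.ndarray) -> float:
    """Mean row entropy divided by log(n_regimes); lower means more predictable."""
    n = matrix.shape[0]
    safe = np.where(matrix > 0, matrix, 1.0)  # log(1) = 0 for empty cells
    row_entropy = -(matrix * np.log(safe)).sum(axis=1)
    return float(row_entropy.mean() / np.log(n))


def median_duration(labels) -> float:
    """Median length of runs of consecutive identical labels."""
    labels = np.asarray(labels)
    change_points = np.flatnonzero(labels[1:] != labels[:-1]) + 1
    boundaries = np.concatenate(([0], change_points, [len(labels)]))
    return float(np.median(np.diff(boundaries)))


labels = [0, 0, 0, 1, 1, 0, 0, 0, 0, 1]
matrix = transition_matrix(labels, n_regimes=2)
print(normalized_entropy(matrix))
print(median_duration(labels))  # 2.5 (run lengths are 3, 2, 4, 1)
```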
└── market-regimes/
├── cluster.py
├── constant.py
├── correlation.py
├── data.py
├── example images/
├── flow
│ ├── cluster_assess_flow.py
│ ├── end_to_end_flow.py
│ ├── feature_concat_flow.py
│ ├── similarity_generate_flow.py
│ └── train_flow.py
├── loss.py
├── main.py
├── networks.py
├── preprocess.py
├── process.py
├── README.md
├── requirements.txt
├── results summary/
│ ├── summary_CFMs.csv
│ └── summary_DFMs.csv
├── stats.py
├── utils.py
└── visualization.py
File | Summary |
cluster.py | Implements the clustering algorithms, together with the metrics and methods used to evaluate the clustering results. |
constant.py | constant.py contains constants for this project, including neural network module configurations, data column mappings for returns, and custom color palettes for data visualization. |
correlation.py | Calculates and analyzes rolling correlation matrices among indices using multiple statistical methods. It constructs similarity matrices from these correlations using both cophenetic and meta-methodologies, enhancing the understanding of inter-temporal index behaviors potentially crucial for regime detection in financial markets. |
data.py | Transforms multivariate time series data into images and correlation matrices, including all the dataset classes. |
loss.py | Defines KLDivergenceLoss, which combines KL divergence with optional L2 and inverse entropy regularization. |
main.py | Executes all the experiments in this project. |
networks.py | Implements all the deep learning modules including CNN AutoEncoder and the Siamese CNN. |
preprocess.py | Preprocesses the raw data. |
process.py | Provides the core functionalities for model training, inference, and feature extraction. It defines workflow classes crucial for regime identification and analysis in financial data series. |
stats.py | Provides comprehensive statistical analyses and metrics essential for assessing financial regime durations, transitions, and returns across different timeframes, contributing to the analysis of the regime identification results. |
utils.py | Utility module providing support functions used across the project, such as file operations, data parsing, and management of the numerous result folders. |
visualization.py | Contains all the visualizations in this project, including those for a single model and for the summary of an ablation study. |
flow
This directory contains all the workflows in this project.
File | Summary |
cluster_assess_flow.py | This is for the clustering based on the extracted features, as well as the analysis of the regime identification results. |
end_to_end_flow.py | This is for the end-to-end regime identification, including the analysis of the regime identification results. |
feature_concat_flow.py | This is for the concatenation of the deep features and the correlation features. |
similarity_generate_flow.py | Generates similarity matrices, i.e., features based on correlations. |
train_flow.py | Generates deep features. |
results summary
This directory contains the summaries of the ablation studies.
File | Summary |
summary_CFMs.csv | Summary of ablation study for CFMs. |
summary_DFMs.csv | Summary of ablation study for DFMs. |
System Requirements:
- Python: version 3.11
- Clone the repository:
$ git clone https://github.com/kangchengX/market-regime.git
- Change to the project directory:
$ cd market-regime
- Install the dependencies:
$ pip install -r requirements.txt
In accordance with company policy, the raw data cannot be made public. However, the same data can be obtained by following the steps described in the Data section.
Use the command below:
$ python main.py
Thanks to Zacharias Bobolakis, Senior Portfolio Manager, and Mauricio Bouabci, Senior VP and Quantitative Researcher, both at Insight Investment, and to Professor Philip Treleaven at UCL, for their help.