Time series analysis involves studying a sequence of data points collected or recorded over time. The goal is often to forecast future data based on historical trends. Applications include:
- Weather Monitoring: Daily, hourly, or weekly weather data.
- Performance Tracking: Changes in application performance.
- Medical Devices: Real-time vitals monitoring.
This project uses the ARCH (Autoregressive Conditional Heteroskedasticity) model, introduced by Engle in 1982, to model the time-varying conditional variance of a time series. Key concepts include:
- Autoregressive: Current values depend on past values; in ARCH, the current variance depends on past squared errors.
- Conditional Variance: The variance at each time step is conditioned on past errors rather than held constant.
- Heteroskedasticity: The variance of the series changes over time.
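To make these three ideas concrete, an ARCH(q) model is often written as below. The notation (mean $\mu$, error $\varepsilon_t$, standardized shock $z_t$, coefficients $\omega$ and $\alpha_i$) is one common convention and is an assumption here, not something taken from the project code.

```latex
\begin{aligned}
y_t &= \mu + \varepsilon_t, \qquad \varepsilon_t = \sigma_t z_t, \qquad z_t \sim \text{i.i.d.}(0, 1), \\
\sigma_t^2 &= \omega + \sum_{i=1}^{q} \alpha_i \, \varepsilon_{t-i}^2, \qquad \omega > 0, \; \alpha_i \ge 0 .
\end{aligned}
```

The second line is where all three concepts meet: the variance $\sigma_t^2$ is conditional on the past, it is regressed on lagged squared errors, and it therefore changes from one period to the next.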
The GARCH (Generalized Autoregressive Conditional Heteroskedasticity) model extends ARCH by adding lagged conditional variances to the lagged squared residuals, which typically captures persistent volatility with fewer parameters.
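As a rough illustration of the difference, the sketch below computes the conditional variance recursion `sigma2[t] = omega + alpha * eps[t-1]**2 + beta * sigma2[t-1]`. With `beta = 0` this is an ARCH(1) recursion; with `beta > 0` it is GARCH(1,1). The function name, the parameter values, and the random residuals are all assumptions chosen for the example, not estimates from the project.

```python
import numpy as np


def conditional_variance(eps, omega, alpha, beta=0.0):
    """Return sigma2 where sigma2[t] = omega + alpha*eps[t-1]**2 + beta*sigma2[t-1].

    beta = 0 gives an ARCH(1) recursion; beta > 0 gives GARCH(1,1).
    """
    eps = np.asarray(eps, dtype=float)
    sigma2 = np.empty_like(eps)
    # The first value needs a starting point; the sample variance is one
    # common (but arbitrary) choice.
    sigma2[0] = eps.var()
    for t in range(1, len(eps)):
        sigma2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2


# Placeholder residuals: 132 random draws, not the project's data.
rng = np.random.default_rng(0)
eps = rng.standard_normal(132)

arch_var = conditional_variance(eps, omega=0.2, alpha=0.5)              # ARCH(1)
garch_var = conditional_variance(eps, omega=0.2, alpha=0.3, beta=0.5)   # GARCH(1,1)
```

In the ARCH case a large shock affects only the next period's variance, while the `beta` term in GARCH carries part of the previous variance forward, so the effect decays gradually.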
This project also incorporates MLOps (Machine Learning Operations) by deploying the ARCH model on the AWS platform, emphasizing cost optimization and scalability.
The dataset, titled "Call-centers," records monthly data for various domains (e.g., healthcare, telecom, banking). Key features include:
- Month: Time period.
- Domains: Call categories (e.g., healthcare, telecom).
- External Regressors: Number of channels and phone lines for traffic prediction and resource availability.
Dimensions: 132 rows × 8 columns.
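A frame of this shape can be given a monthly `DatetimeIndex` with pandas. The sketch below is a minimal example under stated assumptions: the column names (`month`, `Healthcare`, `Telecom`, `no_of_channels`), the start date, and the values are hypothetical placeholders, and it uses fewer columns than the real dataset. In the project the frame would presumably be read from the Excel file in `data/` instead.

```python
import numpy as np
import pandas as pd

# Placeholder data: 132 monthly rows with made-up values and column names.
rng = np.random.default_rng(42)
raw = pd.DataFrame(
    {
        "month": pd.date_range("2010-01-01", periods=132, freq="MS").strftime("%Y-%m"),
        "Healthcare": rng.integers(500, 1500, size=132),
        "Telecom": rng.integers(800, 2000, size=132),
        "no_of_channels": rng.integers(1, 6, size=132),
    }
)

# Parse the month strings into timestamps and use them as the index.
raw["month"] = pd.to_datetime(raw["month"], format="%Y-%m")
df = raw.set_index("month").sort_index()

# Declare the frequency explicitly as month-start ("MS"); any missing
# month would show up as a row of NaN after this call.
df = df.asfreq("MS")

print(df.index.freqstr, df.shape)
```

Setting the frequency explicitly lets time series libraries label forecasts with the correct future dates.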
- Build ARCH and GARCH models to analyze and forecast time series data.
- Create an MLOps pipeline to deploy the ARCH model on AWS, ensuring scalability and cost efficiency.
- Programming Language: Python
- Libraries:
  - `Flask` for building the web application.
  - `pickle` for model serialization.
  - `pandas` and `numpy` for data manipulation.
  - `matplotlib` and `seaborn` for data visualization.
  - `statsmodels` for statistical modeling.
  - `arch` for ARCH/GARCH models.
  - `scipy` for scientific computing.
- Deployment: Amazon EC2 and Amazon Lightsail (AWS services), with Docker for containerization.
- Data Preprocessing:
  - Set the date as the index.
  - Adjust the frequency to monthly.
- Exploratory Data Analysis (EDA):
  - Visualize the data for trends and seasonality.
- Model Building:
  - Train ARCH models with varying lags.
  - Extend to GARCH models for better volatility forecasting.
- Forecasting:
  - Use the best model to forecast time series data.
- Model Saving:
  - Serialize the ARCH model using `pickle`.
- Flask Application:
  - Create a RESTful API for serving predictions.
- AWS Deployment:
  - Set up an EC2 instance and configure it for deployment.
  - Use Docker for containerization.
- Lightsail Deployment:
  - Optimize the deployment process using AWS Lightsail.
- Model Creation:
  - Save the trained model in `.pkl` format.
- Flask App Development:
  - Build a Flask application to serve predictions.
- AWS EC2 Setup:
  - Create and configure an EC2 instance.
  - Install required tools using `install-docker.sh` and `install-aws-cli.sh`.
- Code Upload:
  - Use Cloud Shell or an S3 bucket to upload files to the EC2 instance.
- Dockerization:
  - Build and run a Docker container for the application.
- Lightsail Configuration:
  - Refer to `lightsail-deployment.md` for step-by-step deployment instructions.
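The first two steps (a pickled model served by Flask) might look something like the sketch below. It is not the project's `app.py`: the `/predict` route, the `horizon` query parameter, its 1 to 24 limit, and the stored parameter values are all hypothetical. To stay self-contained, the "model" is a pickled dictionary of ARCH(1) parameters held in memory, whereas the project would presumably load a `.pkl` file from `output/`.

```python
import pickle

from flask import Flask, jsonify, request

# Stand-in for the contents of a .pkl file: ARCH(1) parameters plus the
# last squared residual. All numbers are made up for illustration.
MODEL_BYTES = pickle.dumps({"omega": 0.2, "alpha": 0.5, "last_resid_sq": 1.0})


def forecast_variance(params, horizon):
    """Iterate the ARCH(1) recursion to get variance forecasts for 1..horizon steps."""
    variances = []
    previous = params["last_resid_sq"]
    for _ in range(horizon):
        # Beyond one step, the expected squared residual equals the
        # previous variance forecast, so the same recursion applies.
        previous = params["omega"] + params["alpha"] * previous
        variances.append(previous)
    return variances


app = Flask(__name__)
model = pickle.loads(MODEL_BYTES)  # loaded once at startup, not per request


@app.route("/predict", methods=["GET"])
def predict():
    # Falls back to 3 if the parameter is missing or not an integer.
    horizon = request.args.get("horizon", default=3, type=int)
    if horizon < 1 or horizon > 24:
        return jsonify(error="horizon must be between 1 and 24"), 400
    return jsonify(horizon=horizon, variance=forecast_variance(model, horizon))
```

With the server running, a request such as `GET /predict?horizon=2` would return a JSON body containing two variance forecasts. Loading the model once at import time avoids unpickling on every request, which matters more as the serialized model grows.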
.
├── data/ # Input dataset (CallCenterData.xlsx).
├── MLPipeline/ # Python scripts for preprocessing and modeling.
├── notebooks/ # IPython notebook for ARCH model.
├── output/ # Serialized models and results.
├── app.py # Flask app for serving predictions.
├── Dockerfile # Docker image configuration.
├── engine.py # Orchestrates MLPipeline functions.
├── install-aws-cli.sh # Steps for AWS CLI installation.
├── install-docker.sh # Steps for Docker installation.
├── install-lightsail-cli.sh # Steps for Lightsail installation.
├── lightsail-deployment.md # Lightsail deployment instructions.
├── requirements.txt # List of dependencies.
└── README.md # Project documentation.
git clone <repository_url>
cd <repository_folder>
Install the required Python libraries using:
pip install -r requirements.txt
Execute the pipeline by running the `engine.py` script:
python engine.py
Run the Flask application locally:
python app.py
- Volatility Forecasting:
  - Accurate forecasting of time series data using ARCH and GARCH models.
- Scalable Deployment:
  - Efficient deployment on AWS using Docker and Lightsail.
- User-Friendly API:
  - Flask app for easy interaction with the model.
- Advanced Modeling: Leverages ARCH and GARCH models for robust time series analysis.
- MLOps Integration: Combines machine learning with scalable AWS deployment.
- Practical Application: Ideal for real-world scenarios involving volatility forecasting and cloud deployment.
Contributions are welcome! To contribute:
- Fork the repository.
- Create a feature branch: `git checkout -b feature-name`
- Commit your changes: `git commit -m "Add feature"`
- Push your branch: `git push origin feature-name`
- Open a pull request.
This project is licensed under the MIT License. See the `LICENSE` file for details.
For any questions or suggestions, please reach out to:
- Name: Abhinav Navneet
- Email: mailme.AbhinavN@gmail.com
- GitHub: AjNavneet
Special thanks to:
- ARCH Documentation for guidance on ARCH/GARCH modeling.
- Flask for API development.
- AWS for cloud deployment tools.
- Docker for containerization.
- The Python open-source community for their invaluable tools and resources.