The DAG runs daily at midnight and crawls all news articles posted that day.
- Run database migrations and create the first user account:
docker-compose up airflow-init
- Start all services:
docker-compose up
- The webserver is available at http://localhost:8080. The default account has the login airflow and the password airflow.
For detailed info: https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html
Currently it is integrated with news-crawler and runs a Docker container in AWS ECS.
There are two DAGs in the /dags folder:
- news_crawler_dag is scheduled to run daily starting from June 1st 2021.
- news_crawler_historical_dag processes historical data from Jan 1st 2014 to May 31st 2021.
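To illustrate how the two DAGs split the timeline, here is a minimal sketch in plain Python (no Airflow imports). The date boundaries come from the descriptions above; the helper names `dag_for_date` and `daily_dates` are hypothetical and are not part of this repository. It is not how the DAGs themselves are implemented.

```python
from datetime import date, timedelta
from typing import Iterator, Optional

# Date boundaries taken from the DAG descriptions above.
HISTORICAL_START = date(2014, 1, 1)
HISTORICAL_END = date(2021, 5, 31)  # treated as inclusive
DAILY_START = date(2021, 6, 1)


def dag_for_date(day: date) -> Optional[str]:
    """Return the name of the DAG whose range covers `day`, or None."""
    if HISTORICAL_START <= day <= HISTORICAL_END:
        return "news_crawler_historical_dag"
    if day >= DAILY_START:
        return "news_crawler_dag"
    return None  # earlier than anything either DAG covers


def daily_dates(start: date, end: date) -> Iterator[date]:
    """Yield every date from `start` to `end`, both inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


if __name__ == "__main__":
    # The historical range ends the day before the daily DAG starts,
    # so the two ranges neither overlap nor leave a gap.
    print(dag_for_date(date(2021, 5, 31)))
    print(dag_for_date(date(2021, 6, 1)))
```

Together the two ranges cover every day from Jan 1st 2014 onward, so an article's publication date maps to exactly one DAG.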