- The user should be able to download the latest complete dataset of characters from the API by clicking a button. The collected and transformed data should be stored as a CSV file in the file system. Metadata for downloaded datasets (e.g. filename, date) should be stored in the database. Fetching and transformations should be implemented efficiently: minimize the number of requests, and make sure the app can process large amounts of data.
- Add a date column (%Y-%m-%d) based on the edited date.
- Resolve the homeworld field into the homeworld's name (/planets/1/ -> Tatooine).
- Fields referencing different resources and date fields other than date/birth_year can be dropped.
- The user should be able to inspect all previously downloaded datasets and run simple exploratory operations on them.
- By default, the table should show only the first 10 rows of the dataset. Clicking a “Load more” button should show 10 additional rows; reloading the page is fine.
- Provide functionality to count the occurrences of values (or combinations of values) in selected columns. For example, when the columns date and homeworld are selected, the table should show the count of each date/homeworld combination.
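The transformation, "Load more" and value-count requirements above can be sketched in a few lines. This is a minimal illustration using only the standard library, not the project's petl-based code. The names `transform`, `visible_rows`, `count_values`, `planet_names` and `DROPPED_FIELDS` are hypothetical, and the list of dropped fields is an assumption about which character fields reference other resources.

```python
from collections import Counter
from datetime import datetime
from itertools import islice

# Assumed set of fields to drop: references to other resources and
# date fields other than date/birth_year.
DROPPED_FIELDS = ("films", "species", "vehicles", "starships",
                  "created", "edited", "url")


def transform(character, planet_names):
    """Add a date column and replace the homeworld reference with its name.

    `planet_names` maps a homeworld reference to the planet name; building it
    once per download avoids one request per character.
    """
    edited = datetime.strptime(character["edited"], "%Y-%m-%dT%H:%M:%S.%fZ")
    row = {k: v for k, v in character.items() if k not in DROPPED_FIELDS}
    row["date"] = edited.strftime("%Y-%m-%d")
    row["homeworld"] = planet_names[character["homeworld"]]
    return row


def visible_rows(rows, clicks, page_size=10):
    """Rows shown after `clicks` presses of "Load more" (10 more per click)."""
    return list(islice(rows, page_size * (clicks + 1)))


def count_values(rows, columns):
    """Count how often each combination of values occurs in `columns`."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    return [dict(zip(columns, key), count=n) for key, n in counts.most_common()]
```

`count_values(rows, ["date", "homeworld"])` returns one dictionary per distinct date/homeworld pair, most frequent first.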
- Django and DRF
- Celery and Redis
- petl
- pytest (WIP)
- asyncio, aiohttp and httpx (WIP)
- Amazon S3 or MinIO (WIP)
- Create a virtual environment
python3.9 -m venv venv
- Activate the virtual environment
source venv/bin/activate
- Install requirements
pip install -r requirements.txt
- Create a .env file based on .env.dev
- Export environment variables in your current shell
source docker/export.sh
- Start Django application
./docker/start.sh
- Run application
docker-compose up app
- Start flower to check celery tasks
docker-compose up flower
The aiohttp-based AsyncAPIClient is in the file external_api/starwars_api/api.py
Tips:
- After the timeout, pending tasks are cancelled inside the execute() function.
- To retry tasks, save them in retry_tasks_list and wait() on them again.
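The two tips above can be illustrated with plain asyncio. This is a sketch under assumptions, not the actual AsyncAPIClient code: the signature of `execute()` and the `max_retries` parameter are hypothetical. Because a finished task cannot be run a second time, the sketch keeps coroutine factories in `retry_tasks_list` and creates fresh tasks for each retry round.

```python
import asyncio


async def execute(coroutine_factories, timeout, max_retries=2):
    """Run coroutines concurrently, cancel those still pending after
    `timeout` seconds, and retry the ones that raised an exception."""
    results = {}
    to_run = dict(enumerate(coroutine_factories))
    for _ in range(max_retries + 1):
        if not to_run:
            break
        tasks = {asyncio.ensure_future(factory()): key
                 for key, factory in to_run.items()}
        done, pending = await asyncio.wait(list(tasks), timeout=timeout)
        # After the timeout, cancel whatever has not finished.
        for task in pending:
            task.cancel()
        await asyncio.gather(*pending, return_exceptions=True)
        # Collect results and remember failed calls for the next round.
        retry_tasks_list = {}
        for task in done:
            key = tasks[task]
            if task.exception() is None:
                results[key] = task.result()
            else:
                retry_tasks_list[key] = to_run[key]
        to_run = retry_tasks_list
    return results
```

The result maps each coroutine's position in the input list to its return value; calls that timed out or kept failing are simply absent.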
The old synchronous version of the API can be found in the feature/sync_api branch
- Write more tests using pytest
- Read and write very large files using custom indexes and .seek()
- Add documentation using Sphinx
- MinIO or AWS S3 integration for saving files
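For the planned index-and-seek item, one possible shape is a list of byte offsets, one per line, so that a page of rows can be read without scanning the file from the start. The function names `build_line_index` and `read_lines` are hypothetical, and the sketch assumes no CSV field contains an embedded newline.

```python
def build_line_index(path):
    """Return the byte offset at which each line of the file starts."""
    offsets = []
    with open(path, "rb") as handle:
        while True:
            position = handle.tell()
            if not handle.readline():
                break
            offsets.append(position)
    return offsets


def read_lines(path, offsets, start, count):
    """Read up to `count` lines starting at line number `start` (0-based)."""
    if start >= len(offsets):
        return []
    with open(path, "rb") as handle:
        # Jump straight to the first requested line.
        handle.seek(offsets[start])
        wanted = min(count, len(offsets) - start)
        return [handle.readline().decode("utf-8").rstrip("\r\n")
                for _ in range(wanted)]
```

With a header on line 0, `read_lines(path, offsets, 11, 10)` returns data rows 11 to 20 of the file, which matches a second "Load more" page.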