Aggregates news from the following sources:
- habr
- vc
Built with:
- Django
- Celery (celery-beat; django-celery-results)
- RabbitMQ
- PostgreSQL
- JavaScript (mainly for AJAX requests)
- Django Rest Framework
- Selenium
Features:
- Scraping data from habr: the top page and the user's feed (the feed requires your credentials)
- Scraping data from vc: the top page and the user's feed (the feed requires your credentials)
- Automatic scraping of the user's feed, scheduled with celery-beat (default: every 5 minutes)
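
A 5-minute schedule like the one described above could look roughly like this in the Django settings. This is only a sketch: the task path `scrapers.tasks.scrape_user_feeds` and the entry name are hypothetical, and it assumes Celery reads its configuration from Django settings with the `CELERY` namespace.

```python
from datetime import timedelta

# Hypothetical task path; substitute the project's real feed-scraping task.
FEED_TASK = "scrapers.tasks.scrape_user_feeds"

# Celery beat accepts a timedelta (or a crontab) as the "schedule" value.
CELERY_BEAT_SCHEDULE = {
    "scrape-user-feeds": {  # assumed entry name
        "task": FEED_TASK,
        "schedule": timedelta(minutes=5),
    },
}
```

The same interval written with crontab would be `crontab(minute="*/5")` from `celery.schedules`.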
How to run and configure:
1. docker-compose up should work right away, but make sure to add your machine's IP to ALLOWED_HOST.
2. Configurable:
   - crontab(): the schedule for automatic feed scraping.
   - scrapers.HabrScraper.scrap_feed(..., pages=3): how many pages to scrape from the user's feed. As far as I can tell, habr makes 50 pages available by default; I reduced the number for testing.
   - scrapers.HabrScraper.scrap_feed (more precisely, .scrap_top(); this part is a bit messy for now): how many times the page is scrolled down. To change it, find the line return self.scrap_top(times=5, user_feed=True) and change the times argument.
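
One possible way to avoid editing the hard-coded times value is to pass it through scrap_feed. The class below is an illustrative stand-in, not the project's actual scrapers.HabrScraper: the `times` parameter on scrap_feed and the returned dict are assumptions made for the sketch, and the Selenium work is left out.

```python
class HabrScraperSketch:
    """Illustrative stand-in, not the project's scrapers.HabrScraper."""

    def scrap_top(self, times=5, user_feed=False):
        # The real method would drive Selenium and scroll the page `times`
        # times; this sketch only records the arguments it received.
        return {"scrolls": times, "user_feed": user_feed}

    def scrap_feed(self, pages=3, times=5):
        # Forward `times` instead of hard-coding it in the return statement.
        result = self.scrap_top(times=times, user_feed=True)
        result["pages"] = pages
        return result


if __name__ == "__main__":
    print(HabrScraperSketch().scrap_feed(pages=2, times=10))
```

With this shape, the scroll count could be set by the caller (for example, from the scheduled task) without touching the scraper's source.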