Commit

Merge pull request #158 from fako/develop
Develop
fako authored May 10, 2021
2 parents f4f2566 + 87fad76 commit 1923e43
Showing 168 changed files with 2,942 additions and 5,065 deletions.
2 changes: 1 addition & 1 deletion .gitignore
@@ -5,7 +5,7 @@ datascope.iml
*.pyc
*.pkl
*.mo
-celerybeat-schedule.db
+src/celerybeat-schedule

venv/

18 changes: 16 additions & 2 deletions deploy/server/nginx/localhost.conf
@@ -8,19 +8,33 @@ server {
listen 80;

# DJANGO
-location /api {
+location / {
include /etc/nginx/uwsgi-pass.conf;
}
-location /data {
+location /api {
include /etc/nginx/uwsgi-pass.conf;
}
location /admin {
include /etc/nginx/uwsgi-pass.conf;
}
# Legacy
location /data {
include /etc/nginx/uwsgi-pass.conf;
}
# App routes
location /static/apps/promo {
rewrite ^/static/apps/promo/(.+)$ /static/apps/gff/$1 break;
include /etc/nginx/uwsgi-pass.conf;
}
location /globe-scope/views {
rewrite ^/(.+)$ /static/apps/$1 break;
include /etc/nginx/uwsgi-pass.conf;
}
location /globe-scope/images {
rewrite ^/(.+)$ /static/apps/$1 break;
include /etc/nginx/uwsgi-pass.conf;
}
# Generic static files
location /static {
include /etc/nginx/uwsgi-pass.conf;
}
2 changes: 1 addition & 1 deletion src/apps/sites/datagrowth/.buildinfo
@@ -1,4 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
-config: 62ae7c98848b4a025bfdc4f9998d6dd4
+config: a2aa043345fbd75ed534cb34691575bd
tags: 645f666f9bcd5a90fca523b33c5a78b7
@@ -30,6 +30,6 @@ The first type of processors that Datagrowth ships with is processors that handl
.. include:: custom.inc.rst


-.. _configuration_getting_started: ../configuration
+.. _configuration_getting_started: ../configuration/index.html

__ configuration_getting_started_

This file was deleted.

@@ -0,0 +1,62 @@

Configuration
-------------

You can adjust how a ``Resource`` retrieves data through configuration options.
See the `configuration`__ section to learn how to set configuration defaults.
The examples below explain the available options by setting them directly on a ``Resource``.

.. _configuration_getting_started: ../configuration/index.html

__ configuration_getting_started_


Caching behaviour
*****************

An important aspect of ``Resource`` is that it acts as a cache once data has been retrieved successfully.
A few configuration options modify the cache behaviour. All examples below use the "global" namespace ::

    from example import MyResource

    # This configuration disables all caching.
    # It still stores the Resource, but it will never get used twice.
    MyResource(config={
        "purge_immediately": True
    })

    # For more fine-grained control, use the purge_after configuration.
    MyResource(config={
        "purge_after": {
            "days": 30
        }
    })
    # Such a configuration indicates to Datagrowth that the Resource
    # should not be used as cache after 30 days.
    # The value of purge_after can be any dict that is accepted as kwargs by Python's timedelta.
    # This makes it possible to be very flexible about when a Resource
    # should no longer be used, but it won't delete any Resources.
    # Datagrowth just doesn't use them as cache after the specified time.

    # Sometimes getting data from a Resource is computationally intensive.
    # In such cases it might be a good idea to never actually retrieve data
    # unless it has been cached by a background process.
    # By using the cache_only configuration you can force a Resource
    # to return only when a cached result exists and to never start real data retrieval.
    resource = MyResource(config={
        "cache_only": True
    })
    resource.get()  # this never makes a real request
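
The ``purge_after`` description above says its value is passed as keyword arguments to Python's ``timedelta``. The following is a minimal sketch of that idea using only the standard library. It is not Datagrowth's implementation: the helper ``is_usable_as_cache`` and its parameters are hypothetical names chosen for illustration.

```python
from datetime import datetime, timedelta


def is_usable_as_cache(created_at, purge_after, now=None):
    """Hypothetical helper (not part of Datagrowth).

    Expands a purge_after style dict into a timedelta and reports
    whether something created at `created_at` is still within that window.
    """
    now = now if now is not None else datetime.now()
    expires_at = created_at + timedelta(**purge_after)
    return now < expires_at


created = datetime(2021, 5, 1)
# Nine days after creation: still inside the 30 day window.
fresh = is_usable_as_cache(created, {"days": 30}, now=datetime(2021, 5, 10))
# Forty-five days after creation: outside the 30 day window.
stale = is_usable_as_cache(created, {"days": 30}, now=datetime(2021, 6, 15))
```

Because any ``timedelta`` keyword works, a dict such as ``{"weeks": 2, "hours": 12}`` would be expanded in the same way.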


User Agent configuration
************************

This configuration is only useful for ``HttpResource`` and child classes. It uses the "global" namespace ::

    from example import MyResource

    # This configuration sets the user agent for any request made by the Resource.
    MyResource(config={
        "user_agent": "My custom crawler User Agent"
    })
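
At the HTTP level, a user agent is sent as the ``User-Agent`` request header. The sketch below illustrates that with Python's standard library only. It does not use Datagrowth, the URL is a placeholder, and no request is actually sent.

```python
from urllib.request import Request

# Build a request object with a custom User-Agent header (nothing is sent).
request = Request(
    "https://example.com/api",
    headers={"User-Agent": "My custom crawler User Agent"},
)

# urllib stores header names in capitalized form, hence "User-agent".
user_agent = request.get_header("User-agent")
```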
2 changes: 2 additions & 0 deletions src/apps/sites/datagrowth/_sources/resources/index.rst.txt
@@ -22,3 +22,5 @@ The Resource makes connecting to data sources easier because:
.. include:: http.inc.rst

.. include:: shell.inc.rst

.. include:: configuration.inc.rst
2 changes: 1 addition & 1 deletion src/apps/sites/datagrowth/_static/css/badge_only.css
