Commit
Merge pull request #158 from fako/develop
Develop
Showing 168 changed files with 2,942 additions and 5,065 deletions.
@@ -5,7 +5,7 @@ datascope.iml
 *.pyc
 *.pkl
 *.mo
-celerybeat-schedule.db
+src/celerybeat-schedule

 venv/
@@ -1,4 +1,4 @@
 # Sphinx build info version 1
 # This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
-config: 62ae7c98848b4a025bfdc4f9998d6dd4
+config: a2aa043345fbd75ed534cb34691575bd
 tags: 645f666f9bcd5a90fca523b33c5a78b7
5 changes: 0 additions & 5 deletions
src/apps/sites/datagrowth/_sources/processors/usage.inc.rst.txt
This file was deleted.
62 changes: 62 additions & 0 deletions
src/apps/sites/datagrowth/_sources/resources/configuration.inc.rst.txt
@@ -0,0 +1,62 @@

Configuration
-------------

You can adjust how a ``Resource`` retrieves data through configuration options.
See the `configuration`__ section to learn how to set configuration defaults.
Here we only explain the available options by setting them directly.

.. _configuration_getting_started: ../configuration/index.html

__ configuration_getting_started_


Caching behaviour
*****************

An important aspect of ``Resource`` is that it acts as a cache once data has been retrieved successfully.
A few configuration options modify this caching behaviour. All examples below use the "global" namespace ::

    from example import MyResource

    # This configuration disables all caching.
    # The Resource is still stored, but it will never be used twice.
    MyResource(config={
        "purge_immediately": True
    })

    # For more fine-grained control, use the purge_after configuration.
    MyResource(config={
        "purge_after": {
            "days": 30
        }
    })
    # This configuration tells Datagrowth that the Resource
    # should not be used as cache after 30 days.
    # The value of purge_after can be any dict that is accepted as kwargs by Python's timedelta.
    # This gives a lot of flexibility in deciding when a Resource
    # should no longer be used, but it won't delete any Resources.
    # Datagrowth just doesn't use them as cache after the specified time.

    # Sometimes getting data from a Resource is very computationally intensive.
    # In such cases it might be a good idea to never actually retrieve data
    # unless it has been cached by a background process.
    # With the cache_only configuration you can force a Resource
    # to only return a cached result and to never start real data retrieval.
    resource = MyResource(config={
        "cache_only": True
    })
    resource.get()  # this never makes a real request
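
To make the interplay of these three options more concrete, here is a minimal standalone sketch that uses only the Python standard library. It is not Datagrowth's implementation: ``is_usable_as_cache``, ``retrieve``, the plain ``dict`` cache and the ``fetch`` callable are hypothetical names chosen for illustration, and treating a missing ``purge_after`` as "never expires" as well as returning ``None`` on a ``cache_only`` miss are assumptions.

```python
from datetime import datetime, timedelta, timezone


def is_usable_as_cache(stored_at, config, now):
    """Toy rule (an assumption, not Datagrowth's code) for cache validity."""
    if config.get("purge_immediately", False):
        return False  # stored, but never served from cache
    purge_after = config.get("purge_after")
    if not purge_after:
        return True  # no expiry configured in this sketch
    # The dict is unpacked as keyword arguments, e.g. timedelta(days=30)
    return now < stored_at + timedelta(**purge_after)


def retrieve(key, config, cache, fetch, now=None):
    """Return cached data when allowed, otherwise fetch unless cache_only is set."""
    now = now or datetime.now(timezone.utc)
    entry = cache.get(key)
    if entry is not None and is_usable_as_cache(entry["stored_at"], config, now):
        return entry["data"]
    if config.get("cache_only", False):
        return None  # this sketch signals a miss with None and never calls fetch
    data = fetch(key)
    cache[key] = {"stored_at": now, "data": data}
    return data
```

In this sketch an expired entry stays in ``cache`` and is simply skipped, which mirrors the statement above that ``purge_after`` does not delete anything.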


User Agent configuration
************************

This option is only useful for ``HttpResource`` and its subclasses. It uses the "global" namespace ::

    from example import MyResource

    # This configuration sets the user agent for any request made by the Resource.
    MyResource(config={
        "user_agent": "My custom crawler User Agent"
    })
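
As a rough illustration of what such a setting presumably amounts to on the wire, the sketch below attaches a configured user agent to an outgoing request as the HTTP ``User-Agent`` header, using only the standard library. ``build_request`` and the example URL are hypothetical and not part of Datagrowth; how ``HttpResource`` builds its requests internally is not shown in this document.

```python
from urllib.request import Request


def build_request(url, config):
    """Hypothetical helper: attach the configured user agent as an HTTP header."""
    headers = {}
    user_agent = config.get("user_agent")
    if user_agent:
        headers["User-Agent"] = user_agent
    # Constructing a Request does not send anything over the network
    return Request(url, headers=headers)


request = build_request(
    "https://example.com/data.json",
    {"user_agent": "My custom crawler User Agent"},
)
# urllib stores header names capitalized as "User-agent"
print(request.get_header("User-agent"))
```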