Settings¶
The Frontera settings allow you to customize the behaviour of all components, including the FrontierManager, Middleware and Backend themselves.
The settings infrastructure provides a global namespace of key-value mappings from which components can pull configuration values. The settings can be populated through different mechanisms, which are described below.
For a list of available built-in settings see: Built-in settings reference.
Designating the settings¶
When you use Frontera, you have to tell it which settings you’re using. As FrontierManager is the main entry point to Frontier usage, you can do this by using the method described in the Loading from settings section.
When using a string path pointing to a settings file for the frontier we propose the following directory structure:
my_project/
    frontier/
        __init__.py
        settings.py
        middlewares.py
        backends.py
    ...
These are basically:
- frontier/settings.py: the frontier settings file.
- frontier/middlewares.py: the middlewares used by the frontier.
- frontier/backends.py: the backend(s) used by the frontier.
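As an illustration, a minimal frontier/settings.py might look like the sketch below. It uses only setting names documented on this page; the limit values are arbitrary examples, not recommendations.

```python
# frontier/settings.py (illustrative values only)

# Dotted path of the backend class; this is the documented default.
BACKEND = 'frontera.contrib.backends.memory.FIFO'

# Middlewares enabled in the frontier; this is the documented default list.
MIDDLEWARES = [
    'frontera.contrib.middlewares.fingerprint.UrlFingerprintMiddleware',
]

# Example limits (both default to 0, meaning "no limit").
MAX_NEXT_REQUESTS = 10
MAX_REQUESTS = 2000

# Start the frontier automatically (documented default).
AUTO_START = True
```

Any setting left out of the file keeps its built-in default.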
How to access settings¶
Settings can be accessed through the FrontierManager.settings attribute. The manager is passed to the Middleware.from_manager and Backend.from_manager class methods:
    from frontera.core.components import Component

    class MyMiddleware(Component):

        @classmethod
        def from_manager(cls, manager):
            # The manager exposes the loaded settings as an attribute.
            settings = manager.settings
            if settings.TEST_MODE:
                print("test mode is enabled!")
In other words, settings can be accessed as attributes of the Settings object.
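The attribute-style access described here can be sketched without Frontera installed. In the example below, AttrSettings and FakeManager are hypothetical stand-ins written for illustration; they are not Frontera's Settings or FrontierManager classes, and only the from_manager pattern and the TEST_MODE name come from this page.

```python
class AttrSettings:
    """Hypothetical stand-in for a settings object (not Frontera's Settings class)."""

    def __init__(self, values):
        self._values = dict(values)

    def __getattr__(self, name):
        # Invoked only when normal attribute lookup fails, so look the
        # name up in the stored key-value mapping instead.
        values = self.__dict__.get("_values", {})
        if name in values:
            return values[name]
        raise AttributeError(name)


class FakeManager:
    """Hypothetical manager exposing a `settings` attribute."""

    def __init__(self, settings):
        self.settings = settings


class MyMiddleware:
    def __init__(self, test_mode):
        self.test_mode = test_mode

    @classmethod
    def from_manager(cls, manager):
        # Read the value once and keep it on the new instance.
        return cls(test_mode=manager.settings.TEST_MODE)


manager = FakeManager(AttrSettings({"TEST_MODE": True}))
middleware = MyMiddleware.from_manager(manager)
print(middleware.test_mode)
```

Reading the value inside from_manager and storing it on the instance keeps the component independent of the manager after construction.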
Settings class¶
Built-in frontier settings¶
Here’s a list of all available Frontera settings, in alphabetical order, along with their default values and the scope where they apply.
AUTO_START¶
Default: True
Whether to enable frontier automatic start. See Starting/Stopping the frontier.
BACKEND¶
Default: 'frontera.contrib.backends.memory.FIFO'
The Backend to be used by the frontier. For more info see Activating a backend.
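Settings such as BACKEND hold a dotted path string rather than the class itself. A minimal sketch of how such a string can be resolved to an object is shown below; resolve_dotted_path is a hypothetical helper built on the standard library, not Frontera's own loader.

```python
import importlib


def resolve_dotted_path(path):
    """Import the module part of `path` and return the named attribute."""
    module_name, _, attr_name = path.rpartition(".")
    if not module_name:
        raise ValueError("expected a dotted path, got %r" % path)
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# Demonstrated with a standard-library class so the sketch runs anywhere.
cls = resolve_dotted_path("collections.OrderedDict")
print(cls.__name__)
```

With Frontera installed, the same helper could be given the BACKEND value, for example 'frontera.contrib.backends.memory.FIFO'.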
EVENT_LOGGER¶
Default: 'frontera.logger.events.EventLogManager'
The EventLogManager class to be used by the frontier.
MAX_NEXT_REQUESTS¶
Default: 0
The maximum number of requests returned by the get_next_requests API method. If the value is 0 (default), no maximum is applied.
MAX_REQUESTS¶
Default: 0
The maximum number of requests returned, after which the frontier is finished. If the value is 0 (default), the frontier will continue indefinitely. See Finishing the frontier.
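The "0 means no limit" convention shared by MAX_NEXT_REQUESTS and MAX_REQUESTS can be sketched as follows. Both functions are hypothetical illustrations of the documented semantics, not Frontera's implementation.

```python
def limit_batch(candidates, max_next_requests=0):
    """Return at most max_next_requests items; 0 means no cap."""
    candidates = list(candidates)
    if max_next_requests == 0:
        return candidates
    return candidates[:max_next_requests]


def is_finished(returned_so_far, max_requests=0):
    """True once max_requests have been returned; 0 means never finished."""
    return max_requests != 0 and returned_so_far >= max_requests


urls = ["url-%d" % i for i in range(5)]
print(limit_batch(urls, max_next_requests=2))
print(is_finished(returned_so_far=2000, max_requests=2000))
```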
MIDDLEWARES¶
A list containing the middlewares enabled in the frontier. For more info see Activating a middleware.
Default:
[
    'frontera.contrib.middlewares.fingerprint.UrlFingerprintMiddleware',
]
REQUEST_MODEL¶
Default: 'frontera.core.models.Request'
The Request model to be used by the frontier.
RESPONSE_MODEL¶
Default: 'frontera.core.models.Response'
The Response model to be used by the frontier.
OVERUSED_SLOT_FACTOR¶
Default: 5.0
The ratio (in-progress + queued requests in a slot) / (maximum allowed concurrent downloads per slot) above which the slot is considered overused. This affects only the Scrapy scheduler.
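The ratio can be written out as a small check. This is a sketch of the formula only: the function name and arguments are hypothetical, and treating the factor as a strict upper bound (ratio greater than the factor) is an assumption.

```python
def is_slot_overused(in_progress, queued, max_concurrent, factor=5.0):
    """Compare a slot's load ratio against the overused factor.

    Assumption: the slot counts as overused only when the ratio is
    strictly greater than the factor.
    """
    ratio = (in_progress + queued) / float(max_concurrent)
    return ratio > factor


# 8 concurrent downloads allowed per slot, default factor of 5.0.
print(is_slot_overused(in_progress=8, queued=37, max_concurrent=8))
print(is_slot_overused(in_progress=8, queued=12, max_concurrent=8))
```

In the first call the ratio is 45 / 8 = 5.625, which exceeds 5.0; in the second it is 20 / 8 = 2.5, which does not.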
DELAY_ON_EMPTY¶
Default: 30.0
When the backend has no requests to fetch, this delay helps to exhaust the rest of the buffer without hitting the backend on every request. Increase it if calls to your backend take a long time, and decrease it if you need a fast spider bootstrap from seeds.
Built-in fingerprint middleware settings¶
Settings used by the UrlFingerprintMiddleware and DomainFingerprintMiddleware.
URL_FINGERPRINT_FUNCTION¶
Default: 'frontera.utils.fingerprint.sha1'
The function used to calculate the url fingerprint.
DOMAIN_FINGERPRINT_FUNCTION¶
Default: 'frontera.utils.fingerprint.sha1'
The function used to calculate the domain fingerprint.
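A fingerprint function of this kind maps a string to a fixed-length digest. The sketch below uses the standard library's hashlib; that Frontera's own sha1 helper has this exact signature, encoding, or hexadecimal output is an assumption.

```python
import hashlib


def sha1_fingerprint(key):
    """Return the 40-character hexadecimal SHA-1 digest of a string."""
    # Assumption: the input is text and is encoded as UTF-8 before hashing.
    return hashlib.sha1(key.encode("utf-8")).hexdigest()


fingerprint = sha1_fingerprint("http://example.com/")
print(len(fingerprint))
```

A custom function with the same shape could be referenced from URL_FINGERPRINT_FUNCTION or DOMAIN_FINGERPRINT_FUNCTION by its dotted path.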
Default settings¶
If no settings are specified, the frontier will use the built-in defaults. For a complete list of default values see: Built-in settings reference. All default settings can be overridden.
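The override behaviour can be pictured as layering user values on top of a table of defaults. The sketch below is a simplified model, not Frontera's loading code: merge_settings is a hypothetical helper, and treating only upper-case names as settings is an assumption borrowed from common Python settings modules.

```python
# A few of the documented defaults, used as the base layer.
DEFAULTS = {
    "AUTO_START": True,
    "BACKEND": "frontera.contrib.backends.memory.FIFO",
    "MAX_NEXT_REQUESTS": 0,
    "MAX_REQUESTS": 0,
}


def merge_settings(defaults, overrides):
    """Return a new dict where upper-case override keys replace defaults."""
    merged = dict(defaults)
    # Assumption: lower-case names are helpers, not settings, and are skipped.
    merged.update({k: v for k, v in overrides.items() if k.isupper()})
    return merged


effective = merge_settings(DEFAULTS, {"MAX_REQUESTS": 2000, "helper": 1})
print(effective["MAX_REQUESTS"], effective["AUTO_START"])
```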
Frontier default settings¶
Values:
PAGE_MODEL = 'frontera.core.models.Page'
LINK_MODEL = 'frontera.core.models.Link'
FRONTIER = 'frontera.core.frontier.Frontier'
MIDDLEWARES = [
    'frontera.contrib.middlewares.fingerprint.UrlFingerprintMiddleware',
]
BACKEND = 'frontera.contrib.backends.memory.FIFO'
TEST_MODE = False
MAX_PAGES = 0
MAX_NEXT_PAGES = 0
AUTO_START = True
Fingerprints middleware default settings¶
Values:
URL_FINGERPRINT_FUNCTION = 'frontera.utils.fingerprint.sha1'
DOMAIN_FINGERPRINT_FUNCTION = 'frontera.utils.fingerprint.sha1'
Logging default settings¶
Values:
LOGGER = 'frontera.logger.FrontierLogger'
LOGGING_ENABLED = True
LOGGING_EVENTS_ENABLED = False
LOGGING_EVENTS_INCLUDE_METADATA = True
LOGGING_EVENTS_INCLUDE_DOMAIN = True
LOGGING_EVENTS_INCLUDE_DOMAIN_FIELDS = ['name', 'netloc', 'scheme', 'sld', 'tld', 'subdomain']
LOGGING_EVENTS_HANDLERS = [
    "frontera.logger.handlers.COLOR_EVENTS",
]
LOGGING_MANAGER_ENABLED = False
LOGGING_MANAGER_LOGLEVEL = logging.DEBUG
LOGGING_MANAGER_HANDLERS = [
    "frontera.logger.handlers.COLOR_CONSOLE_MANAGER",
]
LOGGING_BACKEND_ENABLED = False
LOGGING_BACKEND_LOGLEVEL = logging.DEBUG
LOGGING_BACKEND_HANDLERS = [
    "frontera.logger.handlers.COLOR_CONSOLE_BACKEND",
]
LOGGING_DEBUGGING_ENABLED = False
LOGGING_DEBUGGING_LOGLEVEL = logging.DEBUG
LOGGING_DEBUGGING_HANDLERS = [
    "frontera.logger.handlers.COLOR_CONSOLE_DEBUGGING",
]
EVENT_LOG_MANAGER = 'frontera.logger.events.EventLogManager'