Settings

The Frontera settings allows you to customize the behaviour of all components, including the FrontierManager, Middleware and Backend themselves.

The infrastructure of the settings provides a global namespace of key-value mappings that can be used to pull configuration values from. The settings can be populated through different mechanisms, which are described below.

For a list of available built-in settings see: Built-in settings reference.

Designating the settings

When you use Frontera, you have to tell it which settings you’re using. As FrontierManager is the main entry point to Frontier usage, you can do this by using the method described in the Loading from settings section.

When using a string path pointing to a settings file for the frontier we propose the following directory structure:

my_project/
    frontier/
        __init__.py
        settings.py
        middlewares.py
        backends.py
    ...

These are basically:

  • frontier/settings.py: the frontier settings file.
  • frontier/middlewares.py: the middlewares used by the frontier.
  • frontier/backends.py: the backend(s) used by the frontier.

How to access settings

Settings can be accessed through the FrontierManager.settings attribute, that is passed to Middleware.from_manager and Backend.from_manager class methods:

class MyMiddleware(Component):

    @classmethod
    def from_manager(cls, manager):
        manager = crawler.settings
        if settings.TEST_MODE:
            print "test mode is enabled!"

In other words, settings can be accessed as attributes of the Settings object.

Settings class

class frontera.settings.Settings(module=None, attributes=None)

An object that holds frontier settings values.

Parameters:
  • module (object/string) – A Settings object or a path string.
  • attributes (dict) – A dict object containing the settings values.

Built-in frontier settings

Here’s a list of all available Frontera settings, in alphabetical order, along with their default values and the scope where they apply.

AUTO_START

Default: True

Whether to enable frontier automatic start. See Starting/Stopping the frontier

BACKEND

Default: 'frontera.contrib.backends.memory.FIFO'

The Backend to be used by the frontier. For more info see Activating a backend.

EVENT_LOGGER

Default: 'frontera.logger.events.EventLogManager'

The EventLoggerManager class to be used by the Frontier.

LOGGER

Default: 'frontera.logger.FrontierLogger'

The Logger class to be used by the Frontier.

MAX_NEXT_REQUESTS

Default: 0

The maximum number of requests returned by get_next_requests API method. If value is 0 (default), no maximum value will be used.

MAX_REQUESTS

Default: 0

Maximum number of returned requests after which Frontera is finished. If value is 0 (default), the frontier will continue indefinitely. See Finishing the frontier.

MIDDLEWARES

A list containing the middlewares enabled in the frontier. For more info see Activating a middleware.

Default:

[
    'frontera.contrib.middlewares.fingerprint.UrlFingerprintMiddleware',
]

REQUEST_MODEL

Default: 'frontera.core.models.Request'

The Request model to be used by the frontier.

RESPONSE_MODEL

Default: 'frontera.core.models.Response'

The Response model to be used by the frontier.

TEST_MODE

Default: False

Whether to enable frontier test mode. See Frontier test mode

OVERUSED_SLOT_FACTOR

Default: 5.0

(in progress + queued requests in that slot) / max allowed concurrent downloads per slot before slot is considered overused. This affects only Scrapy scheduler.”

DELAY_ON_EMPTY

Default: 30.0

When backend has no requests to fetch, this delay helps to exhaust the rest of the buffer without hitting backend on every request. Increase it if calls to your backend is taking a lot of time, and decrease if you need a fast spider bootstrap from seeds.

Built-in fingerprint middleware settings

Settings used by the UrlFingerprintMiddleware and DomainFingerprintMiddleware.

URL_FINGERPRINT_FUNCTION

Default: frontera.utils.fingerprint.sha1

The function used to calculate the url fingerprint.

DOMAIN_FINGERPRINT_FUNCTION

Default: frontera.utils.fingerprint.sha1

The function used to calculate the domain fingerprint.

Default settings

If no settings are specified, frontier will use the built-in default ones. For a complete list of default values see: Built-in settings reference. All default settings can be overridden.

Frontier default settings

Values:

PAGE_MODEL = 'frontera.core.models.Page'
LINK_MODEL = 'frontera.core.models.Link'
FRONTIER = 'frontera.core.frontier.Frontier'
MIDDLEWARES = [
    'frontera.contrib.middlewares.fingerprint.UrlFingerprintMiddleware',
]
BACKEND = 'frontera.contrib.backends.memory.FIFO'
TEST_MODE = False
MAX_PAGES = 0
MAX_NEXT_PAGES = 0
AUTO_START = True

Fingerprints middleware default settings

Values:

URL_FINGERPRINT_FUNCTION = 'frontera.utils.fingerprint.sha1'
DOMAIN_FINGERPRINT_FUNCTION = 'frontera.utils.fingerprint.sha1'

Logging default settings

Values:

LOGGER = 'frontera.logger.FrontierLogger'
LOGGING_ENABLED = True

LOGGING_EVENTS_ENABLED = False
LOGGING_EVENTS_INCLUDE_METADATA = True
LOGGING_EVENTS_INCLUDE_DOMAIN = True
LOGGING_EVENTS_INCLUDE_DOMAIN_FIELDS = ['name', 'netloc', 'scheme', 'sld', 'tld', 'subdomain']
LOGGING_EVENTS_HANDLERS = [
    "frontera.logger.handlers.COLOR_EVENTS",
]

LOGGING_MANAGER_ENABLED = False
LOGGING_MANAGER_LOGLEVEL = logging.DEBUG
LOGGING_MANAGER_HANDLERS = [
    "frontera.logger.handlers.COLOR_CONSOLE_MANAGER",
]

LOGGING_BACKEND_ENABLED = False
LOGGING_BACKEND_LOGLEVEL = logging.DEBUG
LOGGING_BACKEND_HANDLERS = [
    "frontera.logger.handlers.COLOR_CONSOLE_BACKEND",
]

LOGGING_DEBUGGING_ENABLED = False
LOGGING_DEBUGGING_LOGLEVEL = logging.DEBUG
LOGGING_DEBUGGING_HANDLERS = [
    "frontera.logger.handlers.COLOR_CONSOLE_DEBUGGING",
]

EVENT_LOG_MANAGER = 'frontera.logger.events.EventLogManager'