Frontera 0.6 documentation

Frontera is a web crawling toolbox for building crawlers of any scale and purpose.

Frontera provides a crawl frontier framework that manages when and what to crawl next and checks whether the crawling goal has been accomplished.

Frontera also provides replication, sharding and isolation of all crawler components, so that a crawler can be scaled and distributed.

Frontera contains the components needed to create a fully operational web crawler with Scrapy. Although it was originally designed for Scrapy, it offers a generic toolbox and can be used with any other crawling framework or system.
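
To make the crawl frontier idea above concrete, here is a minimal sketch in plain Python. It is not Frontera's API: the class name `TinyFrontier`, its methods, and the `max_pages` stopping goal are all hypothetical and chosen only for illustration. It shows the two responsibilities described above, deciding what to crawl next (a priority queue with duplicate filtering) and checking whether the crawling goal has been reached.

```python
import heapq
from typing import Optional


class TinyFrontier:
    """Toy crawl frontier (hypothetical, not Frontera's API)."""

    def __init__(self, max_pages: int) -> None:
        self.max_pages = max_pages  # assumed crawling goal: stop after N pages
        self._queue = []            # heap of (priority, order, url)
        self._seen = set()          # URLs that were already scheduled
        self._order = 0             # tie-breaker that preserves insertion order
        self.crawled = 0            # number of URLs handed out so far

    def add(self, url: str, priority: int = 0) -> None:
        """Schedule a URL once; lower priority numbers are crawled first."""
        if url in self._seen:
            return
        self._seen.add(url)
        heapq.heappush(self._queue, (priority, self._order, url))
        self._order += 1

    def finished(self) -> bool:
        """Report whether the crawling goal has been accomplished."""
        return self.crawled >= self.max_pages

    def next_url(self) -> Optional[str]:
        """Return the next URL to crawl, or None when there is nothing to do."""
        if self.finished() or not self._queue:
            return None
        _, _, url = heapq.heappop(self._queue)
        self.crawled += 1
        return url


frontier = TinyFrontier(max_pages=2)
frontier.add("http://example.com/", priority=0)
frontier.add("http://example.com/about", priority=1)
frontier.add("http://example.com/", priority=5)  # duplicate, ignored
frontier.add("http://example.com/blog", priority=2)

order = []
while (url := frontier.next_url()) is not None:
    order.append(url)
print(order)
```

In a real crawler the loop body would fetch each page and feed the extracted links back through `add`. Frontera's own components for this are covered in the chapters listed below.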


Introduction

The purpose of this chapter is to introduce the concepts behind Frontera so that you can understand how it works and decide whether it suits your needs.

Frontera at a glance
Understand what Frontera is and how it can help you.
Run modes
High-level architecture and Frontera run modes.
Quick start single process
Using Scrapy as a container for running Frontera.
Quick start distributed mode
With SQLite and ZeroMQ.
Cluster setup guide
Setting up a clustered version of Frontera on multiple machines with HBase and Kafka.

Using Frontera

Installation Guide
How-to and dependency options.
Frontier objects
Understand the classes used to represent requests and responses.
Middlewares
Filter or alter information for links and documents.
Canonical URL Solver
Identify and make use of the canonical URL of a document.
Backends
Define your own crawling policy and custom storage.
Message bus
Built-in message bus reference.
Crawling strategy
Implementing your own crawling strategy for the distributed backend.
Using the Frontier with Scrapy
Learn how to use Frontera with Scrapy.
Settings
Settings reference.

Advanced usage

What is a Crawl Frontier?
Learn the theory behind crawl frontiers.
Graph Manager
Define fake crawls of websites to test your frontier.
Recording a Scrapy crawl
Create Scrapy crawl recordings and reproduce them later.
Fine tuning of a Frontera cluster
Cluster deployment and fine-tuning information.
DNS Service
A few words about DNS service setup.

Developer documentation

Architecture overview
See how Frontera works and its different components.
Frontera API
Learn how to use the frontier.
Using the Frontier with Requests
Learn how to use Frontera with Requests.
Examples
Some example projects and scripts using Frontera.
Tests
How to run and write Frontera tests.
Logging
A list of loggers for use with Python's native logging system.
Testing a Frontier
Test your frontier in an easy way.
F.A.Q.
Frequently asked questions.
Contribution guidelines
How to contribute.
Glossary
Glossary of terms.