Frontera
v0.8.0.1
Index
_
__contains__() (frontera.core.components.DomainMetadata method)
__delitem__() (frontera.core.components.DomainMetadata method)
__getitem__() (frontera.core.components.DomainMetadata method)
__setitem__() (frontera.core.components.DomainMetadata method)
A
AUTO_START
    setting
B
BACKEND
    setting
Backend (class in frontera.core.components)
BaseCrawlingStrategy (class in frontera.strategy)
BaseDecoder (class in frontera.core.codec)
BaseEncoder (class in frontera.core.codec)
BasicCanonicalSolver (class in frontera.contrib.canonicalsolvers.basic)
BC_MAX_REQUESTS_PER_HOST
    setting
BC_MIN_HOSTS
    setting
BC_MIN_REQUESTS
    setting
body (frontera.core.models.Request attribute)
    (frontera.core.models.Response attribute)
C
CANONICAL_SOLVER
    setting
close() (frontera.strategy.BaseCrawlingStrategy method)
Component (class in frontera.core.components)
cookies (frontera.core.models.Request attribute)
count() (frontera.core.components.Queue method)
crawling strategy
CrawlPage (built-in class)
create_request() (frontera.strategy.BaseCrawlingStrategy method)
D
db worker
db_worker() (frontera.core.components.DistributedBackend class method)
decode() (frontera.core.codec.BaseDecoder method)
decode_request() (frontera.core.codec.BaseDecoder method)
DELAY_ON_EMPTY
    setting
DISCOVERY_MAX_PAGES
    setting
DistributedBackend (class in frontera.core.components)
DOMAIN_FINGERPRINT_FUNCTION
    setting
DOMAIN_STATS_LOG_INTERVAL
    setting
DomainFingerprintMiddleware (class in frontera.contrib.middlewares.fingerprint)
DomainMetadata (class in frontera.core.components)
DomainMiddleware (class in frontera.contrib.middlewares.domain)
E
encode_new_job_id() (frontera.core.codec.BaseEncoder method)
encode_offset() (frontera.core.codec.BaseEncoder method)
encode_page_crawled() (frontera.core.codec.BaseEncoder method)
encode_request() (frontera.core.codec.BaseEncoder method)
encode_request_error() (frontera.core.codec.BaseEncoder method)
encode_update_score() (frontera.core.codec.BaseEncoder method)
F
fetch() (frontera.core.components.States method)
filter_extracted_links() (frontera.strategy.BaseCrawlingStrategy method)
finished() (frontera.core.components.Backend method)
    (frontera.strategy.BaseCrawlingStrategy method)
flush() (frontera.core.components.States method)
from_manager() (frontera.core.components.Backend class method)
    (frontera.core.components.Component class method)
    (frontera.core.components.Middleware class method)
from_worker() (frontera.strategy.BaseCrawlingStrategy class method)
frontera.contrib.backends.remote.codecs.json (module)
frontera.contrib.backends.remote.codecs.msgpack (module)
FRONTERA_SETTINGS
    setting
frontier_start() (frontera.core.components.Backend method)
    (frontera.core.components.Component method)
    (frontera.core.components.Middleware method)
frontier_stop() (frontera.core.components.Backend method)
    (frontera.core.components.Component method)
    (frontera.core.components.Middleware method)
G
get_next_requests() (frontera.core.components.Backend method)
    (frontera.core.components.Queue method)
H
HBASE_BATCH_SIZE
    setting
HBASE_DOMAIN_METADATA_BATCH_SIZE
    setting
HBASE_DOMAIN_METADATA_CACHE_SIZE
    setting
HBASE_DOMAIN_METADATA_TABLE
    setting
HBASE_DROP_ALL_TABLES
    setting
HBASE_METADATA_TABLE
    setting
HBASE_NAMESPACE
    setting
HBASE_QUEUE_TABLE
    setting
HBASE_STATE_CACHE_SIZE_LIMIT
    setting
HBASE_STATES_TABLE
    setting
HBASE_THRIFT_HOST
    setting
HBASE_THRIFT_PORT
    setting
HBASE_USE_FRAMED_COMPACT
    setting
HBASE_USE_SNAPPY
    setting
headers (frontera.core.models.Request attribute)
    (frontera.core.models.Response attribute)
hostname_local_fingerprint() (in module frontera.utils.fingerprint)
I
id (CrawlPage attribute)
is_seed (CrawlPage attribute)
K
KAFKA_CERT_PATH
    setting
KAFKA_CODEC
    setting
KAFKA_ENABLE_SSL
    setting
KAFKA_GET_TIMEOUT
    setting
KAFKA_LOCATION
    setting
L
links (CrawlPage attribute)
links_extracted() (frontera.strategy.BaseCrawlingStrategy method)
LOCAL_MODE
    setting
LOGGING_CONFIG
    setting
M
MAX_NEXT_REQUESTS
    setting
MAX_REQUESTS
    setting
message bus
MESSAGE_BUS
    setting
MESSAGE_BUS_CODEC
    setting
MessageBusBackend (class in frontera.contrib.backends.remote.messagebus)
meta (frontera.core.models.Request attribute)
    (frontera.core.models.Response attribute)
Metadata (class in frontera.core.components)
metadata (frontera.core.components.Backend attribute)
method (frontera.core.models.Request attribute)
Middleware (class in frontera.core.components)
MIDDLEWARES
    setting
N
name (frontera.core.components.Component attribute)
NEW_BATCH_DELAY
    setting
O
OVERUSED_KEEP_KEYS
    setting
OVERUSED_KEEP_PER_KEY
    setting
OVERUSED_MAX_KEYS
    setting
OVERUSED_MAX_PER_KEY
    setting
OVERUSED_SLOT_FACTOR
    setting
P
page_crawled() (frontera.core.components.Backend method)
    (frontera.core.components.Component method)
    (frontera.core.components.Metadata method)
    (frontera.core.components.Middleware method)
    (frontera.strategy.BaseCrawlingStrategy method)
Q
Queue (class in frontera.core.components)
queue (frontera.core.components.Backend attribute)
R
read_seeds() (frontera.strategy.BaseCrawlingStrategy method)
RECORDER_ENABLED
    setting
RECORDER_STORAGE_CLEAR_CONTENT
    setting
RECORDER_STORAGE_DROP_ALL_TABLES
    setting
RECORDER_STORAGE_ENGINE
    setting
referers (CrawlPage attribute)
refresh_states() (frontera.strategy.BaseCrawlingStrategy method)
Request (class in frontera.core.models)
request (frontera.core.models.Response attribute)
request_error() (frontera.core.components.Backend method)
    (frontera.core.components.Component method)
    (frontera.core.components.Metadata method)
    (frontera.core.components.Middleware method)
    (frontera.strategy.BaseCrawlingStrategy method)
REQUEST_MODEL
    setting
Response (class in frontera.core.models)
RESPONSE_MODEL
    setting
S
schedule() (frontera.core.components.Queue method)
(frontera.strategy.BaseCrawlingStrategy method)
scoring log
SCORING_LOG_CONSUMER_BATCH_SIZE
    setting
SCORING_LOG_DBW_GROUP
    setting
SCORING_LOG_TOPIC
    setting
SCORING_PARTITION_ID
    setting
set_states() (frontera.core.components.States method)
setting
    AUTO_START
    BACKEND
    BC_MAX_REQUESTS_PER_HOST
    BC_MIN_HOSTS
    BC_MIN_REQUESTS
    CANONICAL_SOLVER
    DELAY_ON_EMPTY
    DISCOVERY_MAX_PAGES
    DOMAIN_FINGERPRINT_FUNCTION
    DOMAIN_STATS_LOG_INTERVAL
    FRONTERA_SETTINGS
    HBASE_BATCH_SIZE
    HBASE_DOMAIN_METADATA_BATCH_SIZE
    HBASE_DOMAIN_METADATA_CACHE_SIZE
    HBASE_DOMAIN_METADATA_TABLE
    HBASE_DROP_ALL_TABLES
    HBASE_METADATA_TABLE
    HBASE_NAMESPACE
    HBASE_QUEUE_TABLE
    HBASE_STATES_TABLE
    HBASE_STATE_CACHE_SIZE_LIMIT
    HBASE_THRIFT_HOST
    HBASE_THRIFT_PORT
    HBASE_USE_FRAMED_COMPACT
    HBASE_USE_SNAPPY
    KAFKA_CERT_PATH
    KAFKA_CODEC
    KAFKA_ENABLE_SSL
    KAFKA_GET_TIMEOUT
    KAFKA_LOCATION
    LOCAL_MODE
    LOGGING_CONFIG
    MAX_NEXT_REQUESTS
    MAX_REQUESTS
    MESSAGE_BUS
    MESSAGE_BUS_CODEC
    MIDDLEWARES
    NEW_BATCH_DELAY
    OVERUSED_KEEP_KEYS
    OVERUSED_KEEP_PER_KEY
    OVERUSED_MAX_KEYS
    OVERUSED_MAX_PER_KEY
    OVERUSED_SLOT_FACTOR
    RECORDER_ENABLED
    RECORDER_STORAGE_CLEAR_CONTENT
    RECORDER_STORAGE_DROP_ALL_TABLES
    RECORDER_STORAGE_ENGINE
    REQUEST_MODEL
    RESPONSE_MODEL
    SCORING_LOG_CONSUMER_BATCH_SIZE
    SCORING_LOG_DBW_GROUP
    SCORING_LOG_TOPIC
    SCORING_PARTITION_ID
    SPIDER_FEED_GROUP
    SPIDER_FEED_PARTITIONS
    SPIDER_FEED_TOPIC
    SPIDER_LOG_CONSUMER_BATCH_SIZE
    SPIDER_LOG_DBW_GROUP
    SPIDER_LOG_PARTITIONS
    SPIDER_LOG_SW_GROUP
    SPIDER_LOG_TOPIC
    SPIDER_PARTITION_ID
    SQLALCHEMYBACKEND_CACHE_SIZE
    SQLALCHEMYBACKEND_CLEAR_CONTENT
    SQLALCHEMYBACKEND_DROP_ALL_TABLES
    SQLALCHEMYBACKEND_ENGINE
    SQLALCHEMYBACKEND_ENGINE_ECHO
    SQLALCHEMYBACKEND_MODELS
    SQLALCHEMYBACKEND_REVISIT_INTERVAL
    STATE_CACHE_SIZE
    STORE_CONTENT
    STRATEGY
    STRATEGY_ARGS
    SW_FLUSH_INTERVAL
    TEST_MODE
    TLDEXTRACT_DOMAIN_INFO
    URL_FINGERPRINT_FUNCTION
    USER_AGENT
    ZMQ_ADDRESS
    ZMQ_BASE_PORT
Settings (class in frontera.settings)
spider
spider feed
spider log
SPIDER_FEED_GROUP
    setting
SPIDER_FEED_PARTITIONS
    setting
SPIDER_FEED_TOPIC
    setting
SPIDER_LOG_CONSUMER_BATCH_SIZE
    setting
SPIDER_LOG_DBW_GROUP
    setting
SPIDER_LOG_PARTITIONS
    setting
SPIDER_LOG_SW_GROUP
    setting
SPIDER_LOG_TOPIC
    setting
SPIDER_PARTITION_ID
    setting
SQLALCHEMYBACKEND_CACHE_SIZE
    setting
SQLALCHEMYBACKEND_CLEAR_CONTENT
    setting
SQLALCHEMYBACKEND_DROP_ALL_TABLES
    setting
SQLALCHEMYBACKEND_ENGINE
    setting
SQLALCHEMYBACKEND_ENGINE_ECHO
    setting
SQLALCHEMYBACKEND_MODELS
    setting
SQLALCHEMYBACKEND_REVISIT_INTERVAL
    setting
state cache
STATE_CACHE_SIZE
    setting
States (class in frontera.core.components)
states (frontera.core.components.Backend attribute)
status (CrawlPage attribute)
status_code (frontera.core.models.Response attribute)
STORE_CONTENT
    setting
STRATEGY
    setting
strategy worker
STRATEGY_ARGS
    setting
strategy_worker() (frontera.core.components.DistributedBackend class method)
SW_FLUSH_INTERVAL
    setting
T
TEST_MODE
    setting
TLDEXTRACT_DOMAIN_INFO
    setting
U
update_cache() (frontera.core.components.States method)
url (CrawlPage attribute)
    (frontera.core.models.Request attribute)
    (frontera.core.models.Response attribute)
URL_FINGERPRINT_FUNCTION
    setting
UrlFingerprintMiddleware (class in frontera.contrib.middlewares.fingerprint)
USER_AGENT
    setting
Z
ZMQ_ADDRESS
    setting
ZMQ_BASE_PORT
    setting