Message bus

The message bus is the transport layer abstraction mechanism. It provides an interface and several implementations. Only one message bus can be used in a crawler at a time, and it is selected with the MESSAGE_BUS setting.
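The "one interface, several implementations, chosen by a setting" pattern can be illustrated with a minimal sketch. This is not Frontera's actual code: `BaseBus`, `InMemoryBus` and `load_object` are hypothetical names, and the sketch assumes the setting value is a dotted import path to a class.

```python
import importlib
from abc import ABC, abstractmethod
from collections import deque


class BaseBus(ABC):
    """Hypothetical transport interface (not Frontera's real base class)."""

    @abstractmethod
    def send(self, payload):
        """Queue one serialized message for delivery."""

    @abstractmethod
    def receive(self):
        """Return all pending messages, oldest first, and clear them."""


class InMemoryBus(BaseBus):
    """Toy implementation that keeps messages in a local deque."""

    def __init__(self):
        self._queue = deque()

    def send(self, payload):
        self._queue.append(payload)

    def receive(self):
        items = list(self._queue)
        self._queue.clear()
        return items


def load_object(path):
    """Resolve a dotted path such as 'package.module.ClassName' to the object.

    Assumption: the MESSAGE_BUS setting holds a path of this shape.
    """
    module_name, _, attr = path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr)
```

With this shape, switching transports means changing a single setting value, while the code that calls `send` and `receive` stays the same.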

Spider processes can use

to communicate using the message bus.

Built-in message bus reference

ZeroMQ

This is the default option, implemented using the lightweight ZeroMQ library in

and can be configured using ZeroMQ message bus settings.

The ZeroMQ message bus requires the ZeroMQ library to be installed and a broker process to be running; see Start cluster.

WARNING! The ZeroMQ message bus does not yet support multiple SW and DB workers; only one instance of each worker type is allowed.

Kafka

It can be selected with

and configured using Kafka message bus settings.

It requires a running Kafka service and is more suitable for large-scale web crawling.

Protocol

Depending on the stream, Frontera uses several message types to encode its messages. Every message is a native Python object serialized using msgpack (JSON is also available, but must be selected manually in code).
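To make the idea of typed messages serialized from native Python objects concrete, here is a minimal round-trip sketch. It is an illustration under assumptions, not Frontera's codec: the class names, method names and the two message types (`add_seeds`, `offset`) are hypothetical, and the standard-library `json` module is used so the example has no third-party dependency. Replacing `json.dumps`/`json.loads` with `msgpack.packb`/`msgpack.unpackb` would give a binary variant of the same scheme.

```python
import json


class Encoder:
    """Hypothetical encoder: one method per message type."""

    def encode_add_seeds(self, urls):
        # The first list element tags the message type.
        return json.dumps(["add_seeds", list(urls)]).encode("utf-8")

    def encode_offset(self, partition_id, offset):
        return json.dumps(["offset", partition_id, offset]).encode("utf-8")


class Decoder:
    """Hypothetical decoder: returns (message_type, arguments)."""

    def decode(self, data):
        obj = json.loads(data.decode("utf-8"))
        return obj[0], tuple(obj[1:])


encoder, decoder = Encoder(), Decoder()
wire = encoder.encode_offset(0, 42)        # bytes ready for the bus
message_type, args = decoder.decode(wire)  # back to native Python objects
```

Tagging each payload with its type lets a single decoder handle every stream, at the cost of a few extra bytes per message.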

These are the classes you need to subclass to implement your own codec: