Scrapy Seed Loaders

Frontera has some built-in Scrapy middlewares for seed loading.

Seed loaders use the process_start_requests method to generate requests from a source. These requests are later added to the FrontierManager.
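As a rough illustration of that mechanism, the sketch below shows a spider middleware that turns seed URLs into requests inside process_start_requests. It is not Frontera's implementation: ListSeedLoader and load_seeds are hypothetical names, the Request dataclass is a stand-in for scrapy.Request so the example runs without Scrapy installed, and ignoring the spider's own start requests is an assumption made for brevity.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Request:
    """Stand-in for scrapy.Request so the sketch runs without Scrapy."""
    url: str


class ListSeedLoader:
    """Hypothetical seed loader that reads its seeds from an in-memory list."""

    def __init__(self, seeds: List[str]) -> None:
        self.seeds = seeds

    def load_seeds(self) -> List[str]:
        # A real loader would read from a file, an S3 bucket, etc.
        return self.seeds

    def process_start_requests(
        self, start_requests: Iterable[Request], spider: object
    ) -> Iterator[Request]:
        # Assumption: the spider's own start requests are ignored and
        # replaced by one request per seed URL.
        for url in self.load_seeds():
            yield Request(url=url)


loader = ListSeedLoader(["http://www.example.com", "http://www.example.org"])
requests = list(loader.process_start_requests([], spider=None))
```

In a real project the generated objects would be scrapy.Request instances, which the frontier then receives as seeds.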

Activating a Seed loader

Add the seed loader middleware to the SPIDER_MIDDLEWARES Scrapy setting:

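    # settings.py: merge this entry with any spider middlewares already configured
    SPIDER_MIDDLEWARES = {
        'frontera.contrib.scrapy.middlewares.seeds.FileSeedLoader': 650,
    }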


FileSeedLoader loads seed URLs from a file. The file must contain one URL per line.

You can disable a URL by starting its line with the # character.
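The file format described above can be sketched as follows. The sample content and the parse_seeds helper are hypothetical and only illustrate the rules (one URL per line, lines starting with # are skipped); skipping blank lines is an additional assumption, and Frontera's own parsing may differ in detail.

```python
from typing import Iterable, List

# Hypothetical seeds file content: one URL per line, '#' disables a URL.
SAMPLE_SEEDS = """\
http://www.example.com
#http://disabled.example.com
http://www.example.org
"""


def parse_seeds(lines: Iterable[str]) -> List[str]:
    """Return the enabled seed URLs from an iterable of lines."""
    seeds = []
    for line in lines:
        url = line.strip()
        # Skip blank lines (assumption) and URLs disabled with '#'.
        if not url or url.startswith("#"):
            continue
        seeds.append(url)
    return seeds


enabled = parse_seeds(SAMPLE_SEEDS.splitlines())
```

Here enabled holds the first and third URLs, and the commented-out one is dropped.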



Settings:

  • SEEDS_SOURCE: Path to the seeds file
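Put together, a settings.py fragment for the file-based loader might look like the sketch below. The seeds.txt path is a placeholder, not a value from the Frontera documentation, and a real project would keep its other spider middlewares in the same dictionary.

```python
# settings.py (sketch): enable the file seed loader and point it at a file.
SPIDER_MIDDLEWARES = {
    'frontera.contrib.scrapy.middlewares.seeds.FileSeedLoader': 650,
}

# Placeholder path; replace with the location of your own seeds file.
SEEDS_SOURCE = 'seeds.txt'
```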


S3SeedLoader loads seeds from a file stored in an Amazon S3 bucket.

The file format is the same as the one used by FileSeedLoader.


Settings:

  • SEEDS_SOURCE: Path to S3 bucket file, e.g. s3://some-project/seed-urls/
  • SEEDS_AWS_ACCESS_KEY: S3 credentials Access Key
  • SEEDS_AWS_SECRET_ACCESS_KEY: S3 credentials Secret Access Key
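The three settings above could be filled in as in this sketch. Reading the credentials from environment variables is a suggestion rather than something the loader requires, and the environment variable names are assumptions chosen to mirror the setting names; the bucket path is the example value from the list.

```python
# settings.py (sketch): S3 seed loader settings.
import os

# Example location taken from the settings list above.
SEEDS_SOURCE = 's3://some-project/seed-urls/'

# Assumption: credentials are supplied through environment variables
# instead of being hard-coded in the settings file.
SEEDS_AWS_ACCESS_KEY = os.environ.get('SEEDS_AWS_ACCESS_KEY', '')
SEEDS_AWS_SECRET_ACCESS_KEY = os.environ.get('SEEDS_AWS_SECRET_ACCESS_KEY', '')
```

Keeping the keys out of the settings file avoids committing them to version control.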