Scrapy Seed Loaders

Frontera has some built-in Scrapy middlewares for seed loading.

Seed loaders use the process_start_requests spider middleware method to generate requests from a seed source. These requests are later added to the FrontierManager.
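The shape of that hook can be sketched in plain Python. This is a minimal sketch, not Frontera's implementation. The class name `SketchSeedLoader`, its `seeds` list and the `make_request` callable are hypothetical stand-ins; in Scrapy the callable would build a request object. The sketch also assumes the loader replaces the spider's own start requests rather than extending them.

```python
from typing import Callable, Iterable, Iterator, List


class SketchSeedLoader:
    """Hypothetical seed loader illustrating the process_start_requests hook."""

    def __init__(self, seeds: List[str], make_request: Callable[[str], object]):
        self.seeds = seeds                # URLs read from some seed source
        self.make_request = make_request  # hypothetical request factory

    def load_seeds(self) -> Iterable[str]:
        # A real loader would read a local file, an S3 object, etc.
        return self.seeds

    def process_start_requests(
        self, start_requests: Iterable[object], spider: object
    ) -> Iterator[object]:
        # Scrapy passes the spider's start requests here; this sketch
        # ignores them and yields one request per seed URL instead.
        for url in self.load_seeds():
            yield self.make_request(url)


loader = SketchSeedLoader(
    ["http://www.asite.com", "http://www.anothersite.com"],
    make_request=lambda url: ("GET", url),
)
print(list(loader.process_start_requests([], spider=None)))
# [('GET', 'http://www.asite.com'), ('GET', 'http://www.anothersite.com')]
```

Because the hook is a generator, requests are produced lazily as Scrapy consumes them, which keeps memory flat even for large seed lists.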

Activating a Seed Loader

Add the seed loader middleware to the SPIDER_MIDDLEWARES Scrapy setting:

SPIDER_MIDDLEWARES.update({
    'frontera.contrib.scrapy.middlewares.seeds.FileSeedLoader': 650
})

FileSeedLoader loads seed URLs from a file. The file must contain one URL per line:

http://www.asite.com
http://www.anothersite.com
...

You can disable a URL by prefixing it with the # character:

...
#http://www.acommentedsite.com
...
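The file format above can be illustrated with a small parser. This is a sketch under stated assumptions, not FileSeedLoader's actual code: `parse_seeds` and `load_seeds_from_file` are hypothetical helpers, and skipping blank lines, trimming surrounding whitespace and reading the file as UTF-8 are assumptions beyond the one-URL-per-line and `#` rules described here.

```python
import io
from typing import Iterable, List


def parse_seeds(lines: Iterable[str]) -> List[str]:
    """Return the active seed URLs from lines in the one-URL-per-line format."""
    urls = []
    for line in lines:
        url = line.strip()
        # Skip blank lines (an assumption) and URLs disabled with '#'.
        if not url or url.startswith("#"):
            continue
        urls.append(url)
    return urls


def load_seeds_from_file(path: str) -> List[str]:
    # Hypothetical helper: the UTF-8 encoding is an assumption.
    with open(path, encoding="utf-8") as fh:
        return parse_seeds(fh)


sample = io.StringIO(
    "http://www.asite.com\n"
    "http://www.anothersite.com\n"
    "#http://www.acommentedsite.com\n"
)
print(parse_seeds(sample))
# ['http://www.asite.com', 'http://www.anothersite.com']
```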

Settings:

Another seed loader reads seeds from a file stored in an Amazon S3 bucket.

The file format is the same as the one used by FileSeedLoader.
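How the S3 location is configured is not shown in this section. Assuming it is given as an `s3://bucket/key-prefix` URI (an assumption), a loader would first split it into a bucket name and a key prefix before fetching the object. The hypothetical helper `split_s3_uri` below shows that step with the standard library only; the downloaded text would then follow the same one-URL-per-line rules as a local seeds file.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split a hypothetical s3://bucket/prefix seed location into (bucket, key prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {uri!r}")
    # The bucket is the network location; the remaining path is the key prefix.
    return parsed.netloc, parsed.path.lstrip("/")


print(split_s3_uri("s3://my-bucket/seeds/urls.txt"))
# ('my-bucket', 'seeds/urls.txt')
```

Fetching the object itself would need an AWS client library and credentials, which this sketch deliberately leaves out.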

Settings: