Tests

Frontera tests are implemented using the pytest tool.

You can install pytest and the additional required libraries used in the tests using pip:

pip install -r requirements/tests.txt

Running tests

To run all tests, go to the root directory of the source code and run:

py.test

Writing tests

All functionality (including new features and bug fixes) must include a test case to check that it works as expected, so please include tests for your patches if you want them to get accepted sooner.

Backend testing

A base pytest class for Backend testing is provided: BackendTest

class tests.backends.BackendTest

A simple pytest base class with helper methods for Backend testing.

get_settings()

Returns backend settings

get_frontier()

Returns a FrontierManager object

setup_backend(method)

Setup method called before each test method call

teardown_backend(method)

Teardown method called after each test method call

Suppose, for instance, that you want to test your backend MyBackend and create a new frontier instance for each test method call. You can define a test class like this:

from tests import backends


class TestMyBackend(backends.BackendTest):

    backend_class = 'frontera.contrib.backends.abackend.MyBackend'

    def test_one(self):
        frontier = self.get_frontier()
        ...

    def test_two(self):
        frontier = self.get_frontier()
        ...

    ...

Now suppose the backend also uses a database file that you need to clean up before and after each test:

import os

from tests import backends


class TestMyBackend(backends.BackendTest):

    backend_class = 'frontera.contrib.backends.abackend.MyBackend'

    def setup_backend(self, method):
        # Start every test from a clean state
        self._delete_test_db()

    def teardown_backend(self, method):
        # Leave no database file behind after the test
        self._delete_test_db()

    def _delete_test_db(self):
        # Ignore the error raised when the file does not exist
        try:
            os.remove('mytestdb.db')
        except OSError:
            pass

    def test_one(self):
        frontier = self.get_frontier()
        ...

    def test_two(self):
        frontier = self.get_frontier()
        ...

    ...
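
The cleanup helper above is called both before and after every test, so it has to tolerate a file that is already gone. The following standalone sketch illustrates that behaviour outside of Frontera; the function name delete_test_db and the temporary file are hypothetical stand-ins for the backend's real database file.

```python
import os
import tempfile


def delete_test_db(path):
    """Remove the file at `path`; do nothing if it does not exist."""
    try:
        os.remove(path)
    except OSError:
        # Also swallows other OS-level errors (e.g. permissions),
        # exactly like the helper in the test class above.
        pass


# Create a throwaway file that stands in for the backend's database.
handle, db_path = tempfile.mkstemp(suffix='.db')
os.close(handle)

delete_test_db(db_path)  # removes the file, as setup_backend would
delete_test_db(db_path)  # second call is a no-op, as in teardown_backend
file_was_removed = not os.path.exists(db_path)
```

Because the helper is safe to call repeatedly, it does not matter whether a previous test run crashed before its teardown step.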

Testing backend sequences

To test Backend crawling sequences you can use the BackendSequenceTest class.

class tests.backends.BackendSequenceTest

A pytest base class for testing Backend crawling sequences.

get_sequence(site_list, max_next_requests, downloader_simulator=<frontera.utils.tester.BaseDownloaderSimulator object>, frontier_tester=<class 'frontera.utils.tester.FrontierTester'>)

Returns a Frontera iteration sequence from a site list

Parameters:
  • site_list (list) – A list of sites to use as frontier seeds.
  • max_next_requests (int) – Max next requests for the frontier.

assert_sequence(site_list, expected_sequence, max_next_requests)

Asserts that the crawling sequence matches the expected one

Parameters:
  • site_list (list) – A list of sites to use as frontier seeds.
  • expected_sequence (list) – The expected crawling sequence.
  • max_next_requests (int) – Max next requests for the frontier.

The BackendSequenceTest class runs a complete crawl of the site graphs passed to it and returns the sequence in which the backend visited the pages.

Suppose you want to test a backend that sorts pages in alphabetical order. You can define the following test:

from tests import backends


class TestAlphabeticSortBackend(backends.BackendSequenceTest):

    backend_class = 'frontera.contrib.backends.abackend.AlphabeticSortBackend'

    SITE_LIST = [
        [
            ('C', []),
            ('B', []),
            ('A', []),
        ],
    ]

    def test_one(self):
        # Check that the sequence is the expected one
        self.assert_sequence(site_list=self.SITE_LIST,
                             expected_sequence=['A', 'B', 'C'],
                             max_next_requests=0)

    def test_two(self):
        # Get the sequence and work with it
        sequence = self.get_sequence(site_list=self.SITE_LIST,
                                     max_next_requests=0)
        assert len(sequence) > 2

    ...
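
To make the expected sequence in the test above easier to derive, here is a minimal standalone sketch that does not use Frontera at all. It assumes that each site in the list is a list of (page, links) tuples, which is how the example reads, and the helper alphabetic_sequence is hypothetical.

```python
# Same shape as the SITE_LIST in the test above: a list of sites, where
# each site is assumed to be a list of (page, links) tuples.
SITE_LIST = [
    [
        ('C', []),
        ('B', []),
        ('A', []),
    ],
]


def alphabetic_sequence(site_list):
    """Collect every page and linked page, then return them sorted."""
    pages = set()
    for site in site_list:
        for page, links in site:
            pages.add(page)
            pages.update(links)
    return sorted(pages)


expected_sequence = alphabetic_sequence(SITE_LIST)
```

A helper of this kind can produce the expected_sequence argument for larger site lists instead of writing it out by hand.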

Testing basic algorithms

If your backend uses any of the basic algorithm logics, you can inherit from the corresponding test base class for each logic, and its sequences will be tested automatically:

from tests import backends


class TestMyBackendFIFO(backends.FIFOBackendTest):
    backend_class = 'frontera.contrib.backends.abackend.MyBackendFIFO'


class TestMyBackendLIFO(backends.LIFOBackendTest):
    backend_class = 'frontera.contrib.backends.abackend.MyBackendLIFO'


class TestMyBackendDFS(backends.DFSBackendTest):
    backend_class = 'frontera.contrib.backends.abackend.MyBackendDFS'


class TestMyBackendBFS(backends.BFSBackendTest):
    backend_class = 'frontera.contrib.backends.abackend.MyBackendBFS'


class TestMyBackendRANDOM(backends.RANDOMBackendTest):
    backend_class = 'frontera.contrib.backends.abackend.MyBackendRANDOM'
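
As a rough intuition for the first two logics, the sketch below shows the order in which a plain FIFO queue and a plain LIFO stack hand back the same seeds. It is an illustration only, not Frontera's implementation: it ignores link discovery, and the function names fifo_order and lifo_order are hypothetical.

```python
from collections import deque


def fifo_order(seeds):
    """Visit seeds first-in, first-out (a queue)."""
    queue = deque(seeds)
    visited = []
    while queue:
        visited.append(queue.popleft())
    return visited


def lifo_order(seeds):
    """Visit seeds last-in, first-out (a stack)."""
    stack = list(seeds)
    visited = []
    while stack:
        visited.append(stack.pop())
    return visited


seeds = ['A', 'B', 'C']
fifo_sequence = fifo_order(seeds)
lifo_sequence = lifo_order(seeds)
```

With these seeds the FIFO sequence preserves insertion order and the LIFO sequence reverses it.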