Crawler
A Crawler is a system entity responsible for finding explorable links inside a web page. It interacts with the Coordinator to validate the URLs it finds, and it is responsible for creating scrapers, which extract data from a page, and new crawlers, which explore new URLs.
We can summarize the interaction between the Crawler and the other components as follows:
Each time a Crawler finds a new valid URL, it spawns a new child crawler that will analyze it. When the analysis of a page is complete, the crawler signals this to its parent; when a crawler no longer has any active children, it is removed from the system. A sketch of this lifecycle is given below.
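The following is a minimal, synchronous Python sketch of this parent/child lifecycle. The class and method names (Coordinator.validate, Scraper.extract, on_child_done, and so on) are illustrative assumptions rather than the system's actual API, and in the real system crawlers would analyze their pages concurrently instead of being driven by hand.

```python
class Coordinator:
    """Decides which found URLs are valid to explore (hypothetical policy: first occurrence only)."""
    def __init__(self):
        self.visited = set()

    def validate(self, url: str) -> bool:
        if url in self.visited:
            return False
        self.visited.add(url)
        return True


class Scraper:
    """Extracts data from a single page (placeholder)."""
    def __init__(self, url: str):
        self.url = url

    def extract(self) -> dict:
        return {"url": self.url, "data": "..."}


class Crawler:
    def __init__(self, url: str, coordinator: Coordinator, parent: "Crawler | None" = None):
        self.url = url
        self.coordinator = coordinator
        self.parent = parent
        self.children: list["Crawler"] = []

    def analyze(self, links_found: list[str]) -> None:
        # Extract data from the current page via a scraper.
        Scraper(self.url).extract()
        # Ask the Coordinator to validate each found URL; spawn a child crawler for valid ones.
        for link in links_found:
            if self.coordinator.validate(link):
                self.children.append(Crawler(link, self.coordinator, parent=self))
        # No children to wait for: this page's analysis is already complete.
        if not self.children:
            self.signal_done()

    def on_child_done(self, child: "Crawler") -> None:
        # A child finished its analysis; once none remain, this crawler is complete too.
        self.children.remove(child)
        if not self.children:
            self.signal_done()

    def signal_done(self) -> None:
        # Signal completion to the parent; a crawler with no parent and no active
        # children simply leaves the system.
        if self.parent is not None:
            self.parent.on_child_done(self)


if __name__ == "__main__":
    coordinator = Coordinator()
    root = Crawler("https://example.com", coordinator)
    # The root page links to two sub-pages (hypothetical URLs).
    root.analyze(["https://example.com/a", "https://example.com/b"])
    # Each child analyzes its page; finding no new links, it signals the root,
    # which is then left with no active children and retires.
    for child in list(root.children):
        child.analyze([])
```

The on_child_done bookkeeping is what lets crawlers retire bottom-up: a crawler only disappears once its whole subtree of child crawlers has finished exploring.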
Structure