
Exporter

An Exporter is a system entity responsible for exporting the results obtained by Scrapers. Exporters interact primarily with Scrapers: they receive results and export them according to a specific Exporting Behavior. Each Scraper knows the available Exporters and forwards its results to them.

Exporting Behaviors define how the results should be processed (e.g., writing to a file, displaying on standard output). Each Exporter is associated with a single Exporting Behavior, but multiple Exporters can handle the same results in different ways.
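As a minimal sketch of the idea above, an Exporting Behavior can be modelled as a function that consumes a result. The type aliases, the `lines` formatter and the `writeOnBuffer` behavior are assumptions made for illustration and are not taken from Scooby's code. `writeOnConsole` and `Result` reuse names from the Structure section, but their bodies here are guesses.

```scala
object ExportingBehaviorSketch {
  // Mirrors Result[T] from the Structure section: a wrapper around scraped data.
  final case class Result[T](data: Iterable[T])

  // Assumption: a behavior is a side-effecting function over a result.
  type ExportingBehavior[T] = Result[T] => Unit
  // Assumption: a formatter turns a result into text.
  type FormattingBehavior[T] = Result[T] => String

  // Hypothetical formatter: one item per line, rendered with toString.
  def lines[T]: FormattingBehavior[T] = result => result.data.mkString("\n")

  // Prints the formatted result on standard output.
  def writeOnConsole[T](format: FormattingBehavior[T]): ExportingBehavior[T] =
    result => println(format(result))

  // Hypothetical behavior: appends the formatted result to an in-memory buffer.
  def writeOnBuffer[T](buffer: StringBuilder, format: FormattingBehavior[T]): ExportingBehavior[T] =
    result => {
      buffer.append(format(result)).append('\n')
      ()
    }
}
```

Two Exporters could then handle the same `Result` differently, for example one built with `writeOnConsole(lines)` and another with `writeOnBuffer(buffer, lines)`.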

Exporters are categorized into Stream Exporters and Batch Exporters.

Stream Exporters

Stream Exporters operate in real time: they apply their Exporting Behavior to each result as soon as it becomes available. They keep no state between results and are typically used to display results as they arrive, without persistence (e.g., on standard output).

The interaction between Scrapers and Stream Exporters can be summarized as follows:

[Sequence diagram. Participants: Scraper, StreamExporter. Messages, in order: Scrape(document); Scrape document's content; Results(scraped data); Export(results).]
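The stream flow described in this section can be sketched without any actor library. This is an illustration under stated assumptions, not Scooby's implementation: the `receive` method stands in for the actor's message handler, and only the `Export` command is modelled.

```scala
object StreamExporterSketch {
  final case class Result[T](data: Iterable[T])

  // Assumption: a behavior is a side-effecting function over a result.
  type ExportingBehavior[T] = Result[T] => Unit

  sealed trait ExporterCommand[T]
  final case class Export[T](result: Result[T]) extends ExporterCommand[T]

  // Stand-in for the «Actor» StreamExporter: nothing is stored between messages.
  final class StreamExporter[T](exportingBehavior: ExportingBehavior[T]) {
    // Hypothetical handler: each incoming result is exported immediately.
    def receive(command: ExporterCommand[T]): Unit = command match {
      case Export(result) => exportingBehavior(result)
    }
  }
}
```

Because the exporter holds no accumulated state, every `Export` message produces output on its own, which matches the memoryless description above.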

Batch Exporters

Batch Exporters accumulate results and apply their Exporting Behavior only after the entire scraping process is completed. Since they receive results from multiple scrapers and cannot process them immediately, they need a method to aggregate the results. This is where Aggregation Behavior comes into play, defining how different results should be combined. Like Exporting Behaviors, multiple default Aggregation Behaviors can be defined.

Note: Batch Exporters must interact with the system manager (Scooby) to determine when the entire system execution has ended, allowing them to proceed with exporting the accumulated results.

The interaction between Scrapers and Batch Exporters can be summarized as follows:

[Sequence diagram. Participants: Scraper, BatchExporter, Scooby. Messages, in order: Scrape(document); Scrape document's content; Results(scraped data); Aggregate(prevResults, newResults); wait until the end of execution; SignalEnd; Export(results); ExportFinished.]
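The batch flow described in this section can be sketched in the same actor-free style. Several pieces are assumptions made for illustration: the `concat` aggregation, the `receive` handler, and the plain callback that replaces `replyTo: ActorRef[ScoobyCommand]` and plays the role of the ExportFinished reply.

```scala
object BatchExporterSketch {
  final case class Result[T](data: Iterable[T])

  type ExportingBehavior[T] = Result[T] => Unit
  // Assumption: aggregation combines the accumulated result with a new one.
  type AggregationBehavior[T] = (Result[T], Result[T]) => Result[T]

  sealed trait ExporterCommand[T]
  final case class Export[T](result: Result[T]) extends ExporterCommand[T]
  // Assumption: a callback stands in for replyTo: ActorRef[ScoobyCommand].
  final case class SignalEnd[T](replyTo: () => Unit) extends ExporterCommand[T]

  // Hypothetical default aggregation: concatenate the data of both results.
  def concat[T]: AggregationBehavior[T] =
    (prev, next) => Result(prev.data ++ next.data)

  // Stand-in for the «Actor» BatchExporter.
  final class BatchExporter[T](
      exportingBehavior: ExportingBehavior[T],
      aggregationBehavior: AggregationBehavior[T]
  ) {
    private var accumulated: Result[T] = Result(Iterable.empty[T])

    def receive(command: ExporterCommand[T]): Unit = command match {
      case Export(result) =>
        // Nothing is exported yet: the new result is only merged in.
        accumulated = aggregationBehavior(accumulated, result)
      case SignalEnd(replyTo) =>
        // End of execution: export everything once, then notify the sender.
        exportingBehavior(accumulated)
        replyTo()
    }
  }
}
```

Keeping the aggregation as a constructor parameter lets the same exporter concatenate, deduplicate or otherwise merge results without changing its message handling.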

Structure

Class diagram: Exporter

«Actor» StreamExporter<T>
    exportingBehavior: ExportingBehavior<T>

«Actor» BatchExporter<T>
    exportingBehavior: ExportingBehavior<T>
    aggregationBehavior: AggregationBehavior<T>

ExporterCommands
    Export(result: Result[T])
    SignalEnd(replyTo: ActorRef[ScoobyCommand])

Result<T>
    data: Iterable[T]

«Object» ExportingBehaviors
    writeOnFile(path: Path, format: FormattingBehavior<T>): ExportingBehavior<T>
    writeOnConsole(format: FormattingBehavior<T>): ExportingBehavior<T>

«Object» FormattingBehaviors
    string: FormattingBehavior<T>
    json: FormattingBehavior<T>

FormattingBehavior<T>
AggregationBehavior<T>
ExportingBehavior<T>

The diagram also links these elements with nine «uses» relationships.
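The `writeOnFile` and `string` signatures listed in the structure could be realised roughly as follows. This is a sketch under assumptions, not the project's code: the function-type aliases, the one-item-per-line rendering, UTF-8 encoding and the choice to overwrite the target file are all illustrative guesses. The `json` formatter is omitted because its serialization strategy is not described.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, StandardOpenOption}

object FileExportSketch {
  final case class Result[T](data: Iterable[T])

  // Assumptions: behaviors and formatters are plain functions.
  type ExportingBehavior[T] = Result[T] => Unit
  type FormattingBehavior[T] = Result[T] => String

  object FormattingBehaviors {
    // Assumption: "string" renders each item with toString, one per line.
    def string[T]: FormattingBehavior[T] = _.data.mkString("\n")
  }

  object ExportingBehaviors {
    // Writes the formatted result to `path`.
    // Assumption: the file is created if missing and overwritten otherwise.
    def writeOnFile[T](path: Path, format: FormattingBehavior[T]): ExportingBehavior[T] =
      result => {
        Files.write(
          path,
          format(result).getBytes(StandardCharsets.UTF_8),
          StandardOpenOption.CREATE,
          StandardOpenOption.TRUNCATE_EXISTING
        )
        ()
      }
  }
}
```

Separating formatting from the output target means any formatter can be paired with any destination, which is what the `format` parameter in both `writeOnFile` and `writeOnConsole` suggests.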
Last modified: 07 August 2024