Get started
To start using Scooby in a new SBT project, you need to manually add the library.
1. Generate a new project using SBT.
2. Download the JAR from the latest release of the Scooby library.
3. Create a new lib folder inside your SBT project.
4. Place the downloaded JAR inside the lib folder you've just created.
5. Create a class or object that extends either org.unibo.scooby.dsl.ScoobyEmbeddable or org.unibo.scooby.dsl.ScoobyApplication.
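SBT treats JARs placed in the project's lib folder as unmanaged dependencies, so the JAR needs no separate dependency declaration. A minimal build.sbt sketch follows. The Scala version is an assumption: some Scala 3 version is needed because the DSL's policy types use context functions (?=>), which only exist in Scala 3. The unmanagedBase line only restates SBT's default.

```scala
// build.sbt (sketch). The exact Scala 3 version is an assumption.
scalaVersion := "3.3.3"

// JARs in <project>/lib are already picked up as unmanaged dependencies by default;
// this line only makes that default explicit.
unmanagedBase := baseDirectory.value / "lib"
```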
ScoobyEmbeddable is a Scala trait that can be mixed into a class to use the Scooby DSL without making that class executable. The scooby keyword then returns a Future containing the result of the scraping. ScoobyApplication, on the other hand, can be extended by a Scala object so that it can be run directly.
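Because a ScoobyEmbeddable class gets a Future back from scooby, the calling code has to either wait for it or react to it when it completes. The sketch below shows both options with the standard scala.concurrent API. runScraping is a hypothetical placeholder for the Future the DSL would return. It is not part of Scooby.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.*
import scala.util.{Failure, Success}

given ExecutionContext = ExecutionContext.global

// Hypothetical stand-in for the Future produced by the `scooby` keyword
// inside a ScoobyEmbeddable class; this is NOT the library's API.
def runScraping(): Future[Seq[String]] =
  Future(Seq("result-1", "result-2"))

// Non-blocking: react to the results once the Future completes.
def printWhenDone(results: Future[Seq[String]]): Unit =
  results.onComplete {
    case Success(items) => items.foreach(println)
    case Failure(err)   => println(s"Scraping failed: ${err.getMessage}")
  }

// Blocking: wait for the results, throwing if the timeout expires.
def awaitResults(results: Future[Seq[String]], timeout: FiniteDuration = 10.seconds): Seq[String] =
  Await.result(results, timeout)
```

Blocking with Await is mostly useful in scripts or tests. Inside a long-running service, the callback (or further map/flatMap on the Future) is usually the better fit.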
Here's the difference in their usage:
Here's a full usage example with ScoobyApplication instead.
Customization
The provided DSL is open to customization. This section gives a brief introduction to the available configurations.
Network
To visit websites that require user authentication, you can define multiple headers in the headers section.
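Headers declared this way are sent along with the HTTP requests made to the site. The sketch below illustrates that effect using the JDK's java.net.http API (Java 11+), not Scooby's internals. It attaches a hypothetical header map to a request. The bearer token is a placeholder value.

```scala
import java.net.URI
import java.net.http.HttpRequest

// Hypothetical header set, similar to what might be declared in the DSL's
// headers section; the token is a placeholder, not a real credential.
val authHeaders: Map[String, String] = Map(
  "Authorization"   -> "Bearer example-token",
  "Accept-Language" -> "en-US"
)

// Attach every header to a GET request (JDK java.net.http API, not Scooby's API).
def buildRequest(url: String, headers: Map[String, String]): HttpRequest =
  headers
    .foldLeft(HttpRequest.newBuilder(URI.create(url))) { case (builder, (name, value)) =>
      builder.header(name, value)
    }
    .GET()
    .build()
```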
Crawler
It is possible to define custom crawling policies, which must have the type CrawlDocument ?=> Iterable[URL]. An example could be:
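To show the shape of such a policy, the sketch below defines minimal stand-ins for URL and CrawlDocument. These are hypothetical; the real Scooby types will look different. It then defines a policy that keeps only the links on the same host as the current page. The key point is the Scala 3 context function type. The policy receives the document as a given and retrieves it with summon.

```scala
import java.net.URI

// Hypothetical stand-ins for Scooby's URL and CrawlDocument types.
final case class URL(raw: String):
  // Host of the URL, or None for relative URLs.
  def host: Option[String] = Option(URI.create(raw).getHost)

final case class CrawlDocument(url: URL, links: Seq[URL])

// The policy shape described in the text: a context function from the document to URLs.
type CrawlPolicy = CrawlDocument ?=> Iterable[URL]

// Keep only absolute links whose host matches the host of the current page.
val sameHostOnly: CrawlPolicy =
  val doc = summon[CrawlDocument]
  doc.links.filter(link => link.host.isDefined && link.host == doc.url.host)
```

When a CrawlDocument is available as a given, sameHostOnly can be used directly. Otherwise it can be applied explicitly, as in sameHostOnly(using page).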
Scraper
It is possible to define custom scraping policies, which must have the type ScrapeDocument ?=> Iterable[T]. Policies can also be combined using boolean filter conditions. An example could be:
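The sketch below illustrates both ideas with hypothetical stand-ins. Element and ScrapeDocument are not Scooby's classes. The && and || combinators are one assumed way of mixing filter conditions. The select policy has the required shape, a context function from ScrapeDocument to Iterable[T], here with T = Element.

```scala
// Hypothetical stand-ins, not Scooby's classes.
final case class Element(tag: String, classes: Set[String], text: String)
final case class ScrapeDocument(elements: Seq[Element])

// The policy shape described in the text.
type ScrapePolicy[T] = ScrapeDocument ?=> Iterable[T]

// A filter condition over elements.
type Condition = Element => Boolean

// Boolean combinators for mixing conditions.
extension (c: Condition)
  def &&(other: Condition): Condition = e => c(e) && other(e)
  def ||(other: Condition): Condition = e => c(e) || other(e)

def tag(name: String): Condition = _.tag == name
def hasClass(name: String): Condition = _.classes.contains(name)

// Policy: select every element of the current document matching the condition.
def select(condition: Condition): ScrapePolicy[Element] =
  summon[ScrapeDocument].elements.filter(condition)
```

For example, select(tag("p") && hasClass("price")) keeps only p elements that carry the price class. Replacing && with || keeps any element that satisfies either condition.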
Exporter
It is possible to define both batch and streaming strategies, even multiple times, with their effects concatenated. An example could be:
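One way to picture the two modes: a streaming strategy handles each result as soon as it is produced, while a batch strategy collects all results and exports them once at the end. The sketch below models this with hypothetical names (ExportStrategy, StreamingExport, BatchExport, runExports). It reflects an assumption about the semantics and is not Scooby's implementation. When several strategies are registered, each one applies its effect to the same results.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical export strategy: notified per result and once at the end.
trait ExportStrategy[T]:
  def onResult(item: T): Unit // called for every scraped result
  def onComplete(): Unit      // called once scraping has finished

// Streaming: forward each result immediately.
final class StreamingExport[T](sink: T => Unit) extends ExportStrategy[T]:
  def onResult(item: T): Unit = sink(item)
  def onComplete(): Unit = ()

// Batch: buffer every result and export them all at the end.
final class BatchExport[T](sink: Seq[T] => Unit) extends ExportStrategy[T]:
  private val buffer = ArrayBuffer.empty[T]
  def onResult(item: T): Unit = buffer += item
  def onComplete(): Unit = sink(buffer.toSeq)

// Run several strategies over the same results; every strategy sees every result.
def runExports[T](results: Seq[T], strategies: Seq[ExportStrategy[T]]): Unit =
  results.foreach(r => strategies.foreach(_.onResult(r)))
  strategies.foreach(_.onComplete())
```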
When the output is configured with toFile, you can choose the preferred file action: Append (adds the results after the text already in the file) or Overwrite (deletes the previous content of the file). If no action is specified, the default is Overwrite.
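The two file actions map naturally onto standard java.nio open options. The sketch below mirrors them with a hypothetical FileAction enum and writeResults helper. Neither is part of Scooby. Overwrite is the default, matching the behavior described above.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, StandardOpenOption}

// Hypothetical mirror of the two file actions; not Scooby's types.
enum FileAction:
  case Append, Overwrite

// Write text to a file, creating it if needed.
// Append keeps existing content; Overwrite truncates the file first (default).
def writeResults(path: Path, text: String, action: FileAction = FileAction.Overwrite): Unit =
  val modeOption = action match
    case FileAction.Append    => StandardOpenOption.APPEND
    case FileAction.Overwrite => StandardOpenOption.TRUNCATE_EXISTING
  Files.writeString(path, text, StandardCharsets.UTF_8,
    StandardOpenOption.CREATE, StandardOpenOption.WRITE, modeOption)
```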