Francesco Magnani
The primary areas where I've contributed on the implementation side include:
Scooby's start/stop mechanisms
DSL structure, scraper section, and safety mechanisms
Scooby testing class
HTTP library new API
After using the library in various scenarios, we found that the original API was cumbersome and verbose. To address this, we designed and implemented a new API that offers more flexible and concise usage through a DSL-like approach (reported here because it is an implementation detail).
For example, the following snippets compare the two styles.
The new API relies on the Deserializer mechanism and the PartialCall class. Using Scala's Conversion system, calls are automatically routed through a given Client with a Backend that works with a specific response type. When a Response is received, a given deserializer converts the Either[HttpError, Response] into Either[HttpError, T], where T is inferred from the receiver variable's type (e.g., a ScrapeDocument).
This mechanism simplifies and clarifies the library's usage while maintaining safety checks and error management.
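The routing described above can be illustrated with a minimal Scala 3 sketch. It is not the library's actual code: the type names are borrowed from the text, but every field, method, and signature below (for example Backend.send, PartialCall(url), and the stub backend) is an assumption made for the example.

```scala
import scala.language.implicitConversions

// Hypothetical stand-ins for the library's types; only the names come from the text.
final case class HttpError(message: String)
final case class Response(status: Int, body: String)
final case class ScrapeDocument(content: String)

trait Backend:
  def send(url: String): Either[HttpError, Response]

final class Client(val backend: Backend)

trait Deserializer[T]:
  def deserialize(response: Response): Either[HttpError, T]

// A request that has been described but not yet sent.
final case class PartialCall(url: String)

object PartialCall:
  // Sends the call through the given Client and deserializes the response.
  // T is fixed by the expected type at the use site.
  given callToResult[T](using client: Client, deserializer: Deserializer[T]): Conversion[PartialCall, Either[HttpError, T]] =
    (call: PartialCall) => client.backend.send(call.url).flatMap(deserializer.deserialize)

// Stub backend that echoes the URL, so the sketch runs without network access.
given Client = Client(new Backend:
  def send(url: String): Either[HttpError, Response] =
    Right(Response(200, s"<html>$url</html>"))
)

given Deserializer[ScrapeDocument] with
  def deserialize(response: Response): Either[HttpError, ScrapeDocument] =
    Right(ScrapeDocument(response.body))

// The declared type of the receiver selects the deserializer and triggers the conversion.
val document: Either[HttpError, ScrapeDocument] = PartialCall("https://example.org")
```

In this sketch the conversion lives in the companion object of PartialCall, so no extra import is needed at the use site. A missing Client or Deserializer would surface as a compile-time error rather than a runtime failure.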
Scooby Start/Stop Mechanisms
The Scooby system utilizes the Akka actor system, making it crucial to manage the application's start and stop processes gracefully. The stopping process, in particular, is challenging because it requires ensuring all actors have completed their tasks and no longer need processing time.
To address this, we must understand the macro steps of execution and their interdependencies. A comprehensive description of this management is available here.
DSL Scraper Keywords
The scraping keywords consist of a single instruction, scrape, which opens a Context for defining the scraping policy.
Additional keywords can be used within this scope, depending on the type of data being scraped. Scraping policies can return results of any Scala type (e.g., tuples and strings), not just HTML elements. However, since the target is typically an HTML document, other keywords specifically facilitate working with HTML elements, so that the scraping policy reads closer to natural language.
For example, in the following snippet:
Here, that is an alias for the Scala collection method filter, and haveId is a method that generates a predicate for HTML elements, making the expression compile in Scala. These keywords are implemented as follows:
This approach is used throughout the DSL, making it highly customizable and enabling the creation of various useful keywords.
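As an illustration of the keyword pattern described above, here is a minimal Scala 3 sketch, not the project's actual implementation. HTMLElement, ScrapeContext, elements, and the signature of scrape are hypothetical stand-ins. Only the roles of that (an alias for filter) and haveId (a predicate factory) come from the text.

```scala
// Hypothetical, simplified model of an HTML element.
final case class HTMLElement(tag: String, attributes: Map[String, String]):
  def id: Option[String] = attributes.get("id")

// Hypothetical context opened by `scrape`; it exposes the elements being scraped.
final class ScrapeContext(val elements: Seq[HTMLElement])

// Opens a scope in which a ScrapeContext is available as a given.
def scrape[T](document: Seq[HTMLElement])(policy: ScrapeContext ?=> T): T =
  policy(using ScrapeContext(document))

// Keyword that reads the elements from the enclosing context.
def elements(using context: ScrapeContext): Seq[HTMLElement] = context.elements

// `that` is only an alias for the collection method `filter`.
extension [T](items: Iterable[T])
  infix def that(predicate: T => Boolean): Iterable[T] = items.filter(predicate)

// `haveId` builds a predicate over HTML elements.
def haveId(id: String): HTMLElement => Boolean = _.id.contains(id)

// Usage: the policy reads almost like a sentence, yet it is plain Scala.
val page = Seq(
  HTMLElement("div", Map("id" -> "main")),
  HTMLElement("p", Map.empty)
)
val selected = scrape(page) { elements that haveId("main") }
```

Because that is generic in the element type, the same keyword would also work on policies that return strings or tuples. Only haveId is tied to HTML elements.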
DSL Safety Mechanism
Keywords like scrape and scooby create scopes with a given context, potentially allowing invalid programs to compile, such as with nested repeated scopes:
The snippet above is invalid, so a safety mechanism is needed to prevent users from writing incorrect DSL syntax. For instance, ScalaTest throws an exception at runtime if a ... should ... in keyword is nested incorrectly.
To catch such errors at compile time rather than at runtime, we instead use Scala 3 macros. Dangerous operators like scrape have a safe version (exposed outside the DSL package) and an unsafe one (kept private). The safe version uses Scala inline methods and scala.compiletime utilities to detect and prevent repeated givens at compile time.
The macro is implemented as follows:
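For illustration only, the safe/unsafe split can be sketched with an inline method and scala.compiletime.summonFrom. This is a sketch under assumptions, not the project's macro: ScrapeScope, SafeDsl, scrapeUnsafe, and the error message are hypothetical names introduced for the example.

```scala
import scala.compiletime.summonFrom

// Hypothetical marker type: its presence as a given means "already inside a scrape scope".
final class ScrapeScope

object SafeDsl:
  // Unsafe version: opens the scope without any check, kept private.
  private def scrapeUnsafe[T](body: ScrapeScope ?=> T): T =
    body(using new ScrapeScope)

  // Safe version: expanded at each call site, where it checks for an enclosing scope.
  inline def scrape[T](body: ScrapeScope ?=> T): T =
    summonFrom {
      // A ScrapeScope is already in scope, so this call is nested: abort compilation.
      case _: ScrapeScope =>
        compiletime.error("scrape cannot be nested inside another scrape")
      // No enclosing scope: delegate to the unsafe version.
      case _ =>
        scrapeUnsafe(body)
    }

import SafeDsl.scrape

// A single scope compiles and runs normally.
val answer: Int = scrape { 41 + 1 }

// A nested use such as `scrape { scrape { 1 } }` is expected to be rejected
// by the compiler with the message above, because the inner call sees the
// ScrapeScope provided by the outer one.
```

The check costs nothing at runtime: summonFrom is resolved while the call is inlined, so a valid program reduces to a direct call to the unsafe version.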
Scooby Testing Class
To facilitate DSL testing, a ScoobyTest class has been developed. This class contains methods that simulate Scooby application behavior, making it easier to test various scenarios.