zlacker

Tadpole – A modular and extensible DSL built for web scraping

submitted by zachpe+(OP) on 2026-02-03 16:29:13 | 62 points 13 comments

1. zachpe+1 2026-02-03 16:29:13
>>zachpe+(OP)
Hello!

I wanted to share my recent project, Tadpole: a custom DSL built on top of KDL specifically for web scraping and browser automation. I wanted a standardized way to write scrapers and reuse existing scraper logic, and this is my solution.

Why?

    Abstraction: Simulating realistic human behavior (bezier curves, easing) through high-level composed actions.
    Zero Config: Import and share scraper modules directly via Git, bypassing npm/registry overhead.
    Reusability: Actions and evaluators can be composed through slots to create more complex workflows.
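The "bezier curves, easing" idea from the first point can be illustrated outside Tadpole. Below is a minimal TypeScript sketch, not Tadpole's implementation: it samples a cubic Bézier curve between a start and an end point, spacing the samples with an ease-in-out cubic function so a pointer replaying them would speed up and then slow down. The names mousePath, cubicBezier and easeInOutCubic, and the hand-picked control points, are assumptions for illustration.

```typescript
// Generic sketch of a "human-like" pointer path: a cubic Bezier curve sampled
// with ease-in-out timing. Hypothetical helper, not Tadpole's actual code.

interface Point {
  x: number;
  y: number;
}

// Ease-in-out cubic: slow start, fast middle, slow finish; maps [0, 1] to [0, 1].
function easeInOutCubic(u: number): number {
  return u < 0.5 ? 4 * u * u * u : 1 - Math.pow(-2 * u + 2, 3) / 2;
}

// Evaluate a cubic Bezier curve with endpoints p0, p3 and control points p1, p2 at t in [0, 1].
function cubicBezier(p0: Point, p1: Point, p2: Point, p3: Point, t: number): Point {
  const s = 1 - t;
  const a = s * s * s;
  const b = 3 * s * s * t;
  const c = 3 * s * t * t;
  const d = t * t * t;
  return {
    x: a * p0.x + b * p1.x + c * p2.x + d * p3.x,
    y: a * p0.y + b * p1.y + c * p2.y + d * p3.y,
  };
}

// Produce `steps` points from start to end, bending through control points c1 and c2.
// Eased sampling bunches points near both ends, so replaying them at a fixed rate
// makes the cursor accelerate and then decelerate.
function mousePath(start: Point, c1: Point, c2: Point, end: Point, steps: number): Point[] {
  if (steps < 2) throw new RangeError("steps must be at least 2");
  const points: Point[] = [];
  for (let i = 0; i < steps; i++) {
    const t = easeInOutCubic(i / (steps - 1));
    points.push(cubicBezier(start, c1, c2, end, t));
  }
  return points;
}

const path = mousePath({ x: 0, y: 0 }, { x: 40, y: 120 }, { x: 160, y: -20 }, { x: 200, y: 100 }, 20);
console.log(path[0], path[path.length - 1]);
```

In a real automation the control points would likely be randomized around the straight line between start and end, with each point sent to the browser as a mouse-move event.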

Example

This is a fully working example (@tadpole/cli is published on npm):

tadpole run redfin.kdl --input '{"text": "Seattle, WA"}' --auto --output output.json

  import "modules/redfin/mod.kdl" repo="github.com/tadpolehq/community"

  main {
    new_page {
      redfin.search text="=text"
      wait_until
      redfin.extract_from_card extract_to="addresses" {
        address {
          redfin.extract_address_from_card
        }
      }
    }
  }

Roadmap

Planned for 0.2.0

    Control Flow: Add maybe (effectively try/catch) and loop (while {}, do {})
    DOMPick: Used to select elements by index
    DOMFilter: Used to filter elements using evaluators
    More Evaluators: Type casting, regex, exists
    Root Slots: Support for top level dynamic placeholders
    Error Reporting: More robust error reporting
    Logging: More consistent logging from actions, plus a log action in the global registry
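To make the planned maybe and loop semantics concrete, here is a hypothetical TypeScript sketch that models actions as async functions. The names maybe, whileLoop and doLoop are illustrative, not Tadpole's API, and the real 0.2.0 actions may behave differently (for example, in how swallowed errors are reported or logged).

```typescript
// Hypothetical sketch of the planned control flow, with actions modelled as
// async functions. Names and behavior are illustrative, not Tadpole's API.

type AsyncAction<T> = () => Promise<T>;

// maybe: run an action and, if it throws, run a fallback instead of failing
// the whole scrape (effectively try/catch).
async function maybe<T>(action: AsyncAction<T>, fallback: AsyncAction<T>): Promise<T> {
  try {
    return await action();
  } catch {
    return await fallback();
  }
}

// loop, while-form: the condition is checked before every iteration,
// so the body may run zero times.
async function whileLoop(cond: AsyncAction<boolean>, body: AsyncAction<void>): Promise<void> {
  while (await cond()) {
    await body();
  }
}

// loop, do-form: the body runs once before the condition is first checked.
async function doLoop(body: AsyncAction<void>, cond: AsyncAction<boolean>): Promise<void> {
  do {
    await body();
  } while (await cond());
}

// Usage: a missing element falls back to "n/a"; a pagination-style loop runs 3 times.
async function demo(): Promise<void> {
  const price = await maybe<string>(
    async () => {
      throw new Error("selector not found");
    },
    async () => "n/a",
  );
  let page = 0;
  await whileLoop(
    async () => page < 3,
    async () => {
      page++;
    },
  );
  console.log(price, page); // n/a 3
}

demo();
```

The while and do forms differ only in whether the condition is checked before the first iteration, which matters for steps like "click next page" that should run at least once.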

Planned for 0.3.0

    Piping: Allow different files to chain input/output.
    Outputs: Complex output sinks to databases, S3, Kafka, etc.
    DAGs: Use directed acyclic graphs to create complex crawling scenarios and parallel compute.
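One common way to turn a DAG of crawl steps into parallel work is to group the steps into waves with Kahn's topological sort: each wave contains only steps whose dependencies all finished in earlier waves, so the steps within a wave could run concurrently. The TypeScript sketch below is a generic illustration under that assumption, not Tadpole's planned design; scheduleWaves and the step names are hypothetical.

```typescript
// Hypothetical sketch: schedule crawl steps described as a DAG into "waves" using
// Kahn's algorithm. Every step in a wave depends only on steps in earlier waves,
// so a wave's steps could run in parallel. Not Tadpole's actual design.

// deps[step] lists the steps that must finish before `step` starts.
function scheduleWaves(deps: Record<string, string[]>): string[][] {
  const indegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const [step, before] of Object.entries(deps)) {
    if (!indegree.has(step)) indegree.set(step, 0);
    for (const b of before) {
      if (!indegree.has(b)) indegree.set(b, 0);
      indegree.set(step, indegree.get(step)! + 1);
      const list = dependents.get(b) ?? [];
      list.push(step);
      dependents.set(b, list);
    }
  }

  const waves: string[][] = [];
  // Start with steps that have no unfinished dependencies (sorted for stable output).
  let ready = [...indegree.keys()].filter((s) => indegree.get(s) === 0).sort();
  let scheduled = 0;
  while (ready.length > 0) {
    waves.push(ready);
    scheduled += ready.length;
    const next: string[] = [];
    for (const s of ready) {
      // Finishing s releases one dependency of each step that waits on it.
      for (const d of dependents.get(s) ?? []) {
        const remaining = indegree.get(d)! - 1;
        indegree.set(d, remaining);
        if (remaining === 0) next.push(d);
      }
    }
    ready = next.sort();
  }
  // Steps left unscheduled can only be stuck on a cycle.
  if (scheduled !== indegree.size) throw new Error("cycle detected: not a DAG");
  return waves;
}

const plan = scheduleWaves({
  search: [],
  listings: ["search"],
  photos: ["listings"],
  details: ["listings"],
  export: ["photos", "details"],
});
console.log(JSON.stringify(plan));
```

With this sample graph, details and photos land in the same wave because both depend only on listings, while export waits for both of them.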

GitHub repository: https://github.com/tadpolehq/tadpole

I've also created a community repository for sharing scraper logic: https://github.com/tadpolehq/community

Feedback would be greatly appreciated!

11. zachpe+A63 2026-02-04 12:33:03
>>himujj+Qh2
Not really. Operating with the Chrome DevTools Protocol over WebSocket is completely different. You cannot move your mouse, open tabs, or control the browser in the same way. It's not really comparable, sorry.

Here is a reference for you to read up on: https://chromedevtools.github.io/devtools-protocol/
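For readers new to the protocol: a CDP client sends JSON commands over the browser's WebSocket debugging endpoint, each with an id (echoed back in the matching response), a method name, and params. Input.dispatchMouseEvent and Target.createTarget are CDP methods for moving the pointer and opening a new tab. The TypeScript sketch below only builds those payloads and omits the connection itself; the command helper, coordinates and URL are illustrative assumptions.

```typescript
// Minimal sketch of Chrome DevTools Protocol command payloads. Each command has an
// `id` (echoed in the browser's response), a `method`, and `params`. Sending them
// requires a WebSocket connection to the browser's debugging endpoint (not shown).

interface CdpCommand {
  id: number;
  method: string;
  params: Record<string, unknown>;
}

let nextId = 1;

// Hypothetical helper: assign increasing ids so responses can be matched to requests.
function command(method: string, params: Record<string, unknown>): CdpCommand {
  return { id: nextId++, method, params };
}

// Move the pointer to (x, y), in CSS pixels relative to the main frame's viewport.
const moveMouse = command("Input.dispatchMouseEvent", { type: "mouseMoved", x: 120, y: 240 });

// Open a new tab (a new page target) at the given URL.
const openTab = command("Target.createTarget", { url: "https://example.com" });

console.log(JSON.stringify(moveMouse));
console.log(JSON.stringify(openTab));
```

In practice, input events are sent to a specific page's session, while Target.createTarget can be sent to the browser-level endpoint.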
