pebkac: The AI-Powered Web Automaton Without The Automation

Update: I've written a more detailed guide on setting up the pebkac environment. Not required, but helpful. It's on Medium.

Update 2: This project is now on hold, and I am moving on to other, more exciting projects. Watch this space. Check out the release notes for more project info.

Update 3: https://www.youtube.com/watch?v=BVRAr1iQyQQ watch this one. I've been paying attention to Cory Doctorow since 2009.

Update 4: New thing: https://akickintheteeth.com/ SLOP FIGHTER

What This Is

pebkac browses the web for you. It is a web nonautomation framework powered by SmolAgents and Zendriver. Synchronous communication becomes asynchronous communication in an elegant double helix of English language-powered Python interpretation driven by you, the user. There is no MCP, no n8n, no LangChain or LangGraph. pebkac employs the LLM's native ability to control a web browser by writing Python directly into it.

Zendriver is described as "A blazing fast, async-first, undetectable webscraping/web automation framework based on ultrafunkamsterdam/nodriver."
SmolAgents is "a barebones library for agents that think in code."

Together, they fit to give your localised, secure, rambunctiously stupid LLM a manual and a set of tools to operate a web browser.

✨ Features

Core Capabilities

Autonomous Intelligence

🧠 Code-Writing LLM - Writes Python with loops, conditions, error handling (not limited to sequential tool calls like LangChain)
🎯 Multi-Step Reasoning - Works through complex tasks independently over 10 configurable steps
🔄 Self-Correction - Tries alternative strategies when approaches fail

Stealth & Persistence

👻 Undetectable Automation - Bypasses anti-bot detection (Cloudflare, etc.) using real Chrome instead of WebDriver
🔐 Persistent Sessions - Remembers logins and cookies across restarts (sign in once, done)
🛡️ Anti-Bot Bypass - Automatically handles challenges and verification pages

Smart Data Extraction

📊 Intelligent Content Extraction - Trafilatura parses web pages like a human reader (ignores ads, navigation, footers)
🎯 API Response Capture - Extracts structured JSON from modern websites instead of scraping messy HTML
⚡ Extreme Caching - 500-2000x faster on repeat visits (Redis + DuckDB two-tier cache with hit rate tracking)
📑 Tab Management - Opens relevant pages (max 3) in background after extracting their content for user exploration
📜 Execution History - Automatically saves all agent runs to SQLite with query, result, and step tracking

User Experience

💬 Chat Interface - Type natural language commands at localhost:8888
👁️ Live Browser View - Watch what it's doing via noVNC (1280x720)
📝 Detailed Logging - See every decision and action in real-time
🔍 Web Search Integration - Searches DuckDuckGo and filters out junk results
📊 Cache Statistics - Monitor L1/L2 cache performance, hit rates, memory usage, and execution history

Performance Features

🚀 Parallel Operations - Extracts multiple page elements simultaneously
🎨 Form Automation - Types with human-like delays, handles keyboard navigation
📸 Screenshot Capture - Visual verification of page state
📊 Selector Learning - Remembers which CSS selectors work per site and reuses them automatically (survives restarts)
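
To give a feel for what selector learning involves, here is a minimal in-memory sketch. The class name `SelectorStats` and its methods are made up for illustration; pebkac's real tracker persists its data across restarts, which this toy version does not attempt.

```python
from collections import defaultdict


class SelectorStats:
    """Toy per-site record of which CSS selectors worked (hypothetical design)."""

    def __init__(self):
        # stats[domain][selector] -> [successes, failures]
        self._stats = defaultdict(lambda: defaultdict(lambda: [0, 0]))

    def record(self, domain, selector, success):
        counts = self._stats[domain][selector]
        counts[0 if success else 1] += 1

    def best(self, domain, min_attempts=1):
        """Return the selector with the highest success rate, or None."""
        candidates = []
        for selector, (ok, bad) in self._stats[domain].items():
            attempts = ok + bad
            if attempts >= min_attempts and ok > 0:
                candidates.append((ok / attempts, attempts, selector))
        if not candidates:
            return None
        # Highest success rate wins; more attempts breaks ties.
        return max(candidates)[2]
```

Calling `record()` after every extraction and `best()` before the next one is enough to make the agent prefer selectors that have already worked on a given site.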

🚀 Why pebkac Outperforms Traditional Solutions

The Game-Changer: LLMs Write Python, Not JSON

Unlike LangChain's rigid JSON tool-calling or MCP's predefined functions, pebkac's LLM writes actual Python code that executes browser actions. This means your AI will look at its own tools and write Python code to utilise them. This is impossible with LangChain/MCP's approach. They can only call predefined tools sequentially. pebkac's LLM can write loops, conditions, error handling, and complex logic.
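
Here is a rough idea of what "writing Python" can look like inside a single step. Everything in this sketch is a stand-in: `navigate_browser`, `extract_content`, and the fake pages are assumptions for illustration, not pebkac's actual tool names or behaviour.

```python
# Stand-in tools: in pebkac these would drive a real browser. The names and
# return values here are assumptions made for illustration only.
FAKE_WEB = {
    "https://example.com/a": "Cheese is a dairy product.",
    "https://example.com/b": "",  # simulates a page where extraction fails
    "https://example.com/c": "Cheddar originated in Somerset.",
}


def navigate_browser(url):
    if url not in FAKE_WEB:
        raise ValueError(f"cannot reach {url}")
    return url


def extract_content(url):
    text = FAKE_WEB[url]
    if not text:
        raise RuntimeError(f"nothing extracted from {url}")
    return text


def agent_step(urls):
    """The kind of code an LLM could write in a single step."""
    notes, failures = [], []
    for url in urls:                      # a loop, not one call per turn
        try:
            navigate_browser(url)
            text = extract_content(url)
        except (ValueError, RuntimeError) as err:
            failures.append(str(err))     # error handling inside the step
            continue
        if "cheese" in text.lower() or "cheddar" in text.lower():
            notes.append(text)            # a condition on the result
    return notes, failures
```

One step visits several pages, skips the ones that break, and keeps only the text that matches. That is the whole trick.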

This also means that pebkac is only as capable as the LLM that runs it, and the prompts you give it! It is fundamentally of no mind. It has no real understanding of what it is asked to do. All it has is Google Chrome dev tools, a couple libraries, and an API.

Frankly, no LLM has been made that is supposed to fully operate Google Chrome.

The browser runs with noVNC and loads about:blank on startup. You are warned. pebkac is not C-3PO. pebkac is a garden path. pebkac will click the wrong buttons. It will go off on tangents. It works independently through ten (adjustable) steps using its own logic and processes, providing entirely self-directed browsing. While pebkac is active you can check the highly detailed log output below the browser window to see what your LLM is up to.

Or just give it a job and go do something else. Eat an apple. Read a book.

You operate it simply by opening the pebkac Control Panel in your browser (localhost:8888) and typing into the chat window. The control panel displays the browser via noVNC and shows live logs from both the browser automation service and the LLM. pebkac will perform its duties and return nicely-formatted results in the chat window.

🚀 How does pebkac know what to do?

By reading the page, of course, same as you. State-of-the-art extraction is built into Zendriver's existing framework, giving it an enormous capability boost. I used Trafilatura to achieve this.

Trafilatura is "a cutting-edge Python package and command-line tool designed to gather text on the Web and simplify the process of turning raw HTML into structured, meaningful data."

Basically, pebkac's vision is augmented. Not only is Trafilatura excellent at text/data extraction (check it out on GitHub: https://github.com/adbar/trafilatura), but pebkac also uses that extraction (along with native Zendriver CSS detection) to figure out what to do! This makes things like handling Cloudflare and popups a lot easier.
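
To show the basic idea of "reading the page like a human", here is a toy extractor built only on the standard library. It is a deliberately crude stand-in, not how Trafilatura works internally; the real library (whose entry point is `trafilatura.extract`) uses far smarter heuristics. The tag list and class name below are my own assumptions.

```python
from html.parser import HTMLParser

# Tags whose contents are treated as boilerplate in this toy version.
SKIP_TAGS = {"nav", "header", "footer", "aside", "script", "style"}


class MainTextParser(HTMLParser):
    """Collects text that is not nested inside boilerplate tags."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_main_text(html):
    parser = MainTextParser()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

Feed it a page with a nav bar, an article, and a footer, and only the article text comes back. That smaller, cleaner text is what keeps the LLM's context from filling up with junk.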

YOU CAN ALSO interact with the Chrome browser pebkac uses. You can manually sign into websites and ask pebkac to perform actions on the page. Think of it like a co-browser. It can go off on its own, collect the day's news, find out about things, and (maybe) handle little jobs while you do other things, or you can drop in, hang ten over the keyboard, and surf collaboratively. Remember, pebkac and its browser are fully contained, so there's no way the LLM can access your host PC.

This whole project is both an entirely useful web co-browsing service and a stark artistic reminder of the realities of our modular, chronically online existences. We all exist in our little boxes with internet connections to view the outside world, and now more than ever our little boxes are subject to oversight and control by forces far more intelligent than us. I view this project as a black mirror (lol) to our modern life.

It's also never been done before.

It's also incredibly capable.

✨ Technicals

With a powerful enough LLM behind it, this setup is capable of:

Thinking (via LLM)
Seeing (via CSS selection/Trafilatura)
Acting (via SmolAgents and Zendriver)
Remembering (via elaborate, lightweight caching)
Learning (via CSS selector tracking)

Here's what it does:

Avoids the need to pay for API calls. The LLM now works like you do.
Remembers your logins across Podman/Docker sessions.
Interprets your commands with versatility. If you ask it to "search amazon", it'll go to Amazon and search. If you ask it to "wait 1min and reload", it will figure it out.
Coordinates its own tool use so it doesn't get confused. It won't extract before navigating, and knows what page it's already on.
Combines its usage of tools mid-step (with async). Remember how I said it has ten steps to complete a task? Inside each of those steps the LLM makes its own decisions about how to work.
Decides its own workflows. Aside from operating a browser search, its methods are decided on the fly.
Remembers its workflows after each run. All data is collected.
Navigates, types, searches, clicks, visits, extracts, takes screenshots, exports Markdown, bypasses Cloudflare, fills forms.
Tries, fails, and LEARNS. If one strategy fails, another might work.
Parses text intelligently. Trafilatura is excellent and its responses are formatted cleanly.
Caches useful and relevant data extensively and returns it when required, conserving massive amounts of context.
Logs each action extensively. All logfiles are available in the control panel.
Validates inputs! I've done much to ensure there is little to no risk from JavaScript or SQL injection. Please be careful. I made sure to do this based on an xkcd comic strip I saw in high school: https://m.xkcd.com/327/
A lot more. It is designed to turn your natural language input into results, and does its humble best.
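
On the input validation point, the standard defence against the xkcd 327 problem is to pass user text as bound parameters instead of pasting it into the SQL string. The sketch below shows that technique with an in-memory SQLite database. The `runs` table and `save_run` helper are hypothetical names, not pebkac's actual schema.

```python
import sqlite3


def save_run(conn, query, result):
    """Store one agent run using bound parameters, never string formatting."""
    conn.execute(
        "INSERT INTO runs (query, result) VALUES (?, ?)",
        (query, result),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (query TEXT, result TEXT)")

# The classic hostile input from the xkcd strip is stored as plain text.
hostile = "Robert'); DROP TABLE runs;--"
save_run(conn, hostile, "ok")
rows = conn.execute("SELECT query FROM runs").fetchall()
```

The table survives and the hostile string is stored as an ordinary value, because the database never interprets it as SQL.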

Features Under Consideration:

Voice assist (delayed due to hardware)
Vision assist
Live/Dynamic site data tracking

This version of pebkac is designed to be mindful of context length and run on inexpensive GPUs. I built this whole project on a very budget MiniPC, and tested it with a specific finetuned model. For operating pebkac, I would HIGHLY recommend using DavidAU's models, particularly the Brainstorm variants. Not only do they know how to operate pebkac nearly 100% of the time, but they seem to have been trained on the SmolAgents library, making much of the 'thinking' already integrated.

Search for and download them here: https://hf.tst.eu/model

I did most testing using DavidAU/Qwen3-Jan-Nano-128k-6B-Brainstorm20x which was fast for my testing cases, but I would VERY MUCH RECOMMEND looking at the MoE models, like Qwen3-30b-whatever. His MoE models are excellent. Between thinking and non-thinking models, I like the results I get from non-thinking models.

It is reasonably important to find a model with an extremely long context length, like 64k or higher.

I would also highly recommend adjusting the extraction method to extract more text, and altering llama.cpp's GPU usage in the .env file. That will truly allow pebkac to work its magic.

And so, I introduce to you pebkac, the web automation service without the automation. It's just a mathematical word-generator with a set of word-tools, let free on the internet.

AUTHOR'S NOTE

For full disclosure, I am a writer, not a developer. I barely know print hello world. I began this project using Claude as a way to automate my own web research and social media activities. What came out of it was a much larger project that took many months to complete and taught me a lot about AI, programming, and computer science. It's not that I assumed it wouldn't be hard, but that I assumed it wouldn't be so complex. I can confidently say that I understand most of this project, but of course, I don't know what I don't know. Use pebkac at your own risk. It's as secure as a VIBE CODING AUTHOR knows how to make it.

What I have learned more than anything is that my very basic hardware cannot handle LLMs very well. I have made sure every part of this project is as lightweight and fast as possible. If you choose to support me by donating, that money would first and foremost go towards making this project harder, better, faster, and stronger. I'd also like to direct you to my Amazon page, because on God, writing books doesn't make money. My novel Well's Rest is on Amazon.

pebkac might even be able to buy it for you.

It will definitely be able to find me on Royal Road: https://www.royalroad.com/fiction/126900/wells-rest-grimdark-pirate-action-watch-book-trailer

I am of the opinion that pebkac demonstrates how mainstream approaches to LLMs are changing. Small, tailored models are the future for operating untold new and old technologies. I do not know if they should be writing words that mean things to humans. Check out my website at www.akickintheteeth.com. That is where I've been documenting my other experiments with AI. It's a brave new world!

Getting Started

Requirements:

Podman or Docker: I used Podman. I use a custom storage directory for my Podman setup. You'll have to adapt a little.
Podman/Docker Compose: Runs stuff.
Linux: I use Mint.
AMD or NVIDIA GPU for llama.cpp acceleration

Initial Setup

For total beginners, start here.

Clone this repository into your podman directory
Rename the ROOT files to .env and docker-compose.yml
Configure docker-compose.yml:
- Set group_add entries to your actual render group GID (find with getent group render)
- Adjust GPU settings for your hardware
- Just thoroughly check through the docker-compose files to suit your hardware. Same with the .env files. Be thorough.
Configure .env file:
- Set LLAMACPP_MODEL to your GGUF model filename (it must exist in /podman/models/gguf/ unless you set the path yourself). This is the trickiest part that I can't help with.
- Set LLAMACPP_GPU_LAYERS based on your VRAM
- Set HF_TOKEN if downloading models from HuggingFace
Open the pebkac Yaml Runner.
Access the Control Panel at http://localhost:8888
- Chat interface for interacting with the LLM
- Live browser view via noVNC (1280x720)
- Real-time logs from zendriver and llama-cpp-server
Start chatting! Type commands like "search for cheese" or "go to amazon and find shoes"
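
If you want to sanity-check your .env before starting the containers, a small script like the one below can help. It is only a sketch: the variable names come from the setup steps above, but the checks themselves (a `.gguf` suffix, an integer layer count) are my assumptions about what sensible values look like, and the sample values in the usage note are invented.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, ignoring blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_env(values):
    """Return a list of human-readable problems; empty means it looks fine."""
    problems = []
    model = values.get("LLAMACPP_MODEL", "")
    if not model.lower().endswith(".gguf"):
        problems.append("LLAMACPP_MODEL should name a .gguf file")
    layers = values.get("LLAMACPP_GPU_LAYERS", "")
    if not layers.lstrip("-").isdigit():
        problems.append("LLAMACPP_GPU_LAYERS should be an integer")
    return problems
```

For example, `check_env(parse_env(open(".env").read()))` returns an empty list for a file containing `LLAMACPP_MODEL=my-model.gguf` and `LLAMACPP_GPU_LAYERS=20`, and a list of complaints otherwise.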

Monitoring

You can monitor logs externally via:

podman logs -f zendriver        # Browser automation and agent execution
podman logs -f llama-cpp-server # LLM inference logs

Below is some stuff Claude put together. It's mostly accurate. Just more detail.

Core Architecture

1. Browser Automation Layer (Zendriver)

Undetectable Chrome automation using CDP (Chrome DevTools Protocol) via zendriver
Runs in a virtual Wayland/Sway display with full GPU acceleration (1280x720 default)
VNC debugging on port 5910 for visual monitoring
Browser profiles persist across sessions at /tmp/pebkac_profiles/
Clean architectural separation:
- Routes handle high-level business logic
- BrowserManager service layer encapsulates CDP operations
- Only protocol-level routes (network.py) use CDP directly
Full API with endpoints for:
- Navigation with wait conditions
- Element finding by selector or text
- Clicking, typing (with human-like delays)
- Scrolling (directional and to elements)
- Tab management (create, list, close)
- API response capture
- Element discovery
- Content extraction with fallbacks
- Parallel operations
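
As an illustration of typing with human-like delays, here is a minimal async sketch. The `send_key` callback is a placeholder for whatever actually delivers a keystroke to the browser, and the delay range is an assumed default, not pebkac's real setting.

```python
import asyncio
import random


async def type_like_a_human(text, send_key, min_delay=0.05, max_delay=0.2,
                            sleep=asyncio.sleep, rng=random):
    """Send one character at a time with a random pause between keystrokes."""
    for char in text:
        await send_key(char)
        # A uniform random pause avoids the perfectly regular rhythm of a bot.
        await sleep(rng.uniform(min_delay, max_delay))
```

The `sleep` and `rng` parameters exist only so the function can be exercised without real waiting. In normal use you would leave them at their defaults.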

2. AI Agent Layer (SmolAgents + LLM)

SmolAgents framework integration allowing LLMs to use browser tools autonomously
SafeCodeAgent handles multiple final_answer calls and retries on missing final_answer
Local LLM inference via llama.cpp with Vulkan GPU acceleration
Integrated AgentManager runs inside zendriver container (no separate service)
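
The final_answer handling can be pictured with the sketch below. This is not SafeCodeAgent's real code and not how SmolAgents is implemented internally; it is a simplified model, under my own assumptions, of "the first final_answer wins, and a missing one triggers a retry". It also uses a bare `exec`, which is fine for a toy but not for untrusted code.

```python
class FinalAnswer(Exception):
    """Raised by final_answer() to stop execution of the generated code."""

    def __init__(self, value):
        super().__init__(value)
        self.value = value


def final_answer(value):
    raise FinalAnswer(value)


def run_with_retries(generate_code, max_steps=10):
    """Ask for code up to max_steps times until one snippet calls final_answer.

    Because final_answer raises, only the first call in a snippet counts and
    any later calls are never reached.
    """
    feedback = None
    for step in range(1, max_steps + 1):
        code = generate_code(feedback)
        try:
            exec(code, {"final_answer": final_answer})
        except FinalAnswer as done:
            return done.value, step
        except Exception as err:  # broken code: report it back and retry
            feedback = f"Your code failed: {err!r}"
            continue
        feedback = "You did not call final_answer(). Please call it."
    return None, max_steps
```

Here `generate_code` stands in for the LLM: it receives the feedback from the previous step and returns the next snippet to run.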

Tool suite for agents (18 tools in zendriver-docker/app/tools/):

Browser Control

NavigateBrowserTool - Navigate to URLs
ClickElementTool - Click elements
TypeTextTool - Type text into inputs
KeyboardNavigationTool - Press keyboard keys
GetCurrentURLTool - Get current page URL

Content Extraction

ExtractContentTool - Extract page content with intelligent link extraction
ParallelExtractionTool - Extract from multiple selectors simultaneously
CapturePageMarkdownTool - Export page as Markdown
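
Extracting from several selectors at once maps naturally onto `asyncio.gather`. The sketch below shows that pattern in isolation. `extract_one` is a placeholder for a real per-selector extraction call, and the error format is an assumption.

```python
import asyncio


async def extract_many(selectors, extract_one):
    """Run extract_one(selector) for every selector concurrently.

    A failure for one selector is captured as an error string so that it
    does not cancel the others.
    """
    results = await asyncio.gather(
        *(extract_one(sel) for sel in selectors),
        return_exceptions=True,
    )
    return {
        sel: (f"error: {res}" if isinstance(res, Exception) else res)
        for sel, res in zip(selectors, results)
    }
```

With `return_exceptions=True`, one missing element costs you one entry in the result instead of the whole batch.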

Search & Navigation

WebSearchTool - Search DuckDuckGo with result filtering (LLM decides which tabs to open)
VisitWebpageTool - Visit and extract page content in one call
SearchHistoryTool - Access cached search results

Tab Management (NEW)

OpenBackgroundTabTool - Open URLs in background while keeping main tab active
ListTabsTool - List all tabs with indices and status
CloseTabTool - Close background tabs (tab 0 protected)
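
The tab rules described here (a protected main tab and a small cap on background tabs) can be modelled in a few lines. This is a bookkeeping sketch only: `TabRegistry` is a hypothetical name, nothing in it talks to a browser, and the limit of 3 is taken from the feature list above.

```python
class TabRegistry:
    """Tracks open tabs; index 0 is the main tab and cannot be closed."""

    def __init__(self, max_background=3):
        self.max_background = max_background
        self.tabs = ["about:blank"]  # tab 0, the main tab

    def open_background(self, url):
        if len(self.tabs) - 1 >= self.max_background:
            raise RuntimeError("background tab limit reached")
        self.tabs.append(url)
        return len(self.tabs) - 1

    def list_tabs(self):
        return [
            {"index": i, "url": url, "main": i == 0}
            for i, url in enumerate(self.tabs)
        ]

    def close(self, index):
        if index == 0:
            raise ValueError("tab 0 is protected")
        if not 0 < index < len(self.tabs):
            raise IndexError(f"no tab at index {index}")
        return self.tabs.pop(index)
```

Raising an error on `close(0)` gives the LLM a clear message to read in its next step instead of silently losing the main tab.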

API & Network

CaptureAPIResponseTool - Capture JSON API responses during navigate/click (structured data extraction)

Advanced Tools

ScreenshotTool - Capture screenshots
CloudflareBypassTool - Handle anti-bot challenges automatically
GetElementPositionTool - Get element coordinates for verification

3. Caching Infrastructure

Two-tier cache system providing 500-2000x speedup on cached content:

L1 Cache (Redis): Ultra-fast in-memory caching (10ms lookups)
- 512MB with LRU eviction
- Complete response storage with formatted output
- 90-day selector performance memory
- Automatic cleanup every 30 minutes
L2 Cache (DuckDB SQL Database): Persistent disk-based analytical database
- Embedded SQL database optimized for analytics (similar to SQLite)
- Survives container restarts
- HTTP service with 5-connection pool
- SQL tables: cached_pages, cached_elements, cache_metrics
- Permanent selector analytics with success/fail tracking
- Transaction management with conn.commit() for data persistence
- Database file: /mnt/ssd/podman/duckdb-data/cache.db
Tiered Lookup Flow:
1. Check L1 (Redis) → Return if hit (~10ms)
2. Check L2 (DuckDB) → Promote to L1 if hit (~95ms)
3. Extract from browser if both miss (5-20 seconds)
Smart Caching Strategies:
- Selective L2 persistence: Universal extractions, large content (>10KB), or long-lived (≥1 hour TTL)
- Content-based TTL: Structural elements (24h), text selectors (30min), dynamic content (no cache)
- Cache key normalization: Maximizes hits by removing query params and normalizing URLs
- Auto-promotion: L2 hits automatically promoted to L1 for future fast access
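
The tiered lookup and key normalization above can be sketched with plain Python containers. In this toy version an `OrderedDict` stands in for Redis and a plain dict stands in for DuckDB; both are assumptions made so the example runs anywhere. Unlike the real system, it persists everything to L2 and ignores TTLs.

```python
from collections import OrderedDict
from urllib.parse import urlsplit, urlunsplit


def normalize_key(url):
    """Lower-case scheme and host, drop query string and fragment."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, "", ""))


class TwoTierCache:
    def __init__(self, l1_capacity=2):
        self.l1 = OrderedDict()   # stand-in for Redis with LRU eviction
        self.l2 = {}              # stand-in for the DuckDB store
        self.l1_capacity = l1_capacity

    def _put_l1(self, key, value):
        self.l1[key] = value
        self.l1.move_to_end(key)
        while len(self.l1) > self.l1_capacity:
            self.l1.popitem(last=False)  # evict least recently used

    def get(self, url, extract):
        """Return (value, source) where source is 'l1', 'l2' or 'browser'."""
        key = normalize_key(url)
        if key in self.l1:
            self.l1.move_to_end(key)
            return self.l1[key], "l1"
        if key in self.l2:
            self._put_l1(key, self.l2[key])  # promote to L1
            return self.l2[key], "l2"
        value = extract(url)                 # slow path: real extraction
        self._put_l1(key, value)
        self.l2[key] = value
        return value, "browser"
```

Note the trade-off in `normalize_key`: stripping query parameters raises the hit rate, but two URLs that differ only in their query string will share one cache entry.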

4. Control Panel

Web UI at localhost:8888 for chat and system monitoring
Chat interface with localStorage persistence
Tabbed interface: Chat | VNC Browser | Logs
Live logs from zendriver and llama-cpp-server containers
Container controls: Start, stop, reset services

Key Technical Achievements

Robust extraction system:
- Multiple strategies for text extraction
- Metadata extraction capabilities
- Support for both visible and hidden text
Workflows:
- Publication analysis with structure detection
- Parallel operations with proper error handling
- Retry strategies with exponential backoff
- Performance tracking for selectors
Human-like interactions:
- Typing with configurable delays between keystrokes
- Smooth scrolling options
- Tab navigation for accessibility
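
Retrying with exponential backoff is a standard pattern, sketched below in its simplest form. The default attempt count, base delay, and growth factor are illustrative assumptions, not pebkac's actual configuration.

```python
import time


def retry_with_backoff(action, attempts=4, base_delay=0.5, factor=2.0,
                       sleep=time.sleep):
    """Call action() until it succeeds, waiting longer after each failure."""
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise              # out of attempts: surface the last error
            sleep(delay)
            delay *= factor
```

With the defaults, the waits between attempts are 0.5, 1.0, and 2.0 seconds, which gives a slow page time to settle without hammering it.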

Production Features

Reliability

Health checks on all services
Automatic restart policies
Error tracking and logging
Cache fallbacks if extraction fails

Performance

GPU-accelerated browser rendering
Parallel extraction capabilities
Intelligent caching to reduce re-scraping
Optimised LLM inference with batching

Security & Stealth

Undetectable by anti-bot systems (Cloudflare, etc.)
Persistent browser profiles
No automation markers
Real browser behavior simulation

Scalability

Docker/Podman-based deployment
Network isolation with custom subnet
Volume management for persistent data
Profile-based configuration

Use Cases Enabled

Autonomous Web Research: LLM can browse, search, and compile information
Data Extraction: Scrape any website, even those with anti-bot protection
Form Automation: Fill out complex forms with AI decision-making
Content Monitoring: Track changes on websites over time
Testing Automation: AI-driven testing of web applications
Information Synthesis: Combine data from multiple sources automatically

CREDITS:

Big thanks to everyone at HuggingFace https://github.com/huggingface/smolagents / https://huggingface.co

cdpdriver for Zendriver: https://github.com/cdpdriver/zendriver / https://zendriver.dev/

adbar for Trafilatura: https://github.com/adbar/trafilatura / https://trafilatura.readthedocs.io/en/latest/

This project is licensed under the GNU General Public License v3 (GPLv3).

Support me at: https://ko-fi.com/dredgesta www.akickintheteeth.com

BTC: bc1qdl5q2qwupxnqgr6u8r3nvppvzr8k889f4k3j7n

