Results for “benchmark”

54 packages found

Works with: Claude

@codeguilds-knight Community

Drawdown-first portfolio tool with a read-only MCP addon for Claude — a deterministic core computes every number; the AI

0 v2.12.0

claude, cursor, windsurf, cline

Agent: WindowsAgentArena

@microsoft ✓ Official

Windows Agent Arena (WAA) 🪟 is a scalable OS platform for testing and benchmarking of multi-modal AI agents.

0 v0.0.4

claude

Agent: ai-agents-reality-check

@codeguilds-knight Community

Benchmarking the gap between AI agent hype and architecture. Three agent archetypes, 73-point performance spread, stress

0 v1.0.0

claude

Skill: coder_eval

@codeguilds-knight Community

Evaluate & benchmark AI coding agents and Claude Code skills — sandboxed, reproducible YAML eval suites for Claude Code,

0 v0.8.10

claude

Skill: gtm-flywheel

@codeguilds-knight Community

15 AI-powered GTM skills for Claude Code. Campaign-tested frameworks for cold email, ICP research, signal scoring, campa

0 v1.0.0

claude

MCP Server: tactual

@codeguilds-knight Community

Screen-reader navigation cost analyzer — models the real effort to discover, reach, and operate interactive web content

0 v1.0.0

claude, cursor, windsurf, cline

Skill: agentpack

@codeguilds-knight Community

Local context engine for AI coding agents. Routes tasks to relevant files, tests, rules, and skills, supports prompt cac

0 v0.3.25

claude

MCP Server: podium-mcp

@codeguilds-knight Community

One MCP server, 51 tools for AI agents on mobile + canvas UIs: iOS & Android automation, Maestro E2E, evidenced assertio

0 v0.4.0

claude, cursor, windsurf, cline

MCP Server: pdf-reader-mcp

@codeguilds-knight Community

📄 Production-ready MCP server for PDF processing - 5-10x faster with parallel processing and 94%+ test coverage

0 v2.6.0

claude, cursor, windsurf, cline

MCP Server: turbomcpstudio

@codeguilds-knight Community

A native desktop application for developing, testing, and debugging Model Context Protocol servers.

0 v0.1.0

claude, cursor, windsurf, cline

MCP Server: misterdev

@codeguilds-knight Community

Autonomous LLM build orchestrator: plans a goal into tasks, edits code with anchored SEARCH/REPLACE, and verifies every

0 v0.3.1

claude, cursor, windsurf, cline

Skill: context-kernel

@codeguilds-knight Community

Task-induced context normalization for coding agents — a native Claude Code plugin. The task induces a projection; opera

0 v1.34.0

claude

Agent: Awesome-LLM-Reasoning-with-NeSy

@codeguilds-knight Community

✨✨Latest Advances on Neuro-Symbolic Learning in the era of Large Language Models

0 v1.0.0

claude

MCP Server: inkcheck

@codeguilds-knight Community

CI for ink stories — compile checks, exhaustive branch playtesting, dead-content detection. MCP server + CLI.

0 v0.7.2

claude, cursor, windsurf, cline

Skill: awesome-skills

@codeguilds-knight Community

A curated system of production-ready Claude Code skills with quantitative evaluation reports, golden test fixtures, and

0 v1.0.0

claude

MCP Server: FixMap

@codeguilds-knight Community

Local-first repo maps for coding agents—ranked files, test routes, risks, CLI/MCP/GitHub Action, and public GitHub URLs.

0 v0.7.0

claude, cursor, windsurf, cline

Agent: claude-autonomous-deployment

@codeguilds-knight Community

Claude 2026: The Developer’s Smartest AI for Long-Context Code & Writing

0 v1.0.0

claude

Agent: autodev-studio

@codeguilds-knight Community

Autonomous multi-agent SDLC harness: describe a feature in plain English and AI agents scope, code, test, review, and op

0 v1.0.0

claude

MCP Server: baton

@codeguilds-knight Community

Convert AI coding sessions between Claude Code, Codex, OpenCode, Zed, Aider, Gemini CLI, Cursor, Cline & more — lossless

0 v0.4.0

claude, cursor, windsurf, cline

MCP Server: Scrapling

@codeguilds-knight Community

🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!

0 v0.4.9

claude, cursor, windsurf, cline

MCP Server: Kryfto

@codeguilds-knight Community

The open-source web-browsing backend for AI agents & workflow engines. Ships a 42-tool MCP server for Claude Code/Cursor

0 v3.8.0

claude, cursor, windsurf, cline

Agent: pydantic-deepagents

@codeguilds-knight Community

Build Claude Code–style deep agents in Python: tool-calling, sandboxed execution, multi-agent teams, skills, checkpoints

0 v0.3.29

claude

MCP Server: quantoracle

@codeguilds-knight Community

63 deterministic quant computation tools for autonomous financial agents. Options, derivatives, risk, portfolio, statist

0 v1.0.0

claude, cursor, windsurf, cline

MCP Server: empathy-framework

@codeguilds-knight Community

Combining a five-level AI framework with git-native memory overcomes session amnesia, enabling anticipation of problems

0 v1.0.0

claude, cursor, windsurf, cline

Skill: medsci-skills

@codeguilds-knight Community

Claude Code skills for medical research — literature search, reporting guidelines, statistical analysis, publication fig

0 v1.0.0

claude

MCP Server: reelier

@codeguilds-knight Community

Agents make claims. Reelier writes receipts — record an agent's tool-call workflow once, replay it deterministically at

0 v1.0.0

claude, cursor, windsurf, cline

Skill: TokenBurner

@codeguilds-knight Community

A Claude Code skill that burns tokens on demand. Stress test, inflate metrics, or just set money on fire.

0 v1.0.0

claude

MCP Server: flox

@codeguilds-knight Community

AI-native framework for building trading systems with polyglot bindings.

0 v1.0.0

claude, cursor, windsurf, cline

MCP Server: tarn

@codeguilds-knight Community

CLI-first API testing tool. YAML-defined tests, structured JSON output, built for AI-assisted workflows.

0 v1.0.0

claude, cursor, windsurf, cline

MCP Server: llmprobe

@codeguilds-knight Community

Synthetic monitoring and CI smoke tests for LLM inference endpoints.

0 v1.0.0

claude, cursor, windsurf, cline

Agent: Xantham-system-blueprint

@codeguilds-knight Community

Self-installing personal AI orchestrator. Hand the latest blueprint file to a Claude Code session and it builds a full m

0 v1.0.0

claude

MCP Server: bifrost

@codeguilds-knight Community

Fastest enterprise AI gateway (50x faster than LiteLLM) with adaptive load balancer, cluster mode, guardrails, 1000+ mod

0 vent-v1.4.9-stream-pause-base

claude, cursor, windsurf, cline

MCP Server: awesome-mcp-devtools

@codeguilds-knight Community

A curated list of developer tools, SDKs, libraries, and testing utilities for Model Context Protocol (MCP) server develo

0 v1.0.0

claude, cursor, windsurf, cline

Agent: ConcoLLMic

@codeguilds-knight Community

ConcoLLMic: the first language- and theory-agnostic concolic execution engine via LLM agents

0 v1.0.0

claude

Agent: wide-lens-engineering

@codeguilds-knight Community

Opt-in Codex Skill for practical coding and externally anchored delivery with elastic agent teams, task DAGs, isolated c

0 v0.1.0

claude

MCP Server: eval-view

@codeguilds-knight Community

Regression testing for AI agents. Snapshot behavior, diff tool calls, catch regressions in CI. Works with LangGraph, CrewA

0 v1.0.0

claude, cursor, windsurf, cline

MCP Server: kotlin-mcp-server

@codeguilds-knight Community

🧠 Kotlin MCP Server for Android app development using OpenAI, Gemini, or OpenRouter. Enables AI-assisted coding via Aid

0 v1.0.0

claude, cursor, windsurf, cline

MCP Server: thoughtproof-mcp

@codeguilds-knight Community

Adversarial multi-model reasoning verification MCP server for AI agents. Claude, Grok, and DeepSeek challenge each decis

0 v1.0.0

claude, cursor, windsurf, cline

Agent: agents-md-cookbook

@Taiizor

The tested, tool-agnostic AGENTS.md kit — verified templates, a CI linter, and migrators from .cursorrules/CLAUDE.md/Cop

0 v1.0.0

claude

Agent: super-smoke-test

@codeguilds-knight Community

Automated Claude Code QA gate for AI-assisted development — Codex code review + Playwright browser smoke testing before

0 v1.0.0

claude

Agent: multi-agent-ralph-loop

@codeguilds-knight Community

Autonomous orchestration framework for Claude Code with MemPalace-inspired memory (4-layer stack, 818-token wake-up), pa

0 v2.14.0

claude

Skill: catalyst

@codeguilds-knight Community

High-performance Rust hooks for Claude Code skill auto-activation. ~2ms startup, zero dependencies, production-tested pa

0 v1.0.0

claude

Skill: paper-writing-skill

@codeguilds-knight Community

A Claude Code skill that encodes battle-tested editorial principles, section-specific rhetorical moves, and a structured

0 v2.0

claude

Skill: six-sigma-in-r-skill

@codeguilds-knight Community

AI skill for Claude Code and Codex that helps agents write correct R for Six Sigma and SPC work, including control chart

0 v1.0.0

claude

MCP Server: promptspeak-mcp-server

@codeguilds-knight Community

Pre-execution governance for AI agents. Sub-millisecond tool call validation, drift detection, circuit breakers, human-i

0 v1.0.0

claude, cursor, windsurf, cline

MCP Server: quokkapix-mcp

@codeguilds-knight Community

Private browser image workflows for AI agents via MCP

0 v0.3.6

claude, cursor, windsurf, cline

MCP Server: presence

@codeguilds-knight Community

Per-repo memory, outcome telemetry, and a calibrated-confidence gate for Claude Code, with MCP and AGENTS.md projections

0 v0.7.1

claude, cursor, windsurf, cline

MCP Server: skill-backtesting-arena

@codeguilds-knight Community

AI agent skill package for Backtesting Arena — daily Bitcoin/crypto cycle scoring + on-demand backtests via Public, REST

0 v1.2.0

claude, cursor, windsurf, cline

MCP Server: flutter-skill

@codeguilds-knight Community

AI-powered E2E testing for 10 platforms. 253 MCP tools. Zero config. Works with Claude, Cursor, Windsurf, Copilot. Test

0 v0.9.34

claude, cursor, windsurf, cline

Skill: eval-layer

@erezweinstein5

A Claude Code skill that adds a rubric-based eval layer to any agent project. Framework-agnostic — generates rubric, tes

0 v1.0.0

claude