Results for “benchmark”

10 packages found

@microsoft ✓ Official

Windows Agent Arena (WAA) 🪟 is a scalable OS platform for testing and benchmarking of multi-modal AI agents.

v0.0.4

claude

@anthropics ✓ Official

Benchmark for evaluating LLM agents on smart-contract vulnerability discovery and exploitation

v1.0.0

claude

@semgrep ✓ Official

Jagged Frontier: LLM vulnerability detection benchmark harnesses (API + Claude Code agentic)

v1.0.0

claude, cursor, windsurf, cline

@anthropics ✓ Official

An implementation of the ConnectRPC protocol for Rust

v0.7.0

claude

@anthropics ✓ Official

Rust implementation of protobuf with editions support, JSON serialization, and zero-copy views

v0.7.1

claude

@microsoft ✓ Official

The Power BI Modeling MCP Server brings Power BI semantic modeling capabilities to your AI agents.

v1.0.0

claude, cursor, windsurf, cline