Results for “benchmark”

175 packages found


Agentawesome-gpt-5.6-usecases

@codeguilds-knightCommunity

Source-backed GPT-5.6 use cases for coding, agents, creative work, integrations, benchmarks, and practical limits.

0v1.0.0Compare

claude

AgentICLR2025-Papers-with-Code

@codeguilds-knightCommunity

A collection of ICLR papers and open-source projects from past years, covering ICLR2021, ICLR2022, ICLR2023, ICLR2024, and ICLR2025.

0v1.0.0Compare

claude

Agentawesome-agent-evolution

@codeguilds-knightCommunity

Open survey and evidence map for AI agent evolution, self-evolving agents, memory, skills, harnesses, benchmarks, and ag

0v1.0.0Compare

claude

AgentAwesome-LLMs-ICLR-24

@codeguilds-knightCommunity

It is a comprehensive resource hub compiling all LLM papers accepted at the International Conference on Learning Represe

0v1.0.0Compare

claude

AgentLLM-Agents-Papers

@codeguilds-knightCommunity

A repo lists papers related to LLM based agent

0v1.0.0Compare

claude

AgentAwesome-OpenClaw-Papers

@codeguilds-knightCommunity

Official companion repository for our survey "A Survey of the OpenClaw Ecosystem: From Platform Extensibility to Constra

0v1.0.0Compare

claude

Agentclaude-flows

@codeguilds-knightCommunity

🌊 The leading agent orchestration platform for Claude. Deploy intelligent multi-agent swarms, coordinate autonomous wor

0v1.0.0Compare

claude

AgentAwesome-AI-For-Security

@codeguilds-knightCommunity

A curated list of tools, papers, and datasets for applying AI to cybersecurity tasks. This list primarily focuses on mod

0v1.0.0Compare

claude

AgentDecryptPrompt

@codeguilds-knightCommunity

A summary of Prompt & LLM papers, open-source data & models, and AIGC applications

0v1.0.0Compare

claude

AgentAwesome-GUI-Agent

@codeguilds-knightCommunity

💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.

0v1.0.0Compare

claude

AgentDM-Code-Agent

@codeguilds-knightCommunity

Lightweight, auditable Python code agent (~1500 LOC) — ReAct + Planner + Reflexion + Hybrid RAG, with SWE-bench Lite e

0v1.0.0Compare

claude

Agentlumen

@codeguilds-knightCommunity

Save 30% token costs when using Claude Code, Codex, OpenCode for free - with open source, local semantic search. Works f

0v0.0.41Compare

claude

Agentawesome-generative-ai

@codeguilds-knightCommunity

A curated list of Generative AI tools, works, models, and references

0v1.0.0Compare

claude

AgentWindowsAgentArena

@microsoft✓ Official

Windows Agent Arena (WAA) 🪟 is a scalable OS platform for testing and benchmarking of multi-modal AI agents.

0v0.0.4Compare

claude

Agentawesome-ai-tokenomics

@codeguilds-knightCommunity

A map of what AI tokens actually cost, and where they're wasted vs. well spent. Tools, research, practices, and copy-pas

0v1.0.0Compare

claude

Agentchinese-llm-benchmark

@codeguilds-knightCommunity

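NoneLinear (非线智能) - ReLE evaluation: a capability benchmark for Chinese AI large models (continuously updated). Currently covers 374 large models, including chatgpt, gpt-5.4, Google gemini-3.1-pro, Claude-4.6, Wenxin ERNIE-X1.1, ERNIE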

0v5.10Compare

claude

AgentAwesome-Graphs-Meet-Agents

@codeguilds-knightCommunity

[Up-to-date] A curated list of resources on graph-empowered agents and agent-facilitated graph learning (Graphs Meet Age

0v1.0.0Compare

claude

Agentlocal-deep-research

@codeguilds-knightCommunity

~95% on SimpleQA (e.g. Qwen3.6-27B on a 3090). Supports all local and cloud LLMs (llama.cpp, Ollama, Google, ...). 10+

0v1.7.0Compare

claude

AgentGTA

@codeguilds-knightCommunity

[NeurIPS 2024 D&B] GTA: A Benchmark for General Tool Agents & [arXiv 2026] GTA-2

0v0.2.0Compare

claude

AgentDeep-Research-Survey

@codeguilds-knightCommunity

A Systematic Survey of Deep Research

0v1.0.0Compare

claude

AgentHealthFlow

@codeguilds-knightCommunity

HealthFlow: Automating electronic health record analysis via a strategically self-evolving multi-agent framework

0vdatasetsCompare

claude

Agentchrome-cdp-ex

@codeguilds-knightCommunity

Give your AI agent eyes and hands on your real Chrome browser — your tabs, your logins, your page state. 42 commands, ze

0v2.14.0Compare

claude

AgentSDYJ_Multi_Agents

@codeguilds-knightCommunity

A LangGraph-powered multi-agent deep research system featuring task planning, human-in-the-loop review, multi-source ret

0v1.0.0Compare

claude

Agentllm-srbench

@codeguilds-knightCommunity

[ICML2025 Oral] LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models

0v1.0.0Compare

claude

AgentLLMCompiler

@codeguilds-knightCommunity

[ICML 2024] LLMCompiler: An LLM Compiler for Parallel Function Calling

0v1.0.0Compare

claude

Agentlazarus

@codeguilds-knightCommunity

An agent that takes a dead research repo and turns it into a callable pipeline component.

0v0.3.0Compare

claude

Agentai-agents-reality-check

@codeguilds-knightCommunity

Benchmarking the gap between AI agent hype and architecture. Three agent archetypes, 73-point performance spread, stress

0v1.0.0Compare

claude

Agentmassw

@codeguilds-knightCommunity

MASSW is a comprehensive text dataset on Multi-Aspect Summarization of Scientific Workflows. MASSW includes more than 15

0v1.0.0Compare

claude

AgentAwesome-LLM-in-Social-Science

@codeguilds-knightCommunity

Awesome papers involving LLMs in Social Science.

0v1.0.0Compare

claude

AgentWebCanvas

@codeguilds-knightCommunity

All-in-one Web Agent framework for post-training. Start building with a few clicks!

0v1.0.0Compare

claude

AgentOpen-AgentRL

@codeguilds-knightCommunity

RLAnything (ICML 2026) & AutoTool (ICML 2026), DemyAgent: Open-Source RL for LLMs and Agentic Scenarios

0v1.0.0Compare

claude

AgentAwesome-LLM-Papers-Comprehensive-Topics

@codeguilds-knightCommunity

Awesome LLM Papers and repos on very comprehensive topics.

0vreadabilityCompare

claude

Agentclaude-code-eco

@codeguilds-knightCommunity

Eco mode for Claude Code. /eco: -31% to -73% output tokens with critical findings intact; /eco-max: up to -75% with lowe

0v1.1.4Compare

claude

Agentawesome-claude-fable-5-prompt-vault

@codeguilds-knightCommunity

Ultimate Claude Fable 5 Guide 2026: Use Cases, Integrations & Benchmarks

0v1.0.0Compare

claude

Agentgovernor

@codeguilds-knightCommunity

Claude Code usage governor: compact professional output, context slimming, tool-output filtering, telemetry, and drift g

0v0.2.2Compare

claude

Agentgreppy

@codeguilds-knightCommunity

Local code navigation for coding agents: deterministic symbol graph, semantic search, compact briefings, and byte-exact

0v0.2.1Compare

claude

AgentxLAM

@codeguilds-knightCommunity

xLAM: A Family of Large Action Models to Empower AI Agent Systems

0v1.0.0Compare

claude

Agentclaude-leverage

@codeguilds-knightCommunity

Make any repo AI-first - write sustainable code from the start, or refactor a legacy codebase to prepare it for agent-dr

0v1.0.0Compare

claude

Agentai-agents-design-patterns

@codeguilds-knightCommunity

20 runnable LLM agent design patterns in Python, benchmarked, traced, offline-compatible. ReAct, multi-agent, constituti

0v1.0.0Compare

claude

Agentawesome-opus5-use-cases

@codeguilds-knightCommunity

Source-backed Claude Opus 5 use cases, workflows, model comparisons, benchmarks, integrations, costs, and limitations.

0v1.0.0Compare

claude

Agentvideogui

@codeguilds-knightCommunity

[NeurIPS 2024 D&B] VideoGUI: A Benchmark for GUI Automation from Instructional Videos

0v1.0.0Compare

claude

Agentskill-receipts

@codeguilds-knightCommunity

Agent skills for Claude Code where every entry ships with receipts: accuracy-gated benchmarks vs baseline AND placebo. R

0v1.0.0Compare

claude

AgentGenericAgent

@codeguilds-knightCommunity

Self-evolving agent: grows skill tree from 3.3K-line seed, achieving full system control with 6x less token consumption

0v0.1.0Compare

claude

AgentPhysicianBench

@codeguilds-knightCommunity

The benchmark tasks and evaluation harness for "PhysicianBench: Evaluating LLM Agents in Real-World EHR Environments".

0v1.0.0Compare

claude

Agentagentsys

@codeguilds-knightCommunity

AI writes code. This automates everything else · 24 plugins · 49 agents · 44 skills · for Claude Code, OpenCode, Codex,

0v6.0.0Compare

claude

AgentLLM-SR

@codeguilds-knightCommunity

[ICLR 2025 Oral] This is the official repo for the paper "LLM-SR" on Scientific Equation Discovery and Symbolic Regressi

0v1.0.0Compare

claude

Agentpfi

@codeguilds-knightCommunity

PFI: Prompt Flow Integrity to Prevent Privilege Escalation in LLM Agents

0v1.0.0Compare

claude

AgentZengram

@codeguilds-knightCommunity

A Multi Agent Memory MCP That Connect Agents Across Systems and Machines

0v4.3.0Compare

claude

AgentAgent_Memory_Techniques

@codeguilds-knightCommunity

Agent memory for LLMs: 30 runnable Jupyter notebooks covering conversation buffers, vector stores, knowledge graphs, epi

0v1.0.0Compare

claude

AgentYunjue-Agent

@codeguilds-knightCommunity

Yunjue Agent: A Fully Reproducible, Zero-Start In-Situ Self-Evolving Agent System for Open-Ended Tasks

0v1.0.0Compare

claude