Results for “dataset”

45 packages found

Works with: Claude

Agent MIRAI

@codeguilds-knight Community

Code and Data for "MIRAI: Evaluating LLM Agents for Event Forecasting"

0 v1.0.0

claude

Agent llm-srbench

@codeguilds-knight Community

[ICML2025 Oral] LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models

0 v1.0.0

claude

Agent LLM-Agents-Papers

@codeguilds-knight Community

A repo that lists papers related to LLM-based agents

0 v1.0.0

claude

Agent Awesome-LLM-in-Social-Science

@codeguilds-knight Community

Awesome papers involving LLMs in Social Science.

0 v1.0.0

claude

Agent groundingLMM

@codeguilds-knight Community

[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural langu

0 v1.0.0

claude

Agent timechara

@codeguilds-knight Community

🧙🏻 Code and benchmark for our Findings of ACL 2024 paper - "TimeChara: Evaluating Point-in-Time Character Hallucinatio

0 v1.0.0

claude

Agent awesome-generative-ai

@codeguilds-knight Community

A curated list of Generative AI tools, works, models, and references

0 v1.0.0

claude

Agent DecryptPrompt

@codeguilds-knight Community

A summary of Prompt & LLM papers, open-source data & models, and AIGC applications

0 v1.0.0

claude

Agent OpenRCA

@microsoft ✓ Official

[ICLR'25] OpenRCA: Can Large Language Models Locate the Root Cause of Software Failures?

0 v1.0.0

claude

Agent xLAM

@codeguilds-knight Community

xLAM: A Family of Large Action Models to Empower AI Agent Systems

0 v1.0.0

claude

Agent AgentPoison

@codeguilds-knight Community

[NeurIPS 2024] Official implementation for "AgentPoison: Red-teaming LLM Agents via Memory or Knowledge Base Backdoor Po

0 v1.0.0

claude

Agent GTA

@codeguilds-knight Community

[NeurIPS 2024 D&B] GTA: A Benchmark for General Tool Agents & [arXiv 2026] GTA-2

0 v0.2.0

claude

Agent LLM-SR

@codeguilds-knight Community

[ICLR 2025 Oral] This is the official repo for the paper "LLM-SR" on Scientific Equation Discovery and Symbolic Regressi

0 v1.0.0

claude

Agent AgentBench

@codeguilds-knight Community

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)

0 v1.0.0

claude

Agent LLMafia

@codeguilds-knight Community

Asynchronous LLM Agent playing games of Mafia against human players

0 v1.0.0

claude

Agent ChatSim

@codeguilds-knight Community

[CVPR2024 Highlight] Editable Scene Simulation for Autonomous Driving via LLM-Agent Collaboration

0 v1.0.0

claude

Agent VisualAgentBench

@codeguilds-knight Community

Towards Large Multimodal Models as Visual Foundation Agents

0 v1.0.0

claude

Agent Odyssey

@codeguilds-knight Community

Odyssey: Empowering Minecraft Agents with Open-World Skills

0 v1.0.0

claude

Agent code-act

@codeguilds-knight Community

Official Repo for ICML 2024 paper "Executable Code Actions Elicit Better LLM Agents" by Xingyao Wang, Yangyi Chen, Lifan

0 v1.0.0

claude

Agent codeinterpreter-api

@codeguilds-knight Community

👾 Open source implementation of the ChatGPT Code Interpreter

0 v0.1.20

claude

Agent ARIA

@codeguilds-knight Community

Source code for our paper: "ARIA: Training Language Agents with Intention-Driven Reward Aggregation".

0 v1.0.0

claude

Agent Deep-Research-Survey

@codeguilds-knight Community

A Systematic Survey of Deep Research

0 v1.0.0

claude

Agent Awesome-LLM-Papers-Comprehensive-Topics

@codeguilds-knight Community

Awesome LLM Papers and repos on very comprehensive topics.

0 vreadability

claude

Agent ml-dev-bench

@codeguilds-knight Community

ML-Dev-Bench is a benchmark for evaluating AI agents against various ML development tasks.

0 v0.1.0

claude

MCP Server Hugging Face MCP Server

Community

Search and explore Hugging Face models, datasets, Spaces, and documentation

0 v0.3.20 3 months ago

claude cursor windsurf cline

Agent phoenix

@codeguilds-knight Community

AI Observability & Evaluation

0 varize-phoenix-v17.6.0

claude

Agent awesome-ai-tools

@codeguilds-knight Community

🔴 VERY LARGE AI TOOL LIST! 🔴 Curated list of AI Tools - Updated 2026

0 v1.0.0

claude

Agent AReaL

@codeguilds-knight Community

The RL Bridge for LLM-based Agent Applications. Made Simple & Flexible.

0 v1.0.4

claude

Agent VideoGLaMM

@codeguilds-knight Community

[CVPR 2025 🔥] A Large Multimodal Model for Pixel-Level Visual Grounding in Videos

0 v1.0.0

claude

Agent Awesome-Graphs-Meet-Agents

@codeguilds-knight Community

[Up-to-date] A curated list of resources on graph-empowered agents and agent-facilitated graph learning (Graphs Meet Age

0 v1.0.0

claude

Agent Yunjue-Agent

@codeguilds-knight Community

Yunjue Agent: A Fully Reproducible, Zero-Start In-Situ Self-Evolving Agent System for Open-Ended Tasks

0 v1.0.0

claude

Agent scAgent

@codeguilds-knight Community

scAgent: No-code single-cell analysis for every biologist

0 v1.0.0

claude

Agent sotopia

@codeguilds-knight Community

Sotopia: an Open-ended Social Learning Environment (ICLR 2024 spotlight)

0 v0.1.5

claude

Agent cactus

@codeguilds-knight Community

LLM Agent that leverages cheminformatics tools to provide informed responses.

0 v1.0.0

claude

Agent MedAgents

@codeguilds-knight Community

[ACL 2024 Findings] MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning https://arxiv.org/

0 v1.0.0

claude

Agent CodeGym

@codeguilds-knight Community

[ICLR2026] The official repository for the CodeGym project: "Generalizable End-to-End Tool-Use RL with Synthetic CodeGym

0 v1.0.0

claude

Agent distrl-open

@codeguilds-knight Community

DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agents

0 v1.0.0

claude

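Agent chinese-llm-benchmark

@codeguilds-knight Community

NoneLinear (非线智能) - ReLE evaluation: capability evaluation of Chinese AI large models (continuously updated): currently covers 374 large models, including chatgpt, gpt-5.4, Google gemini-3.1-pro, Claude-4.6, Wenxin ERNIE-X1.1, ERNIE

0 v5.10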

claude

Agent PersonalGPT

@codeguilds-knight Community

Your own GPT-powered Personal Assistant to whom you can ORDER or INSTRUCT to do some task or search for something using

0 v1.0.0

claude

Agent DataParasite

@codeguilds-knight Community

A simple yet versatile context engineered for scalable online data collection

0 v1.0.0

claude

Agent AIlice

@codeguilds-knight Community

AIlice is a fully autonomous, general-purpose AI agent.

0 v0.2.0-alpha

claude

Agent MR-Video

@codeguilds-knight Community

MR. Video: MapReduce is the Principle for Long Video Understanding

0 v1.0.0

claude

Agent RepairAgent

@codeguilds-knight Community

RepairAgent is an autonomous LLM-based agent for software repair.

0 v1.0.0

claude

Agent albert-code

@codeguilds-knight Community

Sovereign agentic coding bundle: OpenCode + agent-vm + Albert API + State (État) skills + MCP

0 v1.0.0

claude

Agent llm-rl-environments-lil-course

@codeguilds-knight Community

🌱 A little course on Reinforcement Learning Environments for evaluating and training Language Models

0 v1.0.0

claude