Claude Code skill for benchmark research. Survey papers to find datasets, metrics, and evaluation protocols used in a re
benchmark-research-skill is a Claude Code style skill for benchmark research.
It is built for two jobs: single-paper benchmark analysis and direction (topic) benchmark surveys.
The design is deliberately simple:
Given an arXiv ID or PDF, the skill will:
benchmarks.json

Given a topic, the skill will:
benchmark-research-skill/
  SKILL.md
  config.example.yaml
  requirements.txt
  scripts/
    config_utils.py
    fetch_context.py
    search_papers.py
    collect_links.py
    generate_report.py
Only four scripts are part of the public workflow:
- fetch_context.py
- search_papers.py
- collect_links.py
- generate_report.py

config_utils.py is only an internal helper for config-based path resolution.
pip install -r requirements.txt
Copy config.example.yaml to config.yaml, then set a real vault_path.
language: "zh"
vault_path: "E:/Your/Obsidian/Vault"
benchmark_survey:
  workspace_dir: "Benchmark_Surveys"
  papers_dir_name: "papers"
  reports_dir_name: "reports"
  survey_runs_dir_name: "surveys"
  enable_collect_links: false
These are the fields that matter most:
| Field | Used For | Important? |
|---|---|---|
| vault_path | root save location | yes |
| workspace_dir | benchmark workspace root | yes |
| papers_dir_name | single-paper evidence folder | yes |
| reports_dir_name | final Markdown reports | yes |
| survey_runs_dir_name | topic survey runs | yes |
| enable_collect_links | whether to run network link search by default | yes |
| language | output preference / future template behavior | optional |
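As a rough illustration of the table above, the following sketch checks a config that has already been loaded into a Python dict and reports which of the "yes" fields are missing. The function name `missing_fields` is hypothetical, and the nesting of the directory fields under `benchmark_survey` is an assumption taken from the sample config; this is not the skill's own validation code.

```python
from typing import Any

# Field names come from the table above. Placing the directory fields
# under "benchmark_survey" is an assumption based on the sample config.
REQUIRED_TOP_LEVEL = ["vault_path"]
REQUIRED_SURVEY = [
    "workspace_dir",
    "papers_dir_name",
    "reports_dir_name",
    "survey_runs_dir_name",
    "enable_collect_links",
]


def missing_fields(config: dict[str, Any]) -> list[str]:
    """Return dotted names of required fields that are absent or empty."""
    missing = [k for k in REQUIRED_TOP_LEVEL if config.get(k) in (None, "")]
    survey = config.get("benchmark_survey") or {}
    missing += [
        f"benchmark_survey.{k}"
        for k in REQUIRED_SURVEY
        if survey.get(k) in (None, "")  # False is a valid value, not "missing"
    ]
    return missing


if __name__ == "__main__":
    partial = {"vault_path": "", "benchmark_survey": {"workspace_dir": "Benchmark_Surveys"}}
    print(missing_fields(partial))
```

Reading config.yaml into the dict is left out here, since it depends on whichever YAML parser the skill installs.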
The research_domains fields below are only used by search_papers.py, mostly for direction survey mode.
research_domains:
  "your-topic":
    keywords:
      - "main phrase"
      - "synonym"
      - "related task phrase"
    arxiv_categories:
      - "cs.AI"
      - "cs.LG"
    excluded_keywords:
      - "workshop"
      - "survey"
      - "review"
If you mostly use single-paper analysis, this section is not important.
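To make the roles of keywords and excluded_keywords concrete, here is a minimal sketch of one way such lists could filter paper titles. The function `matches_domain` is hypothetical, and case-insensitive substring matching is an assumption; search_papers.py may match and rank papers differently.

```python
def matches_domain(text: str, keywords: list[str], excluded_keywords: list[str]) -> bool:
    """True if text contains at least one keyword and no excluded keyword.

    Case-insensitive substring matching is an assumption made for this
    sketch, not the documented behavior of search_papers.py.
    """
    lowered = text.lower()
    # Any excluded keyword rejects the paper outright.
    if any(bad.lower() in lowered for bad in excluded_keywords):
        return False
    # Otherwise a single keyword hit is enough to keep it.
    return any(kw.lower() in lowered for kw in keywords)


if __name__ == "__main__":
    keywords = ["main phrase", "synonym"]
    excluded = ["workshop", "survey", "review"]
    print(matches_domain("A Main Phrase Approach", keywords, excluded))
    print(matches_domain("A Survey of Main Phrase Methods", keywords, excluded))
```

Note that plain substring matching is coarse: an excluded keyword such as "review" would also reject a title containing "preview".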
For paper 2503.10522, the config above resolves to:
Path/to/Your/Obsidian/Vault/
  Benchmark_Surveys/
    papers/
      2503.10522/
        context.json
        benchmarks.json
        links.json
        assets/
    reports/
      2503.10522.md
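The mapping from config to paths can be sketched as follows. This is only an illustration: `paper_paths` is a hypothetical name (the real logic lives in config_utils.py, whose interface is not documented here), and placing reports/ directly under the workspace directory is an assumption read off the tree above.

```python
from pathlib import PurePosixPath


def paper_paths(config: dict, paper_id: str) -> dict:
    """Build the single-paper output paths from a loaded config dict.

    Assumes reports/ sits next to papers/ under the workspace directory.
    PurePosixPath keeps forward slashes regardless of the host OS.
    """
    survey = config["benchmark_survey"]
    workspace = PurePosixPath(config["vault_path"]) / survey["workspace_dir"]
    paper_dir = workspace / survey["papers_dir_name"] / paper_id
    return {
        "context": paper_dir / "context.json",
        "benchmarks": paper_dir / "benchmarks.json",
        "links": paper_dir / "links.json",
        "assets": paper_dir / "assets",
        "report": workspace / survey["reports_dir_name"] / f"{paper_id}.md",
    }


if __name__ == "__main__":
    config = {
        "vault_path": "E:/Your/Obsidian/Vault",
        "benchmark_survey": {
            "workspace_dir": "Benchmark_Surveys",
            "papers_dir_name": "papers",
            "reports_dir_name": "reports",
        },
    }
    for name, path in paper_paths(config, "2503.10522").items():
        print(f"{name}: {path}")
```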
For topic text-to-audio, the config above resolves to:
Path/to/Your/Obsidian/Vault/
  Benchmark_Surveys/
    surveys/
      YYYY-MM-DD-text-to-audio/
        papers.json
        links.json
        reports/
          benchmark-survey.md
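The run folder name above combines a date with a topic slug. A minimal sketch of how such a name could be derived, assuming an ISO date prefix and a lowercase ASCII slug; `survey_run_name` is a hypothetical helper, not a function from the skill's scripts.

```python
import re
from datetime import date


def survey_run_name(topic: str, run_date: date) -> str:
    """Return a 'YYYY-MM-DD-slug' folder name for a topic survey run.

    The slug rule (lowercase, runs of non-alphanumerics collapsed to '-')
    is an assumption; the skill may slugify topics differently.
    """
    slug = re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")
    return f"{run_date.isoformat()}-{slug}"


if __name__ == "__main__":
    print(survey_run_name("text-to-audio", date.today()))
```

This ASCII-only rule drops non-Latin characters entirely, so a topic written in Chinese would need a different slug strategy.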
You normally invoke this skill from Claude Code with a natural-language request.
For single-paper analysis, ask Claude Code:
Use benchmark-research-skill to analyze what benchmarks this paper uses: 2503.10522
Or:
Use benchmark-research-skill to read this PDF and tell me the datasets, metrics, baselines, and experiment figures.
For a direction survey, ask Claude Code:
Use benchmark-research-skill to survey benchmarks for text-to-audio generation.
Or:
Use benchmark-research-skill to find what benchmarks this direction uses, follow related work up to two layers, and produce a Markdown report.
A single-paper report follows this outline:

# Benchmark Analysis: [Paper Title]
## Conclusion
## Benchmark Table
| Dataset | Link | Evaluated Ability | Metrics | Compared Methods | Evidence |
|---|---|---|---|---|---|
## Datasets And Metrics
## Related Work / Baselines
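For illustration, the Benchmark Table in the outline above could be rendered from a list of records roughly like this. The record keys are assumed to equal the column headers, which is a simplification: the actual schema of benchmarks.json and the internals of generate_report.py are not described here, and `render_benchmark_table` is a hypothetical name.

```python
# Column headers copied from the single-paper report outline.
COLUMNS = [
    "Dataset",
    "Link",
    "Evaluated Ability",
    "Metrics",
    "Compared Methods",
    "Evidence",
]


def render_benchmark_table(rows: list[dict]) -> str:
    """Render records as a Markdown table; missing keys become empty cells."""
    header = "| " + " | ".join(COLUMNS) + " |"
    divider = "|" + "---|" * len(COLUMNS)
    lines = [header, divider]
    for row in rows:
        # Escape pipes so cell text cannot break the table layout.
        cells = [str(row.get(col, "")).replace("|", "\\|") for col in COLUMNS]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder values only; not taken from any real paper.
    print(render_benchmark_table([{"Dataset": "ExampleSet", "Metrics": "metric-a"}]))
```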
A direction survey report follows this outline:

# Benchmark Survey: [Direction]
## TL;DR
## Basic Metrics
## Benchmark Overview
| Dataset | Link | Suitable Task | Common Metrics | Representative Work |
|---|---|---|---|---|
## Representative Works
## Grouped By Benchmark
## Experiment Figures/Tables
## Related Work Discovery Path