Pre-submission AI review stress-test for research papers. A Claude Code skill: review, verdict, revise, verify.
A pre-submission AI review stress-test for research papers.
Before a reviewer tears it apart, let a jury do it first.
PaperJury turns paper feedback into a closed loop: review → verdict → revise → verify. Instead of taking every AI suggestion at face value, it sorts each issue into one of three outcomes: Fixable, Author-required, or Invalid.
It offers three modes: direct-edit, review, and auto. PaperJury is built for pre-submission self-checking. It does not replace peer review, it does not invent missing experiments, and it keeps research-level decisions with the author.
Interactive overview: the live site (GitHub Pages), or docs/overview.html in-repo.
🚀 2026-06-05: PaperJury's Codex-first port has shipped. Open it here: paperjury-codex.
🧪 Dogfood sample added: this repo now includes a compact dogfood sample with before/after PDFs and a human-verified run report.
PaperJury is a pre-submission self-check workflow. It does not replace the author's scientific judgment, and it does not replace peer review. It should never be used to invent experiments, fabricate results, add unsupported claims, or hide a paper's limitations.
When an issue needs a new experiment, missing evidence, private knowledge, or a research-level decision, PaperJury routes it to the author instead of patching it automatically. The Fixable / Author-required / Invalid outcomes exist precisely so that judgment calls stay with you.
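The routing rule above can be pictured with a minimal sketch. The issue shape (`valid`, `needs`), the need labels, and the function name `routeIssue` are hypothetical, chosen here for illustration only; PaperJury's real schema lives in its own references and may differ.

```javascript
// Hypothetical issue shape: { valid: boolean, needs?: string[] }.
// Needs that, per the README, must go to the author rather than be patched.
const AUTHOR_ONLY_NEEDS = new Set([
  "new-experiment",
  "missing-evidence",
  "private-knowledge",
  "research-decision",
]);

// Sort one adjudicated issue into one of the three outcomes.
function routeIssue(issue) {
  if (!issue.valid) return "invalid";
  const needs = issue.needs ?? [];
  if (needs.some((n) => AUTHOR_ONLY_NEEDS.has(n))) return "author-required";
  return "fixable";
}
```

Under these assumptions, an issue that is valid but needs a new experiment never reaches the automatic patch path; it lands in the author's queue.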
The intended use is to surface avoidable problems earlier, while you can still act on them: unclear claims, weak logical connections, unsupported wording, formatting risks, and the kind of reviewer-style concerns worth checking before submission.
It is a Claude Code skill, installable two ways. For the Codex-first port, use paperjury-codex.
Option A: Claude Code plugin (one command). From inside Claude Code:
/plugin marketplace add u7079256/paperjury
/plugin install paperjury@u7079256
Option B: clone as a skill. Clone the repo into the folder Claude Code reads skills from:
# macOS / Linux
git clone https://github.com/u7079256/paperjury ~/.claude/skills/paperjury
# Windows (PowerShell)
git clone https://github.com/u7079256/paperjury "$env:USERPROFILE\.claude\skills\paperjury"
(or under <project>/.claude/skills/ to scope it to one project). Claude Code auto-discovers it through SKILL.md and it shows up as the paperjury skill. node is required (the deterministic checks run on it); a LaTeX toolchain is optional (the real-compile and layout checks use it, and degrade honestly when it is absent).
For Claude / coding agents: the deep "how to drive this" reference is docs/AGENT-GUIDE.md. It covers install, the three modes and their triggers, the engine pipeline, the auto vs /goal distinction, and how the fan-out launches, and it is written for an agent to read. Curious about the internals? Point Claude at that file and ask.
Most writing tools only push your paper forward: they draft and they polish. None of them argues the other side of your claims the way a reviewer will. PaperJury is built around that gap, in four parts.
… references/review-engine-v3.md).
… full (whole paper) or passage (one section / paragraph / claim).
… /goal (or config mode: auto) to run the review-revise loop unattended toward a verifiable goal.
… /goal context or a project config mode: auto.
… spine and the reviewer assignment up front (the human steps), then the engine applies safe fixes under the bounded-aggressive + edit-safety policy, queues the rest, and runs multiple rounds until it stops: on clerk convergence, or an applied-quiescence / hard-limit backstop. See references/auto-mode.md.
You don't run commands; you say what you want and the skill picks the mode.
Edit one thing (the everyday case → direct-edit):
… <your idea>."
Get the paper critiqued before submission (→ review):
… <the claim you paste>."
Harden it unattended toward a goal (→ auto, needs /goal):
/goal "harden the paper until ledger.js gate passes (0 gate-blocking major)"
… /goal driver: turning on "auto" tool-permission and sending a normal prompt runs one round and stops; it does not loop (see docs/AGENT-GUIDE.md §3).
Make sure it won't get desk-rejected:
Rule of thumb: one change → just say it; want it picked apart → say "review"; want it run unattended → /goal.
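The /goal example targets the ledger gate, which the README describes as "0 gate-blocking active major", with author-required items counting as gate-OK. A minimal sketch of that check follows. The entry fields (`id`, `significance`, `status`) and the status values are assumptions made for illustration; the actual schema is defined in references/ledger-schema.md and is not reproduced here.

```javascript
// Hypothetical ledger entry: { id, significance, status }.
// Assumed statuses: "open" (active), "author-required", "resolved", "invalid".
function gateBlocking(entries) {
  // Only active majors block; author-required majors are gate-OK.
  return entries.filter(
    (e) => e.significance === "major" && e.status === "open"
  );
}

function gate(entries) {
  const blocking = gateBlocking(entries);
  return {
    pass: blocking.length === 0,
    blocking: blocking.map((e) => e.id),
    // Author-required items accumulate for the human instead of blocking.
    humanQueue: entries
      .filter((e) => e.status === "author-required")
      .map((e) => e.id),
  };
}
```

In this reading, an unattended run can finish with a non-empty human queue: the gate only asserts that nothing the engine could still act on remains open at major significance.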
The courtroom engine is assign-reviewers → reading-check → coverage-auditor → merge → {trial ‖ polish} → recall-audit → drafter → {edit-audit | meaning-audit} → clerk. Generation is bounded (N holistic domain reviewers, not a per-(unit × lens) flood); adjudication is routed by contestability; edits are guarded by risk; the multi-round loop converges via a deterministic clerk. The deterministic guards in scripts/ run orchestrator-side via Bash between workflow calls.
decompose: split manuscript into reading units, the canonical section list, and stable passage-ids (which prevent text drift and give jurors local context).
spine (auto only): extract anchors, author confirm, freeze → spine.json.
ledger.js: JSON ledger plus MD view; gate = /goal completion fact (0 gate-blocking active major; author-required is gate-OK and accumulates to the human queue). CLI: init/add/set/count/gate/get/docket/unadjudicated/render.
journal.js: append-only per-edit revert log (JSONL).
apply-patch.js: atomic apply plus journal of a drafted patch, and revert (exact-once guard on before text).
anchor-diff.js: locate frozen anchors; flag which need_audit when the support region changed.
cross-ref.js: edit-safety risk pre-filter: does a changed salient token in a patch appear in other passages?
compile-guard.js: real LaTeX compile (latexmk/pdflatex) or a degraded structural-lint path with compiled:null (it reports when it cannot verify).
compliance-check.js: submission-readiness A: deterministic desk-reject screening.
assign-reviewers: name N subfields, instantiate N domain reviewers from the project gatekeeper core + a generated domain overlay; config-pin / verifier / per-slot degrade headless.
reading-check: N holistic reviewers each read the WHOLE paper once → weaknesses (significance + kind + verbatim quote; a reviewer that cannot quote the source did not read it) + one overall_confidence + a per-section coverage report; targeted re-invoke mode for anti-skim.
coverage-auditor: anti-skim L2: flag skimmed (reviewer, section) pairs across the coverage reports.
merge: semantic dedup across reviewers; the workflow derives significance (MAX) / kind (substantive-dominates) / corroboration deterministically.
trial: a 5-juror trial tier: whole-paper defense → independent local-context jury (with on-demand context expansion) → a deterministic majority verdict (quorum reached, one side >60%) + a judge that routes a decided-valid charge (valid-fixable vs author-required); escalate to a 12-juror tier only on no clear majority.
polish: the track that skips the jury: batch copy-edit (mechanical) + batch light-check (minor-substantive); can escalate a misrouted major back to trial.
recall-audit: Mode A revives wrongly-dropped charges (bias to revive); Mode B spot-checks strong-consensus majors before the edit (guards against the whole panel agreeing on the same mistake).
drafter: minimal-edit patch for valid-fixable charges.
edit-audit / meaning-audit: the edit-safety semantic half: edit-audit checks a risky non-anchor edit (make-sense + cross-section alignment); meaning-audit is the four-state frozen-anchor + arc audit.
clerk: the round boundary: reconcile carried open-questions against this round's edits, dedup re-raises via a deterministic passage_id + similarity merge key, and emit the deterministic convergence counts.
Also present: review-panel.workflow.js: a quick simple 3-lens panel (fast path).
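The trial step's verdict rule (quorum reached, one side above 60%, otherwise escalate) can be sketched as a small pure function. This is an illustration under stated assumptions, not the engine's code: the function name, the vote encoding, and the default quorum of 3 are hypothetical, since the README does not give the quorum value, and treating a missed quorum as "no clear majority" is also an assumption.

```javascript
// votes: array of "valid" | "invalid" | null (null = juror returned no usable vote).
// quorum is a parameter because the source does not state its value.
function majorityVerdict(votes, { quorum = 3, threshold = 0.6 } = {}) {
  const cast = votes.filter((v) => v === "valid" || v === "invalid");
  // Assumption: too few usable votes is treated like "no clear majority".
  if (cast.length < quorum) return { verdict: null, escalate: true };

  const valid = cast.filter((v) => v === "valid").length;
  const validShare = valid / cast.length;
  const invalidShare = (cast.length - valid) / cast.length;

  // Strictly greater than the threshold, matching ">60%".
  if (validShare > threshold) return { verdict: "valid", escalate: false };
  if (invalidShare > threshold) return { verdict: "invalid", escalate: false };
  return { verdict: null, escalate: true }; // escalate to the larger tier
}
```

With five jurors, a 4-1 split decides the charge, while a 3-2 split sits at exactly 60% and therefore escalates, which is the case the 12-juror tier exists for.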
Skill (entry point + methodology): the protocol, the reviewer assignment, the consensus gate, the writing toolkit, the human gates. Detail in references/review-engine-v3.md, references/reviewer-personas.md, references/writing-toolkit.md.
Workflow (fan-out engine): the semantic, no-human-in-the-middle steps run as Workflows (parallelism plus schema-validated output by construction). Simple panel = workflows/review-panel.workflow.js; the courtroom engine = assign-reviewers → reading-check → coverage-auditor → merge → {trial ‖ polish} → recall-audit → drafter → {edit-audit | meaning-audit} → clerk. The deterministic guards run orchestrator-side via Bash because the Workflow sandbox has no fs: scripts/ holds decompose, ledger, journal, apply-patch, anchor-diff, cross-ref, spine, compile-guard, compliance-check.
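One of the deterministic guards listed for scripts/ is apply-patch, described as having an "exact-once guard on before text". A minimal sketch of that idea follows, assuming a patch is a plain `{ before, after }` pair; the function names and return shape are hypothetical and say nothing about how scripts/apply-patch.js is actually written.

```javascript
// Count non-overlapping occurrences of needle in haystack.
function countOccurrences(haystack, needle) {
  if (needle === "") return 0;
  let count = 0;
  let from = 0;
  while (true) {
    const at = haystack.indexOf(needle, from);
    if (at === -1) return count;
    count += 1;
    from = at + needle.length;
  }
}

// Apply { before, after } only when `before` matches exactly once.
function applyPatch(text, patch) {
  const n = countOccurrences(text, patch.before);
  if (n !== 1) {
    // Zero matches means the text drifted; several means the edit is ambiguous.
    return { ok: false, text, reason: `before text matched ${n} times` };
  }
  const at = text.indexOf(patch.before);
  const next =
    text.slice(0, at) + patch.after + text.slice(at + patch.before.length);
  // The inverse patch is what a journal entry would need for a later revert.
  return {
    ok: true,
    text: next,
    revert: { before: patch.after, after: patch.before },
  };
}
```

Refusing both the zero-match and the multi-match case is the point of the guard: a patch drafted against stale or repeated text is rejected instead of landing in the wrong place.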
Memory (durable state + learned conventions), two layers:
… LEDGER.json resolved at runtime = the machine source of truth, plus a rendered LEDGER.md view; managed by scripts/ledger.js. The live, mutable issue state across rounds and sessions. Schema plus status state machine: references/ledger-schema.md.
The panel is N domain-expert HOLISTIC reviewers (default 3, range 2-4), assigned at runtime to the paper's subfields, all sharing a senior-reviewer gatekeeper core (harsh, precise, constructive; separate fatal flaws from fixable nits; reason across sections). When a reviewer slot cannot be confirmed (headless, unverifiable), that slot degrades to a generic gatekeeper (one bad slot never degrades the whole panel); the generic fallback lenses are:
(These are an unordered tendency, not fixed slots; reviewer IDs R1..RN are positional, assigned by subfield order.)
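The panel rules described here (2 to 4 reviewers, positional IDs R1..RN in subfield order, and per-slot degrade to a generic gatekeeper) can be sketched as below. The input shape (`name`, `confirmed`), the role labels, and the function name are hypothetical, introduced only to make the rules concrete.

```javascript
// subfields: ordered list of { name, confirmed } (hypothetical shape).
function assignPanel(subfields) {
  if (subfields.length < 2 || subfields.length > 4) {
    throw new RangeError("panel size must be between 2 and 4");
  }
  return subfields.map((s, i) => ({
    id: `R${i + 1}`, // positional: follows subfield order
    subfield: s.name,
    // Per-slot degrade: an unconfirmed slot falls back alone,
    // and the other slots keep their domain reviewers.
    role: s.confirmed ? "domain" : "generic-gatekeeper",
  }));
}
```

Because the fallback is decided slot by slot, a panel with one unverifiable subfield still keeps its remaining domain reviewers.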
The writing toolkit names (prompt bodies not shown here): translate-to-english, polish-english, de-ai, compress, expand, caption, experiment-analysis, logic-check.
… spine + reviewer-assignment confirmation plus the pre-authorized bounded-aggressive policy) plus the return queue, not per-edit sign-off.
… ledger. Enforced by (a) what goes into each agent's prompt AND (b) an explicit ISOLATION instruction in every reviewer-type prompt.
… close_criterion (one concrete sentence describing what an edit must satisfy), set by the judge.
… compile-guard.js is explicit about what it cannot verify: when it cannot truly compile, it degrades to structural lint and reports compiled:null.
… compliance-check.js plus a semantic agent; B = a compile-driven layout loop reusing compile-guard.js plus Read-on-PDF.
Your project files, ledger, journal, and patches stay inside your local paper project. PaperJury has no backend or server of its own, so nothing is sent to a PaperJury server. The review runs through your own Claude Code session, which means the model itself runs in the cloud: how your content is handled there follows the terms and settings of that Claude Code environment, not anything PaperJury adds on top.
Where this is going (planned, not yet shipped):
… .cls / template.
references/review-engine-v3.md
references/auto-mode.md
references/reviewer-personas.md, references/writing-toolkit.md, references/methodology.md
references/ledger-schema.md
references/submission-compliance.md
docs/REVIEW_ENGINE_V3_DESIGN.md
scripts/ (decompose, ledger, journal, apply-patch, anchor-diff, cross-ref, spine, compile-guard, compliance-check)
workflows/ (assign-reviewers, reading-check, coverage-auditor, merge, trial, polish, recall-audit, drafter, edit-audit, meaning-audit, clerk, review-panel)
The spine and anti-drift design (the anchor logic-transfer audit, the claim register, and the minimal-edit, intent-preserving revision policy) is inspired by PaperSpine, a motivation-driven paper drafting and rewriting skill. PaperSpine is a forward generate/rewrite tool with no adversarial loop; PaperJury borrows its anchoring idea and its "deterministic scripts for checkable steps, model agents for judgment" mechanism, then adds the adversarial courtroom review engine on top.