A community-driven registry for Claude, Cursor, Windsurf, Cline & more. Not affiliated with Anthropic.
Are you the author? Sign in to claim
Agentic Theorem Prover for Rocq for Program Verification
Paper: FSE 2026
This repository contains the source code of AutoRocq, an agent prover in Rocq (formerly Coq) 8.18.0.
To discharge a formally stated theorem in Rocq, the agent runs in the following loop:
context = get_initial_context()
tools = ['plan', 'tactic', 'context_search']
while not coq.is_proof_complete():
action = llm.next_action(goal, context)
coq.apply(action)
context.update()
goal.update()
where the LLM interacts with the Rocq proof assistant (via CoqPyt) in real time to develop a proof.
AutoRocq now supports your favorite model through LiteLLM! See the supported model list and configuration.
AutoRocq-bench/ # Benchmark of verification theorems
dockerfile/ # Dockerfile of AutoRocq and comparison tools
eval/ # Directory for eval results
└── final/ # Final evluation results
proof-search/ # Directory of proof agent src
├── main.py # Entry point
├── agent/
│ ├── proof_controller.py # Main loop
│ ├── context_manager.py # LLM interaction and context management
│ ├── context_search.py # Local context search
│ ├── history_recorder.py # Manages proof histories
│ └── proof_tree.py # Manages proof tree
├── backend/ # Interface with CoqPyt
├── coqpyt/ # Interact with Coq
└── utils/ # Helper functions
scripts/ # Directory of scripts
├── analyze/ # Analysis scripts of final results
└── get_results.py # Parser of .json results
pip install -r requirement.txt
opam switch import deps.opam
Set up API key in the config or by running export OPENAI_API_KEY=.... If you prefer models from other providers, see here.
To prove examples/example.v with a minimal config, go to proof-search directory and run:
python3 -m main examples/example.v --config ./configs/minimal.json
If AutoRocq runs successfully, you will be able to see in the terminal
[INFO] [Main]: 🎉 Proof completed successfully!
and the proof script is saved in the same example.v file. You will also be able to find saved proof states and aggregated results at data/, which can be reused to prove other goals in the future.
For more configurations of the tool, check out the readme or run with --help for more options.
AutoRocq-bench is a corpus of Rocq/Coq proof obligations extracted from real C code with Frama-C curated as part of the evaluation of AutoRocq. The benchmark consists of 641 theorems generated from SV-COMP and 60 theorems from assertions in the Linux kernel. On average, running each theorem costs ~$0.5 with GPT-4.1. To test on this benchmark:
git submodule update --init --recursive
libautorocq by runningcd AutoRocq-bench/libautorocq; make
Configure library_paths in proof-search/configs/default_config.json to point to libautorocq.
Run the agent by pointing to the target .v file. The first run may take a few minutes to initialize the library.
For example, go to proof-search directory and run:
python3 -m main examples/main_assert_rte_signed_overflow.v --config ./configs/default_config.json
python3 scripts/analyze/draw_complexity.py \
./eval/final/complexity-svcomp.csv \
./eval/final/complexity-coqgym-sample.csv
python3 scripts/analyze/draw_results.py \
./eval/final/results-svcomp.csv ./eval/final/complexity-svcomp.csv \
./eval/final/results-coqgym.csv ./eval/final/complexity-coqgym-sample.csv
python3 scripts/analyze/plot_searches.py
Coming soon...
If you are interested in the work, consider joining the Discord server for the latest discussions/development of agentic program verification!
If you use our work for academic research, please cite our paper:
@article{autorocq,
title={Agentic Verification of Software Systems},
author={Tu, Haoxin and Zhao, Huan and Song, Yahui and Zafar, Mehtab and Meng, Ruijie and Roychoudhury, Abhik},
journal={Proceedings of the ACM on Software Engineering},
volume={1},
number={FSE},
year={2026},
publisher={ACM New York, NY, USA}
}
Pocket Flow: Codebase to Tutorial
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
干净、强大、属于你的 AI Agent 平台 --AI agents, without the clutter.
💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.