AutoRocq: Agentic Theorem Prover for Verification

Paper: FSE 2026

This repository contains the source code of AutoRocq, an agent prover in Rocq (formerly Coq) 8.18.0.

To discharge a formally stated theorem in Rocq, the agent runs in the following loop:

hljs language-python

context = get_initial_context()
tools = ['plan', 'tactic', 'context_search']
while not coq.is_proof_complete():
    action = llm.next_action(goal, context)
    coq.apply(action)
    context.update()
    goal.update()

where the LLM interacts with the Rocq proof assistant (via CoqPyt) in real time to develop a proof.

What's New

LiteLLM support

AutoRocq now supports your favorite model through LiteLLM! See the supported model list and configuration.

Directory Structure

hljs language-php

AutoRocq-bench/                    # Benchmark of verification theorems

dockerfile/                        # Dockerfile of AutoRocq and comparison tools

eval/                              # Directory for eval results
└── final/                         # Final evluation results

proof-search/                      # Directory of proof agent src
├── main.py                        # Entry point
├── agent/                         
│   ├── proof_controller.py        # Main loop
│   ├── context_manager.py         # LLM interaction and context management
│   ├── context_search.py          # Local context search
│   ├── history_recorder.py        # Manages proof histories
│   └── proof_tree.py              # Manages proof tree
├── backend/                       # Interface with CoqPyt
├── coqpyt/                        # Interact with Coq
└── utils/                         # Helper functions

scripts/                           # Directory of scripts
├── analyze/                       # Analysis scripts of final results
└── get_results.py                 # Parser of .json results

Setup Instructions

Install dependencies in Python

hljs language-bash

pip install -r requirement.txt

Install dependencies in opam

hljs language-bash

opam switch import deps.opam

Minimal Example of Proof Agent

Set up API key in the config or by running export OPENAI_API_KEY=.... If you prefer models from other providers, see here.
To prove examples/example.v with a minimal config, go to proof-search directory and run:

hljs language-bash

python3 -m main examples/example.v --config ./configs/minimal.json

If AutoRocq runs successfully, you will be able to see in the terminal

hljs language-css

[INFO] [Main]: 🎉 Proof completed successfully!

and the proof script is saved in the same example.v file. You will also be able to find saved proof states and aggregated results at data/, which can be reused to prove other goals in the future.

For more configurations of the tool, check out the readme or run with --help for more options.

Proving Theorems from Verification Tasks

AutoRocq-bench is a corpus of Rocq/Coq proof obligations extracted from real C code with Frama-C curated as part of the evaluation of AutoRocq. The benchmark consists of 641 theorems generated from SV-COMP and 60 theorems from assertions in the Linux kernel. On average, running each theorem costs ~$0.5 with GPT-4.1. To test on this benchmark:

Clone the submodule with

hljs language-bash

git submodule update --init --recursive

Compile libautorocq by running

hljs language-bash

cd AutoRocq-bench/libautorocq; make

Configure library_paths in proof-search/configs/default_config.json to point to libautorocq.
Run the agent by pointing to the target .v file. The first run may take a few minutes to initialize the library.

For example, go to proof-search directory and run:

hljs language-bash

python3 -m main examples/main_assert_rte_signed_overflow.v --config ./configs/default_config.json

Replicating Results from Paper

Reproducing Figures

Figure 3

hljs language-bash

python3 scripts/analyze/draw_complexity.py \
  ./eval/final/complexity-svcomp.csv \
  ./eval/final/complexity-coqgym-sample.csv

Figure 4, 5, 6, and 7

hljs language-bash

python3 scripts/analyze/draw_results.py \
  ./eval/final/results-svcomp.csv ./eval/final/complexity-svcomp.csv \
  ./eval/final/results-coqgym.csv ./eval/final/complexity-coqgym-sample.csv

Figure 8

hljs language-bash

python3 scripts/analyze/plot_searches.py

Setting Up Comparison Tools

Coming soon...

Citation / Attribution

If you are interested in the work, consider joining the Discord server for the latest discussions/development of agentic program verification!

If you use our work for academic research, please cite our paper:

hljs language-ini

@article{autorocq,
  title={Agentic Verification of Software Systems},
  author={Tu, Haoxin and Zhao, Huan and Song, Yahui and Zafar, Mehtab and Meng, Ruijie and Roychoudhury, Abhik},
  journal={Proceedings of the ACM on Software Engineering},
  volume={1},
  number={FSE},
  year={2026},
  publisher={ACM New York, NY, USA}
}

AutoRocq: Agentic Theorem Prover for Verification

Paper: FSE 2026

This repository contains the source code of AutoRocq, an agent prover in Rocq (formerly Coq) 8.18.0.

To discharge a formally stated theorem in Rocq, the agent runs in the following loop:

hljs language-python

context = get_initial_context()
tools = ['plan', 'tactic', 'context_search']
while not coq.is_proof_complete():
    action = llm.next_action(goal, context)
    coq.apply(action)
    context.update()
    goal.update()

where the LLM interacts with the Rocq proof assistant (via CoqPyt) in real time to develop a proof.

What's New

LiteLLM support

AutoRocq now supports your favorite model through LiteLLM! See the supported model list and configuration.

Directory Structure

hljs language-php

AutoRocq-bench/                    # Benchmark of verification theorems

dockerfile/                        # Dockerfile of AutoRocq and comparison tools

eval/                              # Directory for eval results
└── final/                         # Final evluation results

proof-search/                      # Directory of proof agent src
├── main.py                        # Entry point
├── agent/                         
│   ├── proof_controller.py        # Main loop
│   ├── context_manager.py         # LLM interaction and context management
│   ├── context_search.py          # Local context search
│   ├── history_recorder.py        # Manages proof histories
│   └── proof_tree.py              # Manages proof tree
├── backend/                       # Interface with CoqPyt
├── coqpyt/                        # Interact with Coq
└── utils/                         # Helper functions

scripts/                           # Directory of scripts
├── analyze/                       # Analysis scripts of final results
└── get_results.py                 # Parser of .json results

Setup Instructions

Install dependencies in Python

hljs language-bash

pip install -r requirement.txt

Install dependencies in opam

hljs language-bash

opam switch import deps.opam

Minimal Example of Proof Agent

Set up API key in the config or by running export OPENAI_API_KEY=.... If you prefer models from other providers, see here.
To prove examples/example.v with a minimal config, go to proof-search directory and run:

hljs language-bash

python3 -m main examples/example.v --config ./configs/minimal.json

If AutoRocq runs successfully, you will be able to see in the terminal

hljs language-css

[INFO] [Main]: 🎉 Proof completed successfully!

and the proof script is saved in the same example.v file. You will also be able to find saved proof states and aggregated results at data/, which can be reused to prove other goals in the future.

For more configurations of the tool, check out the readme or run with --help for more options.

Proving Theorems from Verification Tasks

Clone the submodule with

hljs language-bash

git submodule update --init --recursive

Compile libautorocq by running

hljs language-bash

cd AutoRocq-bench/libautorocq; make

Configure library_paths in proof-search/configs/default_config.json to point to libautorocq.
Run the agent by pointing to the target .v file. The first run may take a few minutes to initialize the library.

For example, go to proof-search directory and run:

hljs language-bash

python3 -m main examples/main_assert_rte_signed_overflow.v --config ./configs/default_config.json

Replicating Results from Paper

Reproducing Figures

Figure 3

hljs language-bash

python3 scripts/analyze/draw_complexity.py \
  ./eval/final/complexity-svcomp.csv \
  ./eval/final/complexity-coqgym-sample.csv

Figure 4, 5, 6, and 7

hljs language-bash

python3 scripts/analyze/draw_results.py \
  ./eval/final/results-svcomp.csv ./eval/final/complexity-svcomp.csv \
  ./eval/final/results-coqgym.csv ./eval/final/complexity-coqgym-sample.csv

Figure 8

hljs language-bash

python3 scripts/analyze/plot_searches.py

Setting Up Comparison Tools

Coming soon...

Citation / Attribution

If you are interested in the work, consider joining the Discord server for the latest discussions/development of agentic program verification!

If you use our work for academic research, please cite our paper:

hljs language-ini

@article{autorocq,
  title={Agentic Verification of Software Systems},
  author={Tu, Haoxin and Zhao, Huan and Song, Yahui and Zafar, Mehtab and Meng, Ruijie and Roychoudhury, Abhik},
  journal={Proceedings of the ACM on Software Engineering},
  volume={1},
  number={FSE},
  year={2026},
  publisher={ACM New York, NY, USA}
}

AutoRocq

AutoRocq: Agentic Theorem Prover for Verification

What's New

LiteLLM support

Directory Structure

Setup Instructions

Minimal Example of Proof Agent

Proving Theorems from Verification Tasks

Replicating Results from Paper

Citation / Attribution

Similar Packages

AutoRocq

AutoRocq: Agentic Theorem Prover for Verification

What's New

LiteLLM support

Directory Structure

Setup Instructions

Minimal Example of Proof Agent

Proving Theorems from Verification Tasks

Replicating Results from Paper

Citation / Attribution

Similar Packages