ConcoLLMic: Agentic Concolic Execution

Paper: IEEE S&P 2026

ConcoLLMic is the first language- and theory-agnostic concolic executor powered by LLM agents. Unlike traditional symbolic execution tools that require language-specific implementations and struggle with constraint solving, ConcoLLMic:

Works with any programming language and environment interactions — supports C, C++, Python, Java, ..., and even multi-language systems; requires no additional environment modeling and handles them efficiently.
Handles diverse constraint theories — including floating-point arithmetic, strings, structured data, and bit-level operations.
Achieves superior coverage — in ~4 hours, ConcoLLMic covers 115%-233% more branches than state-of-the-art tools like KLEE and 81% more than AFL++ running for 48 hours.
Finds real bugs — has already discovered 10+ previously unknown vulnerabilities in real-world softwares.

How it works: ConcoLLMic uses LLM agents to

instrument code at source level with logging statements like fprintf(stderr, ...);
summarize path constraints flexibly in natural language, code, or SMT formulas;
solve constraints intelligently — directly reasoning constraints over semantics and delegating to grounded solvers when needed.

This eliminates the need for hand-crafted language-specific symbolic interpreters and environment modeling as well as expensive constraint solving.

Workflow Example
Getting Started
Command Reference
Running Large-Scale Experiments
License
Citation

Workflow Example

Phase 1: Instrumentation

Instrumentation of ConcoLLMic

Example instrumentation of src/count.c from FP-Bench, by inserting fprintf at key points in the execution. Output in stderr allows us to reconstruct the dynamic execution path.

Phase 2: Concolic Testing

Testing Phase of ConcoLLMic

This shows one round of concolic execution with ConcoLLMic. Compared to conventional tools with verbose, implementation-level formulas:

hljs language-css

start = atof(I[0]) ∧ end = atof(I[1]) ∧ start < end ∧ cur[0] = start ∧ 
(∀ 0 ≤ i < cnt, cur[i] ≠ end) ∧ cur[cnt] = end ∧ cnt ≤ 20

ConcoLLMic summarizes the path constraints concisely using high-level natural-language:

hljs language-sql

The number of representable FP values in the range must be ≤ 20.

Getting Started

Prerequisites

Python 3.10 or higher
API key for Claude Sonnet (from Anthropic)

Installation

Clone the repository

hljs language-bash

git clone https://github.com/ConcoLLMic/ConcoLLMic.git
cd ConcoLLMic

Install Python dependencies

hljs language-bash

pip install -r requirements.txt
pip install -r requirements-dev.txt # optional

Set up your API key

hljs language-bash

export ANTHROPIC_API_KEY="your_api_key_here"

By default, ConcoLLMic uses Claude 4.5 Sonnet. You can configure other models in ACE.py (see setup_model()).

Running Your First Example

ConcoLLMic's workflow consists of two main steps: instrumentation & compilation and concolic execution. Below are two examples to get started:

Example 1: Bracket Matching Validation

A simple bracket matching validation program that checks if brackets are balanced.

Step 1: Instrumentation & Compilation

Instrumented code is already available in code_example/instr/validate_brackets.cpp.

To instrument from scratch:

hljs language-bash

rm ./code_example/instr/validate_brackets.cpp

python3 ACE.py instrument \
    --src_dir ./code_example/src/ \
    --out_dir ./code_example/instr/ \
    --instr_languages c,cpp,python,java

Note: The --instr_languages flag is only used for filtering files by extension. No additional language-specific support is needed.

Compile the instrumented code normally:

hljs language-bash

g++ -o ./code_example/instr/validate_brackets ./code_example/instr/validate_brackets.cpp

Step 2: Concolic Execution

Run the concolic execution agent with the provided test harness code_example/harness/validate_brackets.py, which is a Python function that prepares program input and executes the program:

hljs language-bash

python3 ACE.py run \
    --project_dir ./code_example/instr/ \
    --execution ./code_example/harness/validate_brackets.py \
    --out ./out/ \
    --rounds 2 \
    --parallel_num 3

Command Parameters Explained:

--project_dir: Directory containing the instrumented code and the compiled program
--execution: Test harness (a Python script that executes the target program and returns stderr + exit code)
--out: Output directory for generated test cases and logs
--rounds: Number of concolic execution rounds (default: unlimited until coverage plateau)
--parallel_num: Maximum number of concurrent test case generations per round

Tip: To observe the agent's step-by-step workflow more clearly (e.g., for demos or debugging), set --parallel_num 1 so that each phase runs sequentially.

Output:

./out/ConcoLLMic_*.log — detailed execution log
./out/queue/id:*.yaml — generated test cases with metadata

Step 3: View Statistics (Optional)

Analyze testing statistics (tokens used, costs):

hljs language-bash

python3 ACE.py run_data --out_dir ./out/

Step 4: Replay & Measure Coverage (Optional)

hljs language-bash

# Compile with coverage
cd ./code_example/src/
g++ --coverage -o validate_brackets validate_brackets.cpp
lcov -z -d .
cd ../../

# Replay test cases
python3 ACE.py replay \
    ./out/ \
    ./code_example/src/ \
    ./coverage.csv \
    --cov_script ./code_example/src/coverage.sh

Output:

./coverage.csv — time-series coverage data for each test case
./code_example/src/validate_brackets.cpp.gcov — line-by-line coverage report

Example 2: Floating-Point Counting

A program from FP-Bench that counts representable floating-point values in a range.

Step 1: Instrumentation & Compilation

Instrumented code is already available in code_example/instr/count.c.

To instrument from scratch:

hljs language-bash

rm ./code_example/instr/count.c

python3 ACE.py instrument \
    --src_dir ./code_example/src/ \
    --out_dir ./code_example/instr/ \
    --instr_languages c,cpp,python,java

Note: The --instr_languages flag is only used for filtering files by extension. No additional language-specific support is needed.

Compile the instrumented code normally:

hljs language-bash

gcc -o ./code_example/instr/count ./code_example/instr/count.c

Step 2: Concolic Execution

Run the concolic execution agent with the provided test harness code_example/harness/count.py, which is a Python function that prepares program input and executes the program:

hljs language-bash

python3 ACE.py run \
    --project_dir ./code_example/instr/ \
    --execution ./code_example/harness/count.py \
    --out ./out/ \
    --rounds 2 \
    --parallel_num 3

Command Parameters Explained:

--project_dir: Directory containing the instrumented code and the compiled program
--execution: Test harness (a Python script that executes the target program and returns stderr + exit code)
--out: Output directory for generated test cases and logs
--rounds: Number of concolic execution rounds (default: unlimited until coverage plateau)
--parallel_num: Maximum number of concurrent test case generations per round

Tip: To observe the agent's step-by-step workflow more clearly (e.g., for demos or debugging), set --parallel_num 1 so that each phase runs sequentially.

Output:

./out/ConcoLLMic_*.log — detailed execution log
./out/queue/id:*.yaml — generated test cases with metadata

Expected cost: ~$0.40 for 2 rounds with Claude-3.7-Sonnet

Step 3: View Statistics (Optional)

Analyze testing statistics (tokens used, costs):

hljs language-bash

python3 ACE.py run_data --out_dir ./out/

Step 4: Replay & Measure Coverage (Optional)

hljs language-bash

# Compile with coverage
cd ./code_example/src/
gcc --coverage -o count count.c
lcov -z -d .
cd ../../

# Replay test cases
python3 ACE.py replay \
    ./out/ \
    ./code_example/src/ \
    ./coverage.csv \
    --cov_script ./code_example/src/coverage.sh

Output:

./coverage.csv — time-series coverage data for each test case
./code_example/src/count.c.gcov — line-by-line coverage report

The gcov report will show that the bug at line 67 has been triggered.

Command Reference

ConcoLLMic provides a unified CLI interface:

hljs language-bash

python3 ACE.py <command> [options]

Main Commands

Command	Description
`instrument`	Instrument source code for tracing
`run`	Execute concolic testing
`replay`	Replay generated test cases
`instrument_data`	Analyze instrumentation statistics
`run_data`	Display testing statistics and costs

For detailed options, run:

hljs language-bash

python3 ACE.py <command> --help

Or visit the documentation.

Running Large-Scale Experiments

To reproduce the paper's evaluation results or run ConcoLLMic on real-world benchmarks, see the experiments/README.md.

We provide Docker-based setups for:

C/C++ benchmarks
Multi-language systems
Comparison with other tools — KLEE, KLEE-Pending, AFL++, SymCC, SymSan

License

This project is dual licensed:

Non-commercial use: Licensed under GPLv3 for academic and non-commercial purposes
Commercial use: Please contact the authors for commercial licensing

For full license details, see LICENSE.

For commercial licensing inquiries, contact:

Abhik Roychoudhury (abhik@nus.edu.sg)
Cristian Cadar (c.cadar@imperial.ac.uk)
Jonathan Tan (jonathan_tan@nus.edu.sg)

Citation

hljs language-bibtex

@inproceedings{luo2026concollmic,
  title={Agentic Concolic Execution},
  author={Luo, Zhengxiong and Zhao, Huan and Wolff, Dylan and Cadar, Cristian and Roychoudhury, Abhik},
  booktitle={2026 IEEE Symposium on Security and Privacy (S\&P)},
  pages={1--19},
  year={2026},
  organization={IEEE}
}

ConcoLLMic: Agentic Concolic Execution

Paper: IEEE S&P 2026

Works with any programming language and environment interactions — supports C, C++, Python, Java, ..., and even multi-language systems; requires no additional environment modeling and handles them efficiently.
Handles diverse constraint theories — including floating-point arithmetic, strings, structured data, and bit-level operations.
Achieves superior coverage — in ~4 hours, ConcoLLMic covers 115%-233% more branches than state-of-the-art tools like KLEE and 81% more than AFL++ running for 48 hours.
Finds real bugs — has already discovered 10+ previously unknown vulnerabilities in real-world softwares.

How it works: ConcoLLMic uses LLM agents to

instrument code at source level with logging statements like fprintf(stderr, ...);
summarize path constraints flexibly in natural language, code, or SMT formulas;
solve constraints intelligently — directly reasoning constraints over semantics and delegating to grounded solvers when needed.

This eliminates the need for hand-crafted language-specific symbolic interpreters and environment modeling as well as expensive constraint solving.

Workflow Example
Getting Started
Command Reference
Running Large-Scale Experiments
License
Citation

Workflow Example

Phase 1: Instrumentation

Instrumentation of ConcoLLMic

Example instrumentation of src/count.c from FP-Bench, by inserting fprintf at key points in the execution. Output in stderr allows us to reconstruct the dynamic execution path.

Phase 2: Concolic Testing

Testing Phase of ConcoLLMic

This shows one round of concolic execution with ConcoLLMic. Compared to conventional tools with verbose, implementation-level formulas:

hljs language-css

start = atof(I[0]) ∧ end = atof(I[1]) ∧ start < end ∧ cur[0] = start ∧ 
(∀ 0 ≤ i < cnt, cur[i] ≠ end) ∧ cur[cnt] = end ∧ cnt ≤ 20

ConcoLLMic summarizes the path constraints concisely using high-level natural-language:

hljs language-sql

The number of representable FP values in the range must be ≤ 20.

Getting Started

Prerequisites

Python 3.10 or higher
API key for Claude Sonnet (from Anthropic)

Installation

Clone the repository

hljs language-bash

git clone https://github.com/ConcoLLMic/ConcoLLMic.git
cd ConcoLLMic

Install Python dependencies

hljs language-bash

pip install -r requirements.txt
pip install -r requirements-dev.txt # optional

Set up your API key

hljs language-bash

export ANTHROPIC_API_KEY="your_api_key_here"

By default, ConcoLLMic uses Claude 4.5 Sonnet. You can configure other models in ACE.py (see setup_model()).

Running Your First Example

ConcoLLMic's workflow consists of two main steps: instrumentation & compilation and concolic execution. Below are two examples to get started:

Example 1: Bracket Matching Validation

A simple bracket matching validation program that checks if brackets are balanced.

Step 1: Instrumentation & Compilation

Instrumented code is already available in code_example/instr/validate_brackets.cpp.

To instrument from scratch:

hljs language-bash

rm ./code_example/instr/validate_brackets.cpp

python3 ACE.py instrument \
    --src_dir ./code_example/src/ \
    --out_dir ./code_example/instr/ \
    --instr_languages c,cpp,python,java

Note: The --instr_languages flag is only used for filtering files by extension. No additional language-specific support is needed.

Compile the instrumented code normally:

hljs language-bash

g++ -o ./code_example/instr/validate_brackets ./code_example/instr/validate_brackets.cpp

Step 2: Concolic Execution

Run the concolic execution agent with the provided test harness code_example/harness/validate_brackets.py, which is a Python function that prepares program input and executes the program:

hljs language-bash

python3 ACE.py run \
    --project_dir ./code_example/instr/ \
    --execution ./code_example/harness/validate_brackets.py \
    --out ./out/ \
    --rounds 2 \
    --parallel_num 3

Command Parameters Explained:

--project_dir: Directory containing the instrumented code and the compiled program
--execution: Test harness (a Python script that executes the target program and returns stderr + exit code)
--out: Output directory for generated test cases and logs
--rounds: Number of concolic execution rounds (default: unlimited until coverage plateau)
--parallel_num: Maximum number of concurrent test case generations per round

Tip: To observe the agent's step-by-step workflow more clearly (e.g., for demos or debugging), set --parallel_num 1 so that each phase runs sequentially.

Output:

./out/ConcoLLMic_*.log — detailed execution log
./out/queue/id:*.yaml — generated test cases with metadata

Step 3: View Statistics (Optional)

Analyze testing statistics (tokens used, costs):

hljs language-bash

python3 ACE.py run_data --out_dir ./out/

Step 4: Replay & Measure Coverage (Optional)

hljs language-bash

# Compile with coverage
cd ./code_example/src/
g++ --coverage -o validate_brackets validate_brackets.cpp
lcov -z -d .
cd ../../

# Replay test cases
python3 ACE.py replay \
    ./out/ \
    ./code_example/src/ \
    ./coverage.csv \
    --cov_script ./code_example/src/coverage.sh

Output:

./coverage.csv — time-series coverage data for each test case
./code_example/src/validate_brackets.cpp.gcov — line-by-line coverage report

Example 2: Floating-Point Counting

A program from FP-Bench that counts representable floating-point values in a range.

Step 1: Instrumentation & Compilation

Instrumented code is already available in code_example/instr/count.c.

To instrument from scratch:

hljs language-bash

rm ./code_example/instr/count.c

python3 ACE.py instrument \
    --src_dir ./code_example/src/ \
    --out_dir ./code_example/instr/ \
    --instr_languages c,cpp,python,java

Note: The --instr_languages flag is only used for filtering files by extension. No additional language-specific support is needed.

Compile the instrumented code normally:

hljs language-bash

gcc -o ./code_example/instr/count ./code_example/instr/count.c

Step 2: Concolic Execution

Run the concolic execution agent with the provided test harness code_example/harness/count.py, which is a Python function that prepares program input and executes the program:

hljs language-bash

python3 ACE.py run \
    --project_dir ./code_example/instr/ \
    --execution ./code_example/harness/count.py \
    --out ./out/ \
    --rounds 2 \
    --parallel_num 3

Command Parameters Explained:

--project_dir: Directory containing the instrumented code and the compiled program
--execution: Test harness (a Python script that executes the target program and returns stderr + exit code)
--out: Output directory for generated test cases and logs
--rounds: Number of concolic execution rounds (default: unlimited until coverage plateau)
--parallel_num: Maximum number of concurrent test case generations per round

Tip: To observe the agent's step-by-step workflow more clearly (e.g., for demos or debugging), set --parallel_num 1 so that each phase runs sequentially.

Output:

./out/ConcoLLMic_*.log — detailed execution log
./out/queue/id:*.yaml — generated test cases with metadata

Expected cost: ~$0.40 for 2 rounds with Claude-3.7-Sonnet

Step 3: View Statistics (Optional)

Analyze testing statistics (tokens used, costs):

hljs language-bash

python3 ACE.py run_data --out_dir ./out/

Step 4: Replay & Measure Coverage (Optional)

hljs language-bash

# Compile with coverage
cd ./code_example/src/
gcc --coverage -o count count.c
lcov -z -d .
cd ../../

# Replay test cases
python3 ACE.py replay \
    ./out/ \
    ./code_example/src/ \
    ./coverage.csv \
    --cov_script ./code_example/src/coverage.sh

Output:

./coverage.csv — time-series coverage data for each test case
./code_example/src/count.c.gcov — line-by-line coverage report

The gcov report will show that the bug at line 67 has been triggered.

Command Reference

ConcoLLMic provides a unified CLI interface:

hljs language-bash

python3 ACE.py <command> [options]

Main Commands

Command	Description
`instrument`	Instrument source code for tracing
`run`	Execute concolic testing
`replay`	Replay generated test cases
`instrument_data`	Analyze instrumentation statistics
`run_data`	Display testing statistics and costs

For detailed options, run:

hljs language-bash

python3 ACE.py <command> --help

Or visit the documentation.

Running Large-Scale Experiments

To reproduce the paper's evaluation results or run ConcoLLMic on real-world benchmarks, see the experiments/README.md.

We provide Docker-based setups for:

C/C++ benchmarks
Multi-language systems
Comparison with other tools — KLEE, KLEE-Pending, AFL++, SymCC, SymSan

License

This project is dual licensed:

Non-commercial use: Licensed under GPLv3 for academic and non-commercial purposes
Commercial use: Please contact the authors for commercial licensing

For full license details, see LICENSE.

For commercial licensing inquiries, contact:

Abhik Roychoudhury (abhik@nus.edu.sg)
Cristian Cadar (c.cadar@imperial.ac.uk)
Jonathan Tan (jonathan_tan@nus.edu.sg)

Citation

hljs language-bibtex

@inproceedings{luo2026concollmic,
  title={Agentic Concolic Execution},
  author={Luo, Zhengxiong and Zhao, Huan and Wolff, Dylan and Cadar, Cristian and Roychoudhury, Abhik},
  booktitle={2026 IEEE Symposium on Security and Privacy (S\&P)},
  pages={1--19},
  year={2026},
  organization={IEEE}
}

ConcoLLMic

ConcoLLMic: Agentic Concolic Execution

Table of Contents

Workflow Example

Phase 1: Instrumentation

Phase 2: Concolic Testing

Getting Started

Prerequisites

Installation

Running Your First Example

Command Reference

Main Commands

Running Large-Scale Experiments

License

Citation

Similar Packages

ConcoLLMic

ConcoLLMic: Agentic Concolic Execution

Table of Contents

Workflow Example

Phase 1: Instrumentation

Phase 2: Concolic Testing

Getting Started

Prerequisites

Installation

Running Your First Example

Command Reference

Main Commands

Running Large-Scale Experiments

License

Citation

Similar Packages