AI-powered test case generator — upload a requirements document, get structured, prioritized, multi-step test cases.
Supports three backends: Ollama (free, local), Claude API (Anthropic direct), and AWS Bedrock (enterprise). Includes a Streamlit web UI and CLI.
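You give it a requirements document (BRD, PRD, user stories). It generates structured test cases, each with a test ID, linked requirement IDs, a summary, type, priority, preconditions, numbered steps with expected results, target platforms, and tags.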
A 5-requirement login BRD generates 37 test cases. A 21-requirement rideshare BRD generates 50-80 test cases.
# AWS Bedrock
pip install -r requirements.txt
aws configure # one-time setup
python test_local.py --backend bedrock --model us.anthropic.claude-sonnet-4-20250514-v1:0
# Ollama (free, local)
ollama pull qwen3:8b
pip install -r requirements.txt
python test_local.py --backend ollama --model qwen3:8b
# Streamlit web UI
pip install -r requirements.txt
python -m streamlit run app.py
The tool uses a chunked multi-agent pipeline — 4 specialized AI agents that mirror how senior QA engineers think:
┌─────────────────────────────────────────────┐
│  INPUT: BRD Document                        │
│  (.docx, .pdf, .md, .txt)                   │
└──────────────────┬──────────────────────────┘
                   ▼
┌─────────────────────────────────────────────┐
│  Agent 1: Requirement Parser                │
│  Extracts all testable requirements         │
│  (called once for the full document)        │
└──────────────────┬──────────────────────────┘
                   ▼
┌─────────────────────────────────────────────┐
│  CHUNKING LAYER                             │
│  Splits N requirements into groups of 3     │
│  21 requirements → 7 chunks                 │
└──────────────────┬──────────────────────────┘
                   ▼
         ┌─────────┴─────────┐
         │  FOR EACH CHUNK:  │
         ▼                   ▼
┌─────────────────┐ ┌─────────────────┐
│ Agent 2: Test   │ │ Agent 3: Edge   │
│ Case Generator  │ │ Case Finder     │
│ Happy paths,    │ │ Boundary,       │
│ alternate flows │ │ negative, error │
│ 2-3 tests/req   │ │ 1-2 tests/req   │
└────────┬────────┘ └────────┬────────┘
         └─────────┬─────────┘
                   ▼
┌─────────────────────────────────────────────┐
│  Agent 4: Formatter & Validator             │
│  Dedup, validate coverage, assign priority  │
└──────────────────┬──────────────────────────┘
                   ▼
┌─────────────────────────────────────────────┐
│  OUTPUT FILES                               │
│  CSV │ JSON │ Markdown │ JIRA CSV           │
└─────────────────────────────────────────────┘
A single LLM call with 20+ requirements produces ~12 shallow test cases (output truncated at token limit). Chunking splits requirements into groups of 3 and processes each separately. Result: every requirement gets full coverage with multi-step test cases.
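The chunking step described above can be sketched in a few lines. This is an illustration only, not the code in `agents/orchestrator.py`; the function name `chunk_requirements` and the requirement ID format are assumptions made for the example.

```python
from typing import List, Sequence


def chunk_requirements(requirements: Sequence[str], size: int = 3) -> List[List[str]]:
    """Split requirements into consecutive groups of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [list(requirements[i:i + size]) for i in range(0, len(requirements), size)]


# Hypothetical requirement IDs, for illustration only.
requirements = [f"FR-{n:03d}" for n in range(1, 22)]  # 21 requirements
chunks = chunk_requirements(requirements)
print(len(chunks))  # 7
print(chunks[0])    # ['FR-001', 'FR-002', 'FR-003']
```

Each chunk would then be passed to the test case generator and the edge case finder separately, so that no single call has to produce output for all 21 requirements.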
See ARCHITECTURE.md for the full technical deep-dive.
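One of the Formatter's jobs in the diagram is validating coverage. A minimal sketch of such a check is shown here, assuming test cases are dictionaries with a `requirement_ids` list as in the JSON output format. The function name `find_uncovered` is hypothetical, and the real Agent 4 may work quite differently.

```python
from typing import Dict, Iterable, List, Set


def find_uncovered(requirement_ids: Iterable[str], test_cases: List[Dict]) -> Set[str]:
    """Return the requirement IDs that no test case references."""
    covered = {
        rid
        for test_case in test_cases
        for rid in test_case.get("requirement_ids", [])
    }
    return set(requirement_ids) - covered


# Hypothetical data, for illustration only.
all_requirements = ["FR-R-001", "FR-R-002", "FR-R-003"]
generated = [
    {"test_id": "TC-FUNC-001", "requirement_ids": ["FR-R-003"]},
    {"test_id": "TC-FUNC-002", "requirement_ids": ["FR-R-001", "FR-R-003"]},
]
print(sorted(find_uncovered(all_requirements, generated)))  # ['FR-R-002']
```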
Prerequisites: AWS account with Bedrock access, AWS CLI configured.
Step 1: Install AWS CLI
# Windows
winget install Amazon.AWSCLI (reopen terminal after install)
# macOS
brew install awscli
# Linux
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip" && unzip awscliv2.zip && sudo ./aws/install
Step 2: Configure AWS credentials
aws login
# Or: aws configure (with access key + secret key)
Step 3: Enable Anthropic models on Bedrock
Go to AWS Bedrock Model Catalog. For Anthropic models, first-time users need to submit a use case form. Fill it out and wait ~15 minutes for approval.
Step 4: Install and run
git clone https://github.com/aymuos19/astra-qa-ai-testgen.git
cd astra-qa-ai-testgen
pip install -r requirements.txt
pip install boto3 botocore "botocore[crt]"
python test_local.py --backend bedrock --model us.anthropic.claude-sonnet-4-20250514-v1:0
Available Bedrock models:
| Model | Speed | Quality | Cost |
|---|---|---|---|
| `us.anthropic.claude-sonnet-4-20250514-v1:0` | Fast | Excellent | ~$0.50-1.00/run |
| `us.anthropic.claude-haiku-4-5-20251001-v1:0` | Fastest | Good | ~$0.10-0.20/run |
| `us.anthropic.claude-opus-4-20250514-v1:0` | Slower | Best | ~$2.00-3.00/run |
Note: Bedrock requires inference profile IDs (prefixed with `us.` or `global.`), not base model IDs. If you get "on-demand throughput isn't supported", add the `us.` prefix.
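The prefix rule in the note can be expressed as a small guard that runs before a model ID is sent to Bedrock. This is a sketch under assumptions: the helper name `to_inference_profile_id` is hypothetical, and only the two prefixes named in the note are recognized.

```python
def to_inference_profile_id(model_id: str, prefix: str = "us") -> str:
    """Return model_id with an inference profile prefix, unless it already has one."""
    known_prefixes = ("us.", "global.")  # only the prefixes mentioned in the note
    if model_id.startswith(known_prefixes):
        return model_id
    return f"{prefix}.{model_id}"


print(to_inference_profile_id("anthropic.claude-sonnet-4-20250514-v1:0"))
# us.anthropic.claude-sonnet-4-20250514-v1:0
```

An ID that already carries a prefix is returned unchanged, so the helper is safe to call on values taken from config or the command line.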
git clone https://github.com/aymuos19/astra-qa-ai-testgen.git
cd astra-qa-ai-testgen
pip install -r requirements.txt
Get an API key at console.anthropic.com ($5 free credits).
# macOS/Linux
export ANTHROPIC_API_KEY="sk-ant-your-key"
# Windows
set ANTHROPIC_API_KEY=sk-ant-your-key
python test_local.py --backend claude --model claude-sonnet-4-20250514
Step 1: Install Ollama
# macOS
brew install ollama && brew services start ollama
# Linux
curl -fsSL https://ollama.com/install.sh | sh
Step 2: Pull a model
ollama pull qwen3:8b
Step 3: Install and run
git clone https://github.com/aymuos19/astra-qa-ai-testgen.git
cd astra-qa-ai-testgen
pip install -r requirements.txt
python test_local.py --backend ollama --model qwen3:8b
| Model | RAM | Quality | Speed |
|---|---|---|---|
| `qwen3:8b` | 8 GB | Great | 15-40 min |
| `gemma3:12b` | 12 GB | Excellent | 20-50 min |
| `mistral:7b` | 6 GB | Good | 10-30 min |
| `llava:7b` | 8 GB | Good | For Figma image input |
Note: Ollama processes chunks sequentially. For a 21-requirement BRD, expect 15-40 minutes. Bedrock/Claude finish in 2-3 minutes.
python -m streamlit run app.py
Opens at http://localhost:8501. Upload a document, configure the backend in the sidebar, click Generate.
# Parse only — verify document extraction (instant, no LLM)
python test_local.py --parse-only --input docs/uber_rider.md
# Full pipeline with Bedrock
python test_local.py --backend bedrock --model us.anthropic.claude-sonnet-4-20250514-v1:0
# Full pipeline with Ollama
python test_local.py --backend ollama --model qwen3:8b
# Custom input document
python test_local.py --backend bedrock --input path/to/your_brd.docx
# Production CLI
python testgen.py --input docs/login_brd.md --format csv --backend bedrock
All files are exported to the output/ folder.
| Format | File | Description |
|---|---|---|
| CSV | test_cases.csv | One row per step. Opens in Excel/Sheets. |
| JSON | test_cases.json | Full structured output with summary stats. |
| Markdown | test_cases.md | Readable with step tables per test case. |
| JIRA CSV | test_cases_jira.csv | Importable into JIRA test management. |
{
  "test_id": "TC-FUNC-001",
  "requirement_ids": ["FR-R-003"],
  "summary": "Rider selects UberX Share and confirms ride request",
  "type": "functional",
  "priority": "P0",
  "precondition": "1. Uber app open.\n2. Rider logged in.\n3. Destination entered.",
  "steps": [
    {
      "step_number": 1,
      "instruction": "Tap the 'UberX Share' card.",
      "expected_result": "Card highlighted with fare estimate."
    },
    {
      "step_number": 2,
      "instruction": "Tap 'Confirm UberX Share'.",
      "expected_result": "'Searching...' animation appears."
    }
  ],
  "platforms": ["iOS Rider App", "Android Rider App"],
  "tags": ["rider", "request"]
}
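The CSV export is described as one row per step. A rough sketch of how a test case in the shape above could be flattened that way follows. The function name `flatten_steps` and the choice of columns are assumptions; the repo's exporter may use different headers.

```python
import csv
import io
from typing import Dict, List


def flatten_steps(test_case: Dict) -> List[Dict]:
    """Produce one flat row per step, repeating the test-level fields."""
    return [
        {
            "test_id": test_case["test_id"],
            "summary": test_case["summary"],
            "priority": test_case["priority"],
            "step_number": step["step_number"],
            "instruction": step["instruction"],
            "expected_result": step["expected_result"],
        }
        for step in test_case["steps"]
    ]


test_case = {
    "test_id": "TC-FUNC-001",
    "summary": "Rider selects UberX Share and confirms ride request",
    "priority": "P0",
    "steps": [
        {"step_number": 1, "instruction": "Tap the 'UberX Share' card.",
         "expected_result": "Card highlighted with fare estimate."},
        {"step_number": 2, "instruction": "Tap 'Confirm UberX Share'.",
         "expected_result": "'Searching...' animation appears."},
    ],
}

rows = flatten_steps(test_case)
buffer = io.StringIO()  # stands in for a file opened with newline="" and encoding="utf-8"
writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```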
Edit config.yaml to set defaults:
backend: bedrock # bedrock, claude, or ollama
model: us.anthropic.claude-sonnet-4-20250514-v1:0
aws_region: us-east-1 # For Bedrock
temperature: 0.3
max_tokens: 8192 # 8192 for Claude/Bedrock, 16384 for Ollama
examples_path: examples/
output_format: csv
CLI flags override config values.
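The override order can be illustrated with a short sketch: start from the config values, then apply only the flags the user actually passed. The function name `resolve_settings` is hypothetical and the config is a plain dictionary here instead of a parsed `config.yaml`; the flag names `--backend`, `--model`, and `--format` are the ones used in the CLI examples.

```python
import argparse
from typing import Dict, List

CONFIG = {
    "backend": "bedrock",
    "model": "us.anthropic.claude-sonnet-4-20250514-v1:0",
    "output_format": "csv",
}


def resolve_settings(config: Dict, argv: List[str]) -> Dict:
    """Merge config defaults with CLI flags; flags win when provided."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--backend", choices=["bedrock", "claude", "ollama"])
    parser.add_argument("--model")
    parser.add_argument("--format", dest="output_format")
    args = parser.parse_args(argv)
    # Flags that were not passed stay None and must not clobber the config.
    overrides = {key: value for key, value in vars(args).items() if value is not None}
    return {**config, **overrides}


settings = resolve_settings(CONFIG, ["--backend", "ollama", "--model", "qwen3:8b"])
print(settings["backend"], settings["model"], settings["output_format"])
# ollama qwen3:8b csv
```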
Add your own historical test cases to examples/sample_test_cases.json to match your team's style. The repo ships with 5 detailed examples; adding more generally improves output quality.
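How the examples reach the model is not described here, but a common few-shot approach is to serialize a handful of them into the prompt. The sketch below assumes that approach; the function name `build_few_shot_block`, the `limit` parameter, and the "Example N" labels are all hypothetical.

```python
import json
from typing import Dict, List


def build_few_shot_block(examples: List[Dict], limit: int = 3) -> str:
    """Render up to `limit` example test cases as labelled JSON snippets."""
    parts = [
        f"Example {index}:\n{json.dumps(example, indent=2, ensure_ascii=False)}"
        for index, example in enumerate(examples[:limit], start=1)
    ]
    return "\n\n".join(parts)


# In practice these would be loaded from examples/sample_test_cases.json.
examples = [
    {"test_id": "TC-FUNC-001", "priority": "P0"},
    {"test_id": "TC-FUNC-002", "priority": "P1"},
    {"test_id": "TC-EDGE-001", "priority": "P2"},
]
print(build_few_shot_block(examples, limit=2))
```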
| Problem | Fix |
|---|---|
| Bedrock "on-demand throughput not supported" | Use inference profile ID: us.anthropic.claude-... not anthropic.claude-... |
| Bedrock "use case details not submitted" | Submit the Anthropic form in AWS Bedrock console. Wait 15 min. |
| Bedrock "model marked as Legacy" | Use a newer model: us.anthropic.claude-sonnet-4-20250514-v1:0 |
| `ModuleNotFoundError: botocore` | Run: pip install boto3 botocore "botocore[crt]" |
| Ollama timeout | Parser needs 5-10 min for large BRDs. The 600s timeout should suffice. |
| Only 1 step per test | Update to latest version — prompts enforce 4-7 steps. |
| `streamlit` not recognized | Use: python -m streamlit run app.py |
| Unicode error saving JSON | Update test_local.py — add encoding="utf-8" to file open. |
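For the last row, the fix amounts to passing an explicit encoding when opening the output file, since the platform default (often a legacy code page on Windows) may not be able to represent every character. A minimal sketch, with `save_json` as a hypothetical helper name and a made-up test case summary:

```python
import json
import tempfile
from pathlib import Path


def save_json(data, path) -> None:
    """Write JSON as UTF-8 regardless of the platform's default encoding."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(data, handle, ensure_ascii=False, indent=2)


out_path = Path(tempfile.mkdtemp()) / "test_cases.json"
save_json({"summary": "Rider sees fare in ₹ and confirms"}, out_path)
print(out_path.read_text(encoding="utf-8"))
```

Reading the file back should use the same `encoding="utf-8"` argument.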
This architecture is framework-agnostic and maps directly to enterprise orchestration:
| This Project | LangGraph | Strands Agents | AWS Bedrock |
|---|---|---|---|
| Agent 1-4 | Graph nodes | Agent instances | Bedrock agents |
| Chunking | Map-reduce pattern | Parallel agent spawn | Lambda fan-out |
| Few-shot examples | Vector store RAG | Knowledge Base | Bedrock Knowledge Base |
| Output | File export | Tool integration | S3 + API Gateway |
The 4-agent pipeline, prompt engineering, and chunking strategy are the core IP. Frameworks handle deployment plumbing. See ARCHITECTURE.md for the full technical write-up.
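The map-reduce row in the table can be illustrated without any framework. The sketch below fans chunks out to a thread pool and flattens the results; it is not how this project runs today, and `process_chunks_parallel` and `fake_worker` are hypothetical names, with `fake_worker` standing in for the per-chunk agent calls.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def process_chunks_parallel(
    chunks: List[List[str]],
    worker: Callable[[List[str]], List[Dict]],
    max_workers: int = 4,
) -> List[Dict]:
    """Map `worker` over chunks concurrently, then flatten the per-chunk results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_chunk = list(pool.map(worker, chunks))  # map step, input order preserved
    return [case for result in per_chunk for case in result]  # reduce step


def fake_worker(chunk: List[str]) -> List[Dict]:
    """Stand-in for an LLM call: returns one placeholder test case per requirement."""
    return [{"requirement_ids": [rid], "summary": f"Test for {rid}"} for rid in chunk]


chunks = [["FR-001", "FR-002", "FR-003"], ["FR-004", "FR-005", "FR-006"]]
cases = process_chunks_parallel(chunks, fake_worker)
print(len(cases))  # 6
```

Threads suit this workload because each worker would spend most of its time waiting on a network response from the model backend.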
astra-qa-ai-testgen/
├── app.py # Streamlit web UI
├── testgen.py # Production CLI
├── test_local.py # Interactive test runner
├── config.yaml # Configuration
├── requirements.txt # Dependencies
├── ARCHITECTURE.md # Technical deep-dive
├── agents/
│ ├── base_agent.py # 3 backends: Claude, Bedrock, Ollama
│ ├── orchestrator.py # Chunked multi-agent pipeline
│ ├── requirement_parser.py # Agent 1
│ ├── test_generator.py # Agent 2
│ ├── edge_case_finder.py # Agent 3
│ └── formatter.py # Agent 4
├── prompts/ # System prompts per agent
├── parsers/ # .docx, .pdf, .md, image parsers
├── exporters/ # CSV, JSON, Markdown, JIRA export
├── examples/
│ └── sample_test_cases.json # Few-shot examples
└── docs/ # Sample BRDs
    ├── uber_share_ride_brd.docx
    └── login_brd.md
| Backend | Cost/Run | Time (5 reqs) | Time (21 reqs) | Quality |
|---|---|---|---|---|
| Ollama | $0.00 | 5-15 min | 15-40 min | Good |
| Bedrock Haiku | ~$0.10 | 20 sec | 1-2 min | Good |
| Bedrock Sonnet | ~$0.50 | 30 sec | 2-3 min | Excellent |
| Claude Sonnet | ~$0.50 | 30 sec | 2-3 min | Excellent |
MIT — see LICENSE.
Built by Soumya Thekkinkattil Sathyan — Senior QA Engineer at Amazon leading AI-driven quality engineering.