🧪 Astra QA - AI Test Suite Generator

AI-powered test case generator — upload a requirements document, get structured, prioritized, multi-step test cases.

Supports three backends: Ollama (free, local), Claude API (Anthropic direct), and AWS Bedrock (enterprise). Includes a Streamlit web UI and CLI.

What It Does

You give it a requirements document (BRD, PRD, user stories). It generates structured test cases with:

4-7 detailed test steps per test case (not just 1-line summaries)
Priority — P0 (must-pass), P1 (should-pass), P2 (nice-to-have)
Type — functional, boundary, negative, error handling, security, accessibility
Full traceability — every test maps back to requirement IDs from your document
Coverage report — flags requirements with no test cases

A 5-requirement login BRD generates 37 test cases. A 21-requirement rideshare BRD generates 50-80 test cases.
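The coverage report can be pictured as a set difference between the requirement IDs found in the document and the IDs referenced by the generated tests. The sketch below is illustrative only: `find_uncovered` is a hypothetical helper, not a function from this repo, and it assumes each test case carries a `requirement_ids` list as shown in the Test Case Structure section.

```python
from __future__ import annotations

from typing import Iterable


def find_uncovered(requirement_ids: Iterable[str], test_cases: Iterable[dict]) -> list[str]:
    """Return requirement IDs that no test case references, in document order."""
    covered = set()
    for tc in test_cases:
        covered.update(tc.get("requirement_ids", []))
    return [rid for rid in requirement_ids if rid not in covered]


# Placeholder IDs and test cases, used only to exercise the helper.
requirements = ["FR-R-001", "FR-R-002", "FR-R-003"]
tests = [
    {"test_id": "TC-FUNC-001", "requirement_ids": ["FR-R-003"]},
    {"test_id": "TC-NEG-001", "requirement_ids": ["FR-R-001", "FR-R-003"]},
]
print(find_uncovered(requirements, tests))  # ['FR-R-002']
```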

Quick Start

Fastest Path (AWS Bedrock)

pip install -r requirements.txt
aws configure    # one-time setup
python test_local.py --backend bedrock --model us.anthropic.claude-sonnet-4-20250514-v1:0

Free Path (Ollama)

ollama pull qwen3:8b
pip install -r requirements.txt
python test_local.py --backend ollama --model qwen3:8b

Web UI

pip install -r requirements.txt
python -m streamlit run app.py

Architecture

The tool uses a chunked multi-agent pipeline — 4 specialized AI agents that mirror how senior QA engineers think:

┌─────────────────────────────────────────────┐
│            INPUT: BRD Document               │
│         (.docx, .pdf, .md, .txt)             │
└──────────────────┬──────────────────────────┘
                   ▼
┌─────────────────────────────────────────────┐
│        Agent 1: Requirement Parser           │
│   Extracts all testable requirements         │
│   (called once for the full document)        │
└──────────────────┬──────────────────────────┘
                   ▼
┌─────────────────────────────────────────────┐
│            CHUNKING LAYER                    │
│   Splits N requirements into groups of 3     │
│   21 requirements → 7 chunks                 │
└──────────────────┬──────────────────────────┘
                   ▼
         ┌─────────┴─────────┐
         │  FOR EACH CHUNK:  │
         ▼                   ▼
┌─────────────────┐ ┌─────────────────┐
│ Agent 2: Test    │ │ Agent 3: Edge    │
│ Case Generator   │ │ Case Finder      │
│ Happy paths,     │ │ Boundary,        │
│ alternate flows  │ │ negative, error  │
│ 2-3 tests/req    │ │ 1-2 tests/req    │
└────────┬────────┘ └────────┬────────┘
         └─────────┬─────────┘
                   ▼
┌─────────────────────────────────────────────┐
│       Agent 4: Formatter & Validator         │
│   Dedup, validate coverage, assign priority  │
└──────────────────┬──────────────────────────┘
                   ▼
┌─────────────────────────────────────────────┐
│              OUTPUT FILES                    │
│     CSV  │  JSON  │  Markdown  │  JIRA CSV   │
└─────────────────────────────────────────────┘
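The flow in the diagram above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in `agents/orchestrator.py`: every function name here is hypothetical, and the four "agents" are stubs that return canned data instead of calling an LLM.

```python
from __future__ import annotations


def parse_requirements(document: str) -> list[str]:
    # Agent 1 stand-in: treat each non-empty line as one requirement.
    return [line.strip() for line in document.splitlines() if line.strip()]


def generate_tests(chunk: list[str]) -> list[dict]:
    # Agent 2 stand-in: one happy-path test per requirement.
    return [{"requirement": r, "summary": f"Happy path for {r}", "type": "functional"} for r in chunk]


def find_edge_cases(chunk: list[str]) -> list[dict]:
    # Agent 3 stand-in: one negative test per requirement.
    return [{"requirement": r, "summary": f"Invalid input for {r}", "type": "negative"} for r in chunk]


def format_and_validate(tests: list[dict]) -> list[dict]:
    # Agent 4 stand-in: drop duplicate (requirement, summary) pairs and assign IDs.
    seen, unique = set(), []
    for t in tests:
        key = (t["requirement"], t["summary"].lower())
        if key not in seen:
            seen.add(key)
            unique.append({**t, "test_id": f"TC-{len(unique) + 1:03d}"})
    return unique


def run_pipeline(document: str, chunk_size: int = 3) -> list[dict]:
    requirements = parse_requirements(document)  # called once for the full document
    raw: list[dict] = []
    for start in range(0, len(requirements), chunk_size):
        chunk = requirements[start:start + chunk_size]
        raw += generate_tests(chunk)   # Agent 2, per chunk
        raw += find_edge_cases(chunk)  # Agent 3, per chunk
    return format_and_validate(raw)


doc = "FR-1 login\nFR-2 logout\nFR-3 reset password\nFR-4 lockout"
results = run_pipeline(doc)
print(len(results), results[0]["test_id"])  # 8 TC-001
```

Keeping each stage as a separate function makes it possible to swap a stub for a real backend call without touching the loop that drives the chunks.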

Why Chunked?

A single LLM call covering 20+ requirements produces only about 12 shallow test cases, because the output is truncated at the token limit. Chunking splits the requirements into groups of 3 and processes each group separately, so every requirement gets full coverage with multi-step test cases.
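The grouping step itself is small. The sketch below shows one way to write it, assuming requirements arrive as a flat list of IDs; `chunk_requirements` is a hypothetical name and the IDs are placeholders.

```python
from __future__ import annotations

import math
from typing import Iterator, Sequence


def chunk_requirements(requirements: Sequence[str], size: int = 3) -> Iterator[list[str]]:
    """Yield consecutive groups of at most `size` requirements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(requirements), size):
        yield list(requirements[start:start + size])


requirements = [f"FR-{i:03d}" for i in range(1, 22)]  # 21 placeholder IDs
chunks = list(chunk_requirements(requirements))
print(len(chunks))        # 7
print(math.ceil(20 / 3))  # 7: a 20-requirement document also needs 7 chunks
```

When the count is not a multiple of 3, the last chunk is simply shorter, so the number of chunks is the ceiling of N / 3.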

See ARCHITECTURE.md for the full technical deep-dive.

Setup

Option A: AWS Bedrock (Recommended — fast, enterprise-grade)

Prerequisites: AWS account with Bedrock access, AWS CLI configured.

Step 1: Install AWS CLI

Windows: winget install Amazon.AWSCLI (reopen terminal after install)
macOS: brew install awscli
Linux: curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip" && unzip awscliv2.zip && sudo ./aws/install

Step 2: Configure AWS credentials

aws login
# Or: aws configure (with access key + secret key)

Step 3: Enable Anthropic models on Bedrock

Go to the AWS Bedrock Model Catalog. First-time users of Anthropic models need to submit a use case form. Fill it out and wait about 15 minutes for approval.

Step 4: Install and run

git clone https://github.com/aymuos19/astra-qa-ai-testgen.git
cd astra-qa-ai-testgen
pip install -r requirements.txt
pip install boto3 botocore "botocore[crt]"
python test_local.py --backend bedrock --model us.anthropic.claude-sonnet-4-20250514-v1:0

Available Bedrock models:

Model	Speed	Quality	Cost
`us.anthropic.claude-sonnet-4-20250514-v1:0`	Fast	Excellent	~$0.50-1.00/run
`us.anthropic.claude-haiku-4-5-20251001-v1:0`	Fastest	Good	~$0.10-0.20/run
`us.anthropic.claude-opus-4-20250514-v1:0`	Slower	Best	~$2.00-3.00/run

Note: Bedrock requires inference profile IDs (prefixed with us. or global.), not base model IDs. If you get "on-demand throughput isn't supported", add the us. prefix.
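A small guard can catch a base model ID before it reaches Bedrock. The helper below is a hypothetical sketch, not part of this repo; it only knows the two prefixes named in the note, and other regional prefixes may exist that it does not handle.

```python
KNOWN_PREFIXES = ("us.", "global.")  # the two prefixes named in the note above


def to_inference_profile_id(model_id: str, prefix: str = "us.") -> str:
    """Return model_id unchanged if it already has a known prefix, else prepend one."""
    if model_id.startswith(KNOWN_PREFIXES):
        return model_id
    return prefix + model_id


print(to_inference_profile_id("anthropic.claude-sonnet-4-20250514-v1:0"))
# us.anthropic.claude-sonnet-4-20250514-v1:0
```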

Option B: Claude API Direct (Simple, paid)

git clone https://github.com/aymuos19/astra-qa-ai-testgen.git
cd astra-qa-ai-testgen
pip install -r requirements.txt

Get an API key at console.anthropic.com ($5 free credits).

# macOS/Linux
export ANTHROPIC_API_KEY="sk-ant-your-key"

# Windows (Command Prompt)
set ANTHROPIC_API_KEY=sk-ant-your-key

# Windows (PowerShell)
$env:ANTHROPIC_API_KEY = "sk-ant-your-key"

python test_local.py --backend claude --model claude-sonnet-4-20250514

Option C: Ollama (Free, local, no API key)

Step 1: Install Ollama

Windows: Download from ollama.com/download
macOS: brew install ollama && brew services start ollama
Linux: curl -fsSL https://ollama.com/install.sh | sh

Step 2: Pull a model

ollama pull qwen3:8b

Step 3: Install and run

git clone https://github.com/aymuos19/astra-qa-ai-testgen.git
cd astra-qa-ai-testgen
pip install -r requirements.txt
python test_local.py --backend ollama --model qwen3:8b

Model	RAM	Quality	Speed / Notes
`qwen3:8b`	8 GB	Great	15-40 min
`gemma3:12b`	12 GB	Excellent	20-50 min
`mistral:7b`	6 GB	Good	10-30 min
`llava:7b`	8 GB	Good	For Figma image input

Note: Ollama processes chunks sequentially. For a 21-requirement BRD, expect 15-40 minutes. Bedrock/Claude finish in 2-3 minutes.

Running the Web UI

python -m streamlit run app.py

The app opens at http://localhost:8501. Upload a document, configure the backend in the sidebar, and click Generate.

Running the CLI

# Parse only — verify document extraction (instant, no LLM)
python test_local.py --parse-only --input docs/login_brd.md

# Full pipeline with Bedrock
python test_local.py --backend bedrock --model us.anthropic.claude-sonnet-4-20250514-v1:0

# Full pipeline with Ollama
python test_local.py --backend ollama --model qwen3:8b

# Custom input document
python test_local.py --backend bedrock --input path/to/your_brd.docx

# Production CLI
python testgen.py --input docs/login_brd.md --format csv --backend bedrock

Output Formats

All files are exported to the output/ folder.

Format	File	Description
CSV	`test_cases.csv`	One row per step. Opens in Excel/Sheets.
JSON	`test_cases.json`	Full structured output with summary stats.
Markdown	`test_cases.md`	Readable with step tables per test case.
JIRA CSV	`test_cases_jira.csv`	Importable into JIRA test management.

Test Case Structure

{
  "test_id": "TC-FUNC-001",
  "requirement_ids": ["FR-R-003"],
  "summary": "Rider selects UberX Share and confirms ride request",
  "type": "functional",
  "priority": "P0",
  "precondition": "1. Uber app open.\n2. Rider logged in.\n3. Destination entered.",
  "steps": [
    {
      "step_number": 1,
      "instruction": "Tap the 'UberX Share' card.",
      "expected_result": "Card highlighted with fare estimate."
    },
    {
      "step_number": 2,
      "instruction": "Tap 'Confirm UberX Share'.",
      "expected_result": "'Searching...' animation appears."
    }
  ],
  "platforms": ["iOS Rider App", "Android Rider App"],
  "tags": ["rider", "request"]
}
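To see how this structure relates to the "one row per step" CSV export, the sketch below flattens a test case into rows with the standard `csv` module. The column names and the `;` separator for requirement IDs are assumptions made for illustration; the repo's exporter may use a different layout.

```python
from __future__ import annotations

import csv
import io

test_case = {
    "test_id": "TC-FUNC-001",
    "requirement_ids": ["FR-R-003"],
    "summary": "Rider selects UberX Share and confirms ride request",
    "priority": "P0",
    "steps": [
        {"step_number": 1, "instruction": "Tap the 'UberX Share' card.",
         "expected_result": "Card highlighted with fare estimate."},
        {"step_number": 2, "instruction": "Tap 'Confirm UberX Share'.",
         "expected_result": "'Searching...' animation appears."},
    ],
}


def to_rows(tc: dict) -> list[dict]:
    """Flatten one test case into one row per step (column names are assumed)."""
    return [
        {
            "test_id": tc["test_id"],
            "requirement_ids": ";".join(tc["requirement_ids"]),
            "priority": tc["priority"],
            "summary": tc["summary"],
            "step_number": step["step_number"],
            "instruction": step["instruction"],
            "expected_result": step["expected_result"],
        }
        for step in tc["steps"]
    ]


rows = to_rows(test_case)
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Repeating the test-level fields on every row keeps each line self-describing, which is convenient when filtering in a spreadsheet.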

Configuration

Edit config.yaml to set defaults:

backend: bedrock                    # bedrock, claude, or ollama
model: us.anthropic.claude-sonnet-4-20250514-v1:0
aws_region: us-east-1               # For Bedrock
temperature: 0.3
max_tokens: 8192                    # 8192 for Claude/Bedrock, 16384 for Ollama
examples_path: examples/
output_format: csv

CLI flags override config values.
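One common way to implement this precedence is to parse flags with no defaults and overlay only the ones that were actually passed. The sketch below assumes that approach; `resolve_settings` is a hypothetical name, the defaults are inlined as a dict to avoid a YAML dependency, and mapping `--format` onto `output_format` is a guess about how the two relate.

```python
from __future__ import annotations

import argparse

# Defaults mirroring some of the config.yaml keys above.
CONFIG_DEFAULTS = {
    "backend": "bedrock",
    "model": "us.anthropic.claude-sonnet-4-20250514-v1:0",
    "output_format": "csv",
}


def resolve_settings(argv: list[str], defaults: dict) -> dict:
    """Merge CLI flags over config defaults; flags that were not passed are ignored."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--backend", choices=["bedrock", "claude", "ollama"])
    parser.add_argument("--model")
    parser.add_argument("--format", dest="output_format")
    args = vars(parser.parse_args(argv))
    overrides = {key: value for key, value in args.items() if value is not None}
    return {**defaults, **overrides}


settings = resolve_settings(["--backend", "ollama", "--model", "qwen3:8b"], CONFIG_DEFAULTS)
print(settings)
# {'backend': 'ollama', 'model': 'qwen3:8b', 'output_format': 'csv'}
```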

Few-Shot Examples

Add your own historical test cases to examples/sample_test_cases.json to match your team's style. The repo ships with 5 detailed examples; adding more generally improves output quality.
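Few-shot prompting usually means placing a handful of example outputs ahead of the actual request. The sketch below shows that idea under two assumptions: that the examples file is a JSON array of test case objects, and that the prompt wording is free-form. `build_few_shot_prompt` and the requirement text are hypothetical, and the repo's real prompts live in prompts/.

```python
from __future__ import annotations

import json


def build_few_shot_prompt(examples: list[dict], requirement: str, max_examples: int = 3) -> str:
    """Prepend up to max_examples serialized test cases to the instruction."""
    shots = "\n\n".join(
        f"Example {i}:\n{json.dumps(example, indent=2)}"
        for i, example in enumerate(examples[:max_examples], start=1)
    )
    return f"{shots}\n\nWrite test cases in the same style for:\n{requirement}"


# In practice the list would come from json.load() on the examples file.
examples = json.loads('[{"test_id": "TC-FUNC-001", "priority": "P0"}]')
prompt = build_few_shot_prompt(examples, "FR-R-003: placeholder requirement text")
print(prompt)
```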

Troubleshooting

Problem	Fix
Bedrock "on-demand throughput not supported"	Use inference profile ID: `us.anthropic.claude-...` not `anthropic.claude-...`
Bedrock "use case details not submitted"	Submit the Anthropic form in AWS Bedrock console. Wait 15 min.
Bedrock "model marked as Legacy"	Use a newer model: `us.anthropic.claude-sonnet-4-20250514-v1:0`
`ModuleNotFoundError: botocore`	Run: `pip install boto3 botocore "botocore[crt]"`
Ollama timeout	Parser needs 5-10 min for large BRDs. The 600s timeout should suffice.
Only 1 step per test	Update to latest version — prompts enforce 4-7 steps.
`streamlit` not recognized	Use: `python -m streamlit run app.py`
Unicode error saving JSON	Update `test_local.py` — add `encoding="utf-8"` to file open.
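For the Unicode row above, the general pattern is to pass an explicit encoding whenever a file is opened for JSON output. The snippet below is a generic illustration, not the actual line from `test_local.py`; the payload and the temporary path are made up for the demonstration.

```python
import json
import tempfile
from pathlib import Path

payload = {"summary": "Rider sees fare in ₹ and the label “Searching…”"}

out_path = Path(tempfile.mkdtemp()) / "test_cases.json"
# An explicit encoding avoids the platform default (often cp1252 on Windows),
# which cannot encode every character a model may emit.
with open(out_path, "w", encoding="utf-8") as handle:
    json.dump(payload, handle, ensure_ascii=False, indent=2)

with open(out_path, encoding="utf-8") as handle:
    restored = json.load(handle)
print(restored == payload)  # True
```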

Scaling to Enterprise

This architecture is framework-agnostic and maps directly to enterprise orchestration:

This Project	LangGraph	Strands Agents	AWS Bedrock
Agent 1-4	Graph nodes	Agent instances	Bedrock agents
Chunking	Map-reduce pattern	Parallel agent spawn	Lambda fan-out
Few-shot examples	Vector store RAG	Knowledge Base	Bedrock Knowledge Base
Output	File export	Tool integration	S3 + API Gateway

The 4-agent pipeline, prompt engineering, and chunking strategy are the core IP. Frameworks handle deployment plumbing. See ARCHITECTURE.md for the full technical write-up.
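The map-reduce row in the table can be approximated locally before adopting any framework. The sketch below uses the standard library's `ThreadPoolExecutor` as a stand-in for a fan-out layer; `process_chunk` and `run_parallel` are hypothetical names, and the per-chunk work is a stub rather than a real LLM call.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor


def process_chunk(chunk: list[str]) -> list[str]:
    # Stand-in for the per-chunk Agent 2 + Agent 3 calls (no LLM involved here).
    return [f"test for {req}" for req in chunk]


def run_parallel(chunks: list[list[str]], max_workers: int = 4) -> list[str]:
    """Map step: process chunks concurrently. Reduce step: flatten in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_chunk = list(pool.map(process_chunk, chunks))  # map preserves input order
    return [test for result in per_chunk for test in result]


chunks = [["FR-1", "FR-2", "FR-3"], ["FR-4", "FR-5", "FR-6"], ["FR-7"]]
print(run_parallel(chunks))
```

Threads suit this workload because each chunk spends most of its time waiting on a network response. A hosted backend may also enforce rate limits, so `max_workers` would need tuning in practice.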

Project Structure

astra-qa-ai-testgen/
├── app.py                     # Streamlit web UI
├── testgen.py                 # Production CLI
├── test_local.py              # Interactive test runner
├── config.yaml                # Configuration
├── requirements.txt           # Dependencies
├── ARCHITECTURE.md            # Technical deep-dive
├── agents/
│   ├── base_agent.py          # 3 backends: Claude, Bedrock, Ollama
│   ├── orchestrator.py        # Chunked multi-agent pipeline
│   ├── requirement_parser.py  # Agent 1
│   ├── test_generator.py      # Agent 2
│   ├── edge_case_finder.py    # Agent 3
│   └── formatter.py           # Agent 4
├── prompts/                   # System prompts per agent
├── parsers/                   # .docx, .pdf, .md, image parsers
├── exporters/                 # CSV, JSON, Markdown, JIRA export
├── examples/
│   └── sample_test_cases.json # Few-shot examples
└── docs/                      # Sample BRDs
    ├── uber_share_ride_brd.docx
    └── login_brd.md

Cost and Performance

Backend	Cost/Run	Time (5 reqs)	Time (21 reqs)	Quality
Ollama	$0.00	5-15 min	15-40 min	Good
Bedrock Haiku	~$0.10	20 sec	1-2 min	Good
Bedrock Sonnet	~$0.50	30 sec	2-3 min	Excellent
Claude Sonnet	~$0.50	30 sec	2-3 min	Excellent

Roadmap

License

MIT — see LICENSE.

Built by Soumya Thekkinkattil Sathyan — Senior QA Engineer at Amazon leading AI-driven quality engineering.
