Global Capacity Orchestrator (GCO)

One API. Every Accelerator. Any Region.

Multi-region accelerated-compute orchestration for AWS, covering NVIDIA GPUs, AWS Trainium, AWS Inferentia, and CPU (amd64 and arm64 / Graviton). GCO provides capacity-aware scheduling, spot fallback, and multi-region autoscaling inference endpoints with automatic failover and latency-aware routing, all from a single REST API and CLI.

🎬 Live demo recording

GCO Live Demo

gco CLI demo: capacity discovery, cost visibility, 5 schedulers (Volcano, Kueue, YuniKorn, Slurm, KEDA), FSx, Valkey, live LLM inference, and EFS, all against one already-deployed cluster.

📦 Deploy recording

GCO Deploy

Fresh gco stacks deploy-all -y from a clean account.

🗑️ Destroy recording

GCO Destroy

Full teardown with gco stacks destroy-all -y.

What it does. Spins up EKS Auto Mode clusters across AWS regions, wired together with Global Accelerator for latency-aware anycast routing and automatic failover. Submit Kubernetes manifests via a single REST API or CLI — GCO handles capacity-aware scheduling, spot fallback, multi-region autoscaling inference endpoints, and output persistence.

Who it's for. Teams running accelerated workloads — LLM training and inference, batch ML, HPC, and general CPU jobs — that need multi-region redundancy, automatic capacity discovery, and IAM-based access without per-cluster kubeconfig distribution. Pre-wired nodepools for NVIDIA GPUs (g4dn, g5, and ARM64 g5g), AWS Trainium, AWS Inferentia, and general-purpose CPU on both amd64 and arm64 / Graviton.

Why it's different. Capacity-aware routing across regions out of the box, full-stack observability (CloudWatch dashboards, alarms, SNS), and a CDK app validated across 20+ config matrix combinations in CI.

Deploy everything and tear it all down with one command each:

gco stacks deploy-all -y      # stand up every region defined in cdk.json
gco stacks destroy-all -y     # destroy every stack across every region — no orphaned resources

Recommended: run everything from the dev container. GCO pins exact versions of many Python packages (CDK, AWS SDKs, FastAPI, mypy, Ruff, etc.), and installing them on top of an existing Python environment is the most common source of "it doesn't install" reports. The dev container ships a fully resolved environment (Python 3.14, Node.js 24, CDK, kubectl, AWS CLI, all Python deps), so you avoid the problem entirely.

git clone git@github.com:awslabs/global-capacity-orchestrator-on-aws.git
cd global-capacity-orchestrator-on-aws

docker build -f Dockerfile.dev -t gco-dev .
docker run -it --rm \
  -v ~/.aws:/root/.aws:ro \
  -v "$(pwd)":/workspace \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -w /workspace \
  gco-dev

The docker.sock mount lets gco stacks deploy-all bundle Lambda assets through your host Docker daemon. See Prerequisites for Colima/Finch socket paths and the security note about host-socket pass-through.

Prefer to install on your host? (advanced — the dev container is recommended)

Host installs are the advanced path. Because GCO pins exact versions of many Python packages, installing into an existing Python environment frequently fails with dependency-resolver errors (ResolutionImpossible). The dev container shown above ships every dependency at the pinned versions, and the Quick Start Guide walks through it end to end. If you still want a host install, use a clean virtual environment or pipx.

git clone git@github.com:awslabs/global-capacity-orchestrator-on-aws.git
cd global-capacity-orchestrator-on-aws && pipx install -e .

See the Quick Start for the full install + first-job walkthrough, or docs/CLI.md for every CLI command.

💡 New to the codebase? GCO ships with the GCO MCP server — an MCP server exposing 95 tools by default (up to 127 with feature flags) that index the whole project: docs, examples, source code, K8s manifests, and scripts. Connect it to an AI-powered IDE with MCP support (like Kiro) and explore GCO conversationally — ask questions about the codebase instead of reading repository files directly: "How does region recommendation work?", "Walk me through the inference deployment flow". See mcp/README.md.

Table of contents

Why GCO?
Quick Start
Architecture Overview
Key Features
Documentation
Project Structure
Contributing
License
Support
Security

Why GCO?

Running GPU workloads at scale is hard. You need to find regions with available capacity, provision clusters, handle authentication, deal with failover, and persist outputs after pods terminate. GCO solves all of this with a single deployable platform.

Challenge	Traditional Approach	With GCO
GPU availability	Manually check each region	Auto-routes to available capacity
Node provisioning	Pre-provision or wait for scaling	EKS Auto Mode provisions on-demand
Multi-region ops	Manage clusters separately	Single API, automatic routing
Authentication	Configure per-cluster access	IAM-based, uses existing AWS credentials
Job outputs	Lost when pods terminate	Persisted to EFS/FSx storage
Inference serving	Deploy and manage per-region	Deploy once, serve globally
Failover	Manual intervention required	Automatic via Global Accelerator

When to use GCO:

You need to run GPU workloads (training, inference, batch processing)
You want to deploy inference endpoints across multiple regions with a single command
You want multi-region redundancy without managing multiple clusters
You prefer IAM authentication over kubeconfig management
You need job outputs to persist after completion

Quick Start

Install and Deploy

The fastest, most reliable path is the dev container — it sidesteps the dependency-conflict issues that come with installing GCO's pinned Python packages on top of your existing Python environment.

Build the dev container (Python, Node.js, CDK, kubectl, and the AWS CLI are all pinned and pre-installed), then drop into a shell with the gco CLI already on the path:

docker build -f Dockerfile.dev -t gco-dev .
docker run -it --rm \
  -v ~/.aws:/root/.aws:ro \
  -v "$(pwd)":/workspace \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -w /workspace \
  gco-dev

From inside the container, deploy everything — CDK bootstrap runs automatically for every region defined in cdk.json:

gco stacks deploy-all -y

If you'd rather install on your host, use a clean virtual environment or pipx — see the Prerequisites and QUICKSTART.md for the details and known caveats.

Optional: configure kubectl access (requires PUBLIC_AND_PRIVATE endpoint mode). The default endpoint mode is PRIVATE — see docs/CUSTOMIZATION.md for details. Most users don't need this; submit jobs via SQS or API Gateway instead.

Submit Your First Job

Check GPU capacity in a region before you submit:

gco capacity check --instance-type g4dn.xlarge --region us-east-1

Submit a job using whichever path fits your setup — via SQS (recommended), via the global DynamoDB queue, via API Gateway, or directly through kubectl:

gco jobs submit-sqs examples/simple-job.yaml --region us-east-1     # via SQS (recommended)
gco queue submit examples/simple-job.yaml --region us-east-1        # via the global DynamoDB queue
gco jobs submit examples/simple-job.yaml -n gco-jobs                # via API Gateway
gco jobs submit-direct examples/simple-job.yaml -r us-east-1        # directly through kubectl
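
The contents of examples/simple-job.yaml are not reproduced in this README. As a rough sketch only, a minimal Kubernetes Job manifest of the kind these commands accept could be built as follows. The job name hello-gco and the namespace gco-jobs come from the commands in this section; the container image, the command, and the backoffLimit value are assumptions chosen for illustration.

```python
import json


def build_simple_job(name: str = "hello-gco", namespace: str = "gco-jobs") -> dict:
    """Return a minimal batch/v1 Job manifest as a plain dict."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "backoffLimit": 0,  # assumption: do not retry a failed pod
            "template": {
                "spec": {
                    # Job pod templates must use Never or OnFailure
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "main",
                            # assumed image; any small public image would do
                            "image": "public.ecr.aws/docker/library/busybox:stable",
                            "command": ["sh", "-c", "echo hello from GCO"],
                        }
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    # Print the manifest so it can be inspected or redirected to a file.
    print(json.dumps(build_simple_job(), indent=2))
```

The sketch prints JSON, which Kubernetes tooling generally accepts alongside YAML; whether your copy of simple-job.yaml carries extra fields (resource requests, tolerations, output volumes) should be checked against the examples/ directory.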

Check status and pull logs:

gco jobs list --all-regions
gco jobs logs hello-gco -n gco-jobs -r us-east-1

Deploy an Inference Endpoint

gco inference deploy my-llm -i vllm/vllm-openai:v0.22.0 --gpu-count 1
gco inference status my-llm
gco inference scale my-llm --replicas 3

See the Quick Start Guide for the full step-by-step walkthrough, or the CLI Reference for all available commands.

Architecture Overview

Figure 1: Global Capacity Orchestrator — multi-region control plane and regional EKS data planes

Multi-Region Reference Architecture workflow

DevOps / Platform engineers own the deployment. They configure the platform through cdk.json and drive everything from the gco CLI.
The AWS CDK app synthesises and deploys the GCO stacks with a single gco stacks deploy-all, provisioning the global control plane and one regional stack per target region.
Users submit jobs and inference requests through the gco CLI, which signs every call with AWS SigV4 credentials.
Amazon API Gateway (edge-optimized) is the global entry point. It enforces IAM (SigV4) authentication on every request before anything reaches the backend.
An AWS Lambda proxy injects a rotating secret header sourced from AWS Secrets Manager, adding a second authentication factor in front of the regional load balancers.
AWS Global Accelerator routes each request over the AWS backbone via anycast IPs to the nearest healthy region, providing automatic cross-region failover.
A regional AWS Application Load Balancer receives Global Accelerator traffic and forwards it into the cluster. ALBs accept only Global Accelerator IPs.
Each region runs an Amazon EKS cluster (EKS Auto Mode optional) with Karpenter GPU / Trainium / Inferentia / CPU node pools plus the GCO platform services — Health Monitor, Manifest Processor, Queue Processor, and Inference endpoints.
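
The routing behaviour in the workflow above (nearest healthy region, with failover when a region is unhealthy) happens inside AWS Global Accelerator, not in GCO code. The following is only a toy model of that decision, meant to make the behaviour concrete. The RegionEndpoint type, the pick_region function, and the latency figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionEndpoint:
    name: str
    latency_ms: float  # hypothetical latency from the client's edge location
    healthy: bool      # result of the most recent health check


def pick_region(endpoints: list[RegionEndpoint]) -> str:
    """Return the lowest-latency healthy region, or raise if none is healthy."""
    healthy = [e for e in endpoints if e.healthy]
    if not healthy:
        raise RuntimeError("no healthy region available")
    return min(healthy, key=lambda e: e.latency_ms).name


if __name__ == "__main__":
    fleet = [
        RegionEndpoint("us-east-1", 20.0, healthy=False),
        RegionEndpoint("us-west-2", 70.0, healthy=True),
        RegionEndpoint("eu-west-1", 95.0, healthy=True),
    ]
    # us-east-1 is closest but unhealthy, so the request goes to us-west-2.
    print(pick_region(fleet))
```

Once us-east-1 reports healthy again, the same function returns it, which mirrors how traffic shifts back after a regional recovery.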

Below is the per-region view showing how a single regional stack is composed.

Figure 2: Regional stack — EKS cluster, Karpenter node pools, platform services, and regional AWS services

Regional Architecture workflow

A public-subnet Application Load Balancer accepts inbound traffic restricted to Global Accelerator IPs only.
The Amazon EKS cluster is the heart of the regional stack, hosting both platform services and user workloads.
Karpenter node pools provision capacity on demand across system, general-purpose, gpu-x86 (g4dn/g5), gpu-arm (g5g), inference, and gpu-efa (p4d/p5/p6) pools.
Workloads & platform services run across namespaces: gco-system (Health Monitor, Manifest Processor, Queue Processor, Inference Monitor) and gco-jobs / gco-inference (training and batch jobs, inference endpoints, and job DAG pipelines).
Storage & data services back the workloads: Amazon EFS (shared RWX), optional FSx for Lustre (HPC), optional Valkey cache, optional Aurora pgvector (RAG), and Amazon S3 for KMS-encrypted model weights.
An optional Regional API Gateway (IAM auth over a VPC Link) provides direct in-VPC access for private clusters without public ALB exposure.
An internal Network Load Balancer in private subnets fronts in-cluster services for VPC-internal traffic.
Regional AWS services complete the stack: Amazon SQS for the job queue, Amazon DynamoDB for state, and Amazon CloudWatch for metrics and logs.

📊 Full Architecture Diagram

Full Architecture

Regenerate this diagram and every per-stack view on demand with python diagrams/infra_diagrams/generate.py — it synthesises the current CDK app through AWS PDK cdk-graph so the diagrams never drift from the source. See diagrams/infra_diagrams/README.md for per-stack flags (--stack global|api-gateway|regional|regional-api|monitoring|analytics|all). Flowcharts of the code itself (Lambda handlers, CLI commands) live alongside them under diagrams/code_diagrams/.

The regional stack can be deployed to any AWS region. Add or remove regions by editing the deployment_regions.regional array in cdk.json.
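
Editing the region list by hand is straightforward, but a small helper can make the change repeatable. The sketch below is an illustration under one stated assumption: that deployment_regions sits under the standard CDK context key in cdk.json. The README names only the deployment_regions.regional array, so check your own cdk.json before relying on this layout. The function names add_region and remove_region are hypothetical.

```python
import copy


def _regional_list(config: dict) -> list:
    # Assumed layout: {"context": {"deployment_regions": {"regional": [...]}}}
    return (
        config.setdefault("context", {})
        .setdefault("deployment_regions", {})
        .setdefault("regional", [])
    )


def add_region(cdk_config: dict, region: str) -> dict:
    """Return a copy of the config with `region` present in the regional list."""
    updated = copy.deepcopy(cdk_config)
    regions = _regional_list(updated)
    if region not in regions:
        regions.append(region)
    return updated


def remove_region(cdk_config: dict, region: str) -> dict:
    """Return a copy of the config with `region` absent from the regional list."""
    updated = copy.deepcopy(cdk_config)
    regions = _regional_list(updated)
    if region in regions:
        regions.remove(region)
    return updated


if __name__ == "__main__":
    config = {"context": {"deployment_regions": {"regional": ["us-east-1"]}}}
    config = add_region(config, "us-west-2")
    print(config["context"]["deployment_regions"]["regional"])
```

In practice you would load the file with json.load, apply one of these functions, and write it back with json.dump before running gco stacks deploy-all.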

Security Model

GCO Security Architecture and Request Flow

Figure 3: Defense-in-depth — five security layers applied across the request flow

Five layers protect every request:

IAM Authentication — API Gateway validates AWS credentials (SigV4)
Secret Header — Lambda injects a rotating token from Secrets Manager
IP Restriction — ALBs only accept Global Accelerator IPs
Header Validation — Backend services verify the secret token
IRSA — Pods assume IAM roles for AWS access (no static credentials)

Request flow: User → API Gateway (SigV4) → Lambda (adds secret) → Global Accelerator
  → ALB (GA IPs only) → Services (validate secret)
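
To make the header-validation layer concrete, here is a minimal sketch of how a backend might check the secret header. It is not GCO's implementation. The header name x-gco-auth-token is hypothetical, and accepting both the current and the previous token during a rotation window is an assumption, not something this README states.

```python
import hmac

SECRET_HEADER = "x-gco-auth-token"  # hypothetical header name


def is_authorized(headers: dict[str, str], valid_tokens: list[str]) -> bool:
    """Check the secret header against the accepted tokens in constant time."""
    # Header names are case-insensitive, so normalise before the lookup.
    normalised = {key.lower(): value for key, value in headers.items()}
    presented = normalised.get(SECRET_HEADER, "")
    if not presented:
        return False
    # Evaluate every comparison (no early exit) so timing does not reveal
    # which token matched. Assumption: valid_tokens holds the current token
    # and, during rotation, the previous one.
    matches = [
        hmac.compare_digest(presented.encode(), token.encode())
        for token in valid_tokens
    ]
    return any(matches)


if __name__ == "__main__":
    tokens = ["current-token", "previous-token"]
    print(is_authorized({"X-GCO-Auth-Token": "current-token"}, tokens))
    print(is_authorized({"X-GCO-Auth-Token": "guess"}, tokens))
```

hmac.compare_digest is used instead of == because an ordinary string comparison can stop at the first differing character, which leaks information about the secret through response timing.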

For private clusters, Regional API Gateways provide direct VPC access without public ALB exposure.

See Architecture Details for the full deep dive.

Key Features

Compute & Orchestration

EKS Auto Mode with automatic node provisioning — no pre-scaling needed
GPU support for x86_64 (g4dn, g5) and ARM64 (g5g) via Karpenter nodepools
Multiple submission methods: API Gateway, SQS queues, DynamoDB job queue, or direct kubectl
Job pipelines (DAGs): Multi-step ML pipelines with dependency ordering and failure handling
Helm-managed ecosystem: KEDA, Volcano, KubeRay, Kueue, GPU Operator, DRA, and more — configurable via cdk.json
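
The phrase "dependency ordering and failure handling" in the job pipelines item can be illustrated with a short sketch. This is not GCO's pipeline engine; it is a generic topological-order runner built on the standard library's graphlib, with hypothetical step names. It assumes that a step is skipped whenever any of its upstream steps did not succeed, and that every dependency named in depends_on is also a key of steps.

```python
from graphlib import TopologicalSorter
from typing import Callable


def run_pipeline(
    steps: dict[str, Callable[[], bool]],
    depends_on: dict[str, set[str]],
) -> dict[str, str]:
    """Run steps in dependency order; skip steps whose upstream did not succeed."""
    # Map each step to the set of steps that must finish before it.
    graph = {name: depends_on.get(name, set()) for name in steps}
    status: dict[str, str] = {}
    # static_order() yields every step after all of its dependencies.
    for name in TopologicalSorter(graph).static_order():
        if any(status[dep] != "succeeded" for dep in graph[name]):
            status[name] = "skipped"
            continue
        status[name] = "succeeded" if steps[name]() else "failed"
    return status


if __name__ == "__main__":
    steps = {
        "preprocess": lambda: True,
        "train": lambda: False,   # simulate a failing step
        "evaluate": lambda: True,
        "report": lambda: True,
    }
    depends_on = {
        "train": {"preprocess"},
        "evaluate": {"train"},
        "report": {"preprocess"},
    }
    print(run_pipeline(steps, depends_on))
```

Here train fails, so evaluate is skipped, while report still runs because it depends only on preprocess. TopologicalSorter also raises graphlib.CycleError on circular dependencies, which gives cycle detection for free.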

Inference Serving

Multi-region inference: Deploy endpoints (vLLM, TGI, Triton, TorchServe, SGLang) across regions with a single command
Canary deployments: A/B test new model versions with weighted traffic routing
Model weight management: Central S3 bucket with KMS encryption, automatic sync to each region
Spot instance support: Run inference on spot GPUs for significant cost savings
Autoscaling: HPA-based scaling with CPU/memory metrics
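
For readers unfamiliar with HPA-based scaling, the replica calculation documented for the Kubernetes Horizontal Pod Autoscaler can be sketched in a few lines. The formula and the 10% default tolerance follow the Kubernetes documentation; the function name, the example utilization figures, and the replica bounds are assumptions, and GCO's actual HPA settings are not shown in this README.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_utilization: float,
    target_utilization: float,
    tolerance: float = 0.1,
    min_replicas: int = 1,
    max_replicas: int = 10,
) -> int:
    """Apply desired = ceil(current * currentMetric / targetMetric), then clamp."""
    ratio = current_utilization / target_utilization
    # Within the tolerance band the autoscaler leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 3 replicas at 90% CPU against a 60% target: scale out.
    print(desired_replicas(3, 90.0, 60.0))
    # 4 replicas at 30% CPU against a 60% target: scale in.
    print(desired_replicas(4, 30.0, 60.0))
```

Because the inputs are CPU or memory utilization, this kind of scaling reacts to resource pressure on the pods rather than to request queue depth or GPU utilization.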

Networking & Security

Global Accelerator: Single anycast endpoint with automatic failover
IAM authentication: SigV4 at the API Gateway — no kubeconfig distribution
Compliance validated: CDK-nag checks for AWS Solutions, HIPAA, NIST 800-53, PCI DSS
Network policies: Default-deny with explicit allow rules for all service communication
EFA support: Optional Elastic Fabric Adapter for high-bandwidth distributed training and NIXL-based inference (toggle on/off)

Storage & Data

EFS: Shared elastic storage for job outputs that persist after pod termination
FSx for Lustre: Optional high-performance parallel file system for ML training (toggle on/off)
Valkey cache: Optional serverless key-value cache for prompt caching and session state
Aurora pgvector: Optional serverless vector database for RAG, semantic search, and embedding storage

Operations

Cost visibility: Track spend by service, region, and workload via Cost Explorer integration
Auto-bootstrap: CDK bootstrap runs automatically for new regions during deploy
Multi-region monitoring: CloudWatch dashboards, alarms, and SNS alerts across all regions

ML & Analytics Environment

Optional SageMaker Studio domain, EMR Serverless, and Cognito user pool for interactive notebook analytics, with an always-on Cluster_Shared_Bucket that all cluster jobs can read and write. Off by default; enable with gco analytics enable. See Analytics Guide.

Mission

Goal-directed iteration loop for orchestrated workflows. The operator declares a natural-language directive plus machine-checkable success criteria, a tool allowlist, and a budget; Mission runs five-phase iterations (propose → execute → observe → evaluate → decide) until a verdict is reached. Off by default — enable with GCO_ENABLE_MISSION=true. See Mission Guide.

Deterministic verdict cascade with optional advisory LLM sampling (MCP host or Amazon Bedrock). Sampling shapes only the next strategy; it never moves the verdict.
Budget caps on iterations and wall clock — the engine terminates cleanly when any cap fires. Cost guardrails live out-of-band via AWS Budgets and Cost Anomaly Detection at the account level.
Scripted strategies opt-in: an AST-validated Python sandbox with bounded duration and memory limits.
CLI + MCP surface: ten gco mission subcommands (including the chained gco mission run that scaffolds criteria and drives a session to completion in one call) and matching MCP tools, plus three mission://sessions/{id} resource templates.
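
The five-phase loop and its budget caps can be pictured with a small sketch. This is not the Mission engine: the function names, the Budget type, and the verdict strings "success" and "budget_exhausted" are all hypothetical, and the advisory LLM sampling step is left out. It only shows how a deterministic criteria check and iteration and wall-clock caps fit together.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Budget:
    max_iterations: int
    max_seconds: float


def run_mission(
    propose: Callable[[int], str],
    execute: Callable[[str], dict],
    criteria: list[Callable[[dict], bool]],
    budget: Budget,
) -> str:
    """Iterate until every criterion passes or a budget cap fires."""
    started = time.monotonic()
    for iteration in range(1, budget.max_iterations + 1):
        # Wall-clock cap: stop cleanly before starting another iteration.
        if time.monotonic() - started > budget.max_seconds:
            return "budget_exhausted"
        strategy = propose(iteration)      # propose
        result = execute(strategy)         # execute
        observation = dict(result)         # observe: snapshot what happened
        passed = all(check(observation) for check in criteria)  # evaluate
        if passed:                         # decide
            return "success"
    # Iteration cap reached without meeting the criteria.
    return "budget_exhausted"


if __name__ == "__main__":
    verdict = run_mission(
        propose=lambda i: f"attempt-{i}",
        execute=lambda strategy: {"score": int(strategy.split("-")[1])},
        criteria=[lambda obs: obs["score"] >= 3],
        budget=Budget(max_iterations=5, max_seconds=60.0),
    )
    print(verdict)
```

The verdict depends only on the machine-checkable criteria, which matches the statement above that sampling can shape the next strategy but never moves the verdict.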

Documentation

New to GCO? Start here:

Your Goal	Read This
Understand what GCO does	Core Concepts
Get running in under 60 minutes	Quick Start Guide
Learn the architecture	Architecture Details
Browse every guide in one place	Documentation Index

Day-to-day operations:

Your Goal	Read This
CLI commands and usage	CLI Reference
Deploy inference endpoints	Inference Guide
Use the REST API directly	API Reference
Fix issues	Troubleshooting
Respond to incidents	Operational Runbooks
Run interactive notebook analytics	Analytics Guide
Drive a goal-directed iteration loop	Mission Guide

Customization and development:

Your Goal	Read This
Add regions, tune nodepools, enable FSx	Customization Guide
Choose a scheduler for your workload	Schedulers & Orchestrators
Configure the SQS queue processor	Queue Processor Config
Contribute to the project	Contributing
API client examples (Python, curl, AWS CLI)	Client Examples
IAM policy templates	IAM Policies
Presentation slides and demo scripts	Demo Starter Kit

Prerequisites

Recommended path — dev container only:

AWS CLI configured with appropriate credentials (or ~/.aws to mount in)
Docker (or Finch / Colima) — that's it. The container ships Python 3.14, Node.js 24, CDK, kubectl, and AWS CLI at pinned versions.

docker build -f Dockerfile.dev -t gco-dev .
docker run -it --rm -v ~/.aws:/root/.aws:ro -v "$(pwd)":/workspace -w /workspace gco-dev

For gco stacks deploy-all, cdk deploy needs to run Docker to bundle Lambda assets. Mount the host Docker socket so the container's CLI talks to your host daemon (works with Docker Desktop on macOS/Windows, with Docker on Linux, and with Colima on macOS — see Dockerfile.dev for Colima-specific socket paths):

docker run --rm -it \
  -v ~/.aws:/root/.aws:ro \
  -v "$(pwd)":/workspace \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -w /workspace \
  gco-dev gco stacks deploy-all -y

This is host-socket pass-through, not true Docker-in-Docker. Anyone with access to the container has root-equivalent access to the host Docker daemon, so keep the container on a trusted host.

Host install path (advanced):

AWS CLI configured with appropriate credentials
Python 3.14+ and Node.js LTS (v24)
AWS CDK CLI (npm install -g aws-cdk)
Docker or Finch (for building container images)
A clean Python virtual environment or pipx — GCO pins exact versions of many packages, so installing it into an existing environment will commonly fail with dependency-resolver errors. If you hit ResolutionImpossible, switch to the dev container instead of debugging your local env.

Project Structure

.
├── app.py                               # CDK app entry point
├── cdk.json                             # CDK configuration (regions, features, thresholds)
├── pyproject.toml                       # Project metadata, dependencies, and CLI installation
│
├── cli/                                 # GCO CLI (jobs, stacks, capacity, inference, costs, DAGs)
├── diagrams/                            # Auto-generated architecture diagrams (infra_diagrams/) and code flowcharts (code_diagrams/)
├── docs/                                # Documentation (architecture, CLI, API, inference, customization, analytics)
├── examples/                            # Example manifests (jobs, inference, Ray, Volcano, Kueue, Slurm, YuniKorn)
├── gco/
│   ├── config/                          # Configuration loader with validation
│   ├── models/                          # Data models for k8s clusters, health monitor, inference monitor and manifest processor
│   ├── services/                        # K8s services (health monitor, inference monitor, manifest processor, queue processor)
│   └── stacks/                          # CDK stacks (global, regional, API gateway, monitoring)
│       └── constants.py                 # Pinned versions: EKS addons, Lambda runtime, Aurora engine
│
├── lambda/                              # Lambda functions
│   ├── alb-header-validator/            # ALB header validation for auth tokens
│   ├── analytics-cleanup/               # Custom resource that deletes Studio user profiles + EFS access points on stack destroy
│   ├── analytics-presigned-url/         # Generates presigned SageMaker Studio URLs for Cognito-authenticated users
│   ├── api-gateway-proxy/               # API Gateway → Global Accelerator proxy
│   ├── cross-region-aggregator/         # Cross-region job/health aggregation
│   ├── drift-detection/                 # Scheduled drift checks against deployed CDK stacks
│   ├── ga-registration/                 # Global Accelerator endpoint registration
│   ├── helm-installer/                  # Installs Helm charts (schedulers, GPU operators, cert-manager)
│   │   └── charts.yaml                  # Helm chart configuration (schedulers, GPU operators, cert-manager)
│   ├── image-lookup/                    # Adopt-or-create custom resource for the project's gco/* ECR repositories
│   ├── kubectl-applier-simple/          # Applies K8s manifests during deployment
│   │   └── manifests/                   # Kubernetes manifests (nodepools, RBAC, services, storage)
│   ├── proxy-shared/                    # Shared utilities for proxy Lambdas
│   ├── regional-api-proxy/              # Regional API Gateway → internal ALB proxy
│   └── secret-rotation/                 # Daily secret rotation
│
├── mcp/                                 # MCP server for LLM interaction (95 tools default, up to 127 with feature flags)
├── scripts/                             # Utility scripts (version bump, cluster access setup)
└── tests/                               # PyTest + BATS test suites (counts tracked via badges)

Contributing

See CONTRIBUTING.md for development setup, testing, the GitHub Actions CI/CD layout, release process, and dependency scanning schedules.

Quick start for contributors (dev container — recommended):

docker build -f Dockerfile.dev -t gco-dev .
docker run --rm -v "$(pwd)":/workspace -w /workspace gco-dev pytest tests/ -v --cov=gco --cov=cli --cov=mcp

Or, in a clean virtual environment on your host:

python3 -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest tests/ -v --cov=gco --cov=cli --cov=mcp

If pip install -e ".[dev]" fails with dependency-resolver errors, that's the pinned-versions issue mentioned in Prerequisites. Use the dev container instead — it ships everything at the exact versions CI uses.

License

See the LICENSE file for details.

Support

Check Troubleshooting for common issues
Review CloudWatch logs for Lambda and EKS errors
Open an issue on GitHub

Security

For security issues, do not open a public GitHub issue. See .github/SECURITY.md for the disclosure process.

Global Capacity Orchestrator (GCO)

One API. Every Accelerator. Any Region.

🎬 Live demo recording

GCO Live Demo

📦 Deploy recording

GCO Deploy

Fresh gco stacks deploy-all -y from a clean account (re-record)

🗑️ Destroy recording

GCO Destroy

Full teardown with gco stacks destroy-all -y (re-record)

Deploy everything and tear it all down with one command each:

hljs language-bash

gco stacks deploy-all -y      # stand up every region defined in cdk.json
gco stacks destroy-all -y     # destroy every stack across every region — no orphaned resources

hljs language-bash

git clone git@github.com:awslabs/global-capacity-orchestrator-on-aws.git
cd global-capacity-orchestrator-on-aws

docker build -f Dockerfile.dev -t gco-dev .
docker run -it --rm \
  -v ~/.aws:/root/.aws:ro \
  -v $(pwd):/workspace \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -w /workspace \
  gco-dev

Prefer to install on your host? (advanced — the dev container is recommended)

hljs language-bash

git clone git@github.com:awslabs/global-capacity-orchestrator-on-aws.git
cd global-capacity-orchestrator-on-aws && pipx install -e .

See the Quick Start for the full install + first-job walkthrough, or docs/CLI.md for every CLI command.

💡 New to the codebase? GCO ships with the GCO MCP server — an MCP server exposing 95 tools by default (up to 127 with feature flags) that index the whole project: docs, examples, source code, K8s manifests, and scripts. Connect it to an AI-powered IDE with MCP support (like Kiro) and explore GCO conversationally — ask questions about the codebase instead of reading repository files directly: "How does region recommendation work?", "Walk me through the inference deployment flow". See mcp/README.md.

Table of contents

Why GCO?
Quick Start
Architecture Overview
Key Features
Documentation
Project Structure
Contributing
License
Support
Security

Why GCO?

Challenge	Traditional Approach	With GCO
GPU availability	Manually check each region	Auto-routes to available capacity
Node provisioning	Pre-provision or wait for scaling	EKS Auto Mode provisions on-demand
Multi-region ops	Manage clusters separately	Single API, automatic routing
Authentication	Configure per-cluster access	IAM-based, uses existing AWS credentials
Job outputs	Lost when pods terminate	Persisted to EFS/FSx storage
Inference serving	Deploy and manage per-region	Deploy once, serve globally
Failover	Manual intervention required	Automatic via Global Accelerator

When to use GCO:

You need to run GPU workloads (training, inference, batch processing)
You want to deploy inference endpoints across multiple regions with a single command
You want multi-region redundancy without managing multiple clusters
You prefer IAM authentication over kubeconfig management
You need job outputs to persist after completion

Quick Start

Install and Deploy

The fastest, most reliable path is the dev container — it sidesteps the dependency-conflict issues that come with installing GCO's pinned Python packages on top of your existing Python environment.

Build the dev container (Python, Node.js, CDK, kubectl, and the AWS CLI are all pinned and pre-installed), then drop into a shell with the gco CLI already on the path:

hljs language-bash

docker build -f Dockerfile.dev -t gco-dev .
docker run -it --rm \
  -v ~/.aws:/root/.aws:ro \
  -v $(pwd):/workspace \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -w /workspace \
  gco-dev

From inside the container, deploy everything — CDK bootstrap runs automatically for every region defined in cdk.json:

hljs language-bash

gco stacks deploy-all -y

If you'd rather install on your host, use a clean virtual environment or pipx — see the Prerequisites and QUICKSTART.md for the details and known caveats.

Optional: configure kubectl access (requires PUBLIC_AND_PRIVATE endpoint mode). The default endpoint mode is PRIVATE — see docs/CUSTOMIZATION.md for details. Most users don't need this; submit jobs via SQS or API Gateway instead.

Submit Your First Job

Check GPU capacity in a region before you submit:

hljs language-bash

gco capacity check --instance-type g4dn.xlarge --region us-east-1

Submit a job using whichever path fits your setup — via SQS (recommended), via the global DynamoDB queue, via API Gateway, or directly through kubectl:

hljs language-bash

gco jobs submit-sqs examples/simple-job.yaml --region us-east-1
gco queue submit examples/simple-job.yaml --region us-east-1
gco jobs submit examples/simple-job.yaml -n gco-jobs
gco jobs submit-direct examples/simple-job.yaml -r us-east-1

Check status and pull logs:

hljs language-bash

gco jobs list --all-regions
gco jobs logs hello-gco -n gco-jobs -r us-east-1

Deploy an Inference Endpoint

hljs language-bash

gco inference deploy my-llm -i vllm/vllm-openai:v0.22.0 --gpu-count 1
gco inference status my-llm
gco inference scale my-llm --replicas 3

See the Quick Start Guide for the full step-by-step walkthrough, or the CLI Reference for all available commands.

Architecture Overview

Figure 1: Global Capacity Orchestrator — multi-region control plane and regional EKS data planes

Multi-Region Reference Architecture workflow

DevOps / Platform engineers own the deployment. They configure the platform through cdk.json and drive everything from the gco CLI.
The AWS CDK app synthesises and deploys the GCO stacks with a single gco stacks deploy-all, provisioning the global control plane and one regional stack per target region.
Users submit jobs and inference requests through the gco CLI, which signs every call with AWS SigV4 credentials.
Amazon API Gateway (edge-optimized) is the global entry point. It enforces IAM (SigV4) authentication on every request before anything reaches the backend.
An AWS Lambda proxy injects a rotating secret header sourced from AWS Secrets Manager, adding a second authentication factor in front of the regional load balancers.
AWS Global Accelerator routes each request over the AWS backbone via anycast IPs to the nearest healthy region, providing automatic cross-region failover.
A regional AWS Application Load Balancer receives Global Accelerator traffic and forwards it into the cluster. ALBs accept only Global Accelerator IPs.
Each region runs an Amazon EKS cluster (EKS Auto Mode optional) with Karpenter GPU / Trainium / Inferentia / CPU node pools plus the GCO platform services — Health Monitor, Manifest Processor, Queue Processor, and Inference endpoints.

Below is the per-region view showing how a single regional stack is composed.

Figure 2: Regional stack — EKS cluster, Karpenter node pools, platform services, and regional AWS services

Regional Architecture workflow

A public-subnet Application Load Balancer accepts inbound traffic restricted to Global Accelerator IPs only.
The Amazon EKS cluster is the heart of the regional stack, hosting both platform services and user workloads.
Karpenter node pools provision capacity on demand across system, general-purpose, gpu-x86 (g4dn/g5), gpu-arm (g5g), inference, and gpu-efa (p4d/p5/p6) pools.
Workloads & platform services run across namespaces: gco-system (Health Monitor, Manifest Processor, Queue Processor, Inference Monitor) and gco-jobs / gco-inference (training and batch jobs, inference endpoints, and job DAG pipelines).
Storage & data services back the workloads: Amazon EFS (shared RWX), optional FSx for Lustre (HPC), optional Valkey cache, optional Aurora pgvector (RAG), and Amazon S3 for KMS-encrypted model weights.
An optional Regional API Gateway (IAM auth over a VPC Link) provides direct in-VPC access for private clusters without public ALB exposure.
An internal Network Load Balancer in private subnets fronts in-cluster services for VPC-internal traffic.
Regional AWS services complete the stack: Amazon SQS for the job queue, Amazon DynamoDB for state, and Amazon CloudWatch for metrics and logs.

📊 Full Architecture Diagram (click to expand)

Full Architecture

The regional stack can be deployed to any AWS region. Add or remove regions by editing the deployment_regions.regional array in cdk.json.

Security Model

Figure 3: Defense-in-depth — five security layers applied across the request flow

Five layers protect every request:

IAM Authentication — API Gateway validates AWS credentials (SigV4)
Secret Header — Lambda injects a rotating token from Secrets Manager
IP Restriction — ALBs only accept Global Accelerator IPs
Header Validation — Backend services verify the secret token
IRSA — Pods assume IAM roles for AWS access (no static credentials)

hljs language-text

Request flow: User → API Gateway (SigV4) → Lambda (adds secret) → Global Accelerator
  → ALB (GA IPs only) → Services (validate secret)

For private clusters, Regional API Gateways provide direct VPC access without public ALB exposure.

See Architecture Details for the full deep dive.

Key Features

Compute & Orchestration

EKS Auto Mode with automatic node provisioning — no pre-scaling needed
GPU support for x86_64 (g4dn, g5) and ARM64 (g5g) via Karpenter nodepools
Multiple submission methods: API Gateway, SQS queues, DynamoDB job queue, or direct kubectl
Job pipelines (DAGs): Multi-step ML pipelines with dependency ordering and failure handling
Helm-managed ecosystem: KEDA, Volcano, KubeRay, Kueue, GPU Operator, DRA, and more — configurable via cdk.json

Inference Serving

Multi-region inference: Deploy endpoints (vLLM, TGI, Triton, TorchServe, SGLang) across regions with a single command
Canary deployments: A/B test new model versions with weighted traffic routing
Model weight management: Central S3 bucket with KMS encryption, automatic sync to each region
Spot instance support: Run inference on spot GPUs for significant cost savings
Autoscaling: HPA-based scaling with CPU/memory metrics
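
Weighted traffic routing for a canary can be pictured as a weighted random choice per request. The sketch below is a simplified model under stated assumptions: pick_variant and the 90/10 split are hypothetical, and it does not describe how GCO or the underlying load balancers actually distribute traffic.

```python
import random

def pick_variant(weights: dict[str, float], rng: random.Random) -> str:
    """Choose a variant with probability proportional to its weight."""
    if not weights or any(w < 0 for w in weights.values()) or sum(weights.values()) <= 0:
        raise ValueError("weights must be non-negative with a positive total")
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]

rng = random.Random(0)  # seeded only so the demo is repeatable
weights = {"stable": 90, "canary": 10}  # hypothetical 90/10 split
draws = [pick_variant(weights, rng) for _ in range(10_000)]
canary_share = draws.count("canary") / len(draws)
print(f"canary share: {canary_share:.3f}")
```

Raising the canary weight in steps while watching error rates is the usual way to promote a new model version under this scheme.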

Networking & Security

Global Accelerator: Single anycast endpoint with automatic failover
IAM authentication: SigV4 at the API Gateway — no kubeconfig distribution
Compliance validated: CDK-nag checks for AWS Solutions, HIPAA, NIST 800-53, PCI DSS
Network policies: Default-deny with explicit allow rules for all service communication
EFA support: Optional Elastic Fabric Adapter for high-bandwidth distributed training and NIXL-based inference (toggle on/off)

Storage & Data

EFS: Shared elastic storage for job outputs that persist after pod termination
FSx for Lustre: Optional high-performance parallel file system for ML training (toggle on/off)
Valkey cache: Optional serverless key-value cache for prompt caching and session state
Aurora pgvector: Optional serverless vector database for RAG, semantic search, and embedding storage

Operations

Cost visibility: Track spend by service, region, and workload via Cost Explorer integration
Auto-bootstrap: CDK bootstrap runs automatically for new regions during deploy
Multi-region monitoring: CloudWatch dashboards, alarms, and SNS alerts across all regions

ML & Analytics Environment

Optional SageMaker Studio domain, EMR Serverless application, and Cognito user pool for interactive notebook analytics, with an always-on Cluster_Shared_Bucket that all cluster jobs can read and write. Off by default; enable with gco analytics enable. See Analytics Guide.

Mission

Deterministic verdict cascade with optional advisory LLM sampling (MCP host or Amazon Bedrock). Sampling shapes only the next strategy; it never moves the verdict.
Budget caps on iterations and wall clock — the engine terminates cleanly when any cap fires. Cost guardrails live out-of-band via AWS Budgets and Cost Anomaly Detection at the account level.
Opt-in scripted strategies: an AST-validated Python sandbox with duration and memory limits.
CLI + MCP surface: ten gco mission subcommands (including the chained gco mission run that scaffolds criteria and drives a session to completion in one call) and matching MCP tools, plus three mission://sessions/{id} resource templates.
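
The budget caps on iterations and wall clock can be sketched as a loop that checks both limits before every step. This is a hedged illustration, not the Mission engine: run_with_caps, its return values, and the injectable clock are all hypothetical names chosen for the example.

```python
import time
from typing import Callable

def run_with_caps(
    step: Callable[[int], bool],
    max_iterations: int,
    max_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> tuple[int, str]:
    """Call step(i) until it returns True or a cap fires.

    Returns (iterations completed, reason for stopping).
    """
    start = clock()
    iterations = 0
    while True:
        # Check caps before starting a step so neither limit is overshot
        # by more than one step's duration.
        if iterations >= max_iterations:
            return iterations, "iteration_cap"
        if clock() - start >= max_seconds:
            return iterations, "wall_clock_cap"
        if step(iterations):
            return iterations + 1, "done"
        iterations += 1

# A step that never reports success, so the iteration cap ends the loop.
print(run_with_caps(lambda i: False, max_iterations=3, max_seconds=60.0))
```

Using a monotonic clock means the wall-clock cap is unaffected by system time adjustments, and passing the clock as a parameter makes the cap easy to test.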

Documentation

New to GCO? Start here:

Your Goal	Read This
Understand what GCO does	Core Concepts
Get running in under 60 minutes	Quick Start Guide
Learn the architecture	Architecture Details
Browse every guide in one place	Documentation Index

Day-to-day operations:

Your Goal	Read This
CLI commands and usage	CLI Reference
Deploy inference endpoints	Inference Guide
Use the REST API directly	API Reference
Fix issues	Troubleshooting
Respond to incidents	Operational Runbooks
Run interactive notebook analytics	Analytics Guide
Drive a goal-directed iteration loop	Mission Guide

Customization and development:

Your Goal	Read This
Add regions, tune nodepools, enable FSx	Customization Guide
Choose a scheduler for your workload	Schedulers & Orchestrators
Configure the SQS queue processor	Queue Processor Config
Contribute to the project	Contributing
API client examples (Python, curl, AWS CLI)	Client Examples
IAM policy templates	IAM Policies
Presentation slides and demo scripts	Demo Starter Kit

Prerequisites

Recommended path — dev container only:

AWS CLI configured with appropriate credentials (or ~/.aws to mount in)
Docker (or Finch / Colima) — that's it. The container ships Python 3.14, Node.js 24, CDK, kubectl, and AWS CLI at pinned versions.

docker build -f Dockerfile.dev -t gco-dev .
docker run -it --rm -v ~/.aws:/root/.aws:ro -v "$(pwd)":/workspace -w /workspace gco-dev

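To deploy from inside the container, also mount the host Docker socket so that image builds run against the host Docker daemon:
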
docker run --rm -it \
  -v ~/.aws:/root/.aws:ro \
  -v "$(pwd)":/workspace \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -w /workspace \
  gco-dev gco stacks deploy-all -y

This is host-socket pass-through, not true Docker-in-Docker. Anyone with access to the container has root-equivalent access to the host Docker daemon, so keep the container on a trusted host.

Host install path (advanced):

AWS CLI configured with appropriate credentials
Python 3.14+ and Node.js LTS (v24)
AWS CDK CLI (npm install -g aws-cdk)
Docker or Finch (for building container images)
A clean Python virtual environment or pipx — GCO pins exact versions of many packages, so installing it into an existing environment will commonly fail with dependency-resolver errors. If you hit ResolutionImpossible, switch to the dev container instead of debugging your local env.

Project Structure

.
├── app.py                               # CDK app entry point
├── cdk.json                             # CDK configuration (regions, features, thresholds)
├── pyproject.toml                       # Project metadata, dependencies, and CLI installation
│
├── cli/                                 # GCO CLI (jobs, stacks, capacity, inference, costs, DAGs)
├── diagrams/                            # Auto-generated architecture diagrams (infra_diagrams/) and code flowcharts (code_diagrams/)
├── docs/                                # Documentation (architecture, CLI, API, inference, customization, analytics)
├── examples/                            # Example manifests (jobs, inference, Ray, Volcano, Kueue, Slurm, YuniKorn)
├── gco/
│   ├── config/                          # Configuration loader with validation
│   ├── models/                          # Data models for k8s clusters, health monitor, inference monitor and manifest processor
│   ├── services/                        # K8s services (health monitor, inference monitor, manifest processor, queue processor)
│   └── stacks/                          # CDK stacks (global, regional, API gateway, monitoring)
│       └── constants.py                 # Pinned versions: EKS addons, Lambda runtime, Aurora engine
│
├── lambda/                              # Lambda functions
│   ├── alb-header-validator/            # ALB header validation for auth tokens
│   ├── analytics-cleanup/               # Custom resource that deletes Studio user profiles + EFS access points on stack destroy
│   ├── analytics-presigned-url/         # Generates presigned SageMaker Studio URLs for Cognito-authenticated users
│   ├── api-gateway-proxy/               # API Gateway → Global Accelerator proxy
│   ├── cross-region-aggregator/         # Cross-region job/health aggregation
│   ├── drift-detection/                 # Scheduled drift checks against deployed CDK stacks
│   ├── ga-registration/                 # Global Accelerator endpoint registration
│   ├── helm-installer/                  # Installs Helm charts (schedulers, GPU operators, cert-manager)
│   │   └── charts.yaml                  # Helm chart configuration (schedulers, GPU operators, cert-manager)
│   ├── image-lookup/                    # Adopt-or-create custom resource for the project's gco/* ECR repositories
│   ├── kubectl-applier-simple/          # Applies K8s manifests during deployment
│   │   └── manifests/                   # Kubernetes manifests (nodepools, RBAC, services, storage)
│   ├── proxy-shared/                    # Shared utilities for proxy Lambdas
│   ├── regional-api-proxy/              # Regional API Gateway → internal ALB proxy
│   └── secret-rotation/                 # Daily secret rotation
│
├── mcp/                                 # MCP server for LLM interaction (95 tools default, up to 127 with feature flags)
├── scripts/                             # Utility scripts (version bump, cluster access setup)
└── tests/                               # pytest + BATS test suites (counts tracked via badges)

Contributing

See CONTRIBUTING.md for development setup, testing, the GitHub Actions CI/CD layout, release process, and dependency scanning schedules.

Quick start for contributors (dev container — recommended):

docker build -f Dockerfile.dev -t gco-dev .
docker run --rm -v "$(pwd)":/workspace -w /workspace gco-dev pytest tests/ -v --cov=gco --cov=cli --cov=mcp

Or, in a clean virtual environment on your host:

python3 -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest tests/ -v --cov=gco --cov=cli --cov=mcp

If pip install -e ".[dev]" fails with dependency-resolver errors, that's the pinned-versions issue mentioned in Prerequisites. Use the dev container instead — it ships everything at the exact versions CI uses.

License

See the LICENSE file for details.

Support

Check Troubleshooting for common issues
Review CloudWatch logs for Lambda and EKS errors
Open an issue on GitHub

Security

For security issues, do not open a public GitHub issue. See .github/SECURITY.md for the disclosure process.
