[!IMPORTANT]
🎉 AssetOpsBench is officially accepted at KDD 2026 (Datasets & Benchmarks Track), Jeju, South Korea, alongside our hands-on tutorial Building Reliable Industrial Agents with MCP. See Publications for the full list of 2025–2026 work.
At a Glance
9 Asset classes
141+ Scenarios
5 Domain agents
2 Orchestration frameworks
20+ University extensions
500+ Competition submissions
Built for: maintenance engineers, reliability specialists, facility planners, and Industry 4.0 researchers.
Powered by: LLMs + Time Series Foundation Models, orchestrated over live sensor data and Industry 4.0 records (FMEA, work orders, alerts).
Now with: simplified interface and native MCP (Model Context Protocol) support.
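For readers new to MCP, the sketch below shows the general shape of a tool invocation: a JSON-RPC 2.0 message with the method `tools/call`. The tool name `list_sensors` and its argument keys are hypothetical and chosen only for illustration; the tools actually exposed by the AssetOpsBench servers may be named and parameterized differently.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool name and argument keys, for illustration only.
request = build_tool_call(1, "list_sensors", {"site": "MAIN", "asset": "Chiller 6"})
print(request)
```

In practice an MCP client library would build and send this message; the sketch only makes the wire format visible.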
Quick Start
```bash
# Clone and install
git clone https://github.com/IBM/AssetOpsBench.git
cd AssetOpsBench
pip install -e .

# Try a scenario (to be enabled)
python -m assetopsbench.run --scenario "List all sensors of Chiller 6 in MAIN site"
```
Or jump in instantly:
🚀 Run on Colab — no install required (illustration of LLM Agent)
[!NOTE]
Active development is on main. Code used for specific publications continues to be maintained on separate branches; for example, ACL 2026 IndustryAssetEQA and prior experimental work are maintained on main-0.x.
What is AssetOpsBench?
AssetOpsBench is a unified framework for developing, orchestrating, and evaluating domain-specific AI agents in industrial asset operations and maintenance. It provides reproducible scenarios, agent tooling, and evaluation pipelines for multi-step workflows in simulated industrial environments.
"Identify failure modes detected by Chiller 6 Supply Temperature"
TSFM
"Forecast Chiller 9 Condenser Water Flow for the week of 2020-04-27"
WO
"Generate a work order for Chiller 6 anomaly detection"
Some tasks focus on a single domain, others are multi-step end-to-end workflows. Explore all scenarios on Hugging Face.
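One way to picture the scenario set is as a list of natural-language tasks, each tagged with the domain agent it targets. The sketch below groups the two labelled examples above by tag. The `Scenario` class and `group_by_agent` helper are hypothetical names introduced here, not part of the AssetOpsBench package, and the published dataset may use a different schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    agent: str  # domain-agent tag, e.g. "TSFM" or "WO"
    text: str   # natural-language task given to the agent


# Two of the example tasks quoted above, with the tags shown next to them.
SCENARIOS = [
    Scenario("TSFM", "Forecast Chiller 9 Condenser Water Flow for the week of 2020-04-27"),
    Scenario("WO", "Generate a work order for Chiller 6 anomaly detection"),
]


def group_by_agent(scenarios):
    """Return a dict mapping each agent tag to the list of its task texts."""
    grouped = defaultdict(list)
    for scenario in scenarios:
        grouped[scenario.agent].append(scenario.text)
    return dict(grouped)


by_agent = group_by_agent(SCENARIOS)
for agent, tasks in by_agent.items():
    print(f"{agent}: {len(tasks)} scenario(s)")
```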
Leaderboards
To be revised (WIP with latest models)
Evaluated with 7 Large Language Models
Trajectories scored using LLM Judge (Llama-4-Maverick-17B)
6-dimensional criteria measuring reasoning, execution, and data handling
Example: MetaAgent leaderboard
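As a rough illustration of how per-trajectory judge verdicts could be turned into leaderboard numbers, the sketch below computes a pass rate per criterion and per model. The criterion names are placeholders (the text above only states that there are six), the model name is made up, and the real evaluation pipeline may aggregate differently.

```python
from collections import defaultdict

# Placeholder names: the source states six criteria but does not list them here.
CRITERIA = [f"criterion_{i}" for i in range(1, 7)]


def pass_rates(judgments):
    """Aggregate (model, {criterion: bool}) verdicts into per-model pass rates."""
    verdicts = defaultdict(lambda: defaultdict(list))
    for model, verdict in judgments:
        for criterion in CRITERIA:
            verdicts[model][criterion].append(bool(verdict[criterion]))
    return {
        model: {c: sum(v) / len(v) for c, v in per_criterion.items()}
        for model, per_criterion in verdicts.items()
    }


# Two hypothetical judged trajectories for one hypothetical model.
judgments = [
    ("model_a", {c: True for c in CRITERIA}),
    ("model_a", {**{c: True for c in CRITERIA}, "criterion_3": False}),
]
rates = pass_rates(judgments)
print(rates["model_a"]["criterion_3"])  # 0.5
```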
Publications
12+ contributions across 7 top venues in 2025–2026 from the team behind AssetOpsBench.
⭐ KDD 2026 - Jeju, South Korea
[D&B] AssetOpsBench: A Benchmark for Industrial Asset Operations Agents · D. Patel, S. Lin, et al. · 📄 Paper
[Tutorial] Building Reliable Industrial Agents with MCP: A Hands-on AssetOpsBench Tutorial for AI-Driven Operations · D. Patel, C. Shyalika, et al.
ACL 2026 - San Diego, USA
[Industry] IndustryAssetEQA: A Neurosymbolic Operational Intelligence System for Embodied Question Answering in Industrial Asset Maintenance · C. Shyalika, D. Patel, A. Sheth
ICLR 2026 - Brazil
[Main] Adaptive Conformal Anomaly Detection with Time Series Foundation Models for Signal Monitoring · N. Martinez, F. O'Donncha, W. M. Gifford, N. Zhou, D. C. Patel, R. Vaculin
AAAI 2026 — Singapore
[Demo] AssetOpsBench-Live: Privacy-Aware Online Evaluation of Multi-Agent Performance in Industrial Operations · D. Patel, N. Zhou, S. Lin, J. T. Rayfield, C. Shyalika, S. R. Yarrabothula · 🎥 Demo
[Main] SPIRAL: Symbolic LLM Planning via Grounded and Reflective Search · Y. Zhang, G. Ganapavarapu, S. Jayaraman, B. Agrawal, D. Patel, A. Fokoue · 💻 Code
[Bridge] Knowledge-Guided AI for Industrial Asset Health Monitoring · S. Lin, D. Patel
[Tutorial] From Inception to Productization: Hands-on Lab for the Lifecycle of Multimodal Agentic AI in Industry 4.0 · C. Shyalika, S. Ahuja, S. Lin, R. Wickramarachchi, D. Patel, A. Sheth · 🌐 Website · 📊 Slides
[Workshop (AABA4ET)] Agentic Code Generation for Heuristic Rules in Equipment Monitoring · F. Lorenzi, A. Langbridge, F. O'Donncha, J. Rayfield, B. Eck, S. Rosato
IAAI 2026 - Singapore
[Deployed] Deployed AI Agents for Industrial Asset Management: CodeReAct Framework for Event Analysis and Work Order Automation · N. Zhou, D. Patel, A. Bhattacharyya
[Emerging] Diversity Meets Relevancy: Multi-Agent Knowledge Probing for Industry 4.0 Applications · C. Constantinides, D. Patel, S. Kimbleton, N. Garg, M. Paracha
NeurIPS 2025 — San Diego, USA
[D&B Track] FailureSensorIQ: A Multi-Choice QA Dataset for Understanding Sensor Relationships and Failure Modes · C. Constantinides, D. Patel, S. Lin, C. Guerrero, S. D. Patil, J. Kalagnanam · 📄 arXiv · 💻 Code
[Social] Building Reliable Agentic Benchmarks: Insights from AssetOpsBench (invited talk, 2000+ registered) · D. Patel · 📅 Luma
EMNLP 2025 — Suzhou, China
[Main] ReAct Meets Industrial IoT: Language Agents for Data Access · J. T. Rayfield, S. Lin, N. Zhou, D. C. Patel
[Main] Generalized Embedding Models for Industry 4.0 Applications · C. Constantinides, S. Lin, D. C. Patel · 📄 arXiv
[Findings] Fine-Tuned Thoughts: Leveraging Chain-of-Thought Reasoning for Industrial Asset Health Monitoring · S. Lin, D. Patel, C. Constantinides · 📄 ACL Anthology · 💻 Code
AssetOpsBench v1.0 released — 141 industrial scenarios
University Projects & Extensions
AssetOpsBench is being extended by university research groups exploring new asset classes, evaluation paradigms, and agentic architectures. To list your project, open a PR.
Internalizing MCP Tool Knowledge in Small LLMs via QLoRA Fine-Tuning — HPML project using AssetOpsBench to fine-tune ~4B models to internalize MCP tool knowledge and reduce prompt schema overhead. Ayal Yakobe, Columbia University · repo
SPIN - Structural LLM Planning via Iterative Navigation for Industrial Tasks. Yusuke Ozaki, University at Albany · paper · repo
Synthetic Scenario Generation for Evaluation of Industry 4.0 Agents — Automated scenario generation, transformer asset integration, and scenario quality evaluation. Rohith Kanathur, Sagar Chethan Kumar, Columbia University · repo
Skill-Knowledge-Augmented Agents on AssetOpsBench — Confidence-gated skill execution with scoped knowledge plugins for industrial fault diagnosis. Vera Mazeeva, Sanskruti Shejwal, Shrey Arora, Mana Abbaszadeh, Columbia University · repo
Towards Multi-Turn Dialog Systems for Industrial Asset Operations and Maintenance - Improved response quality and reduced redundant tool calls and multi-turn latency. Chengrui Li, Rujing Li, Yitong Bai, Rui Li, Columbia University · paper · repo
Skills and Knowledge Plugin MCP Servers for Optimized Industrial O&M Agents - Reduces planning overhead and improves retrieval grounding in industrial asset maintenance agents through two components: an MCP Skills Server that exposes reusable multi-step operational workflows, and a Knowledge Plugin Server that injects context-specific documentation. Andrew Li, Kirthana Natarajan, Thai On, Trisha Maturi, Yeshitha Bhuvanesh, Columbia University · repo
Profiling and Optimizing the TSFM MCP Server - Developed a reproducible benchmarking harness, stage-level profiling system, and interchangeable model interface that identified preprocessing and inference bottlenecks, achieving up to 12.8× faster forecasting and 12.2% lower fine-tuning latency while supporting forecasting, fine-tuning, and anomaly detection workflows. Tomas Pasiecznik, Sam Colman, Byeolah Kwon, Sally Go, Columbia University · repo
Profiling and Optimizing the AssetOpsBench Plan-Execute Pipeline - Provides the first systematic performance characterization of the AssetOpsBench plan-execute pipeline to quantify the latency-accuracy tradeoff of thinking mode on Gemma 4 26B for industrial asset operations tasks. Implemented and evaluated scenario-based routing optimizations to balance the tradeoff. Shen Li, Charles Xu, Ann Li, Caroline Cahill, Columbia University · repo
Performance Optimization of the TSFM Agent in an Industrial Agentic Benchmark - Developed an optimization framework for IBM's TinyTimeMixer (TTM) model by implementing model pre-loading, torch.compile graph fusion, and replacing Hugging Face abstractions with direct batched model calls. Achieved a 3.3× reduction in workflow latency and a 68% decrease in total execution time while maintaining zero-shot forecast quality on industrial sensor data. Alisha Vinod, Jonathan Ang, Sanjaii Vijayakumar, Thomas Ajai, Columbia University · repo
Visual Inspection Agent for AssetOpsBench - Adds a vision modality to AssetOpsBench via an MCP-connected Visual Inspection Agent and 22 hand-authored visual inspection scenarios across pumps, induction motors, power transformers, and wind turbine blades. Benchmarks AWQ W4A16 quantization and vLLM serving optimizations on Qwen2.5-VL-7B and Llama-3-LLaVA-NeXT-8B, with an LLM-as-a-judge scoring pipeline for accuracy evaluation. Amaan Sheikh, Aman Upganlawar, Madhav Rajkondawar, Yang-Jung (Eric) Chen, Columbia University · repo
Agentic AI Workflows for Naval Operations and Maintenance — Exploring AssetOpsBench for evaluating agentic AI workflows, with future extensions using digital-twin-generated synthetic data. Priyam Dalmia, Chin-Teng Lin, Fred Chang, University of Technology Sydney
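Several of the projects above rely on stage-level profiling to locate bottlenecks. The sketch below shows one generic way to time named pipeline stages with a context manager. It is not the harness used by any listed project; `StageTimer` and the stage names are hypothetical, and the workload is a stand-in for real preprocessing and inference.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Collect wall-clock samples per named stage and report the averages."""

    def __init__(self):
        self.samples = defaultdict(list)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the elapsed time even if the stage raises.
            self.samples[name].append(time.perf_counter() - start)

    def summary(self):
        """Return the mean duration in seconds for each stage."""
        return {name: sum(v) / len(v) for name, v in self.samples.items()}


timer = StageTimer()
for _ in range(3):
    with timer.stage("preprocess"):
        data = [x * 0.5 for x in range(10_000)]  # stand-in for preprocessing
    with timer.stage("inference"):
        total = sum(data)  # stand-in for a model call

for name, avg in timer.summary().items():
    print(f"{name}: {avg * 1e3:.3f} ms")
```

Averaging over repeated runs, as done here, smooths out one-off effects such as cold caches; a fuller harness would typically also report variance and discard warm-up iterations.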
Call for Scenario Contribution
We are expanding AssetOpsBench to cover a broader range of industrial challenges. We invite researchers and practitioners to contribute new scenarios, particularly in:
Task Domains: Prognostics and Health Management, Remaining Useful Life (RUL) estimation, Root Cause Analysis (RCA), Diagnostic Analysis, Predictive Maintenance