A community-driven registry for Claude, Cursor, Windsurf, Cline & more. Not affiliated with Anthropic.
Are you the author? Sign in to claim
A Python library powered by Language Models (LLMs) for conversational data discovery and analysis.
BambooAI is an open-source library that enables natural language-based data analysis using Large Language Models (LLMs). It works with both local datasets and can fetch data from external sources and APIs.
BambooAI is an experimental tool that makes data analysis more accessible by allowing users to interact with their data through natural language conversations. It's designed to:
A demonstration of creating a machine learning model to predict Titanic passenger survival:
https://github.com/user-attachments/assets/59ef810c-80d8-4ef1-8edf-82ba64178b85
Example of various sports data analysis queries:
https://github.com/user-attachments/assets/7b9c9cd6-56e3-46ee-a6c6-c32324a0c5ef
pip install bambooai
Or alternatively clone the repo and install the requirements
git clone https://github.com/pgalko/BambooAI.git
pip install -r requirements.txt
Try it out on a basic example in Google Colab:
Install BambooAI:
pip install bambooai
Configure environment:
cp .env.example .env
# Edit .env with your settings
Configure agents/models
cp LLM_CONFIG_sample.json LLM_CONFIG.json
# Edit LLM_CONFIG.json with your desired combination of agents, models and parameters
Run
import pandas as pd
from bambooai import BambooAI
import plotly.io as pio
pio.renderers.default = 'jupyterlab'
df = pd.read_csv('titanic.csv')
bamboo = BambooAI(df=df, planning=True, vector_db=False, search_tool=True)
bamboo.pd_agent_converse()
The BambooAI operates through six key steps:
Initiation
Task Routing
User Feedback
Dynamic Prompt Build
Debugging and Execution
Results and Knowledge Base

BambooAI accepts the following initialization parameters:
bamboo = BambooAI(
df=None, # DataFrame to analyze
auxiliary_datasets=None, # List of paths to auxiliary datasets
max_conversations=4, # Number of conversation pairs to keep in memory
search_tool=False, # Enable internet search capability
planning=False, # Enable planning agent for complex tasks
webui=False, # Run as web application
vector_db=False, # Enable vector database for knowledge storage
df_ontology=False, # Use custom dataframe ontology
exploratory=True, # Enable expert selection for query handling
custom_prompt_file=None # Enable the use of custom/modified prompt templates
)
df (pd.DataFrame, optional)
auxiliary_datasets (list, default=None)
max_conversations (int, default=4)
search_tool (bool, default=False)
planning (bool, default=False)
webui (bool, default=False)
vector_db (bool, default=False)
text-embedding-3-small(OpenAI) and all-MiniLM-L6-v2(HF)df_ontology (str, default=None)
.ttl file. The parameter takes the path to the TTL file.exploratory (bool, default=True)
custom_prompt_file (str, default=None)
BambooAI uses multi-agent system where different specialized agents handle specific aspects of the data analysis process. Each agent can be configured to use different LLM models and parameters based on their specific requirements.
The LLM configuration is stored in LLM_CONFIG.json. Here's the complete configuration structure:
{
"agent_configs": [
{"agent": "Expert Selector", "details": {"model": "gpt-4.1", "provider":"openai","max_tokens": 2000, "temperature": 0}},
{"agent": "Analyst Selector", "details": {"model": "claude-3-7-sonnet-20250219", "provider":"anthropic","max_tokens": 2000, "temperature": 0}},
{"agent": "Theorist", "details": {"model": "gemini-2.5-pro-preview-03-25", "provider":"gemini","max_tokens": 4000, "temperature": 0}},
{"agent": "Dataframe Inspector", "details": {"model": "gemini-2.0-flash", "provider":"gemini","max_tokens": 8000, "temperature": 0}},
{"agent": "Planner", "details": {"model": "gemini-2.5-pro-preview-03-25", "provider":"gemini","max_tokens": 8000, "temperature": 0}},
{"agent": "Code Generator", "details": {"model": "claude-3-5-sonnet-20241022", "provider":"anthropic","max_tokens": 8000, "temperature": 0}},
{"agent": "Error Corrector", "details": {"model": "claude-3-5-sonnet-20241022", "provider":"anthropic","max_tokens": 8000, "temperature": 0}},
{"agent": "Reviewer", "details": {"model": "gemini-2.5-pro-preview-03-25", "provider":"gemini","max_tokens": 8000, "temperature": 0}},
{"agent": "Solution Summarizer", "details": {"model": "gemini-2.5-flash-preview-04-17", "provider":"gemini","max_tokens": 4000, "temperature": 0}},
{"agent": "Google Search Executor", "details": {"model": "gemini-2.5-flash-preview-04-17", "provider":"gemini","max_tokens": 4000, "temperature": 0}},
{"agent": "Google Search Summarizer", "details": {"model": "gemini-2.5-flash-preview-04-17", "provider":"gemini","max_tokens": 4000, "temperature": 0}}
],
"model_properties": {
"gpt-4o": {"capability":"base","multimodal":"true", "templ_formating":"text", "prompt_tokens": 0.0025, "completion_tokens": 0.010},
"gpt-4.1": {"capability":"base","multimodal":"true", "templ_formating":"text", "prompt_tokens": 0.002, "completion_tokens": 0.008},
"gpt-4o-mini": {"capability":"base", "multimodal":"true","templ_formating":"text", "prompt_tokens": 0.00015, "completion_tokens": 0.0006},
"gpt-4.1-mini": {"capability":"base", "multimodal":"true","templ_formating":"text", "prompt_tokens": 0.0004, "completion_tokens": 0.0016},
"o1-mini": {"capability":"reasoning", "multimodal":"false","templ_formating":"text", "prompt_tokens": 0.003, "completion_tokens": 0.012},
"o3-mini": {"capability":"reasoning", "multimodal":"false","templ_formating":"text", "prompt_tokens": 0.0011, "completion_tokens": 0.0044},
"o1": {"capability":"reasoning", "multimodal":"false","templ_formating":"text", "prompt_tokens": 0.015, "completion_tokens": 0.06},
"gemini-2.0-flash": {"capability":"base", "multimodal":"true","templ_formating":"text", "prompt_tokens": 0.0001, "completion_tokens": 0.0004},
"gemini-2.5-flash-preview-04-17": {"capability":"reasoning", "multimodal":"true","templ_formating":"text", "prompt_tokens": 0.00015, "completion_tokens": 0.0035},
"gemini-2.0-flash-thinking-exp-01-21": {"capability":"reasoning", "multimodal":"false","templ_formating":"text", "prompt_tokens": 0.0, "completion_tokens": 0.0},
"gemini-2.5-pro-exp-03-25": {"capability":"reasoning", "multimodal":"true","templ_formating":"text", "prompt_tokens": 0.0, "completion_tokens": 0.0},
"gemini-2.5-pro-preview-03-25": {"capability":"reasoning", "multimodal":"true","templ_formating":"text", "prompt_tokens": 0.00125, "completion_tokens": 0.01},
"claude-3-5-haiku-20241022": {"capability":"base", "multimodal":"true","templ_formating":"xml", "prompt_tokens": 0.0008, "completion_tokens": 0.004},
"claude-3-5-sonnet-20241022": {"capability":"base", "multimodal":"true","templ_formating":"xml", "prompt_tokens": 0.003, "completion_tokens": 0.015},
"claude-3-7-sonnet-20250219": {"capability":"base", "multimodal":"true","templ_formating":"xml", "prompt_tokens": 0.003, "completion_tokens": 0.015},
"open-mixtral-8x7b": {"capability":"base", "multimodal":"false","templ_formating":"text", "prompt_tokens": 0.0007, "completion_tokens": 0.0007},
"mistral-small-latest": {"capability":"base", "multimodal":"false","templ_formating":"text", "prompt_tokens": 0.001, "completion_tokens": 0.003},
"codestral-latest": {"capability":"base", "multimodal":"false","templ_formating":"text", "prompt_tokens": 0.001, "completion_tokens": 0.003},
"open-mixtral-8x22b": {"capability":"base", "multimodal":"false","templ_formating":"text", "prompt_tokens": 0.002, "completion_tokens": 0.006},
"mistral-large-2407": {"capability":"base", "multimodal":"false","templ_formating":"text", "prompt_tokens": 0.003, "completion_tokens": 0.009},
"deepseek-chat": {"capability":"base", "multimodal":"false","templ_formating":"text", "prompt_tokens": 0.00014, "completion_tokens": 0.00028},
"deepseek-reasoner": {"capability":"reasoning", "multimodal":"false","templ_formating":"text", "prompt_tokens": 0.00055, "completion_tokens": 0.00219},
"/mnt/c/Users/pgalk/vllm/models/DeepSeek-R1-Distill-Qwen-14B": {"capability":"reasoning", "multimodal":"false","templ_formating":"text", "prompt_tokens": 0.00, "completion_tokens": 0.00},
"deepseek-r1-distill-llama-70b": {"capability":"reasoning", "multimodal":"false","templ_formating":"text", "prompt_tokens": 0.00, "completion_tokens": 0.00},
"deepseek-r1:32b": {"capability":"reasoning", "multimodal":"false","templ_formating":"text", "prompt_tokens": 0.00, "completion_tokens": 0.00},
"deepseek-ai/deepseek-r1": {"capability":"reasoning", "multimodal":"false","templ_formating":"text", "prompt_tokens": 0.00, "completion_tokens": 0.00},
"MiniMax-M3": {"capability":"base", "multimodal":"true","templ_formating":"text", "prompt_tokens": 0.001, "completion_tokens": 0.005},
"MiniMax-M2.7": {"capability":"base", "multimodal":"false","templ_formating":"text", "prompt_tokens": 0.001, "completion_tokens": 0.005},
"MiniMax-M2.7-highspeed": {"capability":"base", "multimodal":"false","templ_formating":"text", "prompt_tokens": 0.001, "completion_tokens": 0.005}
}
}
The LLM_CONFIG.json configuration file needs to be located in the BambooAI working dir, eg. /Users/palogalko/AI_Experiments/Bamboo_AI/web_app/LLM_CONFIG.json, and all API keys for the specified models need to be present in the .env also located in the working dir.
The above combination of agents/models is the most performant according to our tests as of 22 Apr 2025 using sports and performance datasets. I would strongly encourage you to experiment with these settings to see what combination best suits your particular use case.
agent_configs: Agents configuration
agent: The type of agentdetails:
model: Model identifierprovider: Service provider (openai, anthropic, gemini, etc.)max_tokens: Maximum tokens for completiontemperature: Creativity parameter (0-1)model_properties: Model properties
capability: Base or Reasoning modelmultimodal: Multimodal or text onlytempl_formating: Prompt formatting. XML or Textprompt_tokens: Cost of input (1K)completion_tokens: Cost of output (1K)If you assign a model for an agent in agent_configs make sure that the model is defined in model_properties.
{
"agent": "Planner",
"details": {
"model": "llama3:70b",
"provider": "ollama",
"max_tokens": 2000,
"temperature": 0
}
}
{
"agent": "Code Generator",
"details": {
"model": "/path/to/model/DeepSeek-R1-Distill-14B",
"provider": "vllm",
"max_tokens": 2000,
"temperature": 0
}
}
{
"agent": "Code Generator",
"details": {
"model": "MiniMax-M3",
"provider": "minimax",
"max_tokens": 8000,
"temperature": 0.1
}
}
BambooAI supports working with multiple datasets simultaneously, allowing for more comprehensive and contextual analysis. The auxiliary datasets feature enables you to reference and incorporate additional data sources alongside your primary dataset.
When you ask questions that might benefit from auxiliary data, BambooAI will:
from bambooai import BambooAI
import pandas as pd
# Load primary dataset
main_df = pd.read_csv('main_data.csv')
# Specify paths to auxiliary datasets
auxiliary_paths = [
'path/to/supporting_data1.csv',
'path/to/supporting_data2.parquet',
'path/to/reference_data.csv'
]
# Initialize BambooAI with auxiliary datasets
bamboo = BambooAI(
df=main_df,
auxiliary_datasets=auxiliary_paths,
)
BambooAI supports custom ontologies to ground the agents within the specific domain of interest.
from bambooai import BambooAI
import pandas as pd
# Initialize with ontology file path
bamboo = BambooAI(
df=your_dataframe,
df_ontology="path/to/ontology.ttl"
)
The ontology file defines your data structure using RDF/OWL notation, including:
This helps BambooAI understand complex data relationships and generate more accurate code.
BambooAI supports integration with vector database. The main putpose is to allow storage and retrieval of successfull analysis allowing the system to evolve and learn over time.
from bambooai import BambooAI
import pandas as pd
# Initialize with ontology file path
bamboo = BambooAI(
df=your_dataframe,
vector_db=True
)
Supports both Pinecone and Qdrant vector databases. Configure your choice using environment variables:
For Pinecone:
Requires an account with Pinecone (free), and the API key stored in the .env:
VECTOR_DB_TYPE=pinecone
PINECONE_API_KEY=<YOUR API KEY HERE>
PINECONE_CLOUD=aws
PINECONE_REGION=us-east-1
For Qdrant:
Can use either local Qdrant instance or Qdrant Cloud. Configure in .env:
VECTOR_DB_TYPE=qdrant
QDRANT_URL=http://localhost:6333 # For local Qdrant
QDRANT_API_KEY=<YOUR API KEY HERE> # Optional for local, required for cloud
Upon successful analysis completion, user has an ability to rank and store the solution.
import pandas as pd
from bambooai import BambooAI
import plotly.io as pio
pio.renderers.default = 'jupyterlab'
df = pd.read_csv('training_activity_data.csv')
aux_data = [
'path/to/wellness_data.csv',
'path/to/nutrition_data.parquet',
]
bamboo = BambooAI(df=df, search_tool=True, planning=True)
bamboo.pd_agent_converse()
bamboo.pd_agent_converse("Calculate 30, 50, 75 and 90 percentiles of the heart rate column")
Web UI screenshot (Interactive Workflow Map):
BambooAI can be easily deployed using Docker, which provides a consistent environment regardless of your operating system or local setup.
For detailed Docker setup and usage instructions, please refer to our Docker Setup Wiki.
The Docker approach offers several advantages:
Prerequisites:
Install BambooAI:
pip install bambooai
Download web_app folder from repository
Configure environment:
cp .env.example <path_to_web_app>/.env
# Edit .env with your settings
Configure LLM agents, models and parameters
cp LLM_CONFIG_sample.json <path_to_web_app>/LLM_CONFIG.json
web_app/LLM_CONFIG.json in the web_app directory{
"agent_configs": [
{
"agent": "Code Generator",
"details": {
"model": "your-preferred-model",
"provider": "provider-name",
"max_tokens": 4000,
"temperature": 0
}
}
]
}
Run application:
cd <path_to_web_app>
python app.py
Clone repository:
git clone https://github.com/pgalko/BambooAI.git
cd BambooAI
Install dependencies:
pip install -r requirements.txt
Configure environment:
cp .env.example web_app/.env
# Edit .env with your settings
Configure LLM agents, models and parameters
cp LLM_CONFIG_sample.json web_app/LLM_CONFIG.json
web_app/LLM_CONFIG.json in the web_app directory{
"agent_configs": [
{
"agent": "Code Generator",
"details": {
"model": "your-preferred-model",
"provider": "provider-name",
"max_tokens": 4000,
"temperature": 0
}
}
]
}
Run application:
cd web_app
python app.py
Access web interface at http://localhost:5000 (5001 if using Docker)
Required variables in .env:
<VENDOR_NAME>_API_KEY: API keys for selected providersGEMINI_API_KEY: This needs to be set if you want to use the native Gemini web search tool (Grounding). You can alternatively use Selenium, however it is much slower and not as tightly integrated.PINECONE_API_KEY: Optional for vector databaseSERPER_API_KEY: Required for Selenium searchREMOTE_OLLAMA: Optional URL for remote Ollama serverREMOTE_VLLM: Optional URL for remote VLLM serverFLASK_SECRET: This is used to sign the session cookie for WebAppWEB_SEARCH_MODE: 'google_ai' to use Gemini native search tool, or 'selenium' to use selenium web driverSELENIUM_WEBDRIVER_PATH: Path to your Selenium WebDriver. This is required if you are using the 'selenium' web search mode.EXECUTION_MODE: 'local' to run the code executor locally, or 'api' to run the code executor on a remote server or container.EXECUTOR_API_BASE_URL: `URL of the remote code executor API. This is required if you are using the 'api' execution mode eg.http://192.168.1.201:5000The log for each Run/Thread is stored in logs/bambooai_run_log.json. The file gets overwriten when the new Thread starts.
Consolidated logs are stored in logs/bambooai_consolidated_log.json with 5MB size limit and 3-file rotation. Logged information includes:
For detailed evaluation report, see: Objective Assessment Report
Contributions are welcome via pull requests. Focus on maintaining code readability and conciseness.
This project is indexed with DeepWiki by Cognition Labs, providing developers with:
Access the project's full interactive documentation: DeepWiki pgalko/BambooAI
干净、强大、属于你的 AI Agent 平台 --AI agents, without the clutter.
Native macOS app to monitor Claude AI usage limits and watch your coding sessions live
An AI-powered custom node for ComfyUI designed to enhance workflow automation and provide intelligent assistance
npx CLI installing 100+ agents, commands, hooks, and integrations in one command