DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agents
ICLR 2025

🤖 DistRL Setup Guide

This is the code repository for the paper DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agents.

Paper: https://arxiv.org/pdf/2410.14803

Website and Demo: https://ai-agents-2030.github.io/DistRL/

The model weights and dataset will be released later.

🍩 Features

Framework Capabilities

Flexible Integration of Agents and Models
Scalable Data Collection Tools
Efficient Online Training with Multi-Machine Environment Support (utilizing heterogeneous workers and GPUs)

Methodology Highlights

Supported training modules, as detailed in the paper:
- DistRL: An efficient reinforcement learning algorithm tailored for distributed environments.
- Baseline: DigiRL (features automatic curriculum learning with doubly robust estimator filtering).
- Baseline: Filtered Behavior Cloning (employs reward-based filtering).
Agent Support:
- AutoUI: Comprehensive support for both training and evaluation phases.
Android-in-the-Wild Task Sets:
- AitW General: Tasks involving general browsing and app launching.
- AitW Web Shopping: Tasks centered around shopping on popular e-commerce websites.
- Additional Evaluations: We've assessed generalization capabilities on other AitW subsets, such as App Install, although the training environments for these were not meticulously configured.
DDP Multi-GPU Training Support:
- Multi-GPU training is handled through accelerate. If you are working with a single GPU, you can disable it: running AutoUI with the DistRL algorithm requires only 15 GB of GPU memory. Multi-GPU support is there in case you want to experiment with larger setups.
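
To make the "reward-based filtering" of the Filtered Behavior Cloning baseline more concrete, here is a minimal sketch. It is not the repository's implementation: the trajectory schema (a list of steps, each with a `reward` entry), the function names, and the threshold are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical schema: a trajectory is a list of steps, each with a "reward" value.
Trajectory = List[Dict[str, float]]


def trajectory_return(traj: Trajectory) -> float:
    """Sum the per-step rewards of one trajectory."""
    return sum(step["reward"] for step in traj)


def filter_by_reward(trajs: List[Trajectory], threshold: float) -> List[Trajectory]:
    """Keep only the trajectories whose total reward reaches the threshold."""
    return [t for t in trajs if trajectory_return(t) >= threshold]


if __name__ == "__main__":
    success = [{"reward": 0.0}, {"reward": 1.0}]
    failure = [{"reward": 0.0}, {"reward": 0.0}]
    kept = filter_by_reward([success, failure], threshold=1.0)
    print(len(kept))  # prints 1: only the successful trajectory survives
```

In a filtered behavior cloning setup, only the surviving trajectories would then be used as supervised training data for the policy.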

✅ Quick Start

A. Dependencies

Please check the requirements.txt file for all necessary dependencies.

B. Before You Start

Create Necessary Directories: Set up the required directories as specified in the configuration .yaml files (e.g., Tmp path, agg_traj path, save_path, etc.).
Update Tokens: Replace placeholders with your actual tokens in the configuration files in scripts/config:
- huggingface_token
- wandb_token
- gemini_token
- ... etc.
Review and Enhance Prompts: Clear, well-structured prompts are essential for the evaluator to assess task completion reliably. Precise and detailed prompts guide the model toward more accurate evaluations. Demonstration examples are provided in data/environment/android/prompts.txt for your reference. Please adjust data/environment/android/evaluate.py based on the hints and comments in that file.
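
The first two steps above can be sketched as a small pre-flight check. This is only an illustration under assumptions: the key names in `PATH_KEYS` are guesses at how the path entries are spelled, and the angle-bracket placeholder convention is hypothetical, so check the .yaml files in scripts/config for the real names before relying on it.

```python
import os
import tempfile

# Hypothetical key names; the real ones are in the .yaml files under scripts/config.
PATH_KEYS = ("tmp_path", "agg_traj_path", "save_path")
TOKEN_KEYS = ("huggingface_token", "wandb_token", "gemini_token")


def prepare_config(cfg):
    """Create the directories named in cfg and return token keys that look unset."""
    for key in PATH_KEYS:
        path = cfg.get(key)
        if path:
            os.makedirs(os.path.expanduser(path), exist_ok=True)
    missing = []
    for key in TOKEN_KEYS:
        value = str(cfg.get(key) or "")
        # Treat empty values and <placeholder>-style values as not filled in.
        if not value or (value.startswith("<") and value.endswith(">")):
            missing.append(key)
    return missing


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    cfg = {
        "save_path": os.path.join(root, "logs", "worker"),
        "huggingface_token": "<your_token>",
    }
    print(prepare_config(cfg))  # lists the token keys that still need a value
```

The config is passed in as a plain dictionary so the sketch has no dependencies; in practice you would load it from the .yaml file with whatever loader the project uses.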

C. Android Environment Setup

To set up the Android environment that DistRL interacts with, refer to the Environment Setup Guide. Before moving on, you should be able to view this script.

D. Running Experiments

D.1 Weights & Biases (Wandb) Setup

Weights & Biases is a tool for tracking machine learning experiments. To integrate Wandb into our framework:

Create an Account: If you don't already have one, sign up for a free Wandb account.
Install Wandb: Ensure Wandb is installed by running pip install wandb.
Login: Authenticate your Wandb account by running wandb login and entering your API key when prompted.
Configure Wandb in the Framework: Update your wandb_token in the configuration files with your API key.

For more detailed instructions, refer to the Wandb Quickstart Guide.

D.2 Entry Point of the Framework

The main entry point of the program is the run.py script. You can specify different experiments by passing the configuration name. Configuration files are located in the scripts/config/ directory.
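
If you are unsure which configuration names are available, a small helper like the one below can list them. It is a sketch, not part of the framework: the function name is made up, and the directory `scripts/config/multimachine` is inferred from the paths mentioned in this guide, so adjust it to your checkout.

```python
from pathlib import Path
from typing import List


def list_config_names(config_dir: str) -> List[str]:
    """Return the names (file stems) of the .yaml files directly inside config_dir."""
    return sorted(p.stem for p in Path(config_dir).glob("*.yaml"))


if __name__ == "__main__":
    # Directory inferred from this guide; change it if your layout differs.
    print(list_config_names("scripts/config/multimachine"))
```

Each returned name is what you would pass as the configuration name when launching run.py.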

Setup Steps:

Set Up Conda Environment:
- Install Miniconda.
- Create a new environment named distrl:
```
conda create -n distrl python=3.8
conda activate distrl
```
Clone the Repository:
- Clone the repository and check out the master branch:
```
git clone <repository_url>
cd <repository_directory>
git checkout master
```
- Install the package:
```
pip install -e .
```
Set Up the Environment:
- Follow the Environment Setup Guide to configure the environment for the Android emulator.
Test the Setup:
- Set the configurations in the files multimachine/default.yaml and multimachine/worker.yaml.
- Download the host policy files from here and unzip them into the worker's save_path, which is defined in the config file multimachine/worker.yaml and defaults to /home/<usrname>/logs/worker.
- Run the run.py script with the worker configuration to test:
```
CUDA_VISIBLE_DEVICES=0 python scripts/run.py --config-path config/multimachine --config-name worker +thread_id=0
```
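
If you want to start several workers from one script, the test command above can be generated programmatically. The sketch below only builds the command and environment for each worker. It assumes that workers on one machine are told apart by `+thread_id`, which this guide does not confirm, and `worker_command` is a hypothetical helper, not part of the framework.

```python
import os
from typing import Dict, List, Tuple


def worker_command(thread_id: int, gpu: int = 0) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for one worker, mirroring the test command in this guide."""
    argv = [
        "python", "scripts/run.py",
        "--config-path", "config/multimachine",
        "--config-name", "worker",
        f"+thread_id={thread_id}",
    ]
    # Copy the current environment and pin the worker to one GPU.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
    return argv, env


if __name__ == "__main__":
    for tid in range(2):
        argv, env = worker_command(tid)
        print("CUDA_VISIBLE_DEVICES=" + env["CUDA_VISIBLE_DEVICES"], " ".join(argv))
        # To actually launch a worker: subprocess.Popen(argv, env=env)
```

Printing the commands first makes it easy to compare them with the one shown above before launching anything.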

Run from the host machine:

```
accelerate launch --config_file config/accelerate_config/default_config.yaml scripts/run.py --config-path config/multimachine --config-name host
```

📄 License

All content of this work, including the codebase, data, and model checkpoints, is released under the Apache License v2.0.

📚 Citation

Consider citing our paper!

```
@article{wang2024distrl,
  title={Distrl: An asynchronous distributed reinforcement learning framework for on-device control agents},
  author={Wang, Taiyi and Wu, Zhihao and Liu, Jianheng and Hao, Jianye and Wang, Jun and Shao, Kun},
  journal={arXiv preprint arXiv:2410.14803},
  year={2024}
}
```
