RLLib Baselines on Colab!
This Colab notebook allows you to train a full Flatland agent using the provided PPO baseline.
We have taken the repo from https://gitlab.aicrowd.com/flatland/neurips2020-flatland-baselines and turned it into a simple Colab notebook.
All training scripts are provided, so you can modify the configs and launch your own runs. The notebook also runs evaluation and provides the script for calculating scores on an independent test set.
NeurIPS 2020 Flatland Challenge - PPO RLlib Baseline
Read the documentation to learn how to make your first submission in 10 minutes: https://flatland.aicrowd.com/getting-started/first-submission.html
- Flatland documentation
- NeurIPS 2020 Challenge
Setup
!pip install tensorboard
## Setting up Conda Environment
wget https://repo.continuum.io/miniconda/$MINICONDA_INSTALLER_SCRIPT
# Install all packages for training
git clone http://gitlab.aicrowd.com/flatland/neurips2020-flatland-baselines.git
Cloning into 'neurips2020-flatland-baselines'...
%cd neurips2020-flatland-baselines
conda env create -n flatland-paper -f environment-cpu.yml
source activate flatland-paper
conda install -y ipykernel
Solving environment: ...working... done # All requested packages already installed.
==> WARNING: A newer version of conda exists. <== current version: 4.5.4 latest version: 4.8.5 Please update conda by running $ conda update -n base conda
Training
source activate flatland-paper
python train.py --help
A value of 0 (default) disables checkpointing.
--checkpoint-at-end
    Whether to checkpoint at the end of the experiment. Default is False.
--no-sync-on-checkpoint
    Disable sync-down of trial checkpoint, which is enabled by default to guarantee recoverability. If set, checkpoint syncing from worker to driver is asynchronous. Set this only if synchronous checkpointing is too slow and trial restoration failures can be tolerated
--keep-checkpoints-num KEEP_CHECKPOINTS_NUM
    Number of best checkpoints to keep. Others get deleted. Default (None) keeps all checkpoints.
--checkpoint-score-attr CHECKPOINT_SCORE_ATTR
    Specifies by which attribute to rank the best checkpoint. Default is increasing order. If attribute starts with min- it will rank attribute in decreasing order. Example: min-validation_loss
--export-formats EXPORT_FORMATS
    List of formats that exported at the end of the experiment. Default is None.
Training example:
    python ./train.py --run DQN --env CartPole-v0 --no-log-flatland-stats
Training with Config:
    python ./train.py -f experiments/flatland_random_sparse_small/global_obs/ppo.yaml
Note that -f overrides all other trial-specific command-line options.
# Replace num_workers to match the current system. The number of workers should be at most (number of CPU cores - 1)
# If doing evaluation, we should further reduce the workers by the number of evaluation workers
!sed 's/num_workers: 2/num_workers: 1/g' experiments/tests/global_obs_ppo.yaml > global_obs_ppo.yaml
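The worker rule in the comments above can be written down as a small helper. This is a minimal sketch, not part of the baselines repository: `pick_num_workers` and its parameters are hypothetical names, and it assumes one CPU core is reserved for the driver process.

```python
import os


def pick_num_workers(cpu_count=None, evaluation_workers=0):
    """Return a num_workers value of at most (CPU cores - 1 - evaluation workers).

    Hypothetical helper for illustration; not part of the baselines repository.
    """
    if cpu_count is None:
        # os.cpu_count() can return None when the count cannot be determined.
        cpu_count = os.cpu_count() or 1
    # Reserve one core for the driver, then subtract the evaluation workers.
    # Clamp at 0 so that small machines never produce a negative value.
    return max(cpu_count - 1 - evaluation_workers, 0)


print(pick_num_workers())
```

With 2 cores and no evaluation workers this gives 1, the value substituted by the sed command above. A result of 0 would mean that no separate rollout workers are started.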
# Note the argument --bind-all is needed whenever we are running in Google Colab
source activate flatland-paper
# Try to run a small test to see if rllib training is working
python ./train.py -f global_obs_ppo.yaml --bind-all
- Successfully Loaded Generator Config small_stoch_v1 from small_stoch_v1.yaml
- Successfully Loaded Generator Config 32x32_v0 from 32x32_v0.yaml
- Successfully Loaded Generator Config small_stoch_v0 from small_stoch_v0.yaml
- Successfully Loaded Generator Config medium_stoch_v2 from medium_stoch_v2.yaml
- Successfully Loaded Generator Config large_stoch_v0 from large_stoch_v0.yaml
- Successfully Loaded Generator Config small_v0 from small_v0.yaml
- Successfully Loaded Generator Config small_single_v0 from small_single_v0.yaml
- Successfully Loaded Generator Config small_double_v0 from small_double_v0.yaml
- Successfully Loaded Generator Config adrian_v0 from adrian_v0.yaml
- Successfully Loaded Generator Config small_triple_v0 from small_triple_v0.yaml
- Successfully Loaded Generator Config medium_stoch_v1 from medium_stoch_v1.yaml
- Successfully Loaded Evaluation Config test_render from test_render.yaml
- Successfully Loaded Evaluation Config default from default.yaml
- Successfully Loaded Evaluation Config default_render from default_render.yaml
- Successfully Loaded Evaluation Config enable_explore from enable_explore.yaml
- Successfully Loaded Observation class TreeObs from tree_obs.py
- Successfully Loaded Observation class LocalConflictObs from local_conflict_obs.py
- Successfully Loaded Observation class Utils from utils.py
- Successfully Loaded Observation class CombinedObs from combined_obs.py
- Successfully Loaded Observation class ForwardActionObs from forward_action_obs.py
- Successfully Loaded Observation class NewTreeObs from new_tree_obs.py
- Successfully Loaded Observation class NewTreeObsBuilder from new_tree_obs_builder.py
- Successfully Loaded Observation class RandomActionObs from random_action_obs.py
- Successfully Loaded Observation class ShortestPathObs from shortest_path_obs.py
- Successfully Loaded Observation class GlobalObs from global_obs.py
- Successfully Loaded Observation class ShortestPathActionObs from shortest_path_action_obs.py
- Successfully Loaded Observation class GlobalDensityObs from global_density_obs.py
- Successfully Loaded Environment class FlatlandRandomSparseSmall from flatland_random_sparse_small.py
- Successfully Loaded Environment class FlatlandBase from flatland_base.py
- Successfully Loaded Environment class FlatlandSingle from flatland_single.py
- Successfully Loaded Environment class FlatlandSparse from flatland_sparse.py
- Successfully Loaded Model class CustomLossModel from custom_loss_model.py
- Successfully Loaded Model class GlobalDensObsModel from global_dens_obs_model.py
- Successfully Loaded Model class CcTransformer from cc_transformer.py
- Successfully Loaded Model class CcConcatenate from cc_concatenate.py
- Successfully Loaded Model class FullyConnectedModel from fully_connected_model.py
- Successfully Loaded Model class GlobalObsModel from global_obs_model.py
== Status ==
Memory usage on this node: 1.4/12.7 GiB
Using FIFO scheduling algorithm.
TensorBoard
%load_ext tensorboard
%tensorboard --logdir ~/ray_results
# Now we can run a full training after changing the number of workers
# For demonstration purposes we also reduce the time steps to 15000
sed 's/num_workers: 13/num_workers: 1/g' baselines/action_masking_and_skipping/ppo_tree_obs_small_v0.yaml \
| sed 's/num_envs_per_worker: 5/num_envs_per_worker: 2/g' \
| sed 's/timesteps_total: 15000000/timesteps_total: 15000/g' > ppo-tree-obs-small-v0.yaml
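The same three substitutions can be sketched in Python. Unlike sed, which silently writes an unchanged file when a pattern does not match, this version raises an error if a key is missing. The helper name `set_yaml_scalar` and the sample YAML text are hypothetical; the real layout of ppo_tree_obs_small_v0.yaml may differ.

```python
import re


def set_yaml_scalar(text, key, value):
    """Replace the value of every `key: value` line in a YAML string.

    Hypothetical helper: it works on plain text lines and does not parse YAML.
    """
    pattern = re.compile(
        rf"^([ \t]*{re.escape(key)}:[ \t]*).*$", flags=re.MULTILINE
    )
    new_text, count = pattern.subn(lambda m: f"{m.group(1)}{value}", text)
    if count == 0:
        raise KeyError(f"key not found in config: {key}")
    return new_text


# Invented sample config, used only to demonstrate the helper.
sample = """\
ppo-tree-obs-small-v0:
    run: PPO
    stop:
        timesteps_total: 15000000
    config:
        num_workers: 13
        num_envs_per_worker: 5
"""

patched = sample
for key, value in [
    ("num_workers", 1),
    ("num_envs_per_worker", 2),
    ("timesteps_total", 15000),
]:
    patched = set_yaml_scalar(patched, key, value)

print(patched)
```

Anchoring the pattern at the start of the line also avoids accidental partial matches, for example a `num_workers` rule touching a key that merely ends in `num_workers`.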
# Note the argument --bind-all is needed whenever we are running in Google Colab
source activate flatland-paper
# Run rllib training
python ./train.py -f ppo-tree-obs-small-v0.yaml --bind-all
RLLib Training LifeCycle
We also officially support saving training metrics, graphs, checkpoints, system runtime, experiment code, etc. in the experiment tracking tool Weights and Biases (W&B). This also keeps all our experiments transparent and easily reproducible.
Flatland metrics such as mean percentage completion, normalised reward and reward can be easily monitored.
Evaluation can also run alongside training at a fixed interval. To use the default evaluation settings, add the -e flag as follows:
python train.py -ef ppo_tree_obs_small_v0.yaml
A sample recording in W&B can be viewed here.
One can also specify a custom evaluation config in a yaml file similar to the training configs.
The Flatland environment has also been adapted to support saving video recordings using OpenAI's gym monitor. This has been integrated into RLlib, and the saved videos can be uploaded directly to W&B during training. Video recording can slow down training considerably, so by default we only save videos of 5 episodes run during evaluation after every 50 training iterations. To use the default evaluation and recording settings, add the -er flag as follows:
python train.py -erf ppo_tree_obs_small_v0.yaml
Just like evaluation, one can also specify a custom config for recording in a yaml file.
With all of this in place, it is convenient to track the training process directly in W&B, which has separate sections for training and evaluation and a media section for recorded videos. To save the checkpoints to W&B, add the -s flag as follows:
python train.py -ersf ppo_tree_obs_small_v0.yaml
The saved checkpoints can then be downloaded from Weights and Biases.
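The combined flags -ef, -erf and -ersf follow the usual convention that boolean short options can be joined behind a single dash, with the option that takes a value coming last. The sketch below reproduces that behaviour with Python's argparse. The long option names and help texts are assumptions for illustration and are not taken from train.py.

```python
import argparse

# Hypothetical stand-in for the relevant train.py flags.
parser = argparse.ArgumentParser(description="Illustration of combined short flags")
parser.add_argument("-e", "--eval", action="store_true",
                    help="run evaluation alongside training (assumed name)")
parser.add_argument("-r", "--record", action="store_true",
                    help="record videos during evaluation (assumed name)")
parser.add_argument("-s", "--save-checkpoint", action="store_true",
                    help="upload checkpoints to W&B (assumed name)")
parser.add_argument("-f", "--config-file", type=str,
                    help="experiment config in yaml format (assumed name)")

# "-ersf <file>" is parsed as "-e -r -s -f <file>".
args = parser.parse_args(["-ersf", "ppo_tree_obs_small_v0.yaml"])
print(args)
```

Because only the last option in the group may consume a value, -f has to stay at the end of the combined flag.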
Once training and evaluation are done, we can select the checkpoint with the best evaluation scores and run inference on a sample of 50 independent Flatland environments:
python rollout.py <path-to-checkpoint> --run PPO --episode=50
Refer to the rollout scripts here for the different baseline RLLib runs.
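Selecting the best checkpoint amounts to taking the maximum over the evaluation scores recorded for each checkpoint. A minimal sketch, assuming the scores have already been collected into a dictionary; the checkpoint paths and score values are made up for illustration, and `best_checkpoint` is a hypothetical helper.

```python
def best_checkpoint(scores):
    """Return the checkpoint path with the highest evaluation score."""
    if not scores:
        raise ValueError("no checkpoints to choose from")
    return max(scores, key=scores.get)


# Invented example: checkpoint path -> evaluation normalised reward.
# Rewards are negative here, so the value closest to zero is the best.
evaluation_scores = {
    "checkpoint_50/checkpoint-50": -0.31,
    "checkpoint_100/checkpoint-100": -0.22,
    "checkpoint_150/checkpoint-150": -0.27,
}

best = best_checkpoint(evaluation_scores)
print(f"python rollout.py {best} --run PPO --episode=50")
```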
RLLib Training Scripts for Baselines
We run a range of baselines, including APEX, PPO and imitation learning approaches, as explained here. The training script for each of them is shown below. The training results are shown in the subsequent sections.
MODEL | Training Script |
APEX FIXED IL(25%) | train.py -ef baselines/imitation_learning_tree_obs/apex_il_tree_obs_25.yaml |
APEX FIXED IL(100%) | train.py -ef baselines/imitation_learning_tree_obs/apex_pure_il.yaml |
APEX | train.py -ef baselines/action_masking_and_skipping/apex_tree_obs_small_v0.yaml |
APEX SKIP | train.py -ef baselines/action_masking_and_skipping/apex_tree_obs_small_v0_skip.yaml |
CCPPO | train.py -ef baselines/ccppo_tree_obs/ccppo.yaml |
CCPPO BASE | train.py -ef baselines/ccppo_tree_obs/ccppo_base.yaml |
MARWIL FIXED IL(100%) | train.py -ef baselines/imitation_learning_tree_obs/marwil_tree_obs_all_beta.yaml |
PPO + Online IL(50%) | train.py -ief baselines/custom_imitation_learning_rllib_tree_obs/ppo_imitation_tree_obs.yaml --eager --trace |
PPO | train.py -ef baselines/action_masking_and_skipping/ppo_tree_obs_small_v0.yaml |
PPO MASKING | train.py -ef baselines/action_masking_and_skipping/ppo_tree_obs_small_v0_mask.yaml |
PPO SKIP | train.py -ef baselines/action_masking_and_skipping/ppo_tree_obs_small_v0_skip.yaml |
APEX Global density | train.py -ef baselines/global_density_obs/sparse_small_apex_expdecay_maxt1000.yaml |
Online IL(100%) | train.py -ef baselines/custom_imitation_learning_rllib_tree_obs/pure_imitation_tree_obs.yaml --eager --trace |
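To queue several of these runs in one go, the commands in the table can be assembled programmatically. The sketch below only builds and prints the command lines (a dry run). The dictionary holds a subset of the config paths from the table, and `build_command` is a hypothetical helper, not part of the repository.

```python
import shlex

# Subset of the table above: model name -> config path.
BASELINE_CONFIGS = {
    "APEX": "baselines/action_masking_and_skipping/apex_tree_obs_small_v0.yaml",
    "PPO": "baselines/action_masking_and_skipping/ppo_tree_obs_small_v0.yaml",
    "PPO MASKING": "baselines/action_masking_and_skipping/ppo_tree_obs_small_v0_mask.yaml",
    "PPO SKIP": "baselines/action_masking_and_skipping/ppo_tree_obs_small_v0_skip.yaml",
}


def build_command(config_path, flags="-ef"):
    """Return the argument list for one training run (hypothetical helper)."""
    return ["python", "train.py", flags, config_path]


commands = {name: build_command(path) for name, path in BASELINE_CONFIGS.items()}

# Dry run: print each command instead of executing it.
for name, command in commands.items():
    print(f"{name}: {shlex.join(command)}")
```

Each argument list could then be handed to `subprocess.run(command, check=True)` to launch the runs one after another.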
Results
All baseline run configs can be found here. More information on each of the runs can be found in the Flatland RLLib Baselines documentation.
Checkpoints with the best evaluation normalized reward score for various runs can be found here.
Train, Evaluation and Test Results
Note that these runs were based on the older Flatland version 2.2.1; the code for that version can be found in the flatland-paper-baselines branch. Test results were calculated using this rollout script.
The training and evaluation metrics and charts for all the runs can be found in the W&B link here.
MODEL | Train % Complete | Train Reward | Best Evaluation % Complete | Best Evaluation Reward | Test % Complete | Test Reward |
APEX FIXED IL(25%) | 90.45±0.4 | -0.18±0 | 89.18±2.44 | -0.18±0.02 | 86±1.44 | -0.22±0.01 |
APEX FIXED IL(100%) | 23.8±11.04 | -0.79±0.09 | 22.93±11.83 | -0.84±0.08 | ||
APEX | 90.38±1.64 | -0.2±0.02 | 85.33±6.11 | -0.22±0.05 | 80.93±5.45 | -0.32±0.04 |
APEX SKIP | 89.51±1.09 | -0.21±0.01 | 84±4 | -0.22±0.04 | 79.73±0.92 | -0.33±0.01 |
CCPPO | 87.72±2.37 | -0.2±0.02 | 84.67±4.01 | -0.23±0.04 | 71.87±3.7 | -0.35±0.03 |
CCPPO BASE | 83.21±1.47 | -0.25±0.01 | 83.2±0.8 | -0.25±0.02 | 76.27±6.96 | -0.31±0.06 |
MARWIL FIXED IL(100%) | 100±0 | -0.04±0.01 | 72.4±3.27 | -0.35±0.02 | ||
PPO + Online IL(50%) | 83.46±1.09 | -0.23±0.01 | 100±0 | -0.07±0 | 71.47±4.01 | -0.35±0.04 |
PPO | 94.78±0.29 | -0.13±0.01 | 98.67±2.31 | -0.09±0.02 | 81.33±5.86 | -0.26±0.05 |
PPO MASKING | 93.4±0.27 | -0.15±0 | 90.67±8.33 | -0.16±0.07 | 80.53±9.59 | -0.28±0.09 |
PPO SKIP | 93.48±0.66 | -0.15±0.01 | 100±0 | -0.08±0.01 | 82.67±5.79 | -0.26±0.05 |
APEX Global density | 57.87±1.85 | -0.51±0.01 | 60±10.58 | -0.45±0.11 | 34.4±9.23 | -0.71±0.07 |
Online IL(100%) | 100±0 | -0.06±0.01 | 80±3.27 | -0.27±0.03 |
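The entries in the table have the form mean±spread. Assuming the spread is a sample standard deviation over repeated runs (the text above does not say how it was computed), a cell could be produced as in this sketch. The input values are invented for illustration and are not taken from the runs.

```python
from statistics import mean, stdev


def summarize(values, digits=2):
    """Format a list of per-run values as 'mean±std', rounded to `digits`.

    Assumes a sample standard deviation; a single value gets a spread of 0.
    """
    center = round(float(mean(values)), digits)
    spread = round(float(stdev(values)), digits) if len(values) > 1 else 0.0
    # The 'g' format drops trailing zeros, e.g. 84.0 -> '84'.
    return f"{center:g}±{spread:g}"


# Invented percentage-complete values from three hypothetical runs.
print(summarize([80, 84, 88]))
```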
Submitting Solution to Challenge!!!
Thanks to the efforts of our partners Deutsche Bahn and Instadeep, you can now submit the CCPPO baseline out of the box: https://gitlab.aicrowd.com/GereonVienken/db_flatland_example
This RL method reaches a score of 76.232 on the leaderboard!
You should be able to use the same approach with the other RLlib baselines as well. Make sure to include your best-performing checkpoints in the submission. Thanks to our partners and especially to @GereonVienken who contributed this baseline and submission repository!