Flatland
RLLib Baselines on Colab!
This Colab notebook allows you to train a full Flatland agent using the provided PPO baseline.
We have taken the repo from https://gitlab.aicrowd.com/flatland/neurips2020-flatland-baselines and made it into a simple Colab notebook.
All training scripts are included, so you can modify the configs and launch runs of your own. The notebook also runs evaluation and includes the script that calculates scores on an independent test set.
🚂 NeurIPS 2020 Flatland Challenge - PPO RLlib Baseline
Read the documentation to learn how to make your first submission in 10 minutes: https://flatland.aicrowd.com/getting-started/first-submission.html
📦 Setup
!pip install tensorboard
%%bash
## Setting up Conda Environment
MINICONDA_INSTALLER_SCRIPT=Miniconda3-4.5.4-Linux-x86_64.sh
MINICONDA_PREFIX=/usr/local
wget https://repo.continuum.io/miniconda/$MINICONDA_INSTALLER_SCRIPT
chmod +x $MINICONDA_INSTALLER_SCRIPT
./$MINICONDA_INSTALLER_SCRIPT -b -f -p $MINICONDA_PREFIX
%%bash
# Clone the baselines repository (training scripts, configs and environment files)
git clone https://gitlab.aicrowd.com/flatland/neurips2020-flatland-baselines.git
Cloning into 'neurips2020-flatland-baselines'...
%cd neurips2020-flatland-baselines
/content/neurips2020-flatland-baselines/neurips2020-flatland-baselines
%%bash
conda env create -n flatland-paper -f environment-cpu.yml
%%bash
source activate flatland-paper
conda install -y ipykernel
Solving environment: ...working... done

# All requested packages already installed.

==> WARNING: A newer version of conda exists. <==
  current version: 4.5.4
  latest version: 4.8.5

Please update conda by running

    $ conda update -n base conda
🚂 Training
%%bash
source activate flatland-paper
python train.py --help
- Successfully Loaded Generator Config small_stoch_v1 from small_stoch_v1.yaml
- Successfully Loaded Generator Config 32x32_v0 from 32x32_v0.yaml
- Successfully Loaded Generator Config small_stoch_v0 from small_stoch_v0.yaml
- Successfully Loaded Generator Config medium_stoch_v2 from medium_stoch_v2.yaml
- Successfully Loaded Generator Config large_stoch_v0 from large_stoch_v0.yaml
- Successfully Loaded Generator Config small_v0 from small_v0.yaml
- Successfully Loaded Generator Config small_single_v0 from small_single_v0.yaml
- Successfully Loaded Generator Config small_double_v0 from small_double_v0.yaml
- Successfully Loaded Generator Config adrian_v0 from adrian_v0.yaml
- Successfully Loaded Generator Config small_triple_v0 from small_triple_v0.yaml
- Successfully Loaded Generator Config medium_stoch_v1 from medium_stoch_v1.yaml
- Successfully Loaded Evaluation Config test_render from test_render.yaml
- Successfully Loaded Evaluation Config default from default.yaml
- Successfully Loaded Evaluation Config default_render from default_render.yaml
- Successfully Loaded Evaluation Config enable_explore from enable_explore.yaml
- Successfully Loaded Observation class TreeObs from tree_obs.py
- Successfully Loaded Observation class LocalConflictObs from local_conflict_obs.py
- Successfully Loaded Observation class Utils from utils.py
- Successfully Loaded Observation class CombinedObs from combined_obs.py
- Successfully Loaded Observation class ForwardActionObs from forward_action_obs.py
- Successfully Loaded Observation class NewTreeObs from new_tree_obs.py
- Successfully Loaded Observation class NewTreeObsBuilder from new_tree_obs_builder.py
- Successfully Loaded Observation class RandomActionObs from random_action_obs.py
- Successfully Loaded Observation class ShortestPathObs from shortest_path_obs.py
- Successfully Loaded Observation class GlobalObs from global_obs.py
- Successfully Loaded Observation class ShortestPathActionObs from shortest_path_action_obs.py
- Successfully Loaded Observation class GlobalDensityObs from global_density_obs.py
- Successfully Loaded Environment class FlatlandRandomSparseSmall from flatland_random_sparse_small.py
- Successfully Loaded Environment class FlatlandBase from flatland_base.py
- Successfully Loaded Environment class FlatlandSingle from flatland_single.py
- Successfully Loaded Environment class FlatlandSparse from flatland_sparse.py
- Successfully Loaded Model class CustomLossModel from custom_loss_model.py
- Successfully Loaded Model class GlobalDensObsModel from global_dens_obs_model.py
- Successfully Loaded Model class CcTransformer from cc_transformer.py
- Successfully Loaded Model class CcConcatenate from cc_concatenate.py
- Successfully Loaded Model class FullyConnectedModel from fully_connected_model.py
- Successfully Loaded Model class GlobalObsModel from global_obs_model.py
usage: train.py [-h] [--run RUN] [--stop STOP] [--config CONFIG]
                [--resources-per-trial RESOURCES_PER_TRIAL]
                [--num-samples NUM_SAMPLES]
                [--checkpoint-freq CHECKPOINT_FREQ] [--checkpoint-at-end]
                [--no-sync-on-checkpoint]
                [--keep-checkpoints-num KEEP_CHECKPOINTS_NUM]
                [--checkpoint-score-attr CHECKPOINT_SCORE_ATTR]
                [--export-formats EXPORT_FORMATS]
                [--max-failures MAX_FAILURES] [--scheduler SCHEDULER]
                [--scheduler-config SCHEDULER_CONFIG] [--restore RESTORE]
                [--ray-address RAY_ADDRESS] [--ray-num-cpus RAY_NUM_CPUS]
                [--ray-num-gpus RAY_NUM_GPUS] [--ray-num-nodes RAY_NUM_NODES]
                [--ray-redis-max-memory RAY_REDIS_MAX_MEMORY]
                [--ray-memory RAY_MEMORY]
                [--ray-object-store-memory RAY_OBJECT_STORE_MEMORY]
                [--experiment-name EXPERIMENT_NAME] [--local-dir LOCAL_DIR]
                [--upload-dir UPLOAD_DIR] [-v] [-vv] [--resume] [--torch]
                [--eager] [--trace] [--log-flatland-stats] [-e] [-i] [-r] [-s]
                [--bind-all] [--env ENV] [--queue-trials] [-f CONFIG_FILE]

Train a reinforcement learning agent.

optional arguments:
  -h, --help
      show this help message and exit
  --run RUN
      The algorithm or model to train. This may refer to the name of a built-on algorithm (e.g. RLLib's DQN or PPO), or a user-defined trainable function or class registered in the tune registry.
  --stop STOP
      The stopping criteria, specified in JSON. The keys may be any field returned by 'train()' e.g. '{"time_total_s": 600, "training_iteration": 100000}' to stop after 600 seconds or 100k iterations, whichever is reached first.
  --config CONFIG
      Algorithm-specific configuration (e.g. env, hyperparams), specified in JSON.
  --resources-per-trial RESOURCES_PER_TRIAL
      Override the machine resources to allocate per trial, e.g. '{"cpu": 64, "gpu": 8}'. Note that GPUs will not be assigned unless you specify them here. For RLlib, you probably want to leave this alone and use RLlib configs to control parallelism.
  --num-samples NUM_SAMPLES
      Number of times to repeat each trial.
  --checkpoint-freq CHECKPOINT_FREQ
      How many training iterations between checkpoints. A value of 0 (default) disables checkpointing.
  --checkpoint-at-end
      Whether to checkpoint at the end of the experiment. Default is False.
  --no-sync-on-checkpoint
      Disable sync-down of trial checkpoint, which is enabled by default to guarantee recoverability. If set, checkpoint syncing from worker to driver is asynchronous. Set this only if synchronous checkpointing is too slow and trial restoration failures can be tolerated
  --keep-checkpoints-num KEEP_CHECKPOINTS_NUM
      Number of best checkpoints to keep. Others get deleted. Default (None) keeps all checkpoints.
  --checkpoint-score-attr CHECKPOINT_SCORE_ATTR
      Specifies by which attribute to rank the best checkpoint. Default is increasing order. If attribute starts with min- it will rank attribute in decreasing order. Example: min-validation_loss
  --export-formats EXPORT_FORMATS
      List of formats that exported at the end of the experiment. Default is None. For RLlib, 'checkpoint' and 'model' are supported for TensorFlow policy graphs.
  --max-failures MAX_FAILURES
      Try to recover a trial from its last checkpoint at least this many times. Only applies if checkpointing is enabled.
  --scheduler SCHEDULER
      FIFO (default), MedianStopping, AsyncHyperBand, HyperBand, or HyperOpt.
  --scheduler-config SCHEDULER_CONFIG
      Config options to pass to the scheduler.
  --restore RESTORE
      If specified, restore from this checkpoint.
  --ray-address RAY_ADDRESS
      Connect to an existing Ray cluster at this address instead of starting a new one.
  --ray-num-cpus RAY_NUM_CPUS
      --num-cpus to use if starting a new cluster.
  --ray-num-gpus RAY_NUM_GPUS
      --num-gpus to use if starting a new cluster.
  --ray-num-nodes RAY_NUM_NODES
      Emulate multiple cluster nodes for debugging.
  --ray-redis-max-memory RAY_REDIS_MAX_MEMORY
      --redis-max-memory to use if starting a new cluster.
  --ray-memory RAY_MEMORY
      --memory to use if starting a new cluster.
  --ray-object-store-memory RAY_OBJECT_STORE_MEMORY
      --object-store-memory to use if starting a new cluster.
  --experiment-name EXPERIMENT_NAME
      Name of the subdirectory under `local_dir` to put results in.
  --local-dir LOCAL_DIR
      Local dir to save training results to. Defaults to '/root/ray_results'.
  --upload-dir UPLOAD_DIR
      Optional URI to sync training results to (e.g. s3://bucket).
  -v
      Whether to use INFO level logging.
  -vv
      Whether to use DEBUG level logging.
  --resume
      Whether to attempt to resume previous Tune experiments.
  --torch
      Whether to use PyTorch (instead of tf) as the DL framework.
  --eager
      Whether to attempt to enable TF eager execution.
  --trace
      Whether to attempt to enable tracing for eager mode.
  --log-flatland-stats
      Whether to log additional flatland specfic metrics such as percentage complete or normalized score.
  -e, --eval
      Whether to run evaluation. Default evaluation config is default.yaml to use custom evaluation config set (eval_generator:test_eval) under configs
  -i, --custom-fn
      Whether the experiment uses a custom function for training
      Default custom function is imitation_ppo_train_fn
  -r, --record
      Whether the experiment requires video recording during evaluation
      Default evaluation config is default_render.yaml Can also be done via custom evaluation config set (eval_generator:test_render) under configs
  -s, --save-checkpoint
      Whether the experiment will save the checkpoints to weights and biases
  --bind-all
      Whether to expose on network (binding on all network interfaces).
  --env ENV
      The gym environment to use.
  --queue-trials
      Whether to queue trials when the cluster does not currently have enough resources to launch one. This should be set to True when running on an autoscaling cluster to enable automatic scale-up.
  -f CONFIG_FILE, --config-file CONFIG_FILE
      If specified, use config options from this file. Note that this overrides any trial-specific options set via flags above.

Training example:
    python ./train.py --run DQN --env CartPole-v0 --no-log-flatland-stats

Training with Config:
    python ./train.py -f experiments/flatland_random_sparse_small/global_obs/ppo.yaml

Note that -f overrides all other trial-specific command-line options.
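The help text for --stop says a run ends as soon as any one of the listed criteria is reached. The snippet below is a minimal sketch of that "whichever is reached first" rule, assuming each criterion is met once the reported value is greater than or equal to its threshold. The function name `should_stop` and the sample result dictionaries are hypothetical illustrations, not code taken from train.py or Ray Tune.

```python
import json
from typing import Any, Dict


def should_stop(result: Dict[str, Any], stop: Dict[str, float]) -> bool:
    """Return True once any single stopping criterion has been reached.

    Assumption: a criterion counts as reached when the reported value is
    greater than or equal to its threshold. Keys missing from `result`
    are ignored.
    """
    for key, threshold in stop.items():
        if key in result and result[key] >= threshold:
            return True
    return False


# The JSON string from the --stop help text.
stop = json.loads('{"time_total_s": 600, "training_iteration": 100000}')

# Hypothetical per-iteration results, shaped like the fields named above.
early = {"time_total_s": 64.97, "training_iteration": 1}
late = {"time_total_s": 601.2, "training_iteration": 40}

print(should_stop(early, stop))  # neither threshold reached
print(should_stop(late, stop))   # time limit reached first
```

Because the check is an "any" over the criteria, adding more keys to the JSON can only make a run stop sooner, never later.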
# Set num_workers to match the current system. The number of workers should be at most (number of CPU cores - 1).
# If running evaluation, reduce the workers further by the number of evaluation workers.
!sed 's/num_workers: 2/num_workers: 1/g' experiments/tests/global_obs_ppo.yaml > global_obs_ppo.yaml
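The worker rule in the comments above can be written down as a small calculation. This is a minimal sketch, assuming one core is reserved for the driver process and that a result of 0 is acceptable (RLlib then samples in the driver itself). The helper name `pick_num_workers` is hypothetical and is not part of the baselines repository.

```python
import os


def pick_num_workers(cpu_cores: int, eval_workers: int = 0) -> int:
    """Rollout workers = CPU cores - 1 (driver) - evaluation workers, never below 0."""
    return max(0, cpu_cores - 1 - eval_workers)


# os.cpu_count() can return None on some platforms, so fall back to 1.
cores = os.cpu_count() or 1
print(f"{cores} CPU cores -> num_workers: {pick_num_workers(cores)}")
print(f"with 1 evaluation worker -> num_workers: {pick_num_workers(cores, eval_workers=1)}")
```

On a 2-core machine this gives `num_workers: 1`, which is the value the sed command above writes into global_obs_ppo.yaml.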
%%bash
# Note the argument --bind-all is needed whenever we are running in Google Colab
source activate flatland-paper
# Try to run a small test to see if rllib training is working
python ./train.py -f global_obs_ppo.yaml --bind-all
== Status ==
Memory usage on this node: 1.4/12.7 GiB
Using FIFO scheduling algorithm.
Resources requested: 2/2 CPUs, 0/0 GPUs, 0.0/7.32 GiB heap, 0.0/2.49 GiB objects
Result logdir: /root/ray_results/flatland-sparse-global-conv-ppo
Number of trials: 1 (1 RUNNING)
+---------------------------+----------+-------+
| Trial name                | status   | loc   |
|---------------------------+----------+-------|
| PPO_flatland_sparse_00000 | RUNNING  |       |
+---------------------------+----------+-------+

(pid=10684) 2020-10-10 10:27:52,884 INFO trainer.py:421 -- Tip: set 'eager': true or the --eager flag to enable TensorFlow eager execution
(pid=10684) 2020-10-10 10:27:53,565 WARNING deprecation.py:30 -- DeprecationWarning: `callbacks dict interface` has been deprecated. Use `a class extending rllib.agents.callbacks.DefaultCallbacks` instead. This will raise an error in the future!
(pid=10684) 2020-10-10 10:27:53,565 INFO trainer.py:580 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
(pid=10684) 2020-10-10 10:27:53,566 WARNING deprecation.py:30 -- DeprecationWarning: `callbacks dict interface` has been deprecated. Use `a class extending rllib.agents.callbacks.DefaultCallbacks` instead. This will raise an error in the future!
(pid=10684) /usr/local/envs/flatland-paper/lib/python3.7/site-packages/flatland/envs/rail_generators.py:725: UserWarning: Could not set all required cities!
(pid=10684)   "Could not set all required cities!")
(pid=10684) ==================================================
(pid=10684) {'grid_mode': False,
(pid=10684)  'height': 25,
(pid=10684)  'max_num_cities': 4,
(pid=10684)  'max_rails_between_cities': 2,
(pid=10684)  'max_rails_in_city': 3,
(pid=10684)  'number_of_agents': 5,
(pid=10684)  'regenerate_rail_on_reset': True,
(pid=10684)  'regenerate_schedule_on_reset': True,
(pid=10684)  'seed': 0,
(pid=10684)  'width': 25}
(pid=10684) ==================================================
(pid=10684) /usr/local/envs/flatland-paper/lib/python3.7/site-packages/flatland/envs/rail_generators.py:725: UserWarning: Could not set all required cities!
(pid=10684)   "Could not set all required cities!")
(pid=10683) 2020-10-10 10:28:00,849 WARNING deprecation.py:30 -- DeprecationWarning: `callbacks dict interface` has been deprecated. Use `a class extending rllib.agents.callbacks.DefaultCallbacks` instead. This will raise an error in the future!
(pid=10683) /usr/local/envs/flatland-paper/lib/python3.7/site-packages/flatland/envs/rail_generators.py:725: UserWarning: Could not set all required cities!
(pid=10683)   "Could not set all required cities!")
(pid=10683) /usr/local/envs/flatland-paper/lib/python3.7/site-packages/flatland/envs/rail_generators.py:647: UserWarning: [WARNING] Changing to Grid mode to place at least 2 cities.
(pid=10683)   warnings.warn("[WARNING] Changing to Grid mode to place at least 2 cities.")
(pid=10684) 2020-10-10 10:28:02,186 INFO trainable.py:217 -- Getting current IP.
(pid=10683) /usr/local/envs/flatland-paper/lib/python3.7/site-packages/flatland/envs/rail_generators.py:725: UserWarning: Could not set all required cities!
(pid=10683)   "Could not set all required cities!")
==================================================
Setting up new w&b logger
Experiment tag: 0
Experiment id: cf44a94683f74f69bac106522ff86af0
==================================================
== Status ==
Memory usage on this node: 2.4/12.7 GiB
Using FIFO scheduling algorithm.
Resources requested: 2/2 CPUs, 0/0 GPUs, 0.0/7.32 GiB heap, 0.0/2.49 GiB objects
Result logdir: /root/ray_results/flatland-sparse-global-conv-ppo
Number of trials: 1 (1 RUNNING)
+---------------------------+----------+------------------+--------+------------------+------+----------+
| Trial name                | status   | loc              |   iter |   total time (s) |   ts |   reward |
|---------------------------+----------+------------------+--------+------------------+------+----------|
| PPO_flatland_sparse_00000 | RUNNING  | 172.28.0.2:10684 |      1 |          64.9726 | 1114 |      nan |
+---------------------------+----------+------------------+--------+------------------+------+----------+

== Status ==
Memory usage on this node: 2.4/12.7 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/2 CPUs, 0/0 GPUs, 0.0/7.32 GiB heap, 0.0/2.49 GiB objects
Result logdir: /root/ray_results/flatland-sparse-global-conv-ppo
Number of trials: 1 (1 TERMINATED)
+---------------------------+------------+-------+--------+------------------+------+----------+
| Trial name                | status     | loc   |   iter |   total time (s) |   ts |   reward |
|---------------------------+------------+-------+--------+------------------+------+----------|
| PPO_flatland_sparse_00000 | TERMINATED |       |      1 |          64.9726 | 1114 |      nan |
+---------------------------+------------+-------+--------+------------------+------+----------+

Saving full experiment config: global_obs_ppo.yaml
2020-10-10 10:27:47,573 INFO resource_spec.py:212 -- Starting Ray with 7.32 GiB memory available for workers and up to 3.68 GiB for objects. You can adjust these settings with ray.init(memory=<bytes>, object_store_memory=<bytes>).
2020-10-10 10:27:48,031 INFO services.py:1170 -- View the Ray dashboard at 172.28.0.2:8265
2020-10-10 10:27:48,381 WARNING logger.py:314 -- JsonLogger not provided. The ExperimentAnalysis tool is disabled.
wandb: W&B is a tool that helps track and visualize machine learning experiments
wandb: No credentials found. Run "wandb login" to visualize your metrics
wandb: Tracking run with wandb version 0.9.2
wandb: Wandb version 0.10.5 is available! To upgrade, please run:
wandb: $ pip install wandb --upgrade
wandb: Run data is saved locally in wandb/run-20201010_102907-20la2orq
wandb: Program ended successfully.
wandb: You can sync this run to the cloud by running:
wandb: wandb sync wandb/run-20201010_102907-20la2orq
📈 TensorBoard
%load_ext tensorboard
%tensorboard --logdir ~/ray_results
%%bash
# Now we can run a full training run after changing the number of workers.
# For demonstration purposes we also reduce the total timesteps to 15000.
sed 's/num_workers: 13/num_workers: 1/g' baselines/action_masking_and_skipping/ppo_tree_obs_small_v0.yaml \
| sed 's/num_envs_per_worker: 5/num_envs_per_worker: 2/g' \
| sed 's/timesteps_total: 15000000/timesteps_total: 15000/g' > ppo-tree-obs-small-v0.yaml
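The sed pipeline above only rewrites a line when the old value matches exactly (for example `num_workers: 13`), and it does nothing, silently, if the config ships with a different number. As an alternative, here is a minimal Python sketch that replaces the value of a key regardless of its current value and raises an error if the key is missing. The function name `override_yaml_values` and the sample YAML fragment are hypothetical; the fragment only mimics the three keys touched above and is not the real contents of ppo_tree_obs_small_v0.yaml.

```python
import re
from typing import Dict


def override_yaml_values(text: str, overrides: Dict[str, object]) -> str:
    """Replace the value of each `key: value` line, whatever the old value was.

    Works line by line on plain text, like sed, so comments and layout
    are preserved. Raises KeyError if a key is not found at all.
    """
    for key, value in overrides.items():
        # Match "<indent>key: <anything>" on a single line.
        pattern = rf"^([ \t]*{re.escape(key)}:[ \t]*).*$"
        text, count = re.subn(
            pattern,
            lambda m, v=value: m.group(1) + str(v),
            text,
            flags=re.MULTILINE,
        )
        if count == 0:
            raise KeyError(f"{key!r} not found in config text")
    return text


# Hypothetical fragment containing the three keys edited by the sed pipeline.
sample = """\
config:
    num_workers: 13
    num_envs_per_worker: 5
stop:
    timesteps_total: 15000000
"""

print(override_yaml_values(sample, {
    "num_workers": 1,
    "num_envs_per_worker": 2,
    "timesteps_total": 15000,
}))
```

To apply it to a real file, read the text with `pathlib.Path(...).read_text()`, pass it through the function, and write the result to a new file name so the original config stays untouched, as the sed version does.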
%%bash
# Note the argument --bind-all is needed whenever we are running in Google Colab
source activate flatland-paper
# Run rllib training
python ./train.py -f ppo-tree-obs-small-v0.yaml --bind-all
- Successfully Loaded Generator Config small_stoch_v1 from small_stoch_v1.yaml - Successfully Loaded Generator Config 32x32_v0 from 32x32_v0.yaml - Successfully Loaded Generator Config small_stoch_v0 from small_stoch_v0.yaml - Successfully Loaded Generator Config medium_stoch_v2 from medium_stoch_v2.yaml - Successfully Loaded Generator Config large_stoch_v0 from large_stoch_v0.yaml - Successfully Loaded Generator Config small_v0 from small_v0.yaml - Successfully Loaded Generator Config small_single_v0 from small_single_v0.yaml - Successfully Loaded Generator Config small_double_v0 from small_double_v0.yaml - Successfully Loaded Generator Config adrian_v0 from adrian_v0.yaml - Successfully Loaded Generator Config small_triple_v0 from small_triple_v0.yaml - Successfully Loaded Generator Config medium_stoch_v1 from medium_stoch_v1.yaml - Successfully Loaded Evaluation Config test_render from test_render.yaml - Successfully Loaded Evaluation Config default from default.yaml - Successfully Loaded Evaluation Config default_render from default_render.yaml - Successfully Loaded Evaluation Config enable_explore from enable_explore.yaml - Successfully Loaded Observation class TreeObs from tree_obs.py - Successfully Loaded Observation class LocalConflictObs from local_conflict_obs.py - Successfully Loaded Observation class Utils from utils.py - Successfully Loaded Observation class CombinedObs from combined_obs.py - Successfully Loaded Observation class ForwardActionObs from forward_action_obs.py - Successfully Loaded Observation class NewTreeObs from new_tree_obs.py - Successfully Loaded Observation class NewTreeObsBuilder from new_tree_obs_builder.py - Successfully Loaded Observation class RandomActionObs from random_action_obs.py - Successfully Loaded Observation class ShortestPathObs from shortest_path_obs.py - Successfully Loaded Observation class GlobalObs from global_obs.py - Successfully Loaded Observation class ShortestPathActionObs from shortest_path_action_obs.py - 
Successfully Loaded Observation class GlobalDensityObs from global_density_obs.py - Successfully Loaded Environment class FlatlandRandomSparseSmall from flatland_random_sparse_small.py - Successfully Loaded Environment class FlatlandBase from flatland_base.py - Successfully Loaded Environment class FlatlandSingle from flatland_single.py - Successfully Loaded Environment class FlatlandSparse from flatland_sparse.py - Successfully Loaded Model class CustomLossModel from custom_loss_model.py - Successfully Loaded Model class GlobalDensObsModel from global_dens_obs_model.py - Successfully Loaded Model class CcTransformer from cc_transformer.py - Successfully Loaded Model class CcConcatenate from cc_concatenate.py - Successfully Loaded Model class FullyConnectedModel from fully_connected_model.py - Successfully Loaded Model class GlobalObsModel from global_obs_model.py == Status == Memory usage on this node: 1.4/12.7 GiB Using FIFO scheduling algorithm. Resources requested: 2/2 CPUs, 0/0 GPUs, 0.0/7.32 GiB heap, 0.0/2.54 GiB objects Result logdir: /root/ray_results/ppo-tree-obs-small-v0 Number of trials: 3 (2 PENDING, 1 RUNNING) +---------------------------+----------+-------+ | Trial name | status | loc | |---------------------------+----------+-------| | PPO_flatland_sparse_00000 | RUNNING | | | PPO_flatland_sparse_00001 | PENDING | | | PPO_flatland_sparse_00002 | PENDING | | +---------------------------+----------+-------+ (pid=11266) - Successfully Loaded Generator Config small_stoch_v1 from small_stoch_v1.yaml (pid=11266) 2020-10-10 10:49:14,547 INFO trainer.py:421 -- Tip: set 'eager': true or the --eager flag to enable TensorFlow eager execution (pid=11266) - Successfully Loaded Generator Config 32x32_v0 from 32x32_v0.yaml (pid=11266) - Successfully Loaded Generator Config small_stoch_v0 from small_stoch_v0.yaml (pid=11266) - Successfully Loaded Generator Config medium_stoch_v2 from medium_stoch_v2.yaml (pid=11266) - Successfully Loaded Generator Config 
large_stoch_v0 from large_stoch_v0.yaml (pid=11266) - Successfully Loaded Generator Config small_v0 from small_v0.yaml (pid=11266) - Successfully Loaded Generator Config small_single_v0 from small_single_v0.yaml (pid=11266) - Successfully Loaded Generator Config small_double_v0 from small_double_v0.yaml (pid=11266) - Successfully Loaded Generator Config adrian_v0 from adrian_v0.yaml (pid=11266) - Successfully Loaded Generator Config small_triple_v0 from small_triple_v0.yaml (pid=11266) - Successfully Loaded Generator Config medium_stoch_v1 from medium_stoch_v1.yaml (pid=11266) - Successfully Loaded Evaluation Config test_render from test_render.yaml (pid=11266) - Successfully Loaded Evaluation Config default from default.yaml (pid=11266) - Successfully Loaded Evaluation Config default_render from default_render.yaml (pid=11266) - Successfully Loaded Evaluation Config enable_explore from enable_explore.yaml (pid=11266) - Successfully Loaded Observation class TreeObs from tree_obs.py (pid=11266) - Successfully Loaded Observation class LocalConflictObs from local_conflict_obs.py (pid=11266) - Successfully Loaded Observation class Utils from utils.py (pid=11266) - Successfully Loaded Observation class CombinedObs from combined_obs.py (pid=11266) - Successfully Loaded Observation class ForwardActionObs from forward_action_obs.py (pid=11266) - Successfully Loaded Observation class NewTreeObs from new_tree_obs.py (pid=11266) - Successfully Loaded Observation class NewTreeObsBuilder from new_tree_obs_builder.py (pid=11266) - Successfully Loaded Observation class RandomActionObs from random_action_obs.py (pid=11266) - Successfully Loaded Observation class ShortestPathObs from shortest_path_obs.py (pid=11266) - Successfully Loaded Observation class GlobalObs from global_obs.py (pid=11266) - Successfully Loaded Observation class ShortestPathActionObs from shortest_path_action_obs.py (pid=11266) - Successfully Loaded Observation class GlobalDensityObs from 
global_density_obs.py (pid=11266) ================================================== (pid=11266) {'grid_mode': False, (pid=11266) 'height': 25, (pid=11266) 'max_num_cities': 4, (pid=11266) 'max_rails_between_cities': 2, (pid=11266) 'max_rails_in_city': 3, (pid=11266) 'number_of_agents': 5, (pid=11266) 'regenerate_rail_on_reset': True, (pid=11266) 'regenerate_schedule_on_reset': True, (pid=11266) 'seed': 0, (pid=11266) 'width': 25} (pid=11266) ================================================== (pid=11266) 2020-10-10 10:49:15,185 WARNING deprecation.py:30 -- DeprecationWarning: `callbacks dict interface` has been deprecated. Use `a class extending rllib.agents.callbacks.DefaultCallbacks` instead. This will raise an error in the future! (pid=11266) 2020-10-10 10:49:15,185 INFO trainer.py:580 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags. (pid=11266) 2020-10-10 10:49:15,187 WARNING deprecation.py:30 -- DeprecationWarning: `callbacks dict interface` has been deprecated. Use `a class extending rllib.agents.callbacks.DefaultCallbacks` instead. This will raise an error in the future! (pid=11266) /usr/local/envs/flatland-paper/lib/python3.7/site-packages/flatland/envs/rail_generators.py:725: UserWarning: Could not set all required cities! (pid=11266) "Could not set all required cities!") (pid=11266) 2020-10-10 10:49:19,670 INFO trainable.py:217 -- Getting current IP. 
(pid=11268) - Successfully Loaded Generator Config small_stoch_v1 from small_stoch_v1.yaml
(pid=11268) - Successfully Loaded Generator Config 32x32_v0 from 32x32_v0.yaml
(pid=11268) - Successfully Loaded Generator Config small_stoch_v0 from small_stoch_v0.yaml
(pid=11268) - Successfully Loaded Generator Config medium_stoch_v2 from medium_stoch_v2.yaml
(pid=11268) - Successfully Loaded Generator Config large_stoch_v0 from large_stoch_v0.yaml
(pid=11268) - Successfully Loaded Generator Config small_v0 from small_v0.yaml
(pid=11268) - Successfully Loaded Generator Config small_single_v0 from small_single_v0.yaml
(pid=11268) - Successfully Loaded Generator Config small_double_v0 from small_double_v0.yaml
(pid=11268) - Successfully Loaded Generator Config adrian_v0 from adrian_v0.yaml
(pid=11268) - Successfully Loaded Generator Config small_triple_v0 from small_triple_v0.yaml
(pid=11268) - Successfully Loaded Generator Config medium_stoch_v1 from medium_stoch_v1.yaml
(pid=11268) - Successfully Loaded Evaluation Config test_render from test_render.yaml
(pid=11268) - Successfully Loaded Evaluation Config default from default.yaml
(pid=11268) - Successfully Loaded Evaluation Config default_render from default_render.yaml
(pid=11268) - Successfully Loaded Evaluation Config enable_explore from enable_explore.yaml
(pid=11268) 2020-10-10 10:49:20,914 WARNING deprecation.py:30 -- DeprecationWarning: `callbacks dict interface` has been deprecated. Use `a class extending rllib.agents.callbacks.DefaultCallbacks` instead. This will raise an error in the future!
(pid=11268) /usr/local/envs/flatland-paper/lib/python3.7/site-packages/flatland/envs/rail_generators.py:725: UserWarning: Could not set all required cities!
(pid=11268)   "Could not set all required cities!")
(pid=11268) /usr/local/envs/flatland-paper/lib/python3.7/site-packages/flatland/envs/rail_generators.py:647: UserWarning: [WARNING] Changing to Grid mode to place at least 2 cities.
(pid=11268)   warnings.warn("[WARNING] Changing to Grid mode to place at least 2 cities.")
(pid=11268) - Successfully Loaded Observation class TreeObs from tree_obs.py
(pid=11268) - Successfully Loaded Observation class LocalConflictObs from local_conflict_obs.py
(pid=11268) - Successfully Loaded Observation class Utils from utils.py
(pid=11268) - Successfully Loaded Observation class CombinedObs from combined_obs.py
(pid=11268) - Successfully Loaded Observation class ForwardActionObs from forward_action_obs.py
(pid=11268) - Successfully Loaded Observation class NewTreeObs from new_tree_obs.py
(pid=11268) - Successfully Loaded Observation class NewTreeObsBuilder from new_tree_obs_builder.py
(pid=11268) - Successfully Loaded Observation class RandomActionObs from random_action_obs.py
(pid=11268) - Successfully Loaded Observation class ShortestPathObs from shortest_path_obs.py
(pid=11268) - Successfully Loaded Observation class GlobalObs from global_obs.py
(pid=11268) - Successfully Loaded Observation class ShortestPathActionObs from shortest_path_action_obs.py
(pid=11268) - Successfully Loaded Observation class GlobalDensityObs from global_density_obs.py
(pid=11268) /usr/local/envs/flatland-paper/lib/python3.7/site-packages/flatland/envs/rail_generators.py:725: UserWarning: Could not set all required cities!
(pid=11268)   "Could not set all required cities!")
==================================================
Setting up new w&b logger
Experiment tag: 0
Experiment id: 3f9c63a2f09e4e9dbbef0efcac0dca05
==================================================
== Status ==
Memory usage on this node: 1.9/12.7 GiB
Using FIFO scheduling algorithm.
Resources requested: 2/2 CPUs, 0/0 GPUs, 0.0/7.32 GiB heap, 0.0/2.54 GiB objects
Result logdir: /root/ray_results/ppo-tree-obs-small-v0
Number of trials: 3 (2 PENDING, 1 RUNNING)
+---------------------------+----------+------------------+--------+------------------+------+----------+
| Trial name                | status   | loc              |   iter |   total time (s) |   ts |   reward |
|---------------------------+----------+------------------+--------+------------------+------+----------|
| PPO_flatland_sparse_00000 | RUNNING  | 172.28.0.2:11266 |      1 |          14.1453 | 1003 |      nan |
| PPO_flatland_sparse_00001 | PENDING  |                  |        |                  |      |          |
| PPO_flatland_sparse_00002 | PENDING  |                  |        |                  |      |          |
+---------------------------+----------+------------------+--------+------------------+------+----------+
== Status ==
Memory usage on this node: 2.0/12.7 GiB
Using FIFO scheduling algorithm.
Resources requested: 2/2 CPUs, 0/0 GPUs, 0.0/7.32 GiB heap, 0.0/2.54 GiB objects
Result logdir: /root/ray_results/ppo-tree-obs-small-v0
Number of trials: 3 (2 PENDING, 1 RUNNING)
+---------------------------+----------+------------------+--------+------------------+------+----------+
| Trial name                | status   | loc              |   iter |   total time (s) |   ts |   reward |
|---------------------------+----------+------------------+--------+------------------+------+----------|
| PPO_flatland_sparse_00000 | RUNNING  | 172.28.0.2:11266 |      2 |          27.1881 | 2139 |      nan |
| PPO_flatland_sparse_00001 | PENDING  |                  |        |                  |      |          |
| PPO_flatland_sparse_00002 | PENDING  |                  |        |                  |      |          |
+---------------------------+----------+------------------+--------+------------------+------+----------+
== Status ==
Memory usage on this node: 2.0/12.7 GiB
Using FIFO scheduling algorithm.
Resources requested: 2/2 CPUs, 0/0 GPUs, 0.0/7.32 GiB heap, 0.0/2.54 GiB objects
Result logdir: /root/ray_results/ppo-tree-obs-small-v0
Number of trials: 3 (2 PENDING, 1 RUNNING)
+---------------------------+----------+------------------+--------+------------------+------+----------+
| Trial name                | status   | loc              |   iter |   total time (s) |   ts |   reward |
|---------------------------+----------+------------------+--------+------------------+------+----------|
| PPO_flatland_sparse_00000 | RUNNING  | 172.28.0.2:11266 |      3 |          38.2044 | 3336 |  -1263.5 |
| PPO_flatland_sparse_00001 | PENDING  |                  |        |                  |      |          |
| PPO_flatland_sparse_00002 | PENDING  |                  |        |                  |      |          |
+---------------------------+----------+------------------+--------+------------------+------+----------+
== Status ==
Memory usage on this node: 2.0/12.7 GiB
Using FIFO scheduling algorithm.
Resources requested: 2/2 CPUs, 0/0 GPUs, 0.0/7.32 GiB heap, 0.0/2.54 GiB objects
Result logdir: /root/ray_results/ppo-tree-obs-small-v0
Number of trials: 3 (2 PENDING, 1 RUNNING)
+---------------------------+----------+------------------+--------+------------------+------+----------+
| Trial name                | status   | loc              |   iter |   total time (s) |   ts |   reward |
|---------------------------+----------+------------------+--------+------------------+------+----------|
| PPO_flatland_sparse_00000 | RUNNING  | 172.28.0.2:11266 |      4 |          49.9066 | 4436 |  -1263.5 |
| PPO_flatland_sparse_00001 | PENDING  |                  |        |                  |      |          |
| PPO_flatland_sparse_00002 | PENDING  |                  |        |                  |      |          |
+---------------------------+----------+------------------+--------+------------------+------+----------+
== Status ==
Memory usage on this node: 2.0/12.7 GiB
Using FIFO scheduling algorithm.
Resources requested: 2/2 CPUs, 0/0 GPUs, 0.0/7.32 GiB heap, 0.0/2.54 GiB objects
Result logdir: /root/ray_results/ppo-tree-obs-small-v0
Number of trials: 3 (2 PENDING, 1 RUNNING)
+---------------------------+----------+------------------+--------+------------------+------+----------+
| Trial name                | status   | loc              |   iter |   total time (s) |   ts |   reward |
|---------------------------+----------+------------------+--------+------------------+------+----------|
| PPO_flatland_sparse_00000 | RUNNING  | 172.28.0.2:11266 |      5 |          58.5835 | 5586 |  -1263.5 |
| PPO_flatland_sparse_00001 | PENDING  |                  |        |                  |      |          |
| PPO_flatland_sparse_00002 | PENDING  |                  |        |                  |      |          |
+---------------------------+----------+------------------+--------+------------------+------+----------+
(pid=11268) /usr/local/envs/flatland-paper/lib/python3.7/site-packages/flatland/envs/rail_generators.py:647: UserWarning: [WARNING] Changing to Grid mode to place at least 2 cities.
(pid=11268)   warnings.warn("[WARNING] Changing to Grid mode to place at least 2 cities.")
== Status ==
Memory usage on this node: 2.0/12.7 GiB
Using FIFO scheduling algorithm.
Resources requested: 2/2 CPUs, 0/0 GPUs, 0.0/7.32 GiB heap, 0.0/2.54 GiB objects
Result logdir: /root/ray_results/ppo-tree-obs-small-v0
Number of trials: 3 (2 PENDING, 1 RUNNING)
+---------------------------+----------+------------------+--------+------------------+------+----------+
| Trial name                | status   | loc              |   iter |   total time (s) |   ts |   reward |
|---------------------------+----------+------------------+--------+------------------+------+----------|
| PPO_flatland_sparse_00000 | RUNNING  | 172.28.0.2:11266 |      6 |          70.4385 | 6753 |    -1573 |
| PPO_flatland_sparse_00001 | PENDING  |                  |        |                  |      |          |
| PPO_flatland_sparse_00002 | PENDING  |                  |        |                  |      |          |
+---------------------------+----------+------------------+--------+------------------+------+----------+
== Status ==
Memory usage on this node: 2.0/12.7 GiB
Using FIFO scheduling algorithm.
Resources requested: 2/2 CPUs, 0/0 GPUs, 0.0/7.32 GiB heap, 0.0/2.54 GiB objects
Result logdir: /root/ray_results/ppo-tree-obs-small-v0
Number of trials: 3 (2 PENDING, 1 RUNNING)
+---------------------------+----------+------------------+--------+------------------+------+----------+
| Trial name                | status   | loc              |   iter |   total time (s) |   ts |   reward |
|---------------------------+----------+------------------+--------+------------------+------+----------|
| PPO_flatland_sparse_00000 | RUNNING  | 172.28.0.2:11266 |      7 |          77.9675 | 7812 |    -1573 |
| PPO_flatland_sparse_00001 | PENDING  |                  |        |                  |      |          |
| PPO_flatland_sparse_00002 | PENDING  |                  |        |                  |      |          |
+---------------------------+----------+------------------+--------+------------------+------+----------+
== Status ==
Memory usage on this node: 2.0/12.7 GiB
Using FIFO scheduling algorithm.
Resources requested: 2/2 CPUs, 0/0 GPUs, 0.0/7.32 GiB heap, 0.0/2.54 GiB objects
Result logdir: /root/ray_results/ppo-tree-obs-small-v0
Number of trials: 3 (2 PENDING, 1 RUNNING)
+---------------------------+----------+------------------+--------+------------------+------+----------+
| Trial name                | status   | loc              |   iter |   total time (s) |   ts |   reward |
|---------------------------+----------+------------------+--------+------------------+------+----------|
| PPO_flatland_sparse_00000 | RUNNING  | 172.28.0.2:11266 |      8 |          88.1596 | 8895 |    -1573 |
| PPO_flatland_sparse_00001 | PENDING  |                  |        |                  |      |          |
| PPO_flatland_sparse_00002 | PENDING  |                  |        |                  |      |          |
+---------------------------+----------+------------------+--------+------------------+------+----------+
Saving full experiment config: ppo-tree-obs-small-v0.yaml
== Status ==
Memory usage on this node: 2.0/12.7 GiB
Using FIFO scheduling algorithm.
Resources requested: 2/2 CPUs, 0/0 GPUs, 0.0/7.32 GiB heap, 0.0/2.54 GiB objects
Result logdir: /root/ray_results/ppo-tree-obs-small-v0
Number of trials: 3 (2 PENDING, 1 RUNNING)
+---------------------------+----------+------------------+--------+------------------+-------+----------+
| Trial name                | status   | loc              |   iter |   total time (s) |    ts |   reward |
|---------------------------+----------+------------------+--------+------------------+-------+----------|
| PPO_flatland_sparse_00000 | RUNNING  | 172.28.0.2:11266 |      9 |          98.9992 | 10120 | -1586.83 |
| PPO_flatland_sparse_00001 | PENDING  |                  |        |                  |       |          |
| PPO_flatland_sparse_00002 | PENDING  |                  |        |                  |       |          |
+---------------------------+----------+------------------+--------+------------------+-------+----------+
== Status ==
Memory usage on this node: 2.0/12.7 GiB
Using FIFO scheduling algorithm.
Resources requested: 2/2 CPUs, 0/0 GPUs, 0.0/7.32 GiB heap, 0.0/2.54 GiB objects
Result logdir: /root/ray_results/ppo-tree-obs-small-v0
Number of trials: 3 (2 PENDING, 1 RUNNING)
+---------------------------+----------+------------------+--------+------------------+-------+----------+
| Trial name                | status   | loc              |   iter |   total time (s) |    ts |   reward |
|---------------------------+----------+------------------+--------+------------------+-------+----------|
| PPO_flatland_sparse_00000 | RUNNING  | 172.28.0.2:11266 |     10 |          106.389 | 11120 | -1586.83 |
| PPO_flatland_sparse_00001 | PENDING  |                  |        |                  |       |          |
| PPO_flatland_sparse_00002 | PENDING  |                  |        |                  |       |          |
+---------------------------+----------+------------------+--------+------------------+-------+----------+
== Status ==
Memory usage on this node: 2.0/12.7 GiB
Using FIFO scheduling algorithm.
Resources requested: 2/2 CPUs, 0/0 GPUs, 0.0/7.32 GiB heap, 0.0/2.54 GiB objects
Result logdir: /root/ray_results/ppo-tree-obs-small-v0
Number of trials: 3 (2 PENDING, 1 RUNNING)
+---------------------------+----------+------------------+--------+------------------+-------+----------+
| Trial name                | status   | loc              |   iter |   total time (s) |    ts |   reward |
|---------------------------+----------+------------------+--------+------------------+-------+----------|
| PPO_flatland_sparse_00000 | RUNNING  | 172.28.0.2:11266 |     11 |          113.736 | 12120 | -1586.83 |
| PPO_flatland_sparse_00001 | PENDING  |                  |        |                  |       |          |
| PPO_flatland_sparse_00002 | PENDING  |                  |        |                  |       |          |
+---------------------------+----------+------------------+--------+------------------+-------+----------+
== Status ==
Memory usage on this node: 2.0/12.7 GiB
Using FIFO scheduling algorithm.
Resources requested: 2/2 CPUs, 0/0 GPUs, 0.0/7.32 GiB heap, 0.0/2.54 GiB objects
Result logdir: /root/ray_results/ppo-tree-obs-small-v0
Number of trials: 3 (2 PENDING, 1 RUNNING)
+---------------------------+----------+------------------+--------+------------------+-------+----------+
| Trial name                | status   | loc              |   iter |   total time (s) |    ts |   reward |
|---------------------------+----------+------------------+--------+------------------+-------+----------|
| PPO_flatland_sparse_00000 | RUNNING  | 172.28.0.2:11266 |     12 |          121.131 | 13120 | -1586.83 |
| PPO_flatland_sparse_00001 | PENDING  |                  |        |                  |       |          |
| PPO_flatland_sparse_00002 | PENDING  |                  |        |                  |       |          |
+---------------------------+----------+------------------+--------+------------------+-------+----------+
== Status ==
Memory usage on this node: 2.0/12.7 GiB
Using FIFO scheduling algorithm.
Resources requested: 2/2 CPUs, 0/0 GPUs, 0.0/7.32 GiB heap, 0.0/2.54 GiB objects
Result logdir: /root/ray_results/ppo-tree-obs-small-v0
Number of trials: 3 (2 PENDING, 1 RUNNING)
+---------------------------+----------+------------------+--------+------------------+-------+----------+
| Trial name                | status   | loc              |   iter |   total time (s) |    ts |   reward |
|---------------------------+----------+------------------+--------+------------------+-------+----------|
| PPO_flatland_sparse_00000 | RUNNING  | 172.28.0.2:11266 |     13 |          133.423 | 14322 | -1710.75 |
| PPO_flatland_sparse_00001 | PENDING  |                  |        |                  |       |          |
| PPO_flatland_sparse_00002 | PENDING  |                  |        |                  |       |          |
+---------------------------+----------+------------------+--------+------------------+-------+----------+
== Status ==
Memory usage on this node: 2.0/12.7 GiB
Using FIFO scheduling algorithm.
Resources requested: 2/2 CPUs, 0/0 GPUs, 0.0/7.32 GiB heap, 0.0/2.54 GiB objects
Result logdir: /root/ray_results/ppo-tree-obs-small-v0
Number of trials: 3 (2 PENDING, 1 RUNNING)
+---------------------------+----------+------------------+--------+------------------+-------+----------+
| Trial name                | status   | loc              |   iter |   total time (s) |    ts |   reward |
|---------------------------+----------+------------------+--------+------------------+-------+----------|
| PPO_flatland_sparse_00000 | RUNNING  | 172.28.0.2:11266 |     14 |          145.959 | 15429 |    -1571 |
| PPO_flatland_sparse_00001 | PENDING  |                  |        |                  |       |          |
| PPO_flatland_sparse_00002 | PENDING  |                  |        |                  |       |          |
+---------------------------+----------+------------------+--------+------------------+-------+----------+
(pid=11622) 2020-10-10 10:51:51,815 INFO trainer.py:421 -- Tip: set 'eager': true or the --eager flag to enable TensorFlow eager execution
(pid=11622) - Successfully Loaded Generator Config small_stoch_v1 from small_stoch_v1.yaml
(pid=11622) - Successfully Loaded Generator Config 32x32_v0 from 32x32_v0.yaml
(pid=11622) - Successfully Loaded Generator Config small_stoch_v0 from small_stoch_v0.yaml
(pid=11622) - Successfully Loaded Generator Config medium_stoch_v2 from medium_stoch_v2.yaml
(pid=11622) - Successfully Loaded Generator Config large_stoch_v0 from large_stoch_v0.yaml
(pid=11622) - Successfully Loaded Generator Config small_v0 from small_v0.yaml
(pid=11622) - Successfully Loaded Generator Config small_single_v0 from small_single_v0.yaml
(pid=11622) - Successfully Loaded Generator Config small_double_v0 from small_double_v0.yaml
(pid=11622) - Successfully Loaded Generator Config adrian_v0 from adrian_v0.yaml
(pid=11622) - Successfully Loaded Generator Config small_triple_v0 from small_triple_v0.yaml
(pid=11622) - Successfully Loaded Generator Config medium_stoch_v1 from medium_stoch_v1.yaml
(pid=11622) - Successfully Loaded Evaluation Config test_render from test_render.yaml
(pid=11622) - Successfully Loaded Evaluation Config default from default.yaml
(pid=11622) - Successfully Loaded Evaluation Config default_render from default_render.yaml
(pid=11622) - Successfully Loaded Evaluation Config enable_explore from enable_explore.yaml
(pid=11622) - Successfully Loaded Observation class TreeObs from tree_obs.py
(pid=11622) - Successfully Loaded Observation class LocalConflictObs from local_conflict_obs.py
(pid=11622) - Successfully Loaded Observation class Utils from utils.py
(pid=11622) - Successfully Loaded Observation class CombinedObs from combined_obs.py
(pid=11622) - Successfully Loaded Observation class ForwardActionObs from forward_action_obs.py
(pid=11622) - Successfully Loaded Observation class NewTreeObs from new_tree_obs.py
(pid=11622) - Successfully Loaded Observation class NewTreeObsBuilder from new_tree_obs_builder.py
(pid=11622) - Successfully Loaded Observation class RandomActionObs from random_action_obs.py
(pid=11622) - Successfully Loaded Observation class ShortestPathObs from shortest_path_obs.py
(pid=11622) - Successfully Loaded Observation class GlobalObs from global_obs.py
(pid=11622) - Successfully Loaded Observation class ShortestPathActionObs from shortest_path_action_obs.py
(pid=11622) - Successfully Loaded Observation class GlobalDensityObs from global_density_obs.py
(pid=11622) ==================================================
(pid=11622) {'grid_mode': False,
(pid=11622)  'height': 25,
(pid=11622)  'max_num_cities': 4,
(pid=11622)  'max_rails_between_cities': 2,
(pid=11622)  'max_rails_in_city': 3,
(pid=11622)  'number_of_agents': 5,
(pid=11622)  'regenerate_rail_on_reset': True,
(pid=11622)  'regenerate_schedule_on_reset': True,
(pid=11622)  'seed': 0,
(pid=11622)  'width': 25}
(pid=11622) ==================================================
(pid=11622) 2020-10-10 10:51:52,803 WARNING deprecation.py:30 -- DeprecationWarning: `callbacks dict interface` has been deprecated. Use `a class extending rllib.agents.callbacks.DefaultCallbacks` instead. This will raise an error in the future!
(pid=11622) 2020-10-10 10:51:52,803 INFO trainer.py:580 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
(pid=11622) 2020-10-10 10:51:52,805 WARNING deprecation.py:30 -- DeprecationWarning: `callbacks dict interface` has been deprecated. Use `a class extending rllib.agents.callbacks.DefaultCallbacks` instead. This will raise an error in the future!
(pid=11622) /usr/local/envs/flatland-paper/lib/python3.7/site-packages/flatland/envs/rail_generators.py:725: UserWarning: Could not set all required cities!
(pid=11622)   "Could not set all required cities!")
(pid=11622) /usr/local/envs/flatland-paper/lib/python3.7/site-packages/flatland/envs/rail_generators.py:725: UserWarning: Could not set all required cities!
(pid=11622)   "Could not set all required cities!")
(pid=11622) 2020-10-10 10:51:59,967 INFO trainable.py:217 -- Getting current IP.
(pid=11664) - Successfully Loaded Generator Config small_stoch_v1 from small_stoch_v1.yaml
(pid=11664) - Successfully Loaded Generator Config 32x32_v0 from 32x32_v0.yaml
(pid=11664) - Successfully Loaded Generator Config small_stoch_v0 from small_stoch_v0.yaml
(pid=11664) - Successfully Loaded Generator Config medium_stoch_v2 from medium_stoch_v2.yaml
(pid=11664) - Successfully Loaded Generator Config large_stoch_v0 from large_stoch_v0.yaml
(pid=11664) - Successfully Loaded Generator Config small_v0 from small_v0.yaml
(pid=11664) - Successfully Loaded Generator Config small_single_v0 from small_single_v0.yaml
(pid=11664) - Successfully Loaded Generator Config small_double_v0 from small_double_v0.yaml
(pid=11664) - Successfully Loaded Generator Config adrian_v0 from adrian_v0.yaml
(pid=11664) - Successfully Loaded Generator Config small_triple_v0 from small_triple_v0.yaml
(pid=11664) - Successfully Loaded Generator Config medium_stoch_v1 from medium_stoch_v1.yaml
(pid=11664) - Successfully Loaded Evaluation Config test_render from test_render.yaml
(pid=11664) - Successfully Loaded Evaluation Config default from default.yaml
(pid=11664) - Successfully Loaded Evaluation Config default_render from default_render.yaml
(pid=11664) - Successfully Loaded Evaluation Config enable_explore from enable_explore.yaml
(pid=11664) - Successfully Loaded Observation class TreeObs from tree_obs.py
(pid=11664) - Successfully Loaded Observation class LocalConflictObs from local_conflict_obs.py
(pid=11664) - Successfully Loaded Observation class Utils from utils.py
(pid=11664) - Successfully Loaded Observation class CombinedObs from combined_obs.py
(pid=11664) - Successfully Loaded Observation class ForwardActionObs from forward_action_obs.py
(pid=11664) - Successfully Loaded Observation class NewTreeObs from new_tree_obs.py
(pid=11664) - Successfully Loaded Observation class NewTreeObsBuilder from new_tree_obs_builder.py
(pid=11664) - Successfully Loaded Observation class RandomActionObs from random_action_obs.py
(pid=11664) - Successfully Loaded Observation class ShortestPathObs from shortest_path_obs.py
(pid=11664) - Successfully Loaded Observation class GlobalObs from global_obs.py
(pid=11664) - Successfully Loaded Observation class ShortestPathActionObs from shortest_path_action_obs.py
(pid=11664) - Successfully Loaded Observation class GlobalDensityObs from global_density_obs.py
(pid=11664) 2020-10-10 10:52:02,606 WARNING deprecation.py:30 -- DeprecationWarning: `callbacks dict interface` has been deprecated. Use `a class extending rllib.agents.callbacks.DefaultCallbacks` instead. This will raise an error in the future!
(pid=11664) /usr/local/envs/flatland-paper/lib/python3.7/site-packages/flatland/envs/rail_generators.py:725: UserWarning: Could not set all required cities!
(pid=11664)   "Could not set all required cities!")
(pid=11664) /usr/local/envs/flatland-paper/lib/python3.7/site-packages/flatland/envs/rail_generators.py:725: UserWarning: Could not set all required cities!
(pid=11664)   "Could not set all required cities!")
==================================================
Setting up new w&b logger
Experiment tag: 1
Experiment id: b81a7c36e77446a4bbfeca416dfe720c
==================================================
== Status ==
Memory usage on this node: 2.0/12.7 GiB
Using FIFO scheduling algorithm.
Resources requested: 2/2 CPUs, 0/0 GPUs, 0.0/7.32 GiB heap, 0.0/2.54 GiB objects
Result logdir: /root/ray_results/ppo-tree-obs-small-v0
Number of trials: 3 (1 PENDING, 1 RUNNING, 1 TERMINATED)
+---------------------------+------------+------------------+--------+------------------+-------+----------+
| Trial name                | status     | loc              |   iter |   total time (s) |    ts |   reward |
|---------------------------+------------+------------------+--------+------------------+-------+----------|
| PPO_flatland_sparse_00000 | TERMINATED |                  |     14 |          145.959 | 15429 |    -1571 |
| PPO_flatland_sparse_00001 | RUNNING    | 172.28.0.2:11622 |      1 |          20.9635 |  1179 |      nan |
| PPO_flatland_sparse_00002 | PENDING    |                  |        |                  |       |          |
+---------------------------+------------+------------------+--------+------------------+-------+----------+
== Status ==
Memory usage on this node: 2.0/12.7 GiB
Using FIFO scheduling algorithm.
Resources requested: 2/2 CPUs, 0/0 GPUs, 0.0/7.32 GiB heap, 0.0/2.54 GiB objects
Result logdir: /root/ray_results/ppo-tree-obs-small-v0
Number of trials: 3 (1 PENDING, 1 RUNNING, 1 TERMINATED)
+---------------------------+------------+------------------+--------+------------------+-------+----------+
| Trial name                | status     | loc              |   iter |   total time (s) |    ts |   reward |
|---------------------------+------------+------------------+--------+------------------+-------+----------|
| PPO_flatland_sparse_00000 | TERMINATED |                  |     14 |          145.959 | 15429 |    -1571 |
| PPO_flatland_sparse_00001 | RUNNING    | 172.28.0.2:11622 |      2 |          32.0034 |  2229 |      nan |
| PPO_flatland_sparse_00002 | PENDING    |                  |        |                  |       |          |
+---------------------------+------------+------------------+--------+------------------+-------+----------+
== Status ==
Memory usage on this node: 2.0/12.7 GiB
Using FIFO scheduling algorithm.
Resources requested: 2/2 CPUs, 0/0 GPUs, 0.0/7.32 GiB heap, 0.0/2.54 GiB objects
Result logdir: /root/ray_results/ppo-tree-obs-small-v0
Number of trials: 3 (1 PENDING, 1 RUNNING, 1 TERMINATED)
+---------------------------+------------+------------------+--------+------------------+-------+----------+
| Trial name                | status     | loc              |   iter |   total time (s) |    ts |   reward |
|---------------------------+------------+------------------+--------+------------------+-------+----------|
| PPO_flatland_sparse_00000 | TERMINATED |                  |     14 |          145.959 | 15429 |    -1571 |
| PPO_flatland_sparse_00001 | RUNNING    | 172.28.0.2:11622 |      3 |          42.8717 |  3390 |  -1508.5 |
| PPO_flatland_sparse_00002 | PENDING    |                  |        |                  |       |          |
+---------------------------+------------+------------------+--------+------------------+-------+----------+
== Status ==
Memory usage on this node: 2.0/12.7 GiB
Using FIFO scheduling algorithm.
Resources requested: 2/2 CPUs, 0/0 GPUs, 0.0/7.32 GiB heap, 0.0/2.54 GiB objects
Result logdir: /root/ray_results/ppo-tree-obs-small-v0
Number of trials: 3 (1 PENDING, 1 RUNNING, 1 TERMINATED)
+---------------------------+------------+------------------+--------+------------------+-------+----------+
| Trial name                | status     | loc              |   iter |   total time (s) |    ts |   reward |
|---------------------------+------------+------------------+--------+------------------+-------+----------|
| PPO_flatland_sparse_00000 | TERMINATED |                  |     14 |          145.959 | 15429 |    -1571 |
| PPO_flatland_sparse_00001 | RUNNING    | 172.28.0.2:11622 |      4 |          50.5203 |  4390 |  -1508.5 |
| PPO_flatland_sparse_00002 | PENDING    |                  |        |                  |       |          |
+---------------------------+------------+------------------+--------+------------------+-------+----------+
== Status ==
Memory usage on this node: 2.0/12.7 GiB
Using FIFO scheduling algorithm.
Resources requested: 2/2 CPUs, 0/0 GPUs, 0.0/7.32 GiB heap, 0.0/2.54 GiB objects
Result logdir: /root/ray_results/ppo-tree-obs-small-v0
Number of trials: 3 (1 PENDING, 1 RUNNING, 1 TERMINATED)
+---------------------------+------------+------------------+--------+------------------+-------+----------+
| Trial name                | status     | loc              |   iter |   total time (s) |    ts |   reward |
|---------------------------+------------+------------------+--------+------------------+-------+----------|
| PPO_flatland_sparse_00000 | TERMINATED |                  |     14 |          145.959 | 15429 |    -1571 |
| PPO_flatland_sparse_00001 | RUNNING    | 172.28.0.2:11622 |      5 |          58.3617 |  5390 |  -1508.5 |
| PPO_flatland_sparse_00002 | PENDING    |                  |        |                  |       |          |
+---------------------------+------------+------------------+--------+------------------+-------+----------+
== Status ==
Memory usage on this node: 2.0/12.7 GiB
Using FIFO scheduling algorithm.
Resources requested: 2/2 CPUs, 0/0 GPUs, 0.0/7.32 GiB heap, 0.0/2.54 GiB objects
Result logdir: /root/ray_results/ppo-tree-obs-small-v0
Number of trials: 3 (1 PENDING, 1 RUNNING, 1 TERMINATED)
+---------------------------+------------+------------------+--------+------------------+-------+----------+
| Trial name                | status     | loc              |   iter |   total time (s) |    ts |   reward |
|---------------------------+------------+------------------+--------+------------------+-------+----------|
| PPO_flatland_sparse_00000 | TERMINATED |                  |     14 |          145.959 | 15429 |    -1571 |
| PPO_flatland_sparse_00001 | RUNNING    | 172.28.0.2:11622 |      6 |          66.0808 |  6390 |  -1508.5 |
| PPO_flatland_sparse_00002 | PENDING    |                  |        |                  |       |          |
+---------------------------+------------+------------------+--------+------------------+-------+----------+
== Status ==
Memory usage on this node: 2.0/12.7 GiB
Using FIFO scheduling algorithm.
Resources requested: 2/2 CPUs, 0/0 GPUs, 0.0/7.32 GiB heap, 0.0/2.54 GiB objects
Result logdir: /root/ray_results/ppo-tree-obs-small-v0
Number of trials: 3 (1 PENDING, 1 RUNNING, 1 TERMINATED)
+---------------------------+------------+------------------+--------+------------------+-------+----------+
| Trial name                | status     | loc              |   iter |   total time (s) |    ts |   reward |
|---------------------------+------------+------------------+--------+------------------+-------+----------|
| PPO_flatland_sparse_00000 | TERMINATED |                  |     14 |          145.959 | 15429 |    -1571 |
| PPO_flatland_sparse_00001 | RUNNING    | 172.28.0.2:11622 |      7 |           77.469 |  7538 | -1786.75 |
| PPO_flatland_sparse_00002 | PENDING    |                  |        |                  |       |          |
+---------------------------+------------+------------------+--------+------------------+-------+----------+
== Status ==
Memory usage on this node: 2.0/12.7 GiB
Using FIFO scheduling algorithm.
Resources requested: 2/2 CPUs, 0/0 GPUs, 0.0/7.32 GiB heap, 0.0/2.54 GiB objects
Result logdir: /root/ray_results/ppo-tree-obs-small-v0
Number of trials: 3 (1 PENDING, 1 RUNNING, 1 TERMINATED)
+---------------------------+------------+------------------+--------+------------------+-------+----------+
| Trial name                | status     | loc              |   iter |   total time (s) |    ts |   reward |
|---------------------------+------------+------------------+--------+------------------+-------+----------|
| PPO_flatland_sparse_00000 | TERMINATED |                  |     14 |          145.959 | 15429 |    -1571 |
| PPO_flatland_sparse_00001 | RUNNING    | 172.28.0.2:11622 |      8 |          88.3345 |  8646 | -1786.75 |
| PPO_flatland_sparse_00002 | PENDING    |                  |        |                  |       |          |
+---------------------------+------------+------------------+--------+------------------+-------+----------+
Saving full experiment config: ppo-tree-obs-small-v0.yaml
== Status ==
Memory usage on this node: 2.0/12.7 GiB
Using FIFO scheduling algorithm.
Resources requested: 2/2 CPUs, 0/0 GPUs, 0.0/7.32 GiB heap, 0.0/2.54 GiB objects
Result logdir: /root/ray_results/ppo-tree-obs-small-v0
Number of trials: 3 (1 PENDING, 1 RUNNING, 1 TERMINATED)
+---------------------------+------------+------------------+--------+------------------+-------+----------+
| Trial name                | status     | loc              |   iter |   total time (s) |    ts |   reward |
|---------------------------+------------+------------------+--------+------------------+-------+----------|
| PPO_flatland_sparse_00000 | TERMINATED |                  |     14 |          145.959 | 15429 |    -1571 |
| PPO_flatland_sparse_00001 | RUNNING    | 172.28.0.2:11622 |      9 |          101.621 |  9739 |    -1601 |
| PPO_flatland_sparse_00002 | PENDING    |                  |        |                  |       |          |
+---------------------------+------------+------------------+--------+------------------+-------+----------+
== Status ==
Memory usage on this node: 2.0/12.7 GiB
Using FIFO scheduling algorithm.
Resources requested: 2/2 CPUs, 0/0 GPUs, 0.0/7.32 GiB heap, 0.0/2.54 GiB objects
Result logdir: /root/ray_results/ppo-tree-obs-small-v0
Number of trials: 3 (1 PENDING, 1 RUNNING, 1 TERMINATED)
+---------------------------+------------+------------------+--------+------------------+-------+----------+
| Trial name                | status     | loc              |   iter |   total time (s) |    ts |   reward |
|---------------------------+------------+------------------+--------+------------------+-------+----------|
| PPO_flatland_sparse_00000 | TERMINATED |                  |     14 |          145.959 | 15429 |    -1571 |
| PPO_flatland_sparse_00001 | RUNNING    | 172.28.0.2:11622 |     10 |           111.18 | 10801 |    -1601 |
| PPO_flatland_sparse_00002 | PENDING    |                  |        |                  |       |          |
+---------------------------+------------+------------------+--------+------------------+-------+----------+
== Status ==
Memory usage on this node: 2.0/12.7 GiB
Using FIFO scheduling algorithm.
Resources requested: 2/2 CPUs, 0/0 GPUs, 0.0/7.32 GiB heap, 0.0/2.54 GiB objects Result logdir: /root/ray_results/ppo-tree-obs-small-v0 Number of trials: 3 (1 PENDING, 1 RUNNING, 1 TERMINATED) +---------------------------+------------+------------------+--------+------------------+-------+----------+ | Trial name | status | loc | iter | total time (s) | ts | reward | |---------------------------+------------+------------------+--------+------------------+-------+----------| | PPO_flatland_sparse_00000 | TERMINATED | | 14 | 145.959 | 15429 | -1571 | | PPO_flatland_sparse_00001 | RUNNING | 172.28.0.2:11622 | 11 | 120.736 | 11901 | -1601 | | PPO_flatland_sparse_00002 | PENDING | | | | | | +---------------------------+------------+------------------+--------+------------------+-------+----------+ == Status == Memory usage on this node: 2.0/12.7 GiB Using FIFO scheduling algorithm. Resources requested: 2/2 CPUs, 0/0 GPUs, 0.0/7.32 GiB heap, 0.0/2.54 GiB objects Result logdir: /root/ray_results/ppo-tree-obs-small-v0 Number of trials: 3 (1 PENDING, 1 RUNNING, 1 TERMINATED) +---------------------------+------------+------------------+--------+------------------+-------+----------+ | Trial name | status | loc | iter | total time (s) | ts | reward | |---------------------------+------------+------------------+--------+------------------+-------+----------| | PPO_flatland_sparse_00000 | TERMINATED | | 14 | 145.959 | 15429 | -1571 | | PPO_flatland_sparse_00001 | RUNNING | 172.28.0.2:11622 | 12 | 130.868 | 12992 | -1599.25 | | PPO_flatland_sparse_00002 | PENDING | | | | | | +---------------------------+------------+------------------+--------+------------------+-------+----------+ == Status == Memory usage on this node: 2.0/12.7 GiB Using FIFO scheduling algorithm. 
Resources requested: 2/2 CPUs, 0/0 GPUs, 0.0/7.32 GiB heap, 0.0/2.54 GiB objects Result logdir: /root/ray_results/ppo-tree-obs-small-v0 Number of trials: 3 (1 PENDING, 1 RUNNING, 1 TERMINATED) +---------------------------+------------+------------------+--------+------------------+-------+----------+ | Trial name | status | loc | iter | total time (s) | ts | reward | |---------------------------+------------+------------------+--------+------------------+-------+----------| | PPO_flatland_sparse_00000 | TERMINATED | | 14 | 145.959 | 15429 | -1571 | | PPO_flatland_sparse_00001 | RUNNING | 172.28.0.2:11622 | 13 | 139.139 | 14230 | -1599.25 | | PPO_flatland_sparse_00002 | PENDING | | | | | | +---------------------------+------------+------------------+--------+------------------+-------+----------+ == Status == Memory usage on this node: 2.0/12.7 GiB Using FIFO scheduling algorithm. Resources requested: 2/2 CPUs, 0/0 GPUs, 0.0/7.32 GiB heap, 0.0/2.54 GiB objects Result logdir: /root/ray_results/ppo-tree-obs-small-v0 Number of trials: 3 (1 PENDING, 1 RUNNING, 1 TERMINATED) +---------------------------+------------+------------------+--------+------------------+-------+----------+ | Trial name | status | loc | iter | total time (s) | ts | reward | |---------------------------+------------+------------------+--------+------------------+-------+----------| | PPO_flatland_sparse_00000 | TERMINATED | | 14 | 145.959 | 15429 | -1571 | | PPO_flatland_sparse_00001 | RUNNING | 172.28.0.2:11622 | 14 | 146.76 | 15230 | -1599.25 | | PPO_flatland_sparse_00002 | PENDING | | | | | | +---------------------------+------------+------------------+--------+------------------+-------+----------+ (pid=11915) - Successfully Loaded Generator Config small_stoch_v1 from small_stoch_v1.yaml (pid=11915) - Successfully Loaded Generator Config 32x32_v0 from 32x32_v0.yaml (pid=11915) - Successfully Loaded Generator Config small_stoch_v0 from small_stoch_v0.yaml (pid=11915) - Successfully Loaded 
Generator Config medium_stoch_v2 from medium_stoch_v2.yaml (pid=11915) - Successfully Loaded Generator Config large_stoch_v0 from large_stoch_v0.yaml (pid=11915) - Successfully Loaded Generator Config small_v0 from small_v0.yaml (pid=11915) - Successfully Loaded Generator Config small_single_v0 from small_single_v0.yaml (pid=11915) - Successfully Loaded Generator Config small_double_v0 from small_double_v0.yaml (pid=11915) 2020-10-10 10:54:32,805 INFO trainer.py:421 -- Tip: set 'eager': true or the --eager flag to enable TensorFlow eager execution (pid=11915) - Successfully Loaded Generator Config adrian_v0 from adrian_v0.yaml (pid=11915) - Successfully Loaded Generator Config small_triple_v0 from small_triple_v0.yaml (pid=11915) - Successfully Loaded Generator Config medium_stoch_v1 from medium_stoch_v1.yaml (pid=11915) - Successfully Loaded Evaluation Config test_render from test_render.yaml (pid=11915) - Successfully Loaded Evaluation Config default from default.yaml (pid=11915) - Successfully Loaded Evaluation Config default_render from default_render.yaml (pid=11915) - Successfully Loaded Evaluation Config enable_explore from enable_explore.yaml (pid=11915) - Successfully Loaded Observation class TreeObs from tree_obs.py (pid=11915) - Successfully Loaded Observation class LocalConflictObs from local_conflict_obs.py (pid=11915) - Successfully Loaded Observation class Utils from utils.py (pid=11915) - Successfully Loaded Observation class CombinedObs from combined_obs.py (pid=11915) - Successfully Loaded Observation class ForwardActionObs from forward_action_obs.py (pid=11915) - Successfully Loaded Observation class NewTreeObs from new_tree_obs.py (pid=11915) - Successfully Loaded Observation class NewTreeObsBuilder from new_tree_obs_builder.py (pid=11915) - Successfully Loaded Observation class RandomActionObs from random_action_obs.py (pid=11915) - Successfully Loaded Observation class ShortestPathObs from shortest_path_obs.py (pid=11915) - Successfully Loaded 
Observation class GlobalObs from global_obs.py (pid=11915) - Successfully Loaded Observation class ShortestPathActionObs from shortest_path_action_obs.py (pid=11915) - Successfully Loaded Observation class GlobalDensityObs from global_density_obs.py (pid=11915) ================================================== (pid=11915) {'grid_mode': False, (pid=11915) 'height': 25, (pid=11915) 'max_num_cities': 4, (pid=11915) 'max_rails_between_cities': 2, (pid=11915) 'max_rails_in_city': 3, (pid=11915) 'number_of_agents': 5, (pid=11915) 'regenerate_rail_on_reset': True, (pid=11915) 'regenerate_schedule_on_reset': True, (pid=11915) 'seed': 0, (pid=11915) 'width': 25} (pid=11915) ================================================== (pid=11915) 2020-10-10 10:54:33,804 WARNING deprecation.py:30 -- DeprecationWarning: `callbacks dict interface` has been deprecated. Use `a class extending rllib.agents.callbacks.DefaultCallbacks` instead. This will raise an error in the future! (pid=11915) 2020-10-10 10:54:33,804 INFO trainer.py:580 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags. (pid=11915) 2020-10-10 10:54:33,805 WARNING deprecation.py:30 -- DeprecationWarning: `callbacks dict interface` has been deprecated. Use `a class extending rllib.agents.callbacks.DefaultCallbacks` instead. This will raise an error in the future! (pid=11915) /usr/local/envs/flatland-paper/lib/python3.7/site-packages/flatland/envs/rail_generators.py:725: UserWarning: Could not set all required cities! (pid=11915) "Could not set all required cities!") (pid=11915) 2020-10-10 10:54:41,459 INFO trainable.py:217 -- Getting current IP. 
(pid=11957) - Successfully Loaded Generator Config small_stoch_v1 from small_stoch_v1.yaml (pid=11957) - Successfully Loaded Generator Config 32x32_v0 from 32x32_v0.yaml (pid=11957) - Successfully Loaded Generator Config small_stoch_v0 from small_stoch_v0.yaml (pid=11957) - Successfully Loaded Generator Config medium_stoch_v2 from medium_stoch_v2.yaml (pid=11957) - Successfully Loaded Generator Config large_stoch_v0 from large_stoch_v0.yaml (pid=11957) - Successfully Loaded Generator Config small_v0 from small_v0.yaml (pid=11957) - Successfully Loaded Generator Config small_single_v0 from small_single_v0.yaml (pid=11957) - Successfully Loaded Generator Config small_double_v0 from small_double_v0.yaml (pid=11957) - Successfully Loaded Generator Config adrian_v0 from adrian_v0.yaml (pid=11957) - Successfully Loaded Generator Config small_triple_v0 from small_triple_v0.yaml (pid=11957) - Successfully Loaded Generator Config medium_stoch_v1 from medium_stoch_v1.yaml (pid=11957) - Successfully Loaded Evaluation Config test_render from test_render.yaml (pid=11957) - Successfully Loaded Evaluation Config default from default.yaml (pid=11957) - Successfully Loaded Evaluation Config default_render from default_render.yaml (pid=11957) - Successfully Loaded Evaluation Config enable_explore from enable_explore.yaml (pid=11957) - Successfully Loaded Observation class TreeObs from tree_obs.py (pid=11957) - Successfully Loaded Observation class LocalConflictObs from local_conflict_obs.py (pid=11957) - Successfully Loaded Observation class Utils from utils.py (pid=11957) - Successfully Loaded Observation class CombinedObs from combined_obs.py (pid=11957) - Successfully Loaded Observation class ForwardActionObs from forward_action_obs.py (pid=11957) - Successfully Loaded Observation class NewTreeObs from new_tree_obs.py (pid=11957) - Successfully Loaded Observation class NewTreeObsBuilder from new_tree_obs_builder.py (pid=11957) - Successfully Loaded Observation class 
RandomActionObs from random_action_obs.py (pid=11957) - Successfully Loaded Observation class ShortestPathObs from shortest_path_obs.py (pid=11957) - Successfully Loaded Observation class GlobalObs from global_obs.py (pid=11957) - Successfully Loaded Observation class ShortestPathActionObs from shortest_path_action_obs.py (pid=11957) - Successfully Loaded Observation class GlobalDensityObs from global_density_obs.py (pid=11957) 2020-10-10 10:54:44,221 WARNING deprecation.py:30 -- DeprecationWarning: `callbacks dict interface` has been deprecated. Use `a class extending rllib.agents.callbacks.DefaultCallbacks` instead. This will raise an error in the future! (pid=11957) /usr/local/envs/flatland-paper/lib/python3.7/site-packages/flatland/envs/rail_generators.py:725: UserWarning: Could not set all required cities! (pid=11957) "Could not set all required cities!") (pid=11957) /usr/local/envs/flatland-paper/lib/python3.7/site-packages/flatland/envs/rail_generators.py:725: UserWarning: Could not set all required cities! (pid=11957) "Could not set all required cities!") ================================================== Setting up new w&b logger Experiment tag: 2 Experiment id: 028ad485feb94fd89a88a95fee45c8c4 ================================================== == Status == Memory usage on this node: 2.0/12.7 GiB Using FIFO scheduling algorithm. 
Resources requested: 2/2 CPUs, 0/0 GPUs, 0.0/7.32 GiB heap, 0.0/2.54 GiB objects Result logdir: /root/ray_results/ppo-tree-obs-small-v0 Number of trials: 3 (1 RUNNING, 2 TERMINATED) +---------------------------+------------+------------------+--------+------------------+-------+----------+ | Trial name | status | loc | iter | total time (s) | ts | reward | |---------------------------+------------+------------------+--------+------------------+-------+----------| | PPO_flatland_sparse_00000 | TERMINATED | | 14 | 145.959 | 15429 | -1571 | | PPO_flatland_sparse_00001 | TERMINATED | | 14 | 146.76 | 15230 | -1599.25 | | PPO_flatland_sparse_00002 | RUNNING | 172.28.0.2:11915 | 1 | 23.1527 | 1196 | nan | +---------------------------+------------+------------------+--------+------------------+-------+----------+ == Status == Memory usage on this node: 2.0/12.7 GiB Using FIFO scheduling algorithm. Resources requested: 2/2 CPUs, 0/0 GPUs, 0.0/7.32 GiB heap, 0.0/2.54 GiB objects Result logdir: /root/ray_results/ppo-tree-obs-small-v0 Number of trials: 3 (1 RUNNING, 2 TERMINATED) +---------------------------+------------+------------------+--------+------------------+-------+----------+ | Trial name | status | loc | iter | total time (s) | ts | reward | |---------------------------+------------+------------------+--------+------------------+-------+----------| | PPO_flatland_sparse_00000 | TERMINATED | | 14 | 145.959 | 15429 | -1571 | | PPO_flatland_sparse_00001 | TERMINATED | | 14 | 146.76 | 15230 | -1599.25 | | PPO_flatland_sparse_00002 | RUNNING | 172.28.0.2:11915 | 2 | 36.2727 | 2246 | nan | +---------------------------+------------+------------------+--------+------------------+-------+----------+ (pid=11957) /usr/local/envs/flatland-paper/lib/python3.7/site-packages/flatland/envs/rail_generators.py:647: UserWarning: [WARNING] Changing to Grid mode to place at least 2 cities. 
(pid=11957) warnings.warn("[WARNING] Changing to Grid mode to place at least 2 cities.") == Status == Memory usage on this node: 2.0/12.7 GiB Using FIFO scheduling algorithm. Resources requested: 2/2 CPUs, 0/0 GPUs, 0.0/7.32 GiB heap, 0.0/2.54 GiB objects Result logdir: /root/ray_results/ppo-tree-obs-small-v0 Number of trials: 3 (1 RUNNING, 2 TERMINATED) +---------------------------+------------+------------------+--------+------------------+-------+----------+ | Trial name | status | loc | iter | total time (s) | ts | reward | |---------------------------+------------+------------------+--------+------------------+-------+----------| | PPO_flatland_sparse_00000 | TERMINATED | | 14 | 145.959 | 15429 | -1571 | | PPO_flatland_sparse_00001 | TERMINATED | | 14 | 146.76 | 15230 | -1599.25 | | PPO_flatland_sparse_00002 | RUNNING | 172.28.0.2:11915 | 3 | 48.4435 | 3388 | -1527.5 | +---------------------------+------------+------------------+--------+------------------+-------+----------+ == Status == Memory usage on this node: 2.0/12.7 GiB Using FIFO scheduling algorithm. Resources requested: 2/2 CPUs, 0/0 GPUs, 0.0/7.32 GiB heap, 0.0/2.54 GiB objects Result logdir: /root/ray_results/ppo-tree-obs-small-v0 Number of trials: 3 (1 RUNNING, 2 TERMINATED) +---------------------------+------------+------------------+--------+------------------+-------+----------+ | Trial name | status | loc | iter | total time (s) | ts | reward | |---------------------------+------------+------------------+--------+------------------+-------+----------| | PPO_flatland_sparse_00000 | TERMINATED | | 14 | 145.959 | 15429 | -1571 | | PPO_flatland_sparse_00001 | TERMINATED | | 14 | 146.76 | 15230 | -1599.25 | | PPO_flatland_sparse_00002 | RUNNING | 172.28.0.2:11915 | 4 | 59.6103 | 4392 | -1527.5 | +---------------------------+------------+------------------+--------+------------------+-------+----------+ == Status == Memory usage on this node: 2.0/12.7 GiB Using FIFO scheduling algorithm. 
Resources requested: 2/2 CPUs, 0/0 GPUs, 0.0/7.32 GiB heap, 0.0/2.54 GiB objects Result logdir: /root/ray_results/ppo-tree-obs-small-v0 Number of trials: 3 (1 RUNNING, 2 TERMINATED) +---------------------------+------------+------------------+--------+------------------+-------+----------+ | Trial name | status | loc | iter | total time (s) | ts | reward | |---------------------------+------------+------------------+--------+------------------+-------+----------| | PPO_flatland_sparse_00000 | TERMINATED | | 14 | 145.959 | 15429 | -1571 | | PPO_flatland_sparse_00001 | TERMINATED | | 14 | 146.76 | 15230 | -1599.25 | | PPO_flatland_sparse_00002 | RUNNING | 172.28.0.2:11915 | 5 | 67.919 | 5392 | -1527.5 | +---------------------------+------------+------------------+--------+------------------+-------+----------+ == Status == Memory usage on this node: 2.0/12.7 GiB Using FIFO scheduling algorithm. Resources requested: 2/2 CPUs, 0/0 GPUs, 0.0/7.32 GiB heap, 0.0/2.54 GiB objects Result logdir: /root/ray_results/ppo-tree-obs-small-v0 Number of trials: 3 (1 RUNNING, 2 TERMINATED) +---------------------------+------------+------------------+--------+------------------+-------+----------+ | Trial name | status | loc | iter | total time (s) | ts | reward | |---------------------------+------------+------------------+--------+------------------+-------+----------| | PPO_flatland_sparse_00000 | TERMINATED | | 14 | 145.959 | 15429 | -1571 | | PPO_flatland_sparse_00001 | TERMINATED | | 14 | 146.76 | 15230 | -1599.25 | | PPO_flatland_sparse_00002 | RUNNING | 172.28.0.2:11915 | 6 | 79.5559 | 6412 | -1599.75 | +---------------------------+------------+------------------+--------+------------------+-------+----------+ == Status == Memory usage on this node: 2.0/12.7 GiB Using FIFO scheduling algorithm. 
Resources requested: 2/2 CPUs, 0/0 GPUs, 0.0/7.32 GiB heap, 0.0/2.54 GiB objects Result logdir: /root/ray_results/ppo-tree-obs-small-v0 Number of trials: 3 (1 RUNNING, 2 TERMINATED) +---------------------------+------------+------------------+--------+------------------+-------+----------+ | Trial name | status | loc | iter | total time (s) | ts | reward | |---------------------------+------------+------------------+--------+------------------+-------+----------| | PPO_flatland_sparse_00000 | TERMINATED | | 14 | 145.959 | 15429 | -1571 | | PPO_flatland_sparse_00001 | TERMINATED | | 14 | 146.76 | 15230 | -1599.25 | | PPO_flatland_sparse_00002 | RUNNING | 172.28.0.2:11915 | 7 | 87.5625 | 7542 | -1599.75 | +---------------------------+------------+------------------+--------+------------------+-------+----------+ == Status == Memory usage on this node: 2.0/12.7 GiB Using FIFO scheduling algorithm. Resources requested: 2/2 CPUs, 0/0 GPUs, 0.0/7.32 GiB heap, 0.0/2.54 GiB objects Result logdir: /root/ray_results/ppo-tree-obs-small-v0 Number of trials: 3 (1 RUNNING, 2 TERMINATED) +---------------------------+------------+------------------+--------+------------------+-------+----------+ | Trial name | status | loc | iter | total time (s) | ts | reward | |---------------------------+------------+------------------+--------+------------------+-------+----------| | PPO_flatland_sparse_00000 | TERMINATED | | 14 | 145.959 | 15429 | -1571 | | PPO_flatland_sparse_00001 | TERMINATED | | 14 | 146.76 | 15230 | -1599.25 | | PPO_flatland_sparse_00002 | RUNNING | 172.28.0.2:11915 | 8 | 98.2645 | 8642 | -1599.75 | +---------------------------+------------+------------------+--------+------------------+-------+----------+ Saving full experiment config: ppo-tree-obs-small-v0.yaml == Status == Memory usage on this node: 2.0/12.7 GiB Using FIFO scheduling algorithm. 
Resources requested: 2/2 CPUs, 0/0 GPUs, 0.0/7.32 GiB heap, 0.0/2.54 GiB objects Result logdir: /root/ray_results/ppo-tree-obs-small-v0 Number of trials: 3 (1 RUNNING, 2 TERMINATED) +---------------------------+------------+------------------+--------+------------------+-------+----------+ | Trial name | status | loc | iter | total time (s) | ts | reward | |---------------------------+------------+------------------+--------+------------------+-------+----------| | PPO_flatland_sparse_00000 | TERMINATED | | 14 | 145.959 | 15429 | -1571 | | PPO_flatland_sparse_00001 | TERMINATED | | 14 | 146.76 | 15230 | -1599.25 | | PPO_flatland_sparse_00002 | RUNNING | 172.28.0.2:11915 | 9 | 106.312 | 9792 | -1599.75 | +---------------------------+------------+------------------+--------+------------------+-------+----------+ == Status == Memory usage on this node: 2.0/12.7 GiB Using FIFO scheduling algorithm. Resources requested: 2/2 CPUs, 0/0 GPUs, 0.0/7.32 GiB heap, 0.0/2.54 GiB objects Result logdir: /root/ray_results/ppo-tree-obs-small-v0 Number of trials: 3 (1 RUNNING, 2 TERMINATED) +---------------------------+------------+------------------+--------+------------------+-------+----------+ | Trial name | status | loc | iter | total time (s) | ts | reward | |---------------------------+------------+------------------+--------+------------------+-------+----------| | PPO_flatland_sparse_00000 | TERMINATED | | 14 | 145.959 | 15429 | -1571 | | PPO_flatland_sparse_00001 | TERMINATED | | 14 | 146.76 | 15230 | -1599.25 | | PPO_flatland_sparse_00002 | RUNNING | 172.28.0.2:11915 | 10 | 117.031 | 10868 | -1691.83 | +---------------------------+------------+------------------+--------+------------------+-------+----------+ == Status == Memory usage on this node: 2.0/12.7 GiB Using FIFO scheduling algorithm. 
Resources requested: 2/2 CPUs, 0/0 GPUs, 0.0/7.32 GiB heap, 0.0/2.54 GiB objects Result logdir: /root/ray_results/ppo-tree-obs-small-v0 Number of trials: 3 (1 RUNNING, 2 TERMINATED) +---------------------------+------------+------------------+--------+------------------+-------+----------+ | Trial name | status | loc | iter | total time (s) | ts | reward | |---------------------------+------------+------------------+--------+------------------+-------+----------| | PPO_flatland_sparse_00000 | TERMINATED | | 14 | 145.959 | 15429 | -1571 | | PPO_flatland_sparse_00001 | TERMINATED | | 14 | 146.76 | 15230 | -1599.25 | | PPO_flatland_sparse_00002 | RUNNING | 172.28.0.2:11915 | 11 | 124.654 | 11868 | -1691.83 | +---------------------------+------------+------------------+--------+------------------+-------+----------+ == Status == Memory usage on this node: 2.0/12.7 GiB Using FIFO scheduling algorithm. Resources requested: 2/2 CPUs, 0/0 GPUs, 0.0/7.32 GiB heap, 0.0/2.54 GiB objects Result logdir: /root/ray_results/ppo-tree-obs-small-v0 Number of trials: 3 (1 RUNNING, 2 TERMINATED) +---------------------------+------------+------------------+--------+------------------+-------+----------+ | Trial name | status | loc | iter | total time (s) | ts | reward | |---------------------------+------------+------------------+--------+------------------+-------+----------| | PPO_flatland_sparse_00000 | TERMINATED | | 14 | 145.959 | 15429 | -1571 | | PPO_flatland_sparse_00001 | TERMINATED | | 14 | 146.76 | 15230 | -1599.25 | | PPO_flatland_sparse_00002 | RUNNING | 172.28.0.2:11915 | 12 | 135 | 12868 | -1691.83 | +---------------------------+------------+------------------+--------+------------------+-------+----------+ == Status == Memory usage on this node: 2.0/12.7 GiB Using FIFO scheduling algorithm. 
Resources requested: 2/2 CPUs, 0/0 GPUs, 0.0/7.32 GiB heap, 0.0/2.54 GiB objects Result logdir: /root/ray_results/ppo-tree-obs-small-v0 Number of trials: 3 (1 RUNNING, 2 TERMINATED) +---------------------------+------------+------------------+--------+------------------+-------+----------+ | Trial name | status | loc | iter | total time (s) | ts | reward | |---------------------------+------------+------------------+--------+------------------+-------+----------| | PPO_flatland_sparse_00000 | TERMINATED | | 14 | 145.959 | 15429 | -1571 | | PPO_flatland_sparse_00001 | TERMINATED | | 14 | 146.76 | 15230 | -1599.25 | | PPO_flatland_sparse_00002 | RUNNING | 172.28.0.2:11915 | 13 | 144.015 | 13952 | -1690.5 | +---------------------------+------------+------------------+--------+------------------+-------+----------+ == Status == Memory usage on this node: 2.0/12.7 GiB Using FIFO scheduling algorithm. Resources requested: 2/2 CPUs, 0/0 GPUs, 0.0/7.32 GiB heap, 0.0/2.54 GiB objects Result logdir: /root/ray_results/ppo-tree-obs-small-v0 Number of trials: 3 (1 RUNNING, 2 TERMINATED) +---------------------------+------------+------------------+--------+------------------+-------+----------+ | Trial name | status | loc | iter | total time (s) | ts | reward | |---------------------------+------------+------------------+--------+------------------+-------+----------| | PPO_flatland_sparse_00000 | TERMINATED | | 14 | 145.959 | 15429 | -1571 | | PPO_flatland_sparse_00001 | TERMINATED | | 14 | 146.76 | 15230 | -1599.25 | | PPO_flatland_sparse_00002 | RUNNING | 172.28.0.2:11915 | 14 | 153.332 | 14952 | -1690.5 | +---------------------------+------------+------------------+--------+------------------+-------+----------+ == Status == Memory usage on this node: 2.0/12.7 GiB Using FIFO scheduling algorithm. 
Resources requested: 2/2 CPUs, 0/0 GPUs, 0.0/7.32 GiB heap, 0.0/2.54 GiB objects Result logdir: /root/ray_results/ppo-tree-obs-small-v0 Number of trials: 3 (1 RUNNING, 2 TERMINATED) +---------------------------+------------+------------------+--------+------------------+-------+----------+ | Trial name | status | loc | iter | total time (s) | ts | reward | |---------------------------+------------+------------------+--------+------------------+-------+----------| | PPO_flatland_sparse_00000 | TERMINATED | | 14 | 145.959 | 15429 | -1571 | | PPO_flatland_sparse_00001 | TERMINATED | | 14 | 146.76 | 15230 | -1599.25 | | PPO_flatland_sparse_00002 | RUNNING | 172.28.0.2:11915 | 15 | 162.568 | 15952 | -1690.5 | +---------------------------+------------+------------------+--------+------------------+-------+----------+ == Status == Memory usage on this node: 2.0/12.7 GiB Using FIFO scheduling algorithm. Resources requested: 0/2 CPUs, 0/0 GPUs, 0.0/7.32 GiB heap, 0.0/2.54 GiB objects Result logdir: /root/ray_results/ppo-tree-obs-small-v0 Number of trials: 3 (3 TERMINATED) +---------------------------+------------+-------+--------+------------------+-------+----------+ | Trial name | status | loc | iter | total time (s) | ts | reward | |---------------------------+------------+-------+--------+------------------+-------+----------| | PPO_flatland_sparse_00000 | TERMINATED | | 14 | 145.959 | 15429 | -1571 | | PPO_flatland_sparse_00001 | TERMINATED | | 14 | 146.76 | 15230 | -1599.25 | | PPO_flatland_sparse_00002 | TERMINATED | | 15 | 162.568 | 15952 | -1690.5 | +---------------------------+------------+-------+--------+------------------+-------+----------+
2020-10-10 10:49:09,241 INFO resource_spec.py:212 -- Starting Ray with 7.32 GiB memory available for workers and up to 3.68 GiB for objects. You can adjust these settings with ray.init(memory=<bytes>, object_store_memory=<bytes>).
2020-10-10 10:49:09,709 INFO services.py:1170 -- View the Ray dashboard at 172.28.0.2:8265
2020-10-10 10:49:10,114 WARNING logger.py:314 -- JsonLogger not provided. The ExperimentAnalysis tool is disabled.
wandb: W&B is a tool that helps track and visualize machine learning experiments
wandb: No credentials found. Run "wandb login" to visualize your metrics
wandb: Tracking run with wandb version 0.9.2
wandb: Wandb version 0.10.5 is available! To upgrade, please run:
wandb: $ pip install wandb --upgrade
wandb: Run data is saved locally in wandb/run-20201010_104934-1uv5lzig
wandb: Program ended successfully.
wandb: W&B is a tool that helps track and visualize machine learning experiments
wandb: No credentials found. Run "wandb login" to visualize your metrics
wandb: Tracking run with wandb version 0.9.2
wandb: You can sync this run to the cloud by running:
wandb: wandb sync wandb/run-20201010_104934-1uv5lzig
wandb: Wandb version 0.10.5 is available! To upgrade, please run:
wandb: $ pip install wandb --upgrade
wandb: Run data is saved locally in wandb/run-20201010_105221-i57y6ony
wandb: Program ended successfully.
wandb: W&B is a tool that helps track and visualize machine learning experiments
wandb: No credentials found. Run "wandb login" to visualize your metrics
wandb: Tracking run with wandb version 0.9.2
wandb: You can sync this run to the cloud by running:
wandb: wandb sync wandb/run-20201010_105221-i57y6ony
wandb: Wandb version 0.10.5 is available! To upgrade, please run:
wandb: $ pip install wandb --upgrade
wandb: Run data is saved locally in wandb/run-20201010_105504-2f0d572t
wandb: Program ended successfully.
wandb: You can sync this run to the cloud by running:
wandb: wandb sync wandb/run-20201010_105504-2f0d572t
⛸RLLib Training LifeCycle¶
We also officially support saving training metrics, graphs, checkpoints, system runtime, experiment code, etc. in the experiment tracking tool Weights and Biases (W&B). This also keeps all our experiments transparent and easily reproducible.
The Flatland metrics, such as mean percentage completion, normalised reward and reward, can be easily monitored.
Evaluation can also run alongside training at a fixed interval. To use the default evaluation settings, add the -e flag as follows:
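As a rough illustration of what these metrics measure, the sketch below summarises a single episode. The normalisation shown (summed reward divided by maximum steps times number of agents) is an assumption made for illustration, and the function and parameter names are hypothetical rather than taken from the baselines repository.

```python
from typing import Dict


def episode_metrics(agent_rewards: Dict[int, float],
                    agents_done: Dict[int, bool],
                    max_steps: int) -> Dict[str, float]:
    """Summarise one episode (hypothetical helper, not part of the baselines repo)."""
    num_agents = len(agent_rewards)
    total_reward = sum(agent_rewards.values())
    # Share of agents that reached their target, expressed as a percentage.
    percentage_complete = 100.0 * sum(agents_done.values()) / num_agents
    # Assumed normalisation: scale by the worst-case number of agent steps.
    normalised_reward = total_reward / (max_steps * num_agents)
    return {
        "reward": total_reward,
        "normalised_reward": normalised_reward,
        "percentage_complete": percentage_complete,
    }


# Made-up numbers for a 4-agent episode capped at 400 steps.
metrics = episode_metrics(
    agent_rewards={0: -120.0, 1: -200.0, 2: -80.0, 3: -400.0},
    agents_done={0: True, 1: True, 2: True, 3: False},
    max_steps=400,
)
print(metrics)
```

Under this assumed scaling, runs on environments of different sizes become roughly comparable, which is presumably why the results are reported in normalised form.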
python train.py -ef ppo_tree_obs_small_v0.yaml
A sample recording in W&B can be viewed here.
One can also specify a custom evaluation config in a yaml file similar to the training configs.
The Flatland environment has also been adapted to support saving video recordings using OpenAI Gym's monitor. This has been integrated into RLlib, and the saved videos can be uploaded directly to W&B during training. Video recording can slow down training considerably, so by default we only save videos of 5 episodes run during evaluation after every 50 training iterations. To use the default evaluation and recording settings, add the -er flag as follows:
python train.py -erf ppo_tree_obs_small_v0.yaml
Just like evaluation, one can also specify a custom config for recording in a yaml file.
Once all of this is set up, it is convenient to track the training process directly in W&B, which has separate sections for training, evaluation, and media (for recorded videos). To save the checkpoints to W&B, add the -s flag as follows:
python train.py -ersf ppo_tree_obs_small_v0.yaml
The saved checkpoints can then be downloaded from Weights and Biases.
Once training and evaluation are done, we can select the checkpoint with the best evaluation scores and run inference on a sample of 50 independent Flatland environments:
python rollout.py <path-to-checkpoint> --run PPO --episode=50 --no-render
Refer to the rollout scripts here for the different baseline RLLib runs.
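The checkpoint selection step can be sketched as follows. This is a minimal example that assumes you have already collected (checkpoint path, evaluation score) pairs yourself; the paths and scores are made up, and the helper names are hypothetical. Only the rollout flags are taken from the command above.

```python
from typing import List, Tuple


def best_checkpoint(evals: List[Tuple[str, float]]) -> str:
    """Return the checkpoint path with the highest evaluation score."""
    if not evals:
        raise ValueError("no evaluated checkpoints given")
    path, _score = max(evals, key=lambda item: item[1])
    return path


def rollout_command(checkpoint: str, episodes: int = 50) -> List[str]:
    """Build the rollout invocation shown above as an argument list."""
    return ["python", "rollout.py", checkpoint,
            "--run", "PPO", f"--episode={episodes}", "--no-render"]


# Hypothetical evaluation results: (checkpoint path, normalised reward).
evaluated = [
    ("checkpoints/checkpoint_50", -0.31),
    ("checkpoints/checkpoint_100", -0.12),
    ("checkpoints/checkpoint_150", -0.19),
]
chosen = best_checkpoint(evaluated)
print(" ".join(rollout_command(chosen)))
```

Rewards here are negative, so the highest score is the one closest to zero.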
🚆 RLLib Training Scripts for baselines¶
We run a range of baselines, covering APEX, PPO and imitation learning approaches, as explained here. The training scripts for each of them are shown below. The results of the training runs are shown in the subsequent sections.
MODEL | Training Script |
---|---|
APEX FIXED IL(25%) | train.py -ef baselines/imitation_learning_tree_obs/apex_il_tree_obs_25.yaml |
APEX FIXED IL(100%) | train.py -ef baselines/imitation_learning_tree_obs/apex_pure_il.yaml |
APEX | train.py -ef baselines/action_masking_and_skipping/apex_tree_obs_small_v0.yaml |
APEX SKIP | train.py -ef baselines/action_masking_and_skipping/apex_tree_obs_small_v0_skip.yaml |
CCPPO | train.py -ef baselines/ccppo_tree_obs/ccppo.yaml |
CCPPO BASE | train.py -ef baselines/ccppo_tree_obs/ccppo_base.yaml |
MARWIL FIXED IL(100%) | train.py -ef baselines/imitation_learning_tree_obs/marwil_tree_obs_all_beta.yaml |
PPO + Online IL(50%) | train.py -ief baselines/custom_imitation_learning_rllib_tree_obs/ppo_imitation_tree_obs.yaml --eager --trace |
PPO | train.py -ef baselines/action_masking_and_skipping/ppo_tree_obs_small_v0.yaml |
PPO MASKING | train.py -ef baselines/action_masking_and_skipping/ppo_tree_obs_small_v0_mask.yaml |
PPO SKIP | train.py -ef baselines/action_masking_and_skipping/ppo_tree_obs_small_v0_skip.yaml |
APEX Global density | train.py -ef baselines/global_density_obs/sparse_small_apex_expdecay_maxt1000.yaml |
Online IL(100%) | train.py -ef baselines/custom_imitation_learning_rllib_tree_obs/pure_imitation_tree_obs.yaml --eager --trace |
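To queue several of these baselines in one go, something like the sketch below could be used. The script strings are copied from the table above; the `BASELINES` dictionary and `build_command` helper are hypothetical, and the example only prints the commands rather than launching them.

```python
import shlex
from typing import Dict, List

# Script strings copied from the table above (a subset, for brevity).
BASELINES: Dict[str, str] = {
    "PPO": "train.py -ef baselines/action_masking_and_skipping/ppo_tree_obs_small_v0.yaml",
    "CCPPO": "train.py -ef baselines/ccppo_tree_obs/ccppo.yaml",
    "PPO + Online IL(50%)": (
        "train.py -ief baselines/custom_imitation_learning_rllib_tree_obs/"
        "ppo_imitation_tree_obs.yaml --eager --trace"
    ),
}


def build_command(model: str) -> List[str]:
    """Turn a table entry into an argument list suitable for subprocess.run."""
    return ["python"] + shlex.split(BASELINES[model])


if __name__ == "__main__":
    for model in BASELINES:
        # Print instead of running, since each run can take a long time.
        print(f"{model}: {' '.join(build_command(model))}")
```

Passing the resulting list to `subprocess.run(..., check=True)` from inside the activated conda environment would start the run.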
⛳️ Results¶
All baseline run configs can be found here. More information on each of the runs can be found in the 🔗Flatland RLLib Baselines documentation.
Checkpoints with the best evaluation normalized reward score for the various runs can be found here.
🚅 Train, Evaluation and Test Results¶
Note that these runs were based on the older Flatland version 2.2.1; the code for that version can be found in the flatland-paper-baselines branch. Test results were calculated using this rollout script.
The training and evaluation metrics and charts for all the runs can be found at the W&B link here.
MODEL | Train % Complete | Train Reward | Best Evaluation % Complete | Best Evaluation Reward | Test % Complete | Test Reward |
---|---|---|---|---|---|---|
APEX FIXED IL(25%) | 90.45±0.4 | -0.18±0 | 89.18±2.44 | -0.18±0.02 | 86±1.44 | -0.22±0.01 |
APEX FIXED IL(100%) | 23.8±11.04 | -0.79±0.09 | 22.93±11.83 | -0.84±0.08 | ||
APEX | 90.38±1.64 | -0.2±0.02 | 85.33±6.11 | -0.22±0.05 | 80.93±5.45 | -0.32±0.04 |
APEX SKIP | 89.51±1.09 | -0.21±0.01 | 84±4 | -0.22±0.04 | 79.73±0.92 | -0.33±0.01 |
CCPPO | 87.72±2.37 | -0.2±0.02 | 84.67±4.01 | -0.23±0.04 | 71.87±3.7 | -0.35±0.03 |
CCPPO BASE | 83.21±1.47 | -0.25±0.01 | 83.2±0.8 | -0.25±0.02 | 76.27±6.96 | -0.31±0.06 |
MARWIL FIXED IL(100%) | 100±0 | -0.04±0.01 | 72.4±3.27 | -0.35±0.02 | ||
PPO + Online IL(50%) | 83.46±1.09 | -0.23±0.01 | 100±0 | -0.07±0 | 71.47±4.01 | -0.35±0.04 |
PPO | 94.78±0.29 | -0.13±0.01 | 98.67±2.31 | -0.09±0.02 | 81.33±5.86 | -0.26±0.05 |
PPO MASKING | 93.4±0.27 | -0.15±0 | 90.67±8.33 | -0.16±0.07 | 80.53±9.59 | -0.28±0.09 |
PPO SKIP | 93.48±0.66 | -0.15±0.01 | 100±0 | -0.08±0.01 | 82.67±5.79 | -0.26±0.05 |
APEX Global density | 57.87±1.85 | -0.51±0.01 | 60±10.58 | -0.45±0.11 | 34.4±9.23 | -0.71±0.07 |
Online IL(100%) | 100±0 | -0.06±0.01 | 80±3.27 | -0.27±0.03 |
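One way to read the table is to compare train and test completion for each model. The sketch below parses a few `mean±std` cells copied from rows that report all six values and ranks those models by test completion; the helper names are hypothetical and the "gap" is just train mean minus test mean, not a metric defined by the baselines.

```python
from typing import Dict, List, Tuple


def parse_cell(cell: str) -> Tuple[float, float]:
    """Split a 'mean±std' table cell into two floats."""
    mean, std = cell.split("±")
    return float(mean), float(std)


# (train % complete, test % complete) copied from the table above.
COMPLETION: Dict[str, Tuple[str, str]] = {
    "APEX FIXED IL(25%)": ("90.45±0.4", "86±1.44"),
    "APEX": ("90.38±1.64", "80.93±5.45"),
    "PPO": ("94.78±0.29", "81.33±5.86"),
    "PPO SKIP": ("93.48±0.66", "82.67±5.79"),
}


def generalisation_gap(model: str) -> float:
    """Train mean minus test mean, in percentage points."""
    train, test = COMPLETION[model]
    return parse_cell(train)[0] - parse_cell(test)[0]


def rank_by_test(models: Dict[str, Tuple[str, str]]) -> List[str]:
    """Model names sorted by test completion, best first."""
    return sorted(models, key=lambda m: parse_cell(models[m][1])[0], reverse=True)


ranking = rank_by_test(COMPLETION)
for name in ranking:
    print(f"{name}: gap = {generalisation_gap(name):.2f} points")
```

Among these four rows, PPO has the highest training completion but a larger drop on the test set than APEX FIXED IL(25%).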
🚀 Submitting Solution to Challenge!!!¶
Thanks to the efforts of our partners Deutsche Bahn and Instadeep, you can now submit the CCPPO baseline out of the box: https://gitlab.aicrowd.com/GereonVienken/db_flatland_example
This RL method reaches a score of 76.232 on the leaderboard! 💪🏻
You should be able to use the same approach with the other RLlib baselines as well. Make sure to include your best-performing checkpoints in the submission. Thanks to our partners and especially to @GereonVienken, who contributed this baseline and submission repository!