NeurIPS 2023 - The Neural MMO Challenge
Preparing the pkl submission file for a custom policy
Look at "create_custom_policy_pt()" function and modify as you see fit.
Open in Colab doesn't seem to work. Click this url instead: https://colab.research.google.com/drive/1UVQgJGgTOwphb2F9cXj1KZsrPCri5gYs
Set up your instance - gpu and google drive¶
In [1]:
# Check that an (NVIDIA) GPU is available
import torch
assert torch.cuda.is_available(), "CUDA GPU not available"
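If you want to confirm which GPU Colab assigned you, you can also print the device name (optional check, not required for the rest of the notebook):
In [ ]:
# Optional: print the name of the visible CUDA device
print(torch.cuda.get_device_name(0))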
In [2]:
# Set up the work directory
import os
assert os.path.exists("/content/drive/MyDrive"), "Google Drive not mounted"
work_dir = "/content/drive/MyDrive/nmmo/"
Train your agent¶
Install nmmo env and pufferlib¶
In [3]:
# Install nmmo env
!pip install pufferlib nmmo > /dev/null
!pip show nmmo # should be 2.0.3
!pip show pufferlib # should be 0.4.5
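If you prefer a programmatic check over reading the pip show output, here is a minimal sketch using the versions noted above:
In [ ]:
# Optional: verify the installed versions match the ones this notebook was tested with
from importlib.metadata import version
assert version("nmmo") == "2.0.3", version("nmmo")
assert version("pufferlib") == "0.4.5", version("pufferlib")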
Install the baselines¶
In [ ]:
# Create the work directory, download the baselines code
%mkdir $work_dir
%cd $work_dir
!git clone https://github.com/carperai/nmmo-baselines baselines --depth=1
In [4]:
# Install libs to run the baselines
%cd $work_dir
%cd baselines
# Create a requirements_colab.txt
with open(work_dir + 'baselines/requirements_colab.txt', "w") as f:
    f.write("""
accelerate==0.21.0
bitsandbytes==0.41.1
dash==2.11.1
openelm
pandas
plotly==5.15.0
psutil==5.9.3
scikit-learn==1.3.0
tensorboard==2.11.2
tiktoken==0.4.0
torch
transformers==4.31.0
wandb==0.13.7
""")
!pip install -r requirements_colab.txt > /dev/null
Run python train.py¶
In [ ]:
# Just to check if the training flow works. The checkpoints are saved under nmmo/runs
%cd $work_dir
%cd baselines
ckpt_dir = work_dir + "runs"
!python train.py --runs-dir $ckpt_dir --local-mode true --train-num-steps=5_000
In [ ]:
# The new policy store should create `_state.pth` files, which contain only the model's state_dict
!ls /content/drive/MyDrive/nmmo/runs/nmmo_20231210_014715/policy_store/
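To confirm that a `_state.pth` file really contains only a state_dict, you can load it and inspect the keys. A minimal sketch; the run directory name below is just the example run from this notebook, so substitute your own:
In [ ]:
# Optional: inspect a saved checkpoint; it should be a plain dict of parameter tensors (a state_dict)
import torch
ckpt = torch.load(ckpt_dir + "/nmmo_20231210_014715/policy_store/nmmo_20231210_014715.000004_state.pth",
                  map_location="cpu")
print(type(ckpt), len(ckpt))
for name, tensor in list(ckpt.items())[:5]:
    print(name, tuple(tensor.shape))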
Test python evaluate.py with the custom policy checkpoints¶
In [ ]:
%cd $work_dir
%cd baselines
!wget https://kywch.github.io/replays/different_policy.pkl -P policies/
!wget https://kywch.github.io/replays/random_policy.pkl -P policies/
!ls policies/
In [ ]:
!python evaluate.py -p policies/
Create a checkpoint with custom policy¶
NOTE: Please check that the evaluation works with your checkpoint WITHOUT your custom files present, i.e., the submission .pkl must be self-contained.
In [6]:
# replace policy.py with your file
custom_policy_file = work_dir + "baselines/reinforcement_learning/" + "policy.py"
assert os.path.exists(custom_policy_file), "CANNOT find the policy file"
print(custom_policy_file)
In [5]:
# Replace the checkpoint path below with your own
checkpoint_to_submit = work_dir + "runs/nmmo_20231210_014715/policy_store/nmmo_20231210_014715.000004_state.pth"
assert os.path.exists(checkpoint_to_submit), "CANNOT find the checkpoint file"
assert checkpoint_to_submit.endswith("_state.pth"), "the checkpoint file must end with _state.pth"
print(checkpoint_to_submit)
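If you would rather pick the most recent checkpoint automatically instead of hard-coding the run name, here is a minimal alternative sketch using glob (it assumes the runs/*/policy_store layout shown above):
In [ ]:
# Optional alternative: find the newest *_state.pth across all runs instead of hard-coding the path
import glob
candidates = glob.glob(work_dir + "runs/*/policy_store/*_state.pth")
assert len(candidates) > 0, "No *_state.pth files found under " + work_dir + "runs"
checkpoint_to_submit = max(candidates, key=os.path.getmtime)
print(checkpoint_to_submit)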
In [10]:
import pickle
import torch
def create_custom_policy_pt(policy_file, pth_file, out_name="my_submission.pkl"):
    assert out_name.endswith(".pkl"), "The file name must end with .pkl"
    with open(policy_file, "r") as f:
        src_code = f.read()

    # add the make_policy() function
    # YOU SHOULD CHECK the name of your policy (if not Baseline),
    # and the args that go into the policy
    src_code += """

class Config(nmmo.config.Default):
    PROVIDE_ACTION_TARGETS = True
    PROVIDE_NOOP_ACTION_TARGET = True
    MAP_FORCE_GENERATION = False
    TASK_EMBED_DIM = 4096
    COMMUNICATION_SYSTEM_ENABLED = False

def make_policy():
    from pufferlib.frameworks import cleanrl
    env = pufferlib.emulation.PettingZooPufferEnv(nmmo.Env(Config()))
    # Parameters to your model should match your configuration
    learner_policy = Baseline(
        env,
        input_size=256,
        hidden_size=256,
        task_size=4096
    )
    return cleanrl.Policy(learner_policy)
"""

    state_dict = torch.load(pth_file, map_location="cpu")
    checkpoint = {
        "policy_src": src_code,
        "state_dict": state_dict,
    }
    with open(out_name, "wb") as out_file:
        pickle.dump(checkpoint, out_file)
In [11]:
%cd $work_dir
%cd baselines
# put the checkpoint into the policies directory
create_custom_policy_pt(custom_policy_file, checkpoint_to_submit,
                        out_name=work_dir + "baselines/policies/new_submission2.pkl")
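Before running the evaluation, it can be worth loading the generated .pkl back in to confirm it has the structure that create_custom_policy_pt() writes (a policy_src string plus a state_dict). A minimal sanity-check sketch:
In [ ]:
# Optional: sanity-check the generated submission file
with open(work_dir + "baselines/policies/new_submission2.pkl", "rb") as f:
    submission = pickle.load(f)
assert set(submission.keys()) == {"policy_src", "state_dict"}
print(len(submission["policy_src"]), "chars of policy source")
print(len(submission["state_dict"]), "tensors in state_dict")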
In [13]:
# see if new_submission2.pkl works with the other checkpoints
!python evaluate.py -p policies/
Comments
Your notebook is excellent! However, I have a couple of questions: first, where can I modify the RL algorithm? Is it in clean_pufferl.py? Second, where can I adjust the rewards, in task_api.py?