INFO:root:Ranking checkpoints from policies/
INFO:root:Replays will NOT be generated
INFO:root:Using policy store from policies/
INFO:root:Using existing policy ranker from policies/ranker.pickle
Allocated 31.00 MB to environments. Only accurate for Serial backend.
PolicyPool sample_weights: [0, 32, 32, 32, 32]
Allocated to storage - Pytorch: 0.00 GB, System: 3.41 GB
INFO:root:PolicyPool: Updated policies: dict_keys(['learner', 'different_policy', 'new_submission', 'puf41_seed425.000305', 'random_policy'])
('anchor', 1000.0, 33.333333333333336)
('different_policy', 1010.5687896061233, 33.333333333333336)
('puf41_seed425.000305', 1034.6498293586396, 33.333333333333336)
('learner', 1006.8908185882893, 33.333333333333336)
('random_policy', 954.3162223137092, 33.333333333333336)
('new_submission', 993.5743401332385, 33.333333333333336)
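(These rating tuples appear to be (policy, rating mean, rating sigma) from the anchored ranker: 'anchor' is pinned at 1000.0 and every policy starts from the same default sigma of 100/3 ≈ 33.33.)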
Allocated during evaluation - Pytorch: 0.01 GB, System: 1.65 GB
Epoch: 0 - 32K steps - 0:02:12 Elapsed
Steps Per Second: Env=411, Inference=4557
INFO:root:PolicyPool: Updated policies: dict_keys(['learner', 'different_policy', 'new_submission', 'puf41_seed425.000305', 'random_policy'])
Traceback (most recent call last):
  File "/content/drive/MyDrive/nmmo/baselines/evaluate.py", line 358, in <module>
  File "/content/drive/MyDrive/nmmo/baselines/evaluate.py", line 238, in rank_policies
    _, stats, infos = evaluator.evaluate()
  File "/usr/local/lib/python3.10/dist-packages/pufferlib/utils.py", line 223, in wrapper
    result = func(*args, **kwargs)
  File "/content/drive/MyDrive/nmmo/baselines/reinforcement_learning/clean_pufferl.py", line 292, in evaluate
    o, r, d, i = self.buffers[buf].recv()
  File "/usr/local/lib/python3.10/dist-packages/pufferlib/vectorization.py", line 234, in recv
    returns = self._recv()
  File "/usr/local/lib/python3.10/dist-packages/pufferlib/vectorization.py", line 359, in _recv
    return [queue.get() for queue in self.response_queues]
  File "/usr/local/lib/python3.10/dist-packages/pufferlib/vectorization.py", line 359, in <listcomp>
    return [queue.get() for queue in self.response_queues]
  File "/usr/lib/python3.10/multiprocessing/queues.py", line 103, in get
    res = self._recv_bytes()
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 414, in _recv_bytes
    buf = self._recv(4)
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
KeyboardInterrupt
Process Process-2:
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pufferlib/vectorization.py", line 351, in _worker_process
    response = func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pufferlib/vectorization.py", line 144, in step
    o, r, d, i= env.step(atns)
  File "/usr/local/lib/python3.10/dist-packages/pufferlib/emulation.py", line 323, in step
    obs, rewards, dones, infos = self.env.step(unpacked_actions)
  File "/usr/local/lib/python3.10/dist-packages/nmmo/core/env.py", line 359, in step
    rewards, infos = self._compute_rewards()
  File "/usr/local/lib/python3.10/dist-packages/nmmo/core/env.py", line 503, in _compute_rewards
    self.game_state = self._gamestate_generator.generate(self.realm, self.obs)
  File "/usr/local/lib/python3.10/dist-packages/nmmo/task/game_state.py", line 277, in generate
    event_index = precompute_index(event_data, EventAttr['ent_id']),
  File "/usr/local/lib/python3.10/dist-packages/nmmo/task/game_state.py", line 285, in precompute_index
    index[id_].append(row)
KeyboardInterrupt
Process Process-5:
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pufferlib/vectorization.py", line 351, in _worker_process
    response = func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pufferlib/vectorization.py", line 144, in step
    o, r, d, i= env.step(atns)
  File "/usr/local/lib/python3.10/dist-packages/pufferlib/emulation.py", line 323, in step
    obs, rewards, dones, infos = self.env.step(unpacked_actions)
  File "/usr/local/lib/python3.10/dist-packages/nmmo/core/env.py", line 359, in step
    rewards, infos = self._compute_rewards()
  File "/usr/local/lib/python3.10/dist-packages/nmmo/core/env.py", line 503, in _compute_rewards
    self.game_state = self._gamestate_generator.generate(self.realm, self.obs)
  File "/usr/local/lib/python3.10/dist-packages/nmmo/task/game_state.py", line 277, in generate
    event_index = precompute_index(event_data, EventAttr['ent_id']),
  File "/usr/local/lib/python3.10/dist-packages/nmmo/task/game_state.py", line 285, in precompute_index
    index[id_].append(row)
KeyboardInterrupt
Process Process-3:
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pufferlib/vectorization.py", line 351, in _worker_process
    response = func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pufferlib/vectorization.py", line 144, in step
    o, r, d, i= env.step(atns)
  File "/usr/local/lib/python3.10/dist-packages/pufferlib/emulation.py", line 323, in step
    obs, rewards, dones, infos = self.env.step(unpacked_actions)
  File "/usr/local/lib/python3.10/dist-packages/nmmo/core/env.py", line 359, in step
    rewards, infos = self._compute_rewards()
  File "/usr/local/lib/python3.10/dist-packages/nmmo/core/env.py", line 506, in _compute_rewards
    task_rewards, task_infos = task.compute_rewards(self.game_state)
  File "/usr/local/lib/python3.10/dist-packages/nmmo/task/task_api.py", line 97, in compute_rewards
    reward = self._map_progress_to_reward(gs) * self._reward_multiplier
  File "/usr/local/lib/python3.10/dist-packages/nmmo/task/task_api.py", line 83, in _map_progress_to_reward
    new_progress = max(min(self._eval_fn(gs)*1.0,1.0),0.0)
  File "/usr/local/lib/python3.10/dist-packages/nmmo/task/predicate_api.py", line 57, in __call__
    progress = max(min(self._evaluate(gs)*1.0,1.0),0.0)
  File "/usr/local/lib/python3.10/dist-packages/nmmo/task/predicate_api.py", line 169, in _evaluate
    return fn(gs, *self._args, **self._kwargs)
  File "/usr/local/lib/python3.10/dist-packages/nmmo/task/base_predicates.py", line 89, in AttainSkill
    skill_level = getattr(subject,skill.__name__.lower() + '_level') - 1 # base level is 1
  File "/usr/local/lib/python3.10/dist-packages/nmmo/task/group.py", line 93, in __getattr__
    return self._sd.__getattribute__(attr)
  File "/usr/local/lib/python3.10/dist-packages/nmmo/task/game_state.py", line 242, in __getattribute__
    v = getattr(self.entity, attr)
  File "/usr/local/lib/python3.10/dist-packages/nmmo/task/game_state.py", line 231, in __getattribute__
    return object.__getattribute__(self, attr)
  File "/usr/local/lib/python3.10/dist-packages/nmmo/task/game_state.py", line 102, in __get__
    self.cache[instance] = self.func(instance)
  File "/usr/local/lib/python3.10/dist-packages/nmmo/task/game_state.py", line 210, in entity
    return EntityView(self._gs, self._subject, self._sbj_ent)
  File "/usr/local/lib/python3.10/dist-packages/nmmo/task/game_state.py", line 231, in __getattribute__
    return object.__getattribute__(self, attr)
  File "/usr/local/lib/python3.10/dist-packages/nmmo/task/game_state.py", line 102, in __get__
    self.cache[instance] = self.func(instance)
  File "/usr/local/lib/python3.10/dist-packages/nmmo/task/game_state.py", line 206, in _sbj_ent
    return self._gs.where_in_id('entity', self._subject.agents)
  File "/usr/local/lib/python3.10/dist-packages/nmmo/task/game_state.py", line 66, in where_in_id
    if data_type == 'item':
KeyboardInterrupt
Process Process-4:
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pufferlib/vectorization.py", line 351, in _worker_process
    response = func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pufferlib/vectorization.py", line 144, in step
    o, r, d, i= env.step(atns)
  File "/usr/local/lib/python3.10/dist-packages/pufferlib/emulation.py", line 332, in step
    obs[agent], rewards[agent], dones[agent], infos[agent] = postprocess_and_flatten(
  File "/usr/local/lib/python3.10/dist-packages/pufferlib/emulation.py", line 399, in postprocess_and_flatten
    reward, done, info = postprocessor.reward_done_info(
  File "/content/drive/MyDrive/nmmo/baselines/environment.py", line 82, in reward_done_info
    reward, done, info = super().reward_done_info(reward, done, info)
  File "/content/drive/MyDrive/nmmo/baselines/leader_board.py", line 221, in reward_done_info
    self._curr_unique_count = len(extract_unique_event(log, self.env.realm.event_log.attr_to_col))
  File "/content/drive/MyDrive/nmmo/baselines/leader_board.py", line 397, in extract_unique_event
    log[idx, attr_to_col[attr]] = 0
KeyboardInterrupt
Your notebook is great! I have two questions. First, where can I modify the RL algorithm? Is that in clean_pufferl.py? Second, where can I modify the reward? Is that in task_api.py?
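Both questions are actually answered by the stack traces above. The rollout and evaluation loop lives in reinforcement_learning/clean_pufferl.py (the main traceback enters it through evaluator.evaluate()), and the PPO-style training update presumably sits in the same file, so that is where to modify the RL algorithm. For rewards there are two hooks visible in the traces: the task-level rewards computed in nmmo/task/task_api.py (compute_rewards calls _map_progress_to_reward, which clamps _eval_fn(gs) to [0, 1] and scales by _reward_multiplier), and the per-agent shaping done in baselines/environment.py, whose reward_done_info wraps leader_board.py. Below is a minimal sketch of reward shaping at the postprocessor level, assuming environment.py exposes a Postprocessor class with the reward_done_info(reward, done, info) signature the traceback shows; the subclass name, import path, and bonus term are illustrative assumptions, not the baseline's actual code:

```python
# Hypothetical reward-shaping sketch. Assumes baselines/environment.py defines
# a Postprocessor whose reward_done_info(reward, done, info) is called once per
# agent per step by pufferlib/emulation.py, as the traceback above indicates.
from environment import Postprocessor  # assumed import path within baselines/

class ShapedPostprocessor(Postprocessor):  # hypothetical subclass
    def reward_done_info(self, reward, done, info):
        # Keep the baseline behavior (task rewards plus leader_board bookkeeping).
        reward, done, info = super().reward_done_info(reward, done, info)
        # Illustrative shaping term: a small survival bonus while the agent lives.
        if not done:
            reward += 0.01
        return reward, done, info
```

If you instead want to change what a task itself pays out rather than add shaping on top, _map_progress_to_reward in task_api.py is the place the trace points at, since every task reward flows through it.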