karolisram (Karolis Ramanauskas)

Location: GB
Followers: 0 · Following: 0
Badges: 5 · 2 · 2

Activity
[contribution heatmap, Dec to Dec, not captured]

Ratings Progression
[chart not captured]

Challenge Categories
[chart not captured]

Challenges Entered

- (challenge title not captured): no submissions
- (challenge title not captured): graded #197092, #196866, #196863
- ASCII-rendered single-player dungeon crawl game: graded #149038, #148738, #147238
- (challenge title not captured): graded #149687
- Sample Efficient Reinforcement Learning in Minecraft: no submissions
- Measure sample efficiency and generalization in reinforcement learning using procedurally generated environments: graded #93478, #93477, #93390
- Multi-Agent Reinforcement Learning on Trains: no submissions
- Sample-efficient reinforcement learning in Minecraft: graded #120617, #120492; failed #120483
- Sample-efficient reinforcement learning in Minecraft: graded #25413, #25412, #25075
- Multi-Agent Reinforcement Learning on Trains: no submissions
- A new benchmark for Artificial Intelligence (AI) research in Reinforcement Learning: graded #8563, #8534; failed #8533
- Predict if users will skip or listen to the music they're streamed: no submissions
- Multi-Agent Reinforcement Learning on Trains: no submissions

NeurIPS 2021: MineRL Diamond Competition

MineRL self._actions

About 3 years ago

Ah, I see the issue now. I think the confusion comes from line 121 in RL_plus_script.py:
[('forward', 1), ('jump', 1)]
This line doesn't define two actions (forward on the first tick, then jump on the next tick). Instead, it means that the forward and jump keys are both pressed during a single tick.

You can see that by printing out act = env.action_space.noop():

OrderedDict([('attack', 0),
             ('back', 0),
             ('camera', array([0., 0.], dtype=float32)),
             ('forward', 0),
             ('jump', 0),
             ('left', 0),
             ('right', 0),
             ('sneak', 0),
             ('sprint', 0)])

This is a single action that does nothing, because none of the keys are pressed. If you then do:

act['forward'] = 1
act['jump'] = 1

act will become an action with those two buttons pressed. This is what the ActionShaping() wrapper does. To create meta-actions that perform 5 attacks in a row and the like, you will need something else; frame skipping might be an easier way to achieve that (a rough sketch of an action-repeat wrapper is below).
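As an illustration of the frame-skipping idea only (this is not part of the baseline code, and the wrapper name is made up), something like this repeats whatever action the agent picks for several ticks:

import gym

class ActionRepeat(gym.Wrapper):
    # Repeat each chosen action for `repeat` environment ticks.
    def __init__(self, env, repeat=5):
        super().__init__(env)
        self.repeat = repeat

    def step(self, action):
        total_reward = 0.0
        for _ in range(self.repeat):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:
                break
        return obs, total_reward, done, info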

MineRL self._actions

About 3 years ago

The docstring of the ActionShaping() class should be enough to figure out how to adjust the actions for the RL part of the algorithm. What changes do you want to make, and what have you tried?
Maybe playing Minecraft for a bit or watching a YouTube guide would help with the Minecraft knowledge?

Questions about the environment that can be used to train the model

About 3 years ago

Yes, you can use the *DenseVectorObf environments in the Research track of the competition.

Discord invite invalid

Over 3 years ago

Good catch, thank you! The links have been fixed.

NeurIPS 2020: MineRL Competition

Obfuscated actions + KMeans analysis

Over 3 years ago

Here’s some analysis our team did on the whole obfuscated action + KMeans thing:


A teaser: sometimes the agents don't have a single action to look up. So shy 🙂
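For context, the kind of clustering referred to above can be reproduced roughly like this. This is only a sketch: the environment name, the value of k, and the iteration details are assumptions, not our exact setup.

import numpy as np
import minerl
from sklearn.cluster import KMeans

# Load the human demonstrations for an obfuscated environment (name assumed).
data = minerl.data.make('MineRLObtainDiamondVectorObf-v0')

# Collect the 64-dimensional obfuscated action vectors.
actions = []
for _, action, _, _, _ in data.batch_iter(batch_size=32, seq_len=1, num_epochs=1):
    actions.append(action['vector'].reshape(-1, 64))
actions = np.concatenate(actions)

# Cluster them; an agent can then act by picking one of the k centroids.
kmeans = KMeans(n_clusters=30, random_state=0).fit(actions)
print(kmeans.cluster_centers_.shape)  # (30, 64)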

Error using gym.make

Over 4 years ago

Working Colab example (credit to @tviskaron):

!java -version
!sudo apt-get purge openjdk-*
!java -version
!sudo apt-get install openjdk-8-jdk

!pip3 install --upgrade minerl
!sudo apt-get install xvfb xserver-xephyr vnc4server
!sudo pip install pyvirtualdisplay

from pyvirtualdisplay import Display
display = Display(visible=0, size=(640, 480))
display.start()

import minerl
import gym
env = gym.make('MineRLNavigateDense-v0')

obs = env.reset()
done = False
net_reward = 0

for _ in range(100):
    action = env.action_space.noop()

    # Steer toward the compass target and keep moving, jumping and attacking.
    action['camera'] = [0, 0.03 * obs["compassAngle"]]
    action['back'] = 0
    action['forward'] = 1
    action['jump'] = 1
    action['attack'] = 1

    obs, reward, done, info = env.step(action)

    net_reward += reward
    print("Total reward: ", net_reward)

env.close()

NeurIPS 2020: Procgen Competition

How to find subtle implementation details

Almost 4 years ago

It could be the weight initialization: PyTorch uses he_uniform (Kaiming uniform) by default, while TensorFlow uses glorot_uniform (Xavier uniform). Using TensorFlow with glorot_uniform I get a score of 42 on starpilot, while using TensorFlow with he_uniform I get 19.
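If you want to check this from the PyTorch side, a small sketch of forcing Glorot initialization onto a model (the helper name and the toy network are made up for illustration):

import torch.nn as nn

def use_glorot_uniform(module):
    # Re-initialize Conv2d/Linear weights with Xavier (Glorot) uniform,
    # matching TensorFlow's default, instead of PyTorch's Kaiming uniform.
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        nn.init.xavier_uniform_(module.weight)
        if module.bias is not None:
            nn.init.zeros_(module.bias)

# Toy policy network for a 64x64x3 Procgen frame with 15 actions.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(16 * 62 * 62, 15),
)
model.apply(use_glorot_uniform)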

Round 2 is open for submissions 🚀

About 4 years ago

Sounds good, thanks @shivam. Could you please also give us the normalization factors (Rmin, Rmax) for the 4 private envs?

Round 2 is open for submissions 🚀

About 4 years ago

Will we be able to choose which submission is used for the final 16+4 evaluation? It might be the case that our best solution tested locally on the 16 envs is not the same as the best one for the 6+4 envs on the public LB.

Human score

About 4 years ago

So I was a little bored and decided to see how well I could play the procgen games myself.

Setup:

python -m procgen.interactive --distribution-mode easy --vision agent --env-name coinrun

First I tried each game for 5-10 episodes to figure out what the keys do, how the game works, etc.
Then I played each game 100 times and logged the rewards. Here are the results:

Environment   Mean reward   Mean normalized reward
bigfish          29.40          0.728
bossfight        10.15          0.772
caveflyer        11.69          0.964
chaser           11.23          0.859
climber          12.34          0.975
coinrun           9.80          0.960
dodgeball        18.36          0.963
fruitbot         25.15          0.786
heist            10.00          1.000
jumper            9.20          0.911
leaper            9.90          0.988
maze             10.00          1.000
miner            12.27          0.937
ninja             8.60          0.785
plunder          29.46          0.979
starpilot        33.15          0.498

The mean normalized score over all games was 0.882. It stayed relatively constant throughout the 100 episodes, i.e. I didn't improve much while playing.

I'm not sure how useful this result is as a "human benchmark" though - I could easily reach a score of ~1.000 given enough time to think on each frame. Also, human visual reaction time is ~250 ms, which at 15 fps translates to being at least 4 frames behind on our actions; that can matter for games like starpilot, chaser and some others.
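For reference, the normalized numbers above follow the standard Procgen min-max normalization, (R - Rmin) / (Rmax - Rmin), with the per-game easy-mode constants from the Procgen paper. A tiny sketch (the coinrun constants are quoted from memory and may be off):

def normalized_reward(mean_reward, r_min, r_max):
    # Procgen-style min-max normalization of a raw mean episode reward.
    return (mean_reward - r_min) / (r_max - r_min)

# coinrun (easy mode) uses roughly r_min = 5, r_max = 10:
print(normalized_reward(9.80, 5, 10))  # -> 0.96, matching the table above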

How to save rollout video / render?

Over 4 years ago

That worked, thank you!

How to save rollout video / render?

Over 4 years ago

Does it work properly for everyone else? When I run it for 100 episodes, it only saves episodes 0, 1, 8, 27 and 64.
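(That pattern of episode numbers is what gym's Monitor wrapper produces by default: it records only perfect-cube episodes. If that wrapper is what the rollout script uses here, passing a custom video_callable makes it record every episode. A sketch against the old gym.wrappers.Monitor API, with the environment name chosen purely as an example:)

import gym
from gym.wrappers import Monitor

env = gym.make('procgen:procgen-coinrun-v0')
# Default schedule records episodes 0, 1, 8, 27, 64, ...; override it:
env = Monitor(env, directory='./videos', force=True,
              video_callable=lambda episode_id: True)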

Same marks on the testing video

Over 4 years ago

It's the paint_vel_info flag, which you can find under env_config in the .yaml files. There are also some flags that are not in the .yaml files but that people are using (use_monochrome_assets, use_backgrounds). You can find all of them if you scroll down here: https://github.com/openai/procgen.
Should we actually be allowed to change the environment? Maybe these settings should be reset when doing evaluation?
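For anyone experimenting locally, the same flags can also be passed straight to gym.make; the values below are just examples, not the competition defaults:

import gym

env = gym.make(
    "procgen:procgen-coinrun-v0",
    distribution_mode="easy",
    paint_vel_info=True,        # the velocity-info squares discussed above
    use_backgrounds=True,
    use_monochrome_assets=False,
)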

Unity Obstacle Tower Challenge

Submissions are stuck

Over 5 years ago

There was a mention that the final standings for round 2 would be based on more than 5 seeds, to get a proper average of performance. Is that going to happen? For that reason I didn't try to repeatedly submit similar models to overfit the 5 seeds.

Is there any due date of GCP credit?

Over 5 years ago

Mine says it expires 28 May 2020; not sure if that's a set date or depends on when you redeem it. I can't find the date when I redeemed mine.

Successful submissions do not appear on the leaderboard

Over 5 years ago

Is the debug option off?

What reward does the agent receive for collecting a key?

Over 5 years ago

0.1, the same as a single door (there are 2 doors in each doorway).

Announcement: Debug your submissions

Over 5 years ago

And I was thinking I was going mad when my previously working submission suddenly broke after "disabling" debug 🙂

Submission Failed: Evaluation Error

Over 5 years ago

Can't wait! I've been trying to get my dopamine-trained agent scored (only 5-7 floors so far), but the only response I get after every change is
The following containers terminated prematurely. : agent
and it's not very helpful. It builds fine, but gets stuck at the evaluation phase.

Human Performance

Almost 6 years ago

In the Obstacle Tower paper there is a section on human performance: 15 people tried it multiple times and the maximum floor reached was 22. Am I reading this right? I finished all 25 floors on my very first try without much trouble.
How far did everyone else get, and how many runs did you do? We could try collecting more data and build a more accurate human benchmark this way.
