karolisram (Karolis Ramanauskas)

Location: GB
Followers: 0 · Following: 0
Badges: 5 · 2 · 2

Activity
[contribution heatmap, Dec to Dec, not captured]

Ratings Progression
[chart not captured]

Challenge Categories
[chart not captured]

Challenges Entered

- (challenge title not captured): no submissions
- (challenge title not captured): graded #197092, #196866, #196863
- ASCII-rendered single-player dungeon crawl game: graded #149038, #148738, #147238
- (challenge title not captured): graded #149687
- Sample Efficient Reinforcement Learning in Minecraft: no submissions
- Measure sample efficiency and generalization in reinforcement learning using procedurally generated environments: graded #93478, #93477, #93390
- Multi-Agent Reinforcement Learning on Trains: no submissions
- Sample-efficient reinforcement learning in Minecraft: graded #120617, #120492; failed #120483
- Sample-efficient reinforcement learning in Minecraft: graded #25413, #25412, #25075
- Multi-Agent Reinforcement Learning on Trains: no submissions
- A new benchmark for Artificial Intelligence (AI) research in Reinforcement Learning: graded #8563, #8534; failed #8533
- Predict if users will skip or listen to the music they're streamed: no submissions
- Multi-Agent Reinforcement Learning on Trains: no submissions

NeurIPS 2021: MineRL Diamond Competition

MineRL self._actions

About 3 years ago

Ah, I see the issue now. I think the confusion comes from line 121 in RL_plus_script.py:
[('forward', 1), ('jump', 1)]
This line doesn't define two actions (forward on the first tick, then jump on the next tick). Instead, it means that the forward and jump keys are both pressed during a single tick.

You can see that by printing out act = env.action_space.noop():

OrderedDict([('attack', 0),
             ('back', 0),
             ('camera', array([0., 0.], dtype=float32)),
             ('forward', 0),
             ('jump', 0),
             ('left', 0),
             ('right', 0),
             ('sneak', 0),
             ('sprint', 0)])

This is a single action that does nothing, because none of the keys are pressed. If you then do:

act['forward'] = 1
act['jump'] = 1

act will become an action with those two buttons pressed. This is what the ActionShaping() wrapper does. To create meta-actions that perform 5 attacks in a row and the like, you will need something else; frame skipping might be an easier way to achieve that (a rough sketch of an action-repeat wrapper is below).
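As an illustration of the frame-skipping idea only (this is not part of the baseline code, and the wrapper name is made up), something like this repeats whatever action the agent picks for several ticks:

import gym

class ActionRepeat(gym.Wrapper):
    # Repeat each chosen action for `repeat` environment ticks.
    def __init__(self, env, repeat=5):
        super().__init__(env)
        self.repeat = repeat

    def step(self, action):
        total_reward = 0.0
        for _ in range(self.repeat):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:
                break
        return obs, total_reward, done, info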

MineRL self._actions

About 3 years ago

The docstring of the ActionShaping() class should be enough to figure out how to adjust the actions for the RL part of the algorithm. What changes do you want to make, and what have you tried?
Maybe playing Minecraft for a bit or watching a YouTube guide would help with the Minecraft knowledge?

Questions about the environment that can be used to train the model

About 3 years ago

Yes, you can use the *DenseVectorObf environments in the Research track of the competition.

Discord invite invalid

Over 3 years ago

Good catch, thank you! The links have been fixed.

NeurIPS 2020: MineRL Competition

Obfuscated actions + KMeans analysis

Over 3 years ago

Here’s some analysis our team did on the whole obfuscated action + KMeans thing:


A teaser: sometimes the agents don't have a single action to look up. So shy 🙂
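For context, the kind of clustering referred to above can be reproduced roughly like this. This is only a sketch: the environment name, the value of k, and the iteration details are assumptions, not our exact setup.

import numpy as np
import minerl
from sklearn.cluster import KMeans

# Load the human demonstrations for an obfuscated environment (name assumed).
data = minerl.data.make('MineRLObtainDiamondVectorObf-v0')

# Collect the 64-dimensional obfuscated action vectors.
actions = []
for _, action, _, _, _ in data.batch_iter(batch_size=32, seq_len=1, num_epochs=1):
    actions.append(action['vector'].reshape(-1, 64))
actions = np.concatenate(actions)

# Cluster them; an agent can then act by picking one of the k centroids.
kmeans = KMeans(n_clusters=30, random_state=0).fit(actions)
print(kmeans.cluster_centers_.shape)  # (30, 64)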

Error using gym.make

Over 4 years ago

Working Colab example (credit to @tviskaron):

!java -version
!sudo apt-get purge openjdk-*
!java -version
!sudo apt-get install openjdk-8-jdk

!pip3 install --upgrade minerl
!sudo apt-get install xvfb xserver-xephyr vnc4server
!sudo pip install pyvirtualdisplay

from pyvirtualdisplay import Display
display = Display(visible=0, size=(640, 480))
display.start()

import minerl
import gym
env = gym.make('MineRLNavigateDense-v0')

obs = env.reset()
done = False
net_reward = 0

for _ in range(100):
    action = env.action_space.noop()

    # Steer toward the compass target and keep moving, jumping and attacking.
    action['camera'] = [0, 0.03 * obs["compassAngle"]]
    action['back'] = 0
    action['forward'] = 1
    action['jump'] = 1
    action['attack'] = 1

    obs, reward, done, info = env.step(action)

    net_reward += reward
    print("Total reward: ", net_reward)

env.close()

NeurIPS 2020: Procgen Competition

How to find subtle implementation details

Almost 4 years ago

It could be the weight initialization: PyTorch uses he_uniform (Kaiming uniform) by default, while TensorFlow uses glorot_uniform (Xavier uniform). Using TensorFlow with glorot_uniform I get a score of 42 on starpilot, while using TensorFlow with he_uniform I get 19.
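If you want to check this from the PyTorch side, a small sketch of forcing Glorot initialization onto a model (the helper name and the toy network are made up for illustration):

import torch.nn as nn

def use_glorot_uniform(module):
    # Re-initialize Conv2d/Linear weights with Xavier (Glorot) uniform,
    # matching TensorFlow's default, instead of PyTorch's Kaiming uniform.
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        nn.init.xavier_uniform_(module.weight)
        if module.bias is not None:
            nn.init.zeros_(module.bias)

# Toy policy network for a 64x64x3 Procgen frame with 15 actions.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(16 * 62 * 62, 15),
)
model.apply(use_glorot_uniform)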

Round 2 is open for submissions 🚀

About 4 years ago

Sounds good, thanks @shivam. Could you please also give us the normalization factors (Rmin, Rmax) for the 4 private envs?

Round 2 is open for submissions 🚀

About 4 years ago

Will we be able to choose which submission is used for the final 16+4 evaluation? It might be the case that our best solution tested locally on the 16 envs is not the same as the best one for the 6+4 envs on the public LB.

Human score

About 4 years ago

So I was a little bored and decided to see how well I could play the procgen games myself.

Setup:

python -m procgen.interactive --distribution-mode easy --vision agent --env-name coinrun

First I tried each game for 5-10 episodes to figure out what the keys do, how the game works, etc.
Then I played each game 100 times and logged the rewards. Here are the results:

Environment   Mean reward   Mean normalized reward
bigfish          29.40          0.728
bossfight        10.15          0.772
caveflyer        11.69          0.964
chaser           11.23          0.859
climber          12.34          0.975
coinrun           9.80          0.960
dodgeball        18.36          0.963
fruitbot         25.15          0.786
heist            10.00          1.000
jumper            9.20          0.911
leaper            9.90          0.988
maze             10.00          1.000
miner            12.27          0.937
ninja             8.60          0.785
plunder          29.46          0.979
starpilot        33.15          0.498

The mean normalized score over all games was 0.882. It stayed relatively constant throughout the 100 episodes, i.e. I didn't improve much while playing.

I'm not sure how useful this result is as a "human benchmark" though - I could easily reach a score of ~1.000 given enough time to think on each frame. Also, human visual reaction time is ~250 ms, which at 15 fps translates to being at least 4 frames behind on our actions; that can matter for games like starpilot, chaser and some others.
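For reference, the normalized numbers above follow the standard Procgen min-max normalization, (R - Rmin) / (Rmax - Rmin), with the per-game easy-mode constants from the Procgen paper. A tiny sketch (the coinrun constants are quoted from memory and may be off):

def normalized_reward(mean_reward, r_min, r_max):
    # Procgen-style min-max normalization of a raw mean episode reward.
    return (mean_reward - r_min) / (r_max - r_min)

# coinrun (easy mode) uses roughly r_min = 5, r_max = 10:
print(normalized_reward(9.80, 5, 10))  # -> 0.96, matching the table above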

How to save rollout video / render?

Over 4 years ago

That worked, thank you!

How to save rollout video / render?

Over 4 years ago

Does it work properly for everyone else? When I run it for 100 episodes, it only saves episodes 0, 1, 8, 27 and 64.
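(That pattern of episode numbers is what gym's Monitor wrapper produces by default: it records only perfect-cube episodes. If that wrapper is what the rollout script uses here, passing a custom video_callable makes it record every episode. A sketch against the old gym.wrappers.Monitor API, with the environment name chosen purely as an example:)

import gym
from gym.wrappers import Monitor

env = gym.make('procgen:procgen-coinrun-v0')
# Default schedule records episodes 0, 1, 8, 27, 64, ...; override it:
env = Monitor(env, directory='./videos', force=True,
              video_callable=lambda episode_id: True)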

Same marks on the testing video

Over 4 years ago

It's the paint_vel_info flag, which you can find under env_config in the .yaml files. There are also some flags that are not in the .yaml files but that people are using (use_monochrome_assets, use_backgrounds). You can find all of them if you scroll down here: https://github.com/openai/procgen.
Should we actually be allowed to change the environment? Maybe these settings should be reset when doing evaluation?
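For anyone experimenting locally, the same flags can also be passed straight to gym.make; the values below are just examples, not the competition defaults:

import gym

env = gym.make(
    "procgen:procgen-coinrun-v0",
    distribution_mode="easy",
    paint_vel_info=True,        # the velocity-info squares discussed above
    use_backgrounds=True,
    use_monochrome_assets=False,
)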

Unity Obstacle Tower Challenge

Submissions are stuck

Over 5 years ago

There was a mention that the final standings for round 2 would be based on more than 5 seeds, to get a proper average of performance. Is that going to happen? For that reason I didn't try to repeatedly submit similar models to overfit the 5 seeds.

Is there any due date of GCP credit?

Over 5 years ago

Mine says it expires 28 May 2020; not sure if that's a set date or depends on when you redeem it. I can't find the date when I redeemed mine.

Successful submissions do not appear on the leaderboard

Over 5 years ago

Is the debug option off?

What reward does the agent receive for collecting a key?

Over 5 years ago

0.1, the same as a single door (there are 2 doors in each doorway).

Announcement: Debug your submissions

Over 5 years ago

And I was thinking I was going mad when my previously working submission suddenly broke after "disabling" debug 🙂

Submission Failed: Evaluation Error

Over 5 years ago

Can't wait! I've been trying to get my dopamine-trained agent scored (only 5-7 floors so far), but the only response I get after every change is
The following containers terminated prematurely. : agent
and it's not very helpful. It builds fine, but gets stuck at the evaluation phase.

Human Performance

Almost 6 years ago

In the Obstacle Tower paper there is a section on human performance: 15 people tried it multiple times and the maximum floor reached was 22. Am I reading this right? I finished all 25 floors on my very first try without much trouble.
How far did everyone else get, and how many runs did you do? We could try collecting more data and build a more accurate human benchmark this way.
