IITM RL Final Project
BSuite Benchmark for Reinforcement Learning
This notebook uses an open-source reinforcement learning benchmark known as bsuite.
https://github.com/deepmind/bsuite
BSuite is a collection of carefully designed experiments that investigate core capabilities of reinforcement learning agents.
Your task is to use reinforcement learning techniques of your choice to achieve high scores on the specified environments, subject to the restriction below.
Note: Since this is a course on reinforcement learning, please limit yourself to traditional reinforcement learning algorithms.
Do not use deep reinforcement learning.
You will be implementing a traditional RL algorithm to solve 3 environments.
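As one example of a traditional algorithm, the sketch below implements tabular Q-learning with epsilon-greedy exploration, plus a training loop for environments with a dm_env-style interface (`reset()`, `step(action)`, and time steps exposing `observation`, `reward`, and `last()`), which is the interface bsuite environments follow. The class `QLearningAgent`, the function `run_episode`, the `to_state` callback, and all hyperparameter values are hypothetical illustrations, not part of the starter kit.

```python
import random
from collections import defaultdict


class QLearningAgent:
    """Tabular Q-learning with epsilon-greedy exploration (illustrative sketch)."""

    def __init__(self, num_actions, alpha=0.1, gamma=0.99, epsilon=0.1, seed=0):
        self.num_actions = num_actions
        self.alpha = alpha      # step size
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # probability of taking a random action
        self.rng = random.Random(seed)
        # Unseen states start with Q = 0 for every action.
        self.q = defaultdict(lambda: [0.0] * num_actions)

    def select_action(self, state):
        # Explore with probability epsilon; otherwise act greedily, breaking ties at random.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.num_actions)
        values = self.q[state]
        best = max(values)
        return self.rng.choice([a for a, v in enumerate(values) if v == best])

    def update(self, state, action, reward, next_state, done):
        # Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
        target = reward if done else reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])


def run_episode(env, agent, to_state):
    """Run one learning episode on an environment with a dm_env-style interface."""
    timestep = env.reset()
    state = to_state(timestep.observation)
    total_reward = 0.0
    while not timestep.last():
        action = agent.select_action(state)
        timestep = env.step(action)
        next_state = to_state(timestep.observation)
        # Every episode end is treated as terminal here (no bootstrapping past it).
        agent.update(state, action, timestep.reward, next_state, timestep.last())
        total_reward += timestep.reward
        state = next_state
    return total_reward
```

With bsuite installed, `bsuite.load_from_id('catch/0')` returns an environment of this kind; `to_state` must map each observation to a hashable key, for example a tuple of discretized values. Treating every episode end as terminal is a simplification: for episodes cut off by a time limit, bootstrapping from the last state is a common alternative.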
Environment 1: CATCH
In this environment, the agent must move a paddle to intercept falling balls. Each ball falls straight down within its own column.
The observation is an array of shape (rows, columns) with binary values: 0 if a cell is empty; 1 if it contains the paddle or a ball.
There are 3 discrete actions: ['stay', 'left', 'right'].
The episode terminates when the ball reaches the bottom of the screen.
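Because the Catch observation is a small binary grid, a tabular method only needs to know where the ball and the paddle are. The sketch below compresses the grid into a hashable `(ball_row, ball_col, paddle_col)` key. It assumes, beyond what is stated above, that the paddle occupies the bottom row and that at most one ball is on screen at a time; `catch_state` is a hypothetical helper name.

```python
import numpy as np


def catch_state(observation):
    """Map a binary Catch grid to a hashable (ball_row, ball_col, paddle_col) key.

    Assumes the paddle sits in the bottom row and only one ball is on screen.
    """
    grid = np.asarray(observation)
    ball_cells = np.argwhere(grid[:-1] > 0)  # (row, col) of 1s above the bottom row
    if len(ball_cells) == 0:
        # The ball has reached the bottom row, so the episode is over.
        return "terminal"
    ball_row, ball_col = (int(v) for v in ball_cells[0])
    paddle_col = int(np.argmax(grid[-1]))    # column of the 1 in the bottom row
    return (ball_row, ball_col, paddle_col)
```

For a 10x5 grid this gives at most 9 * 5 * 5 = 225 non-terminal keys, small enough for a Q-table. A further design option is to use the relative offset `ball_col - paddle_col` instead of both columns, which shrinks the table and lets the agent share experience across positions.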
Environment 2: CARTPOLE
This environment implements a version of the classic Cartpole task, where the cart has to counter the movements of the pole to prevent it from falling over.
The observation is a vector representing: (x, x_dot, sin(theta), cos(theta), theta_dot, time_elapsed)
There are 3 discrete actions: ['left', 'stay', 'right'].
Episodes start with the pole close to upright. Episodes end when the pole falls, the cart falls off the table, or the max_time is reached.
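The Cartpole observation is continuous, so a tabular method needs it discretized first. A minimal sketch: recover the pole angle theta from (sin(theta), cos(theta)) with `atan2`, then bucket each variable with `np.digitize`. The bin edges below are hypothetical placeholders rather than values from bsuite, ignoring `time_elapsed` is a simplifying assumption, and `cartpole_state` is an illustrative name.

```python
import math

import numpy as np

# Hypothetical bin edges (not taken from bsuite); tune them to the ranges you observe.
X_EDGES = np.linspace(-1.0, 1.0, 5)
X_DOT_EDGES = np.linspace(-2.0, 2.0, 5)
THETA_EDGES = np.linspace(-0.5, 0.5, 7)       # radians
THETA_DOT_EDGES = np.linspace(-2.0, 2.0, 7)


def cartpole_state(observation):
    """Bucket (x, x_dot, sin(theta), cos(theta), theta_dot, time_elapsed) into a tuple key."""
    # ravel() accepts either a flat 6-vector or a (1, 6) array.
    x, x_dot, sin_theta, cos_theta, theta_dot, _time_elapsed = np.ravel(observation)
    theta = math.atan2(sin_theta, cos_theta)  # pole angle in radians
    return (
        int(np.digitize(x, X_EDGES)),
        int(np.digitize(x_dot, X_DOT_EDGES)),
        int(np.digitize(theta, THETA_EDGES)),
        int(np.digitize(theta_dot, THETA_DOT_EDGES)),
    )
```

Each index ranges from 0 to the number of edges, so these edges yield 6 * 6 * 8 * 8 = 2304 possible keys. Finer bins near upright (small theta) usually matter more than fine bins on cart position.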
Environment 3: MOUNTAIN CAR
This environment implements a version of the classic Mountain Car problem, where an underpowered car must drive up a hill.
The observation is a vector representing: (x, x_dot, time_elapsed)
There are 3 discrete actions: ['push left', 'no push', 'push right'].
Episodes start with the car at the bottom of the hill with zero velocity. An episode ends when the car reaches position x = 0.5 or after 1000 steps, whichever comes first.
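A common traditional approach to Mountain Car is linear SARSA over tile-coded features: several offset grids cover the (x, x_dot) plane, each state activates one tile per grid, and Q(s, a) is the sum of the weights of the active tiles. The sketch below is hand-rolled, not a library API; the state bounds are assumed from the classic Mountain Car formulation and are not confirmed for bsuite's version, and all names and hyperparameters are hypothetical.

```python
import numpy as np

# Assumed bounds from the classic Mountain Car formulation; verify them
# against the bsuite environment before use.
X_MIN, X_MAX = -1.2, 0.6
V_MIN, V_MAX = -0.07, 0.07


class TileCoder:
    """Several uniformly offset grids over (x, x_dot); a hand-rolled sketch."""

    def __init__(self, num_tilings=8, tiles_per_dim=8):
        self.num_tilings = num_tilings
        self.tiles_per_dim = tiles_per_dim
        # One extra tile per dimension so that shifted grids still cover the whole range.
        self.tiles_per_tiling = (tiles_per_dim + 1) ** 2
        self.num_features = num_tilings * self.tiles_per_tiling

    def active_tiles(self, x, x_dot):
        # Clip into the assumed bounds, then scale each dimension to [0, tiles_per_dim].
        x = float(np.clip(x, X_MIN, X_MAX))
        x_dot = float(np.clip(x_dot, V_MIN, V_MAX))
        sx = (x - X_MIN) / (X_MAX - X_MIN) * self.tiles_per_dim
        sv = (x_dot - V_MIN) / (V_MAX - V_MIN) * self.tiles_per_dim
        tiles = []
        for t in range(self.num_tilings):
            offset = t / self.num_tilings  # each tiling is shifted by a fraction of a tile
            ix, iv = int(sx + offset), int(sv + offset)
            tiles.append(t * self.tiles_per_tiling + ix * (self.tiles_per_dim + 1) + iv)
        return tiles


def q_value(weights, tiles, action):
    # Linear Q-value: sum of the weights of the active tiles for this action.
    return weights[action, tiles].sum()


def sarsa_update(weights, tiles, action, reward, next_tiles, next_action, done,
                 alpha=0.1, gamma=1.0):
    """One-step SARSA on tile-coded features; alpha is split across the active tiles."""
    target = reward if done else reward + gamma * q_value(weights, next_tiles, next_action)
    error = target - q_value(weights, tiles, action)
    weights[action, tiles] += alpha / len(tiles) * error
```

The weights would live in an array of shape `(3, coder.num_features)`, one row per action. Because neighbouring states share most of their tiles, an update at one state generalizes to nearby positions and velocities, which a plain lookup table cannot do.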
Each environment also has a noise variant, which adds scaled random noise to the received rewards. See the BSuite paper for more details.
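To see why a traditional agent can still learn under reward noise, the sketch below assumes zero-mean Gaussian noise with a fixed scale (the exact noise model and scales are defined by bsuite and are not reproduced here). It then tracks the mean reward with the constant step-size rule V <- V + alpha * (r - V), the same incremental form used in Q-learning and SARSA updates. The function names are hypothetical.

```python
import numpy as np


def noisy_reward(reward, noise_scale, rng):
    """Return the reward plus zero-mean Gaussian noise (assumed noise model)."""
    return reward + noise_scale * rng.standard_normal()


def constant_step_estimate(true_reward, noise_scale, steps, alpha, seed=0):
    """Track the mean of noisy rewards with the incremental rule V <- V + alpha * (r - V)."""
    rng = np.random.default_rng(seed)
    estimate = 0.0
    for _ in range(steps):
        estimate += alpha * (noisy_reward(true_reward, noise_scale, rng) - estimate)
    return estimate
```

Because zero-mean noise averages out, the estimate settles near the true reward; a smaller `alpha` lowers the variance of the estimate at the cost of slower learning, which is the main trade-off to tune on the noise variants.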
🚀 Submission
Before submitting, make sure to accept the rules.
Open the starter kit notebook and follow its instructions to implement your agent.
🎯 Scoring
We use BSuite's scoring system to compute a score for each environment. The final score is the sum of the scores across all test environments.