AIcrowd | Multi Agent Behavior Challenge 2022

Round 1: Completed

Round 2: Completed

AIcrowd & MABe Team

Problem Statements

Weight: 25.0

MABe 2022: Mouse-Triplets - Video Data

Round 2 - Active | Claim AWS Credits by beating the baseline

Weight: 25.0

MABe 2022: Ant-Beetles - Video Data

Round 2 - Active | Claim AWS Credits by beating the baseline

Weight: 25.0

MABe 2022: Mouse Triplets

Round 1 - Completed

💥 Round 2 Is Live: Mouse Triplet Video Data and Ant Beetle Video Data
🏅 Win up to $400 in AWS Credits per Team

🗂 New Baselines | 📹 Town Hall Recording
📻 Join Community Slack Channel

🪰 Fruit Fly Baseline | 🐁 Mouse Triplet Baseline

👀 Overview

Can you learn a representation of multi-agent behavior from trajectory and video data?

Representation learning has transformed our understanding of data in domains such as images and language. To study behavioral representations, we have curated trajectory and video data of multi-agent animal behavior from three settings. The goal is to learn behavioral representations that can be applied effectively to a variety of downstream behavior analysis tasks.

Interactions between agents are crucial for studying multi-agent behavior, since it is difficult to understand the behavior of an individual without considering its interactions with the group. In each of our three settings, the video and trajectory data consist of multiple interacting agents, so the learned representations will need to capture both individual and group behavior.

Join our Computational Behavior Slack to discuss the challenge, ask questions, find teammates, or chat with the organizers!

⚔️ Problem statement

Behavior is our oldest window into the inner workings of the brain; it is our first, and often only, indicator of nervous system function and dysfunction.

Computer vision and machine learning have revolutionized the study of behavior: emerging efforts have used machine learning to automate behavior analysis in neuroscience, analyze football players and team strategies, and develop safer autonomous vehicles. Animal behavior modeling in particular has been important in supporting conservation work, tracking vectors of infectious disease, accelerating drug screening, and monitoring threatened pollinator species.

If asked to identify the actions of animals around you, you would have no trouble pointing out examples of behavior: we can watch pets or wildlife play, groom, sleep, and forage. But automating the detection of those behaviors remains a challenge. One option is to train supervised algorithms to detect the behaviors we want to study, but doing so relies on manually labeled training data that is costly and time-consuming to produce.

An alternative is to take a data-driven approach: given many examples of naturally behaving agents, try to learn a vocabulary, or a set of dimensions, that describes their actions purely from observation. In machine learning, this is a representation learning problem.

[Figure: diagram of the problem statement]

In this Challenge, you will be given a dataset of tracking data or videos of socially interacting animals: specifically, trios of mice, groups of flies, or symbiotic ant/beetle pairs. Rather than being asked to detect a specific behavior of interest, we ask you to submit a frame-by-frame representation of the dataset—for example, a low-dimensional embedding of animals' trajectories over time. (For inspiration, you can read about a few existing methods for embedding behavior of individual animals here, here, here, and here.)
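As one simple illustration of a frame-by-frame representation, the sketch below projects each frame's keypoints onto their top principal components (PCA). It is not the challenge baseline; the `(n_frames, n_animals, n_keypoints, 2)` input layout and the function name are assumptions, and missing keypoints are assumed to have been filled in beforehand.

```python
import numpy as np


def pca_frame_embedding(keypoints: np.ndarray, n_components: int) -> np.ndarray:
    """Embed each frame by projecting its flattened keypoints onto the top principal axes.

    keypoints: assumed shape (n_frames, n_animals, n_keypoints, 2), x/y coordinates,
               with no NaN values (fill missing detections first).
    Returns a float32 array of shape (n_frames, n_components).
    """
    n_frames = keypoints.shape[0]
    # Flatten all animals' keypoints into one feature vector per frame,
    # so the embedding sees the whole group at once.
    x = keypoints.reshape(n_frames, -1).astype(np.float64)
    # Center each feature; PCA directions are computed on centered data.
    x_centered = x - x.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(x_centered, full_matrices=False)
    return (x_centered @ vt[:n_components].T).astype(np.float32)
```

Because PCA treats every frame independently, it ignores temporal structure; it is only a starting point against which learned, context-aware representations can be compared.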

To evaluate the quality of your learned representations, we will take a practical approach: we'll use representations as input to train single-layer neural networks for many different "hidden" tasks (each task will have its own neural network), such as detecting the occurrence of experimenter-defined actions or distinguishing between two different strains of mice. The goal is therefore to create a representation that captures behavior and generalizes well in any downstream task.

🗄 The Data

Data format

The first phase of this challenge includes two tasks, the Fly Pose Task and the Mouse Pose Task, both aimed at creating representations of animal behavior from keypoint-based pose estimates. The two tasks have slightly different data formats: please check their respective pages above for details on the data formats.

Submission Train and Test Data

The data comprises two sets:

UserTrain - This contains pose sequences to be used for training or handcrafting your behavioral representation. It also contains labels for some of the subtasks that will be trained during submission; you can use these to evaluate the quality of your embedding.
SubmissionClips - This contains only pose sequences and should be used to generate your team's submission. You should submit a learned representation for each sequence in this dataset. Behind the scenes, these clips belong to one of three groups (see Submission Internal Flow for details); we don't indicate which clips are used for which set.
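One way to use the UserTrain labels to check an embedding locally is a linear probe: fit a simple classifier on the embedding and measure held-out accuracy. The sketch below is a hypothetical helper, not part of the challenge tooling. It assumes per-frame embeddings and binary per-frame labels as NumPy arrays and uses closed-form ridge regression onto ±1 targets. The hidden evaluation trains its own neural networks, so treat this only as a rough sanity check.

```python
import numpy as np


def linear_probe_accuracy(emb_train, y_train, emb_val, y_val, ridge=1e-3):
    """Fit a ridge-regularized linear classifier on embeddings; return validation accuracy.

    emb_*: (n_frames, dim) per-frame embeddings.
    y_*:   (n_frames,) binary labels in {0, 1} (assumed label encoding).
    """
    def add_bias(x):
        # Append a constant column so the classifier can learn an offset.
        return np.hstack([x, np.ones((x.shape[0], 1))])

    x = add_bias(emb_train.astype(np.float64))
    # Regress onto targets in {-1, +1}; the sign of the prediction gives the class.
    t = np.where(y_train == 1, 1.0, -1.0)
    # Closed-form ridge solution: w = (X^T X + ridge * I)^(-1) X^T t
    w = np.linalg.solve(x.T @ x + ridge * np.eye(x.shape[1]), x.T @ t)
    pred = (add_bias(emb_val.astype(np.float64)) @ w > 0).astype(int)
    return float(np.mean(pred == y_val))
```

If an embedding scores well on such probes across several UserTrain label sets, it is more likely (though not guaranteed) to do well on the hidden subtasks.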

⏰ Timeline

Round 1: February 9th - April 11th, 2022

Round 2: April 11th - July 3rd, 2022 (Updated from May 20th)

🏆 Prizes

The total prize pool for the competition is $12,000 USD.

Round 1 - Pose Data

The cash prize pool for Fruit Flies Round 1 is $3,000 USD total:

🥇 1st on the leaderboard: $1500 USD
🥈 2nd on the leaderboard: $1000 USD
🥉 3rd on the leaderboard: $500 USD

The cash prize pool for Mouse Triplets Round 1 is $3,000 USD total:

🥇 1st on the leaderboard: $1500 USD
🥈 2nd on the leaderboard: $1000 USD
🥉 3rd on the leaderboard: $500 USD

Round 2 - Video Data

The cash prize pool for Mouse Triplets Video Data is $3,000 USD total:

🥇 1st on the leaderboard: $1500 USD
🥈 2nd on the leaderboard: $1000 USD
🥉 3rd on the leaderboard: $500 USD

The cash prize pool for Ant Beetle Video Data is $3,000 USD total:

🥇 1st on the leaderboard: $1500 USD
🥈 2nd on the leaderboard: $1000 USD
🥉 3rd on the leaderboard: $500 USD

🎯 Scoring

We use a version of the Borda count method to rank team performance on each hidden evaluation subtask. For each subtask, the top-performing submission receives a Gold rank (3 points), the second receives a Silver rank (2 points), and the third receives a Bronze rank (1 point). Each team's points are then summed across all subtasks to give the team's final score, which determines its overall rank for that task.

🚀 Submission

You are required to submit per-frame embedding arrays for each clip in the SubmissionClips dataset. Any temporal information you wish to incorporate into the embeddings must therefore be encoded in each frame's embedding vector.
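For example, temporal context can be folded into each frame's embedding by appending the mean of that frame's features over a short centered window. The helper below is hypothetical, not an official method. The window is truncated at the start and end of each clip, so every frame's vector uses only frames from its own clip.

```python
import numpy as np


def add_temporal_context(frame_features: np.ndarray, half_window: int) -> np.ndarray:
    """Append to each frame the mean of its features over a centered window.

    frame_features: (n_frames, d) features for ONE clip.
    The window for frame t is [t - half_window, t + half_window], clipped to the clip.
    Returns a float32 array of shape (n_frames, 2 * d).
    """
    n_frames, d = frame_features.shape
    x = frame_features.astype(np.float64)
    # Prefix sums (with a leading zero row) give each window sum in O(1).
    csum = np.vstack([np.zeros((1, d)), np.cumsum(x, axis=0)])
    t = np.arange(n_frames)
    start = np.clip(t - half_window, 0, n_frames)
    end = np.clip(t + half_window + 1, 0, n_frames)
    # Sum of rows start..end-1 divided by the (boundary-truncated) window length.
    window_mean = (csum[end] - csum[start]) / (end - start)[:, None]
    return np.hstack([x, window_mean]).astype(np.float32)
```

Concatenation doubles the embedding size, so `half_window` and the base features must be chosen with each task's maximum embedding dimension in mind.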

Details of the embedding arrays:

Data type should be Float32.
NaN or Inf values are not allowed.
Each task has a maximum allowed embedding dimension.
All embeddings for a task must be the same size.
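Before submitting, a quick local check against these rules can catch a bad upload early. The sketch below assumes a hypothetical `{clip_id: array}` layout with one `(n_frames, dim)` array per clip; the real submission format is defined in each task's Baseline notebook, and `max_dim` must be set to the task's actual limit.

```python
import numpy as np


def check_embeddings(embeddings: dict, max_dim: int) -> None:
    """Raise ValueError if any clip's embedding breaks the listed submission rules.

    embeddings: {clip_id: np.ndarray of shape (n_frames, dim)} (assumed layout).
    max_dim:    the task's maximum embedding dimension (task-specific; supplied by you).
    """
    dims = set()
    for clip_id, emb in embeddings.items():
        if not isinstance(emb, np.ndarray) or emb.ndim != 2:
            raise ValueError(f"{clip_id}: expected a 2-D array (frames x dims)")
        # Rule: data type must be float32.
        if emb.dtype != np.float32:
            raise ValueError(f"{clip_id}: dtype is {emb.dtype}, expected float32")
        # Rule: no NaN or Inf values.
        if not np.all(np.isfinite(emb)):
            raise ValueError(f"{clip_id}: contains NaN or Inf values")
        dims.add(emb.shape[1])
    # Rule: all embeddings for a task must be the same size.
    if len(dims) > 1:
        raise ValueError(f"embedding sizes differ across clips: {sorted(dims)}")
    # Rule: each task has a maximum allowed embedding dimension.
    if dims and max(dims) > max_dim:
        raise ValueError(f"embedding size {max(dims)} exceeds maximum {max_dim}")
```

A common source of failure is NumPy's default float64 dtype; calling `.astype(np.float32)` on each array before saving avoids it.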

Please refer to the Baseline notebook in the Notebooks section of each task for an example of the required format.

Submission Internal Flow

Your submitted representations of the sequences in SubmissionClips are split into three sets: SubmissionTrain, PublicTest, and PrivateTest:

SubmissionTrain representations are used to train linear classifiers on an ensemble of hidden subtasks. The hidden subtasks involve decoding behaviorally relevant information from your representations, like when an experimenter-defined action of interest is occurring or what strain the animals are from.
PublicTest representations are a test set for the classifiers trained on SubmissionTrain. Classifier performance on PublicTest will determine your rank on the public leaderboard.
PrivateTest representations, together with PublicTest, form the final test set used to determine the winners of a task. Omitting a small number of sequences from the public leaderboard calculation ensures that teams can't game the system by resubmitting a model in hopes of getting a better score or seed.

The SubmissionTrain set is split 90/10 into training and validation sets, using three different random seeds. For each seed, the best model is found by a grid search over the learning rate and the number of hidden units of a two-layer neural network. Each subtask is trained for 20 epochs with its own independent neural network.

The best model from each of the three seeds is then evaluated on the PublicTest set for the public leaderboard scores, and on the combined PublicTest and PrivateTest sets for the private leaderboard. The scores from the three seeds are averaged to give the submission's score.
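The per-subtask protocol described above (90/10 split per seed, grid search, averaging over three seeds) can be sketched as follows. Training itself is abstracted into a hypothetical `train_and_score` callable, assumed to train one two-layer network for 20 epochs and return its validation and test scores. The seed values, the shuffling scheme, and the assumption that higher scores are better are illustrative choices, not the organizers' code.

```python
import random
from statistics import mean


def evaluate_subtask(n_items, train_and_score, grid, seeds=(0, 1, 2), val_fraction=0.1):
    """Score one subtask: best-on-validation model per seed, averaged over seeds.

    n_items:         number of SubmissionTrain items to split (hypothetical unit).
    train_and_score: hypothetical callable (train_idx, val_idx, lr, hidden)
                     -> (val_score, test_score); higher is assumed better.
    grid:            iterable of (learning_rate, hidden_units) pairs to search.
    """
    test_scores = []
    for seed in seeds:
        # A different 90/10 train/validation split for each seed.
        idx = list(range(n_items))
        random.Random(seed).shuffle(idx)
        n_val = max(1, int(round(val_fraction * n_items)))
        val_idx, train_idx = idx[:n_val], idx[n_val:]
        # Grid search: keep the test score of the best model on validation.
        best_val, best_test = None, None
        for lr, hidden in grid:
            val_score, test_score = train_and_score(train_idx, val_idx, lr, hidden)
            if best_val is None or val_score > best_val:
                best_val, best_test = val_score, test_score
        test_scores.append(best_test)
    # The submission's score on this subtask is the mean over the seeds.
    return mean(test_scores)
```

Running this once with PublicTest scores and once with PublicTest plus PrivateTest scores would give the public and private values for a subtask; those per-subtask scores then feed the Borda count.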

The Borda count method is then applied to these scores across all submissions to produce the leaderboard ranks.

The submission flow for a single seed is shown below.

[Figure: Submission Flow]