MABe 2022: Mouse-Triplets - Video Data
Getting Started - Mouse-Triplets Video Data
Initial data exploration and a basic embedding using a vision model
How to use this notebook 📝
- Copy the notebook. This is a shared template and any edits you make here will not be saved. You should copy it into your own drive folder. For this, click the "File" menu (top-left), then "Save a Copy in Drive". You can edit your copy however you like.
- Link it to your AIcrowd account. In order to submit your predictions to AIcrowd, you need to provide your account's API key.
Setup AIcrowd Utilities 🛠
!pip install -U aicrowd-cli
%load_ext aicrowd.magic
Login to AIcrowd ㊗
%aicrowd login
Install packages 🗃
Please add all package installations in this section
!pip install torch torchvision tqdm
Import necessary modules and packages 📚
import os
import cv2
import numpy as np
from tqdm.auto import tqdm
import torch
import torchvision
import torchvision.transforms as T
import copy
import matplotlib.pyplot as plt
from matplotlib import animation
from matplotlib import colors
from matplotlib import rc
from matplotlib import rcParams
Download and prepare the dataset 🔍
aicrowd_challenge_name = "mabe-2022-mouse-triplets-video-data"
if not os.path.exists('data'):
os.mkdir('data')
datafolder = 'data/'
## If the data is already downloaded and stored on Google Drive, skip the download and point to the prepared directory
# datafolder = '/content/drive/MyDrive/mabe-2022-mouse-triplets-video/data/'
video_folder = f'{datafolder}video_clips/'
## The download might take a while; we recommend copying the data to Google Drive if you want to run this multiple times.
%aicrowd ds dl -c {aicrowd_challenge_name} -o data *.npy* # Download all .npy files
# We'll download the 224x224 videos since they're faster to load in the dataloader, but you can use the full-sized videos if you want
%aicrowd ds dl -c {aicrowd_challenge_name} -o data *resized_224* # Download the resized 224x224 videos
# %aicrowd ds dl -c {aicrowd_challenge_name} -o data *videos.zip* # Download the 512x512 videos
!unzip -q data/submission_videos_resized_224.zip -d {video_folder}
!unzip -q data/userTrain_videos_resized_224.zip -d {video_folder}
## Be careful when running the commands below - they remove the downloaded zips and copy the data folder to Google Drive
# !rm data/submission_videos.zip data/userTrain_videos.zip
# !cp -r data/ '/content/drive/MyDrive/mabe-2022-mouse-triplets-video/data/'
Data Description 📚
The following files are available in the Resources section on the Challenge Page. A "sequence" is a continuous recording of social interactions between animals: sequences are 60 seconds long (1800 frames at 30 Hz) in the mouse video dataset. The sequence_id is a random hash used to anonymize experiment details. NaNs indicate missing data; these occur because not all videos are labelled for all tasks, and data are padded with NaNs so that all arrays have the same size.
user_train.npy
- Set of videos where labels for the public tasks are provided, for your local validation. It follows this schema:
{
"vocabulary" : A list of public task names
"sequences" : {
"<sequence_id> : {
"annotations" : a ndarray of shape (2, 1800) - Per frame labels for each of the public tasks
"keypoints" : a ndarray of shape (1800, 3, 12, 2) - Tracking keypoints on each of the mice
}
}
}
submission_keypoints.npy
- Keypoints for the submission clips. It follows this schema:
{
"sequences" : {
"<sequence_id> : {
"keypoints" : a ndarray of shape (1800, 3, 12, 2) - Single point tracking on each of the mice
}
}
}
frame_number_map.npy
- A map of frame numbers for each clip, to be used for indexing the submission embeddings array
sample_submission.npy
- Template for a sample submission for this task. It follows this schema:
{
"frame_number_map":
{"<sequence_id-1>": (start_frame_index, end_frame_index),
"<sequence_id-1>": (start_frame_index, end_frame_index),
...
"<sequence_id-n>": (start_frame_index, end_frame_index),
}
"<sequence_id-1>" : [
[0.321, 0.234, 0.186, 0.857, 0.482, 0.185], .....]
[0.184, 0.583, 0.475, 0.485, 0.275, 0.958], .....]
]
}
userTrain_videos.zip
- Videos for the userTrain sequences, all 512x512 grayscale, 30 fps, 1800 frames each
submission_videos.zip
- Videos for the submission sequences, all 512x512 grayscale, 30 fps, 1800 frames each
In sample_submission, each key in the frame_number_map dictionary refers to the unique sequence id of a video in the test set. The item for each key is expected to be the start and end index for slicing the embeddings numpy array to get the corresponding embeddings. The embeddings array is a 2D ndarray of floats of size total_frames by X, where X is the dimension of your learned embedding (6 in the above example; the maximum permitted embedding dimension is 128), representing the embedded value of each frame in the sequence. total_frames is the sum of the number of frames over all sequences, and the array should be the concatenation of the embeddings of all the clips.
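As a minimal sketch of how these pieces fit together (assuming the .npy files above are already in data/, and using a random placeholder in place of real embeddings), the rows belonging to one sequence are recovered by slicing the embeddings array with that sequence's (start, end) indices:
# Sketch: use frame_number_map to slice a (total_frames, X) embeddings array
_fnm = np.load(datafolder + 'frame_number_map.npy', allow_pickle=True).item()
_total_frames = max(end for _, end in _fnm.values())
_embeddings = np.random.rand(_total_frames, 6).astype(np.float32)  # placeholder embeddings, X = 6
_seq_id = next(iter(_fnm))
_start, _end = _fnm[_seq_id]
print(_seq_id, _embeddings[_start:_end].shape)  # one row per frame of the clip, e.g. (1800, 6)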
# Load data
userTrain_data = np.load(datafolder + 'user_train.npy', allow_pickle=True).item()
submission_keypoints = np.load(datafolder + 'submission_keypoints.npy', allow_pickle=True).item()
sample_submission = np.load(datafolder + 'sample_submission.npy')
frame_number_map = np.load(datafolder + 'frame_number_map.npy', allow_pickle=True).item()
# Check some basic info
print("UserTrain Vocabulary (Public Tasks)", userTrain_data['vocabulary'])
print("Number of UserTrain Sequences", len(userTrain_data['sequences']))
print("Number of Submission Sequences", len(submission_keypoints['sequences']))
sk = list(userTrain_data['sequences'].keys())[0]
single_sequence = userTrain_data['sequences'][sk]
print("Sequence name", sk, " - Sequence keys", single_sequence.keys())
print("Annotations shape", single_sequence['annotations'].shape)
print("Keypoints shape", single_sequence['keypoints'].shape)
Visualize the sequences 🤓
class_to_number = {s: i for i, s in enumerate(userTrain_data['vocabulary'])}
number_to_class = {i: s for i, s in enumerate(userTrain_data['vocabulary'])}
rc('animation', html='jshtml')
# Note: Image processing may be slow if too many frames are animated.
#Plotting constants
FRAME_WIDTH_TOP = 224
FRAME_HEIGHT_TOP = 224
M1_COLOR = 'lawngreen'
M2_COLOR = 'skyblue'
M3_COLOR = 'tomato'
PLOT_MOUSE_START_END = [(0, 1), (1, 3), (3, 2), (2, 0), # head
(3, 6), (6, 9), # midline
(9, 10), (10, 11), # tail
(4, 5), (5, 8), (8, 9), (9, 7), (7, 4) # legs
]
def fill_holes(data):
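    # Missing keypoints are stored as zeros; fill them by copying from the previous frame
    # (for frame 0, copy from the first later frame where that keypoint is present).
    # Returns an empty array if a keypoint is missing in every frame.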
clean_data = copy.deepcopy(data)
for m in range(3):
holes = np.where(clean_data[0,m,:,0]==0)
if not holes:
continue
for h in holes[0]:
sub = np.where(clean_data[:,m,h,0]!=0)
if(sub and sub[0].size > 0):
clean_data[0,m,h,:] = clean_data[sub[0][0],m,h,:]
else:
return np.empty((0))
for fr in range(1,np.shape(clean_data)[0]):
for m in range(3):
holes = np.where(clean_data[fr,m,:,0]==0)
if not holes:
continue
for h in holes[0]:
clean_data[fr,m,h,:] = clean_data[fr-1,m,h,:]
return clean_data
def set_figax():
fig = plt.figure(figsize=(8, 8))
img = np.zeros((FRAME_HEIGHT_TOP, FRAME_WIDTH_TOP, 3))
ax = fig.add_subplot(111)
imh = ax.imshow(img)
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
return fig, ax, imh
def plot_mouse(ax, pose, color):
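    # Keypoints are in the original 512x512 pixel coordinates; rescale them to the 224x224 resized frames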
pose = np.int32(pose * 224 / 512)
# Draw each keypoint
for j in range(10):
ax.plot(pose[j, 1], pose[j, 0], 'o', color=color, markersize=3)
# Draw a line for each point pair to form the shape of the mouse
for pair in PLOT_MOUSE_START_END:
line_to_plot = pose[pair, :]
ax.plot(line_to_plot[:, 1], line_to_plot[:, 0], color=color, linewidth=1)
def animate_pose_sequence(video_name, seq, start_frame = 0, stop_frame = 100, skip = 0, load_video = True, video_directory = []):
# Returns the animation of the keypoint sequence between start frame
# and stop frame.
image_list = []
cap = []
if load_video:
curr_vid = os.path.join(video_directory, video_name + '.avi')
if not os.path.exists(curr_vid):
            print("I couldn't find a video for sequence " + video_name + ' in ' + video_directory)
else:
cap = cv2.VideoCapture(curr_vid)
cap.set(cv2.CAP_PROP_POS_FRAMES, start_frame)
cap.open(curr_vid)
counter = 0
if skip:
anim_range = range(start_frame, stop_frame, skip)
else:
anim_range = range(start_frame, stop_frame)
for j in anim_range:
if counter%100 == 0:
print("Processing frame ", j)
fig, ax, imh = set_figax()
plot_mouse(ax, seq[j, 0, :, :], color=M1_COLOR)
plot_mouse(ax, seq[j, 1, :, :], color=M2_COLOR)
plot_mouse(ax, seq[j, 2, :, :], color=M3_COLOR)
if cap:
cap.set(cv2.CAP_PROP_POS_FRAMES, j)
ret,frame = cap.read()
imh.set_data(frame)
ax.set_title(
video_name + '\n frame {:03d}.png'.format(j))
ax.axis('off')
fig.tight_layout(pad=0)
ax.margins(0)
fig.canvas.draw()
image_from_plot = np.frombuffer(fig.canvas.tostring_rgb(),
dtype=np.uint8)
image_from_plot = image_from_plot.reshape(
fig.canvas.get_width_height()[::-1] + (3,))
image_list.append(image_from_plot)
plt.close()
counter = counter + 1
# Plot animation.
fig = plt.figure(figsize=(8,8))
plt.axis('off')
im = plt.imshow(image_list[0])
def animate(k):
im.set_array(image_list[k])
return im,
ani = animation.FuncAnimation(fig, animate, frames=len(image_list), blit=True)
return ani
sequence_names = list(userTrain_data['sequences'].keys())
sequence_key = sequence_names[-2]
single_sequence = userTrain_data["sequences"][sequence_key]
keypoint_sequence = single_sequence['keypoints']
masked_data = np.ma.masked_where(keypoint_sequence==0, keypoint_sequence)
ani = animate_pose_sequence(sequence_key,
keypoint_sequence,
start_frame = 0,
stop_frame = 100,
skip = 0,
load_video = True,
video_directory = video_folder)
# Display the animation in Colab
ani
EDA 🕵️
# Percentage of frames for each task
for task_idx, task_name in enumerate(userTrain_data['vocabulary']):
l0, l1 = 0, 0 # We count both because NaNs can exist
for sk in userTrain_data['sequences'].keys():
l0 += np.sum(userTrain_data['sequences'][sk]['annotations'][task_idx] == 0)
l1 += np.sum(userTrain_data['sequences'][sk]['annotations'][task_idx] == 1)
print(f"Task {task_name} - Percentage Frames Active {l1/l0*100:0.3f}")
# Check the number of bouts of each task occurring
def check_bouts(anno):
anno_padded = np.pad(anno.copy(), 1)
anno_padded[np.isnan(anno_padded)] = 0
if np.sum(anno_padded) == 0:
return None
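    # np.diff marks the 0->1 and 1->0 transitions; pairing consecutive transition
    # indices gives the (start, end) frame indices of each bout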
locs = np.where(np.diff(anno_padded))
return locs[0].reshape(-1,2)
def get_bout_infos(dataset):
num_tasks = len(userTrain_data['vocabulary'])
bout_infos = [np.empty((0,2)) for _ in range(num_tasks)]
for sk in dataset['sequences']:
anno = dataset['sequences'][sk]['annotations']
for idx in range(num_tasks):
bout_limits = check_bouts(anno[idx])
if bout_limits is not None:
bout_infos[idx] = np.concatenate([bout_infos[idx], bout_limits], axis=0)
return bout_infos
bout_infos = get_bout_infos(userTrain_data)
for task_idx, task_name in enumerate(userTrain_data['vocabulary']):
b_info = bout_infos[task_idx]
blens = b_info[:, 1] - b_info[:, 0]
print(task_name)
print(f"Number of bouts : {len(b_info)}")
print(f"Average length : {np.mean(blens)}")
print(f"Std lengths : {np.std(blens)}")
print()
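The bout-length distributions are easier to eyeball as histograms; here is an optional sketch using the bout_infos computed above (lengths are in frames, at 30 Hz):
# Optional: plot a bout-length histogram per task
fig, axes = plt.subplots(1, len(bout_infos), figsize=(5 * len(bout_infos), 4), squeeze=False)
for task_idx, task_name in enumerate(userTrain_data['vocabulary']):
    blens = bout_infos[task_idx][:, 1] - bout_infos[task_idx][:, 0]
    axes[0, task_idx].hist(blens, bins=30)
    axes[0, task_idx].set_title(task_name)
    axes[0, task_idx].set_xlabel('Bout length (frames)')
plt.show()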
Generate an embedding ✨
We'll generate a basic embedding using a pre-trained vision model.
num_frames_per_clip = 1800
image_size = (224, 224)
batch_size = 4 # Reduce this if encountering OOM errors
frame_skip = 8 # For every 1 frame, skip 8 frames after that
# NOTE - We skip frames because generating outputs for all frames takes a lot of time,
# primarily because reading videos is slow. Resizing frames also takes a lot of time.
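If you do want to read more (or all) frames, decoding a clip sequentially is usually much faster than repeatedly seeking with cap.set. The dataset class below keeps the simple seek-based approach; a minimal sketch of the sequential alternative (not used below) looks like this:
# Sketch (not used below): decode a clip sequentially and keep every (frame_skip + 1)-th frame
def read_clip_sequential(video_path, frame_skip):
    cap = cv2.VideoCapture(video_path)
    frames, idx = [], 0
    while True:
        success, frame = cap.read()
        if not success:
            break
        if idx % (frame_skip + 1) == 0:
            frames.append(frame)
        idx += 1
    cap.release()
    return frames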
class MabeVideoDataset(torch.utils.data.Dataset):
"""
Reads all frames from video files with frame skip
"""
def __init__(self,
videofolder,
frame_number_map,
frame_skip):
"""
        Initialize the dataset with the video folder, frame number map and frame skip
"""
self.videofolder = videofolder
self.frame_number_map = frame_number_map
self.video_keys = list(frame_number_map.keys())
self.frame_skip = frame_skip # For every frame read, skip <frame_skip> frames after that
        assert num_frames_per_clip % (frame_skip + 1) == 0, "frame_skip + 1 should exactly divide num_frames_per_clip"
self.num_frames = num_frames_per_clip // (self.frame_skip + 1)
self.transform = T.Compose([
T.ToTensor(),
# T.Resize(image_size), # Add this if using full sized videos
T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
def __len__(self):
return len(self.frame_number_map)
def __getitem__(self, idx):
video_name = self.video_keys[idx]
video_path = os.path.join(self.videofolder, video_name + '.avi')
if not os.path.exists(video_path):
# raise FileNotFoundError(video_path)
print("File not found", video_path)
return torch.zeros((self.num_frames, 3, *image_size), dtype=torch.float32)
cap = cv2.VideoCapture(video_path)
frame_array = torch.zeros((self.num_frames, 3, *image_size), dtype=torch.float32)
for array_idx, frame_idx in enumerate(range(0, num_frames_per_clip, self.frame_skip+1)):
cap.set(cv2.CAP_PROP_POS_FRAMES, frame_idx)
success, frame = cap.read()
if success:
frame_tensor = self.transform(frame)
frame_array[array_idx] = frame_tensor
        cap.release()
        return frame_array
dataset = MabeVideoDataset(videofolder=video_folder,
frame_number_map=frame_number_map,
frame_skip=frame_skip)
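As a quick sanity check (assuming the resized videos were extracted into video_folder), fetching a single item should return a stack of the frame-skipped frames; note this reads a full clip, so it can take a few seconds:
# Optional sanity check: each item has shape (num_frames, 3, 224, 224), i.e. (200, 3, 224, 224) for frame_skip = 8
sample_clip = dataset[0]
print("Single clip tensor shape:", sample_clip.shape)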
dataloader = torch.utils.data.DataLoader(
dataset,
batch_size=batch_size,
shuffle=False,
drop_last=False,
pin_memory=True,
num_workers=1,
)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
def get_model():
    resnet_encoder = torchvision.models.resnet18(pretrained=True)  # ImageNet-pretrained weights
model = torch.nn.Sequential(*list(resnet_encoder.children())[:-1])
model.to(device);
model.eval()
return model
model = get_model()
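Before running the full loop it can help to confirm the encoder's output shape: ResNet-18 without its final fully connected layer produces a pooled feature map of shape (N, 512, 1, 1), which is why the first embedding_size channels are sliced out below.
# Optional check: the truncated ResNet-18 outputs (1, 512, 1, 1) for a single image
with torch.no_grad():
    print(model(torch.zeros(1, 3, *image_size, device=device)).shape)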
embedding_size = 128 # We'll keep the first 128 channels of the CNN output as the embedding (the maximum allowed size)
submission = np.empty((sample_submission.shape[0], embedding_size), dtype=np.float32)
idx = 0
for data in tqdm(dataloader, total=len(dataloader)):
with torch.no_grad():
dshape = data.shape
images = data.reshape((-1, *dshape[2:])).to(device) # Squeeze first 2 dimensions to make 4D
output = model(images)
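        # The pooled ResNet-18 features have shape (N, 512, 1, 1); keep only the first embedding_size channels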
output = output[:, :embedding_size, 0, 0]
output = output.reshape((dshape[0], dshape[1], -1)) # Return the outputs to 2D for multiple clips
output = output.cpu().numpy()
output = np.repeat(output, frame_skip+1, axis=1) # Repeat the output for next skipped frames
# At this point the output should be the embeddings for batch_size number of clips
# Shape of output - (batch_size, num_frames_per_clip, embedding_size)
output = np.reshape(output, (-1, embedding_size))
submission[idx:idx+output.shape[0]] = output
idx += output.shape[0]
Submission 🚀
Validate and submit to AIcrowd
print("Embedding shape:", submission.shape)
Validate the submission ✅
The submission should follow these constraints:
- It should be a numpy array
- The embeddings should be a 2D numpy array of dtype float32
- The embedding size shouldn't exceed 128
- The total number of rows should match the clip lengths given by the frame number map
- You can use the helper function below to check these
def validate_submission(submission, frame_number_map):
if not isinstance(submission, np.ndarray):
print("Embeddings should be a numpy array")
return False
elif not len(submission.shape) == 2:
print("Embeddings should be 2D array")
return False
elif not submission.shape[1] <= 128:
print("Embeddings too large, max allowed is 128")
return False
elif not isinstance(submission[0, 0], np.float32):
print(f"Embeddings are not float32")
return False
total_clip_length = frame_number_map[list(frame_number_map.keys())[-1]][1]
if not len(submission) == total_clip_length:
print(f"Emebddings length doesn't match submission clips total length")
return False
if not np.isfinite(submission).all():
print(f"Emebddings contains NaN or infinity")
return False
print("All checks passed")
return True
validate_submission(submission, frame_number_map)
np.save('submission.npy', submission)
%aicrowd submission create --description "Mouse-Video-Getting-Started" -c {aicrowd_challenge_name} -f submission.npy