AIcrowd | Cinematic Sound Demixing Track - CDX'23

Warm-Up Round: Completed

Phase I: Completed

Phase II: Completed

AIcrowd, Sony Group Corporation, Moises.AI & Mitsubishi Electric Research Laboratories

🏆 Winner's Solutions

🔍 Discover released models and source code in our MDX track and CDX track papers' "Notes" section.

🗣️ Explore teams' model announcements on the discussion forum for additional insights.

🏛️ Watch the SDX23 Townhall & Presentations

🛠 How to debug your submissions

🕵️ Introduction

Cinematic sound separation is the task of separating movie audio into the three tracks “dialogue”, “sound effects” and “music”. It has many applications ranging from language dubbing to upmixing of old movies to spatial audio and user interfaces for flexible listening.

📜 The Task

There are two leaderboards in the Cinematic Sound Demixing Track (CDX):

Systems that are trained only on the training (tr) and validation (cv) parts of DnR are eligible for Leaderboard A.
Systems that are trained on any other data (e.g., also using the test part, tt, of the DnR dataset) are eligible for Leaderboard B.

📁 Datasets

To train their systems, participants can use either the training data of the “Divide-and-Remaster” (DnR) dataset (Leaderboard A) or any data that they have at their disposal (Leaderboard B). The DnR dataset consists of 3,406 mixtures (∼57 h) for the training set, 487 mixtures (∼8 h) for the validation set, and 973 mixtures (∼16 h) for the test set, along with their isolated ground-truth stems.
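As a quick sanity check on the split sizes quoted above, the following sketch totals the mixtures and hours and derives the average mixture length each split implies. The split keys reuse the tr/cv/tt abbreviations from the task description. The hour figures are the rounded values from the text, so the derived averages are only approximate.

```python
# Mixture counts and approximate durations in hours, as quoted in the text.
# The hour values are rounded, so the averages below are rough estimates.
splits = {
    "tr": (3406, 57),  # training set
    "cv": (487, 8),    # validation set
    "tt": (973, 16),   # test set
}

total_mixtures = sum(count for count, _ in splits.values())
total_hours = sum(hours for _, hours in splits.values())

# Average mixture length in seconds implied by each split.
avg_seconds = {
    name: hours * 3600 / count for name, (count, hours) in splits.items()
}

if __name__ == "__main__":
    print(f"total mixtures: {total_mixtures}, total hours: ~{total_hours}")
    for name, seconds in avg_seconds.items():
        print(f"{name}: ~{seconds:.1f} s per mixture")
```

Under these rounded figures, each split works out to roughly one minute of audio per mixture.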

For the evaluation and ranking of the submissions, we use a newly created hidden dataset of real audio from 11 Sony Pictures Entertainment movies. The data is stereo and sampled at 44.1 kHz. You can find the dataset files here.

💰 Prizes

🥁 Cinematic Sound Demixing Track (CDX) 10,000 USD

Leaderboard - Divide and Remaster (DnR) dataset: 5,000 USD

1st prize: 2500 USD
2nd prize: 1500 USD
3rd prize: 1000 USD

Participants need to open-source their training and inference code.

Leaderboard - Standard Cinematic Sound Separation (Open Track): 5,000 USD

1st prize: 2500 USD
2nd prize: 1500 USD
3rd prize: 1000 USD

This is an Open Track where you can use any data you want.

Please refer to the Challenge Rules for more details about the Open Sourcing criteria for each of the leaderboards to be eligible for the associated prizes.

🖊 Evaluation Metric

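As the evaluation metric, we use the signal-to-distortion ratio (SDR), which is defined as

$$\mathrm{SDR}_{\mathrm{instr}} = 10 \log_{10} \frac{\sum_n \lVert s_{\mathrm{instr}}(n) \rVert^2}{\sum_n \lVert s_{\mathrm{instr}}(n) - \hat{s}_{\mathrm{instr}}(n) \rVert^2},$$

where $s_{\mathrm{instr}}(n)$ is the waveform of the ground truth and $\hat{s}_{\mathrm{instr}}(n)$ denotes the waveform of the estimate. The higher the SDR score, the better the output of the system is.

In order to rank systems, we will use the average SDR computed by

$$\mathrm{SDR}_{\mathrm{song}} = \frac{1}{3} \left( \mathrm{SDR}_{\mathrm{dialogue}} + \mathrm{SDR}_{\mathrm{effects}} + \mathrm{SDR}_{\mathrm{music}} \right)$$

for each song. Finally, the overall score $\mathrm{SDR}_{\mathrm{total}}$ is given by the average over all songs in the hidden test set. There will be a separate leaderboard for each round.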
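The scoring described above can be sketched in NumPy as follows. This is a minimal illustration, not the organizers' evaluation code. It assumes the usual energy-ratio form of SDR (ground-truth energy over error energy, in dB) and that each stem is an array of shape (num_samples, num_channels). The function names, the dictionary keys and the small `eps` stabilizer are assumptions introduced for this example.

```python
import numpy as np

# Stem names follow the three tracks named in the task description;
# the exact dictionary keys are an assumption for this sketch.
STEMS = ("dialogue", "effects", "music")


def sdr(reference, estimate, eps=1e-10):
    """SDR in dB for one stem.

    Both inputs have shape (num_samples, num_channels). `eps` is an
    assumed stabilizer that avoids division by zero and log(0).
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    signal_energy = np.sum(reference ** 2)
    error_energy = np.sum((reference - estimate) ** 2)
    return 10.0 * np.log10((signal_energy + eps) / (error_energy + eps))


def sdr_song(references, estimates):
    """Average SDR over the three stems of one test item."""
    return float(np.mean([sdr(references[s], estimates[s]) for s in STEMS]))


def sdr_total(song_scores):
    """Overall score: mean of the per-item averages over the test set."""
    return float(np.mean(song_scores))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # One second of synthetic stereo audio per stem at 44.1 kHz.
    refs = {s: rng.standard_normal((44100, 2)) for s in STEMS}
    # Hypothetical estimates: ground truth plus a little noise.
    ests = {s: refs[s] + 0.1 * rng.standard_normal((44100, 2)) for s in STEMS}
    print(f"average SDR for this item: {sdr_song(refs, ests):.2f} dB")
```

Scaling the ground truth by 0.9 leaves an error with 1/100 of the signal energy, so `sdr` returns about 20 dB for that case, which is a convenient check when adapting the sketch.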

For an academic report about the challenge, the organizers will get access to the separations of the top-10 submitted entries (i.e., their output) for each leaderboard in order to compute more source separation metrics (e.g., signal-to-interference ratio).

📅 Timeline

The SDX23 Cinematic Sound Demixing Track takes place in two rounds, which differ in the evaluation datasets used to rank the submitted systems.

Warmup Round: 8th December 2022
Phase I: 23rd January 2023
Phase II: 6th March 2023
Challenge End: 1st May 2023

📱 Challenge Organising Committee

Cinematic Sound Demixing Track (CDX)

Yuki Mitsufuji, Stefan Uhlich, Hirano Masato, Shusuke Takahashi (Sony)
Jonathan Le Roux, Gordon Wichern (Mitsubishi Electric Research Labs)

🏆 Challenge Sponsors
