Problem Statements
Semantic Segmentation
Perform semantic segmentation on aerial images from a monocular, downward-facing drone camera
Mono Depth Perception
Estimate depth in aerial images from a monocular, downward-facing drone camera
Select your final submissions
- Semantic Segmentation selection form
Challenges are more fun with friends. Find teammates for SUADD'23
Update in segmentation scoring metrics
How to download dataset via CLI
Introduction
Unmanned Aircraft Systems (UAS) have various applications, such as environmental studies, emergency responses or package delivery. The safe operation of fully autonomous UAS requires robust perception systems.
For this challenge, we will focus on images from a single downward-facing camera to estimate the scene's depth and perform semantic segmentation. The results of these two tasks can support the development of safe and reliable autonomous control systems for aircraft.
This challenge includes the release of a new dataset of drone images that will benchmark semantic segmentation and mono-depth perception. The images in this dataset show realistic backyard scenes of varied content and have been taken at various Above Ground Level (AGL) altitudes.
This challenge aims to foster the development of fully autonomous Unmanned Aircraft Systems (UAS).
Achieving this requires overcoming many challenges. For fully autonomous navigation, a drone needs to understand both the objects in a scene and their scale and distance.
This project's two key computer vision components are semantic segmentation and depth perception.
With this challenge, we aim to inspire the Computer Vision community to develop new insights and advance the state of the art in perception tasks involving drone images.
Key Tasks
Understanding the 3D scene below the drone is helpful for many of the challenges autonomous drones must address. Semantic segmentation and depth perception are two key components of this; hence, they are the two main goals of this challenge.
Each of these two tasks will have its own benchmark. Both will use data from a single greyscale camera.
Task 1: Semantic Segmentation
Semantic segmentation is the labelling of the pixels of an image according to the category of the object to which they belong. The output for this task is an image in which each pixel has the value of the class it represents.
For this task, we focus on labels that ensure a safe landing, such as the location of humans and animals, round or flat surfaces, tall grass and water elements, vehicles and so on. The labels chosen for this challenge include humans, animals, roads, concrete, roof, tree, furniture, vehicles, wires, snow, etc. The complete list of labels is: [WATER, ASPHALT, GRASS, HUMAN, ANIMAL, HIGH_VEGETATION, GROUND_VEHICLE, FAÇADE, WIRE, GARDEN_FURNITURE, CONCRETE, ROOF, GRAVEL, SOIL, PRIMEAIR_PATTERN, SNOW].
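As an illustration of the required output format (an image in which every pixel holds a class value), the sketch below turns per-class model scores into such a label image and counts pixels per class. It assumes, hypothetically, that each class's integer label equals its position in the list above; the official mapping (and any value reserved for unlabelled pixels) should be checked in the dataset documentation. The function names are illustrative, not part of any starter kit.

```python
import numpy as np

# Class names in the order listed above. ASSUMPTION: the integer label of each
# class equals its position in this list; verify against the dataset docs.
CLASSES = [
    "WATER", "ASPHALT", "GRASS", "HUMAN", "ANIMAL", "HIGH_VEGETATION",
    "GROUND_VEHICLE", "FAÇADE", "WIRE", "GARDEN_FURNITURE", "CONCRETE",
    "ROOF", "GRAVEL", "SOIL", "PRIMEAIR_PATTERN", "SNOW",
]
CLASS_TO_ID = {name: idx for idx, name in enumerate(CLASSES)}


def scores_to_label_map(scores: np.ndarray) -> np.ndarray:
    """Turn per-class scores of shape (C, H, W) into an (H, W) label image.

    Each output pixel holds the index of the highest-scoring class.
    """
    if scores.ndim != 3 or scores.shape[0] != len(CLASSES):
        raise ValueError(f"expected shape ({len(CLASSES)}, H, W), got {scores.shape}")
    return np.argmax(scores, axis=0).astype(np.uint8)


def class_pixel_counts(label_map: np.ndarray) -> dict:
    """Count how many pixels of a label image fall into each class."""
    counts = np.bincount(label_map.ravel(), minlength=len(CLASSES))
    return {name: int(counts[idx]) for idx, name in enumerate(CLASSES)}


if __name__ == "__main__":
    # Random scores stand in for a real model's output.
    rng = np.random.default_rng(0)
    labels = scores_to_label_map(rng.random((len(CLASSES), 4, 6)))
    print(labels.shape, class_pixel_counts(labels)["GRASS"])
```

Storing the result as `uint8` is enough here because 16 classes fit comfortably in one byte.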
Task 2: Mono-Depth Estimation
Depth estimation measures the distance between the camera and the objects in the scene. It is an important perception task for an autonomous aerial drone. With two stereo cameras, this task can be solved using stereo vision methods. This challenge aims to create a model that uses the information from a single camera to predict the depth of every pixel.
The output of this task must be an image of equal size to the input image, in which every pixel contains a depth value.
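For contrast with the single-camera setting, the sketch below shows the textbook rectified-stereo relation Z = f · B / d, where f is the focal length in pixels, B the baseline between the two cameras in metres and d the disparity in pixels. It is a generic illustration, not the organizers' ground-truth pipeline; the function name and example values are hypothetical. Note that the output has the same size as the input, as required for submissions.

```python
import numpy as np


def disparity_to_depth(disparity: np.ndarray, focal_px: float, baseline_m: float) -> np.ndarray:
    """Convert a rectified-stereo disparity map (pixels) to depth (metres).

    Uses Z = f * B / d. Pixels with non-positive (or NaN) disparity have no
    valid stereo match and are returned as NaN.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full(disparity.shape, np.nan)  # same H x W as the input
    valid = disparity > 0
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth


if __name__ == "__main__":
    # Hypothetical camera parameters: 1000 px focal length, 10 cm baseline.
    print(disparity_to_depth(np.array([[10.0, 0.0], [20.0, 5.0]]), 1000.0, 0.1))
```

Because depth is inversely proportional to disparity, small disparity errors translate into large depth errors for distant surfaces, which is one reason depth from high AGLs is hard to estimate.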
Dataset
The dataset consists of a collection of flight frames at given timestamps taken from one of the downward cameras of our drones during dedicated data collection operations, not during customer delivery operations.
The dataset contains 412 flights and 2056 frames in total (typically 5 frames per flight, at different AGLs), with full semantic segmentation annotations and depth estimates for all frames. The dataset has been split into training and (public) test sets. Although the challenge will be scored on a private test set, we considered this split useful because it allows teams to share their results even after the challenge ends.
This dataset contains bird's-eye-view greyscale images taken between 5 m and 25 m AGL. Annotations for the semantic segmentation task are fully labelled images across 16 distinct classes, while annotations for the mono-depth estimation task have been computed with geometric stereo-depth algorithms. To the best of our knowledge, this is the largest dataset with full semantic annotations and mono-depth estimation ground truth over a wide range of AGLs and different scenes.
Images can be in uint8 or uint16 format. To load them, you can, for example, use OpenCV:
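A minimal loading sketch, assuming the `opencv-python` and `numpy` packages. The key detail is the `cv2.IMREAD_UNCHANGED` flag: OpenCV's default flag (`cv2.IMREAD_COLOR`) converts every image to 8-bit, 3-channel BGR, which would silently discard the extra precision of uint16 files. The helper names are hypothetical, and the [0, 1] scaling is meant for greyscale intensity images only; the encoding of the depth annotations is not described here, so check it before rescaling them.

```python
import numpy as np


def load_image(path: str) -> np.ndarray:
    """Read an image keeping its stored bit depth (uint8 stays uint8, uint16 stays uint16)."""
    import cv2  # imported here so to_float01 below works even without OpenCV installed

    img = cv2.imread(path, cv2.IMREAD_UNCHANGED)
    if img is None:  # cv2.imread returns None instead of raising on failure
        raise FileNotFoundError(f"could not read image: {path}")
    return img


def to_float01(img: np.ndarray) -> np.ndarray:
    """Scale an unsigned-integer image to [0, 1] using the maximum value of its dtype."""
    if not np.issubdtype(img.dtype, np.unsignedinteger):
        raise TypeError(f"expected an unsigned-integer image, got {img.dtype}")
    return img.astype(np.float32) / np.iinfo(img.dtype).max
```

Usage: `x = to_float01(load_image("frame.png"))`, where `frame.png` is a placeholder path. Dividing by the dtype's maximum (255 for uint8, 65535 for uint16) lets both formats share one preprocessing path.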
Ethical Considerations About The Data
The dataset of the challenge contains images of realistic flight footage taken as part of our research and development programs, not from real customer deliveries. Furthermore, all personal identifiers have been removed.
Starter Kit
Check out these easy-to-follow starter kits and baselines to get familiar with the documentation, submission flow and setup. The starter kit will help you make your first submission.
Baselines
Don't know where to start? Check out these baselines released by the organizing team:
Timeline
- Challenge Launch: 22nd December 2022
- Challenge End: 28th April 2023
- Winner Announcement: 30th June 2023
Prizes
Semantic Segmentation
- The top-scoring submission will receive $15,000 USD
- The second-best submission will receive $7,500 USD
- The third-place submission will receive $1,250 USD
Depth Perception
- The top-scoring submission will receive $15,000 USD
- The second-best submission will receive $7,500 USD
- The third-place submission will receive $1,250 USD
The most “creative” solution submitted to the whole competition, as determined at the Sponsor’s sole discretion, will receive $2,500 USD.
Links
- Discussion Forum
- Notebooks
Contact
For questions, feedback and suggestions, contact: suadd23-challenge@amazon.com.