Official round: Completed

Post round: Completed

LifeCLEF 2020 Geo

Prize money: USD 5K as part of Microsoft's AI for Earth program

1 Authorship/Co-Authorship

LifeCLEF


Note: Do not forget to read the Rules section on this page. Pressing the red Participate button leads you to a page where you have to agree with those rules. You will not be able to submit any results before agreeing with the rules. In addition, please read the Submission instructions section on this page before trying to submit results.

News

05/06 - Submission deadline extended to June 14.

11/05 - A section on the important dates has been added to the challenge description.

28/04 - The US and FR occurrence files have been updated to correct an issue with the occurrence IDs. Please use the new files, which can be found in the Resources tab.

08/04 - A short paper describing the dataset and the evaluation metric can be found here or in the Resources tab.

06/04 - The data is available in the Resources tab once you have joined the challenge.

Challenge description

Motivation

Automatic prediction of the list of species most likely to be observed at a given location is useful for many scenarios related to biodiversity management and conservation. First, it could improve species identification tools (whether automatic, semi-automatic or based on traditional field guides) by reducing the list of candidate species observable at a given site. More generally, this could facilitate biodiversity inventories through the development of location-based recommendation services (e.g. on mobile phones), encourage the involvement of citizen scientist observers, and accelerate the annotation and validation of species observations to produce large, high-quality data sets. Last but not least, this could be used for educational purposes through biodiversity discovery applications with features such as contextualized educational pathways.

Task

The occurrence dataset will be split into a training set with known species labels and a test set used for evaluation. For each occurrence (with geographic images) in the test set, the goal of the task will be to return a set of candidate species with associated confidence scores.

Important dates

The important dates are common for all LifeCLEF's challenges and can be found on LifeCLEF's main page.

Submissions will open on 17/05, three weeks before the end of the challenge.

Data

The challenge relies on a collection of millions of occurrences of plants and animals in the US and France, coming from the iNaturalist and Pl@ntNet citizen science platforms. In addition to geo-coordinates and species name, each occurrence is paired with a set of covariates characterizing the landscape and environment around it. Covariates include high-resolution remote sensing imagery, land cover data, and altitude, as well as traditional low-resolution climate and soil variables. In more detail, each occurrence is paired with the following covariates: (i) high-resolution RGB-IR remote sensing imagery (1 meter per pixel, 256x256 pixels, 4 channels) from NAIP for the US and from IGN for France, (ii) high-resolution land cover (resampled to 1 meter per pixel, 256x256 pixels) from NLCD for the US and from Cesbio for France, (iii) local topography (resampled to 1 meter per pixel, 256x256 pixels). Additionally, we provide 19 bio-climatic rasters from WorldClim (1 km resolution) and 8 pedologic rasters from SoilGrids. The data is available under the “Resources” tab.

Submission instructions

Once submissions are being accepted, you will find a “Create Submission” button on this page (next to the tabs).

Before being allowed to submit your results, you first have to press the red Participate button, which leads you to a page where you have to accept the challenge's rules.

A submission should be a CSV file (where entries are separated by a comma). Each row should contain the predictions made by the model for a single observation. You must provide a real-valued score for each class and each observation (see the Evaluation criteria section). To make the submitted files smaller, the submission should only contain the 150 classes with highest scores for each observation. For each of these top 150 classes, the class id and the associated score must be provided.

The CSV file will thus contain 301 columns, in this order:

observation_id: a single integer corresponding to the id of the observation
for i from 1 to 150, columns of the form:
- top_class_id: an integer corresponding to the id of the class with the i-th highest score
for i from 1 to 150, columns of the form:
- top_class_score: a real value corresponding to the i-th highest score, i.e. the score of the class listed in the i-th class id column

Note that a class id can appear only once in each row and the classes must be ordered in descending order by score.

The submission should also have a header row with the column names listed above.
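As an illustration of the format described above, here is a minimal sketch that turns a matrix of per-class scores into a 301-column CSV. It is not an official tool: the helper names (`build_header`, `build_rows`, `write_submission`) are hypothetical, the indexed column names of the form `top_1_class_id` are an assumption about how the rank index appears in the header, and the sketch assumes that the column index of the score matrix is the class id.

```python
import csv
import io

import numpy as np

TOP_N = 150  # number of classes kept per observation, as required above


def build_header(top_n=TOP_N):
    # ASSUMPTION: the rank index i is embedded in the column names like this.
    ids = [f"top_{i}_class_id" for i in range(1, top_n + 1)]
    scores = [f"top_{i}_class_score" for i in range(1, top_n + 1)]
    return ["observation_id"] + ids + scores


def build_rows(observation_ids, scores, top_n=TOP_N):
    # scores has shape (n_observations, n_classes).
    # ASSUMPTION: the column index of `scores` is used directly as the class id.
    scores = np.asarray(scores, dtype=float)
    # Class indices sorted by descending score, truncated to the top_n best.
    order = np.argsort(-scores, axis=1, kind="stable")[:, :top_n]
    top_scores = np.take_along_axis(scores, order, axis=1)
    rows = []
    for obs_id, cls, sc in zip(observation_ids, order, top_scores):
        rows.append([int(obs_id)] + [int(c) for c in cls] + [float(s) for s in sc])
    return rows


def write_submission(fileobj, observation_ids, scores, top_n=TOP_N):
    writer = csv.writer(fileobj)
    writer.writerow(build_header(top_n))
    writer.writerows(build_rows(observation_ids, scores, top_n))


# Toy usage: 3 observations, 200 classes, random scores.
rng = np.random.default_rng(0)
demo_scores = rng.random((3, 200))
buffer = io.StringIO()
write_submission(buffer, [10, 11, 12], demo_scores)
demo_lines = buffer.getvalue().splitlines()
```

Because each row is built from a single argsort, every class id appears at most once per row and the scores are already in descending order, which matches the two constraints stated above.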

Evaluation criteria

The evaluation criterion will be an adaptive top-$K$ accuracy. For each submission, we will first compute the threshold $t$ such that the average number of classes above the threshold (over all test occurrences) is $K$. Note that each sample may be associated with a different number of predictions. Then, we will compute the percentage of test observations for which the correct species is among the classes above the threshold.

In practice, if the scores for the $n$-th observation are denoted $s_1^{(n)},s_2^{(n)},\dots,s_C^{(n)}$ where $C$ is the total number of species, then the average accuracy for a given threshold $t$ is computed using

$\frac{1}{N} \sum_{n=1}^N \delta_{s_{y_n}^{(n)} \geq t}$

where $y_n$ is the target species of the $n$-th sample. The threshold $t$ is the lowest value satisfying

$\frac{1}{N} \sum_{n=1}^N \sum_{i=1}^C \delta_{s_i^{(n)} \geq t} \leq K$

where the left term is the average number of results predicted per sample.
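The two formulas above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the official evaluation script: the function names are hypothetical, it assumes a dense score matrix of shape $(N, C)$ with integer target indices, and it resolves ties by the strict reading of the inequality (if no observed score value keeps the average at or below $K$, the threshold is set to infinity and nothing is retained).

```python
import numpy as np


def adaptive_threshold(scores, k):
    """Lowest observed score t such that, on average, at most k scores per row are >= t."""
    scores = np.asarray(scores, dtype=float)
    budget = scores.shape[0] * k  # N * K: maximum total number of retained predictions
    values, counts = np.unique(scores, return_counts=True)  # values sorted ascending
    # retained[j] = number of scores in the whole matrix that are >= values[j]
    retained = np.cumsum(counts[::-1])[::-1]
    feasible = np.nonzero(retained <= budget)[0]
    if feasible.size == 0:
        return np.inf  # even the highest score value exceeds the budget (heavy ties)
    return values[feasible[0]]


def adaptive_top_k_accuracy(scores, targets, k):
    """Fraction of rows whose target class has a score >= the adaptive threshold."""
    scores = np.asarray(scores, dtype=float)
    targets = np.asarray(targets)
    t = adaptive_threshold(scores, k)
    target_scores = scores[np.arange(len(targets)), targets]
    return float(np.mean(target_scores >= t))


# Toy example with N = 4 observations, C = 3 classes and K = 1.
demo_scores = np.array([
    [0.9, 0.5, 0.1],
    [0.4, 0.35, 0.3],
    [0.8, 0.7, 0.2],
    [0.6, 0.05, 0.0],
])
demo_targets = np.array([0, 1, 1, 0])
demo_t = adaptive_threshold(demo_scores, 1)  # 0.6: exactly 4 scores are >= 0.6
demo_accuracy = adaptive_top_k_accuracy(demo_scores, demo_targets, 1)  # 0.75
```

In this toy example the third observation keeps two classes (0.8 and 0.7) while the second keeps none, which illustrates why each sample may end up with a different number of predictions even though the average is $K$.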

$K$ will be fixed to 30, which corresponds to the average observed plant species richness across the plots inventoried in Sophy [1].

To compute the evaluation criterion, a per-class confidence score is needed for each observation. Due to submission file constraints, only the 150 highest scoring classes should be provided and will be considered in the computation of the metric.

The top-30 accuracy will be used as a secondary evaluation criterion.
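For comparison with the adaptive criterion, a plain top-$K$ accuracy keeps exactly $K$ classes for every observation. The sketch below is a hypothetical helper under the same assumptions as before (dense score matrix, integer target indices); ties at the $K$-th position are broken arbitrarily, and the official script may treat them differently.

```python
import numpy as np


def top_k_accuracy(scores, targets, k=30):
    """Fraction of rows whose target class is among the k highest-scoring classes."""
    scores = np.asarray(scores, dtype=float)
    targets = np.asarray(targets)
    k = min(k, scores.shape[1])  # cannot keep more classes than exist
    # Indices of the k largest scores per row (unordered within the top k).
    top_k = np.argpartition(-scores, k - 1, axis=1)[:, :k]
    hits = (top_k == targets[:, None]).any(axis=1)
    return float(hits.mean())


# Toy example with 4 observations and 3 classes.
toy_scores = np.array([
    [0.9, 0.5, 0.1],
    [0.4, 0.35, 0.3],
    [0.8, 0.7, 0.2],
    [0.6, 0.05, 0.0],
])
toy_targets = np.array([0, 1, 1, 0])
toy_top1 = top_k_accuracy(toy_scores, toy_targets, k=1)  # 0.5
toy_top2 = top_k_accuracy(toy_scores, toy_targets, k=2)  # 1.0
```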

Please see here for more information about the evaluation criteria.

Rules

The LifeCLEF lab is part of the Conference and Labs of the Evaluation Forum (CLEF) 2020. CLEF 2020 consists of independent peer-reviewed workshops on a broad range of challenges in the fields of multilingual and multimodal information access evaluation, and a set of benchmarking activities carried out in various labs designed to test different aspects of mono- and cross-language information retrieval systems. More details about the conference can be found here.

Submitting a working note with the full description of the methods used in each run is mandatory. Any run that could not be reproduced based on its description in the working notes may be removed from the official publication of the results. Working notes are published within CEUR-WS proceedings, resulting in an assignment of an individual DOI (URN) and an indexing by many bibliography systems including DBLP. According to the CEUR-WS policies, a light review of the working notes will be conducted by the LifeCLEF organizing committee to ensure quality. As an illustration, LifeCLEF 2019 working notes (task overviews and participant working notes) can be found within the CLEF 2019 CEUR-WS proceedings.

Important

Participants in this challenge will automatically be registered at CLEF 2020. In order to be compliant with the CLEF registration requirements, please edit your profile by providing the following additional information:

First name

Last name

Affiliation

Address

City

Country

Please choose a username that represents your team.

This information will not be publicly visible and will be exclusively used to contact you and to send the registration data to CLEF, which is the main organizer of all CLEF labs.

Citations

Information will be posted after the challenge ends.

Prizes

Cloud credit

The winner of each of the challenges will be offered a cloud credit grant of 5k USD as part of Microsoft’s AI for Earth program.

Publication

LifeCLEF 2020 is an evaluation campaign that is being organized as part of the CLEF initiative labs. The campaign offers several research tasks that welcome participation from teams around the world. The results of the campaign appear in the working notes proceedings, published by CEUR Workshop Proceedings (CEUR-WS.org). Selected contributions from the participants will be invited for publication in the following year in the Springer Lecture Notes in Computer Science (LNCS) series together with the annual lab overviews.

Misc

Contact us

Discussion Forum

You can ask questions related to this challenge on the Discussion Forum. Before asking a new question, please make sure that it has not been asked before.
Click on the Discussion tab above or click here

Alternative channels

We strongly encourage you to use the public channels mentioned above for communication between participants and organizers. In exceptional cases, if you have queries or comments that you would prefer to raise privately, you can send us an email at:

benjamin[dot]deneu[at]inria[dot]fr
ecole[at]caltech[dot]edu
maximilien[dot]servajean[at]lirmm[dot]fr
christophe[dot]botella[at]cirad[dot]fr
titouan[dot]lorieul[at]inria[dot]fr
alexis[dot]joly[at]inria[dot]fr

More information

You can find additional information on the challenge here.

References

[1] de Ruffray, P., Brisse, H., Grandjouan, G., Hoff, M.: “SOPHY”, une banque de données phytosociologiques; son intérêt pour la conservation de la nature. Actes du colloque “Plantes sauvages et menacées de France: bilan et protection”, Brest, 8-10 octobre 1987, pp. 129–150 (1989).