
Cell-Entity Annotation (CEA) Challenge

SemTab: Semantic Web Challenge on Tabular Data to Knowledge Graph Matching

NEWS: Please join our discussion group and visit our website

This is a task of the ISWC 2020 challenge “SemTab: Semantic Web Challenge on Tabular Data to Knowledge Graph Matching”. The task is to annotate the table cells (entity mentions) of a given table set with entities of a knowledge graph (KG) such as DBpedia or Wikidata.

Task Description

Given a set of table cells, the task is to annotate each cell with an entity of a specific KG.

Each submission file should contain at most one annotation for each target cell. Any entity equivalent to the ground truth entity, such as an entity linked by a wiki page redirect in DBpedia (dbo:wikiPageRedirects), is regarded as correct. The annotation should be the entity's full URI; matching is NOT case-sensitive.

The submission file should be in CSV format. Each line should contain the annotation of one cell, which is identified by a table ID, a row ID and a column ID. That is, each line should have four fields: “Table ID”, “Row ID”, “Column ID”, and “Entity URI”. The submission file should not contain a header line. Here is an example for Wikidata: "KIN0LD6C","1","0","http://www.wikidata.org/entity/Q2472824".

Notes:

1) Table ID is the filename of the table data, but does not include the extension.

2) Row ID is the position of the row in the table file, starting from 0, i.e., first row’s ID is 0.

3) Column ID is the position of the column in the table file, starting from 0, i.e., first column’s ID is 0.

4) At most one entity can be annotated for each cell, and one submission file should have NO duplicate lines for one cell.

5) Annotations for cells that are not among the target cells are ignored.
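The format rules above can be sketched in Python as follows. This is a minimal illustration, not an official tool: the function name `write_submission` and the in-memory layout of `annotations` and `targets` are assumptions made for this example. Using a dictionary keyed by cell guarantees at most one annotation per cell, and cells with an empty annotation or outside the target set are skipped.

```python
import csv
import io


def write_submission(annotations, targets, out):
    """Write annotations as header-less CSV lines: Table ID, Row ID, Column ID, Entity URI.

    annotations: dict mapping (table_id, row_id, col_id) -> entity URI (hypothetical layout)
    targets:     set of (table_id, row_id, col_id) tuples taken from the target cell file
    out:         any writable text file object
    Returns the number of lines written.
    """
    # Quote every field, as in the example line given in the task description.
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    written = 0
    for cell in sorted(annotations):
        uri = annotations[cell]
        # Skip cells outside the target set (they are ignored anyway)
        # and cells without an annotation.
        if cell not in targets or not uri:
            continue
        table_id, row_id, col_id = cell
        writer.writerow([table_id, row_id, col_id, uri])
        written += 1
    return written


# Usage: one annotated target cell, one unannotated target cell.
example_targets = {("KIN0LD6C", 1, 0), ("KIN0LD6C", 2, 0)}
example_annotations = {
    ("KIN0LD6C", 1, 0): "http://www.wikidata.org/entity/Q2472824",
    ("KIN0LD6C", 2, 0): "",  # no candidate found, so no line is written
}
buffer = io.StringIO()
write_submission(example_annotations, example_targets, buffer)
```

In practice `out` would be a file opened with `open(path, "w", newline="")` rather than an in-memory buffer.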

Datasets

Table set for Round #1: Tables, Target Cells, KG: Wikidata

Table set for Round #2: Tables, Target Cells

Table set for Round #3: Tables, Target Cells

Table set for Round #4: Tables, Target Cells

Data Description: The table set for Round #1 is generated from Wikidata (Version: March 5, 2020). Each table is stored in one CSV file, and each line corresponds to a table row. In the target cell file, each target cell is stored on one line.
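A possible way to read this data in Python is sketched below. It assumes, without confirmation from this page, that each line of the target cell file starts with the table ID, row ID and column ID in that order, mirroring the submission format; the helper names `load_table` and `target_mentions` are hypothetical.

```python
import csv
import io


def load_table(table_file):
    """Read one table CSV into a list of rows; row 0 is the first line of the file."""
    return list(csv.reader(table_file))


def target_mentions(table_rows, table_id, targets_file):
    """Map each target cell of one table to its cell text (the entity mention).

    Assumes each target line begins with: table ID, row ID, column ID.
    """
    mentions = {}
    for line in csv.reader(targets_file):
        if len(line) < 3:
            continue  # skip blank or malformed lines
        tid, row_id, col_id = line[:3]
        if tid != table_id:
            continue
        row, col = int(row_id), int(col_id)
        mentions[(tid, row, col)] = table_rows[row][col]
    return mentions


# Usage with in-memory stand-ins for the table file and the target cell file.
table = load_table(io.StringIO("name,country\nBerlin,Germany\n"))
targets = io.StringIO('"T1","1","0"\n"T2","1","0"\n')
mentions = target_mentions(table, "T1", targets)
```

The table ID passed in would be the table's filename without its extension, as described in the notes on the submission format.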

Evaluation Criteria

Precision, Recall and F1_Score are calculated:

$\text{Precision} = \frac{\text{correctly annotated cells}\ \#}{\text{annotated cells}\ \#}$

$\text{Recall} = \frac{\text{correctly annotated cells}\ \#}{\text{target cells}\ \#}$

$\text{F1\_Score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$

Notes:

1) # denotes the number.

2) F1_Score is used as the primary score; Precision is used as the secondary score.

3) A cell submitted with an empty annotation is still counted as an annotated cell, which lowers Precision; we suggest excluding cells with empty annotations from the submission file.
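The three scores can be computed as in the following sketch. It is not the official evaluator: the function name `score` and the data layout are assumptions, and the ground truth is modelled as a set of acceptable URIs per target cell so that equivalent entities count as correct. URIs are compared case-insensitively, annotations outside the target cells are ignored, and a cell submitted with an empty annotation still counts as annotated.

```python
def score(submission, ground_truth):
    """Return (precision, recall, f1) for one submission.

    submission:   dict mapping cell -> submitted URI (possibly empty)
    ground_truth: dict mapping target cell -> set of acceptable URIs
    """
    # Only annotations for target cells are considered.
    annotated = {cell: uri for cell, uri in submission.items() if cell in ground_truth}

    correct = 0
    for cell, uri in annotated.items():
        accepted = {g.lower() for g in ground_truth[cell]}
        if uri.lower() in accepted:
            correct += 1

    precision = correct / len(annotated) if annotated else 0.0
    recall = correct / len(ground_truth) if ground_truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Usage: four target cells, three of them annotated, two correctly.
truth = {
    ("T1", 1, 0): {"http://www.wikidata.org/entity/Q1"},
    ("T1", 2, 0): {"http://www.wikidata.org/entity/Q2"},
    ("T1", 3, 0): {"http://www.wikidata.org/entity/Q3"},
    ("T1", 4, 0): {"http://www.wikidata.org/entity/Q4"},
}
submitted = {
    ("T1", 1, 0): "http://www.wikidata.org/entity/Q1",
    ("T1", 2, 0): "HTTP://WWW.WIKIDATA.ORG/ENTITY/Q2",  # differs only in case
    ("T1", 3, 0): "http://www.wikidata.org/entity/Q99",  # wrong entity
    ("T9", 0, 0): "http://www.wikidata.org/entity/Q5",  # not a target cell, ignored
}
precision, recall, f1 = score(submitted, truth)
```

In this example Precision is 2/3 and Recall is 2/4, which gives an F1_Score of 4/7.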

Submission

1. Each participant is allowed to make at most 5 submissions per day in Round #1 and Round #2.

2. The evaluation of each submission may take several minutes.

Tentative Dates

1. Round #1: 26 May to 20 July

2. Round #2: 25 July to 30 August

3. Round #3: 3 September to 17 September

4. Round #4: 20 September to 4 October

Rules

Selected systems with the best results will be invited to present their results during the ISWC conference and the Ontology Matching workshop.
Participants are encouraged to submit a system paper describing their tool and the obtained results. Papers will be published online as a volume of CEUR-WS as well as indexed on DBLP. By submitting a paper, the authors accept the CEUR-WS and DBLP publishing rules.
Please see additional information at our official website

Leaderboard

01	MTab4Wikidata	0.991
02	LinkingPark	0.986
03	Team_DAGOBAH	0.985
04	Unimib	0.974
05	bbw	0.954