Official results

Below are the official test scores for all participants and scenarios. For each scenario and participant, the best run in terms of F1 (out of a maximum of three) was selected.
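The best-run selection rule above can be sketched as follows. This is a minimal illustration, not the actual scoring pipeline, and the run records below are invented for the example:

```python
# Pick the best of up to three runs per participant, ranked by F1.
def best_run(runs):
    """Return the run with the highest F1 score."""
    return max(runs, key=lambda run: run["f1"])

# Hypothetical submission data: three runs from one team.
runs = [
    {"run": 1, "f1": 0.61},
    {"run": 2, "f1": 0.65},
    {"run": 3, "f1": 0.63},
]
print(best_run(runs)["run"])  # → 2
```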

Special congratulations to all teams for their high-quality submissions, which surpassed both previous years' results and our wildest expectations.

Raw results are available in several formats. Feel free to use these resources to build your own tables, graphics, and comparisons.
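When building your own comparisons, the columns in the tables below are related by the usual F1 identity, F1 = 2·P·R / (P + R). A minimal sketch, using one row from the Scenario 1 table as a sanity check (the scores are rounded, so agreement is only to about six decimal places):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Vicomtech, Scenario 1: published F1 is 0.665564.
print(f1_score(0.679364, 0.652315))  # close to the published 0.665564
```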

Scenario 1: Main evaluation

| Team       | F1       | Precision | Recall   |
|------------|----------|-----------|----------|
| Vicomtech  | 0.665564 | 0.679364  | 0.652315 |
| Talp-UPC   | 0.626679 | 0.626969  | 0.626389 |
| UH-MAJA-KD | 0.625    | 0.634542  | 0.615741 |
| IXA-NER-RE | 0.55748  | 0.58008   | 0.536574 |
| UH-MatCom  | 0.556876 | 0.716157  | 0.455556 |
| SINAI      | 0.42069  | 0.651456  | 0.310648 |
| HAPLAP     | 0.395153 | 0.458435  | 0.347222 |
| baseline   | 0.395153 | 0.458435  | 0.347222 |
| ExSim      | 0.245644 | 0.312589  | 0.202315 |

Scenario 2: Task A

| Team       | F1       | Precision | Recall   |
|------------|----------|-----------|----------|
| SINAI      | 0.825207 | 0.844633  | 0.806655 |
| Vicomtech  | 0.820882 | 0.821622  | 0.820144 |
| Talp-UPC   | 0.815836 | 0.807218  | 0.82464  |
| UH-MAJA-KD | 0.814312 | 0.820255  | 0.808453 |
| UH-MatCom  | 0.794967 | 0.824952  | 0.767086 |
| IXA-NER-RE | 0.6918   | 0.726733  | 0.660072 |
| HAPLAP     | 0.541978 | 0.503864  | 0.586331 |
| baseline   | 0.541978 | 0.503864  | 0.586331 |
| ExSim      | 0.314214 | 0.292117  | 0.339928 |

Scenario 3: Task B

| Team       | F1       | Precision | Recall   |
|------------|----------|-----------|----------|
| IXA-NER-RE | 0.633235 | 0.647887  | 0.619231 |
| UH-MAJA-KD | 0.59879  | 0.629237  | 0.571154 |
| Vicomtech  | 0.583243 | 0.671679  | 0.515385 |
| Talp-UPC   | 0.574786 | 0.646635  | 0.517308 |
| UH-MatCom  | 0.545035 | 0.682081  | 0.453846 |
| SINAI      | 0.461725 | 0.627063  | 0.365385 |
| HAPLAP     | 0.316418 | 0.327835  | 0.305769 |
| ExSim      | 0.131313 | 0.527027  | 0.075    |
| baseline   | 0.131313 | 0.527027  | 0.075    |

Scenario 4: Alternative domain

| Team       | F1       | Precision | Recall    |
|------------|----------|-----------|-----------|
| Talp-UPC   | 0.58353  | 0.604724  | 0.563772  |
| Vicomtech  | 0.563251 | 0.594009  | 0.535521  |
| UH-MAJA-KD | 0.547739 | 0.608321  | 0.49813   |
| IXA-NER-RE | 0.478863 | 0.563202  | 0.416494  |
| UH-MatCom  | 0.37307  | 0.726835  | 0.250935  |
| SINAI      | 0.28125  | 0.626255  | 0.181346  |
| HAPLAP     | 0.13779  | 0.281772  | 0.0911924 |
| baseline   | 0.13779  | 0.281772  | 0.0911924 |
| ExSim      | 0.122282 | 0.253264  | 0.0805983 |