This is the page of Task 11 at SemEval 2023 on Learning with Disagreements (Le-Wi-Di), 2nd edition.


10/01/23: test released, evaluation phase begins!

- Deadline for submitting predictions that count for the competition at the SemEval workshop: 31/01/23 23:59 UTC

- Go to our Codalab page to get the data and start preparing your models!

- Watch the video presentation of the task here!


In recent years, the assumption that natural language (NL) expressions have a single and clearly identifiable interpretation in a given context has increasingly been recognized as just a convenient idealization. The objective of the Learning with Disagreement shared task is to provide a unified testing framework for learning from disagreements, using datasets that contain information about disagreements in interpreting language. Learning with Disagreement (Le-Wi-Di) 2021 created a benchmark consisting of 6 existing and widely used datasets, focusing primarily on semantic ambiguity and image classification.

For SemEval 2023, we run a second shared task on the topic of Learning with Disagreement: (i) the focus is entirely on subjective tasks, where training with aggregated labels makes much less sense, and (ii) while relying on the same infrastructure, it involves new datasets. We believe that the shared task thus reformulated is extremely timely, given the current high degree of interest in subjective tasks such as offensive language detection in general, and in the issue of disagreements in such data in particular (Basile et al., 2021; Leonardelli et al., 2021; Akhtar et al., 2021; Davani et al., 2022; Uma et al., 2021), and we hope it will attract substantial interest from the community.

The Datasets

Our focus is entirely on subjective tasks, where training with aggregated labels makes much less sense. To this end, we collected a benchmark of four (textual) datasets with different characteristics in terms of genre (social media and conversations), language (English and Arabic), task (misogyny, hate-speech, and offensiveness detection) and annotation method (experts, specific demographic groups, AMT crowd). All datasets, however, provide multiple labels for each instance.

The four datasets presented are:

Aim of the task and data format

We encourage participants to develop methods able to capture agreements and disagreements, rather than focusing on developing the best model. To this end, we developed a harmonized JSON format used to release all datasets. Features that are common to all datasets are thus released in a homogeneous format, to make it easier for participants to test their methods across all the datasets.

Among the information common to all datasets, of particular relevance for the task are the disaggregated crowd-annotation labels and the annotators' references. Dataset-specific information is also released and varies by dataset: demographics of the annotators (ArMIS and HS-Brexit datasets), the other annotations made by the same annotators within the same dataset (all datasets), and additional annotations given for the same item by the same annotator (HS-Brexit and ConvAbuse datasets). Participants can leverage this dataset-specific information to improve performance on a specific dataset.
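As an illustration of how the disaggregated labels could be used, here is a minimal Python sketch that turns per-annotator labels into a probability distribution (a "soft label") for each item. The field names `text`, `annotations` and `annotators`, the comma-separated encoding, and the binary classes `0`/`1` are assumptions made for this example only; please check the released files on Codalab for the actual schema.

```python
import json
from collections import Counter

# Hypothetical records: the field names and the comma-separated encoding
# are assumptions for illustration, not the official schema.
SAMPLE = """
{
  "1": {"text": "example item A", "annotations": "1,0,1", "annotators": "Ann1,Ann2,Ann3"},
  "2": {"text": "example item B", "annotations": "0,0,0", "annotators": "Ann1,Ann2,Ann3"}
}
"""


def soft_label(annotations, classes=("0", "1")):
    """Convert a comma-separated string of labels into a distribution over classes."""
    votes = [v.strip() for v in annotations.split(",") if v.strip()]
    if not votes:
        raise ValueError("item has no annotations")
    counts = Counter(votes)
    # Share of annotators who chose each class.
    return {c: counts.get(c, 0) / len(votes) for c in classes}


def load_soft_labels(raw_json):
    """Parse a JSON string of items and return {item_id: soft label}."""
    data = json.loads(raw_json)
    return {item_id: soft_label(item["annotations"]) for item_id, item in data.items()}


if __name__ == "__main__":
    for item_id, dist in load_soft_labels(SAMPLE).items():
        print(item_id, dist)
```

Keeping the full distribution, rather than collapsing it to a majority vote, preserves the information about how much the annotators disagreed on each item.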

The competition

The shared task is hosted on Codalab. Please refer to the Codalab platform for more detailed information. Note that the competition has not started yet and only a sample of the datasets has been released. The competition will start with the practice phase from 01/09/22. Please subscribe to our Google group to be updated with the news about the task!

Important Dates



Join our Google group. We'll keep you updated with the news about the task. Contact us directly if you have further inquiries. Follow us on Twitter for news about learning with disagreements and more!

Previous Editions