LeWiDi
Download the data
News
- The overview paper describing the LeWiDi task is available here!
- This year's edition of the Learning with Disagreements shared task was a great success, with around 30 valid submissions to the competition. Results are available here.
- A video presenting the task is available here.
Overview
In recent years, the assumption that natural language (NL) expressions have a single, clearly identifiable interpretation in a given context has increasingly been recognized as just a convenient idealization. The objective of the Learning with Disagreements shared task is to provide a unified testing framework for learning from disagreements, using datasets containing information about disagreements in the interpretation of language. Learning with Disagreements (Le-Wi-Di) 2021 created a benchmark of six existing and widely used datasets, but focused primarily on semantic ambiguity and image classification.
For SemEval 2023, we are running a second shared task on the topic of Learning with Disagreements:
- the focus is entirely on subjective tasks, where training with aggregated labels makes much less sense, and
- while relying on the same infrastructure, it will involve new datasets.
We believe the shared task is extremely timely, given the current high degree of interest in subjective tasks such as offensive language detection in general, and in the issue of disagreements in such data in particular (Basile et al., 2021; Leonardelli et al., 2021; Akhtar et al., 2021; Davani et al., 2022; Uma et al., 2021).
The Datasets
To this end, we collected a benchmark of four (textual) datasets with different characteristics in terms of genre (social media and conversations), language (English and Arabic), task (misogyny, hate speech, and offensiveness detection), and annotation method (experts, specific demographic groups, AMT crowd). All datasets provide a multiplicity of labels for each instance.
The four datasets presented are:
- The HS-Brexit dataset: an entirely new dataset of tweets on abusive language about Brexit, annotated for hate speech (HS), aggressiveness and offensiveness by six annotators belonging to two distinct groups: a target group of three Muslim immigrants in the UK, and a control group of three other individuals.
- The ArMIS dataset: a new dataset of Arabic tweets annotated for misogyny detection by annotators with different demographic characteristics ("Moderate Female", "Liberal Female" and "Conservative Male"). VIDEO
- The ConvAbuse dataset: a dataset of 4,185 English dialogues between users and two conversational agents. The user utterances have been annotated by at least three experts in gender studies using a hierarchical labelling scheme (categories: Abuse binary; Abuse severity; Directedness; Target; Type).
- The MultiDomain Agreement dataset: a dataset of around 10k English tweets from three domains (BLM, Election, Covid-19). Each tweet is annotated for offensiveness by 5 annotators via AMT. Particular focus was put on pre-selecting tweets for annotation that are likely to lead to disagreement. Indeed, almost a third of the dataset ended up with a 2 vs 3 annotator split, and another third with a 1 vs 4 split (see the sketch after this list). VIDEO
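To make these agreement splits concrete, here is a minimal sketch (our own illustration, not an official task script) of how an agreement split and a soft label can be derived from five binary crowd annotations:

```python
from collections import Counter

def agreement_split(annotations):
    """Classify binary annotations by their level of agreement,
    e.g. [0, 1, 1, 0, 1] -> '2 vs 3'."""
    counts = Counter(annotations)
    minority = min(counts.values()) if len(counts) > 1 else 0
    return f"{minority} vs {len(annotations) - minority}"

def soft_label(annotations):
    """Turn disaggregated labels into a probability distribution."""
    return {label: n / len(annotations) for label, n in Counter(annotations).items()}

print(agreement_split([0, 1, 1, 0, 1]))  # '2 vs 3' -> high disagreement
print(soft_label([0, 1, 1, 0, 1]))       # {0: 0.4, 1: 0.6}
```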
Aim of the task and data format
We encourage participants to develop methods able to capture agreements/disagreements, rather than focusing on developing the best model. To this end, we release all datasets in a harmonized JSON format: features that are common to all datasets are released in a homogeneous format, so as to make it easier for participants to test their methods across all the datasets.
Among the information released that is common to all datasets, and of particular relevance for the task, are the disaggregated crowd annotations and the annotator references. Moreover, dataset-specific information is also released and varies for each dataset, ranging from the demographics of the annotators (ArMIS and HS-Brexit datasets), to the other annotations made by the same annotators within the same dataset (all datasets), to additional annotations given by the same annotator for the same item (HS-Brexit and ConvAbuse datasets). Participants can leverage this dataset-specific information to improve performance on a specific dataset.
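For concreteness, the sketch below shows how one such item might be read in Python. The field names (text, annotators, annotations, hard_label, soft_label, other_info) are illustrative assumptions about the harmonized format, not the official schema; please check the released data for the exact keys.

```python
import json

# Hypothetical example item; the actual field names in the released
# files may differ -- check the official data documentation.
item_json = """{
    "text": "example tweet text",
    "annotators": "Ann1,Ann2,Ann3,Ann4,Ann5",
    "annotations": "0,1,1,0,1",
    "hard_label": 1,
    "soft_label": {"0": 0.4, "1": 0.6},
    "other_info": {"domain": "BLM"}
}"""

item = json.loads(item_json)
labels = [int(a) for a in item["annotations"].split(",")]
print(labels)              # disaggregated crowd labels: [0, 1, 1, 0, 1]
print(item["soft_label"])  # share of annotators per label
print(item["other_info"])  # dataset-specific extras, e.g. tweet domain
```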
The competition
The shared task was hosted on Codalab. Please refer to the Codalab platform for more detailed information about the competition.
- Training data ready: 1 September 2022
- Evaluation start: 10 January 2023
- Evaluation end: by 31 January 2023
- System paper submission due: February 2023
- Task paper submission due: February 2023
- Notification to authors: March 2023
- Camera ready due: April 2023
- SemEval workshop: Summer 2023 (co-located with NAACL)
Organisers
- Elisa Leonardelli, FBK Trento, Italy
- Gavin Abercrombie, Heriot Watt University Edinburgh, UK
- Dina Almanea, Queen Mary University, UK
- Valerio Basile, University of Turin, IT
- Tommaso Fornaciari, Italian National Police, IT
- Barbara Plank, LMU Munich, DE
- Massimo Poesio, Queen Mary University, UK
- Verena Rieser, Heriot Watt University Edinburgh, UK
- Alexandra N Uma, Connex One, UK
Communication
Contact us directly if you have further inquiries. Join our Google group for news about the task, and follow us on Twitter for news about learning with disagreements and more!