
Updates

LeWiDi 3rd Edition shared task at the NLPerspectives Workshop is online!

Please check our competition page on Codabench for data and more information!

Overview

The third edition of Learning with Disagreements (LeWiDi) is co-located with the NLPerspectives workshop at EMNLP 2025. The shared task aims to highlight the challenges posed by interpretative variation and to encourage the research community to engage with this issue. Its main goal is to provide a unified testing framework for learning from disagreements and for evaluating models on such datasets.

The two previous editions of the shared task were organized as part of SemEval: the first edition (2021; Uma et al.) focused on ambiguity in language and vision, while the second edition (2023; Leonardelli et al.) concentrated on disagreement in subjective tasks.

This new edition will differ from the previous ones in a number of respects.

The datasets

Each dataset includes annotated examples with soft labels generated from multiple annotators and corresponding annotator metadata. Despite differing objectives, all datasets share a homogeneous JSON format.
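For illustration, here is a minimal Python sketch of reading such a file and deriving a soft label from the individual annotations. The field names (`text`, `annotations`) are assumptions for the sake of the example, not the official schema:

```python
import json
from collections import Counter

# Illustrative only: the field names below ("text", "annotations") are
# assumptions, not the official LeWiDi schema.
with open("dataset.json", encoding="utf-8") as f:
    data = json.load(f)

for item_id, item in data.items():
    text = item["text"]                # the annotated example
    annotations = item["annotations"]  # per-annotator labels, e.g. {"ann1": 1, "ann2": 0}
    # A soft label is the relative frequency of each label across annotators.
    counts = Counter(annotations.values())
    total = sum(counts.values())
    soft_label = {label: n / total for label, n in counts.items()}
```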

Conversational Sarcasm Corpus (CSC)

The CSC is a dataset of context+response pairs rated for sarcasm, with ratings from 1 to 6. The paper describing the dataset is available here.
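For a graded task like this, the soft label can be read as a distribution over the six rating values. A minimal sketch with made-up ratings:

```python
from collections import Counter

# Hypothetical ratings from five annotators on the 1-6 sarcasm scale.
ratings = [2, 3, 3, 5, 3]

# Soft label: relative frequency of each rating value.
counts = Counter(ratings)
soft_label = [counts.get(r, 0) / len(ratings) for r in range(1, 7)]
print(soft_label)  # [0.0, 0.2, 0.6, 0.0, 0.2, 0.0]
```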

MultiPico dataset (MP)

The MP dataset is a crowdsourced multilingual irony detection dataset. Annotators were asked to judge whether a reply was ironic in the context of a brief post-reply exchange on social media. Annotator IDs and metadata (gender, age, nationality, etc.) are available. Languages include Arabic, German, English, Spanish, French, Hindi, Italian, Dutch, and Portuguese. The paper describing the dataset is available here.

Paraphrase Detection dataset (Par)

A dataset of question pairs for which annotators judged whether the two questions are paraphrases of each other, using a Likert scale. Maintained by the MaiNLP lab (not yet published).

VariErr NLI dataset (VariErrNLI)

A dataset originally designed for automatic error detection, distinguishing between annotation errors and legitimate human label variations in Natural Language Inference. The paper describing the dataset is available here.

Tasks and Evaluation

Only soft evaluation metrics will be used.

Submissions can target one or both tasks, and participants may submit results for one or several of the datasets.
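As an illustration of soft evaluation, a common choice is to compare a system's predicted distribution against the annotator-derived soft label, for instance with cross-entropy. The sketch below is illustrative and not necessarily the official metric:

```python
import math

def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a target soft label and a predicted
    distribution over the same label set (lower is better)."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, predicted))

# Hypothetical binary soft labels: 70% of annotators chose the positive class.
target = [0.7, 0.3]
predicted = [0.6, 0.4]
print(cross_entropy(target, predicted))  # ~0.632
```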

Output of the shared task

Participants can submit a system paper to the 4th Workshop on Perspectivist Approaches to NLP. These peer-reviewed papers will be published in the workshop proceedings.

Important dates