Centre for Language Studies, Radboud University Nijmegen
Martial Pastor (martial.pastor [at] ru.nl) · Nelleke Oostdijk (nelleke.oostdijk [at] ru.nl)
This dataset consists of political tweets annotated for the presence of enthymemes (arguments in which a key component is left implicit). For each tweet, multiple independent annotators determine whether an implicit premise, an implicit conclusion, or no implicit component is present, reconstruct the full propositional structure of the argument, and identify the underlying Walton argumentation scheme.
The tweets cover two topics in British political discourse: immigration policy and COVID-19 vaccination. They were drawn from the tropes corpus of Flaccavento et al. (2025) and selected to balance both topics and enthymeme types across the dataset.
A central design principle is the preservation of annotator disagreement. Rather than reducing multiple judgements to a single ground truth, all individual labels and reconstructions are retained and released alongside the data, enabling research into annotation variation as a substantive signal rather than noise to be discarded.
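To illustrate what retaining every judgement makes possible, the sketch below turns one tweet's labels into a distribution over the three labels (a "soft label") instead of a majority vote, and scores disagreement with Shannon entropy. Both functions are hypothetical helpers, not part of the released data or any official tooling; the input is assumed to be a plain list of label strings for a single tweet.

```python
from collections import Counter
from math import log2

# The three enthymeme labels used in the annotation scheme.
LABELS = ("implicit_premise", "implicit_conclusion", "none")

def soft_label(labels):
    """Hypothetical helper: map one tweet's annotator labels to a
    probability distribution over LABELS instead of a single vote."""
    if not labels:
        raise ValueError("no labels given")
    unknown = set(labels) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    counts = Counter(labels)
    total = len(labels)
    return {lab: counts[lab] / total for lab in LABELS}

def label_entropy(dist):
    """Shannon entropy in bits: 0.0 means all annotators agree,
    larger values mean more disagreement."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

# Invented example: five annotators on one train/dev tweet.
dist = soft_label(["implicit_premise", "implicit_premise", "none",
                   "implicit_premise", "implicit_conclusion"])
print(dist)
print(round(label_entropy(dist), 3))
```

A distribution like this can serve directly as a training target for learning-from-disagreement setups, where a majority vote would discard the minority judgements.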
Each tweet is annotated independently by multiple annotators. Train and dev instances are annotated by five annotators each; test instances by three. Every annotator provides the following for each tweet:
- Enthymeme type: one of three labels, namely implicit_premise (an unstated supporting assumption the argument relies on), implicit_conclusion (a claim that follows from the stated premises but is never expressed), or none (all components are explicit, or no argument is present).
- Argument reconstruction: the annotator writes out the full set of propositions (premises and conclusion) that constitute the argument, marking the implicit component with the tag (implicit). The example below illustrates a complete reconstruction.
- Argumentation scheme: the annotator classifies the argument using Walton's taxonomy of argumentation schemes. The most frequently attested schemes in the dataset include Argument from Cause to Effect, Argument from Inconsistent Commitment, Argument from Motive, Argument from Source Credibility, and Argument from Consequences. The full taxonomy, critical questions, and abstract scheme forms used in annotation are documented in the annotation guidelines.
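The three components each annotator provides can be held in one record per (tweet, annotator) pair. The dataclass below is an illustrative structure, not a schema shipped with the dataset; the class name and the consistency rule in `__post_init__` (a non-none label implies exactly one proposition ending in the (implicit) tag) are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional

VALID_LABELS = {"implicit_premise", "implicit_conclusion", "none"}
IMPLICIT_TAG = "(implicit)"

@dataclass
class Annotation:
    """Hypothetical record for one annotator's judgement on one tweet."""
    tweet_id: str
    annotator_id: str
    label: str                                              # enthymeme type
    propositions: List[str] = field(default_factory=list)   # reconstruction
    scheme: Optional[str] = None                            # Walton scheme name

    def __post_init__(self):
        # Reject labels outside the three-way scheme.
        if self.label not in VALID_LABELS:
            raise ValueError(f"unknown label: {self.label!r}")
        # Assumed rule: an enthymeme label implies exactly one tagged proposition.
        if self.label != "none":
            n_implicit = sum(p.rstrip().endswith(IMPLICIT_TAG)
                             for p in self.propositions)
            if n_implicit != 1:
                raise ValueError("expected exactly one (implicit) proposition")

a = Annotation("t1", "A1", "implicit_premise",
               ["P1.", "P2. (implicit)", "C."], "Argument from Consequences")
```

Validating at construction time catches malformed rows early, before they reach any aggregation step.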
The dataset is released in three stages. The train and dev sets released in mid-March are supersets of the initial sample.
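Because the mid-March train and dev sets are supersets of the initial sample, a quick set comparison on tweet_id can confirm that no tweet from the earlier sample went missing after switching files. The helper name is hypothetical, and the ID lists would come from whatever loader you use.

```python
def missing_from_release(initial_ids, release_ids):
    """Return tweet IDs present in the initial sample but absent from a
    later release; an empty set means the release is a superset."""
    return set(initial_ids) - set(release_ids)

# Invented IDs for illustration only.
gap = missing_from_release(["101", "102"], ["101", "102", "103"])
```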
The dataset is distributed as CSV files, one per annotator per split, alongside a merged file that aggregates all annotations. Each row corresponds to one tweet as annotated by one annotator.
| Field | Description |
|---|---|
| tweet_id | Unique tweet identifier |
| tweet_text | Raw tweet content |
| topic | immigration or vaccine |
| annotator_id | Anonymised annotator code |
| label | implicit_premise, implicit_conclusion, or none |
| scheme | Walton scheme name, or None |
| prop_1 … prop_3 | Reconstructed propositions with inline role tags |
| implicit_text | Extracted text of the implicit proposition (convenience field) |
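Since every row is one annotator's view of one tweet, a typical first step is to read a file with Python's standard `csv.DictReader` and group rows by tweet_id, keeping all of them. The sketch below assumes the header row uses the field names in the table above (with prop_2 filling the "…"); the helper names and the two-row sample are invented for illustration. The annotator-count check reflects the five-annotator (train/dev) and three-annotator (test) design.

```python
import csv
import io
from collections import defaultdict

def load_rows(stream):
    """Read one annotation CSV (per-annotator or merged) into dicts."""
    return list(csv.DictReader(stream))

def group_by_tweet(rows):
    """Collect every annotator's row per tweet without aggregating."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["tweet_id"]].append(row)
    return dict(groups)

def check_annotator_count(groups, expected):
    """Return tweet IDs whose number of distinct annotators differs from
    `expected` (5 for train/dev, 3 for test)."""
    return sorted(tid for tid, rows in groups.items()
                  if len({r["annotator_id"] for r in rows}) != expected)

# Invented two-row excerpt; real files are read from disk instead.
sample = io.StringIO(
    "tweet_id,tweet_text,topic,annotator_id,label,scheme,"
    "prop_1,prop_2,prop_3,implicit_text\n"
    "101,Example tweet,vaccine,A1,none,None,,,,\n"
    "101,Example tweet,vaccine,A2,implicit_premise,"
    "Argument from Consequences,P1.,P2. (implicit),C.,P2.\n"
)
groups = group_by_tweet(load_rows(sample))
print(len(groups["101"]))
print(check_annotator_count(groups, 3))
```

For real files, open them with `open(path, newline="", encoding="utf-8")` (where `path` is your local copy) so that embedded line breaks and non-ASCII characters in tweet text are read correctly.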
Within proposition fields, the role of each proposition is marked inline. The implicit component carries the tag (implicit) appended to its text β e.g. "Controlled immigration is desirable. (implicit)".
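The (implicit) tag can be split off each proposition field to recover the implicit text, which should match the implicit_text convenience field. This sketch handles only the trailing (implicit) tag described here; any other inline role tags, and the exact field names prop_1 to prop_3, are assumptions, and both function names are hypothetical.

```python
IMPLICIT_TAG = "(implicit)"

def split_role(prop):
    """Return (text, is_implicit) for one proposition, assuming the tag
    is appended at the end of the text."""
    text = prop.strip()
    if text.endswith(IMPLICIT_TAG):
        return text[: -len(IMPLICIT_TAG)].rstrip(), True
    return text, False

def extract_implicit(row, fields=("prop_1", "prop_2", "prop_3")):
    """Collect the untagged text of every implicit proposition in a row,
    skipping empty cells."""
    found = []
    for name in fields:
        value = row.get(name) or ""
        if value.strip():
            text, implicit = split_role(value)
            if implicit:
                found.append(text)
    return found

text, is_implicit = split_role("Controlled immigration is desirable. (implicit)")
```

Recomputing the implicit text this way gives a cheap consistency check against the implicit_text column.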
Enthymemes are among the most pervasive, and most underexplored, features of persuasive discourse. By leaving a key premise or conclusion unstated, an argument invites the reader to supply it themselves, producing the subjective impression that the inference is their own. This mechanism is especially effective in short-form political communication, where space is constrained and emotional register is prioritised over logical explicitness.
Detecting and reconstructing implicit argument components is directly relevant to computational fact-checking, misinformation research, and argument mining more broadly. A system capable of recovering the unstated premise underlying a political claim has taken a meaningful step toward auditing that claim's logical structure.
Most existing argument mining corpora treat annotation disagreement as noise to be minimised. This dataset treats it as a feature: genuine interpretive plurality is preserved and the resource is designed to support research into learning from disagreement rather than collapsing it into a single authoritative label.
- implicit_premise: missing supporting assumption
- implicit_conclusion: missing inferred claim
- none: no implicit component