Annotation Guidelines

These guidelines were developed jointly by Hessel Haagsma, Barbara Plank, and Johan Bos.

PIE Annotation

A phrase is an instance of a PIE (potentially idiomatic expression) when the phrase contains all the words of the PIE, with the same parts of speech, and in the same grammatical relations as in the dictionary form of the PIE. Determiners in both the phrase and the PIE should be ignored. Inflectional variation is allowed. Passivization is allowed. Internal modification of the phrase is allowed, e.g. the insertion of adjectives and adverbs. PIEs as (part of) proper names are not allowed. Transitive/intransitive variation of parts of the idiom is not allowed. Coordination of parts of the idiom is not allowed. Genitive variation (for idioms containing possessive pronouns or PPs) is allowed. The addition of tense, aspect, and modality markers/verbs is allowed.

PIE: in the running
Definition: any phrase that contains the preposition in and a form of the noun running, where in is the head of running.
Positive examples: in the early running, in the runnings, in running
Negative examples: in a running competition, running in the woods, in a set of runnings, The magazine is called In The Running

PIE: spill the beans
Definition: any phrase that contains a form of the verb spill and a form of the noun bean, where bean is the direct object of spill.
Positive examples: spill some very high-profile beans, spilling no beans, The beans were spilled all over the floor.
Negative examples: three very high-profile bean spillings, the spilling of beans, The beans spilled all over the floor., I spilled beans and gravy.
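The spill the beans definition can be read as a check over a dependency parse. The following is a minimal sketch, not the procedure used by the authors: the `Token` class, the Universal Dependencies-style labels (`obj`, `nsubj:pass`, `conj`), and the hand-written parses are all assumptions made for illustration. It covers only the part-of-speech, direct-object, passivization, and coordination rules.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """Hypothetical parsed token; the labels follow a UD-like scheme (assumption)."""
    lemma: str  # dictionary form, e.g. "spill" for "spilled"
    pos: str    # coarse part of speech, e.g. "VERB" or "NOUN"
    dep: str    # relation to the head, e.g. "obj" or "nsubj:pass"
    head: int   # index of the head token in the sentence, -1 for the root


# Relations accepted as "bean is the direct object of spill".
# "nsubj:pass" covers passivization, which the guidelines allow.
OBJECT_RELATIONS = {"obj", "nsubj:pass"}


def is_spill_the_beans(tokens):
    """Return True if the parsed phrase matches the 'spill the beans' definition."""
    for i, tok in enumerate(tokens):
        # The noun must be a form of 'bean' (inflection is allowed via the lemma).
        if tok.lemma != "bean" or tok.pos != "NOUN":
            continue
        # It must be the (possibly passivized) direct object of its head.
        if tok.dep not in OBJECT_RELATIONS:
            continue
        head = tokens[tok.head]
        if head.lemma != "spill" or head.pos != "VERB":
            continue
        # Coordination of parts of the idiom is not allowed.
        if any(t.dep == "conj" and t.head == i for t in tokens):
            continue
        return True
    return False


# Hand-written parses of examples from the guidelines (assumed analyses).
spilling_no_beans = [
    Token("spill", "VERB", "root", -1),
    Token("no", "DET", "det", 2),
    Token("bean", "NOUN", "obj", 0),
]
beans_were_spilled = [
    Token("the", "DET", "det", 1),
    Token("bean", "NOUN", "nsubj:pass", 3),
    Token("be", "AUX", "aux:pass", 3),
    Token("spill", "VERB", "root", -1),
]
beans_spilled = [
    Token("the", "DET", "det", 1),
    Token("bean", "NOUN", "nsubj", 2),
    Token("spill", "VERB", "root", -1),
]
beans_and_gravy = [
    Token("I", "PRON", "nsubj", 1),
    Token("spill", "VERB", "root", -1),
    Token("bean", "NOUN", "obj", 1),
    Token("and", "CCONJ", "cc", 4),
    Token("gravy", "NOUN", "conj", 2),
]
```

Under these assumed parses, the first two phrases match and the last two do not: the intransitive use fails the object-relation check, and the coordinated object is rejected explicitly. Determiners and internal modifiers need no special handling because the check only looks at the noun, its relation, and its head.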

Sense Annotation

Every instance of a PIE has to be labelled with the sense in which the PIE is used in that specific instance. We distinguish four sense classes: idiomatic, literal, unclear, and other. The idiomatic sense is the one that is defined in the dictionary. The literal sense is the regular, compositional meaning of the phrase. Often, the sense of a PIE instance is clearly idiomatic or literal. In cases where there is some ambiguity, annotators should choose the sense that is most salient in their interpretation. The unclear label should not be used for ambiguous cases, but only for cases where the clues from the context are not sufficient to disambiguate the sense. For some PIEs, no clear literal interpretation is possible, e.g. piping hot. In those cases too, the unclear label should be used if there is no context, even though the phrase is certain to have an idiomatic sense. Where none of the above labels applies (see the examples below), the other class should be used.

PIE: rock the boat
Idiomatic Sense: to disturb the status quo
Literal Sense: to sway a ship
Idiomatic Examples: I didn't want to rock the boat at my previous job, since I was leaving anyway., We needed a momentum shifter, we needed something to really rock the boat.
Literal Example: The high waves rocked the boat, and I got seasick.
Unclear Examples: It rocked the boat., Rock the boat!

Illustration of the other class: idioms embedded in bigger idioms, e.g. on edge in living on the edge, meta-linguistic uses, e.g. the prototypical example of an idiomatic expression, spill the beans, and other cases in which the sense is clear, but it does not fit the idiomatic sense of the PIE in question, or the literal sense of the phrase.
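One way to keep sense annotations consistent is to restrict labels to the four classes described above. The sketch below is only an illustration: the `Sense` enum, the `SenseAnnotation` record, and the `parse_sense` helper are hypothetical names, not part of any released annotation tooling.

```python
from dataclasses import dataclass
from enum import Enum


class Sense(Enum):
    """The four sense classes named in the guidelines."""
    IDIOMATIC = "idiomatic"
    LITERAL = "literal"
    UNCLEAR = "unclear"
    OTHER = "other"


@dataclass
class SenseAnnotation:
    """Hypothetical record for one annotated PIE instance."""
    pie: str       # dictionary form of the PIE, e.g. "rock the boat"
    context: str   # the text in which the instance occurs
    sense: Sense   # the label chosen by the annotator


def parse_sense(label: str) -> Sense:
    """Normalize a raw label; raises ValueError for anything outside the four classes."""
    return Sense(label.strip().lower())


# Examples taken from the 'rock the boat' illustration above.
annotations = [
    SenseAnnotation("rock the boat",
                    "The high waves rocked the boat, and I got seasick.",
                    parse_sense("Literal")),
    SenseAnnotation("rock the boat",
                    "Rock the boat!",
                    parse_sense("unclear")),
]
```

Rejecting unknown labels at parse time means a label such as "ambiguous" cannot slip in; under the guidelines, ambiguous instances get the most salient sense instead.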

Difficult Cases

  • Dashes: treat dashes as spaces, and dash-joined words as separate words.
  • Coordination: Treat these as non-PIEs. There is no strong argument either way, but a decision has to be made, and the syntactic analysis of coordination is not trivial.
  • all over the place: For the PIE 'all over the place', it is unclear what the distinction between its literal and idiomatic senses is. The main source of confusion is Wiktionary. The ODEI lists only 'in a state of confusion or disorganization' (informal). Mark cases where 'all over the place' is used in the sense of 'everywhere, in many places' as literal, and mark cases where it is used in the sense of 'untidy, confused, disorganized' as idiomatic.
  • Genitive Variation: We keep these as PIEs, because they are clearly idiomatic and a genuine form of the same PIE, and we want to keep as many data points and variations as possible.
  • Tense/Modality/Aspect: We keep these as PIEs, because they are clearly idiomatic and a genuine form of the same PIE, and we want to keep as many data points and variations as possible.
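The dash rule in the list above (treat dashes as spaces, and dash-joined words as separate words) can be sketched as a small preprocessing step. This is an assumption-laden illustration: the function name `split_on_dashes` is hypothetical, and the set of characters counted as dashes (hyphen-minus plus the Unicode hyphens and dashes U+2010 to U+2014) is a choice made here, not something the guidelines specify.

```python
import re

# Characters treated as dashes (assumption): hyphen-minus and U+2010..U+2014.
_DASHES = re.compile("[-\u2010\u2011\u2012\u2013\u2014]")


def split_on_dashes(text):
    """Replace every dash with a space, then split on whitespace."""
    return _DASHES.sub(" ", text).split()


words = split_on_dashes("a spill-the-beans moment")
```

Here `words` holds the five separate words `a`, `spill`, `the`, `beans`, `moment`, so a dash-joined phrase is matched against the PIE in the same way as its space-separated form.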