Skip to content

Latest commit

 

History

History
47 lines (41 loc) · 993 Bytes

data.md

File metadata and controls

47 lines (41 loc) · 993 Bytes

Data Format

You can pass SpanFinder any formats of data, as long as you implement a dataset reader inherited from SpanReader.

By default, SpanFinder uses its own JSON data format, which enables richer features for training and modeling.

The minimal example of the JSON is

{
  "tokens": ["Bob", "attacks", "the", "building", "."],
  "annotations": [
    {
      "span": [1, 1],
      "label": "Attack",
      "children": [
        {
          "span": [0, 0],
          "label": "Assailant",
          "children": []
        },
        {
          "span": [2, 3],
          "label": "Victim",
          "children": []
        }
      ]
    },
    {
      "span": [3, 3],
      "label": "Buildings",
      "children": [
        {
          "span": [3, 3],
          "label": "Building",
          "children": []
        }
      ]
    }
  ]
}

You can have nested spans with unlimited depth.

An example data file for FrameNet can be downloaded here.