
Analyzing Multiple-Choice QA

Before diving into the details of this doc, we strongly recommend that you familiarize yourself with some important concepts about system analyses.

In this file we describe how to analyze multiple-choice QA models. We will give an example using the fig_qa dataset, but other datasets can be analyzed in a similar way.

Data Preparation

Format of Dataset File

  • (1) datalab: if your dataset is already supported by datalab, you fortunately don't need to prepare it yourself.

  • (2) json: basically, a list of dictionaries with four keys: context, question, options, and answers, for example:

[
  {"context": "The girl had the flightiness of a sparrow", "question": "", "answers": {"text": "The girl was very fickle.", "option_index": 0}, "options": ["The girl was very fickle.", "The girl was very stable."]},
  {"context": "The girl had the flightiness of a rock", "question": "", "answers": {"text": "The girl was very stable.", "option_index": 1}, "options": ["The girl was very fickle.", "The girl was very stable."]},
  ...
]
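
If your data lives in another format, a small script can serialize it into this structure. Below is a minimal Python sketch; the file name fig_qa_dataset.json and the hard-coded examples are placeholders, not part of ExplainaBoard:

    import json

    # Placeholder examples; in practice, build this list from your own data.
    # The field names follow the schema shown above.
    dataset = [
        {
            "context": "The girl had the flightiness of a sparrow",
            "question": "",
            "answers": {"text": "The girl was very fickle.", "option_index": 0},
            "options": ["The girl was very fickle.", "The girl was very stable."],
        },
    ]

    # Write the examples as a single JSON array, matching the expected file layout.
    with open("fig_qa_dataset.json", "w", encoding="utf-8") as f:
        json.dump(dataset, f, ensure_ascii=False, indent=2)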

Format of System Output File

To analyze your results, they should be in the following JSON format:

[
    {
        "context": "The girl was as down-to-earth as a Michelin-starred canape",
        "question": "",
        "answers": {
            "text": "The girl was not down-to-earth at all.",
            "option_index": 0
        },
        "options": [
            "The girl was not down-to-earth at all.",
            "The girl was very down-to-earth."
        ],
        "predicted_answers": {
            "text": "The girl was not down-to-earth at all.",
            "option_index": 0
        }
    },
  ...
]

where

  • context represents the text providing context information
  • question represents the question, which may be empty in some scenarios (as in fig_qa)
  • options is a list of strings, denoting all candidate options
  • answers is a dictionary with two elements:
    • text: the true answer text
    • option_index: the index of the true answer in options
  • predicted_answers is a dictionary with two elements:
    • text: the predicted answer text
    • option_index: the index of the predicted answer in options
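
A common way to produce this file is to start from the dataset file and attach a predicted_answers entry to each example. Here is a minimal Python sketch, assuming your model's predictions are available as a list of option indices; make_system_output and the file names are hypothetical, not part of ExplainaBoard:

    import json
    from typing import List

    def make_system_output(dataset_path: str, predictions: List[int], out_path: str) -> None:
        """Attach one predicted option index per example and write the result."""
        with open(dataset_path, encoding="utf-8") as f:
            examples = json.load(f)
        assert len(examples) == len(predictions), "need one prediction per example"

        for example, pred_idx in zip(examples, predictions):
            example["predicted_answers"] = {
                "text": example["options"][pred_idx],  # recover the predicted option text
                "option_index": pred_idx,
            }

        with open(out_path, "w", encoding="utf-8") as f:
            json.dump(examples, f, ensure_ascii=False, indent=2)

    # Hypothetical usage, with dummy predictions for the two examples above:
    # make_system_output("fig_qa_dataset.json", [0, 1], "gpt2.json")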

Let's say we have several such output files from different systems, e.g. ./data/system_outputs/fig_qa/gpt2.json.

Performing Basic Analysis

To perform a basic analysis, we can run the following command:

    explainaboard --task qa-multiple-choice --system-outputs ./data/system_outputs/fig_qa/gpt2.json > report.json

where

  • --task: denotes the task name; you can find all supported task names here
  • --system-outputs: denotes the path(s) of the system outputs; multiple paths should be separated by spaces, for example, system1 system2
  • --dataset: optional, denotes the dataset name
  • report.json: the generated analysis file in JSON format. You can find an example file here. Tip: use a JSON viewer like this one for easier interpretation.
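
If you prefer inspecting the report programmatically instead of in a JSON viewer, a short sketch like the following can help you get oriented; it assumes only that report.json was produced by the command above:

    import json

    # The report schema is defined by ExplainaBoard, so inspect it
    # rather than assuming specific fields.
    with open("report.json", encoding="utf-8") as f:
        report = json.load(f)

    # Print the top-level structure to get oriented.
    if isinstance(report, dict):
        print(list(report.keys()))
    else:
        print(type(report).__name__, len(report))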

Now let's look at the results to see what sort of interesting insights we can glean from them.

TODO: add insights

Advanced Analysis Options

One can also perform pairwise analysis:

    explainaboard --task qa-multiple-choice --system-outputs model_1 model_2 > report.json

where two system outputs are fed in, separated by a space.

  • report.json: the generated analysis file in JSON format, whose schema is similar to the single-system report above, except that all performance values are the result of subtracting sys2's values from sys1's.
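
Since the pairwise report subtracts one system's values from the other's, the two output files should presumably cover the same instances in the same order. A quick sanity check is sketched below; same_instances is a hypothetical helper, not part of ExplainaBoard:

    import json

    def same_instances(path_a: str, path_b: str) -> bool:
        """Check that two system output files cover the same examples in the same order."""
        with open(path_a, encoding="utf-8") as fa:
            a = json.load(fa)
        with open(path_b, encoding="utf-8") as fb:
            b = json.load(fb)
        if len(a) != len(b):
            return False
        # Compare the instance-identifying fields, ignoring each model's predictions.
        keys = ("context", "question", "options", "answers")
        return all(all(x[k] == y[k] for k in keys) for x, y in zip(a, b))

    # Hypothetical usage:
    # assert same_instances("model_1.json", "model_2.json")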