Before diving into the details of this doc, we strongly recommend that you familiarize yourself with some important concepts about system analysis.
In this file we describe how to analyze multiple-choice QA models. We will give an example using the fig_qa dataset, but other datasets can be analyzed in a similar way.
There are two options for the dataset format:

- (1) `datalab`: if your dataset is already supported by datalab, you fortunately don't need to prepare the dataset.
- (2) `json`: basically, a list of dictionaries with four keys: `context`, `options`, `question`, and `answers`, for example:
```json
[
  {"context": "The girl had the flightiness of a sparrow", "question": "", "answers": {"text": "The girl was very fickle.", "option_index": 0}, "options": ["The girl was very fickle.", "The girl was very stable."]},
  {"context": "The girl had the flightiness of a rock", "question": "", "answers": {"text": "The girl was very stable.", "option_index": 1}, "options": ["The girl was very fickle.", "The girl was very stable."]},
  ...
]
```
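If your dataset is not covered by datalab, you will need to serialize it into this format yourself. Below is a minimal sketch of how one might do this in Python; the `raw_examples` variable and the output filename are hypothetical and not part of ExplainaBoard.

```python
import json

# Hypothetical in-memory data: (context, options, gold option index).
raw_examples = [
    ("The girl had the flightiness of a sparrow",
     ["The girl was very fickle.", "The girl was very stable."], 0),
    ("The girl had the flightiness of a rock",
     ["The girl was very fickle.", "The girl was very stable."], 1),
]

dataset = []
for context, options, gold_index in raw_examples:
    dataset.append({
        "context": context,
        "question": "",  # fig_qa has no separate question, so this stays empty
        "answers": {"text": options[gold_index], "option_index": gold_index},
        "options": options,
    })

with open("fig_qa_dataset.json", "w") as f:
    json.dump(dataset, f, indent=2)
```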
In order to analyze your system's results, they should be in the following JSON format:
```json
[
  {
    "context": "The girl was as down-to-earth as a Michelin-starred canape",
    "question": "",
    "answers": {
      "text": "The girl was not down-to-earth at all.",
      "option_index": 0
    },
    "options": [
      "The girl was not down-to-earth at all.",
      "The girl was very down-to-earth."
    ],
    "predicted_answers": {
      "text": "The girl was not down-to-earth at all.",
      "option_index": 0
    }
  },
  ...
]
```
where

- `context` represents the text providing context information
- `question` represents the question, which may be empty in some scenarios
- `options` is a list of strings denoting all potential options
- `answers` is a dictionary with two elements:
  - `text`: the true answer text
  - `option_index`: the index of the true answer in `options`
- `predicted_answers` is a dictionary with two elements:
  - `text`: the predicted answer text
  - `option_index`: the index of the predicted answer in `options`
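One way to produce such a file is to take the dataset JSON from above and attach a `predicted_answers` entry to each example. Here is a minimal sketch, where `predict` is a hypothetical stand-in for your model's inference function:

```python
import json

# Hypothetical stand-in for your model; replace with real inference code.
def predict(context, question, options):
    return 0  # always picks the first option, for illustration only

with open("fig_qa_dataset.json") as f:  # the dataset file from the sketch above
    examples = json.load(f)

for example in examples:
    idx = predict(example["context"], example["question"], example["options"])
    example["predicted_answers"] = {
        "text": example["options"][idx],
        "option_index": idx,
    }

with open("gpt2.json", "w") as f:
    json.dump(examples, f, indent=2)
```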
Let's say we have several such files, e.g. `gpt2.json`, from different systems.
To perform a basic analysis, we can run the following command:
```shell
explainaboard --task qa-multiple-choice --system-outputs ./data/system_outputs/fig_qa/gpt2.json > report.json
```
where

- `--task`: denotes the task name; you can find all supported task names here
- `--system-outputs`: denotes the path(s) of the system outputs; multiple paths should be separated by spaces, for example `system1 system2`
- `--dataset`: optional, denotes the dataset name
- `report.json`: the generated analysis file in JSON format. You can find the file here. Tip: use a JSON viewer like this one for easier interpretation.
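If you prefer the terminal to a browser-based JSON viewer, a short Python snippet can pretty-print the report (assuming it was written to `report.json` as in the command above):

```python
import json

# Load and pretty-print the generated analysis report.
with open("report.json") as f:
    report = json.load(f)
print(json.dumps(report, indent=2, ensure_ascii=False))
```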
Now let's look at the results to see what sort of interesting insights we can glean from them.
TODO: add insights
One can also perform pairwise analysis:
```shell
explainaboard --task qa-multiple-choice --system-outputs model_1 model_2 > report.json
```
where the two system outputs are fed in, separated by a space.

- `report.json`: the generated analysis file in JSON format, whose schema is similar to that of the single-system evaluation above, except that all performance values are differences between the two systems (system 1 minus system 2).
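For example, to compare the `gpt2.json` output from above against a second system (here a hypothetical `bert.json`, standing in for any other system output in the same format):

```shell
explainaboard --task qa-multiple-choice --system-outputs ./data/system_outputs/fig_qa/gpt2.json ./data/system_outputs/fig_qa/bert.json > report.json
```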