NovPhy: A Physical Reasoning Benchmark for Open-world AI Systems

Chathura Gamage*,1, Vimukthini Pinto*,1, Cheng Xue*,1
Peng Zhang1, Ekaterina Nikonova1, Matthew Stephenson2, Jochen Renz1
1School of Computing
The Australian National University
Canberra, Australia
2Maastricht University
Maastricht, The Netherlands
{chathura.gamage, vimukthini.inguruwattage, cheng.xue}@anu.edu.au

Due to the emergence of AI systems that interact with the physical environment, there is an increased interest in incorporating physical reasoning capabilities into those AI systems. But is it enough to only have physical reasoning capabilities to operate in a real physical environment? In the real world, we constantly face novel situations we have not encountered before. As humans, we are competent at successfully adapting to those situations. Similarly, an agent needs to have the ability to function under the impact of novelties in order to properly operate in an open-world physical environment. To facilitate the development of such AI systems, we propose a new testbed that requires an agent to reason about physical scenarios in the presence of novelties and take actions accordingly. To create tasks in the testbed, we develop novelties representing a diverse novelty space and apply them to five commonly encountered physical scenarios in a physical environment, related to applying forces and motions such as rolling, falling, and sliding of objects. We evaluate the agents on their novelty detection and adaptation performance using these tasks. According to our novelty design, we measure two capabilities of an agent: the performance on a novelty when it is applied to different physical scenarios and the performance of a physical scenario when different novelties are applied to it. We conduct a thorough evaluation with human players, learning agents, and heuristic agents. Our evaluation shows that humans' performance is far beyond the agents' performance, some agents even with good normal task performance fail drastically when there is a novelty, and some agents that can adapt to novelties adapt slower than humans. We promote the development of intelligent agents capable of performing at the human level or above when operating in open-world physical environments.

* equal contribution

Link to the published paper: https://www.sciencedirect.com/science/article/pii/S0004370224001346#ac0010

Please cite this article as: V. Pinto, C. Gamage, C. Xue et al., NovPhy: A Physical Reasoning Benchmark for Open-world AI Systems, Artificial Intelligence, https://doi.org/10.1016/j.artint.2024.104198

Table of contents

  1. Physical Scenarios in NovPhy
  2. Novelties in NovPhy
  3. Novel Tasks in NovPhy
  4. Generating Tasks
    1. Task Generator
    2. Tasks Generated for the Baseline Analysis
    3. Creating Your Own Tasks
  5. Baseline Agents
    1. How to Run Heuristic Agents
    2. How to Run Learning Agents
      1. How to Run DQN and Deep Relational Baselines
      2. How to Run Stable Baselines
    3. How to Develop Your Own Agent
    4. Outline of the Agent Code
  6. Framework
    1. The Game Environment
    2. Symbolic Representation Data Structure
    3. Communication Protocols
  7. Evaluation Data


1. Physical Scenarios in NovPhy

We consider 5 physical scenarios in the NovPhy testbed. Firstly, we consider the basic physical scenarios associated with applying forces directly on the target objects, i.e., the effect of a single force and the effect of multiple forces. On top of simple force application, we also include scenarios associated with more complex motion including rolling, falling, and sliding, which are inspired by the physical reasoning capabilities developed in human infancy. The physical scenarios we consider and the corresponding physical rules that can be used to achieve the goal of the associated tasks are:

  1. Single force: Target objects have to be destroyed with a single force.
  2. Multiple forces: Target objects need multiple forces to destroy.
  3. Rolling: Circular objects have to be rolled along a surface to a target.
  4. Falling: Objects have to fall onto a target.
  5. Sliding: Non-circular objects have to be slid along a surface to a target.

2. Novelties in NovPhy

We design a representative novelty for each hierarchy level in the open-world novelty hierarchy proposed by the DARPA SAIL-ON program novelty working group. The novelty hierarchy consists of eight novelty levels that cover a wide range of novelty types that could occur in an open-world environment. The table below shows the open-world novelty hierarchy and descriptions of representative novelties in NovPhy.

| Novelty Level | Description | Representative Novelty |
| --- | --- | --- |
| 1. Objects | New classes, attributes, or representations of non-volitional entities. | A new pig/block that has a different colour to the normal pigs/blocks. |
| 2. Agents | New classes, attributes, or representations of volitional entities. | A novel external agent, Fan, that blows air (horizontally from left to right) affecting the moving path of objects. |
| 3. Actions | New classes, attributes, or representations of external agent behavior. | The non-novel external agent, Air Turbulence, increases the magnitude of its upward force. |
| 4. Interactions | New classes, attributes, or representations of dynamic properties of behaviors impacting multiple entities. | The existing circular wood object now has magnetic properties: it repels objects of its type and attracts other object types. |
| 5. Relations | New classes, attributes, or representations of static properties of the relationships between multiple entities. | The slingshot, which is at the left side of the tasks, is now at the right side (i.e., the spatial relationship between the slingshot and other objects is changed). |
| 6. Environments | New classes, attributes, or representations of global constraints that impact all entities. | The gravity in the environment is now inverted, which affects the behaviour of the dynamic objects. |
| 7. Goals | New classes, attributes, or representations of external agent objectives. | The non-novel external agent, Air Turbulence, changes its goal from pushing objects up to pushing objects down. |
| 8. Events | New classes, attributes, or representations of series of state changes. | When the first bird is dead, a storm occurs that affects the motion of the objects (by applying a force to the right direction). |

3. Novel Tasks in NovPhy

We created novel task templates in NovPhy by applying the above-mentioned eight novelties to the tasks of the five physical scenarios. Screenshots of the 40 task templates with novelties and their corresponding normal task templates are shown below.

(Task template screenshots for each scenario, one pair per novelty: Objects, Agents, Actions, Interactions, Relations, Environments, Goals, Events.)

  • Task templates of the single force scenario. In each pair, the left figure is the normal task template and the right figure is the corresponding task template with the novelty applied.
  • Task templates of the multiple forces scenario. In each pair, the left figure is the normal task template and the right figure is the corresponding task template with the novelty applied.
  • Task templates of the rolling scenario. In each pair, the left figure is the normal task template and the right figure is the corresponding task template with the novelty applied.
  • Task templates of the falling scenario. In each pair, the left figure is the normal task template and the right figure is the corresponding task template with the novelty applied.
  • Task templates of the sliding scenario. In each pair, the left figure is the normal task template and the right figure is the corresponding task template with the novelty applied.

4. Generating Tasks

4.1 Task Generator

We develop a Task Generator that can generate tasks for the task templates we designed for each novelty-scenario.

  1. To run the Task Generator:
    1. Go to tasks/task_generator
    2. Copy the task templates that you want to generate tasks from into the input folder (task templates can be found in tasks/task_templates)
    3. Run the Task Generator, providing the number of tasks as an argument
       python generate_tasks.py <number of tasks to generate>
    4. Generated tasks will be available in the output folder

4.2 Tasks Generated for Baseline Analysis

We generated 350 tasks from each of the 40 normal task templates and 40 novel task templates for the baseline analysis. The generated tasks can be found in tasks/generated_tasks.zip. After extracting this file, the generated tasks are organized in the following folder structure:
    generated_tasks/
        -- index of the novelty (novelty_level_i)/
            -- index of the scenario (type_j)/
                -- Levels/
                    -- task files of the novelty-scenario

The novelty indexes i, 1 to 8, correspond to the novelties objects, agents, actions, interactions, relations, environments, goals, and events respectively. The scenario indexes j, xxxx01 to xxxx05, correspond to the scenarios single force, multiple forces, rolling, falling, and sliding respectively. All the tasks of the normal task templates are in the novelty_level_0 folder under the corresponding type_j subfolders. The task templates folder (tasks/task_templates) also follows the same naming structure.
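For example, the following sketch (a hypothetical helper; the exact type_j folder names are assumptions based on the layout above) builds the path to the Levels folder of a given novelty-scenario pair:

from pathlib import Path

# Novelty and scenario orderings as described above.
NOVELTIES = ["objects", "agents", "actions", "interactions",
             "relations", "environments", "goals", "events"]
SCENARIOS = ["single force", "multiple forces", "rolling", "falling", "sliding"]

def tasks_dir(root, novelty, scenario):
    """Return the Levels/ folder of a novelty-scenario pair.

    novelty_level_0 holds the normal tasks; levels 1-8 follow the novelty
    order above. The type_<j> folder name is assumed from the tree shown
    above; adjust it if your extracted folders use a different prefix.
    """
    i = NOVELTIES.index(novelty) + 1
    j = SCENARIOS.index(scenario) + 1
    return Path(root) / f"novelty_level_{i}" / f"type_{j}" / "Levels"

# Example: list the rolling-scenario tasks of the relations novelty.
for task_file in sorted(tasks_dir("generated_tasks", "relations", "rolling").iterdir()):
    print(task_file.name)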

4.3 Creating Your Own Tasks

If you want to design your own task templates, you can use the interactive Task Template Designer tool we have provided, which is developed in Unity.

  1. To design your own task template:

    1. Open the project tasks/task_template_designer in Unity
    2. Run the application in Unity Editor and load any game level
    3. While in the game level, open the Level Editor menu by navigating to the Level Editor -> Edit Level in the top-menu of the Unity editor
    4. From the Level Editor menu you can load a game level, save the level, and add any game objects to the level
    5. Design the template by adding new game objects, adjusting their positions, and resizing them as you wish
    6. After designing the task template, save the template using the Save Level button in the Level Editor menu
  2. To generate tasks using your own task template

    1. Add necessary constraints according to your template into the tasks/task_generator/template_constraints.xlsx file
    2. Run the Task Generator using the instructions given in Section 4.1

5. Baseline Agents and the Framework

Tested environments:

  • Ubuntu: 18.04/20.04
  • Python: 3.9
  • Numpy: 1.20
  • torch: 1.8.1
  • torchvision: 0.9.1
  • lxml: 4.6.3
  • tensorboard: 2.5.0
  • Java: 13.0.2/13.0.7
  • stable-baselines3: 1.1.0

Before running agents, please:

  1. Go to sciencebirdsgames and unzip Linux.zip
  2. Go to sciencebirdslevels/generated_tasks and unzip fifth_generation.zip

5.1 How to Run Heuristic Agents

  1. Run Java heuristic agents: Datalab and Eagle Wings:

    1. Go to Utils and in terminal run
      python PrepareTestConfig.py --os [Linux/MacOS]
      
    2. Go to sciencebirdsgames/Linux, in terminal run
      java -jar game_playing_interface.jar
    3. Go to sciencebirdsagents/HeuristicAgents/ and in terminal run Datalab
      java -jar datalab_037_v4_java12.jar 1
      or Eagle Wings
      java -jar eaglewings_037_v3_java12.jar 1

Note that the integer 1 at the end controls the number of agent instances to run. You can set it to a different integer value that suits you best.

  2. Run Random Agent and Pig Shooter:
    1. Go to sciencebirdsagents/
    2. In terminal, after granting execution permission, run Random Agent
      ./TestPythonHeuristicAgent.sh RandomAgent
      or Pig Shooter
      ./TestPythonHeuristicAgent.sh PigShooter

5.2 How to Run Learning Agents

5.2.1 How to Run DQN and Deep Relational Baselines

For Symbolic Agent

  1. Go to sciencebirdsagents/Utils
  2. Open Parameters.py and set agent to be DQNDiscreteAgent, network to be DQNSymbolicDuelingFC_v2 for DQN or DQNRelationalSymbolic for Deep Relational, and state_repr_type to be "symbolic"

For Image Agent

  1. Go to sciencebirdsagents/Utils

  2. Open Parameters.py and set agent to be DQNDiscreteAgent, network to be DQNImageResNet for DQN or DQNRelationalImage for Deep Relational, and state_repr_type to be "image" (an illustrative sketch of these settings is shown at the end of this subsection)

  3. Go to sciencebirdsagents/

  4. In terminal, after granting execution permission, train the agent for within-template training

    ./TrainLearningAgent.sh within_template

    and for within-scenario training

    ./TrainLearningAgent.sh benchmark
  5. Models will be saved to sciencebirdsagents/LearningAgents/saved_model

  6. To test learning agents, go to the folder sciencebirdsagents:

    1. To test within-template performance, run
    python TestAgentOfflineWithinTemplate.py

    2. To test within-capability performance, run
    python TestAgentOfflineWithinCapability.py
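As referenced above, the relevant Parameters.py settings might look like the following minimal sketch; the exact variable names and file layout are assumptions based on the instructions in this subsection and in Section 5.2.2, so check the actual file before editing:

# Hypothetical excerpt of sciencebirdsagents/Utils/Parameters.py.
# Variable names are assumed from the instructions above.

# DQN with symbolic state:
agent = "DQNDiscreteAgent"
network = "DQNSymbolicDuelingFC_v2"    # or "DQNRelationalSymbolic" for Deep Relational
state_repr_type = "symbolic"

# DQN with image state:
# agent = "DQNDiscreteAgent"
# network = "DQNImageResNet"           # or "DQNRelationalImage" for Deep Relational
# state_repr_type = "image"

# Stable Baselines 3 agents (Section 5.2.2):
# agent = "ppo"                        # or "a2c"
# state_repr_type = "symbolic"         # or "image"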
    

5.2.2 How to Run Stable Baselines 3 Agents

For Symbolic Agent

  1. Go to sciencebirdsagents/Utils
  2. Open Parameters.py and set agent to be "ppo" or "a2c" and state_repr_type to be "symbolic"

For Image Agent

  1. Go to sciencebirdsagents/Utils

  2. Open Parameters.py and set agent to be "ppo" or "a2c" and state_repr_type to be "image"

  3. Go to sciencebirdsagents/

  4. In terminal, after granting execution permission, train the agent for within-template training

    ./TrainAndTestOpenAIStableBaselines.sh within_template

    and for within-scenario training

    ./TrainAndTestOpenAIStableBaselines.sh benchmark
  5. Models will be saved to sciencebirdsagents/OpenAIModelCheckpoints and tensorboard log will be saved to OpenAIStableBaseline

5.3 How to Develop Your Own Agent

We provide a gym-like environment. A simple demo, which can be found in demo.py, is shown below:

from SBAgent import SBAgent
from SBEnvironment.SBEnvironmentWrapper import SBEnvironmentWrapper

# use the score as the reward and run the game 50 times faster
env = SBEnvironmentWrapper(reward_type="score", speed=50)
level_list = [1, 2, 3]  # level list for the agent to play
dummy_agent = SBAgent(env=env, level_list=level_list)  # initialise the agent
dummy_agent.state_representation_type = 'image'  # state representation type: 'image' or 'symbolic'
env.make(agent=dummy_agent, start_level=dummy_agent.level_list[0],
         state_representation_type=dummy_agent.state_representation_type)  # initialise the environment

s, r, is_done, info = env.reset()  # get ready for running
for level_idx in level_list:
    is_done = False
    while not is_done:
        s, r, is_done, info = env.step([-100, -100])  # agent always shoots at (-100, -100) relative to the slingshot

    env.current_level = level_idx + 1  # move on to the next level once the current one is finished
    if env.current_level > level_list[-1]:  # end the game when all levels in the level list are played
        break
    s, r, is_done, info = env.reload_current_level()  # load the next level
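Building on the demo, your own agent could reuse the same loop and plug in its own action selection. The sketch below is illustrative only: the random release points follow the RandomAgent description in Section 5.4, and the select_action method is an assumed name, not part of the SBAgent API.

import random

from SBAgent import SBAgent
from SBEnvironment.SBEnvironmentWrapper import SBEnvironmentWrapper

class RandomShotAgent(SBAgent):
    """Illustrative agent that samples a release point relative to the slingshot."""

    def select_action(self, state):
        # Same action format as the demo: a release point relative to the slingshot.
        return [random.randint(-100, -10), random.randint(-100, 100)]

env = SBEnvironmentWrapper(reward_type="score", speed=50)
agent = RandomShotAgent(env=env, level_list=[1, 2, 3])
agent.state_representation_type = 'symbolic'
env.make(agent=agent, start_level=agent.level_list[0],
         state_representation_type=agent.state_representation_type)

s, r, is_done, info = env.reset()
for level_idx in agent.level_list:
    is_done = False
    while not is_done:
        s, r, is_done, info = env.step(agent.select_action(s))
    env.current_level = level_idx + 1
    if env.current_level > agent.level_list[-1]:
        break
    s, r, is_done, info = env.reload_current_level()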

5.4 Outline of the Agent Code

The ./sciencebirdsagents folder contains all the relevant source code of our agents. Below is the outline of the code (this is a brief description; detailed documentation is in progress):

  1. Client:
    1. agent_client.py: Includes all communication protocols.
  2. final_run: Place to store tensor board results.
  3. HeuristicAgents
    1. datalab_037_v4_java12.jar: State-of-the-art java agent for Angry Birds.
    2. eaglewings_037_v3_java12.jar: State-of-the-art java agent for Angry Birds.
    3. PigShooter.py: Python agent that shoots at the pigs only.
    4. RandomAgent.py: Random agent that chooses a shot with $x \in (-100,-10)$ and $y \in (-100,100)$.
    5. HeuristicAgentThread.py: A thread wrapper to run multi-instances of heuristic agents.
  4. LearningAgents
    1. RLNetwork: Folder includes all DQN structures that can be used as an input to DQNDiscreteAgent.py.
    2. saved_model: Place to save trained models.
    3. LearningAgent.py: Inherited from SBAgent class, a base class to implement learning agents.
    4. DQNDiscreteAgent.py: Inherited from LearningAgent, a DQN agent that has discrete action space.
    5. LearningAgentThread.py: A thread wrapper to run multi-instances of learning agents.
    6. Memory.py: A script that includes different types of memories. Currently, we have normal memory, PrioritizedReplayMemory and PrioritizedReplayMemory with balanced samples.
  5. SBEnvironment
    1. SBEnvironmentWrapper.py: A wrapper class to provide gym-like environment.
    2. SBEnvironmentWrapperOpenAI.py: A wrapper class to provide gym-like environment for OpenAI Stable Baseline 3 agents.
    3. Server.py: A wrapper class for the game server for the OpenAI Stable Baseline 3 agents.
  6. StateReader: Folder that contains files to convert symbolic state representation to inputs to the agents.
  7. Utils:
    1. Config.py: Config class used to pass parameters to agents.
    2. GenerateCapabilityName.py: Generates a list of capability names for agents to train.
    3. GenerateTemplateName.py: Generates a list of template names for agents to train.
    4. LevelSelection.py: Class that includes different strategies to select levels. For example, an agent may choose to go to the next level if it passes the current one, or only when it has played the current level for a predefined number of times.
    5. NDSparseMatrix.py: Class to store the converted symbolic representation in a sparse matrix to save memory.
    6. Parameters.py: Training/testing parameters passed to the agent.
    7. PrepareTestConfig.py: Script to generate the config file used by the game console when testing agents.
    8. trajectory_planner.py: Calculates two possible trajectories given a directly reachable target point. It returns None if the target is not reachable by the bird.
  8. demo.py: A demo to showcase how to use the framework.
  9. SBAgent.py: Base class for all agents.
  10. MultiAgentTestOnly.py: Tests Python heuristic agents by running multiple instances on one particular template.
  11. TestAgentOfflineWithinCapability.py: Uses the saved models in LearningAgents/saved_model to test an agent's within-capability performance on the test set.
  12. TestAgentOfflineWithinTemplate.py: Uses the saved models in LearningAgents/saved_model to test an agent's within-template performance on the test set.
  13. TrainLearningAgent.py: Script to train DQN baseline agents on a particular template with a defined mode.
  14. TestPythonHeuristicAgent.sh: Bash script to test a heuristic agent's performance on all templates.
  15. TrainLearningAgent.sh: Bash script to train DQN baseline agents to test both local and broad generalization.
  16. OpenAI_StableBaseline_Train.py: Python script to run OpenAI Stable Baselines 3 agents on a particular template with a defined mode.
  17. TrainAndTestOpenAIStableBaselines.sh: Bash script to run OpenAI Stable Baselines 3 agents to test both local and broad generalization.

6. Framework

6.1 The Game Environment

  1. The coordinate system
    • In the Science Birds game, the origin point (0,0) is the bottom-left corner, and the Y coordinate increases upwards.
    • Coordinates range from (0,0) to (640,480).

6.2 Symbolic Representation Data Structure

  1. Symbolic Representation data of game objects is stored in a JSON object. The JSON object describes an array where each element describes a game object. Game object categories and their properties are described below:

    • Ground: the lowest unbreakable flat support surface

      • property: id = 'object [i]'
      • property: type = 'Ground'
      • property: yindex = [the y coordinate of the ground line]
    • Platform: Unbreakable obstacles

      • property: id = 'object [i]'
      • property: type = 'Object'
      • property: vertices = [a list of ordered 2d points that represents the polygon shape of the object]
      • property: colormap = [a list of compressed 8-bit (RRRGGGBB) colour and their percentage in the object]
    • Trajectory: the dots that represent the trajectories of the birds

      • property: id = 'object [i]'
      • property: type = 'Trajectory'
      • property: location = [a list of 2d points that represents the trajectory dots]
    • Slingshot: Unbreakable slingshot for shooting the bird

      • property: id = 'object [i]'
      • property: type = 'Slingshot'
      • property: vertices = [a list of ordered 2d points that represents the polygon shape of the object]
      • property: colormap = [a list of compressed 8-bit (RRRGGGBB) colour and their percentage in the object]
    • Red Bird:

      • property: id = 'object [i]'
      • property: type = 'Object'
      • property: vertices = [a list of ordered 2d points that represents the polygon shape of the object]
      • property: colormap = [a list of compressed 8-bit (RRRGGGBB) colour and their percentage in the object]
    • all objects below have the same representation as red bird

    • Blue Bird:

    • Yellow Bird:

    • White Bird:

    • Black Bird:

    • Small Pig:

    • Medium Pig:

    • Big Pig:

    • TNT: an explosive block

    • Wood Block: Breakable wooden blocks

    • Ice Block: Breakable ice blocks

    • Stone Block: Breakable stone blocks

  2. Round objects are also represented as polygons with a list of vertices

  3. Symbolic Representation with noise

    • If a noisy Symbolic Representation is requested, the noise will be applied to each point in the vertices of the game objects, except for the ground, the birds, and the slingshot
    • The noise for 'vertices' is applied to all vertices by the same amount, within 5 pixels
    • The colour map has a noise of +/- 2%.
    • The colour in the colour map is a 24-bit RGB colour compressed into 8 bits (see the decoding sketch after this list)
      • 3 bits for Red, 3 bits for Green and 2 bits for Blue
      • the percentage of the colour that accounts for the object follows the colour value
      • example: (127, 0.5) means 50% of the pixels in the object have colour 127
    • The noise is uniformly distributed
    • We will later offer more sophisticated and adjustable noise.
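To make the representation concrete, the sketch below parses one hand-written example element and decodes its compressed 8-bit colormap entries back to approximate 24-bit RGB. The example JSON is illustrative only; the exact field layout produced by the game may differ.

import json

def decode_colour(compressed):
    """Expand an 8-bit RRRGGGBB colour into an approximate 24-bit RGB triple."""
    r = (compressed >> 5) & 0b111   # top 3 bits
    g = (compressed >> 2) & 0b111   # middle 3 bits
    b = compressed & 0b11           # bottom 2 bits
    return (r * 255 // 7, g * 255 // 7, b * 255 // 3)

# Hand-written example element following the description above (illustrative).
example = json.loads("""
{
  "id": "object [3]",
  "type": "Object",
  "vertices": [[100, 50], [120, 50], [120, 70], [100, 70]],
  "colormap": [[127, 0.5], [200, 0.5]]
}
""")

for colour, fraction in example["colormap"]:
    print(decode_colour(colour), "covers", f"{fraction:.0%}", "of the object's pixels")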

6.3 Communication Protocols

Each message is listed below with its Message ID, Request Format (byte[]), and Return Format (byte[]).

1-10 Configuration Messages

  • 1: Configure team ID / Configure running mode
    • Request: [1][ID][Mode]
      • ID: 4 bytes
      • Mode: 1 byte (COMPETITION = 0, TRAINING = 1)
    • Return: a four-byte array [round info][time limit][available levels]
      • the first byte indicates the round, the second specifies the time limit in minutes, the third specifies the number of available levels
      • Note: in training mode, the return will be [0][0][0]. As the round info is not used in training, the time limit will be 600 hours, and the number of levels needs to be requested via message ID 15
  • 2: Set simulation speed, speed ∈ [0.0, 50.0]
    • Note: this command can be sent at any time during playing to change the simulation speed
    • Request: [2][speed] (speed: 4 bytes)
    • Return: OK/ERR [1]/[0]

11-30 Query Messages

  • 11: Do Screenshot
    • Request: [11]
    • Return: [width][height][image bytes] (width, height: 4 bytes)
    • Note: this command only returns the screenshot, without the symbolic representation
  • 12: Get game state
    • Request: [12]
    • Return: one byte indicating the ordinal of the state: [0] UNKNOWN, [1] MAIN_MENU, [2] EPISODE_MENU, [3] LEVEL_SELECTION, [4] LOADING, [5] PLAYING, [6] WON, [7] LOST
  • 14: Get the current level
    • Request: [14]
    • Return: a four-byte array indicating the index of the current level: [level index]
  • 15: Get the number of levels
    • Request: [15]
    • Return: a four-byte array indicating the number of available levels: [number of levels]
  • 23: Get my score
    • Request: [23]
    • Return: a 4-byte array indicating the number of levels, followed by ([number_of_levels] * 4) bytes where every four bytes hold the best score for the corresponding level: [number_of_levels][score_level_1]...[score_level_n]
    • Note: this should be used carefully in training mode, because a large number of levels may be used in training. Instead, when the agent is in the winning state, use message ID 65 to get the score of a single level at the winning state

31-50 In-Game Action Messages

  • 31: Shoot using the Cartesian coordinates [Safe mode*]
    • Request: [31][fx][fy][dx][dy][t1][t2] (each parameter is 4 bytes)
      • fx, fy: the x and y coordinates of the focus point
      • dx: the x coordinate of the release point minus fx
      • dy: the y coordinate of the release point minus fy
      • t1: the release time
      • t2: the gap between the release time and the tap time. If t1 is set to 0, the server will execute the shot immediately
    • Return: OK/ERR [1]/[0]
  • 32: Shoot using Polar coordinates [Safe mode*]
    • Request: [32][fx][fy][theta][r][t1][t2] (each parameter is 4 bytes)
      • theta: release angle
      • r: the radial coordinate
    • Return: OK/ERR [1]/[0]
  • 33: Sequence of shots [Safe mode*]
    • Request: [33][shots length][shot message ID][Params]...[shot message ID][Params] (maximum sequence length: 16 shots)
    • Return: an array where each slot indicates a good/bad shot. Bad shots are those rejected by the server. For example, if the server received 5 shots and the third one was not executed for some reason, the server will return [1][1][0][1][1]
  • 41: Shoot using the Cartesian coordinates [Fast mode**]
    • Request: [41][fx][fy][dx][dy][t1][t2] (each parameter is 4 bytes)
    • Return: OK/ERR [1]/[0]
  • 42: Shoot using Polar coordinates [Fast mode**]
    • Request: [42][fx][fy][theta][r][t1][t2] (each parameter is 4 bytes)
    • Return: OK/ERR [1]/[0]
  • 43: Sequence of shots [Fast mode**]
    • Request: [43][shots length][shot message ID][Params]...[shot message ID][Params] (maximum sequence length: 16 shots)
    • Return: an array where each slot indicates a good/bad shot, as for message 33
  • 34: Fully Zoom Out
    • Request: [34]
    • Return: OK/ERR [1]/[0]
  • 35: Fully Zoom In
    • Request: [35]
    • Return: OK/ERR [1]/[0]

51-60 Level Selection Messages

  • 51: Load a level
    • Request: [51][Level] (Level: 4 bytes)
    • Return: OK/ERR [1]/[0]
  • 52: Restart a level
    • Request: [52]
    • Return: OK/ERR [1]/[0]

61-70 Science Birds Specific Messages

  • 61: Get Symbolic Representation With Screenshot
    • Request: [61]
    • Return: the symbolic representation and corresponding screenshot: [symbolic representation byte array length][Symbolic Representation bytes][image width][image height][image bytes] (byte array length, image width, image height: 4 bytes each)
  • 62: Get Symbolic Representation Without Screenshot
    • Request: [62]
    • Return: [symbolic representation byte array length][Symbolic Representation bytes]
  • 63: Get Noisy Symbolic Representation With Screenshot
    • Request: [63]
    • Return: the noisy symbolic representation and corresponding screenshot, in the same format as message 61
  • 64: Get Noisy Symbolic Representation Without Screenshot
    • Request: [64]
    • Return: the noisy symbolic representation, in the same format as message 62
  • 65: Get Current Level Score
    • Request: [65]
    • Return: the current score: [score] (score: 4 bytes)
    • Note: this score can be requested at any time in the Playing/Won/Lost state. This is used for agents that take the intermediate score into account during training/reasoning. To get the winning score, please make sure to execute this command when the game state is "WON"

* Safe mode: The server will wait until the state is static after making a shot.
** Fast mode: The server will send back a confirmation once a shot is made. The server will not do any check for the appearance of the won page.
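As an illustration of the protocol (a minimal sketch only: the host, port, and big-endian byte order are assumptions, and the real message handling lives in Client/agent_client.py), the following queries the game state (message ID 12) and issues a Cartesian shot in safe mode (message ID 31):

import socket
import struct

# Assumed connection details; the actual host/port are configured by the framework.
sock = socket.create_connection(("127.0.0.1", 2004))

def get_game_state():
    """Message 12: the one returned byte gives the ordinal of the game state."""
    sock.sendall(bytes([12]))
    return sock.recv(1)[0]   # e.g. 5 = PLAYING, 6 = WON, 7 = LOST

def shoot_safe(fx, fy, dx, dy, t1, t2):
    """Message 31: Cartesian shot in safe mode; each parameter is a 4-byte integer."""
    msg = bytes([31]) + struct.pack(">6i", fx, fy, dx, dy, t1, t2)  # big-endian assumed
    sock.sendall(msg)
    return sock.recv(1) == b"\x01"   # OK/ERR is returned as [1]/[0]

if get_game_state() == 5:            # PLAYING
    accepted = shoot_safe(fx=200, fy=300, dx=-100, dy=-100, t1=0, t2=500)
    print("shot accepted" if accepted else "shot rejected")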

7. Evaluation Data

The Evaluation data folder contains three zip files: adaptation_data.zip, detection_data.zip, and human_playdata.zip.

7.1 Adaptation Data

The adaptation_data.zip contains datasets for 11 agents. Each .csv file is for a single agent, with the following columns (see the example after the list).

  1. LevelIndex: The index assigned to the task
  2. Score: The score achieved in the task
  3. LevelStatus: The Pass/Fail status of the task
  4. birdsRemaining: Number of birds remaining at the end of the task
  5. pigsRemaining: Number of pigs remaining at the end of the task
  6. birdsAtStart: Number of birds at the start of the task
  7. pigsAtStart: Number of pigs at the start of the task
  8. trial: The trial the task belongs to
  9. levelName: The name of the task
  10. informed: The informed/uninformed status of the task
  11. novelty: Whether the task is basic or novel
  12. index: An index assigned to the task (negative for basic and positive for novel tasks)
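For example, a per-trial pass rate for basic and novel tasks can be computed from one of these files as follows (a sketch assuming pandas and a 'Pass' value in LevelStatus; the file name is illustrative):

import pandas as pd

# Illustrative file name; use any agent's .csv extracted from adaptation_data.zip.
df = pd.read_csv("adaptation_data/some_agent.csv")

# Pass rate per trial, split by basic vs. novel tasks.
df["passed"] = df["LevelStatus"].eq("Pass")
summary = df.groupby(["trial", "novelty"])["passed"].mean().unstack()
print(summary)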

7.2 Detection Data

The detection_data.zip contains two zip files: pre_assumed_moving_average.zip and simple_moving_average.zip. The former contains detection results calculated using a pre-assumed moving average and the latter contains detection results calculated using a simple moving average.

There are 16 .csv files under pre_assumed_moving_average.zip and 32 .csv files under simple_moving_average.zip, with variations of agents, window sizes, and detection thresholds. Each .csv file is named as "agent name - window size - detection threshold". The columns of a .csv file are as follows (see the example after the list).

  1. trial: The trial index
  2. distribution_shift: The index where distribution shift occurs
  3. detected_level: The index where agent detected novelty
  4. detection_delay: The detection delay calculated for the trial
  5. cdt: 1/0 indicating whether the agent detected the novelty
  6. novelty_type: The name of the novelty type
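Similarly, the detection rate and mean detection delay can be summarised from a detection file (again a sketch; the file name is illustrative and the column types are assumed):

import pandas as pd

# Illustrative file name: "agent name - window size - detection threshold".csv
det = pd.read_csv("detection_data/simple_moving_average/agent - 24 - 0.5.csv")

detection_rate = det["cdt"].mean()   # fraction of trials in which the novelty was detected
mean_delay = det.loc[det["cdt"] == 1, "detection_delay"].mean()
print(f"detected in {detection_rate:.0%} of trials, mean delay {mean_delay:.1f} levels")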

7.3 Human Player Data

The human player data on NovPhy is given in human_playdata.zip. This includes data collected from 48 players. Each .csv file is for a single player, and the columns are the same as those collected for an agent, with an additional column at the end, "novelty_detected", indicating whether the player detected the novelty.
