POMDP states with different state variable types: definition and management with the BasicPOMCP online solver #423
Hi, I wonder how to implement and handle a POMDP state with different state variable types using the QuickPOMDP approach. Currently, I am using the following states, actions, and observations:
With the variables defined as int-type objects:
I am trying to include multiple float-type state variables (BatteryLevel, SurfaceType, ExplorationPercentage) and observations, something like:
Thanks in advance for your help!
Replies: 1 comment 1 reply
One important conceptual thing to understand is that the word "states" is used in two ways in the literature and in discussions. Sometimes people use "states" to refer to the state dimensions, i.e. the variables needed to describe the state. For example, someone might say "the states of the pendulum system are theta and theta dot". The second meaning is the set of all possible values that the state can take, i.e. the state can be s_LANDED or s_HOVERING. In POMDPs.jl, we use the second meaning when a function or argument is called states.

In your example above, you seem to be mixing the two meanings, i.e. s_LANDED and s_HOVERING are values of one of the state variables, whereas B…
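To make the distinction concrete, here is a small sketch in plain Julia (standard library only, no POMDPs.jl packages). The variable names, the symbols for the statuses, and the coarse battery grid are illustrative assumptions, not taken from the original post.

```julia
# State variables (first meaning): the quantities that describe one state.
# These names and values are illustrative assumptions.
statuses = [:landed, :hovering]
battery_levels = 0.0:0.25:1.0   # coarse discretization, only for illustration

# State space (second meaning): every combination of values the state can take.
state_space = vec(collect(Iterators.product(statuses, battery_levels)))

# A single state is one element of that set: a tuple with one value per variable.
s = first(state_space)

println(length(state_space))
println(s)
```

In this reading, an argument called states expects something like state_space (the whole set), while s is just one state drawn from it.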
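Yes, you can now use the product function from CommonRLSpaces to combine spaces of different types:

using QuickPOMDPs     # provides QuickPOMDP
using POMDPTools      # provides Deterministic and ImplicitDistribution
using CommonRLSpaces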
statuses = [1, 2, 3, 4, 5]
battery_levels = 0..1 # this represents the continuous interval from 0 to 1
state_space = product(statuses, battery_levels)
m = QuickPOMDP(
states = state_space,
# Other arguments
)

To see what an element of your new space looks like, you can draw a sample from it (rand(state_space) should work, assuming the space supports sampling). Whenever you represent states now, they will be tuples containing an integer and a float, so, for example, you might have:

initialstate = Deterministic((1, 1.0))
transition = function(s, a)
ImplicitDistribution() do rng
if a == 2 # say action 2 is hovering
return (2, s[2] - 0.05 - 0.01*rand(rng)) # s[2] is the current battery. For hovering, one step costs 5% battery plus a small random amount
else
# other cases
end
end
end
observation = function(a, sp)
ImplicitDistribution() do rng
return (sp[1], sp[2] + 0.01*randn(rng))
end
end

Note, however, that POMCP will not perform well in continuous observation spaces, as noted here: https://arxiv.org/abs/1709.06196. ARDESPOT is better, but still not optimal. POMCPOW requires an explicit observation function, so you will have to define your own distribution since we haven't implemented a …
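As a rough idea of what a hand-written observation distribution could look like, here is a minimal sketch using only the Julia standard library. The type name StatusBatteryObs, the function name obs_pdf, and the Gaussian noise with standard deviation 0.01 are assumptions made for illustration. It mirrors the implicit observation model above: the status is observed exactly and the battery reading is noisy.

```julia
using Random

# Hypothetical distribution over observations (status, noisy battery reading).
struct StatusBatteryObs
    status::Int        # observed exactly
    battery::Float64   # true battery level of the next state
    sigma::Float64     # standard deviation of the sensor noise (assumed Gaussian)
end

# Sampling: exact status, battery level plus Gaussian noise.
Base.rand(rng::AbstractRNG, d::StatusBatteryObs) =
    (d.status, d.battery + d.sigma * randn(rng))

# Density: zero if the status does not match, otherwise the Gaussian
# density of the battery reading around the true level.
function obs_pdf(d::StatusBatteryObs, o)
    o[1] == d.status || return 0.0
    z = (o[2] - d.battery) / d.sigma
    return exp(-0.5 * z^2) / (d.sigma * sqrt(2π))
end

d = StatusBatteryObs(2, 0.8, 0.01)
o = rand(MersenneTwister(1), d)   # one sampled observation tuple
```

In a QuickPOMDP, the observation function would then presumably return something like StatusBatteryObs(sp[1], sp[2], 0.01) instead of an ImplicitDistribution. The density is a local function here so the sketch runs on its own; in a real model it would have to be exposed through whatever pdf function the solver calls, so check the POMDPs.jl documentation for the exact interface.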