Markov decision process
Revision as of 22:57, 1 January 2021
Introduction
A Markov decision process (MDP) is a mathematical framework for modeling decision-making problems in which the outcomes are partly random and partly under the control of the decision maker.
Terminology
Agent: the entity that we train to make correct decisions (for example, we teach a robot how to move around the house without crashing).
Environment: the surroundings with which the agent interacts (for example, a house). The agent cannot manipulate its surroundings; it can only control its own actions (a robot cannot move a table in the house, but it can walk around it to avoid crashing).
State: the current situation of the agent (the robot can be in a particular room of the house or in a particular posture; what counts as a state depends on how the problem is modeled).
Action: the choice that the agent makes at the current step (move left, move right, stand up, bend over, etc.). The set of possible actions is known in advance.
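The four terms can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and does not come from the article: the house is three rooms in a row (`ROOMS`), the robot has only two actions (`Action.LEFT`, `Action.RIGHT`), and `step` is a hypothetical name for the environment's response to an action.

```python
from enum import Enum


class Action(Enum):
    """Actions available to the agent; the full set is known in advance."""
    LEFT = -1
    RIGHT = 1


# Hypothetical environment: three rooms in a row. The state is the index
# of the room the robot is currently in.
ROOMS = ["kitchen", "hallway", "living room"]


def step(state: int, action: Action) -> int:
    """Environment response: return the room index reached after an action.

    The agent cannot change the layout of the house; walking into the wall
    at either end leaves it in the same room.
    """
    new_state = state + action.value
    return min(max(new_state, 0), len(ROOMS) - 1)


# The agent starts in the kitchen and applies a fixed sequence of actions.
state = 0
for action in [Action.RIGHT, Action.RIGHT, Action.RIGHT, Action.LEFT]:
    state = step(state, action)

final_room = ROOMS[state]
print(final_room)
```

Here the agent is the loop that chooses actions, the environment is the `step` function together with `ROOMS`, the state is the integer `state`, and the actions are the members of `Action`.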
Characteristics
Markov Property
The Markov property says that the current state of the agent (for example, a robot) depends solely on the previous state and does not depend in any way on the states the agent was in before the previous state.
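One common way to write this formally, assuming the states visited up to time t are denoted S_1, ..., S_t (notation introduced here for illustration), is that conditioning on the whole history gives the same distribution over the next state as conditioning on the most recent state alone:

<math>P[S_{t+1} \mid S_t] = P[S_{t+1} \mid S_1, \ldots, S_t]</math>

In the robot example, this would mean that knowing which rooms the robot visited earlier adds nothing to predicting the next room once the current room is known.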
Markov Process/Markov Chain
<math>P_{ss'} = P[S_{t+1} = s' \mid S_t = s]</math>
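The quantity on the left is the probability of moving from state s to state s' in one step. A minimal sketch of how such transition probabilities can drive a simulation is given below. The three rooms, the numbers in the table `P`, and the function names `next_state` and `simulate` are all hypothetical and chosen only for illustration.

```python
import random

# Hypothetical transition probabilities: P[s][s2] is the probability of
# moving from state s to state s2 in one step. Each row sums to 1.
P = {
    "kitchen": {"kitchen": 0.5, "hallway": 0.5, "living room": 0.0},
    "hallway": {"kitchen": 0.25, "hallway": 0.5, "living room": 0.25},
    "living room": {"kitchen": 0.0, "hallway": 0.5, "living room": 0.5},
}


def next_state(state, rng):
    """Sample the next state using only the current state (Markov property)."""
    successors = list(P[state].keys())
    weights = list(P[state].values())
    return rng.choices(successors, weights=weights, k=1)[0]


def simulate(start, steps, rng):
    """Return the list of visited states, starting from `start`."""
    path = [start]
    for _ in range(steps):
        path.append(next_state(path[-1], rng))
    return path


rng = random.Random(0)  # fixed seed so that repeated runs give the same path
path = simulate("kitchen", 10, rng)
print(path)
```

Note that `next_state` receives only the current state and never the earlier part of `path`, which is what makes the simulated sequence a Markov chain.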