Tasks
This module demonstrates how to build environments for various tasks with Vista. The environments roughly follow the OpenAI Gym interface for reinforcement learning, with member functions such as reset and step, and return values such as observation, reward, done, and info.
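For instance, a typical interaction loop looks roughly like the sketch below, where env is an already-constructed task environment and sample_action is a hypothetical user-defined helper that returns an action for each agent:

>>> observation = env.reset()
>>> done = {agent_id: False for agent_id in observation}
>>> while not all(done.values()):
...     actions = {agent_id: sample_action(agent_id) for agent_id in observation}
...     observation, reward, done, info = env.step(actions)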
- class vista.tasks.lane_following.LaneFollowing(trace_paths: List[str], trace_config: Dict, car_config: Dict, sensors_configs: Optional[List[Dict]] = [], task_config: Optional[Dict] = {}, logging_level: Optional[str] = 'WARNING')

  This class defines a simple lane-following task in Vista. It handles the vehicle state update of the ego car, rendering of the specified sensors, and determination of the reward and terminal condition. The default terminal condition is triggered by (1) going out of the lane, (2) exceeding the maximal rotation, or (3) reaching the end of the trace.
  - Parameters
    trace_paths (List[str]) – A list of trace paths.
    trace_config (Dict) – Configuration of the trace.
    car_config (Dict) – Configuration of the ego car.
    sensors_configs (List[Dict]) – Configuration of the sensors on the ego car.
    task_config (Dict) – Configuration of the task, which specifies the reward function and terminal condition. For more details, please check the source code.
    logging_level (str) – Logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL); defaults to WARNING.
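Putting the constructor arguments above together, a construction sketch might look like the following; the trace path and all keys inside trace_config, car_config, and the camera sensor config are illustrative placeholders, not authoritative defaults:

>>> from vista.tasks.lane_following import LaneFollowing
>>> env = LaneFollowing(
...     trace_paths=['/path/to/trace'],                # placeholder path
...     trace_config={'road_width': 4},                # placeholder keys/values
...     car_config={'length': 5., 'width': 2.,
...                 'wheel_base': 2.78, 'steering_ratio': 14.7},
...     sensors_configs=[{'type': 'camera', 'name': 'camera_front',
...                       'size': (200, 320)}],
...     task_config={},
...     logging_level='WARNING')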
- reset()

  Reset the environment. This resets the World object in Vista, which resets all (here, the only) agents in the world.

  - Returns
    A dictionary with agent IDs as keys and per-agent observations as values; each observation is itself a dictionary with sensor IDs as keys and sensory measurements as values.
  - Return type
    Dict
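The nested return structure can be traversed as in this small sketch (agent and sensor IDs are whatever the environment assigns):

>>> observation = env.reset()
>>> for agent_id, agent_obs in observation.items():
...     for sensor_id, measurement in agent_obs.items():
...         print(agent_id, sensor_id, getattr(measurement, 'shape', type(measurement)))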
- step(action, dt=0.03333333333333333)

  Step the environment. This involves updating the agent's state based on the given action and determining the reward and termination.

  - Parameters
    action (Dict[str, np.ndarray]) – A dictionary with agent IDs as keys and actions to be executed as values, used to interact with the environment and other agents.
    dt (float) – Elapsed time in seconds; defaults to 1/30.
  - Returns
    A tuple (dict_a, dict_b, dict_c, dict_d), where dict_a is the observation, dict_b is the reward, dict_c is whether the episode terminates, and dict_d is additional information for each agent; the keys of every dictionary are agent IDs.
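A single step and the unpacking of its return tuple then look roughly like this; the zero action array is only a placeholder, since the actual action layout depends on the task:

>>> import numpy as np
>>> actions = {agent_id: np.zeros(2) for agent_id in observation}  # placeholder action
>>> observation, reward, done, info = env.step(actions, dt=1 / 30.)
>>> ego_id = list(observation.keys())[0]
>>> print(reward[ego_id], done[ego_id], info[ego_id])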
- property config

  Configuration of this task.

- property world

  World of this task.

- property seed

  Random seed for the task and the associated World.
- class vista.tasks.multi_agent_base.MultiAgentBase(trace_paths: List[str], trace_config: Dict, car_configs: List[Dict], sensors_configs: List[List[Dict]], task_config: Optional[Dict] = {}, logging_level: Optional[str] = 'WARNING')

  This class builds a simple environment with multiple cars in the scene. It randomly initializes ado cars in front of the ego car, checks collisions between cars, handles meshes for all virtual agents, and determines the terminal condition.
  - Parameters
    trace_paths (List[str]) – A list of trace paths.
    trace_config (Dict) – Configuration of the trace.
    car_configs (List[Dict]) – Configuration of every car.
    sensors_configs (List[List[Dict]]) – Configuration of the sensors on every car.
    task_config (Dict) – Configuration of the task. An example (the default) is,

    >>> DEFAULT_CONFIG = {
    ...     'n_agents': 1,
    ...     'mesh_dir': None,
    ...     'overlap_threshold': 0.05,
    ...     'max_resample_tries': 10,
    ...     'init_dist_range': [5., 10.],
    ...     'init_lat_noise_range': [-1., 1.],
    ...     'init_yaw_noise_range': [-0.0, 0.0],
    ...     'reward_fn': default_reward_fn,
    ...     'terminal_condition': default_terminal_condition
    ... }

    Note that both reward_fn and terminal_condition have the function signature f(task, agent_id, **kwargs) -> (value, dict). For more details, please check the source code. A construction sketch with a custom reward function is shown after this parameter list.
    logging_level (str) – Logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL); defaults to WARNING.
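As referenced above, a construction sketch with a custom reward function following the documented f(task, agent_id, **kwargs) -> (value, dict) signature might look like the following; the paths and config keys are illustrative placeholders:

>>> from vista.tasks.multi_agent_base import MultiAgentBase
>>> def my_reward_fn(task, agent_id, **kwargs):
...     return 1.0, {}  # placeholder: +1 for every surviving step
>>> env = MultiAgentBase(
...     trace_paths=['/path/to/trace'],                    # placeholder path
...     trace_config={'road_width': 4},                    # placeholder keys/values
...     car_configs=[{'length': 5., 'width': 2.}] * 2,     # ego + one ado car
...     sensors_configs=[[{'type': 'camera', 'name': 'camera_front'}], []],
...     task_config={'n_agents': 2,
...                  'mesh_dir': '/path/to/meshes',
...                  'reward_fn': my_reward_fn})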
- reset() → Dict

  Reset the environment. This involves the regular world reset, randomly initializing ado agents in front of the ego agent, and resetting the mesh library for all virtual agents.

  - Returns
    A dictionary with agent IDs as keys and per-agent observations as values; each observation is itself a dictionary with sensor IDs as keys and sensory measurements as values.
  - Return type
    Dict
- step(actions, dt=0.03333333333333333)

  Step the environment. This includes updating the agents' states, synthesizing the agents' observations, checking terminal conditions, and computing rewards.

  - Parameters
    actions (Dict[str, np.ndarray]) – A dictionary with agent IDs as keys and actions to be executed as values, used to interact with the environment and other agents.
    dt (float) – Elapsed time in seconds; defaults to 1/30.
  - Returns
    A tuple (dict_a, dict_b, dict_c, dict_d), where dict_a is the observation, dict_b is the reward, dict_c is whether the episode terminates, and dict_d is additional information for each agent; the keys of every dictionary are agent IDs.
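A multi-agent rollout mirrors the single-agent loop, with one action per agent; the zero actions are again placeholders:

>>> import numpy as np
>>> observation = env.reset()
>>> done = {agent_id: False for agent_id in observation}
>>> while not any(done.values()):
...     actions = {agent_id: np.zeros(2) for agent_id in observation}
...     observation, reward, done, info = env.step(actions, dt=1 / 30.)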
- property config

  Configuration of this task.

- property ego_agent

  Ego agent.

- property world

  World of this task.

- property seed

  Random seed for the task and the associated World.