Imitation Learning
Here we show an example of using VISTA to learn a neural-network-based policy with the most basic variant of imitation learning, i.e., behavior cloning. Since behavior cloning does not require sensor data synthesis, we show how to use VISTA's interface to extract the passive dataset upon which the data-driven simulation is built. The high-level idea is to construct a world containing an ego-agent in VISTA and step the agent through the passive dataset with the recorded human control commands, skipping sensor data synthesis to avoid redundant computation.
First, we initialize a VISTA world with an agent that has a sensor attached (we use an RGB camera here for illustration). Note that it is often useful to implement an additional data sampler, since a balanced training data distribution is of great significance for supervised learning; a sketch of one possible sampler follows the snippet below.
self._world = vista.World(self.trace_paths, self.trace_config)
self._agent = self._world.spawn_agent(self.car_config)
self._camera = self._agent.spawn_camera(self.camera_config)
self._world.reset()
self._sampler = RejectionSampler() # data sampler
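RejectionSampler is defined in the example code rather than in VISTA's core API. As a rough sketch of what such a sampler could look like (the histogram binning, value range, and smoothing below are illustrative assumptions), one can track the empirical distribution of observed labels and accept over-represented values less often:

import numpy as np

class RejectionSampler:
    # Hypothetical sketch: keep a histogram of seen label values and accept new
    # samples with probability inversely proportional to their bin frequency,
    # which flattens the training label distribution over time.
    def __init__(self, n_bins=20, value_range=(-0.3, 0.3), smoothing=1e-3):
        self._bin_edges = np.linspace(value_range[0], value_range[1], n_bins + 1)
        self._counts = np.zeros(n_bins)
        self._smoothing = smoothing

    def add_to_history(self, val):
        idx = np.clip(np.digitize(val, self._bin_edges) - 1, 0, len(self._counts) - 1)
        self._counts[idx] += 1

    def get_sampling_probability(self, val):
        idx = np.clip(np.digitize(val, self._bin_edges) - 1, 0, len(self._counts) - 1)
        density = (self._counts + self._smoothing) / \
                  (self._counts.sum() + self._smoothing * len(self._counts))
        # rare bins are accepted with probability close to 1, frequent bins less often
        return float(np.min(density) / density[idx])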
Then, we can implement a data generator that runs indefinitely to produce a training dataset.
# Data generator from simulation
self._snippet_i = 0
while True:
    # reset simulator
    if self._agent.done or self._snippet_i >= self.snippet_size:
        self._world.reset()
        self._snippet_i = 0

    # step simulator
    sensor_name = self._camera.name
    img = self._agent.observations[sensor_name]  # associate action t with observation t-1
    self._agent.step_dataset(step_dynamics=False)

    # rejection sampling
    val = self._agent.human_curvature
    sampling_prob = self._sampler.get_sampling_probability(val)
    if self._rng.uniform(0., 1.) > sampling_prob:
        self._snippet_i += 1
        continue
    self._sampler.add_to_history(val)

    # preprocess and produce data-label pairs
    img = transform_rgb(img, self._camera, self.train)
    label = np.array([self._agent.human_curvature]).astype(np.float32)

    self._snippet_i += 1
    yield {'camera': img, 'target': label}
The implementation is straightforward. After resetting the simulator (the pointer into the passive dataset is randomly initialized), we step through the dataset to fetch the next frame by calling agent.step_dataset, followed by rejection sampling to balance the steering control command (human_curvature). Finally, we preprocess the sensor data with the transform_rgb helper and construct data-label pairs for training; a rough sketch of such preprocessing is shown below.
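transform_rgb is a helper from the example code (see the file referenced at the end of this section) rather than part of the VISTA package; its exact behavior is defined there. A rough, hypothetical sketch of such preprocessing, cropping to a region of interest when the sensor exposes one, then resizing and scaling pixel values, could be:

import cv2
import numpy as np

def transform_rgb(img, camera, train, target_size=(200, 320)):
    # Hypothetical preprocessing sketch: crop to the camera's region of
    # interest if the sensor exposes one, resize, and scale pixels to [0, 1].
    # Training-time augmentation (e.g., color jitter) could be added when train is True.
    cam_param = getattr(camera, 'camera_param', None)
    if cam_param is not None and hasattr(cam_param, 'get_roi'):
        i1, j1, i2, j2 = cam_param.get_roi()
        img = img[i1:i2, j1:j2]
    img = cv2.resize(img, (target_size[1], target_size[0]))  # cv2 expects (width, height)
    return img.astype(np.float32) / 255.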
Note that we usually set a maximum snippet size so that each snippet, a series of data from the start (a reset) to the termination (the next reset), does not last indefinitely and the training data has sufficient diversity. Also, to better approximate the i.i.d. data distribution assumed by stochastic gradient descent, the data stream (yield {'camera': img, 'target': label}) is connected to a buffer with shuffling; a minimal sketch of such a buffer follows.
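The shuffling buffer is likewise implemented in the example file; as a minimal standalone sketch (buffer size and seeding are arbitrary assumptions), one can wrap the generator so that consumers draw samples in a pseudo-random order:

import random

def shuffled(stream, buffer_size=1000, seed=0):
    # Fill a buffer from the (temporally ordered) simulation stream, then
    # repeatedly yield a random element and replace it with the next incoming sample.
    rng = random.Random(seed)
    buffer = []
    for sample in stream:
        if len(buffer) < buffer_size:
            buffer.append(sample)
            continue
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = sample
    rng.shuffle(buffer)
    yield from buffer

Drawing training batches from shuffled(generator) rather than from the raw stream reduces the temporal correlation between consecutive samples.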
For more details, please check examples/advanced_usage/il_rgb_dataset.py.