Sensors

class vista.entities.sensors.BaseSensor.BaseSensor(attach_to: vista.entities.Entity.Entity, config: Optional[Dict] = None)[source]

Base class of all sensors.

Parameters
  • attach_to (Entity) – A car to be attached to.

  • config (dict) – Configuration of the sensor.

capture(timestamp: float, **kwargs) → Any[source]

Run sensor synthesis based on the current timestamp and the transformation between the novel viewpoint to be simulated and the nominal viewpoint from the pre-collected dataset.

Parameters

timestamp (float) – Timestamp used to retrieve a pointer into the dataset for data-driven simulation (e.g., synthesizing a point cloud from a real LiDAR sweep).

update_scene_object(name: str, scene_object: Any, pose: Any) → None[source]

Update an object in the scene for rendering. This is only used when virtual objects are placed in the scene.

Parameters
  • name (str) – Name of the scene object.

  • scene_object (Any) – The scene object.

  • pose (Any) – The pose of the scene object.

property name

The name of the sensor.

property id

The identifier of this entity.

property parent

The parent of this entity.
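
As an illustration of the interface only (not part of the library), a custom sensor could subclass BaseSensor roughly as follows; the GPS class, its noise_std config key, and the placeholder capture logic are all hypothetical.

    # Hypothetical sketch of a custom sensor built on BaseSensor; the GPS class
    # and its 'noise_std' config key are illustrative, not part of VISTA.
    from typing import Any, Dict, Optional

    import numpy as np

    from vista.entities.sensors.BaseSensor import BaseSensor


    class GPS(BaseSensor):
        def __init__(self, attach_to, config: Optional[Dict] = None):
            super().__init__(attach_to, config)
            self._noise_std = (config or {}).get('noise_std', 0.0)

        def capture(self, timestamp: float, **kwargs) -> Any:
            # A real implementation would look up the attached car's pose at
            # `timestamp`; here we only return placeholder noisy coordinates.
            position = np.zeros(2)
            return position + np.random.randn(2) * self._noise_std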

class vista.entities.sensors.MeshLib.MeshLib(root_dirs: List[str])[source]

Handle all meshes of actors (as opposed to the scene/background itself) in the scene. It reads in all meshes with the .obj extension, calibrates them so that they are centered at the origin, and converts them to pyrender.Mesh for later use. Note that this class is written specifically for one set of meshes (carpack01); for a custom set of meshes, you would need to change how meshes are read from the root directory and how they are calibrated.

Parameters

root_dirs (List[str]) – A list of root directories that contains meshes.

Raises

AttributeError – If there is no mesh in the directory.

In the source code, there is a main function that uses this class independently to perform rendering on the meshes with pyrender.
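
A minimal usage sketch, assuming a local directory of carpack01-style .obj meshes (the path below is hypothetical):

    from vista.entities.sensors.MeshLib import MeshLib

    # Hypothetical path to a directory of carpack01-style .obj meshes.
    mesh_lib = MeshLib(['./carpack01'])

    # Sample meshes for two virtual agents and inspect them.
    mesh_lib.reset(n_agents=2, random=True)
    print(mesh_lib.n_tmeshes)          # number of meshes in the library
    print(mesh_lib.agents_meshes_dim)  # (width, length) of each agent's mesh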

reset(n_agents: int, random: Optional[bool] = True) → None[source]

Reset agents' meshes by sampling n_agents meshes from the mesh library.

Parameters
  • n_agents (int) – Number of agents.

  • random (bool) – Whether to randomly sample n_agents meshes from the entire mesh library; default is set to True.

property fpaths

Paths to all meshes.

property tmeshes

A list of trimesh objects.

property n_tmeshes

Number of trimeshes.

property agents_meshes

A list of meshes for all agents.

property agents_meshes_dim

The dimensions (width, length) of agents’ meshes.

RGB Camera

class vista.entities.sensors.Camera.Camera(attach_to: vista.entities.Entity.Entity, config: Dict)[source]

An RGB camera sensor object that synthesizes an RGB image locally around the dataset given a viewpoint (potentially different from the dataset) and a timestamp.

Parameters
  • attach_to (Entity) – A parent object (Car) to be attached to.

  • config (Dict) –

    Configuration of the sensor. An example (default) is,

    >>> DEFAULT_CONFIG = {
        'depth_mode': 'FIXED_PLANE',
        'znear': ZNEAR,
        'zfar': ZFAR,
        'use_lighting': False,
        'directional_light_intensity': 10,
        'recoloring_factor': 0.5,
        'use_synthesizer': True,
    }
    

    Check the ViewSynthesis object for more details about the configuration.
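
A hedged usage sketch; trace and car setup are omitted, and in practice the sensor is usually created through the Car/World API rather than constructed directly:

    # Sketch only: `car` is assumed to be an already-constructed Car entity with
    # a loaded trace, and `timestamp` a valid dataset timestamp. The config keys
    # mirror DEFAULT_CONFIG above.
    from vista.entities.sensors.Camera import Camera

    camera_config = {
        'depth_mode': 'FIXED_PLANE',
        'use_lighting': False,
        'use_synthesizer': True,
    }

    camera = Camera(attach_to=car, config=camera_config)
    camera.reset()
    frame = camera.capture(timestamp)  # np.ndarray RGB image at the novel viewpoint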

reset() → None[source]

Reset the RGB camera sensor by initiating the RGB data stream based on the current reference pointer to the dataset.

capture(timestamp: float, **kwargs) → numpy.ndarray[source]

Synthesize an RGB image based on the current timestamp and the transformation between the novel viewpoint to be simulated and the nominal viewpoint from the pre-collected dataset. Note that if optical flow data exists in the trace directory, the Camera object uses the optical flow to interpolate across frames to the exact timestamp, as opposed to retrieving the RGB frame with the closest timestamp in the dataset.

Parameters

timestamp (float) – Timestamp used to retrieve a pointer into the dataset for data-driven simulation (e.g., synthesizing an RGB image from a real RGB video).

Returns

A synthesized RGB image.

Return type

np.ndarray

update_scene_object(name: str, scene_object: pyrender.mesh.Mesh, pose: Any) → None[source]

Update pyrender mesh object in the scene for rendering.

Parameters
  • name (str) – Name of the scene object.

  • scene_object (pyrender.Mesh) – The scene object.

  • pose (Any) – The pose of the scene object.

property config

Configuration of the RGB camera sensor.

property camera_param

Camera parameters of the virtual camera.

property streams

Data stream of RGB image/video dataset to be simulated from.

property flow_streams

Data stream of optical flow (if any).

property flow_meta

Meta data of optical flow (if any).

property view_synthesis

View synthesizer object.

property id

The identifier of this entity.

property name

The name of the sensor.

property parent

The parent of this entity.

class vista.entities.sensors.camera_utils.ViewSynthesis.ViewSynthesis(camera_param: vista.entities.sensors.camera_utils.CameraParams.CameraParams, config: Dict, init_with_bg_mesh: Optional[bool] = True)[source]

An RGB synthesizer that simulates an RGB image at a novel viewpoint around a pre-collected RGB image/video dataset. Conceptually, it (1) projects a reference 2D RGB image to a 3D colored mesh using camera projection matrices and (approximated) depth, (2) places virtual objects in the scene, and (3) renders an RGB image at the novel viewpoint.

Parameters
  • camera_param (CameraParams) – Camera parameter object of the virtual camera.

  • config (Dict) – Configuration of the synthesizer.

  • init_with_bg_mesh (bool) – Whether to initialize with background mesh; default is set to True.

synthesize(trans: numpy.ndarray, rot: numpy.ndarray, imgs: Dict[str, numpy.ndarray], depth: Optional[Dict[str, numpy.ndarray]] = None) → Tuple[numpy.ndarray, numpy.ndarray][source]

Synthesize RGB image at the novel viewpoint specified by trans and rot with respect to the nominal viewpoint that corresponds to a set of RGB images and depth maps.

Parameters
  • trans (np.ndarray) – Translation vector.

  • rot (np.ndarray) – Rotation vector in Euler angle.

  • imgs (Dict[str, np.ndarray]) – A set of images (potentially from multiple cameras).

  • depth (Dict[str, np.ndarray]) – A set of depth maps corresponding to imgs.

Returns

Returns a tuple (array_1, array_2), where array_1 is the synthesized RGB image and array_2 is the corresponding depth image.
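
A minimal calling sketch; `vs` is assumed to be an existing ViewSynthesis instance, `frame` a precomputed H x W x 3 RGB frame, and the camera name key 'camera_front' is an assumption about the rig:

    import numpy as np

    # Sketch: render the scene from a slightly shifted novel viewpoint.
    trans = np.array([0.0, -0.5, 0.0])   # small translation of the novel viewpoint
    rot = np.array([0.0, 0.0, 0.05])     # small rotation as Euler angles
    rgb, depth = vs.synthesize(trans, rot, imgs={'camera_front': frame})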

update_object_node(name: str, mesh: pyrender.mesh.Mesh, trans: numpy.ndarray, quat: numpy.ndarray) → None[source]

Update the virtual object in the scene.

Parameters
  • name (str) – Name of the virtual object.

  • mesh (pyrender.Mesh) – Mesh of the virtual object.

  • trans (np.ndarray) – Translation vector.

  • quat (np.ndarray) – Quaternion vector.

add_bg_mesh(camera_param: vista.entities.sensors.camera_utils.CameraParams.CameraParams) → None[source]

Add a background mesh to the scene based on the camera projection and the initial depth. The color of the mesh is updated at every synthesize call, and if the ground-plane depth approximation is not used, the geometry of the mesh is also updated.

Parameters

camera_param (CameraParams) – Camera parameter of the virtual camera.

property bg_mesh_names

Names of all background meshes in the scene.

property object_nodes

Pyrender nodes of all virtual objects added to the scene.

property config

Configuration of the view synthesizer.

class vista.entities.sensors.camera_utils.CameraParams.CameraParams(rig_path: str = None, name: str = None, params: dict = None)[source]

The CameraParams object stores information pertaining to a single physical camera mounted on the car. It is useful for encapsulating the relevant calibration information for easy access in other modules.

Parameters
  • rig_path (str) – Path to RIG.xml that specifies camera parameters.

  • name (str) – Name of the camera identifier to initialize. Must be a valid TopicName present inside the RIG.xml file. Can also be None to automatically grab the first named camera in the RIG.xml file.

  • params (dict) – Dictionary of camera parameters to instantiate with. If not provided, rig_path is used.

Raises

ValueError – If name is provided but not found in the rig file.
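
A hedged usage sketch; the rig path and camera name below are hypothetical and depend on your trace directory:

    from vista.entities.sensors.camera_utils.CameraParams import CameraParams

    # Hypothetical rig file and camera name; both depend on the trace.
    cam = CameraParams(rig_path='./trace/RIG.xml', name='camera_front')

    cam.resize(height=600, width=960)   # rescale intrinsics to the working size
    K = cam.get_K()                     # (3, 3) intrinsic matrix
    plane = cam.get_ground_plane()      # [A, B, C, D] ground-plane parameters
    print(cam.get_height(), cam.get_width())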

resize(height: int, width: int) → None[source]

Scales the camera object and adjusts the internal parameters so that it projects images of the given size.

Parameters
  • height (int) – New height of the camera images in pixels.

  • width (int) – New width of the camera images in pixels.

crop(i1: int, j1: int, i2: int, j2: int) → None[source]

Crops a camera object to a given region of interest specified by the coordinates of the top-left (i1, j1) and bottom-right (i2, j2) corners.

Parameters
  • i1 (int) – Top row of ROI.

  • j1 (int) – Left column of ROI.

  • i2 (int) – Bottom row of ROI.

  • j2 (int) – Right column of ROI.

get_height() → int[source]

Get the raw pixel height of images captured by the camera.

Returns

Height in pixels.

Return type

int

get_width() → int[source]

Get the raw pixel width of images captured by the camera.

Returns

Width in pixels.

Return type

int

get_K() → numpy.ndarray[source]

Get intrinsic calibration matrix.

Returns

Intrinsic matrix (3,3).

Return type

np.array

get_K_inv() → numpy.ndarray[source]

Get inverse intrinsic calibration matrix.

Returns

Inverse intrinsic matrix (3,3).

Return type

np.array

get_distortion() → numpy.ndarray[source]

Get the distortion coefficients of the camera.

Returns

Distortion coefficients (-1,).

Return type

np.array

get_position() → numpy.ndarray[source]

Get the 3D position of the camera.

Returns

3D position of the camera.

Return type

np.array

get_quaternion() → numpy.ndarray[source]

Get the rotation of the camera as a quaternion.

Returns

Rotation of the camera as a quaternion.

Return type

np.array

get_yaw() → float[source]

Get the yaw of the camera relative to the frame of reference.

Returns

Yaw of the camera [rads].

Return type

float

get_ground_plane() → List[float][source]

Get the equation of the ground plane.

The equation of the ground plane is given by: Ax + By + Cz = D and is computed from the position and orientation of the camera.

Returns

Parameterization of the ground plane: [A,B,C,D].

Return type

List[float]
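
As a small illustration (not part of the API), the returned parameterization can be evaluated directly; `cam` below is an existing CameraParams instance and the point is hypothetical:

    import numpy as np

    # A point p = (x, y, z) lies on the plane when A*x + B*y + C*z = D, so the
    # residual below is (approximately) zero for points on the ground plane.
    A, B, C, D = cam.get_ground_plane()
    point = np.array([1.0, 1.5, 10.0])      # hypothetical 3D point
    residual = np.dot([A, B, C], point) - D
    print(abs(residual) < 1e-6)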

get_roi(axis: Optional[str] = 'ij') → List[int][source]

Get the region of interest of the images captured by the camera.

Parameters
  • axis (str) – Axis order to return the coordinates in; default is 'ij', can also be 'xy'.

Returns

Coordinates of the ROI box.

Return type

List[int]

Raises

ValueError – If axis is not valid.

get_roi_angle() → float[source]

Get the angle of the region of interest.

Returns

The rotation of the ROI box.

Return type

float

get_roi_points() → List[source]

Get the points of the region of interest.

Returns

The list of points surrounding the ROI box.

Return type

List

get_roi_dims() → Tuple[source]

Get the dimensions of the region of interest.

Returns

Height and width of ROI.

Return type

Tuple

3D LiDAR

class vista.entities.sensors.Lidar.Lidar(attach_to: vista.entities.Entity.Entity, config: Dict)[source]

A LiDAR sensor object that synthesizes LiDAR measurements locally around the dataset given a viewpoint (potentially different from the dataset) and a timestamp.

Parameters
  • attach_to (Entity) – A car to be attached to.

  • config (dict) –

    Configuration of LiDAR sensor. An example (default) is,

    >>> DEFAULT_CONFIG = {
        'name': 'lidar_3d',
        'yaw_fov': None,
        'pitch_fov': None,
        'culling_r': 1,
        'use_synthesizer': True,
    }
    

    Check the LidarSynthesis object for more details about the configuration.
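
A hedged usage sketch, analogous to the Camera example above; trace and car setup are omitted:

    # Sketch only: `car` is assumed to be an already-constructed Car entity with
    # a loaded trace, and `timestamp` a valid dataset timestamp. The config keys
    # mirror DEFAULT_CONFIG above.
    from vista.entities.sensors.Lidar import Lidar

    lidar_config = {
        'name': 'lidar_3d',
        'yaw_fov': None,
        'pitch_fov': None,
        'use_synthesizer': True,
    }

    lidar = Lidar(attach_to=car, config=lidar_config)
    lidar.reset()
    pcd = lidar.capture(timestamp)  # synthesized point cloud at the novel viewpoint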

reset() → None[source]

Reset the LiDAR sensor by initiating the LiDAR data stream based on the current reference pointer to the dataset.

capture(timestamp: float, **kwargs) → numpy.ndarray[source]

Synthesize a LiDAR point cloud based on the current timestamp and the transformation between the novel viewpoint to be simulated and the nominal viewpoint from the pre-collected dataset.

Parameters

timestamp (float) – Timestamp used to retrieve a pointer into the dataset for data-driven simulation (e.g., synthesizing a point cloud from a real LiDAR sweep).

Returns

Synthesized point cloud.

Return type

np.ndarray

update_scene_object(name: str, scene_object: Any, pose: Any) → None[source]

Adding virtual objects to the scene for LiDAR synthesis is not yet implemented.

property config

Configuration of the LiDAR sensor.

property streams

Data stream of LiDAR dataset to be simulated from.

property id

The identifier of this entity.

property name

The name of the sensor.

property parent

The parent of this entity.

property view_synthesis

View synthesizer object for the first trace.

class vista.entities.sensors.lidar_utils.LidarSynthesis.LidarSynthesis(input_yaw_fov: Tuple[float, float], input_pitch_fov: Tuple[float, float], yaw_fov: Optional[Tuple[float, float]] = None, pitch_fov: Optional[Tuple[float, float]] = None, yaw_res: float = 0.1, pitch_res: float = 0.1, culling_r: int = 1, load_model: bool = True, **kwargs)[source]

A LiDAR synthesizer that simulates a point cloud from a novel viewpoint around a pre-collected LiDAR sweep. At a high level, it involves (1) performing a rigid transformation on the point cloud based on the given viewpoint change, (2) projecting the 3D point cloud to 2D image space with angle coordinates, (3) densifying the sparse 2D image, (4) culling occluded regions, (5) masking out some points/pixels to simulate the sparse pattern of a LiDAR sweep, and (6) reprojecting back to a 3D point cloud or rays.

Parameters
  • input_yaw_fov (Tuple[float, float]) – Input LiDAR field of view in the yaw axis; can be read from the params.xml file.

  • input_pitch_fov (Tuple[float, float]) – Input LiDAR field of view in the pitch axis; can be read from the params.xml file.

  • yaw_fov (Tuple[float, float]) – Output LiDAR field of view in the yaw axis; defaults to input_yaw_fov.

  • pitch_fov (Tuple[float, float]) – Output LiDAR field of view in the pitch axis; defaults to input_pitch_fov.

  • yaw_res (float) – Resolution in yaw axis; default is 0.1.

  • pitch_res (float) – Resolution in pitch axis; default is 0.1.

  • culling_r (int) – The radius (from the origin) for culling occluded points.

  • load_model (bool) – Whether to load the LiDAR densifier model; defaults to True.

synthesize(trans: numpy.ndarray, rot: numpy.ndarray, pcd: numpy.ndarray) → Tuple[vista.entities.sensors.lidar_utils.Pointcloud.Pointcloud, numpy.ndarray][source]

Apply a rigid transformation to a dense point cloud and return a new dense representation or sparse point cloud.

Parameters
  • trans (np.ndarray) – Translation vector.

  • rot (np.ndarray) – Rotation matrix.

  • pcd (np.ndarray) – Point cloud.

Returns

Returns a tuple (pointcloud, array), where pointcloud is the synthesized point cloud after the viewpoint change given by the transform (trans, rot), and array is the dense depth map in 2D image space.
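
A minimal calling sketch; `synthesizer` is assumed to be an existing LidarSynthesis instance and `pcd` a point cloud loaded from the dataset:

    import numpy as np

    # Sketch: resimulate a LiDAR sweep from a slightly shifted viewpoint.
    trans = np.array([0.5, 0.0, 0.0])  # small translation of the novel viewpoint
    rot = np.eye(3)                    # rotation matrix (identity = no rotation)
    new_pcd, dense_depth = synthesizer.synthesize(trans, rot, pcd)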

class vista.entities.sensors.lidar_utils.Pointcloud.Point(value)[source]

Point feature, including x, y, z, intensity, depth, and mask.

class vista.entities.sensors.lidar_utils.Pointcloud.Pointcloud(xyz: Union[torch.Tensor, numpy.ndarray], intensity: Optional[Union[torch.Tensor, numpy.ndarray]] = None)[source]

A helper class that allows handling point clouds more easily, with functionality for transforming the point cloud and extracting features/properties from it. A Pointcloud can be built from either numpy.ndarray or torch.Tensor data; methods maintain the same data type.

Parameters
  • xyz (tensor) – x, y, z positions of the point cloud, with shape (N, 3).

  • intensity (tensor) – Intensity of the point cloud, with shape (N,).

transform(R: Optional[Union[torch.Tensor, numpy.ndarray]] = None, trans: Optional[Union[torch.Tensor, numpy.ndarray]] = None)[source]

Transform the point cloud.

Parameters
  • R (tensor) – Rotation matrix with shape (3,3).

  • trans (tensor) – Translation vector with length 3.

Raises

AssertionError – Invalid rotation matrix (3,3) or translation (3,)
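
A small usage sketch with random numpy data; whether transform mutates the cloud or returns a new one is not specified here, and the Point.DEPTH member name is an assumption about the Point enum:

    import numpy as np

    from vista.entities.sensors.lidar_utils.Pointcloud import Point, Pointcloud

    # Build a small point cloud; because the inputs are numpy arrays, the data
    # stays in numpy format.
    xyz = np.random.randn(100, 3)
    intensity = np.random.rand(100)
    pcd = Pointcloud(xyz, intensity)

    # Apply a rigid transform: (3, 3) rotation matrix and length-3 translation.
    R = np.eye(3)
    t = np.array([1.0, 0.0, 0.0])
    transformed = pcd.transform(R, t)  # may mutate in place or return a new cloud

    print(pcd.num_points, pcd.dist.shape)
    depth = pcd.get(Point.DEPTH)  # assumed enum member name for the depth feature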

get(feature: vista.entities.sensors.lidar_utils.Pointcloud.Point) → numpy.ndarray[source]

Get feature (x, y, z, intensity, depth, mask) of the point cloud.

Parameters

feature (Point) – Feature to extract from the point cloud.

Returns

Point feature.

Return type

np.ndarray

Raises

ValueError – Unrecognized Point feature.

numpy()[source]

Returns a copy of the torch point cloud converted to numpy. If the point cloud is already in numpy format, a copy is returned.

property num_points

Number of points.

property x

The x component of all points.

property y

The y component of all points.

property z

The z component of all points.

property xyz

xyz of all points.

property intensity

The intensity of all points.

property dist

Distance to the origin of all points.

property yaw

Yaw angle (radians) of each point in the cloud.

property pitch

Pitch angle (radians) of each point in the cloud.

Event Camera

class vista.entities.sensors.EventCamera.EventCamera(attach_to: vista.entities.Entity.Entity, config: Dict)[source]

An event camera sensor object that synthesizes event data locally around the RGB dataset given a viewpoint (potentially different from the dataset) and a timestamp, using video interpolation and an event emission model.

Parameters
  • attach_to (Entity) – A parent object (car) to be attached to.

  • config (Dict) –

    Configuration of the sensor. An example (default) config is,

    >>> DEFAULT_CONFIG = {
        'rig_path': None,
        # Event camera
        'name': 'event_camera_front',
        'original_size': (480, 640),
        'size': (240, 320),
        'optical_flow_root': '../data_prep/Super-SloMo',
        'checkpoint': '../data_prep/Super-SloMo/ckpt/SuperSloMo.ckpt',
        'lambda_flow': 0.5,
        'max_sf': 16,
        'use_gpu': True,
        'positive_threshold': 0.1,
        'sigma_positive_threshold': 0.02,
        'negative_threshold': -0.1,
        'sigma_negative_threshold': 0.02,
        'reproject_pixel': False,
        'subsampling_ratio': 0.5,
        # RGB rendering
        'base_camera_name': 'camera_front',
        'base_size': (600, 960),
        'depth_mode': 'FIXED_PLANE',
        'use_lighting': False,
    }
    

Note that event camera simulation requires third-party dependencies and a pretrained checkpoint for video interpolation.
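
A hedged usage sketch, analogous to the Camera example; the Super-SloMo dependency and checkpoint referenced in DEFAULT_CONFIG must be available, and `car` is assumed to be an already-constructed Car entity with a loaded trace:

    # Sketch only: the config keys mirror DEFAULT_CONFIG above.
    from vista.entities.sensors.EventCamera import EventCamera

    event_config = {
        'name': 'event_camera_front',
        'original_size': (480, 640),
        'size': (240, 320),
        'base_camera_name': 'camera_front',
        'use_gpu': True,
    }

    event_camera = EventCamera(attach_to=car, config=event_config)
    event_camera.reset()
    events = event_camera.capture(timestamp)  # synthesized event data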

reset() → None[source]

Reset the event camera sensor by initiating the RGB data stream (as events are simulated using RGB data) based on the current reference pointer to the dataset.

capture(timestamp: float, update_rgb_frame_only: Optional[bool] = False) → numpy.ndarray[source]

Synthesize event data based on the current timestamp and the transformation between the novel viewpoint to be simulated and the nominal viewpoint from the pre-collected RGB dataset. At a high level, it performs video interpolation across consecutive RGB frames, extracts events with an event emission model, and projects the events to the event camera space. Note that the simulation runs a deep network (SuperSloMo) for video interpolation.

Parameters

timestamp (float) – Timestamp used to retrieve a pointer into the dataset for data-driven simulation (e.g., synthesizing an RGB image from a real RGB video).

Returns

Synthesized event data.

Return type

np.ndarray

property config

Configuration of this sensor.

property streams

Data stream of RGB image/video dataset to be simulated from.

property camera_param

Camera parameters of the virtual event camera.

property base_camera_param

Camera parameters of the RGB camera.

property view_synthesis

View synthesizer object.

property id

The identifier of this entity.

property name

The name of the sensor.

property parent

The parent of this entity.

property prev_frame

Previous RGB frame.

update_scene_object(name: str, scene_object: Any, pose: Any) → None

Update an object in the scene for rendering. This is only used when virtual objects are placed in the scene.

Parameters
  • name (str) – Name of the scene object.

  • scene_object (Any) – The scene object.

  • pose (Any) – The pose of the scene object.

property prev_timestamp

Previous timestamp.