Sensors

class vista.entities.sensors.BaseSensor.BaseSensor(attach_to: vista.entities.Entity.Entity, config: Optional[Dict] = None)[source]

Base class of all sensors.

Parameters
  • attach_to (Entity) – A car to be attached to.

  • config (dict) – Configuration of the sensor.

capture(timestamp: float, **kwargs) → Any[source]

Run sensor synthesis based on the current timestamp and the transformation between the novel viewpoint to be simulated and the nominal viewpoint from the pre-collected dataset.

Parameters

timestamp (float) – Timestamp used to retrieve a pointer into the dataset for data-driven simulation (e.g., synthesizing a point cloud from a real LiDAR sweep).

update_scene_object(name: str, scene_object: Any, pose: Any) → None[source]

Update an object in the scene for rendering. This is only used when virtual objects are placed in the scene.

Parameters
  • name (str) – Name of the scene object.

  • scene_object (Any) – The scene object.

  • pose (Any) – The pose of the scene object.

property name

The name of the sensor.

property id

The identifier of this entity.

property parent

The parent of this entity.
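
As an illustration of the interface only (not part of the library), a custom sensor could subclass BaseSensor roughly as follows; the GPS class, its noise_std config key, and the placeholder capture logic are all hypothetical.

    # Hypothetical sketch of a custom sensor built on BaseSensor; the GPS class
    # and its 'noise_std' config key are illustrative, not part of VISTA.
    from typing import Any, Dict, Optional

    import numpy as np

    from vista.entities.sensors.BaseSensor import BaseSensor


    class GPS(BaseSensor):
        def __init__(self, attach_to, config: Optional[Dict] = None):
            super().__init__(attach_to, config)
            self._noise_std = (config or {}).get('noise_std', 0.0)

        def capture(self, timestamp: float, **kwargs) -> Any:
            # A real implementation would look up the attached car's pose at
            # `timestamp`; here we only return placeholder noisy coordinates.
            position = np.zeros(2)
            return position + np.random.randn(2) * self._noise_std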

class vista.entities.sensors.MeshLib.MeshLib(root_dirs: List[str])[source]

Handle all meshes of actors (as opposed to the scene/background itself) in the scene. It reads in all meshes with the .obj extension, calibrates them so that they are centered at the origin, and converts them to pyrender.Mesh for later use. Note that this class is written specifically for one set of meshes (carpack01); for a custom set of meshes, you would need to change how meshes are read from the root directory and how they are calibrated.

Parameters

root_dirs (List[str]) – A list of root directories that contains meshes.

Raises

AttributeError – If there is no mesh in the directory.

In the source code, there is a main function that uses this class independently to perform rendering on the meshes with pyrender.
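
A minimal usage sketch, assuming a local directory of carpack01-style .obj meshes (the path below is hypothetical):

    from vista.entities.sensors.MeshLib import MeshLib

    # Hypothetical path to a directory of carpack01-style .obj meshes.
    mesh_lib = MeshLib(['./carpack01'])

    # Sample meshes for two virtual agents and inspect them.
    mesh_lib.reset(n_agents=2, random=True)
    print(mesh_lib.n_tmeshes)          # number of meshes in the library
    print(mesh_lib.agents_meshes_dim)  # (width, length) of each agent's mesh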

reset(n_agents: int, random: Optional[bool] = True) → None[source]

Reset agents' meshes by sampling n_agents meshes from the mesh library.

Parameters
  • n_agents (int) – Number of agents.

  • random (bool) – Whether to randomly sample n_agents meshes from the entire mesh library; default is set to True.

property fpaths

Paths to all meshes.

property tmeshes

A list of trimesh objects.

property n_tmeshes

Number of trimeshes.

property agents_meshes

A list of meshes for all agents.

property agents_meshes_dim

The dimensions (width, length) of agents’ meshes.

RGB Camera

class vista.entities.sensors.Camera.Camera(attach_to: vista.entities.Entity.Entity, config: Dict)[source]

An RGB camera sensor object that synthesizes an RGB image locally around the dataset given a viewpoint (potentially different from the dataset) and a timestamp.

Parameters
  • attach_to (Entity) – A parent object (Car) to be attached to.

  • config (Dict) –

    Configuration of the sensor. An example (default) is,

    >>> DEFAULT_CONFIG = {
        'depth_mode': 'FIXED_PLANE',
        'znear': ZNEAR,
        'zfar': ZFAR,
        'use_lighting': False,
        'directional_light_intensity': 10,
        'recoloring_factor': 0.5,
        'use_synthesizer': True,
    }
    

    Check the ViewSynthesis object for more details about the configuration.
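
A hedged usage sketch; trace and car setup are omitted, and in practice the sensor is usually created through the Car/World API rather than constructed directly:

    # Sketch only: `car` is assumed to be an already-constructed Car entity with
    # a loaded trace, and `timestamp` a valid dataset timestamp. The config keys
    # mirror DEFAULT_CONFIG above.
    from vista.entities.sensors.Camera import Camera

    camera_config = {
        'depth_mode': 'FIXED_PLANE',
        'use_lighting': False,
        'use_synthesizer': True,
    }

    camera = Camera(attach_to=car, config=camera_config)
    camera.reset()
    frame = camera.capture(timestamp)  # np.ndarray RGB image at the novel viewpoint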

reset() → None[source]

Reset the RGB camera sensor by initiating the RGB data stream based on the current reference pointer to the dataset.

capture(timestamp: float, **kwargs) → numpy.ndarray[source]

Synthesize an RGB image based on the current timestamp and the transformation between the novel viewpoint to be simulated and the nominal viewpoint from the pre-collected dataset. Note that if optical flow data exists in the trace directory, the Camera object uses the optical flow to interpolate across frames to the exact timestamp, as opposed to retrieving the RGB frame with the closest timestamp in the dataset.

Parameters

timestamp (float) – Timestamp used to retrieve a pointer into the dataset for data-driven simulation (e.g., synthesizing an RGB image from a real RGB video).

Returns

A synthesized RGB image.

Return type

np.ndarray

update_scene_object(name: str, scene_object: pyrender.mesh.Mesh, pose: Any) → None[source]

Update pyrender mesh object in the scene for rendering.

Parameters
  • name (str) – Name of the scene object.

  • scene_object (pyrender.Mesh) – The scene object.

  • pose (Any) – The pose of the scene object.

property config

Configuration of the RGB camera sensor.

property camera_param

Camera parameters of the virtual camera.

property streams

Data stream of RGB image/video dataset to be simulated from.

property flow_streams

Data stream of optical flow (if any).

property flow_meta

Meta data of optical flow (if any).

property view_synthesis

View synthesizer object.

property id

The identifier of this entity.

property name

The name of the sensor.

property parent

The parent of this entity.

class vista.entities.sensors.camera_utils.ViewSynthesis.ViewSynthesis(camera_param: vista.entities.sensors.camera_utils.CameraParams.CameraParams, config: Dict, init_with_bg_mesh: Optional[bool] = True)[source]

An RGB synthesizer that simulates an RGB image at a novel viewpoint around a pre-collected RGB image/video dataset. Conceptually, it (1) projects a reference 2D RGB image to a 3D colored mesh using camera projection matrices and (approximated) depth, (2) places virtual objects in the scene, and (3) renders an RGB image at the novel viewpoint.

Parameters
  • camera_param (CameraParams) – Camera parameter object of the virtual camera.

  • config (Dict) – Configuration of the synthesizer.

  • init_with_bg_mesh (bool) – Whether to initialize with background mesh; default is set to True.

synthesize(trans: numpy.ndarray, rot: numpy.ndarray, imgs: Dict[str, numpy.ndarray], depth: Optional[Dict[str, numpy.ndarray]] = None) → Tuple[numpy.ndarray, numpy.ndarray][source]

Synthesize RGB image at the novel viewpoint specified by trans and rot with respect to the nominal viewpoint that corresponds to a set of RGB images and depth maps.

Parameters
  • trans (np.ndarray) – Translation vector.

  • rot (np.ndarray) – Rotation vector in Euler angle.

  • imgs (Dict[str, np.ndarray]) – A set of images (potentially from multiple cameras).

  • depth (Dict[str, np.ndarray]) – A set of depth maps corresponding to imgs.

Returns

Returns a tuple (array_1, array_2), where array_1 is the synthesized RGB image and array_2 is the corresponding depth image.
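
A minimal calling sketch; `vs` is assumed to be an existing ViewSynthesis instance, `frame` a precomputed H x W x 3 RGB frame, and the camera name key 'camera_front' is an assumption about the rig:

    import numpy as np

    # Sketch: render the scene from a slightly shifted novel viewpoint.
    trans = np.array([0.0, -0.5, 0.0])   # small translation of the novel viewpoint
    rot = np.array([0.0, 0.0, 0.05])     # small rotation as Euler angles
    rgb, depth = vs.synthesize(trans, rot, imgs={'camera_front': frame})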

update_object_node(name: str, mesh: pyrender.mesh.Mesh, trans: numpy.ndarray, quat: numpy.ndarray) → None[source]

Update the virtual object in the scene.

Parameters
  • name (str) – Name of the virtual object.

  • mesh (pyrender.Mesh) – Mesh of the virtual object.

  • trans (np.ndarray) – Translation vector.

  • quat (np.ndarray) – Quaternion vector.

add_bg_mesh(camera_param: vista.entities.sensors.camera_utils.CameraParams.CameraParams) → None[source]

Add a background mesh to the scene based on the camera projection and the initial depth. The color of the mesh is updated at every synthesize call, and if the ground-plane depth approximation is not used, the geometry of the mesh is also updated.

Parameters

camera_param (CameraParams) – Camera parameter of the virtual camera.

property bg_mesh_names

Names of all background meshes in the scene.

property object_nodes

Pyrender nodes of all virtual objects added to the scene.

property config

Configuration of the view synthesizer.

class vista.entities.sensors.camera_utils.CameraParams.CameraParams(rig_path: str = None, name: str = None, params: dict = None)[source]

The CameraParams object stores information pertaining to a single physical camera mounted on the car. It is useful for encapsulating the relevant calibration information for easy access in other modules.

Parameters
  • rig_path (str) – Path to RIG.xml that specifies camera parameters.

  • name (str) – Name of the camera identifier to initialize. Must be a valid TopicName present inside the RIG.xml file. Can also be None to automatically grab the first named camera in the RIG.xml file.

  • params (dict) – Dictionary of camera parameters to instantiate with. If not provided, rig_path is used.

Raises

ValueError – If name is provided but not found in the rig file.
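
A hedged usage sketch; the rig path and camera name below are hypothetical and depend on your trace directory:

    from vista.entities.sensors.camera_utils.CameraParams import CameraParams

    # Hypothetical rig file and camera name; both depend on the trace.
    cam = CameraParams(rig_path='./trace/RIG.xml', name='camera_front')

    cam.resize(height=600, width=960)   # rescale intrinsics to the working size
    K = cam.get_K()                     # (3, 3) intrinsic matrix
    plane = cam.get_ground_plane()      # [A, B, C, D] ground-plane parameters
    print(cam.get_height(), cam.get_width())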

resize(height: int, width: int) → None[source]

Scales the camera object and adjusts the internal parameters so that it projects images of the given size.

Parameters
  • height (int) – New height of the camera images in pixels.

  • width (int) – New width of the camera images in pixels.

crop(i1: int, j1: int, i2: int, j2: int) → None[source]

Crops a camera object to a given region of interest specified by the coordinates of the top-left (i1, j1) and bottom-right (i2, j2) corners.

Parameters
  • i1 (int) – Top row of ROI.

  • j1 (int) – Left column of ROI.

  • i2 (int) – Bottom row of ROI.

  • j2 (int) – Right column of ROI.

get_height() → int[source]

Get the raw pixel height of images captured by the camera.

Returns

Height in pixels.

Return type

int

get_width() → int[source]

Get the raw pixel width of images captured by the camera.

Returns

Width in pixels.

Return type

int

get_K() → numpy.ndarray[source]

Get intrinsic calibration matrix.

Returns

Intrinsic matrix (3,3).

Return type

np.array

get_K_inv() → numpy.ndarray[source]

Get inverse intrinsic calibration matrix.

Returns

Inverse intrinsic matrix (3,3).

Return type

np.array

get_distortion() → numpy.ndarray[source]

Get the distortion coefficients of the camera.

Returns

Distortion coefficients (-1,).

Return type

np.array

get_position() → numpy.ndarray[source]

Get the 3D position of the camera.

Returns

3D position of the camera.

Return type

np.array

get_quaternion() → numpy.ndarray[source]

Get the rotation of the camera as a quaternion.

Returns

Rotation of the camera as a quaternion.

Return type

np.array

get_yaw() → float[source]

Get the yaw of the camera relative to the frame of reference.

Returns

Yaw of the camera [rads].

Return type

float

get_ground_plane() → List[float][source]

Get the equation of the ground plane.

The equation of the ground plane is given by: Ax + By + Cz = D and is computed from the position and orientation of the camera.

Returns

Parameterization of the ground plane: [A,B,C,D].

Return type

List[float]
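
As a small illustration (not part of the API), the returned parameterization can be evaluated directly; `cam` below is an existing CameraParams instance and the point is hypothetical:

    import numpy as np

    # A point p = (x, y, z) lies on the plane when A*x + B*y + C*z = D, so the
    # residual below is (approximately) zero for points on the ground plane.
    A, B, C, D = cam.get_ground_plane()
    point = np.array([1.0, 1.5, 10.0])      # hypothetical 3D point
    residual = np.dot([A, B, C], point) - D
    print(abs(residual) < 1e-6)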

get_roi(axis: Optional[str] = 'ij') → List[int][source]

Get the region of interest of the images captured by the camera.

Parameters
  • axis (str) – Axis order to return the coordinates in; default is 'ij', can also be 'xy'.

Returns

Coordinates of the ROI box.

Return type

List[int]

Raises

ValueError – If axis is not valid.

get_roi_angle() → float[source]

Get the angle of the region of interest.

Returns

The rotation of the ROI box.

Return type

float

get_roi_points() → List[source]

Get the points of the region of interest.

Returns

The list of points surrounding the ROI box.

Return type

List

get_roi_dims() → Tuple[source]

Get the dimensions of the region of interest.

Returns

Height and width of ROI.

Return type

Tuple

3D LiDAR

class vista.entities.sensors.Lidar.Lidar(attach_to: vista.entities.Entity.Entity, config: Dict)[source]

A LiDAR sensor object that synthesizes LiDAR measurements locally around the dataset given a viewpoint (potentially different from the dataset) and a timestamp.

Parameters
  • attach_to (Entity) – A car to be attached to.

  • config (dict) –

    Configuration of LiDAR sensor. An example (default) is,

    >>> DEFAULT_CONFIG = {
        'name': 'lidar_3d',
        'yaw_fov': None,
        'pitch_fov': None,
        'culling_r': 1,
        'use_synthesizer': True,
    }
    

    Check the LidarSynthesis object for more details about the configuration.
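
A hedged usage sketch, analogous to the Camera example above; trace and car setup are omitted:

    # Sketch only: `car` is assumed to be an already-constructed Car entity with
    # a loaded trace, and `timestamp` a valid dataset timestamp. The config keys
    # mirror DEFAULT_CONFIG above.
    from vista.entities.sensors.Lidar import Lidar

    lidar_config = {
        'name': 'lidar_3d',
        'yaw_fov': None,
        'pitch_fov': None,
        'use_synthesizer': True,
    }

    lidar = Lidar(attach_to=car, config=lidar_config)
    lidar.reset()
    pcd = lidar.capture(timestamp)  # synthesized point cloud at the novel viewpoint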

reset() → None[source]

Reset the LiDAR sensor by initiating the LiDAR data stream based on the current reference pointer to the dataset.

capture(timestamp: float, **kwargs) → numpy.ndarray[source]

Synthesize a LiDAR point cloud based on the current timestamp and the transformation between the novel viewpoint to be simulated and the nominal viewpoint from the pre-collected dataset.

Parameters

timestamp (float) – Timestamp used to retrieve a pointer into the dataset for data-driven simulation (e.g., synthesizing a point cloud from a real LiDAR sweep).

Returns

Synthesized point cloud.

Return type

np.ndarray

update_scene_object(name: str, scene_object: Any, pose: Any) → None[source]

Adding virtual objects to the scene for LiDAR synthesis is not yet implemented.

property config

Configuration of the LiDAR sensor.

property streams

Data stream of LiDAR dataset to be simulated from.

property id

The identifier of this entity.

property name

The name of the sensor.

property parent

The parent of this entity.

property view_synthesis

View synthesizer object for the first trace.

class vista.entities.sensors.lidar_utils.LidarSynthesis.LidarSynthesis(input_yaw_fov: Tuple[float, float], input_pitch_fov: Tuple[float, float], yaw_fov: Optional[Tuple[float, float]] = None, pitch_fov: Optional[Tuple[float, float]] = None, yaw_res: float = 0.1, pitch_res: float = 0.1, culling_r: int = 1, load_model: bool = True, **kwargs)[source]

A LiDAR synthesizer that simulates a point cloud from a novel viewpoint around a pre-collected LiDAR sweep. At a high level, it involves (1) performing a rigid transformation on the point cloud based on the given viewpoint change, (2) projecting the 3D point cloud to 2D image space with angle coordinates, (3) densifying the sparse 2D image, (4) culling occluded regions, (5) masking out some points/pixels to simulate the sparse pattern of a LiDAR sweep, and (6) reprojecting back to a 3D point cloud or rays.

Parameters
  • input_yaw_fov (Tuple[float, float]) – Input LiDAR field of view in the yaw axis; can be read from the params.xml file.

  • input_pitch_fov (Tuple[float, float]) – Input LiDAR field of view in the pitch axis; can be read from the params.xml file.

  • yaw_fov (Tuple[float, float]) – Output LiDAR field of view in the yaw axis; defaults to input_yaw_fov.

  • pitch_fov (Tuple[float, float]) – Output LiDAR field of view in the pitch axis; defaults to input_pitch_fov.

  • yaw_res (float) – Resolution in yaw axis; default is 0.1.

  • pitch_res (float) – Resolution in pitch axis; default is 0.1.

  • culling_r (int) – The radius (from the origin) for culling occluded points.

  • load_model (bool) – Whether to load the LiDAR densifier model; defaults to True.

synthesize(trans: numpy.ndarray, rot: numpy.ndarray, pcd: numpy.ndarray) → Tuple[vista.entities.sensors.lidar_utils.Pointcloud.Pointcloud, numpy.ndarray][source]

Apply a rigid transformation to a dense point cloud and return a new dense representation or sparse point cloud.

Parameters
  • trans (np.ndarray) – Translation vector.

  • rot (np.ndarray) – Rotation matrix.

  • pcd (np.ndarray) – Point cloud.

Returns

Returns a tuple (pointcloud, array), where pointcloud is the synthesized point cloud after the viewpoint change given by the transform (trans, rot), and array is the dense depth map in 2D image space.
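
A minimal calling sketch; `synthesizer` is assumed to be an existing LidarSynthesis instance and `pcd` a point cloud loaded from the dataset:

    import numpy as np

    # Sketch: resimulate a LiDAR sweep from a slightly shifted viewpoint.
    trans = np.array([0.5, 0.0, 0.0])  # small translation of the novel viewpoint
    rot = np.eye(3)                    # rotation matrix (identity = no rotation)
    new_pcd, dense_depth = synthesizer.synthesize(trans, rot, pcd)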

class vista.entities.sensors.lidar_utils.Pointcloud.Point(value)[source]

Point feature, including x, y, z, intensity, depth, and mask.

class vista.entities.sensors.lidar_utils.Pointcloud.Pointcloud(xyz: Union[torch.Tensor, numpy.ndarray], intensity: Optional[Union[torch.Tensor, numpy.ndarray]] = None)[source]

A helper class that allows handling point clouds more easily, with functionality for transforming the point cloud and extracting features/properties from it. A Pointcloud can be built from either numpy.ndarray or torch.Tensor data; methods maintain the same data type.

Parameters
  • xyz (tensor) – x, y, z positions of the point cloud, with shape (N, 3).

  • intensity (tensor) – Intensity of the point cloud, with shape (N,).

transform(R: Optional[Union[torch.Tensor, numpy.ndarray]] = None, trans: Optional[Union[torch.Tensor, numpy.ndarray]] = None)[source]

Transform the point cloud.

Parameters
  • R (tensor) – Rotation matrix with shape (3,3).

  • trans (tensor) – Translation vector with length 3.

Raises

AssertionError – Invalid rotation matrix (3,3) or translation (3,)
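
A small usage sketch with random numpy data; whether transform mutates the cloud or returns a new one is not specified here, and the Point.DEPTH member name is an assumption about the Point enum:

    import numpy as np

    from vista.entities.sensors.lidar_utils.Pointcloud import Point, Pointcloud

    # Build a small point cloud; because the inputs are numpy arrays, the data
    # stays in numpy format.
    xyz = np.random.randn(100, 3)
    intensity = np.random.rand(100)
    pcd = Pointcloud(xyz, intensity)

    # Apply a rigid transform: (3, 3) rotation matrix and length-3 translation.
    R = np.eye(3)
    t = np.array([1.0, 0.0, 0.0])
    transformed = pcd.transform(R, t)  # may mutate in place or return a new cloud

    print(pcd.num_points, pcd.dist.shape)
    depth = pcd.get(Point.DEPTH)  # assumed enum member name for the depth feature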

get(feature: vista.entities.sensors.lidar_utils.Pointcloud.Point) → numpy.ndarray[source]

Get feature (x, y, z, intensity, depth, mask) of the point cloud.

Parameters

feature (Point) – Feature to extract from the point cloud.

Returns

Point feature.

Return type

np.ndarray

Raises

ValueError – Unrecognized Point feature.

numpy()[source]

Returns a copy of the torch point cloud converted to numpy. If the point cloud is already in numpy format, a copy is returned.

property num_points

Number of points.

property x

The x component of all points.

property y

The y component of all points.

property z

The z component of all points.

property xyz

xyz of all points.

property intensity

The intensity of all points.

property dist

Distance to the origin of all points.

property yaw

Yaw angle (radians) of each point in the cloud.

property pitch

Pitch angle (radians) of each point in the cloud.

Event Camera

class vista.entities.sensors.EventCamera.EventCamera(attach_to: vista.entities.Entity.Entity, config: Dict)[source]

An event camera sensor object that synthesizes event data locally around the RGB dataset given a viewpoint (potentially different from the dataset) and a timestamp, using video interpolation and an event emission model.

Parameters
  • attach_to (Entity) – A parent object (car) to be attached to.

  • config (Dict) –

    Configuration of the sensor. An example (default) config is,

    >>> DEFAULT_CONFIG = {
        'rig_path': None,
        # Event camera
        'name': 'event_camera_front',
        'original_size': (480, 640),
        'size': (240, 320),
        'optical_flow_root': '../data_prep/Super-SloMo',
        'checkpoint': '../data_prep/Super-SloMo/ckpt/SuperSloMo.ckpt',
        'lambda_flow': 0.5,
        'max_sf': 16,
        'use_gpu': True,
        'positive_threshold': 0.1,
        'sigma_positive_threshold': 0.02,
        'negative_threshold': -0.1,
        'sigma_negative_threshold': 0.02,
        'reproject_pixel': False,
        'subsampling_ratio': 0.5,
        # RGB rendering
        'base_camera_name': 'camera_front',
        'base_size': (600, 960),
        'depth_mode': 'FIXED_PLANE',
        'use_lighting': False,
    }
    

Note that event camera simulation requires third-party dependencies and a pretrained checkpoint for video interpolation.
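
A hedged usage sketch, analogous to the Camera example; the Super-SloMo dependency and checkpoint referenced in DEFAULT_CONFIG must be available, and `car` is assumed to be an already-constructed Car entity with a loaded trace:

    # Sketch only: the config keys mirror DEFAULT_CONFIG above.
    from vista.entities.sensors.EventCamera import EventCamera

    event_config = {
        'name': 'event_camera_front',
        'original_size': (480, 640),
        'size': (240, 320),
        'base_camera_name': 'camera_front',
        'use_gpu': True,
    }

    event_camera = EventCamera(attach_to=car, config=event_config)
    event_camera.reset()
    events = event_camera.capture(timestamp)  # synthesized event data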

reset() → None[source]

Reset the event camera sensor by initiating the RGB data stream (as events are simulated using RGB data) based on the current reference pointer to the dataset.

capture(timestamp: float, update_rgb_frame_only: Optional[bool] = False) → numpy.ndarray[source]

Synthesize event data based on the current timestamp and the transformation between the novel viewpoint to be simulated and the nominal viewpoint from the pre-collected RGB dataset. At a high level, it performs video interpolation across consecutive RGB frames, extracts events with an event emission model, and projects the events to the event camera space. Note that the simulation runs a deep network (SuperSloMo) for video interpolation.

Parameters

timestamp (float) – Timestamp used to retrieve a pointer into the dataset for data-driven simulation (e.g., synthesizing an RGB image from a real RGB video).

Returns

Synthesized event data.

Return type

np.ndarray

property config

Configuration of this sensor.

property streams

Data stream of RGB image/video dataset to be simulated from.

property camera_param

Camera parameters of the virtual event camera.

property base_camera_param

Camera parameters of the RGB camera.

property view_synthesis

View synthesizer object.

property id

The identifier of this entity.

property name

The name of the sensor.

property parent

The parent of this entity.

property prev_frame

Previous RGB frame.

update_scene_object(name: str, scene_object: Any, pose: Any) → None

Update an object in the scene for rendering. This is only used when virtual objects are placed in the scene.

Parameters
  • name (str) – Name of the scene object.

  • scene_object (Any) – The scene object.

  • pose (Any) – The pose of the scene object.

property prev_timestamp

Previous timestamp.