ECE276A — Sensing & Estimation in Robotics

A robotics project series

A series of three projects from ECE 276A: Sensing & Estimation in Robotics at UC San Diego, building up from IMU-only state estimation to a full visual-inertial SLAM system.


Project 1 — Orientation Tracking

📄 Full Report (PDF)

IMU-based 3D orientation estimation formulated as a batch optimization problem on the quaternion manifold, followed by panoramic image generation from a monocular camera.

Pipeline:

  1. IMU Calibration — Gyroscope and accelerometer biases are estimated from a static interval (detected via VICON ground truth) and aggregated across all datasets for robustness.
  2. Orientation Prediction — The bias-corrected gyroscope measurements are integrated via discrete-time quaternion kinematics:
\[\mathbf{q}_{t+1} = \mathbf{q}_t \circ \exp\!\left(\left[0,\; \tfrac{\tau_t \boldsymbol{\omega}_t}{2}\right]\right)\]
  3. Batch Optimization — A cost function combining a motion-consistency term (quaternion log residual) and an accelerometer gravity-alignment observation term is minimized with projected gradient descent using PyTorch autograd. After each gradient step, quaternions are projected back onto the unit-norm constraint.
  4. Panorama Generation — Each camera frame is associated with the nearest IMU orientation by timestamp, pixels are mapped to direction vectors via a spherical projection model, and the results are accumulated onto a global panorama canvas.
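
The prediction step above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming a [w, x, y, z] quaternion convention; the function names are made up for the example and are not the project's actual API:

```python
import numpy as np

def quat_mul(q, p):
    """Hamilton product of quaternions stored as [w, x, y, z]."""
    w1, v1 = q[0], q[1:]
    w2, v2 = p[0], p[1:]
    w = w1 * w2 - v1 @ v2
    v = w1 * v2 + w2 * v1 + np.cross(v1, v2)
    return np.concatenate(([w], v))

def quat_exp(v):
    """Exponential of the pure quaternion [0, v]."""
    theta = np.linalg.norm(v)
    if theta < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    return np.concatenate(([np.cos(theta)], np.sin(theta) * v / theta))

def predict(q, omega, tau):
    """One step of q_{t+1} = q_t o exp([0, tau * omega / 2])."""
    q_next = quat_mul(q, quat_exp(0.5 * tau * omega))
    # Project back onto the unit-norm constraint, as after each
    # gradient step in the batch optimization.
    return q_next / np.linalg.norm(q_next)
```

Integrating ω = (0, 0, π) rad/s for one second from the identity, for instance, yields a 180° rotation about the vertical axis.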

Key result: Optimization substantially reduces roll and pitch error. Yaw remains unobservable from accelerometer measurements alone (gravity provides no constraint about the vertical axis), which is reflected in residual panorama misalignment.
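
The yaw-unobservability claim is easy to verify numerically: a pure yaw rotation leaves the gravity vector unchanged, so the predicted accelerometer reading is identical for every yaw angle. A quick check (the angle value here is arbitrary):

```python
import numpy as np

# Gravity in the world frame.
g = np.array([0.0, 0.0, -9.81])

# A pure yaw rotation about the vertical axis, by any angle psi.
psi = 1.3
Rz = np.array([[np.cos(psi), -np.sin(psi), 0.0],
               [np.sin(psi),  np.cos(psi), 0.0],
               [0.0,          0.0,         1.0]])

# The predicted body-frame accelerometer reading R^T g is the same
# for every psi, so the gravity-alignment cost cannot constrain yaw.
print(np.allclose(Rz.T @ g, g))  # True
```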

Roll-pitch-yaw absolute error distribution before (left) and after (right) batch optimization. Roll and pitch improve significantly; yaw is unobservable from gravity alone.
Panoramas constructed from optimized IMU orientations (left) vs. VICON ground truth (right).

Project 2 — LiDAR-Based SLAM

📄 Full Report (PDF)

A full 2D SLAM pipeline for a differential-drive robot equipped with wheel encoders, an IMU, a Hokuyo LiDAR, and a Kinect RGBD camera.

Pipeline:

  1. Encoder + IMU Odometry — Wheel encoder counts are converted to per-side displacements; the IMU yaw rate is interpolated onto encoder timestamps. A midpoint integration scheme propagates the SE(2) pose.
  2. ICP Scan Matching — Consecutive LiDAR frames are aligned using the Kabsch (SVD) algorithm initialized from odometry. Two complementary outlier-rejection mechanisms are introduced:
    • Trimmed ICP — retains only the closest p% of correspondences per iteration.
    • Max-distance guard — unconditionally rejects pairs beyond a hard distance threshold before the percentile trim.
  3. Occupancy Grid Mapping — A 0.05 m/cell log-odds grid is updated using Bresenham ray tracing for each LiDAR scan.
  4. Floor Texture Mapping — Kinect depth frames are converted to 3D points, projected into the world frame via a rigid transform chain, and floor points (z < 0.2 m) are averaged into a color grid.
  5. Pose Graph Optimization (GTSAM) — A factor graph is built with odometry factors and loop-closure factors (fixed-interval + proximity-based). All loop-closure edges use a Huber robust kernel. The graph is optimized with Levenberg–Marquardt.
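
One trimmed Kabsch alignment step, combining both rejection mechanisms, can be sketched as below. The function name and defaults (trim=0.9, max_dist=1.0, mirroring the Trim 90 + MaxDist 1.0 m configuration) are illustrative, and the sketch assumes point correspondences have already been established:

```python
import numpy as np

def kabsch_trimmed(src, dst, trim=0.9, max_dist=1.0):
    """One trimmed-ICP alignment step: fit R, t mapping src -> dst.

    src, dst: (N, 2) arrays of already-associated point pairs.
    trim: keep only the closest `trim` fraction of correspondences.
    max_dist: hard rejection threshold applied before the trim.
    """
    d = np.linalg.norm(src - dst, axis=1)
    idx = np.where(d <= max_dist)[0]          # max-distance guard
    order = np.argsort(d[idx])
    idx = idx[order[: max(3, int(trim * len(idx)))]]  # percentile trim
    p, q = src[idx], dst[idx]

    # Kabsch: SVD of the cross-covariance of centered points.
    mu_p, mu_q = p.mean(axis=0), q.mean(axis=0)
    H = (p - mu_p).T @ (q - mu_q)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, np.linalg.det(Vt.T @ U.T)])  # guard against reflection
    R = Vt.T @ D @ U.T
    t = mu_q - R @ mu_p
    return R, t
```

In the full pipeline this step would be iterated with re-associated nearest neighbors, initialized from odometry.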

Key result: The combined Trim 90 + MaxDist 1.0 m ICP configuration achieves +29% accepted loop closures on Dataset 20 and +27% on Dataset 21 compared to classic ICP. Pose graph optimization achieves a 76% / 72% error reduction on the two datasets, producing sharp occupancy maps with correct room and corridor closure.

Left: scan-matched (blue) vs. GTSAM-optimized (red) trajectories with loop closure edges (green). Right: occupancy maps before and after pose graph optimization — sharper walls and closed corridors after optimization.
Final occupancy map (left) and floor texture map from Kinect RGBD (right), both built from GTSAM-optimized trajectories.

Project 3 — Visual-Inertial SLAM

📄 Full Report (PDF)

An incremental EKF-based visual-inertial SLAM pipeline fusing IMU body-frame velocities with stereo camera feature observations. Implemented in four progressive stages.

Pipeline:

  1. Task 1 — IMU EKF Prediction — SE(3) poses are propagated via the matrix exponential on se(3). The 6×6 covariance is tracked using the adjoint-based state transition matrix.
  2. Task 2 — Feature Tracking (extra credit) — For dataset 02 (raw stereo images), a custom tracker uses Shi-Tomasi corner detection with grid-based balancing, Lucas-Kanade pyramidal optical flow with forward-backward consistency, and epipolar-constrained stereo matching. A critical fix: replacing a slot-pool ID scheme with a monotonic counter prevented different physical landmarks from sharing the same tensor column, which had been causing EKF divergence.
  3. Task 3 — Landmark Mapping — Each 3D landmark is maintained with an independent 3×3 covariance. A key improvement was adding landmark process noise (Q_lm = 0.01·I₃) to prevent covariance collapse — without it, Kalman gains shrank to near zero and the filter effectively stopped updating. Successful EKF updates increased from ~5,000 to ~65,000–155,000.
  4. Task 4 — Visual-Inertial SLAM — Per timestep: (a) IMU prediction, (b) batch pose update in information form using well-converged landmarks (≥6 observations, trace(P) < 15), (c) landmark initialization and EKF update using the corrected pose. Conservative tuning (high pose noise σ = 25 px, innovation filter at 15 px, hard correction cap ‖δξ‖ ≤ 0.15) prevents overcorrection while still providing meaningful drift reduction.
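
The covariance-collapse fix in Task 3 can be illustrated with a toy per-landmark update. The observation model here is a hypothetical linear stand-in (H and R substitute for the linearized stereo projection Jacobian and pixel noise); the point is the effect of Q_lm on the Kalman gain:

```python
import numpy as np

def landmark_update(m, P, z, H, R, Q_lm=0.01 * np.eye(3)):
    """EKF update for one landmark with an independent 3x3 covariance.

    m: (3,) landmark mean;   P: (3, 3) covariance
    z: (k,) observation;     H: (k, 3) (linearized) observation Jacobian
    R: (k, k) measurement noise
    Q_lm: additive process noise that keeps P from collapsing to zero
          (and the Kalman gain from vanishing after many updates).
    """
    P = P + Q_lm                            # inflate before the update
    S = H @ P @ H.T + R                     # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)          # Kalman gain
    m = m + K @ (z - H @ m)                 # state correction
    P = (np.eye(3) - K @ H) @ P             # covariance update
    return m, P
```

Running many updates with Q_lm = 0 drives trace(P), and hence the gain, toward zero; with Q_lm = 0.01·I₃ the covariance settles at a nonzero steady state and the landmark keeps responding to new measurements.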

Key insight: Phase reordering — performing the pose correction before landmark updates — significantly improved landmark accuracy because stereo triangulation is highly sensitive to pose errors.

Left: IMU-only trajectory (open-loop drift). Center: landmark map with poses fixed at IMU predictions. Right: full visual-inertial SLAM — blue is IMU prediction, green is SLAM-corrected trajectory, black dots are the landmark map.