Home

Vision‑Guided Robot Arm Control with Intel RealSense

This project turns the Interbotix ReactorX‑200 into a self‑reliant pick–place system. A calibrated RealSense depth camera estimates block poses; a dual‑path inverse kinematics (IK) stack computes feasible grasps; and a ROS 2 execution loop carries out smooth, repeatable motion. The design goal wasn’t merely “move once,” but operate continuously under ordinary lab messiness—lighting changes, minor camera bumps, and imperfect block placement—without human corrections.

Video Demo: Click to Grasp & Place

Video Demo: Teach & Play

System Overview

The software stack has three layers:
(1) Perception (color segmentation + depth, AprilTag‑based extrinsic calibration, homography for a bird’s‑eye view)
(2) Planning (algebraic IK for speed and a least‑squares IK fallback within joint limits)
(3) Execution (teach‑and‑repeat primitives plus guarded motion, all in ROS 2).

The arm publishes/consumes standard messages (/rx200/joint_states, /camera/*, vision_msgs) and re‑checks detections after every sequence to stay robust to drops or bumps.

Camera Calibration

Intrinsics: RealSense factory intrinsics are used for accuracy; checkerboard calibration was performed as a cross‑check.
Extrinsics (SolvePnP): AprilTags placed at 8 known world locations allow automatic camera‑to‑world calibration at startup, making the system robust to small camera bumps.
Pixel→World: Depth‑back‑projection (K⁻¹) recovers camera‑frame points; applying the extrinsic transform yields world poses. A planar homography rectifies to a top‑down view for easier reasoning and grid overlays.

Calibrated Camera View with Grid Overlaid. — Calibrated Camera View with Grid Overlaid

Block Detection

Blocks are segmented by color: HSV ranges for blue/green/yellow/orange/violet, and YCrCb thresholds for red (more robust to lighting drift). Post‑processing uses morphological filtering, contour extraction, and minAreaRect to return centroid, size, and rotation. I gate detections by area and mask out the arm’s workspace buffer to reduce false positives. Error rises at the far edge of the board (larger depth variance), so the planner favors nearer picks first when possible.

Forward Kinematics

Modeled the 5‑DOF RX200 using the Modified Denavit–Hartenberg convention, deriving transforms from base to end‑effector. Validation involved driving to test poses and comparing FK‑predicted end‑effector coordinates against measured joint angles; the results matched closely, confirming the geometric model.

Robot with DH Frames, Rotations, and Measurements. — Robot with DH Frames, Rotations, and Measurements

Table of DH Parameters. — Table of DH Parameters

Inverse Kinematics

Algebraic IK (primary): Closed‑form relations from the end‑effector transform yield joint solutions quickly, with explicit handling of multiple branches (base ±180° and two 3R elbow configurations) and joint‑limit filtering.
Least‑Squares IK (fallback): A numerical solver minimizes position error p_FK(θ) → p_target subject to joint bounds. Average positional error ≈ 0.5 cm over 10 trials; runtime ≈ 25 ms vs ≈ 0.13 ms for algebraic IK. I use it only when closed‑form hits limits or singularities.
Singularities: I detect and avoid wrist‑aligned cases (e.g., gripper over the base with loss of a rotational DOF) by preferring alternative branches or slightly offset targets.

Teach & Play

Teach‑and‑repeat baseline built by manually guiding the arm through a full block‑swap cycle, recording joint angles via ROS 2 bags, and replaying them as a repeatable program. This produced smooth trajectories and served as a fallback behavior while perception matured.

Results

Teach‑and‑Repeat: Completed 10 continuous swap cycles with smooth, repeatable joint traces.
IK Accuracy: Algebraic IK round‑trip (FK→IK) recovered original joint angles nearly identically; LS‑IK achieved ≈ 0.5 cm mean position error over 10 trials.
Speed: Algebraic IK ≈ 0.00013 s/solve; LS‑IK ≈ 0.025 s/solve.

Robot Completing for Highest Block Stack

Min Kyu Kim