Synthetic Data Generation
Introduction
Training AI models for robotics requires massive labeled datasets—millions of images with bounding boxes, segmentation masks, or depth maps. Collecting and labeling real-world data is expensive and time-consuming.
Synthetic data generated in simulation solves this problem: create photorealistic scenes in Isaac Sim, randomize parameters (lighting, textures, object positions), and automatically generate labels. Train models on synthetic data, then deploy to real robots.
This section covers synthetic data workflows, domain randomization techniques, and sim-to-real transfer strategies.
Why Synthetic Data?
Real-World Data Challenges
Manual Labeling is Slow:
1 person × 8 hours/day × 100 images/hour = 800 labeled images/day
Need 100,000 images → 125 days of work
Expensive:
- Annotation services: $0.10 - $1.00 per image
- 100,000 images = $10,000 - $100,000
Limited Diversity:
- Hard to capture rare scenarios (fire, flooding)
- Expensive to stage variations (100 lighting conditions)
Synthetic Data Advantages
Automatic Labeling:
Isaac Sim: 1,000 images/hour with perfect labels (bounding boxes, segmentation, depth)
Need 100,000 images → 100 hours of GPU time (~$50 on cloud)
Perfect Ground Truth:
- Exact 3D positions
- Pixel-perfect segmentation
- Occlusion-aware labels
Unlimited Diversity:
- Randomize lighting, weather, backgrounds
- Generate rare scenarios easily
- Test edge cases systematically
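The throughput and cost figures above are simple arithmetic; a quick back-of-envelope script (using the illustrative rates from this section) makes the comparison concrete:

```python
# Back-of-envelope comparison of manual labeling vs. synthetic generation.
# Rates and prices are the illustrative figures from the text above.

def manual_labeling(images, rate_per_hour=100, hours_per_day=8, cost_per_image=0.10):
    """Return (days of work, annotation cost) for manual labeling."""
    days = images / (rate_per_hour * hours_per_day)
    return days, images * cost_per_image

def synthetic_generation(images, rate_per_hour=1000, gpu_cost_per_hour=0.50):
    """Return (GPU-hours, cloud cost) for synthetic generation."""
    hours = images / rate_per_hour
    return hours, hours * gpu_cost_per_hour

days, cost = manual_labeling(100_000)
print(f"Manual:    {days:.0f} days, ${cost:,.0f} (at $0.10/image)")

hours, cost = synthetic_generation(100_000)
print(f"Synthetic: {hours:.0f} GPU-hours, ~${cost:,.0f}")
```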
Domain Randomization
Domain randomization varies simulation parameters to create diverse training data that generalizes to real-world variations.
What to Randomize?
1. Lighting
- Light intensity (100 - 5000 lux)
- Color temperature (warm/cool)
- Number and position of lights
- HDR environment maps
2. Camera Parameters
- Exposure, gain, white balance
- Lens distortion, chromatic aberration
- Motion blur (for dynamic scenes)
- Sensor noise
3. Object Properties
- Textures and materials
- Colors (hue, saturation, brightness)
- Positions and orientations
- Scales (within realistic bounds)
4. Scene Composition
- Background clutter
- Distractors (irrelevant objects)
- Number of target objects
- Occlusions
Domain Randomization Example
import random
import omni.kit.commands
from pxr import Sdf

def randomize_scene():
    # Randomize lighting via the ChangeProperty command
    intensity = random.uniform(500, 3000)
    omni.kit.commands.execute(
        "ChangeProperty",
        prop_path=Sdf.Path("/World/DomeLight.inputs:intensity"),
        value=intensity,
        prev=None,
    )

    # Randomize object positions (set_position is a placeholder helper)
    for obj in ["/World/Cup", "/World/Book", "/World/Pen"]:
        x = random.uniform(-1.0, 1.0)
        y = random.uniform(-0.5, 0.5)
        z = 1.0  # On table
        set_position(obj, [x, y, z])

    # Randomize textures (apply_material is a placeholder helper)
    materials = ["Wood", "Metal", "Plastic", "Ceramic"]
    apply_material("/World/Table", random.choice(materials))

    # Randomize camera exposure (set_camera_exposure is a placeholder helper)
    exposure = random.uniform(-2, 2)  # EV stops
    set_camera_exposure("/World/Humanoid/Camera", exposure)
Run this function 100,000 times → 100,000 diverse training images.
Synthetic Data Pipeline in Isaac Sim
Step 1: Create Base Scene
Build realistic environment:
import numpy as np
from omni.isaac.core import World
from omni.isaac.core.objects import DynamicCuboid, VisualCuboid

world = World()

# Add table (size is a scalar edge length; scale stretches each axis)
table = VisualCuboid(
    prim_path="/World/Table",
    size=1.0,
    scale=np.array([1.0, 0.6, 0.9]),
    position=np.array([1.0, 0.0, 0.45]),
    color=np.array([0.7, 0.5, 0.3]),  # Wood color
)

# Add a target object
cup = DynamicCuboid(
    prim_path="/World/Cup",
    size=1.0,
    scale=np.array([0.08, 0.08, 0.12]),
    position=np.array([1.0, 0.2, 1.0]),
    color=np.array([1.0, 1.0, 1.0]),  # White
)
Step 2: Configure Replicator
Isaac Sim includes Omniverse Replicator for data generation:
import omni.replicator.core as rep
# Define camera
camera = rep.create.camera(position=(2, 0, 1.5), look_at="/World/Table")
# Configure rendering
render_product = rep.create.render_product(camera, (1280, 720))
# Enable annotators (labels)
rgb = rep.AnnotatorRegistry.get_annotator("rgb")
bbox_2d = rep.AnnotatorRegistry.get_annotator("bounding_box_2d_tight")
semantic_seg = rep.AnnotatorRegistry.get_annotator("semantic_segmentation")
# Attach to camera
rgb.attach(render_product)
bbox_2d.attach(render_product)
semantic_seg.attach(render_product)
Step 3: Randomization Graph
Define randomization logic:
def randomize():
    with rep.new_layer():
        # Randomize light intensity
        lights = rep.get.prims(path_pattern="/World/.*Light")
        with lights:
            rep.modify.attribute("inputs:intensity", rep.distribution.uniform(1000, 5000))

        # Randomize object positions and orientations
        objects = rep.get.prims(path_pattern="/World/Cup|/World/Book")
        with objects:
            rep.modify.pose(
                position=rep.distribution.uniform((-1, -0.5, 1.0), (1, 0.5, 1.0)),
                rotation=rep.distribution.uniform((0, 0, 0), (0, 0, 360)),
            )

        # Randomize materials
        with objects:
            rep.randomizer.materials(
                materials=rep.get.prims(path_pattern="/World/Looks/.*")
            )
    return True

rep.randomizer.register(randomize)
Step 4: Generate Data
Run data generation loop:
# Generate 10,000 frames
rep.orchestrator.run_until_complete(num_frames=10000)
# Data automatically saved to:
# - RGB images: /_isaac_sim/rgb/*.png
# - Bounding boxes: /_isaac_sim/bounding_box_2d_tight/*.json
# - Segmentation: /_isaac_sim/semantic_segmentation/*.png
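Once generation finishes, the label files can be consumed with plain Python. The sketch below assumes the writer emits one JSON file per frame containing a list of box records; the exact filenames and schema depend on how the Replicator writer is configured:

```python
import json
from pathlib import Path

def load_frame_annotations(label_dir):
    """Load per-frame bounding-box JSON files produced by the writer.

    Assumes one JSON file per frame; keys are the file stems
    (e.g. 'frame_0000'), values are the parsed JSON contents.
    """
    frames = {}
    for path in sorted(Path(label_dir).glob("*.json")):
        with open(path) as f:
            frames[path.stem] = json.load(f)
    return frames
```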
Annotation Formats
Bounding Box (COCO Format)
{
  "images": [
    {"id": 1, "file_name": "frame_0001.png", "width": 1280, "height": 720}
  ],
  "annotations": [
    {
      "id": 1,
      "image_id": 1,
      "category_id": 1,
      "bbox": [320, 180, 150, 200],
      "area": 30000,
      "iscrowd": 0
    }
  ],
  "categories": [
    {"id": 1, "name": "cup"},
    {"id": 2, "name": "book"}
  ]
}
Here bbox is [x, y, width, height] in pixels, measured from the top-left corner.
Compatible with YOLOv8, Faster R-CNN, etc.
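Note that YOLO's native label format differs from COCO: boxes are stored as normalized center coordinates rather than pixel top-left corners (Ultralytics handles the conversion when given a COCO-style dataset, but it is useful to know). A minimal converter:

```python
def coco_to_yolo(bbox, img_w, img_h):
    """Convert a COCO [x, y, width, height] box (pixels, top-left origin)
    to YOLO format [cx, cy, w, h], normalized to the image size."""
    x, y, w, h = bbox
    return [(x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h]

# The example box above: [320, 180, 150, 200] in a 1280x720 image
print(coco_to_yolo([320, 180, 150, 200], 1280, 720))
```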
Semantic Segmentation
# segmentation.png (grayscale image)
# Pixel value = class ID
# 0: background
# 1: cup
# 2: book
# 3: table
Convert to color-coded visualization:
import numpy as np
import cv2

seg_map = cv2.imread("segmentation.png", cv2.IMREAD_GRAYSCALE)

color_map = {
    0: [0, 0, 0],      # Background: black
    1: [255, 0, 0],    # Cup: red
    2: [0, 255, 0],    # Book: green
    3: [0, 0, 255],    # Table: blue
}

colored_seg = np.zeros((seg_map.shape[0], seg_map.shape[1], 3), dtype=np.uint8)
for class_id, color in color_map.items():
    colored_seg[seg_map == class_id] = color

cv2.imwrite("segmentation_colored.png", colored_seg)
Depth Maps
# depth.npy (NumPy array)
depth = np.load("depth.npy") # Shape: (720, 1280), values in meters
Visualize as grayscale image:
depth_normalized = (depth - depth.min()) / (depth.max() - depth.min())
depth_vis = (depth_normalized * 255).astype(np.uint8)
cv2.imwrite("depth_vis.png", depth_vis)
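Because synthetic depth is metric and noise-free, it can also be back-projected into a 3D point cloud when the camera intrinsics are known. A minimal pinhole-model sketch (fx, fy, cx, cy are assumed here; Isaac Sim can export camera parameters alongside the depth maps):

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth map (meters) to an (H*W, 3) point cloud
    using a pinhole camera model with intrinsics fx, fy, cx, cy."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)
```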
Training on Synthetic Data
Object Detection (YOLOv8)
# Install Ultralytics
pip install ultralytics
# Train on synthetic COCO dataset
yolo detect train \
data=/path/to/coco_synthetic.yaml \
model=yolov8n.pt \
epochs=100 \
imgsz=640
coco_synthetic.yaml:
path: /isaac_sim_output
train: images/train
val: images/val
names:
0: cup
1: book
2: pen
After training, deploy to Isaac ROS for real-time inference.
Depth Estimation
Train depth prediction model:
# Pseudo-code for depth training
import torch.nn as nn
from torch.optim import Adam

model = DepthNet()  # any encoder-decoder depth network
optimizer = Adam(model.parameters())

for epoch in range(num_epochs):
    for rgb, depth_gt in dataloader:
        depth_pred = model(rgb)                   # Predict depth from RGB
        loss = nn.L1Loss()(depth_pred, depth_gt)  # Loss: L1 distance
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
Synthetic depth maps provide perfect ground truth.
Sim-to-Real Transfer
The Reality Gap
Models trained purely on synthetic data may fail on real data due to:
Visual Differences:
- Simulated textures don't match real materials exactly
- Lighting models approximate real-world illumination
- Camera sensor noise differs
Physics Differences:
- Contact dynamics simplified
- Object behaviors (deformation, friction) approximated
Bridging the Gap
1. Domain Randomization (covered above)
- Vary synthetic data parameters widely
- Model learns to ignore irrelevant variations
2. Domain Adaptation
- Fine-tune on small real-world dataset
- Use transfer learning (pre-train on synthetic, fine-tune on real)
3. Sensor Noise Modeling
- Add realistic camera noise to synthetic images
- Match ISO, exposure, compression artifacts
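As a minimal sketch of the idea, zero-mean Gaussian read noise can be added to synthetic renders before training; real pipelines also model shot noise, ISO gain, and JPEG compression artifacts, which are omitted here:

```python
import numpy as np

def add_sensor_noise(img, gaussian_std=5.0, seed=None):
    """Add zero-mean Gaussian read noise to a uint8 RGB image and
    re-quantize, as a crude stand-in for real camera sensor noise."""
    rng = np.random.default_rng(seed)
    noisy = img.astype(np.float32) + rng.normal(0, gaussian_std, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)
```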
4. Progressive Training
Phase 1: Train on 100% synthetic data (100k images)
Phase 2: Fine-tune on 10% real data (10k images)
Phase 3: Validate on held-out real data
5. Visual Style Transfer
- Use CycleGAN to make synthetic images look more realistic
- Or composite synthetic objects onto real-image backgrounds
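The compositing approach can be sketched with the segmentation mask from the synthetic render; `composite` here is an illustrative helper, not a Replicator API:

```python
import numpy as np

def composite(synthetic_rgb, real_background, mask):
    """Paste synthetic foreground pixels (mask == 1) onto a real
    background image. Both images are HxWx3 uint8; mask is HxW."""
    out = real_background.copy()
    fg = mask.astype(bool)
    out[fg] = synthetic_rgb[fg]
    return out
```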
Validation Strategy
Simulation Testing:
Synthetic validation set: 10k images
Measure: mAP@0.5 = 0.92 (great!)
Real-World Testing:
Real validation set: 1k images
Measure: mAP@0.5 = 0.78 (good, acceptable gap)
If gap > 20%, revisit domain randomization or add real data.
Best Practices
1. Match Real-World Distribution
Ensure synthetic data reflects real deployment:
# If the real robot operates in warehouses:
environments = ["warehouse_1", "warehouse_2", "warehouse_3"]

# If the real robot sees 80% boxes, 20% people:
object_distribution = {
    "box": 0.8,
    "person": 0.2,
}
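Given such a target distribution, training scenes can be populated by weighted sampling. A minimal sketch using only the standard library (the distribution is redefined here so the snippet is self-contained):

```python
import random

object_distribution = {"box": 0.8, "person": 0.2}

def sample_objects(n, distribution, seed=None):
    """Draw n object classes according to the target distribution."""
    rng = random.Random(seed)
    classes = list(distribution)
    weights = list(distribution.values())
    return rng.choices(classes, weights=weights, k=n)

samples = sample_objects(10_000, object_distribution, seed=0)
counts = {c: samples.count(c) for c in object_distribution}
print(counts)  # roughly 8000 boxes, 2000 people
```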
2. Balance Randomization
Too little: model overfits to specific conditions.
Too much: model can't learn meaningful patterns.
Start conservative, increase randomization if real-world performance is poor.
3. Validate Early
Test on real data after every 10k synthetic images:
Iteration 1: 10k synthetic → Test on 100 real → mAP = 0.60
Iteration 2: 20k synthetic → Test on 100 real → mAP = 0.70
Iteration 3: 30k synthetic → Test on 100 real → mAP = 0.75 (diminishing returns)
Stop when adding more data doesn't improve performance.
4. Use Pre-trained Models
Don't train from scratch:
import torchvision

# Start with a pre-trained detector (COCO detection weights,
# ImageNet pre-trained backbone)
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

# Fine-tune on synthetic data (fine_tune is a placeholder helper)
fine_tune(model, synthetic_dataset)
Pre-training on real images (ImageNet) helps bridge the sim-to-real gap.
Summary
Synthetic data generation in Isaac Sim enables:
Scalable Training Data:
- Generate 100k+ labeled images in hours
- Perfect annotations (bounding boxes, segmentation, depth)
- Unlimited diversity via domain randomization
Domain Randomization:
- Vary lighting, textures, positions, camera parameters
- Model learns to generalize across variations
- Improves real-world performance
Sim-to-Real Transfer:
- Domain randomization reduces reality gap
- Fine-tune on small real-world datasets
- Validate early and often on real data
Integration:
- Train models on synthetic data
- Deploy with Isaac ROS for GPU-accelerated inference
- Use in Nav2 for perception-based navigation
Synthetic data democratizes AI robotics: you no longer need massive hand-labeled datasets to train production-quality models.
Next Steps: You've completed Module 3! Continue to Module 4: Vision-Language-Action to learn how humanoids understand and execute natural language commands.
References
NVIDIA. (2024). Omniverse Replicator Documentation. https://docs.omniverse.nvidia.com/extensions/latest/ext_replicator.html
Tobin, J., et al. (2017). Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World. IEEE/RSJ IROS.