
ROBOTCORE Perception

Speed up your ROS robotics perception pipelines

ROBOTCORE Perception is an optimized robotic perception stack that leverages hardware acceleration to speed up your perception computations. API-compatible with the ROS 2 perception stack, ROBOTCORE Perception delivers high-performance, real-time, and reliable perception to your robots.

Get ROBOTCORE® Perception
Perception consulting
ROBOTCORE PERCEPTION

Accelerated robotics perception

Data obtained from a robot's sensors, such as cameras and LIDARs, is typically fed into the perception layer, which turns it into something useful for decision making and for planning physical actions. Perception senses static and dynamic objects and builds a reliable, detailed representation of the robot's environment using computer vision and machine learning techniques. The perception layer in a robot is thereby responsible for object detection, segmentation, and tracking. Traditionally, a perception pipeline starts with image pre-processing, followed by a region-of-interest detector and then a classifier that outputs detected objects.
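The traditional pipeline described above can be sketched as a chain of composable stages. The following is a minimal illustration with stub implementations; the function names and logic are assumptions for the sketch, not the ROBOTCORE Perception API:

```python
# Minimal sketch of a pre-processing -> ROI-detection -> classification
# pipeline. Stage names mirror the text; implementations are stubs.

def rectify(image):
    # Pre-processing: undo lens distortion (stubbed as identity here).
    return image

def resize(image, factor=2):
    # Pre-processing: downsample by keeping every `factor`-th row/column.
    return [row[::factor] for row in image[::factor]]

def detect_rois(image):
    # Region-of-interest detector (e.g. Harris corners), stubbed to
    # return pixel coordinates above a brightness threshold.
    return [(r, c)
            for r, row in enumerate(image)
            for c, value in enumerate(row)
            if value > 200]

def classify(image, rois):
    # Classifier stage: label each ROI (stubbed with a constant label).
    return [{"roi": roi, "label": "corner"} for roi in rois]

def perception_pipeline(image):
    rectified = rectify(image)
    small = resize(rectified)
    rois = detect_rois(small)
    return classify(small, rois)

image = [[0, 0, 255, 0],
         [0, 0, 0, 0],
         [255, 0, 0, 0],
         [0, 0, 0, 0]]
detections = perception_pipeline(image)
```

In a real ROS 2 deployment each stage runs as its own Node, and the latency of a single pass through the chain bounds the pipeline's throughput.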

ROBOTCORE Perception is a hardware-accelerated perception stack, delivered as either source code or IP, that reduces the runtime latency of your perception pipelines and increases their throughput. It delivers a perception speedup while remaining API-compatible with the ROS 2 perception stack, which simplifies its integration into existing robots, so that you don't spend time reinventing the wheel and re-developing what already works.
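Because the stack is API-compatible with the ROS 2 perception stack, adopting it in an existing `image_pipeline` graph is, in principle, a matter of swapping component names in a launch file. The sketch below composes an accelerated Rectify-Resize graph the same way `image_proc` components are composed; the `robotcore_perception` package and plugin names are illustrative assumptions, not the documented product API:

```python
# launch/accelerated_pipeline.launch.py -- sketch of composing an
# accelerated Rectify->Resize graph with standard ROS 2 launch tooling.
# Package and plugin names below are assumptions for illustration.
from launch import LaunchDescription
from launch_ros.actions import ComposableNodeContainer
from launch_ros.descriptions import ComposableNode


def generate_launch_description():
    container = ComposableNodeContainer(
        name='perception_container',
        namespace='',
        package='rclcpp_components',
        executable='component_container',
        composable_node_descriptions=[
            # Drop-in replacements for image_proc::RectifyNode and
            # image_proc::ResizeNode (names assumed for illustration).
            ComposableNode(
                package='robotcore_perception',
                plugin='robotcore::perception::RectifyNode',
                remappings=[('image', '/camera/image_raw')],
            ),
            ComposableNode(
                package='robotcore_perception',
                plugin='robotcore::perception::ResizeNode',
                remappings=[('image', 'image_rect')],
            ),
        ],
    )
    return LaunchDescription([container])
```

Composing both stages in one container keeps intra-process communication available, which matters once message-passing overhead becomes a significant share of pipeline latency.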

Benchmarks

Default ROS
perception

0.7 Hz per image for a 3-Node perception pipeline on embedded hardware

A simple Rectify-Resize-Harris ROS perception pipeline performs pre-processing (Rectify and Resize) followed by region-of-interest detection (Harris corner detector). This pipeline, consisting of only 3 ROS Nodes, takes on average 1266.32 milliseconds per pass, which amounts to less than 1 Hz when running on an otherwise idle embedded quad-core Arm Cortex-A53.

ROBOTCORE
Perception

4.5x speedup in simple ROS perception pipelines

Faster robots require faster perception. ROBOTCORE Perception delivers exactly this, upgrading perception pipelines with a 4.5x speedup that improves both pipeline runtime and throughput on embedded hardware. More complex perception graphs can leverage acceleration further, achieving speedups of 10x and above.
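The headline figures follow from simple arithmetic on the benchmark numbers above, as this back-of-the-envelope check shows:

```python
# Back-of-the-envelope check of the benchmark figures above.
baseline_latency_ms = 1266.32          # 3-Node pipeline, single pass
baseline_hz = 1000.0 / baseline_latency_ms
print(round(baseline_hz, 2))           # below 1 Hz, as stated

speedup = 4.5                          # ROBOTCORE Perception speedup
accelerated_latency_ms = baseline_latency_ms / speedup
accelerated_hz = 1000.0 / accelerated_latency_ms
print(round(accelerated_latency_ms, 1))
print(round(accelerated_hz, 2))
```

The 4.5x speedup takes the same pipeline from roughly 0.8 Hz to about 3.5 Hz on the same embedded board.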

Towards more
capable robots

Moving faster (or with more dexterity) requires faster perception computations. Edge perception is key to navigating changing environments, which are the norm in the human world. The robot below generates information from its sensors that needs to be processed at the edge (directly on the robot) for smooth behavior, reducing latency from the sensors all the way to the actuators. By leveraging hardware acceleration, the robot's perception extracts information from these sensor readings, which is then used to plan actions in the order of tenths of milliseconds. All of this happens on the edge, responding to environmental changes.

ROBOTCORE Perception delivers faster perception capabilities to ROS computational graphs, so that your robots can move faster and better.

Co-developed
with the best

ROBOTCORE Perception is the result of joint research and co-development with computer system architects from Harvard University.

Read the paper

Developer-ready documentation and support

ROBOTCORE Perception is built by seasoned ROS developers for ROS development. It includes documentation, examples, reference designs, and optional support at various levels.

Ask about support levels

Benchmarks


ROS 2 PERCEPTION NODES

(Measurements discard ROS 2 message-passing infrastructure overhead and host-device (GPU or FPGA) data transfer overheads)

Resize - speedup

2.61x

Resize - kernel runtime latency (ms) (ROBOTCORE Perception running on an AMD KV260, NVIDIA Isaac ROS running on a Jetson Nano 2GB. Measurements present the kernel runtime in milliseconds (ms) and discard ROS 2 message-passing infrastructure overhead and host-device (GPU or FPGA) data transfer overhead)

Resize - % resource consumption (LUT, FF, DSP, BRAM)
(Considering an AMD KV260 board for resource estimation and an image with 4K resolution as input. Additional details available in the ROBOTCORE Perception documentation.)

Rectify - speedup

7.34x

Rectify - kernel runtime latency (ms)

Rectify - % resource consumption (LUT, FF, DSP, BRAM)
(Considering an AMD KV260 board for resource estimation and an image with 4K resolution as input. Additional details available in the ROBOTCORE Perception documentation.)

Harris - speedup

30.27x

Harris - kernel runtime latency (ms)

Harris - % resource consumption (LUT, FF, DSP, BRAM)
(Considering an AMD KV260 board for resource estimation and an image with 4K resolution as input. Additional details available in the ROBOTCORE Perception documentation.)

Histogram of Oriented Gradients - speedup

509.52x

Histogram of Oriented Gradients - kernel runtime latency (ms)

Histogram of Oriented Gradients - % resource consumption (LUT, FF, DSP, BRAM)
(Considering an AMD KV260 board for resource estimation and an image with 4K resolution as input. Additional details available in the ROBOTCORE Perception documentation.)

Canny Edge Tracing - speedup

3.26x

Canny Edge Tracing - kernel runtime latency (ms)

Canny Edge Tracing - % resource consumption (LUT, FF, DSP, BRAM)
(Considering an AMD KV260 board for resource estimation and an image with 4K resolution as input. Additional details available in the ROBOTCORE Perception documentation.)

Fast Corner Detection - speedup

8.43x

Fast Corner Detection - kernel runtime latency (ms)

Fast Corner Detection - % resource consumption (LUT, FF, DSP, BRAM)
(Considering an AMD KV260 board for resource estimation and an image with 4K resolution as input. Additional details available in the ROBOTCORE Perception documentation.)

Gaussian Difference - speedup

11.94x

Gaussian Difference - kernel runtime latency (ms)

Gaussian Difference - % resource consumption (LUT, FF, DSP, BRAM)
(Considering an AMD KV260 board for resource estimation and an image with 4K resolution as input. Additional details available in the ROBOTCORE Perception documentation.)

Bilateral Filter - speedup

9.33x

Bilateral Filter - kernel runtime latency (ms)

Bilateral Filter - % resource consumption (LUT, FF, DSP, BRAM)
(Considering an AMD KV260 board for resource estimation and an image with 4K resolution as input. Additional details available in the ROBOTCORE Perception documentation.)

Stereo LBM - speedup

5.19x

Stereo LBM - kernel runtime latency (ms)

Stereo LBM - % resource consumption (LUT, FF, DSP, BRAM)
(Considering an AMD KV260 board for resource estimation and an image with 4K resolution as input. Additional details available in the ROBOTCORE Perception documentation.)

ROS 2 PERCEPTION GRAPHS

2-Node pre-processing perception graph latency (ms)
(Simple graph with 2 Nodes (Rectify-Resize) demonstrating perception pre-processing with the image_pipeline ROS 2 package. AMD's KV260 and NVIDIA's Jetson Nano 2GB boards are used for benchmarking, the former featuring a quad-core Arm Cortex-A53 and the latter a quad-core Arm Cortex-A57. The source code used for the benchmark is available in the perception_2nodes ROS 2 package)

2-Node pre-processing perception graph performance-per-watt (Hz/W)

Graph speedup - 2-Node pre-processing perception graph latency

3.6x

Performance-per-watt improvement - 2-Node pre-processing perception graph

6x

3-Node pre-processing and region of interest detector perception graph latency (ms)
(3-Node graph (Rectify-Resize-Harris) demonstrating perception pre-processing and region-of-interest detection with the image_pipeline ROS 2 package. AMD's KV260, featuring a quad-core Arm Cortex-A53, is used for benchmarking. The source code used for the benchmark is available in the perception_3nodes ROS 2 package)

Graph speedup - 3-Node pre-processing and region of interest detector perception graph

4.5x

Do you have any questions?

Get in touch with our team.

Let's talk