ROBOTCORE™ helps build custom compute architectures for robots, or robot cores, that make robots faster, more deterministic and power-efficient. Simply put, it provides a development, build and deployment experience for creating robot hardware and hardware accelerators similar to the standard, non-accelerated ROS development flow.
Get ROBOTCORE license
Traditional software development in robotics is about building dataflows with ROS computational graphs. These dataflows go from sensors to compute technologies, all the way down to actuators and back, representing the "brain" of the robot. Generally, ROS computational graphs run on the CPUs of a given robot. But CPUs have fixed hardware, with pre-defined memory architectures and constraints that limit performance. Sparked by the decline of Moore's Law and Dennard scaling, specialized computing units capable of hardware acceleration have proven to be the answer for achieving higher performance in robotics.
ROBOTCORE™ makes it easy to leverage hardware acceleration in a ROS-centric manner and to build custom compute architectures for robots, or "robot cores".
With robot cores, roboticists can tune one or several properties of their computational graphs at once (e.g., speed, determinism, power consumption), optimizing the amount of hardware resources and, as a consequence, the performance of the accelerated dataflow.
ROBOTCORE™ deals with vendors' proprietary libraries for hardware acceleration in robotics. It helps accelerate computations, increase performance and abstract away the complexity of bringing your ROS computational graphs to your favourite silicon architecture, all while delivering the familiar ROS development flow.
ROBOTCORE Perception is a ROS 2-API compatible optimized perception stack that leverages hardware acceleration to provide a speedup in your perception computations.
ROBOTCORE Cloud allows roboticists to launch parts of their ROS 2 computational graphs into the cloud while addressing interoperability and scalability issues. Supports the Azure, GCP and AWS cloud providers.
ROBOTCORE™ extends the ROS and ROS 2 build systems to allow roboticists to generate acceleration kernels in the same way they generate CPU libraries. Support for legacy ROS systems and extensions to other middlewares are also possible.
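As a sketch of what this build-system integration can look like, the open-source Kria Robotics Stack exposes CMake macros such as `vitis_acceleration_kernel` via the `ament_vitis` package, so a kernel is declared next to the CPU node in the same `CMakeLists.txt` and built with the same `colcon build` invocation. The snippet below follows those conventions for illustration only; the macro names and guard variables are assumptions and may differ from ROBOTCORE's actual interface:

```cmake
cmake_minimum_required(VERSION 3.8)
project(accelerated_vadd)

find_package(ament_cmake REQUIRED)
find_package(rclcpp REQUIRED)
# ament_vitis (Kria Robotics Stack) provides acceleration-kernel macros.
find_package(ament_vitis)

# Regular CPU node, built the usual ROS 2 way.
add_executable(vadd_publisher src/vadd_publisher.cpp)
ament_target_dependencies(vadd_publisher rclcpp)

if(ROS_ACCELERATION)
  # FPGA acceleration kernel, generated by the same build invocation.
  # NAME/FILE/CONFIG values here are hypothetical placeholders.
  vitis_acceleration_kernel(
    NAME vadd
    FILE src/vadd.cpp
    CONFIG src/kv260.cfg
    INCLUDE include
    TYPE hw
    PACKAGE
  )
endif()

install(TARGETS vadd_publisher DESTINATION lib/${PROJECT_NAME})
ament_package()
```

With this layout, `colcon build` produces the CPU library as usual, and the kernel bitstream is generated only when acceleration is enabled at build time, keeping the non-accelerated flow untouched.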
ROBOTCORE™ is built by seasoned ROS developers, for ROS development. It includes documentation, examples, reference designs and optional support at various levels.
Ask about support levels
ROBOTCORE™ intra-FPGA ROS 2 communication queue speedup: 1.5x (measured with the perception_2nodes ROS 2 package, between subsequent Nodes, using an AMD KV260)
Performance-per-watt improvement (Hz/W): 3.69x (measured during iterations 10-100 using faster_doublevadd_publisher, with an AMD KV260)
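Performance-per-watt is simply the sustained publishing rate divided by the average power draw, and the improvement factor is the ratio between the accelerated and baseline figures. A minimal sketch of that arithmetic, using invented numbers chosen only to land near the 3.69x headline (they are not the measured values):

```python
def perf_per_watt(rate_hz: float, power_w: float) -> float:
    """Throughput per unit of power, in Hz/W."""
    return rate_hz / power_w

# Hypothetical numbers, for illustration only.
cpu_baseline = perf_per_watt(rate_hz=10.0, power_w=5.0)      # 2.0 Hz/W
fpga_accelerated = perf_per_watt(rate_hz=59.0, power_w=8.0)  # 7.375 Hz/W

improvement = fpga_accelerated / cpu_baseline
print(f"Performance-per-watt improvement: {improvement:.2f}x")  # 3.69x
```

Note that a design can be slower in absolute terms and still win on Hz/W if its power draw drops faster than its throughput.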
Kernel benchmarks (ROBOTCORE Perception running on an AMD KV260, NVIDIA Isaac ROS running on a Jetson Nano 2GB; measurements report kernel runtime in milliseconds (ms) and discard ROS 2 message-passing infrastructure overhead and host-device (GPU or FPGA) data-transfer overhead):
Resize - kernel runtime latency (ms)
Rectify - speedup: 7.34x; kernel runtime latency (ms)
Harris - speedup: 30.27x; kernel runtime latency (ms)
Histogram of Oriented Gradients - speedup: 509.52x; kernel runtime latency (ms)
(requires ROBOTCORE™ Perception)
Graph speedup - 2-Node pre-processing perception graph latency: 3.6x
Performance-per-watt improvement (Hz/W) - 2-Node pre-processing perception graph: 6x
(Simple graph with 2 Nodes (Rectify-Resize) demonstrating perception pre-processing with the image_pipeline ROS 2 package. AMD's KV260 and NVIDIA's Jetson Nano 2GB boards are used for benchmarking, the former featuring a quad-core Arm Cortex-A53 and the latter a quad-core Arm Cortex-A57. Source code used for the benchmark is available in the perception_2nodes ROS 2 package.)
Graph speedup - 3-Node pre-processing and region of interest detector perception graph latency (ms): 4.5x
(3-Node graph (Rectify-Resize-Harris) demonstrating perception pre-processing and region of interest detection with the image_pipeline ROS 2 package. AMD's KV260, featuring a quad-core Arm Cortex-A53, is used for benchmarking. Source code used for the benchmark is available in the perception_3nodes ROS 2 package.)
(requires ROBOTCORE™ Transform)
tf tree subscription latency (µs), 2 subscribers
(Measured the worst-case subscription latency in a graph with 2 tf tree subscribers, using AMD's KV260 board, NVIDIA's Jetson Nano 2GB and Microchip's PolarFire Icicle Kit. AMD's KV260 board was used for benchmarking the default CPU tf2 baseline.)
tf tree subscription latency (µs), 20-100 subscribers
(Measured the worst-case subscription latency in a graph with multiple tf tree subscribers. AMD's KV260 board was used for benchmarking all results.)
(requires ROBOTCORE™ Cloud)
ORB-SLAM2 Simultaneous Localization and Mapping (SLAM) Node runtime (s)
(Measured the mean per-frame runtime of the ORB-SLAM2 Node in two scenarios: 1) Default ROS 2 - running on the edge on an Intel NUC with an Intel® Pentium® Silver J5005 CPU @ 1.50 GHz with 2 cores enabled and a 10 Mbps network connection, and 2) ROBOTCORE™ Cloud - running in the cloud on a provisioned 36-core cloud computer.)
Node speedup - ORB-SLAM2 SLAM Node runtime: 4x
(requires ROBOTCORE™ Cloud)
Grasp Planning with Dex-Net compute runtime (s)
(Measured the mean compute runtime over 10 trials using a Dex-Net Grasp Quality Convolutional Neural Network to compute grasps from raw RGBD image observations. Two scenarios are considered: 1) Edge - running on an Intel NUC with an Intel® Pentium® Silver J5005 CPU @ 1.50 GHz with 2 cores enabled and a 10 Mbps network connection, and 2) ROBOTCORE™ Cloud - the same edge machine collects raw image observations and sends them to a cloud computer equipped with an NVIDIA Tesla T4 GPU.)
Grasp Planning speedup - Dex-Net computation total runtime (including network): 11.7x
Motion Planning Templates (MPT) compute runtime (s)
(Measured the mean compute runtime while running multi-core motion planners from the Motion Planning Templates (MPT) on reference planning problems from the Open Motion Planning Library (OMPL). Two scenarios are considered: 1) Edge - running on an Intel NUC with an Intel® Pentium® Silver J5005 CPU @ 1.50 GHz with 2 cores enabled and a 10 Mbps network connection, and 2) ROBOTCORE™ Cloud - the same edge machine offloads computations to a 96-core cloud computer.)
Motion planning speedup - Motion Planning Templates (MPT) compute runtime (including network): 28.9x
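The "(including network)" qualifier matters: offloading only pays off when the cloud's compute savings outweigh the time spent moving data over the link. A toy model of that trade-off, with invented numbers rather than the benchmark's measurements:

```python
def offload_runtime(payload_mb: float, bandwidth_mbps: float,
                    cloud_compute_s: float) -> float:
    """Total offloaded runtime: upload time plus remote compute time."""
    transfer_s = (payload_mb * 8) / bandwidth_mbps  # MB -> Mbit over the link
    return transfer_s + cloud_compute_s

# Invented example: a 1 MB planning problem over a 10 Mbps link.
edge_runtime_s = 20.0                               # slow 2-core edge CPU
cloud_runtime_s = offload_runtime(1.0, 10.0, 0.5)   # 0.8 s transfer + 0.5 s compute

print(f"speedup (including network): {edge_runtime_s / cloud_runtime_s:.1f}x")
```

The same model shows when offloading loses: as the payload grows or the link slows, the transfer term dominates and the edge CPU wins despite being far slower at raw compute.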
Get in touch with our team and we'll do our best to get back to you.
Let's talk