Unwrapping the OAK-D's 5 layers
July 29, 2022
We spoke to Brandon Gilles, Co-Founder of Luxonis, about their product: the OAK-D. You can listen to the full episode below, or read along for key takeaways and snippets.
Brandon pulled in lessons from his career at Ubiquiti, where they shook up the networking industry by taking a software-first approach to a hardware-first industry.
The networking industry traditionally relied on ever-faster CPUs to solve all of its routing, switching, and Wi-Fi problems. This led to ever-increasing power consumption.
Ubiquiti saw that these problems typically rely on the same small set of functions, and wondered: what would happen if they baked those functions into silicon?
Two things would increase by orders of magnitude:
- Power efficiency
- Software complexity
The resulting architecture is referred to as a “Network-On-Chip”. Writing low-level software for a CPU requires learning a single architecture. Supporting a Network-On-Chip requires learning dozens of distinct architectures.
Ubiquiti took the gamble and it paid off. The industry standard 1000-watt systems had to compete against Ubiquiti’s 5-watt systems.
Applying Network-On-Chip to Computer Vision
Movidius, much like Ubiquiti, saw the opportunity to reduce power consumption for computer vision dramatically.
“You have things that you just know you're always gonna want, like warp and de-warp, feature extraction, vectorized processing, and neural inference acceleration. All of these things that go together on robotic perception systems.”
Movidius implemented a System-On-Chip architecture to hardware-optimize multiple vision-related functions in their Myriad line of products. The OAK-D leverages the Myriad X to run AI algorithms on board and make high-level, vision-based assessments of the environment.
What makes the OAK-D a great platform?
Brandon breaks the OAK-D’s value proposition down into five layers.
Layer 1: Hardware
This includes the hardware components most people associate with the OAK-D:
- A stereo camera for depth calculation
- A 4K camera in the center for high-quality video
- An Intel® Movidius™ Myriad™ X Vision Processing Unit
Layer 2: Firmware
For the OAK-D, Luxonis’ firmware interfaces with 38 distinct architectures. That is a tough path to go down, and a deterrent for all but the most determined companies. But as Ubiquiti’s Network-On-Chip gains showed, doing this well is what has pushed computer vision at the edge forward.
Layer 3: Software
This layer is where most robotics companies are comfortable developing code to address business applications. The Luxonis team developed several open-source packages optimized for the hardware platform.
- Object Detection: MobileNet, Yolo, EfficientDet, Palm detection.
- Landmark detection: Human pose, hand landmarks, facial landmarks.
- Semantic segmentation: Person segmentation, multiclass segmentation, road segmentation.
- Classification: EfficientNet, TensorFlow classification, fire classification, emotions classification.
- Recognition: Face recognition, person identification, OCR, license plate recognition.
- And hundreds more.
Layer 4: Training AI Models
This is where you get your AI models ready for production.
Luxonis offers open-source training and retraining notebooks for AI models, along with integrations with dataset-management platforms like Roboflow.
A Unity plugin lets developers create simulations, giving them the flexibility to experiment rapidly.
By this stage, a business can convert a 7.5 gigabit-per-second (Gb/s) stream of raw data into a 2 kilobyte-per-second stream of actionable, high-level data generated from the AI models.
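A quick back-of-the-envelope calculation shows where a number of that magnitude comes from. The resolutions, frame rates, and detection sizes below are illustrative assumptions, not Luxonis specifications, but they land in the same ballpark:

```python
# Back-of-the-envelope illustration of the raw-vs-structured bandwidth gap.
# Resolutions and frame rates are assumed for illustration, not Luxonis specs.

def mbps(width, height, bytes_per_px, fps):
    """Raw stream bandwidth in megabits per second."""
    return width * height * bytes_per_px * fps * 8 / 1e6

color = mbps(3840, 2160, 3, 30)       # 4K RGB at 30 fps
mono = 2 * mbps(1280, 800, 1, 120)    # stereo mono pair at 120 fps

raw_gbps = (color + mono) / 1e3
print(f"raw cameras: ~{raw_gbps:.1f} Gb/s")      # ~7.9 Gb/s

# Structured output: say 4 detections per frame at 30 fps,
# each ~16 bytes (label, confidence, bounding box, XYZ).
structured_kBps = 30 * 4 * 16 / 1e3
print(f"structured output: ~{structured_kBps:.2f} kB/s")  # ~1.92 kB/s
```

Three raw sensor streams add up to several gigabits per second, while the distilled detections fit in a couple of kilobytes per second: a reduction of more than six orders of magnitude.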
Layer 5: Cloud Monitoring, Deployment, and A/B Testing
The final layer is key to unlocking deployment at scale. How do you go from having a single device performing complex vision tasks to deploying several thousand across the country?
Firstly, you need a robust method of debugging. Programmatic hooks allow the cameras to record the full 7.5 Gb/s stream of raw data to disk only when a preset condition is met.
Secondly, you need dashboards to organize the large streams of data to give you actionable insights into performance.
Thirdly, testing and updating the software at scale is key to improving performance. This is where A/B testing routines give operators the flexibility to try updates out on a case-by-case basis.
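One common way to run such an A/B rollout across a device fleet is to hash each device ID into a stable bucket, so a fixed fraction of devices deterministically receives the candidate update. This sketch is purely illustrative of the technique, not a Luxonis feature:

```python
# Sketch of deterministic A/B cohort assignment for a device fleet.
# Hash each device ID into a stable bucket so a fixed fraction gets
# the candidate update. Illustrative only; names are hypothetical.
import hashlib


def cohort(device_id: str, treatment_pct: int = 10) -> str:
    """Map a device ID to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(device_id.encode()).digest()
    bucket = int.from_bytes(digest[:2], "big") % 100
    return "B" if bucket < treatment_pct else "A"


fleet = [f"oak-d-{n:04d}" for n in range(1000)]
treated = sum(cohort(d) == "B" for d in fleet)
print(f"{treated} of {len(fleet)} devices get the candidate update")
```

Because the assignment is a pure function of the device ID, a device stays in the same cohort across restarts, and the rollout percentage can be widened gradually as the update proves itself.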
For robotics applications, finite resources are a constant. You have limited battery life, compute, engineering resources, and money.
Going down a difficult, hardware-optimized path like Luxonis did takes a lot of money, time, and skill. But you can leverage their hard work for $200 and use the time saved to focus on building your product.