Hands-On with Hailo: Accelerating Edge AI on Toradex Modules

Why go to the edge?
Embedded devices are getting smarter by the day, and many machine learning and computer vision tasks are moving to the edge. Running AI models on these devices can be challenging, but it offers several advantages:
- Reduced latency: Processing data on the device eliminates the delay of transmitting it to the cloud or to a central server and waiting for the result.
- Enhanced privacy: Sensitive data remains on the device, which makes it easier to comply with strict privacy regulations.
- Lower bandwidth costs: Edge processing reduces the need to send large volumes of data to centralized servers.
- Improved reliability: Systems can operate independently of an internet connection.
Why an external AI accelerator?
Toradex offers a wide range of Systems on Modules (SoMs), some of which feature integrated Neural Processing Units (NPUs) capable of handling different AI workloads. For example, the Verdin iMX8M Plus, Verdin iMX95, and Aquila AM69 feature NPUs designed to accelerate inference on the edge, making them suitable for many computer vision and machine learning applications.
While those SoMs provide robust AI capabilities, external AI accelerators, such as the Hailo-8, EdgeX, MemryX, and Google Coral, offer a modular, decoupled, and scalable approach to edge AI inference. This adds flexibility and helps future-proof AI products.
1. Decouple AI Processing from SoC Vendor Stacks
One of the biggest challenges in running machine learning at the edge is adapting the model to specific hardware or a specific runtime library. Whether it is the NXP eiQ platform, TI Edge AI Studio, or an ONNX exporter, each comes with its own AI toolkits and optimizations. External AI accelerators decouple the AI workload from the rest of the hardware, providing a unified AI environment that can run across multiple hardware platforms.
2. Modularity and Scalability
AI applications are dynamic, and performance requirements can change as workloads grow in complexity or new features are added. While built-in NPUs can provide a solid solution, they may fall short when those requirements change. An external accelerator in a standard slot, such as M.2, can instead be replaced with a more capable one without changing the SoM.
Hailo is a manufacturer of AI processors designed to run advanced machine learning workloads at the edge across a wide range of industries, including smart cities, automotive, manufacturing, agriculture, retail, and more.
We got our hands on the Hailo-8 M.2 module and tested it with several Toradex SoMs. The Hailo-8 M.2 module is an AI accelerator delivering 26 TOPS, with a 4-lane PCIe Gen 3.0 interface in an M.2 M-key form factor. The module can be plugged into several Toradex carrier boards to run deep neural network inference in real time.
How can Hailo leverage the Toradex Ecosystem?
Offload pre/post processing steps
The typical computer vision pipeline follows a linear pattern: from the moment the camera captures a frame until the application takes action, the image must pass through each processing step in turn. When the steps run concurrently as a pipeline, the slowest step limits the throughput of the whole system; that step is the bottleneck.
Usually, when comparing machine learning models or hardware, we pay a lot of attention to inference speed, but that is only one part of the problem. With inference running on the Hailo accelerator, the SoM's own resources can take care of the pre- and post-processing steps.
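To make the bottleneck argument concrete, here is a minimal shell sketch. The `pipeline_ceiling` helper is hypothetical (it is not part of any Hailo or Toradex tool): it takes per-stage latencies in milliseconds and reports the slowest stage and the frame-rate ceiling it imposes, assuming the stages run concurrently so that throughput is set by the slowest one.

```shell
#!/bin/sh
# Hypothetical helper: report the slowest stage of a linear vision pipeline.
# Arguments: per-stage latencies in milliseconds, in pipeline order.
# Assumes the stages run concurrently (pipelined), so the slowest stage caps the FPS.
pipeline_ceiling() {
  printf '%s\n' "$@" | awk '
    # Track the largest latency and the position of that stage
    NR == 1 || $1 + 0 > max { max = $1 + 0; idx = NR }
    END { printf "slowest stage: #%d (%.1f ms), ceiling: %.1f FPS\n", idx, max, 1000 / max }'
}

# Example with made-up stage times: capture, preprocessing, inference, postprocessing
pipeline_ceiling 10 15 5 8
# prints: slowest stage: #2 (15.0 ms), ceiling: 66.7 FPS
```

In this made-up case, speeding up inference from 5 ms to 1 ms would not change the result at all; only the 15 ms preprocessing stage limits the frame rate.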
Complete Software Stack
Hailo offers a complete software stack that supports most steps of a typical machine learning workflow:
- Performance evaluation
- Model training
  - Some pre-trained models include a retraining environment.
- Compiler and runtime
  - Hailo Dataflow Compiler
  - pyHailoRT and GStreamer plugins
On the Toradex side, this workflow can be complemented with the Torizon Cloud platform for:
- Monitoring performance
- Updating model weights
Hardware
Supported Hardware Configuration
| Family | SoM | Carrier Board | Hailo |
|---|---|---|---|
| Aquila | TI AM69 (1+2 x PCIe 3.0) | Clover (M.2 key B+M) | Hailo-8, Hailo-8L |
| Aquila | NXP i.MX 95 (1 x PCIe 3.0) | Clover (M.2 key B+M) | Hailo-8, Hailo-8L |
| Verdin | NXP i.MX 95 (1 x PCIe 3.0) | Mallow (M.2 key B) | Hailo-8, Hailo-8L |
| Verdin | NXP i.MX 8M Plus (1 x PCIe 3.0) | Mallow (M.2 key B) | Hailo-8, Hailo-8L |
| Verdin | NXP i.MX 8M Mini (1 x PCIe 2.0) | Mallow (M.2 key B) | Hailo-8, Hailo-8L |
| Apalis | NXP i.MX8 (2 x PCIe 3.0) | Ixora (Mini PCIe) | Hailo-8R mPCIe |
Software
| OS | Version | Additional Resources |
|---|---|---|
| Torizon OS | BSP 7 | meta-hailo layer (coming soon) |
| Torizon OS | BSP 6 | runtime container (coming soon) |
| Torizon OS Minimal | BSP 6 | meta-hailo kirkstone OpenEmbedded layer for GStreamer 1.0 |
| tdx-reference-multimedia | BSP 6 | meta-hailo kirkstone |
In this example, we are going to run the detection demo application from Hailo TAPPAS.
By the end of this example, you should see live detection results on the display, running at 60+ FPS (depending on your camera).
What we are going to use:
- Camera
  - If using a USB camera, the FPS may be very low because it is limited by the camera's capture speed.
- Display
- Verdin i.MX8MP + Mallow Carrier Board
  - The Verdin iMX8M Plus QuadLite 1GB IT (0065) is not compatible with the Framos camera.
- Hailo AI Accelerator
The steps are:
- Build Torizon OS from Source
  - Build base Torizon OS
  - Add the dependencies
- Hardware setup
  - Connect the Hailo Device
  - Connect the Camera
- Install the new image
- Check Everything
- Run the Example
Build Torizon OS from Source
First, you may want to check your computer's RAM and disk space, since a Yocto build needs plenty of both.
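As a quick check, the sketch below reads total RAM from /proc/meminfo and free space from `df`. The helper name `check_build_host` is hypothetical, and its default thresholds (8 GB RAM, 100 GB disk) are placeholders, not official requirements; check the Yocto Project and Toradex documentation for the actual numbers.

```shell
#!/bin/sh
# Hypothetical helper: report RAM and free disk space on the build host and
# warn when they fall below the given thresholds (in GB).
# Default thresholds are placeholders (assumptions), not official requirements.
check_build_host() {
  min_ram_gb=${1:-8}
  min_disk_gb=${2:-100}
  dir=${3:-$HOME}
  # /proc/meminfo reports MemTotal in kB
  ram_gb=$(awk '/^MemTotal:/ { printf "%d", $2 / 1024 / 1024 }' /proc/meminfo)
  # df -P prints POSIX-format lines; with -k, field 4 is available 1K blocks
  disk_gb=$(df -Pk "$dir" | awk 'NR == 2 { printf "%d", $4 / 1024 / 1024 }')
  echo "RAM: ${ram_gb} GB, free disk in ${dir}: ${disk_gb} GB"
  status=0
  if [ "$ram_gb" -lt "$min_ram_gb" ]; then
    echo "warning: less than ${min_ram_gb} GB of RAM" >&2
    status=1
  fi
  if [ "$disk_gb" -lt "$min_disk_gb" ]; then
    echo "warning: less than ${min_disk_gb} GB of free disk space" >&2
    status=1
  fi
  return "$status"
}
```

For example, `check_build_host 8 100 ~/yocto-workdir` checks the filesystem that will hold the build directory.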
Build Torizon OS base Image
We are going to use the CROPS container to build the following image:
| Torizon OS Distro | Machine | Torizon OS Image Target | Version |
|---|---|---|---|
| torizon | verdin-imx8mp | torizon-minimal | 6.8.0 |
Set up the working directory
cd ~
mkdir ~/yocto-workdir
Run the Container (this is going to build the base image)
This is going to use a lot of RAM and take a couple of hours to finish.
The -v option on the second line maps the host folder ~/yocto-workdir, created in the previous step, to the /workdir folder inside the container.
docker run --rm -it --name=crops \
  -v ~/yocto-workdir:/workdir \
  --workdir=/workdir \
  -e MACHINE="verdin-imx8mp" \
  -e IMAGE="torizon-minimal" \
  -e DISTRO="torizon" \
  -e BRANCH="refs/tags/6.8.0" \
  -e MANIFEST="torizoncore/default.xml" \
  -e ACCEPT_FSL_EULA="1" \
  -e BB_NUMBER_THREADS="2" \
  -e PARALLEL_MAKE="-j 2" \
  torizon/crops:kirkstone-6.x.y startup-tdx.sh
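We have not verified whether the CROPS startup script forwards BB_NUMBER_THREADS and PARALLEL_MAKE from the container environment into the BitBake configuration. A more explicit option is to set these standard BitBake variables in build-torizon/conf/local.conf once the first run has created it. The sketch below uses a hypothetical helper, `limit_build_parallelism`, to append them.

```shell
#!/bin/sh
# Hypothetical helper: append parallelism limits to a BitBake local.conf.
# BB_NUMBER_THREADS limits how many BitBake tasks run at the same time;
# PARALLEL_MAKE sets the -j option used for make in compile tasks.
limit_build_parallelism() {
  conf=$1
  jobs=${2:-2}
  if [ ! -f "$conf" ]; then
    echo "error: $conf not found" >&2
    return 1
  fi
  cat >> "$conf" <<EOF
BB_NUMBER_THREADS = "$jobs"
PARALLEL_MAKE = "-j $jobs"
EOF
}
```

Inside the container, where the working directory is /workdir, this would be called as `limit_build_parallelism build-torizon/conf/local.conf 2` before running bitbake again.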
Add the Dependencies to the Image
To add the dependencies, first navigate to the ~/yocto-workdir/layers folder.
cd ./layers
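We are going to add the Hailo layers from meta-hailo (meta-hailo-accelerator, meta-hailo-libhailort, meta-hailo-tappas, and meta-hailo-vpu), meta-toradex-framos, and meta-gstreamer1.0. Each layer must already be present in this folder before it can be added.
For more details about meta layers, check the full documentation.
From inside the torizon/crops:kirkstone-6.x.y container, run the bitbake-layers add-layer commands: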
bitbake-layers add-layer meta-hailo/meta-hailo-accelerator
bitbake-layers add-layer meta-hailo/meta-hailo-libhailort
bitbake-layers add-layer meta-hailo/meta-hailo-tappas
bitbake-layers add-layer meta-hailo/meta-hailo-vpu
bitbake-layers add-layer meta-toradex-framos
bitbake-layers add-layer meta-gstreamer1.0
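Because add-layer only works for layers that are already on disk, and every Yocto layer ships a conf/layer.conf file, it can help to check the directories first. The `check_layers` helper below is a hypothetical sketch of that check, not part of bitbake-layers.

```shell
#!/bin/sh
# Hypothetical helper: verify that each argument looks like a Yocto layer,
# i.e. contains conf/layer.conf. Prints missing layers and returns 1 if any.
check_layers() {
  status=0
  for layer in "$@"; do
    if [ ! -f "$layer/conf/layer.conf" ]; then
      echo "missing layer: $layer" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage from the layers folder, adding the layers only if all are present:
# LAYERS="meta-hailo/meta-hailo-accelerator meta-hailo/meta-hailo-libhailort
#   meta-hailo/meta-hailo-tappas meta-hailo/meta-hailo-vpu
#   meta-toradex-framos meta-gstreamer1.0"
# check_layers $LAYERS && for l in $LAYERS; do bitbake-layers add-layer "$l"; done
```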
Add the packages to the image by appending these lines to the end of the build-torizon/conf/local.conf file:
IMAGE_INSTALL:append = " libhailort hailortcli pyhailort libgsthailo hailo-pci hailo-firmware"
IMAGE_INSTALL:append = " gstreamer1.0 gstreamer1.0-plugins-base gstreamer1.0-plugins-good gstreamer1.0-plugins-bad"
IMAGE_INSTALL:append = " v4l-utils"
Build the image with the new layers.
bitbake torizon-minimal
You can find the Toradex Easy Installer compatible image under ~/yocto-workdir/build-torizon/deploy/images/verdin-imx8mp/torizon-minimal-verdin-imx8mp-Tezi_6.8.0-devel-<date>+build.0.tar.
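Before flashing, you may want to confirm that the Hailo packages actually made it into the image. Yocto normally writes a .manifest file listing the installed packages next to the image in the deploy directory; we assume the Torizon build does the same, but the exact file name and package names (for example, after library package renaming) may differ. The hypothetical `check_manifest` helper reports any requested package it cannot find in such a file.

```shell
#!/bin/sh
# Hypothetical helper: check that packages appear in a Yocto image manifest.
# Assumes the usual manifest format: one "<package> <arch> <version>" per line.
check_manifest() {
  manifest=$1
  shift
  status=0
  for pkg in "$@"; do
    # Match the package name exactly against the first column
    if ! awk -v p="$pkg" '$1 == p { found = 1 } END { exit !found }' "$manifest"; then
      echo "not in image: $pkg" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage (the manifest path is an assumption; adjust it to your deploy directory):
# check_manifest ~/yocto-workdir/build-torizon/deploy/images/verdin-imx8mp/torizon-minimal-verdin-imx8mp.manifest \
#   libhailort hailortcli pyhailort libgsthailo hailo-pci hailo-firmware
```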
Hardware Setup
Connect the Hailo Device
Insert the Hailo device into the M.2 slot of the Mallow Carrier Board.
Connect the Camera
Connect the Camera to the MIPI-CSI connector of the Mallow Carrier Board.
Install the new Torizon OS Image
Use the Toradex Easy Installer (Tezi) to flash the new image to the device:
- Download Tezi
- Put the device into recovery mode
- Install the newly built image
Check the Installation
Hailo Device
sudo su
hailortcli scan
hailortcli fw-control identify
The output of these commands should show that the device is detected and that the driver is working.
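If the scan finds nothing, it helps to know whether the module is visible on the PCIe bus at all. The sketch below avoids depending on lspci (which may not be installed on Torizon OS Minimal) and instead scans sysfs for devices with PCI vendor ID 0x1e60, which we believe is Hailo's vendor ID; treat that value as an assumption. The `find_hailo_pci` helper name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list PCI devices whose vendor ID matches Hailo's.
# Assumption: Hailo's PCI vendor ID is 0x1e60.
# The optional argument lets you point it at another sysfs-like directory.
find_hailo_pci() {
  root=${1:-/sys/bus/pci/devices}
  status=1
  for dev in "$root"/*; do
    # Skip entries without a readable vendor file (including an unmatched glob)
    [ -r "$dev/vendor" ] || continue
    if [ "$(cat "$dev/vendor")" = "0x1e60" ]; then
      echo "Hailo device found at PCI address $(basename "$dev")"
      status=0
    fi
  done
  return "$status"
}
```

On the device, running `find_hailo_pci || echo "no Hailo device on the PCIe bus"` distinguishes a seating or PCIe problem (nothing found) from a driver or runtime problem (device found, but the scan still fails).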
Display
gst-launch-1.0 videotestsrc ! videoconvert ! autovideosink
You should see some colorful patterns on the screen.
Camera Device
This step can be different depending on the camera used.
v4l2-ctl -d2 -D
v4l2-ctl --list-formats-ext -d /dev/video2
For the Framos Camera, this is the expected output.
root@verdin-imx8mp-15445736:~# v4l2-ctl --list-formats-ext -d /dev/video2
ioctl: VIDIOC_ENUM_FMT
    Type: Video Capture

    [0]: 'YUYV' (YUYV 4:2:2)
        Size: Stepwise 176x144 - 4096x3072 with step 16/8
    [1]: 'NV12' (Y/CbCr 4:2:0)
        Size: Stepwise 176x144 - 4096x3072 with step 16/8
    [2]: 'NV16' (Y/CbCr 4:2:2)
        Size: Stepwise 176x144 - 4096x3072 with step 16/8
    [3]: 'RG12' (12-bit Bayer RGRG/GBGB)
        Size: Stepwise 176x144 - 4096x3072 with step 16/8
The demo uses the YUYV format, so take note of the resolutions listed for it.
gst-launch-1.0 -v v4l2src device=/dev/video2 ! video/x-raw ! videoconvert ! autovideosink
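The pipeline above lets GStreamer negotiate the format on its own. To request a specific format, resolution, and frame rate, you can pass explicit caps; note that GStreamer calls the V4L2 YUYV format YUY2. The hypothetical `camera_caps` helper below builds such a caps string. The width, height, and frame rate you pass must be values your camera actually supports, as listed by v4l2-ctl.

```shell
#!/bin/sh
# Hypothetical helper: build a GStreamer raw-video caps string from a V4L2
# pixel format, resolution, and frame rate.
camera_caps() {
  v4l2_fmt=$1 width=$2 height=$3 fps=$4
  # GStreamer uses different names for some V4L2 formats
  case "$v4l2_fmt" in
    YUYV) gst_fmt=YUY2 ;;
    NV12|NV16) gst_fmt=$v4l2_fmt ;;
    *) echo "unsupported format: $v4l2_fmt" >&2; return 1 ;;
  esac
  printf 'video/x-raw,format=%s,width=%s,height=%s,framerate=%s/1\n' \
    "$gst_fmt" "$width" "$height" "$fps"
}

# Usage (resolution and frame rate are placeholders):
# gst-launch-1.0 -v v4l2src device=/dev/video2 ! "$(camera_caps YUYV 1280 720 30)" ! videoconvert ! autovideosink
```

Bayer formats such as RG12 use video/x-bayer caps rather than video/x-raw, so the sketch rejects them instead of building a caps string that would fail to negotiate.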
Run the Example
Some cameras support only specific resolutions and frame rates, so adjust the values accordingly. You can do this by modifying the framerate value in the PIPELINE variable of the detection.sh script.
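If you prefer not to edit the script by hand, a sed substitution can rewrite the frame rate. This sketch assumes that the PIPELINE variable in detection.sh contains caps of the form framerate=<n>/1; check the script first, since its exact contents may differ between TAPPAS versions. The `set_framerate` helper name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: replace every "framerate=<n>/1" in a file with the
# requested frame rate, keeping a .bak copy of the original.
set_framerate() {
  file=$1
  fps=$2
  case "$fps" in
    ''|*[!0-9]*) echo "frame rate must be an integer" >&2; return 1 ;;
  esac
  cp "$file" "$file.bak" || return 1
  # Writing back through a redirect keeps the existing file's permissions
  sed "s#framerate=[0-9][0-9]*/1#framerate=${fps}/1#g" "$file.bak" > "$file"
}
```

For example, `set_framerate ~/apps/detection/detection.sh 30` would switch the demo to 30 FPS, leaving detection.sh.bak as a backup.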
sudo su
cd ~/apps/detection/
./detection.sh
Done!
In a future blog post we will cover the following:
- Use the device-metrics feature to monitor the device from the Torizon Cloud.
- Retrain a model using Hailo environment containers.
- Use the Torizon Remote Updates to change the version of the model running.
Why Toradex?
Toradex has over 21 years of experience in the embedded industry and offers a wide portfolio of system-on-modules (SoMs) and carrier boards that allow businesses to build scalable, high-performance embedded applications.
- Quality and Reliability: Toradex hardware is designed to last, even in harsh industrial environments. With high-quality components and rigorous testing, Toradex helps minimize downtime for critical applications.
- Software Ecosystem:
  - Torizon OS → A Yocto-based, easy-to-use industrial Linux distribution.
  - Torizon Cloud → Secure OTA updates, device monitoring, and remote access.
  - Torizon IDE → Develop, debug, and deploy from the VS Code extension.
- Product Lifetime: Toradex is committed to 10+ years of product availability, so products remain supported and available over an extended period of time.
- Developer Resources: Easier development means faster deployment. Toradex provides extensive developer resources:
  - Comprehensive documentation
  - A free support channel, with help from the Community and from Toradex experts
  - Development tools such as TCB, Tezi, and Torizon Containers
Allan Kamimura, Technical Marketing Intern, Toradex







