Hands-On with Hailo: Accelerating Edge AI on Toradex Modules

Friday, February 28, 2025
Hailo Processor
Computer Vision on the Edge

Why go to the edge?

Embedded devices are getting smarter by the day and many machine learning and computer vision tasks are getting pushed to the edge. Running AI models on such devices can be quite challenging but it offers several advantages:

  1. Reduced latency: Processing data on the device eliminates the delay of transmitting it to the cloud or to a central processor.
  2. Enhanced privacy: Sensitive data remains on the device, ensuring compliance with strict privacy regulations.
  3. Lower bandwidth costs: Edge processing reduces the need to send large volumes of data to centralized servers.
  4. Improved reliability: Systems can operate independently of an internet connection.

Why an external AI accelerator?

Toradex offers a wide range of Systems on Modules (SoMs), some of which feature integrated Neural Processing Units (NPUs) capable of handling different AI workloads. For example, the Verdin iMX8M Plus, Verdin iMX95, and Aquila AM69 feature NPUs designed to accelerate inference on the edge, making them suitable for many computer vision and machine learning applications.

While those SoMs provide robust AI solutions, an integrated NPU ties the AI workload to one SoC and its vendor toolchain. External AI accelerators, such as the Hailo-8, EdgeX, MemryX, and Google Coral, address this by offering modular, decoupled, and scalable solutions for edge AI inference. This adds flexibility and helps future-proof AI applications.

1. Decouple AI Processing from SoC Vendor Stacks
One of the biggest challenges in running machine learning at the edge is adapting the model to run on specific hardware or a specific runtime library. Be it the NXP eiQ platform, TI Edge AI Studio, or an ONNX exporter, each comes with its own AI toolkits and optimizations. External AI accelerators decouple the AI workload from the rest of the hardware, providing a unified AI environment that can run across multiple hardware platforms.

Example: A computer vision solution developed for an x86 device using a Hailo-8 AI accelerator can be migrated to an Aquila AM69 SoM using a Hailo-8 without reworking the AI stack. This decoupling ensures that only minimal adjustments are required, significantly reducing time-to-market.

2. Modularity and Scalability
AI applications are dynamic, and performance requirements can change as workloads grow in complexity or new features are added. While built-in NPUs can provide a solid solution, they may fall short when requirements outgrow them. An external accelerator can be added or exchanged without replacing the SoM.

Introduction to Hailo

Hailo is a manufacturer of AI processors designed to run advanced machine learning workloads on the edge across a wide variety of industries, including smart cities, automotive, manufacturing, agriculture, and retail.

We got our hands on the Hailo-8 M.2 module and tested it with several Toradex SoMs. The Hailo-8 M.2 module is an AI accelerator that delivers 26 TOPS over a 4-lane PCIe Gen 3.0 interface in an M-key M.2 form factor. It can be plugged into several Toradex carrier boards to run deep neural network inference in real time.

How can Hailo leverage the Toradex Ecosystem?

Offload pre/post processing steps

The typical computer vision pipeline follows a linear pattern. From the camera capture until the application takes action, each image must pass through every processing step. This means that the slowest step sets the pace for the whole pipeline: it is the bottleneck.

Usually, when comparing machine learning models or hardware, we give a lot of attention to the inference speed, but that is just a single part of the problem.
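
To make the bottleneck idea concrete, here is a minimal sketch. The stage latencies are made-up numbers for illustration, not measurements, and the helper name pipeline_fps is hypothetical. It assumes the stages run concurrently, so throughput is bounded by the slowest stage.

```shell
#!/bin/sh
# pipeline_fps LATENCY_MS...: print the slowest stage latency and the
# resulting upper bound on frames per second for a pipelined flow.
pipeline_fps() {
  printf '%s\n' "$@" | awk '
    $1 + 0 > max + 0 { max = $1 + 0 }
    END { printf "bottleneck_ms=%g max_fps=%.1f\n", max, 1000 / max }'
}

# Illustrative values: capture, pre-process, inference, post-process, display
pipeline_fps 5 12 8 20 4
# prints: bottleneck_ms=20 max_fps=50.0
```

In this made-up case, inference takes only 8 ms, yet the pipeline cannot exceed 50 FPS because post-processing takes 20 ms.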

Complete Software Stack

Hailo is a complete AI solution that supports most of the steps of a common machine-learning workflow.

  1. Performance Evaluation
    1. TAPPAS is a repository with application examples.
    2. The Model Zoo includes model benchmarks as well as pre-trained models.
  2. Model Training
    1. Some pre-trained models include a re-training environment.
  3. Compiler and Runtime
    1. Hailo Dataflow Compiler
    2. pyHailoRT and GStreamer plugins


From the Toradex side, this workflow can be complemented using the Torizon Cloud Platform.

  1. Monitoring the Performance
    1. Identify any problems ahead of time and keep your system reliable.
  2. Updating the Weights
    1. Have updates running on production devices with ease.

Support on Toradex Modules

Hardware

Supported Hardware Configuration

Family | SoM                              | Carrier Board        | Hailo
Aquila | TI AM69 (1+2 x PCIe 3.0)         | Clover (M.2 key B+M) | Hailo-8, Hailo-8L
Aquila | NXP i.MX 95 (1 x PCIe 3.0)       | Clover (M.2 key B+M) | Hailo-8, Hailo-8L
Verdin | NXP i.MX 95 (1 x PCIe 3.0)       | Mallow (M.2 key B)   | Hailo-8, Hailo-8L
Verdin | NXP i.MX 8M Plus (1 x PCIe 3.0)  | Mallow (M.2 key B)   | Hailo-8, Hailo-8L
Verdin | NXP i.MX 8M Mini (1 x PCIe 2.0)  | Mallow (M.2 key B)   | Hailo-8, Hailo-8L
Apalis | NXP i.MX8 (2 x PCIe 3.0)         | Ixora (Mini PCIe)    | Hailo-8R mPCIe

Software

OS                       | Version | Additional Resources
Torizon OS               | BSP 7   | meta-hailo layer (coming soon)
Torizon OS               | BSP 6   | runtime container (coming soon)
Torizon OS Minimal       | BSP 6   | meta-hailo kirkstone; OpenEmbedded layer for GStreamer 1.0
tdx-reference-multimedia | BSP 6   | meta-hailo kirkstone

YOLOv5 example

In this example, we are going to run the detection demo application from TAPPAS.
At the end of this example, you should have an output like this, running at 60+ FPS (depending on your camera).

Hailo Pre-Post Processing Steps

What we are going to use:

Steps:
  1. Build Torizon OS from Source
    1. Build base Torizon OS
    2. Add the dependencies
  2. Hardware setup
    1. Connect the Hailo Device
    2. Connect the Camera
    3. Install the new image
  3. Check Everything
  4. Run the Example

Build Torizon OS from Source

Complete guide

First, you may want to check your computer's RAM and disk space, since the build is demanding on both.
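
A quick way to run that check on a Linux host could look like the sketch below. The thresholds MIN_RAM_GB and MIN_DISK_GB are placeholder assumptions, not official requirements, and the helper at_least is a hypothetical name. Adjust the values to what the build guide recommends.

```shell
#!/bin/sh
# Placeholder thresholds (assumptions, not official requirements).
MIN_RAM_GB=16
MIN_DISK_GB=100

# at_least VALUE MINIMUM: succeed when VALUE >= MINIMUM (integers).
at_least() { [ "$1" -ge "$2" ]; }

# Total RAM in GiB, read from /proc/meminfo (Linux only; 0 if unreadable).
ram_gb=$(awk '/^MemTotal:/ { printf "%d", $2 / 1024 / 1024 }' /proc/meminfo 2>/dev/null || echo 0)
# Free space in GiB on the filesystem that holds the home directory.
disk_gb=$(df -Pk "${HOME:-.}" | awk 'NR == 2 { printf "%d", $4 / 1024 / 1024 }')

at_least "${ram_gb:-0}" "$MIN_RAM_GB" || echo "Warning: only ${ram_gb} GiB of RAM"
at_least "${disk_gb:-0}" "$MIN_DISK_GB" || echo "Warning: only ${disk_gb} GiB of free disk space"
```

The script prints nothing when both checks pass.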

Build Torizon OS base Image
We are going to use the CROPS container to build the following image:

Torizon OS Distro | Machine       | Torizon OS Image Target | Version
torizon           | verdin-imx8mp | torizon-minimal         | 6.8.0

Setup the working directory

cd ~
mkdir ~/yocto-workdir

Run the Container (this is going to build the base image)

This is going to use a lot of RAM and take a couple of hours to finish.

The -v option of the command maps the host folder to the container workdir. Note that this folder, ~/yocto-workdir, was created in the previous step.

docker run --rm -it --name=crops \
  -v ~/yocto-workdir:/workdir \
  --workdir=/workdir \
  -e MACHINE="verdin-imx8mp" \
  -e IMAGE="torizon-minimal" \
  -e DISTRO="torizon" \
  -e BRANCH="refs/tags/6.8.0" \
  -e MANIFEST="torizoncore/default.xml" \
  -e ACCEPT_FSL_EULA="1" \
  -e BB_NUMBER_THREADS="2" \
  -e PARALLEL_MAKE="-j 2" \
  torizon/crops:kirkstone-6.x.y startup-tdx.sh
# Note: BB_NUMBER_THREADS and PARALLEL_MAKE are passed as environment variables here.
# It has not been verified that the container's startup script honors them.

Add the Dependencies to the Image

To add the dependencies, first navigate to the ~/yocto-workdir/layers folder.

cd ./layers

We are going to add the following layers:

For more details about meta layers, check the full documentation.

From inside the torizon/crops:kirkstone-6.x.y container, run the bitbake-layers add-layer command for each layer.

bitbake-layers add-layer meta-hailo/meta-hailo-accelerator
bitbake-layers add-layer meta-hailo/meta-hailo-libhailort
bitbake-layers add-layer meta-hailo/meta-hailo-tappas
bitbake-layers add-layer meta-hailo/meta-hailo-vpu
bitbake-layers add-layer meta-toradex-framos
bitbake-layers add-layer meta-gstreamer1.0
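
Since add-layer fails when a layer directory is absent, it can help to confirm that every layer has been fetched first. The following is a small sketch, not part of the original workflow. The helper name missing_layers is hypothetical, and it assumes you run it from the layers folder with the same relative paths used above.

```shell
#!/bin/sh
# Layer paths as used in the add-layer commands above.
LAYERS="meta-hailo/meta-hailo-accelerator meta-hailo/meta-hailo-libhailort
meta-hailo/meta-hailo-tappas meta-hailo/meta-hailo-vpu
meta-toradex-framos meta-gstreamer1.0"

# missing_layers BASE_DIR LAYER...: print each LAYER that is not a
# directory under BASE_DIR.
missing_layers() {
  base=$1
  shift
  for layer in "$@"; do
    [ -d "$base/$layer" ] || printf '%s\n' "$layer"
  done
}

# Prints nothing when every layer directory is present.
missing_layers . $LAYERS
```

Any path it prints still needs to be cloned before the add-layer step.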

Add the packages to the build-torizon/conf/local.conf file by appending these lines at the end of the file.

IMAGE_INSTALL:append = " libhailort hailortcli pyhailort libgsthailo hailo-pci hailo-firmware"
IMAGE_INSTALL:append = " gstreamer1.0 gstreamer1.0-plugins-base gstreamer1.0-plugins-good gstreamer1.0-plugins-bad"
IMAGE_INSTALL:append = " v4l-utils"
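
If you prefer to script this step, one possible approach is to append each line only when it is not already present, so re-running the script does not duplicate entries. This is a sketch: the helper append_once is a hypothetical name, and the CONF path assumes the current directory is the Yocto workdir.

```shell
#!/bin/sh
# append_once FILE LINE: add LINE to FILE unless an identical line exists.
append_once() {
  grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Assumed location relative to the Yocto workdir; override CONF if needed.
CONF="${CONF:-build-torizon/conf/local.conf}"

if [ -f "$CONF" ]; then
  append_once "$CONF" 'IMAGE_INSTALL:append = " libhailort hailortcli pyhailort libgsthailo hailo-pci hailo-firmware"'
  append_once "$CONF" 'IMAGE_INSTALL:append = " gstreamer1.0 gstreamer1.0-plugins-base gstreamer1.0-plugins-good gstreamer1.0-plugins-bad"'
  append_once "$CONF" 'IMAGE_INSTALL:append = " v4l-utils"'
else
  echo "local.conf not found at $CONF"
fi
```

Note the leading space inside each quoted value: the :append operator does not insert one itself.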

Build the image with the new layers.

bitbake torizon-minimal

You can find the Toradex Easy Installer compatible image under ~/yocto-workdir/build-torizon/deploy/images/verdin-imx8mp/torizon-minimal-verdin-imx8mp-Tezi_6.8.0-devel-<date>+build.0.tar.

Hardware Setup

Connect the Hailo Device

Plug the Hailo device into the M.2 slot of the Mallow Carrier Board.

Hailo Device Connect to Mallow Carrier Board

Connect the Camera

Connect the Camera to the MIPI-CSI connector of the Mallow Carrier Board.

MIPI-CSI connector of the Mallow Carrier Board

Install the new Torizon OS Image

Use the Toradex Easy Installer (Tezi) to flash a new image to the Device.

  1. Download the Tezi
  2. Put the device into recovery mode
  3. Install the newly built image
Toradex Easy Installer - Torizon Installation

Check the Installation

Hailo Device

sudo su
hailortcli scan
hailortcli fw-control identify

The output of those commands should show that the device is detected and that the drivers are working.
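
If the commands report no device, a quick first check is whether the driver created a device node. The sketch below assumes the node is /dev/hailo0, which is the usual name for the first PCIe device but is not stated in this post. The helper node_exists is a hypothetical name.

```shell
#!/bin/sh
# node_exists PATH: succeed when PATH exists (any file type).
node_exists() { [ -e "$1" ]; }

# Assumed device node for the first Hailo PCIe device; override if yours differs.
HAILO_NODE="${HAILO_NODE:-/dev/hailo0}"

if node_exists "$HAILO_NODE"; then
  echo "Found $HAILO_NODE"
else
  echo "Missing $HAILO_NODE: check the M.2 seating and that the hailo-pci driver is installed and loaded"
fi
```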

Display

gst-launch-1.0 videotestsrc ! videoconvert ! autovideosink

You should see some colorful patterns on the screen.

Camera Device

This step can be different depending on the camera used.

v4l2-ctl -d2 -D
v4l2-ctl --list-formats-ext -d /dev/video2

For the Framos Camera, this is the expected output.

root@verdin-imx8mp-15445736:~# v4l2-ctl --list-formats-ext -d /dev/video2
ioctl: VIDIOC_ENUM_FMT
	Type: Video Capture

	[0]: 'YUYV' (YUYV 4:2:2)
		Size: Stepwise 176x144 - 4096x3072 with step 16/8
	[1]: 'NV12' (Y/CbCr 4:2:0)
		Size: Stepwise 176x144 - 4096x3072 with step 16/8
	[2]: 'NV16' (Y/CbCr 4:2:2)
		Size: Stepwise 176x144 - 4096x3072 with step 16/8
	[3]: 'RG12' (12-bit Bayer RGRG/GBGB)
		Size: Stepwise 176x144 - 4096x3072 with step 16/8

In the demo, we are going to use the YUYV format, so keep in mind the resolution range and step size listed for it.
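
The "Stepwise 176x144 - 4096x3072 with step 16/8" line means that a valid width is 176 plus a multiple of 16 (up to 4096) and a valid height is 144 plus a multiple of 8 (up to 3072). Here is a small sketch to test a candidate resolution against those limits. The helper name valid_size is hypothetical and the numbers are taken from the output above, so they apply only to this camera.

```shell
#!/bin/sh
# valid_size WIDTH HEIGHT: succeed if the size satisfies
# "Stepwise 176x144 - 4096x3072 with step 16/8".
valid_size() {
  [ "$1" -ge 176 ] && [ "$1" -le 4096 ] && [ $(( ($1 - 176) % 16 )) -eq 0 ] &&
    [ "$2" -ge 144 ] && [ "$2" -le 3072 ] && [ $(( ($2 - 144) % 8 )) -eq 0 ]
}

for size in 1280x720 1920x1080 1000x700; do
  if valid_size "${size%x*}" "${size#*x}"; then
    echo "$size: ok"
  else
    echo "$size: not on the step grid"
  fi
done
```

Here 1280x720 and 1920x1080 pass, while 1000x700 fails because 1000 is not 176 plus a multiple of 16.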

gst-launch-1.0 -v v4l2src device=/dev/video2 ! video/x-raw ! videoconvert ! autovideosink
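
To pin the format instead of letting GStreamer negotiate it, the caps can be spelled out explicitly. Note that GStreamer names the V4L2 YUYV format YUY2. The sketch below only builds and prints the command. The resolution and frame rate are illustrative assumptions, and build_caps is a hypothetical helper name.

```shell
#!/bin/sh
# build_caps FORMAT WIDTH HEIGHT FPS: print a GStreamer raw-video caps string.
build_caps() {
  printf 'video/x-raw,format=%s,width=%s,height=%s,framerate=%s/1' "$1" "$2" "$3" "$4"
}

# Assumed values: use a size and frame rate that your camera actually reports.
CAPS=$(build_caps YUY2 1280 720 30)

# Print the full command so it can be reviewed before running it on the device.
echo "gst-launch-1.0 -v v4l2src device=/dev/video2 ! $CAPS ! videoconvert ! autovideosink"
```

If negotiation fails with these caps, the camera most likely does not offer that combination of size and frame rate.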

Run the Example

Some cameras only support specific resolutions and frame rates, so adjust the values accordingly. This can be done by modifying the resolution and framerate values of the PIPELINE variable in detection.sh.

sudo su
cd ~/apps/detection/
./detection.sh

Done

Next step: Pair the Device to the Torizon Cloud

In a future blog post we will cover the following:

  • Use the device-metrics feature to monitor the device from the Torizon Cloud.
  • Retrain a model using Hailo environment containers.
  • Use the Torizon Remote Updates to change the version of the model running.

Why Toradex?

Toradex has over 21 years of excellence in the embedded industry, offering a wide-ranging portfolio of system-on-modules (SoMs) and carrier boards that allow businesses to build scalable, high-performance embedded applications.

  • Quality and Reliability:
    Toradex hardware is designed to last, even in harsh industrial environments. With high-quality components and rigorous testing, Toradex ensures minimal downtime for the most critical applications.
  • Software Ecosystem:
    • Torizon OS → A Yocto-based, easy-to-use industrial Linux distribution.
    • Torizon Cloud → Secure OTA updates, device monitoring, and remote access.
    • Torizon IDE → Develop, debug, and deploy from the VS Code Extension.
  • Product Lifetime:
    Toradex is committed to 10+ years of product availability, so products remain supported and available over an extended period of time.
  • Developer Resources:
    Easier development means faster deployment. Toradex provides extensive developer resources:
    • Comprehensive Documentation
    • Free support channels from the Community and from Toradex experts.
    • Development tools, such as TCB, Tezi, Torizon Containers

Author: Allan Kamimura, Technical Marketing Intern, Toradex
