Deep Dive into the i.MX 95 NPU and Vision Pipeline:
Unlocking Next-Gen Edge AI Performance

The i.MX 95 applications processor from NXP Semiconductors represents a significant leap in edge computing, particularly for AI and machine vision applications. In a recent NXP-hosted webinar with Toradex, Manish Bajaj (System Architect Engineer, NXP) and Mohamed Raad (Hardware Product Manager, Toradex) explored the i.MX 95’s Neural Processing Unit (NPU) and vision pipeline in depth, highlighting its architectural innovations, real-world performance advantages, and software ecosystem.
This blog post distills the key insights from the webinar, showing how the NXP i.MX 95 is engineered to meet the demands of industrial, medical IoT, and edge AI applications.
The i.MX 95 is the latest addition to NXP’s i.MX 9 series, designed to deliver high-performance AI inference, advanced imaging, and real-time processing at the edge. Unlike traditional processors that rely solely on CPU/GPU compute, the i.MX 95 integrates dedicated accelerators to optimize AI workloads while maintaining power efficiency.
- 6x Arm Cortex-A55 cores (up to 2.0 GHz) – Balanced performance and efficiency for general compute tasks.
- 1x Cortex-M33 (333 MHz) and 1x Cortex-M7 (800 MHz) – Handle real-time and safety-critical operations.
- Arm® Mali GPU – Delivers 30% better graphics performance than previous i.MX generations, enabling advanced HMI and vision processing.
- NXP’s Neutron NPU (2.0 TOPS) – A custom-designed AI accelerator optimized for low-latency, high-efficiency inference.
- Integrated ISP (Image Signal Processor) – Supports 12MP @ 45 FPS, 20-bit HDR processing, and RGB-IR sensor fusion.
- Multi-camera support – Up to 8 virtual MIPI-CSI channels for advanced vision applications.
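To put the ISP figure in perspective, the 12MP @ 45 FPS rating can be turned into a pixel rate with simple arithmetic. The sketch below does only that. The bit depths in the loop are illustrative assumptions rather than NXP specifications, and `pixel_rate` and `raw_bandwidth_gbps` are hypothetical helper names.

```python
def pixel_rate(megapixels: float, fps: float) -> float:
    """Pixels per second for a sensor mode given in megapixels and frames/s."""
    return megapixels * 1e6 * fps


def raw_bandwidth_gbps(megapixels: float, fps: float, bits_per_pixel: int) -> float:
    """Uncompressed data rate in Gbit/s for an assumed bit depth."""
    return pixel_rate(megapixels, fps) * bits_per_pixel / 1e9


if __name__ == "__main__":
    # 12 MP at 45 FPS is the figure quoted in the feature list.
    print(f"{pixel_rate(12, 45) / 1e6:.0f} Mpixel/s")
    # Bit depths below are assumptions chosen only for illustration.
    for bpp in (10, 12):
        print(f"{bpp}-bit raw: {raw_bandwidth_gbps(12, 45, bpp):.2f} Gbit/s")
```

At 12 MP and 45 FPS the pixel rate works out to 540 Mpixel/s; the raw data rate then scales linearly with whatever bit depth the sensor actually delivers.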
While many AI accelerators are marketed on Tera Operations Per Second (TOPS), NXP’s Neutron NPU is designed with real-world efficiency in mind. A headline TOPS figure can hide several common problems:
- Memory bottlenecks – Many NPUs suffer from high DDR bandwidth usage, reducing real-world performance.
- Inefficient data movement – Frequent memory transfers increase latency and power consumption.
- Model compression trade-offs – Some NPUs rely on aggressive pruning/sparsification, sacrificing accuracy.
The Neutron NPU addresses these with:
- Data reuse architecture – Minimizes memory access by keeping weights and activations on-chip.
- Lossless weight compression – Reduces model size without accuracy loss.
- Zero-copy execution – Seamless handoff between CPU and NPU, eliminating unnecessary data copies.
- 3x faster than the i.MX 8M Plus – Despite similar TOPS ratings, real-world benchmarks (MobileNetV1, object detection) show significant performance gains.
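One way to see why a TOPS rating alone says little about latency is a roofline-style estimate: a layer takes at least as long as the slower of its compute time and its DDR transfer time. The sketch below is a generic model, not NXP’s methodology. Only the 2.0 TOPS figure comes from this post; the DDR bandwidth, layer size, traffic volume, and 4x traffic reduction are made-up values chosen for illustration, and `layer_time_s` is a hypothetical helper.

```python
def layer_time_s(ops: float, ddr_bytes: float,
                 peak_ops_per_s: float, ddr_bytes_per_s: float) -> float:
    """Lower-bound execution time: the slower of compute and DDR traffic."""
    compute_s = ops / peak_ops_per_s
    memory_s = ddr_bytes / ddr_bytes_per_s
    return max(compute_s, memory_s)


PEAK = 2.0e12      # 2.0 TOPS, the rating quoted in this post
DDR_BW = 4.0e9     # hypothetical: 4 GB/s of DDR bandwidth available to the NPU
OPS = 1.0e9        # hypothetical: a layer needing 1 giga-operation
TRAFFIC = 20.0e6   # hypothetical: 20 MB of DDR traffic without on-chip reuse

baseline = layer_time_s(OPS, TRAFFIC, PEAK, DDR_BW)
# Hypothetical: data reuse plus weight compression cut DDR traffic by 4x.
reduced = layer_time_s(OPS, TRAFFIC / 4, PEAK, DDR_BW)

if __name__ == "__main__":
    print(f"baseline: {baseline * 1e3:.2f} ms")
    print(f"reduced DDR traffic: {reduced * 1e3:.2f} ms")
```

With these assumed numbers the compute time is 0.5 ms in both cases, yet the estimate drops from 5 ms to 1.25 ms purely because less data crosses the DDR interface. Two NPUs with the same TOPS rating can therefore differ several-fold in practice when one of them moves less data.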
The NXP i.MX 95’s vision pipeline is optimized for low-latency, high-throughput processing, making it ideal for automotive ADAS, industrial machine vision, and smart cameras.
- MIPI-CSI & ISP Processing
  - Supports dual MIPI-CSI with 8 virtual channels.
  - NXP’s proprietary ISP enables HDR fusion, RGB-IR separation, and advanced noise reduction.
  - Capable of 4K @ 60 FPS encoding/decoding.
- 2D/3D GPU Acceleration
  - Arm® Mali 3D GPU with support for OpenGL® ES 3.2, Vulkan® 1.2, and OpenCL™ 3.0 handles OpenGL/OpenCL-accelerated pre-processing such as scaling and color conversion.
  - 2D GPU optimized for dewarping, lens correction, and display composition.
- AI Inferencing with the Neutron NPU
  - Seamless integration with TensorFlow Lite, ONNX, and PyTorch via NXP’s eIQ Toolkit.
  - Supports NNStreamer for building end-to-end vision AI pipelines.
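The stages above map naturally onto an NNStreamer pipeline description: capture, convert and scale, then inference. The sketch below only assembles such a description as a string, in the form accepted by `gst-launch-1.0`; it does not run GStreamer. The element names (`v4l2src`, `videoconvert`, `videoscale`, `tensor_converter`, `tensor_filter`, `tensor_sink`) are standard GStreamer/NNStreamer elements, but the device node, model file, input size, and delegate library name are assumptions for illustration, and the `custom=` delegate option should be checked against the NNStreamer and NXP BSP documentation for your release. `build_pipeline` is a hypothetical helper.

```python
from typing import Optional


def build_pipeline(device: str, model: str, width: int, height: int,
                   delegate_lib: Optional[str] = None) -> str:
    """Return a gst-launch-1.0 style description: camera -> resize -> TFLite."""
    tensor_filter = f"tensor_filter framework=tensorflow-lite model={model}"
    if delegate_lib:
        # Assumed syntax for offloading to an external TFLite delegate;
        # verify the option string and library name for your BSP.
        tensor_filter += f" custom=Delegate:External,ExtDelegateLib:{delegate_lib}"
    stages = [
        f"v4l2src device={device}",            # camera capture
        "videoconvert",                        # color conversion
        "videoscale",                          # resize to the model input size
        f"video/x-raw,format=RGB,width={width},height={height}",
        "tensor_converter",                    # video frames -> tensors
        tensor_filter,                         # model inference
        "tensor_sink",                         # hand results to the application
    ]
    return " ! ".join(stages)


if __name__ == "__main__":
    # All arguments below are placeholder values, not tested settings.
    print(build_pipeline("/dev/video0", "model.tflite", 224, 224,
                         delegate_lib="libneutron_delegate.so"))
```

On a real system the conversion and scaling stages would typically be swapped for hardware-accelerated elements where the BSP provides them; the generic elements are used here only to keep the sketch portable.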
NXP provides a comprehensive software toolkit to help developers optimize and deploy AI models efficiently.
- eIQ Toolkit – Converts TensorFlow, PyTorch, and ONNX models to TensorFlow Lite for NPU deployment.
- Neutron Converter – Enables offline and on-device model compilation.
- OpenCV & OpenVX Acceleration – Hardware-optimized libraries for vision processing.
- Camera Tuning Tools – Support for pre-tuned camera modules and custom ISP configurations.
The i.MX 95 is not just another AI-enabled processor; it’s a holistic solution for high-performance, power-efficient edge AI computing. By combining:
- A custom NPU optimized for real-world efficiency
- A versatile vision pipeline with ISP and GPU acceleration, and
- A mature software ecosystem
NXP has created a platform that reduces development time while maximizing AI performance.
For engineers working on autonomous systems, industrial automation, or smart cameras, the NXP i.MX 95 offers a compelling blend of performance, efficiency, and ease of deployment.
- Watch the full webinar on NXP’s website
- Explore the eIQ Toolkit for AI model optimization
- Get hands-on with the i.MX 95 today
What are your thoughts on the future of Edge AI processors? Let us know in the comments!
Want to discuss your project or explore a potential collaboration? Reach out to us.
Mohamed Raad, Hardware Product Manager, Toradex







