Deploy YOLOX Object Detection and Tracking System on Qualcomm Hexagon NPU

Since R2026a

This example demonstrates how to deploy a YOLOX-based real-time object detection and tracking system on Qualcomm® Hexagon® NPU using the Qualcomm AI Engine Direct (QNN) SDK. You configure the target, generate code, and run processor-in-the-loop (PIL) execution for inference and tracking. This example uses a Qualcomm Snapdragon® SoC with Hexagon NPU as the target hardware.

Required Hardware

  • Snapdragon 8 Elite mobile platform (SM8750)

  • A webcam connected to the host PC

Object Detection and Tracking Application Overview

In this example, you deploy an object detection and tracking MATLAB® application to the CPU and NPU resources on the Snapdragon platform.

  • Host PC: The host PC captures live frames from the webcam, preprocesses them to match the YOLOX input size, and transfers the formatted data to the Snapdragon target executable using the PIL interface over ADB. After the target completes inference, post-processing, and tracking, the host PC receives the results, overlays bounding boxes, labels, track IDs, and stage-level FPS information, and displays the output in real time. The PIL framework automatically handles all communication and data transfer between the host PC and the target.

  • Application Processor (AP): The AP handles pre-processing, object tracking, and post-processing tasks such as non-maximum suppression (NMS) and image scaling. The AP accelerates these operations using the ARM Cortex-A CMSIS code replacement library (CRL) and Neon v7 single instruction, multiple data (SIMD) instructions.

  • Hexagon NPU (HTP): The Hexagon HTP/NPU executes the deep neural network inference.

The webcam connected to the host machine captures the video frames, which the host resizes to 640×640×3 to match the YOLOX model input. The PIL object uses the resized frames to detect and track objects.

The PIL executable runs on the target and calls the NPU using the qnn.HTP System object to obtain raw detections. To produce valid detections that contain bounding boxes, class indices, and labels, the PIL application decodes the raw detections and then applies non-maximum suppression (NMS). The PIL application scales the outputs back to the original image aspect ratio. Finally, the GNN tracker tracks the detections and returns the results as the output of the PIL object.

The host machine overlays the tracked detections on the original input image and updates them for every frame. The host measures the execution time at each stage and displays frames per second (FPS). The host reports three key metrics: inference FPS (fpsI), detection FPS (fpsID), and tracking FPS (fpsIDT), which together represent end-to-end system efficiency.
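The target-side pipeline described above can be summarized in a short sketch. This is an illustrative outline only, not the shipped yoloxTrack implementation: the qnn.HTP construction arguments, the helper functions yoloxPostProcess and assembleDetections, and the trackerGNN configuration are all assumptions made for this sketch.

```matlab
function [tracks, fpsI, fpsID, fpsIDT] = yoloxTrackSketch(hostModel, ctxBin, frame, info, timeStamp)
% Illustrative sketch of the target-side detect-and-track pipeline.
persistent npu tracker
if isempty(npu)
    npu = qnn.HTP(hostModel, ctxBin);   % NPU inference via QNN (assumed arguments)
    tracker = trackerGNN;               % GNN-based tracker (assumed configuration)
end
t0 = tic;
raw = npu(single(frame));               % raw YOLOX head outputs from the HTP
tI = toc(t0);
% Decode raw outputs and apply NMS on the AP (assumed helper)
[bboxes, scores, labels] = yoloxPostProcess(raw, info);
tID = toc(t0);
% Convert detections and update the tracker (assumed helper)
tracks = tracker(assembleDetections(bboxes, scores, labels, timeStamp), timeStamp);
tIDT = toc(t0);
% Cumulative stage timings reported back to the host
fpsI = 1/tI; fpsID = 1/tID; fpsIDT = 1/tIDT;
end
```

Each FPS value is cumulative, measured from the start of the frame, which is why the host later subtracts reciprocals to recover per-stage times.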

Set Up Environment and Initialize Model Metadata

Before you configure the PIL execution path or invoke the NPU, prepare the MATLAB environment. Load the required resources and initialize the model metadata. Add the helper folders for preprocessing, post-processing, and model utilities to the MATLAB path.

addpath(fullfile(pwd,"helperFunctions"));

The model metadata structure initializes the webcam settings, preprocessing parameters, and YOLOX class labels required for the application.

% YOLOX Model Metadata
info = struct;
info.TargetSize    = [640 640];
info.PadValue      = 0;
info.ClassNames    = yoloxClassNames();
info.Scale         = 0.3333;
info.Threshold     = 0.25;
info.NMSThreshold  = 0.50;
info.MultiClassNMS = true;
info.MinSize       = 1;
info.MaxSize       = 2000;
clear yoloxInference   % Clears persistent state of inference function
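The TargetSize and PadValue fields drive the letterbox-style preprocessing performed by customImageResize. A minimal sketch of such a resize, assuming aspect-ratio-preserving scaling with constant padding and top-left placement (the actual helper may differ):

```matlab
function out = letterboxResize(img, info)
% Resize while preserving aspect ratio, then pad to TargetSize (sketch).
[h, w, c] = size(img);
scale = min(info.TargetSize ./ [h w]);        % fit image inside the target box
newSize = floor([h w] * scale);
resized = imresize(img, newSize);
% Fill the canvas with PadValue, then place the resized image (assumed top-left)
out = repmat(cast(info.PadValue, 'like', img), [info.TargetSize c]);
out(1:newSize(1), 1:newSize(2), :) = resized;
end
```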

Make sure that the YOLOX network is in QNN format, which includes a shared library (.dll) for the host and a context binary (.bin) for the target. This example provides both files. To learn more about converting models to QNN format, see the example Deploy Deep Neural Networks to Qualcomm Targets Using Qualcomm AI Engine Direct (QNN).

qnnHostModel = fullfile('model_libs','x64','qnn_yolo_onnx_smallCoco.dll');
qnnTargetContextBinary = fullfile('model_libs','qnn_yolo_onnx_smallCoco.bin');

Select Target and Configure Coder Object

Configure the coder settings to target the CPU host and enable the required acceleration libraries for CPU-side code. These settings map high-level MATLAB operations in the generated C/C++ code to low-level, hardware-optimized instructions on the Qualcomm Snapdragon platform.

Identify the target Android device used for PIL execution.

deviceID = '52ed31ad'; % Unique Android Device ID (USB)

Create a code generation configuration object for the target board. Set the verification mode to PIL.

cfg = coder.config('lib','ecoder',true);
cfg.VerificationMode = "PIL";

Specify the target hardware board as Qualcomm Android Board. The target board communicates with MATLAB using a USB-based Android debug bridge (ADB). Set the ADBInterface property to 'USB'. To establish communication between MATLAB and the target board, assign the device ID to the ADBDeviceID property.

cfg.Hardware = coder.Hardware('Qualcomm Android Board');
cfg.Hardware.ADBInterface = 'USB';
cfg.Hardware.ADBDeviceID = deviceID;

To achieve optimal runtime performance, combine these two complementary acceleration mechanisms:

  • The ARM Cortex-A CMSIS CRL replaces standard C library calls with highly efficient, hand-optimized, low-level routines for the CPU.

  • Neon v7 Instruction Set Extensions (ISE) enable SIMD parallelism, allowing the CPU to execute multiple floating-point operations per cycle.

These settings maximize both scalar and vector-level performance and significantly improve the speed of CPU-side tasks such as NMS, Kalman filtering, and coordinate transformations in the tracking pipeline. FMA support improves floating-point throughput by combining multiplication and addition into a single instruction.

cfg.CodeReplacementLibrary = 'ARM Cortex-A CMSIS';
cfg.OptimizeReductions = true;
cfg.InstructionSetExtensions = "Neon v7";
cfg.InstructionSetExtensionsConfig.FMA = true;

To evaluate the performance, enable profiling and reporting options. Use the configuration flag cfg.GenerateCodeReplacementReport to confirm that the compiler applies ARM CMSIS CRL functions during code generation.

cfg.GenCodeOnly = false;
cfg.GenerateReport = true;
cfg.GenerateCodeReplacementReport = true;

The codegen command compiles the YOLOX tracking function into a PIL-compatible executable, links it with the QNN runtime, and deploys it to the target board over the USB connection. Once generated, the application runs on the target board for PIL testing.

codegen -config cfg yoloxTrack.m -args {coder.Constant(qnnHostModel), coder.Constant(qnnTargetContextBinary), zeros(640,640,3,'uint8'), coder.Constant(info), 0}

Generate and Deploy PIL Code to Target

After you compile and deploy the application to the Qualcomm target device, the system performs real-time object detection and tracking by continuously exchanging data between the MATLAB host PC and the target over the USB-based PIL connection. The MATLAB host PC captures video, preprocesses frames, and handles visualization. The target runs deep learning inference on the Hexagon NPU using the compiled QNN model and performs tracking on the application processor.

Initialize Webcam and Video Player

Initialize the webcam and video player on the host PC to capture live video frames and display the processed output with object overlays.

wCam = webcam(1);
wCam.Resolution = '1920x1080'; % Use native camera resolution
vidPlayer = vision.VideoPlayer;

Before you start the execution, clear any existing instances of the PIL-generated functions. This action ensures that the target device starts in a clean state, without residual data or persistent variables from previous runs.

clear yoloxTrack
clear yoloxGNNTrack

Make the video player window visible and initialize key runtime parameters.

show(vidPlayer);
fpsPIL = 0;
timeStamp = 0;

Initialize PIL Function on Target

Initialize the PIL function on the target by calling it once with a dummy frame. This step sets up the inference engine and QNN runtime environment, loads the compiled QNN shared library (.dll), and allocates communication buffers for data exchange between MATLAB and the target.

yoloxTrack_pil(qnnHostModel, qnnTargetContextBinary, zeros(640,640,3,'uint8'), info, 0);
disp('### Starting Object Detection and Tracking Application on Target Hardware...');

Main Tracking Loop

The system enters the main tracking loop to continuously process frames. In each iteration, capture a frame from the webcam, resize it to the model's input resolution (640×640), and send it to the target device for inference. The target runs deep learning inference on the Qualcomm Hexagon NPU using the compiled QNN model and tracking on the application processor, while MATLAB measures end-to-end latency and updates the real-time FPS display.

Each frame follows this pipeline: acquisition, preprocessing, inference and tracking, and postprocessing with visualization. During postprocessing, overlay bounding boxes, class labels, and performance metrics on the output frame for real-time display.

while isOpen(vidPlayer)
    % Read the next frame from the video (RAW frame)
    rawFrame = snapshot(wCam);

    % Resize the raw frame to the model's target resolution (640x640)
    resizedFrame = customImageResize(rawFrame, info);
    timeStamp = timeStamp + 1;
    % --- Inference and Tracking on Target ---
    tic;
    [tracks, fpsI, fpsID, fpsIDT] = yoloxTrack_pil(qnnHostModel, qnnTargetContextBinary, resizedFrame, info, timeStamp);
    newt = toc;
    fpsPIL = .9*fpsPIL + .1*(1/newt); % Update the host-side FPS
    % Display tracking results on the frame
    frameOverlay = insertTracksToFrame(rawFrame, tracks, info.ClassNames);

    % Insert the FPS text onto the frame
    frameOverlay = insertText(frameOverlay, [1 1], ...
        "Inference FPS = " + num2str(fpsI) + ", time (s)= " + num2str(1/fpsI), ...
        "FontSize", 20, "FontColor", "white", "TextBoxColor", "black");

    frameOverlay = insertText(frameOverlay, [1 55], ...
        "Detection FPS = " + num2str(fpsID) + ", time (s)= " + num2str((1/fpsID) -(1/fpsI)), ...
        "FontSize", 20, "FontColor", "white", "TextBoxColor", "black");

    frameOverlay = insertText(frameOverlay, [1 110], ...
        "Tracking FPS = " + num2str(fpsIDT) + ", time (s)= " + num2str((1/fpsIDT) -(1/fpsID)), ...
        "FontSize", 20, "FontColor", "white", "TextBoxColor", "black");

    frameOverlay = insertText(frameOverlay, [1 165], ...
        "PIL Demo FPS = " + num2str(fpsPIL) + ", time (s)= " + num2str((1/fpsPIL) -(1/fpsIDT)), ...
        "FontSize", 20, "FontColor", "white", "TextBoxColor", "black");

    % Display the annotated frame in the video player
    step(vidPlayer, frameOverlay);
end
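The stage times overlaid on each frame are derived from the cumulative FPS values: because fpsID is measured over inference plus detection post-processing, the detection-only time is 1/fpsID - 1/fpsI, and the tracking-only time follows the same pattern. For example, with assumed FPS values:

```matlab
% Example: cumulative FPS values reported by the target (assumed numbers)
fpsI   = 50;                   % inference only         -> 20 ms per frame
fpsID  = 40;                   % + detection post-proc  -> 25 ms per frame
fpsIDT = 32;                   % + tracking             -> 31.25 ms per frame

tDetect = 1/fpsID - 1/fpsI;    % time spent in NMS and scaling: 5 ms
tTrack  = 1/fpsIDT - 1/fpsID;  % time spent in the tracker:     6.25 ms
```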

The while loop continues until you close the video player window. This maintains continuous processing of incoming frames and keeps MATLAB (host) synchronized with the Qualcomm target (PIL executable).

When the loop exits, release all system resources. Terminate the webcam connection, stop video playback, and clear any persistent functions associated with the target application.

release(vidPlayer)
clear wCam
