Debug YOLO v2 Vehicle Detector on FPGA

This example shows how to debug hardware by visualizing signals from a vehicle detector design deployed on the Xilinx® Zynq® UltraScale+™ MPSoC ZCU102 board. You use the FPGA data capture and AXI manager features of the HDL Verifier™ Support Package for Xilinx FPGA Boards software to set triggers and capture the signals of interest. The Deploy and Verify YOLO v2 Vehicle Detector on FPGA example shows how to deploy a vehicle detector design on an FPGA. In this example, you integrate the FPGA data capture and AXI manager features into that design to debug and visualize its functionality.

Introduction

Debugging designs, especially those deployed to the FPGA, can be a difficult task without a proper set of tools. FPGA data capture and AXI manager offer many capabilities to easily debug designs deployed to an FPGA. In this example, you focus on the Preprocessing module of the design. You analyze several scenarios where proper debugging is required to ensure the application behaves correctly. The scenarios are:

  • Handshaking between the Preprocessing DUT and deep learning (DL) IP core. This scenario shows how to use FPGA data capture and AXI manager features to visualize the handshaking events between the Preprocessing DUT and the DL IP in the Logic Analyzer (DSP System Toolbox). You use FPGA data capture to tap the handshaking signals between the Preprocessing DUT and the DL IP from the FPGA.

  • Functionality of the Resize Subsystem. This scenario shows how to add debug hooks to the model and use them for debugging and verification.

  • Handshaking between the Preprocessing DUT and DDR memory. This scenario shows how to visualize the handshaking events between the Preprocessing DUT and the DDR memory in the Logic Analyzer. You use FPGA data capture to tap the handshaking signals between the Preprocessing DUT and the DDR memory from the FPGA.

Add Debug Hooks and Test Points in Model

To capture signal data using FPGA data capture, configure the signal as a test point. For more information, see Configure Signals as Test Points (Simulink). Configure all the signals described in this section as test points. Use the Bus Selector (Simulink) block to extract signals from a bus and then add test points. To calculate the valid pixel flow through the Resize subsystem, add debugging logic using counters within the YOLOv2PreprocessAlgorithm model. Use the helperConfigAndAddTestPoints function to automate the process of adding the counters and test points to the YOLOv2PreprocessAlgorithm and DLHandshakeLogicExtMem models. The helperConfigAndAddTestPoints function creates four models: YOLOv2PreprocessTbDebug, YOLOv2PreprocessDUTDebug, YOLOv2PreprocessAlgoDebug, and DLHandshakeLogicDebug. These four models contain all the required test points and debug hooks.

This figure shows the signals that are configured as test points in the YOLOv2PreprocessAlgoDebug model.

This figure shows the signals that are configured as test points in the DLHandshakeLogicDebug model.

Use the Simulink.BlockDiagram.arrangeSystem (Simulink) function to improve the layout of the model.
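As a rough illustration of what marking a signal as a test point looks like programmatically, the following sketch builds a small scratch model, sets the TestPoint property on an output port, and then tidies the layout. The model name tpSketchModel and the block names Src and Sink are hypothetical and exist only for this sketch; the helper function in this example may work differently.

```matlab
% Hypothetical scratch model used only for this sketch.
mdl = 'tpSketchModel';
load_system('simulink');                 % make the built-in block library available
new_system(mdl);
add_block('simulink/Sources/Constant',[mdl '/Src']);
add_block('simulink/Sinks/Terminator',[mdl '/Sink']);
add_line(mdl,'Src/1','Sink/1');

% Mark the signal leaving Src as a test point through its output port handle.
ph = get_param([mdl '/Src'],'PortHandles');
set_param(ph.Outport(1),'TestPoint','on');
isTestPoint = get_param(ph.Outport(1),'TestPoint');

% Tidy the block layout, then discard the scratch model without saving.
Simulink.BlockDiagram.arrangeSystem(mdl);
close_system(mdl,0);
```

In a real design, you would apply the same set_param call to the output ports of the blocks that drive the signals of interest, such as the outputs of a Bus Selector block.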

Integrate FPGA Data Capture and AXI Manager in HDL Workflow Advisor

To generate IP core files for a DL processor, follow the steps in the Configure Deep Learning Processor and Generate IP Core section of the Deploy and Verify YOLO v2 Vehicle Detector on FPGA example. Use the helperUpdateHDLWorkflowAdvisor function to automate the process of configuring the HDL Workflow Advisor settings and generating the bitstream. You must provide the complete path to the DL IP core files. Set the buffer size for FPGA data capture IP to 16384 and the maximum sequence depth to 7.

pathToDLIPFiles = 'F:\dlhdl_prj\ipcore\dlprocessor_v1_0';
modelWithTestPoints = {'YOLOv2PreprocessTbDebug','YOLOv2PreprocessDUTDebug','YOLOv2PreprocessAlgoDebug','DLHandshakeLogicDebug'};
helperUpdateHDLWorkflowAdvisor(pathToDLIPFiles,modelWithTestPoints,'16384','7')

Follow these steps to perform this task manually.

  1. Start the targeting workflow by right-clicking the YOLO v2 Preprocess DUT subsystem in the YOLOv2PreprocessTbDebug model and selecting HDL Code > HDL Workflow Advisor.

  2. In step 1.1, select IP Core Generation and set Target platform to Xilinx Zynq UltraScale+ MPSoC ZCU102 Evaluation Kit.

  3. In step 1.2, set Reference design to Deep Learning with Preprocessing Interface. The DL Processor IP name and the DL Processor IP location fields specify the name and location of the generated deep learning processor IP core, respectively. These details are fetched from the IP core report. Set Insert AXI manager to JTAG.

  4. In step 1.3, select the Enable HDL DUT output port generation for test points setting to update the interface table with all the test points as output ports for the generated DUT. Map the target platform interfaces to the input and output ports of the DUT. For the required interface mapping, see step 1.3 in the Generate and Deploy Bitstream to FPGA section of the Deploy and Verify YOLO v2 Vehicle Detector on FPGA example. This table shows the interface mapping for test points. To capture and visualize the trigger signals in the Logic Analyzer, map the trigger signals to Trigger and Data instead of Trigger. For more information, see Use As (HDL Verifier Support Package for Xilinx FPGA Boards).

  5. Perform steps 1.4 to 3.1 as shown in the Generate and Deploy Bitstream to FPGA section of the Deploy and Verify YOLO v2 Vehicle Detector on FPGA example.

  6. In step 3.2, set FPGA data capture buffer size to 16384 and FPGA data capture maximum sequence depth to 7. Select Include capture condition logic in FPGA data capture to enable the capture control logic option in the generated FPGA data capture component.

  7. In step 4.3, generate the bitstream. The HDL Workflow Advisor generates the block_design_wrapper.bit bitstream file in the hdl_prj\vivado_ip_prj\vivado_prj.runs\impl_1 folder.

Handshaking Between Preprocessing DUT and Deep Learning IP Core

The DL IP core expects the preprocessed data to be at a specific address in the DDR memory and to have a specific size. The handshaking between the Preprocessing DUT and the DL IP core is to convey the expected address and size to the Preprocessing DUT. The handshaking comprises these steps:

  1. The Preprocessing DUT drives the rd_addr, rd_len, and rd_avalid control signals in the AXIReadCtrlOutDL bus.

  2. The DL IP core samples these control signals and responds to the Preprocessing DUT by sending the data at the rd_addr location through the AXIReadDataDL signal. The DL IP core also drives the corresponding control signals, rd_dvalid and rd_aready, in the AXIReadCtrlInDL bus.

  3. This process continues for three different addresses corresponding to InputValid (x"354"), InputAddr (x"358"), and InputSize (x"35C") signals. The IP core generation report for the DL IP contains the addresses for these registers.
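The three register addresses listed above can be tabulated in MATLAB to make the read sequence easier to follow. This is only a sketch: it treats the hexadecimal values as offsets taken from the text, and the observation that they are spaced 4 bytes apart (consistent with consecutive 32-bit registers) is an inference from those values, not a statement from the IP core generation report.

```matlab
% Register names and hexadecimal addresses quoted in the handshaking steps.
regNames   = {'InputValid','InputAddr','InputSize'};
regOffsets = hex2dec({'354','358','35C'});   % column vector of decimal values

% Spacing between successive registers, in bytes.
spacing = diff(regOffsets);

% List the order in which the Preprocessing DUT reads the registers.
for k = 1:numel(regNames)
    fprintf('Read %d: %-10s at offset 0x%s\n', ...
        k, regNames{k}, dec2hex(regOffsets(k)));
end
```

When you inspect the captured rd_addr signal in the Logic Analyzer, you can compare its successive values against this list.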

Signals Required for Debugging

The DLHandshakeLogicExtMem model contains these signals.

  • rd_addr --- Address location in the DL IP from which the Preprocessing DUT fetches the required information during handshaking.

  • rd_len --- Size of data, in bytes, to read from the DL IP starting from the rd_addr address location.

  • rd_avalid --- Indication of whether the data in the rd_addr and rd_len signals of the same bus is valid.

  • Data_From_DL --- Data that the DL IP returns to the Preprocessing DUT in response to the control information it receives in the AXIReadCtrlOutDL bus.

  • rd_dvalid --- Control signal that forms part of the AXIReadCtrlInDL bus. This signal validates the data in the AXIReadDataDL signal.

  • inputAddr_from_DL --- Output of the Read DL Registers subsystem. The Preprocessing DUT places the preprocessed data in the DDR memory at this address.

  • inputSize_from_DL --- Output of the Read DL Registers subsystem. This output is the size of the data that the Preprocessing DUT places in the DDR memory.

  • inputValid_from_DL --- Output of the Read DL Registers subsystem. This signal validates the data in the inputAddr_from_DL and inputSize_from_DL signals.

Timing Diagram

This timing diagram shows the sequence of events for this scenario.

Trigger Conditions in FPGA Data Capture

A successful handshake between the Preprocessing DUT and the DL IP comprises seven events. These events act as sequential triggers in the FPGA Data Capture tool to capture the data.

Configure these settings in the FPGA Data Capture tool:

  • Set Number of capture windows to 1 to indicate that handshaking events happen only at the beginning of preprocessing. The signal data corresponding to the entire sample depth can be captured in a single window once these trigger conditions are satisfied.

  • Set Number of trigger stages to 7 to indicate that the handshaking comprises seven events.

  • Set Trigger position to a small nonzero value. If you set this option to 0, you cannot visualize these events because the tool captures signal data only after the trigger.

  • Repeat the Trigger Stage 1 and Trigger Stage 2 sequences three times.

  • Use a trigger time out to ensure that Trigger Stage 7 happens within one clock cycle of Trigger Stage 6. Trigger Stage 7 corresponds to a rising edge on the inputValid_from_DL signal.

  • Set Capture mode to On Trigger.
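The seven-stage sequence described in the settings above can be modeled offline on synthetic signals. The sketch below is a simplified software model, not the behavior of the FPGA Data Capture IP. It assumes that the odd stages fire on a rising edge of rd_avalid and the even stages on a rising edge of rd_dvalid, which the text does not state, and it only checks the one-cycle time-out after the fact instead of resetting the sequence. All cycle numbers are made up for illustration.

```matlab
% Synthetic one-bit signals over 40 clock cycles (hypothetical timing).
n = 40;
rd_avalid  = false(1,n);  rd_avalid([5 15 25]) = true;
rd_dvalid  = false(1,n);  rd_dvalid([8 18 28]) = true;
inputValid = false(1,n);  inputValid(29:end)   = true;

% Rising-edge detector: true on the cycle where the signal goes 0 -> 1.
rising = @(x) [false, x(2:end) & ~x(1:end-1)];

% Assumed stage conditions: (address valid, data valid) repeated three
% times, followed by a rising edge on the input-valid flag.
stageCond = [repmat({rising(rd_avalid), rising(rd_dvalid)},1,3), ...
             {rising(inputValid)}];
numStages = numel(stageCond);

% Walk through the cycles and advance one stage each time the condition
% for the current stage is met.
stage = 1;
hitCycle = zeros(1,numStages);
for k = 1:n
    if stage <= numStages && stageCond{stage}(k)
        hitCycle(stage) = k;
        stage = stage + 1;
    end
end

triggered     = stage > numStages;                 % all seven stages seen
withinTimeout = (hitCycle(7) - hitCycle(6)) <= 1;  % stage 7 right after stage 6
```

With these synthetic signals, hitCycle records the cycle at which each stage fires, which is the same ordering you look for in the captured waveform.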

Visualize Captured Data in Logic Analyzer

This timing diagram shows that the handshaking between the Preprocessing DUT and the DL IP behaves as expected.

Functionality of Resize Subsystem

In this scenario, the focus is to verify the behavior of the Resize subsystem. The input image to the Resize subsystem is of size 224-by-340 (76,160 pixels). The output image of the Resize subsystem is of size 128-by-128 (16,384 pixels). You can use the FPGA data capture feature to count the total number of output pixels from the Resize subsystem and capture the resized image data to find any errors within the logic. Simulink™ does not support renaming of the output of a Bus Selector block. To rename the signal, use the model components contained in the green boxes in this image.
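The pixel counts quoted above follow directly from the image dimensions, and the output count matches the FPGA data capture buffer size of 16384 set earlier in this example. The short check below makes that arithmetic explicit. The variable names are illustrative, and the interpretation of 224-by-340 as rows-by-columns is an assumption.

```matlab
% Image dimensions from the text (assumed to be rows-by-columns).
inRows  = 224;  inCols  = 340;
outRows = 128;  outCols = 128;

numInputPixels  = inRows*inCols;     % 76160
numOutputPixels = outRows*outCols;   % 16384

% Buffer size configured for the FPGA data capture IP in this example.
captureBufferSize = 16384;

% One full resized frame fits in a single capture only if every captured
% sample is a valid output pixel.
fitsInOneCapture = numOutputPixels <= captureBufferSize;

% Scale factors applied by the resize operation in each dimension.
rowScale = outRows/inRows;
colScale = outCols/inCols;
```

Because the buffer holds exactly one resized frame, capturing only valid pixels leaves no room for idle cycles in the capture buffer.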

Signals Required for Debugging

The YOLOv2PreprocessAlgoDebug model contains these signals.

  • Input_Pix_Valid --- Control signal that is a part of the pixelcontrol bus input of the Resize subsystem. This signal validates the pixel data in the Inp_Pixel_Data signal.

  • Input_Pix_Cnt --- Output of the HDL Counter block, which counts the number of valid pixels that you pass as input to the Resize subsystem. The model uses the Input_Pix_Valid signal to enable this counter.

  • Resized_Pix_Data --- Output signal of the Resize subsystem. This signal contains the pixel data corresponding to the resized image.

  • Resized_Pix_Valid --- Control signal that is a part of the pixelcontrol bus output of the Resize subsystem. This signal validates the pixel data in the Resized_Pix_Data signal.

  • Resized_Pix_Cnt --- Output of the HDL Counter block, which counts the number of valid pixels returned by the Resize subsystem. The model uses the Resized_Pix_Valid signal to enable this counter.

Timing Diagram

Validate the output pixel data using the Resized_Pix_Valid signal. Whenever this signal goes high, the Resize subsystem sends the valid output data, as this timing diagram shows. The Input_Pix_Cnt and Resized_Pix_Cnt signals indicate the number of valid pixels entering and emerging from the Resize subsystem, respectively.
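The relationship between the valid signals and the two counters can be sketched as a running sum over the valid flags. This is a behavioral approximation on made-up data: the actual HDL Counter blocks may differ in latency, word length, and reset behavior.

```matlab
% Hypothetical valid flags for a few clock cycles.
inputValid   = [0 1 1 1 0 1 1 0 1 1 1 1];
resizedValid = [0 0 0 1 0 0 1 0 0 0 1 0];

% Each counter increments on every cycle in which its enable is high.
Input_Pix_Cnt   = cumsum(inputValid);
Resized_Pix_Cnt = cumsum(resizedValid);

% Final counts: the number of valid pixels that entered and left.
totalIn  = Input_Pix_Cnt(end);    % 9
totalOut = Resized_Pix_Cnt(end);  % 3
```

For a full frame on hardware, you would expect the final counts to match the input and output image sizes, 76,160 and 16,384 pixels respectively.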

Trigger Conditions in FPGA Data Capture

To capture the valid resized pixel data, use the capture condition logic in the FPGA Data Capture tool.

Configure these settings in the FPGA Data Capture tool:

  • Select Enable the capture control logic in the Capture Condition tab.

  • Use the Resized_Pix_Valid signal in the capture condition logic to ensure that the tool captures the data only when this signal goes high.

  • Select Immediately in the capture mode dropdown menu to enable immediate capture. This option is suitable for scenarios in which no specific triggers determine when the tool captures data.

Visualize Captured Data in Logic Analyzer

This timing diagram shows the resized pixel data and the pixel counts captured by the FPGA Data Capture tool. The tp_Resized_Pix_Valid signal is always high, unlike in the equivalent model simulations using Simulink software. This discrepancy occurs because the capture condition makes the FPGA Data Capture tool capture data only when tp_Resized_Pix_Valid is high.
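The effect of the capture condition can be reproduced on synthetic data with logical indexing. The sketch below uses made-up pixel values and is only a software analogy for the capture condition logic in the FPGA.

```matlab
% Hypothetical pixel stream and its valid flag, one entry per clock cycle.
pixData  = uint8([0 0 17 0 42 99 0 7]);
pixValid = logical([0 0 1 0 1 1 0 1]);

% A capture condition on the valid flag stores a sample only on cycles
% where the flag is high.
capturedData  = pixData(pixValid);    % [17 42 99 7]
capturedValid = pixValid(pixValid);   % every stored flag is true
```

The captured valid flag is true for every stored sample, which is why the waveform shows it as constantly high.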

The FPGA Data Capture tool creates the dataCaptureOut structure in the MATLAB workspace after it captures data. Visualize the resized image by extracting and concatenating the RGB image data from dataCaptureOut.

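% Reshape each captured color plane into a 128-by-128 matrix.
% reshape fills columns first, so each plane is transposed when the
% planes are concatenated to restore the row order of the image.
RData = reshape(dataCaptureOut.tp_Resized_Pix_Data_0,128,128); % red
GData = reshape(dataCaptureOut.tp_Resized_Pix_Data_1,128,128); % green
BData = reshape(dataCaptureOut.tp_Resized_Pix_Data_2,128,128); % blue
resizedImage = cat(3,RData',GData',BData');
imshow(resizedImage)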
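The transposes in the code above matter because reshape fills a matrix column by column. Assuming the pixels are captured row by row, which the transposes suggest, a small 2-by-3 example shows the difference. The values here are arbitrary.

```matlab
% A tiny 2-by-3 "image" and the stream produced by sending it row by row.
rows = 2;  cols = 3;
img    = [1 2 3; 4 5 6];
stream = reshape(img',1,[]);          % 1 2 3 4 5 6

% Reshaping directly to rows-by-cols scrambles the pixels.
wrong = reshape(stream,rows,cols);    % [1 3 5; 2 4 6]

% Reshaping to cols-by-rows and transposing recovers the image.
right = reshape(stream,cols,rows)';   % [1 2 3; 4 5 6]
```

For the square 128-by-128 image in this example, the two reshape sizes are identical, so only the transpose is visible in the code.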

Handshaking Between Preprocessing DUT and DDR Memory

After the Preprocessing DUT resizes and normalizes the input image, it places the preprocessed image data in the DDR memory at the address it receives from the DL IP. The handshaking process comprises these steps:

  1. The Preprocessing DUT drives the wr_addr, wr_len, and wr_valid control signals in the AXIWriteCtrlOutDDR bus. The DUT also sends the preprocessed signal data through the AXIWriteDataDDR signal.

  2. The DDR memory samples these control signals and the preprocessed pixel data received from the Preprocessing DUT.

  3. Once all the data is placed in the DDR memory, the DDR memory acknowledges the Preprocessing DUT with a pulse on the wr_complete signal in the AXIWriteCtrlInDDR bus.

Signals Required for Debugging

The DLHandshakeLogicDebug model contains these signals.

  • wr_addr --- Control signal that is a part of the AXIWriteCtrlOutDDR bus. This signal is the address in the DDR memory at which the Preprocessing DUT places the data.

  • wr_len --- Control signal that is a part of the AXIWriteCtrlOutDDR bus. This signal is the size of data, in bytes, that the Preprocessing DUT places in the DDR memory starting from the wr_addr address location.

  • wr_valid --- Control signal that is a part of the AXIWriteCtrlOutDDR bus. This signal validates the data in the wr_addr and wr_len signals of the same bus.

  • wr_complete --- Control signal that is a part of the AXIWriteCtrlInDDR bus. This signal is the acknowledgement sent from the DDR memory to the Preprocessing DUT containing an indication of the status of the data.

  • writeDone --- Output of the Write To DDR subsystem. This signal indicates whether the data transfer to the DDR memory is successful and triggers the DL IP to start reading that data from the DDR memory for further processing.

Timing Diagram

After the final rising edge on the wr_valid control signal occurs, the DDR memory sends a pulse on the wr_complete signal as an acknowledgement, and a pulse follows on the internal writeDone signal. This timing diagram shows the sequence of events for this scenario.

Trigger Conditions in FPGA Data Capture

Configure these settings in the FPGA Data Capture tool:

  • Set Number of capture windows to 1 because these handshaking events happen towards the end of the transaction between the Preprocessing DUT and the DDR memory. After these trigger conditions are satisfied, the signal data corresponding to the entire sample depth can be captured in a single window.

  • Set Number of trigger stages to 2 because this handshake comprises three events, two of which occur simultaneously.

  • Set the Trigger position option to a value close to the end of the handshake to ensure that the Logic Analyzer displays the complete handshake.

  • Set Capture mode to On Trigger.

Trigger Stage 1 corresponds to a rising edge on the wr_valid signal, which the Preprocessing DUT drives to the DDR memory.

Trigger Stage 2 captures the expected pulse on the wr_complete and writeDone signals. This stage uses logical and comparison operators.
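To see why the trigger position matters for this scenario, the two trigger stages can be evaluated on synthetic signals. The sketch below rests on several assumptions that the text does not confirm: the second stage is modeled as wr_complete and writeDone both equal to 1 on the same cycle, the trigger position is treated as the number of samples kept before the trigger cycle, and the window depth of 128 and all cycle numbers are hypothetical.

```matlab
% Synthetic signals over 400 clock cycles (hypothetical timing).
n = 400;
wr_valid = false(1,n);
wr_valid([220:229, 260:269, 300:309]) = true;     % three write bursts
wr_complete = false(1,n);  wr_complete(315) = true;
writeDone   = false(1,n);  writeDone(315)   = true;

% Stage 1: first rising edge on wr_valid.
risingEdge  = [false, wr_valid(2:end) & ~wr_valid(1:end-1)];
stage1Cycle = find(risingEdge,1,'first');

% Stage 2: comparison and logical operators on the two pulses, searched
% only after stage 1 has fired.
stage2Cond  = (wr_complete == 1) & (writeDone == 1);
stage2Cycle = stage1Cycle + find(stage2Cond(stage1Cycle+1:end),1,'first');

% Captured cycle range for a given trigger position P (assumed model).
windowDepth   = 128;
capturedRange = @(P) [stage2Cycle - P, stage2Cycle - P + windowDepth - 1];

lateRange  = capturedRange(120);   % trigger near the end of the window
earlyRange = capturedRange(0);     % trigger at the start of the window

% The write bursts are visible only if the window starts before them.
burstsVisibleLate  = lateRange(1)  <= stage1Cycle;
burstsVisibleEarly = earlyRange(1) <= stage1Cycle;
```

With the trigger position near the end of the window, the captured range covers the write bursts that precede the acknowledgement. With a trigger position of 0, the window starts at the acknowledgement and the bursts are lost.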

Visualize Captured Data in Logic Analyzer

This timing diagram confirms that the handshaking between the Preprocessing DUT and the DDR memory happens as expected.

Use FPGA Data Capture and AXI Manager Features Simultaneously

As described in Design Considerations for Data Capture (HDL Verifier Support Package for Xilinx FPGA Boards), to use the AXI manager and FPGA data capture features simultaneously, set the capture mode of FPGA data capture to nonblocking. Create an FPGADataCapture object in nonblocking mode and launch the FPGA Data Capture tool.

cd(fullfile('hdl_prj','ipcore','YOLOV2Pre_cs_ipv4_v1_0','fpga_data_capture'))
fpgadc = FPGADataCapture;
fpgadc.CaptureMode = 'nonblocking';
launchApp(fpgadc);

You must configure a few registers before sending a video frame as an input to the model. Set the DUTProcStart register of the Preprocessing DUT to 1. You can use AXI manager to do this task. The YOLOv2DeployAndVerifyDetector function that is attached to the Deploy and Verify YOLO v2 Vehicle Detector on FPGA example contains all the steps in the Verify Deployed YOLO v2 Vehicle Detector Using MATLAB section of that example. The YOLOv2DeployAndVerifyDetector function uses the writePort function to configure all the control registers. To use the AXI manager instead of writePort to configure the DUTProcStart register, use the helperUpdateYOLOv2DeployAndVerifyDetector function.

The helperUpdateYOLOv2DeployAndVerifyDetector function creates the DebugYOLOv2VehicleDetector function, which is a modified version of the YOLOv2DeployAndVerifyDetector function that contains an AXI manager object. The helperUpdateYOLOv2DeployAndVerifyDetector function adds this code to the DebugYOLOv2VehicleDetector function, which you can use to access the AXI manager feature.

Create an AXI manager object.

h = aximanager('Xilinx');

Use the writememory function to write 1 into the DUTProcStart register. You can find the address of this register in the IP Core Generation report.

writememory(h, '0xA0040100',1);

Release the JTAG cable resource after writing into the DUTProcStart register to ensure that FPGA data capture can use the same JTAG interface to capture the data.

release(h)

To capture the required data corresponding to different scenarios, configure the FPGA Data Capture tool with the appropriate trigger conditions. This diagram shows the data capture process:

  1. Configure the FPGA Data Capture tool with the trigger conditions and then click the Capture Data button to start the data capture process. The tool captures the data when it observes triggers.

  2. Enter the command DebugYOLOv2VehicleDetector(hSOC) to start the workflow comprising all the steps from configuring the registers to reading back the processed data to MATLAB. Because you start the FPGA Data Capture tool before this step, the FPGA Data Capture tool detects all the events.

The AXI manager configures the DUTProcStart control register while the FPGA Data Capture tool waits for the trigger condition to be satisfied. You can simultaneously use both of these tools to capture all the required data.

Conclusions

In summary, this example shows how to instrument a Simulink model with debug hooks to allow visibility of signals after deploying your design to an FPGA or SoC board. You use AXI manager to configure the control registers in the deployed design from MATLAB and then specify the triggers in the FPGA Data Capture tool for capturing the signals of interest. You analyze the captured data and use the results to debug your application.
