Main Content

Running Convolution-Only Networks by using FPGA Deployment

To understand and debug convolutional networks, running and visualizing data is a useful tool.This example shows how to deploy, run, and debug a convolution-only network by using FPGA deployment.


  • Xilinx Zynq ZCU102 Evaluation Kit

  • Deep Learning HDL Toolbox™ Support Package for Xilinx FPGA and SoC

  • Deep Learning Toolbox™

  • Deep Learning HDL Toolbox™

  • Deep Learning Toolbox™ Model for Resnet-50 Network

Resnet-50 Network

ResNet-50 is a convolutional neural network that is 50 layers deep. This pretrained network can classify images into 1000 object categories (such as keyboard, mouse, pencil, and more).The network has learned rich feature representations for a wide range of images. The network has an image input size of 224-by-224.

Load Resnet-50 Network

Load the ResNet-50 network.

rnet = resnet50;

To visualize the structure of the Resnet-50 network, at the MATLAB command prompt, enter:


Create Subset of Resnet-50 Network

To examine the outputs of the max_pooling2d_1 layer, create this network which is a subset of the ResNet-50 network:

layers = rnet.Layers(1:5);
outLayer = regressionLayer('Name','output');
layers(end+1) = outLayer;

snet = assembleNetwork(layers);

Create Target Object

Create a target object with a custom name and an interface to connect your target device to the host computer. Interface options are JTAG and Ethernet. To use JTAG, install Xilinx™ Vivado™ Design Suite 2019.2. To set the Xilinx Vivado toolpath, enter:

%hdlsetuptoolpath('ToolName', 'Xilinx Vivado', 'ToolPath', 'D:/share/apps/HDLTools/Vivado/2019.2-mw-0/Win/Vivado/2019.2\bin\vivado.bat');
hTarget = dlhdl.Target('Xilinx','Interface','Ethernet');

Create Workflow Object

Create an object of the dlhdl.Workflow class. When you create the object, specify the network and the bitstream name. Specify the saved pretrained ResNet-50 subset network, snet, as the network. Make sure that the bitstream name matches the data type and the FPGA board that you are targeting. In this example the target FPGA board is the Xilinx ZCU102 SOC board. The bitstream uses a single data type.

hW = dlhdl.Workflow('network', snet, 'Bitstream', 'zcu102_single','Target',hTarget);

Compile Modified Resnet-50 Series Network

To compile the modified ResNet-50 series network, run the compile function of the dlhdl.Workflow object.


dn = hW.compile
### Optimizing series network: Fused 'nnet.cnn.layer.BatchNormalizationLayer' into 'nnet.cnn.layer.Convolution2DLayer'
          offset_name          offset_address    allocated_space 
    _______________________    ______________    ________________

    "InputDataOffset"           "0x00000000"     "24.0 MB"       
    "OutputResultOffset"        "0x01800000"     "24.0 MB"       
    "SystemBufferOffset"        "0x03000000"     "28.0 MB"       
    "InstructionDataOffset"     "0x04c00000"     "4.0 MB"        
    "ConvWeightDataOffset"      "0x05000000"     "4.0 MB"        
    "EndOffset"                 "0x05400000"     "Total: 84.0 MB"
dn = struct with fields:
       Operators: [1×1 struct]
    LayerConfigs: [1×1 struct]
      NetConfigs: [1×1 struct]

Program Bitstream onto FPGA and Download Network Weights

To deploy the network on the Xilinx ZCU102 hardware, run the deploy function of the dlhdl.Workflow object. This function uses the output of the compile function to program the FPGA board by using the programming file. It also downloads the network weights and biases. The deploy function programs the FPGA device, displays progress messages, and the time it takes to deploy the network.

### FPGA bitstream programming has been skipped as the same bitstream is already loaded on the target FPGA.
### Deep learning network programming has been skipped as the same network is already loaded on the target FPGA.

Load Example Image

Load and display an image to use as an input image to the series network.

I = imread('daisy.jpg');

Run the Prediction

Execute the predict function of the dlhdl.Workflow object.

[P, speed] = hW.predict(single(I),'Profile','on');
### Finished writing input activations.
### Running single input activations.
              Deep Learning Processor Profiler Performance Results

                   LastLayerLatency(cycles)   LastLayerLatency(seconds)       FramesNum      Total Latency     Frames/s
                         -------------             -------------              ---------        ---------       ---------
Network                    2813005                  0.01279                       1            2813015             78.2
    conv_module            2813005                  0.01279 
        conv1              2224168                  0.01011 
        max_pooling2d_1     588864                  0.00268 
 * The clock frequency of the DL processor is: 220MHz

The result data is returned as a 3-D array, with the third dimension indexing across the 64 feature images.

sz = size(P)
sz = 1×3

    56    56    64

To visualize all 64 features in a single image, the data is reshaped into 4 dimensions, which is appropriate input to the imtile function

R = reshape(P, [sz(1) sz(2) 1 sz(3)]);
sz = size(R)
sz = 1×4

    56    56     1    64

The input to imtile is normalized using mat2gray. All values are scaled so that the minimum activation is 0 and the maximum activation is 1.

J = imtile(mat2gray(R), 'GridSize', [8 8]);

To show these activations by using the imtile function, reshape the array to 4-D. The third dimension in the input to imtile represents the image color. Set the third dimension to size 1 because the activations do not have color. The fourth dimension indexes the channel. A gride size of 8x8 is selected because there are 64 features to display.


Bright features indicate a strong activation. To understand and debug convolutional networks, running and visualizing data is a useful tool.