Deploy YAMNet Networks to FPGAs with and without Cross-Layer Equalization
This example shows how to deploy a YAMNet network to an FPGA with and without cross-layer equalization. Cross-layer equalization can improve the accuracy of a quantized network by reducing the variance of the learnable parameters across channels while preserving the original network mapping. You can then compare the accuracies of the quantized network with and without cross-layer equalization.
Prerequisites
Xilinx® Zynq® UltraScale+™ ZCU102 SoC development kit
Deep Learning HDL Toolbox™ Support Package for Xilinx FPGA and SoC
Audio Toolbox™
Deep Learning Toolbox™
Deep Learning HDL Toolbox™
Deep Learning Toolbox Model Quantization Library Support Package
GPU Coder™
GPU Coder Interface for Deep Learning Libraries
Load Pretrained YAMNet Network and Download Data
The YAMNet sound classification network is trained on the AudioSet data set to predict audio events from the AudioSet ontology. This example classifies sounds from air compressors by using a data set of air compressor recordings [1]. The data set is labeled with one healthy state and seven faulty states, for a total of eight classes.
To download and load the pretrained YAMNet network and a set of air compressor sounds, run these commands.
url = 'https://ssd.mathworks.com/supportfiles/audio/YAMNetTransferLearning.zip';
AirCompressorLocation = pwd;
dataFolder = fullfile(AirCompressorLocation,'YAMNetTransferLearning');
if ~exist(dataFolder,'dir')
    disp('Downloading pretrained network ...')
    unzip(url,AirCompressorLocation)
end
Downloading pretrained network ...
addpath(fullfile(AirCompressorLocation,'YAMNetTransferLearning'))
The final element of the Layers property is the classification output layer. The Classes property of this layer contains the names of the classes learned by the network.
load("airCompressorNet.mat");
net = airCompressorNet;
net.Layers(end).Classes
ans = 8×1 categorical
Bearing
Flywheel
Healthy
LIV
LOV
NRV
Piston
Riderbelt
Use the analyzeNetwork function to obtain information about the network layers. The function returns a graphical representation of the network that contains detailed parameter information for every layer in the network.
analyzeNetwork(net)
Create Calibration Data
The YAMNet network accepts inputs of size 96-by-64-by-1. Use the getyamnet_CLEInput helper function to obtain preprocessed mel spectrograms of size 96-by-64-by-1. See Helper Functions. For best quantization results, the calibration data must be representative of actual inputs to the YAMNet network. Expedite the calibration process by reducing the calibration data set to 20 mel spectrograms.
numSpecs = 20; [imOutCal, imdsCal] = getyamnet_CLEInput(numSpecs);
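To confirm that the calibration data looks reasonable, you can display one of the returned mel spectrograms. This is a quick sketch that assumes the first two dimensions of imOutCal are the 96 time frames and 64 mel bands.

figure
imagesc(imOutCal(:,:,1,1))   % first calibration spectrogram
axis xy
xlabel('Mel band')
ylabel('Frame index')
title('Example calibration mel spectrogram')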
Calculate Accuracy of Quantized YAMNet Without Cross-Layer Equalization
Create a YAMNet network that does not use cross-layer equalization, quantize the network, and calculate its accuracy.
Calibrate and Quantize Network
Create a quantized network by using the dlquantizer object. Set the target execution environment to FPGA.
dlQuantObj = dlquantizer(net,ExecutionEnvironment='FPGA');
Use the calibrate function to exercise the network with sample inputs and collect the dynamic ranges of the weights and biases. The function returns a table in which each row contains range information for a learnable parameter of the quantized network.
calibrate(dlQuantObj,imdsCal);
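This call discards the returned table. To review the collected ranges, you can instead capture the output and preview it, for example:

calResults = calibrate(dlQuantObj,imdsCal);   % table of range information per learnable parameter
head(calResults)                              % preview the first few rows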
Deploy the Quantized Network
Define the target FPGA board programming interface by using the dlhdl.Target object. Specify that the interface is for a Xilinx board with an Ethernet interface.
hTarget = dlhdl.Target('Xilinx',Interface='Ethernet');
Prepare the network for deployment by creating a dlhdl.Workflow object. Specify the network and the bitstream name, and ensure that the bitstream matches the data type and the FPGA board. In this example, the target FPGA is the Xilinx ZCU102 SoC board, and the bitstream uses an int8 data type.
hW = dlhdl.Workflow(Network=dlQuantObj,Bitstream='zcu102_int8',Target=hTarget);
Run the compile method of the dlhdl.Workflow object to compile the network and generate the instructions, weights, and biases for deployment. Because the total number of frames exceeds the default value of 30, set the InputFrameNumberLimit name-value argument to 100 to run predictions in chunks of 100 frames and prevent timeouts.
dn = compile(hW,InputFrameNumberLimit=100)
### Compiling network for Deep Learning FPGA prototyping ... ### Targeting FPGA bitstream zcu102_int8. ### Optimizing network: Fused 'nnet.cnn.layer.BatchNormalizationLayer' into 'nnet.cnn.layer.Convolution2DLayer' ### The network includes the following layers: 1 'input_1' Image Input 96×64×1 images (SW Layer) 2 'conv2d' 2-D Convolution 32 3×3×1 convolutions with stride [2 2] and padding 'same' (HW Layer) 3 'activation' ReLU ReLU (HW Layer) 4 'depthwise_conv2d' 2-D Grouped Convolution 32 groups of 1 3×3×1 convolutions with stride [1 1] and padding 'same' (HW Layer) 5 'activation_1' ReLU ReLU (HW Layer) 6 'conv2d_1' 2-D Convolution 64 1×1×32 convolutions with stride [1 1] and padding 'same' (HW Layer) 7 'activation_2' ReLU ReLU (HW Layer) 8 'depthwise_conv2d_1' 2-D Grouped Convolution 64 groups of 1 3×3×1 convolutions with stride [2 2] and padding 'same' (HW Layer) 9 'activation_3' ReLU ReLU (HW Layer) 10 'conv2d_2' 2-D Convolution 128 1×1×64 convolutions with stride [1 1] and padding 'same' (HW Layer) 11 'activation_4' ReLU ReLU (HW Layer) 12 'depthwise_conv2d_2' 2-D Grouped Convolution 128 groups of 1 3×3×1 convolutions with stride [1 1] and padding 'same' (HW Layer) 13 'activation_5' ReLU ReLU (HW Layer) 14 'conv2d_3' 2-D Convolution 128 1×1×128 convolutions with stride [1 1] and padding 'same' (HW Layer) 15 'activation_6' ReLU ReLU (HW Layer) 16 'depthwise_conv2d_3' 2-D Grouped Convolution 128 groups of 1 3×3×1 convolutions with stride [2 2] and padding 'same' (HW Layer) 17 'activation_7' ReLU ReLU (HW Layer) 18 'conv2d_4' 2-D Convolution 256 1×1×128 convolutions with stride [1 1] and padding 'same' (HW Layer) 19 'activation_8' ReLU ReLU (HW Layer) 20 'depthwise_conv2d_4' 2-D Grouped Convolution 256 groups of 1 3×3×1 convolutions with stride [1 1] and padding 'same' (HW Layer) 21 'activation_9' ReLU ReLU (HW Layer) 22 'conv2d_5' 2-D Convolution 256 1×1×256 convolutions with stride [1 1] and padding 'same' (HW Layer) 23 'activation_10' ReLU ReLU (HW Layer) 24 'depthwise_conv2d_5' 2-D Grouped Convolution 256 groups of 1 3×3×1 convolutions with stride [2 2] and padding 'same' (HW Layer) 25 'activation_11' ReLU ReLU (HW Layer) 26 'conv2d_6' 2-D Convolution 512 1×1×256 convolutions with stride [1 1] and padding 'same' (HW Layer) 27 'activation_12' ReLU ReLU (HW Layer) 28 'depthwise_conv2d_6' 2-D Grouped Convolution 512 groups of 1 3×3×1 convolutions with stride [1 1] and padding 'same' (HW Layer) 29 'activation_13' ReLU ReLU (HW Layer) 30 'conv2d_7' 2-D Convolution 512 1×1×512 convolutions with stride [1 1] and padding 'same' (HW Layer) 31 'activation_14' ReLU ReLU (HW Layer) 32 'depthwise_conv2d_7' 2-D Grouped Convolution 512 groups of 1 3×3×1 convolutions with stride [1 1] and padding 'same' (HW Layer) 33 'activation_15' ReLU ReLU (HW Layer) 34 'conv2d_8' 2-D Convolution 512 1×1×512 convolutions with stride [1 1] and padding 'same' (HW Layer) 35 'activation_16' ReLU ReLU (HW Layer) 36 'depthwise_conv2d_8' 2-D Grouped Convolution 512 groups of 1 3×3×1 convolutions with stride [1 1] and padding 'same' (HW Layer) 37 'activation_17' ReLU ReLU (HW Layer) 38 'conv2d_9' 2-D Convolution 512 1×1×512 convolutions with stride [1 1] and padding 'same' (HW Layer) 39 'activation_18' ReLU ReLU (HW Layer) 40 'depthwise_conv2d_9' 2-D Grouped Convolution 512 groups of 1 3×3×1 convolutions with stride [1 1] and padding 'same' (HW Layer) 41 'activation_19' ReLU ReLU (HW Layer) 42 'conv2d_10' 2-D Convolution 512 1×1×512 convolutions with stride [1 1] and padding 'same' (HW Layer) 43 'activation_20' ReLU ReLU 
(HW Layer) 44 'depthwise_conv2d_10' 2-D Grouped Convolution 512 groups of 1 3×3×1 convolutions with stride [1 1] and padding 'same' (HW Layer) 45 'activation_21' ReLU ReLU (HW Layer) 46 'conv2d_11' 2-D Convolution 512 1×1×512 convolutions with stride [1 1] and padding 'same' (HW Layer) 47 'activation_22' ReLU ReLU (HW Layer) 48 'depthwise_conv2d_11' 2-D Grouped Convolution 512 groups of 1 3×3×1 convolutions with stride [2 2] and padding 'same' (HW Layer) 49 'activation_23' ReLU ReLU (HW Layer) 50 'conv2d_12' 2-D Convolution 1024 1×1×512 convolutions with stride [1 1] and padding 'same' (HW Layer) 51 'activation_24' ReLU ReLU (HW Layer) 52 'depthwise_conv2d_12' 2-D Grouped Convolution 1024 groups of 1 3×3×1 convolutions with stride [1 1] and padding 'same' (HW Layer) 53 'activation_25' ReLU ReLU (HW Layer) 54 'conv2d_13' 2-D Convolution 1024 1×1×1024 convolutions with stride [1 1] and padding 'same' (HW Layer) 55 'activation_26' ReLU ReLU (HW Layer) 56 'global_average_pooling2d' 2-D Global Average Pooling 2-D global average pooling (HW Layer) 57 'dense' Fully Connected 8 fully connected layer (HW Layer) 58 'softmax' Softmax softmax (SW Layer) 59 'Sounds' Classification Output crossentropyex with 'Bearing' and 7 other classes (SW Layer) ### Notice: The layer 'input_1' with type 'nnet.cnn.layer.ImageInputLayer' is implemented in software. ### Notice: The layer 'softmax' with type 'nnet.cnn.layer.SoftmaxLayer' is implemented in software. ### Notice: The layer 'Sounds' with type 'nnet.cnn.layer.ClassificationOutputLayer' is implemented in software. ### Compiling layer group: conv2d>>activation_26 ... ### Compiling layer group: conv2d>>activation_26 ... complete. ### Compiling layer group: global_average_pooling2d ... ### Compiling layer group: global_average_pooling2d ... complete. ### Compiling layer group: dense ... ### Compiling layer group: dense ... complete. ### Allocating external memory buffers: offset_name offset_address allocated_space _______________________ ______________ ________________ "InputDataOffset" "0x00000000" "8.0 MB" "OutputResultOffset" "0x00800000" "4.0 MB" "SchedulerDataOffset" "0x00c00000" "4.0 MB" "SystemBufferOffset" "0x01000000" "28.0 MB" "InstructionDataOffset" "0x02c00000" "4.0 MB" "ConvWeightDataOffset" "0x03000000" "32.0 MB" "FCWeightDataOffset" "0x05000000" "4.0 MB" "EndOffset" "0x05400000" "Total: 84.0 MB" ### Network compilation complete.
dn = struct with fields:
weights: [1×1 struct]
instructions: [1×1 struct]
registers: [1×1 struct]
syncInstructions: [1×1 struct]
constantData: {}
ddrInfo: [1×1 struct]
To deploy the network on the Xilinx ZCU102 SoC hardware, run the deploy function of the dlhdl.Workflow object. This function programs the FPGA board with the programming file, downloads the network weights and biases, and displays progress messages and the time required to deploy the network.
deploy(hW)
### Programming FPGA Bitstream using Ethernet...
### Attempting to connect to the hardware board at 192.168.1.101...
### Connection successful
### Programming FPGA device on Xilinx SoC hardware board at 192.168.1.101...
### Copying FPGA programming files to SD card...
### Setting FPGA bitstream and devicetree for boot...
# Copying Bitstream zcu102_int8.bit to /mnt/hdlcoder_rd
# Set Bitstream to hdlcoder_rd/zcu102_int8.bit
# Copying Devicetree devicetree_dlhdl.dtb to /mnt/hdlcoder_rd
# Set Devicetree to hdlcoder_rd/devicetree_dlhdl.dtb
# Set up boot for Reference Design: 'AXI-Stream DDR Memory Access : 3-AXIM'
### Rebooting Xilinx SoC at 192.168.1.101...
### Reboot may take several seconds...
### Attempting to connect to the hardware board at 192.168.1.101...
### Connection successful
### Programming the FPGA bitstream has been completed successfully.
### Loading weights to Conv Processor.
### Conv Weights loaded. Current time is 06-Jul-2023 16:07:40
### Loading weights to FC Processor.
### FC Weights loaded. Current time is 06-Jul-2023 16:07:40
Test Network
Prepare the test data for prediction. Use the entire data set of 88 preprocessed mel spectrograms, and compare the predictions of the quantized network to the predictions from Deep Learning Toolbox.
[imOutPred, imdPred] = getyamnet_CLEInput(88);
Calculate the accuracy of the quantized network predictions with respect to the predictions from Deep Learning Toolbox by using the getNetworkAccuracy helper function. See Helper Functions.
quantizedAccuracy = getNetworkAccuracy(hW,imOutPred,net)
### Finished writing input activations. ### Running in multi-frame mode with 88 inputs. Deep Learning Processor Profiler Performance Results LastFrameLatency(cycles) LastFrameLatency(seconds) FramesNum Total Latency Frames/s ------------- ------------- --------- --------- --------- Network 7978514 0.03191 88 701741839 31.4 conv2d 27553 0.00011 depthwise_conv2d 26068 0.00010 conv2d_1 80316 0.00032 depthwise_conv2d_1 27113 0.00011 conv2d_2 69613 0.00028 depthwise_conv2d_2 32615 0.00013 conv2d_3 126269 0.00051 depthwise_conv2d_3 21334 0.00009 conv2d_4 90407 0.00036 depthwise_conv2d_4 27856 0.00011 conv2d_5 171756 0.00069 depthwise_conv2d_5 22016 0.00009 conv2d_6 312844 0.00125 depthwise_conv2d_6 29744 0.00012 conv2d_7 617984 0.00247 depthwise_conv2d_7 29674 0.00012 conv2d_8 617664 0.00247 depthwise_conv2d_8 30024 0.00012 conv2d_9 617954 0.00247 depthwise_conv2d_9 30084 0.00012 conv2d_10 617534 0.00247 depthwise_conv2d_10 29724 0.00012 conv2d_11 617444 0.00247 depthwise_conv2d_11 26122 0.00010 conv2d_12 1207736 0.00483 depthwise_conv2d_12 38636 0.00015 conv2d_13 2409106 0.00964 global_average_pooling2d 20015 0.00008 dense 3231 0.00001 * The clock frequency of the DL processor is: 250MHz
quantizedAccuracy = 0.6477
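You can also classify a single spectrogram on the FPGA and map the returned scores back to a class name. This is a minimal sketch that uses the predict method of the dlhdl.Workflow object on the first test spectrogram.

scores = predict(hW,imOutPred(:,:,:,1));      % run one spectrogram through the deployed network
[~,idx] = max(scores);                        % index of the most probable class
predictedClass = net.Layers(end).Classes(idx) % corresponding class name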
Calculate Accuracy of Quantized YAMNet with Cross-Layer Equalization
Create a YAMNet network with cross-layer equalization, quantize the network, and calculate its accuracy.
Create, Calibrate, and Quantize a Cross-Layer Equalized Network
Create a cross-layer equalized network by using the equalizeLayers function.
netCLE = equalizeLayers(net);
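Cross-layer equalization exploits the positive scaling property of ReLU, relu(s*x) = s*relu(x) for s > 0, to rescale consecutive layers channel by channel without changing the network output. The sketch below illustrates the idea for a hypothetical pair of convolution layers with weights W1, biases b1, and weights W2; these variables are placeholders and are not part of this example. The equalizeLayers function applies this kind of rescaling to the supported layer pairs in the network.

% Per-channel weight ranges of a hypothetical consecutive convolution pair.
% Convolution weights have size FilterSize(1)-by-FilterSize(2)-by-NumChannels-by-NumFilters.
r1 = squeeze(max(abs(W1),[],[1 2 3]));   % range of each output channel of the first layer
r2 = squeeze(max(abs(W2),[],[1 2 4]));   % range of each input channel of the second layer
s  = sqrt(r1./r2);                       % equalizing scale factor for each channel
for k = 1:numel(s)
    W1(:,:,:,k) = W1(:,:,:,k)/s(k);      % scale output channel k of the first layer
    b1(k)       = b1(k)/s(k);            % keep the bias consistent with the scaled weights
    W2(:,:,k,:) = W2(:,:,k,:)*s(k);      % compensate in input channel k of the second layer
end
% Both layers now have per-channel range sqrt(r1.*r2), and because the ReLU
% between them commutes with positive scaling, the composed mapping is unchanged.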
Create a quantized YAMNet network with cross-layer equalization by using the dlquantizer object. Set the target execution environment to FPGA.
dlQuantObjCLE = dlquantizer(netCLE,ExecutionEnvironment='FPGA');
Use the calibrate function to exercise the network with sample inputs and collect the dynamic ranges of the weights and biases. The function returns a table in which each row contains range information for a learnable parameter of the quantized network.
calibrate(dlQuantObjCLE,imdsCal);
Deploy Quantized Network
Define the target FPGA board programming interface by using the dlhdl.Target object. Specify that the interface is for a Xilinx board with an Ethernet interface.
hTargetCLE = dlhdl.Target('Xilinx',Interface='Ethernet');
Prepare the network for deployment by creating a dlhdl.Workflow object. Specify the network and the bitstream name, and ensure that the bitstream matches the data type and the FPGA board. In this example, the target FPGA is the Xilinx ZCU102 SoC board, and the bitstream uses an int8 data type.
hWCLE = dlhdl.Workflow(Network=dlQuantObjCLE,Bitstream='zcu102_int8',Target=hTargetCLE);
Run the compile method of the dlhdl.Workflow object to compile the network and generate the instructions, weights, and biases for deployment. Because the total number of frames exceeds the default value of 30, set the InputFrameNumberLimit name-value argument to 100 to run predictions in chunks of 100 frames and prevent timeouts.
dnCLE = compile(hWCLE,InputFrameNumberLimit=100)
### Compiling network for Deep Learning FPGA prototyping ... ### Targeting FPGA bitstream zcu102_int8. ### The network includes the following layers: 1 'input_1' Image Input 96×64×1 images (SW Layer) 2 'conv2d' 2-D Convolution 32 3×3×1 convolutions with stride [2 2] and padding 'same' (HW Layer) 3 'activation' ReLU ReLU (HW Layer) 4 'depthwise_conv2d' 2-D Grouped Convolution 32 groups of 1 3×3×1 convolutions with stride [1 1] and padding 'same' (HW Layer) 5 'activation_1' ReLU ReLU (HW Layer) 6 'conv2d_1' 2-D Convolution 64 1×1×32 convolutions with stride [1 1] and padding 'same' (HW Layer) 7 'activation_2' ReLU ReLU (HW Layer) 8 'depthwise_conv2d_1' 2-D Grouped Convolution 64 groups of 1 3×3×1 convolutions with stride [2 2] and padding 'same' (HW Layer) 9 'activation_3' ReLU ReLU (HW Layer) 10 'conv2d_2' 2-D Convolution 128 1×1×64 convolutions with stride [1 1] and padding 'same' (HW Layer) 11 'activation_4' ReLU ReLU (HW Layer) 12 'depthwise_conv2d_2' 2-D Grouped Convolution 128 groups of 1 3×3×1 convolutions with stride [1 1] and padding 'same' (HW Layer) 13 'activation_5' ReLU ReLU (HW Layer) 14 'conv2d_3' 2-D Convolution 128 1×1×128 convolutions with stride [1 1] and padding 'same' (HW Layer) 15 'activation_6' ReLU ReLU (HW Layer) 16 'depthwise_conv2d_3' 2-D Grouped Convolution 128 groups of 1 3×3×1 convolutions with stride [2 2] and padding 'same' (HW Layer) 17 'activation_7' ReLU ReLU (HW Layer) 18 'conv2d_4' 2-D Convolution 256 1×1×128 convolutions with stride [1 1] and padding 'same' (HW Layer) 19 'activation_8' ReLU ReLU (HW Layer) 20 'depthwise_conv2d_4' 2-D Grouped Convolution 256 groups of 1 3×3×1 convolutions with stride [1 1] and padding 'same' (HW Layer) 21 'activation_9' ReLU ReLU (HW Layer) 22 'conv2d_5' 2-D Convolution 256 1×1×256 convolutions with stride [1 1] and padding 'same' (HW Layer) 23 'activation_10' ReLU ReLU (HW Layer) 24 'depthwise_conv2d_5' 2-D Grouped Convolution 256 groups of 1 3×3×1 convolutions with stride [2 2] and padding 'same' (HW Layer) 25 'activation_11' ReLU ReLU (HW Layer) 26 'conv2d_6' 2-D Convolution 512 1×1×256 convolutions with stride [1 1] and padding 'same' (HW Layer) 27 'activation_12' ReLU ReLU (HW Layer) 28 'depthwise_conv2d_6' 2-D Grouped Convolution 512 groups of 1 3×3×1 convolutions with stride [1 1] and padding 'same' (HW Layer) 29 'activation_13' ReLU ReLU (HW Layer) 30 'conv2d_7' 2-D Convolution 512 1×1×512 convolutions with stride [1 1] and padding 'same' (HW Layer) 31 'activation_14' ReLU ReLU (HW Layer) 32 'depthwise_conv2d_7' 2-D Grouped Convolution 512 groups of 1 3×3×1 convolutions with stride [1 1] and padding 'same' (HW Layer) 33 'activation_15' ReLU ReLU (HW Layer) 34 'conv2d_8' 2-D Convolution 512 1×1×512 convolutions with stride [1 1] and padding 'same' (HW Layer) 35 'activation_16' ReLU ReLU (HW Layer) 36 'depthwise_conv2d_8' 2-D Grouped Convolution 512 groups of 1 3×3×1 convolutions with stride [1 1] and padding 'same' (HW Layer) 37 'activation_17' ReLU ReLU (HW Layer) 38 'conv2d_9' 2-D Convolution 512 1×1×512 convolutions with stride [1 1] and padding 'same' (HW Layer) 39 'activation_18' ReLU ReLU (HW Layer) 40 'depthwise_conv2d_9' 2-D Grouped Convolution 512 groups of 1 3×3×1 convolutions with stride [1 1] and padding 'same' (HW Layer) 41 'activation_19' ReLU ReLU (HW Layer) 42 'conv2d_10' 2-D Convolution 512 1×1×512 convolutions with stride [1 1] and padding 'same' (HW Layer) 43 'activation_20' ReLU ReLU (HW Layer) 44 'depthwise_conv2d_10' 2-D Grouped Convolution 512 groups of 1 3×3×1 convolutions with stride [1 1] 
and padding 'same' (HW Layer) 45 'activation_21' ReLU ReLU (HW Layer) 46 'conv2d_11' 2-D Convolution 512 1×1×512 convolutions with stride [1 1] and padding 'same' (HW Layer) 47 'activation_22' ReLU ReLU (HW Layer) 48 'depthwise_conv2d_11' 2-D Grouped Convolution 512 groups of 1 3×3×1 convolutions with stride [2 2] and padding 'same' (HW Layer) 49 'activation_23' ReLU ReLU (HW Layer) 50 'conv2d_12' 2-D Convolution 1024 1×1×512 convolutions with stride [1 1] and padding 'same' (HW Layer) 51 'activation_24' ReLU ReLU (HW Layer) 52 'depthwise_conv2d_12' 2-D Grouped Convolution 1024 groups of 1 3×3×1 convolutions with stride [1 1] and padding 'same' (HW Layer) 53 'activation_25' ReLU ReLU (HW Layer) 54 'conv2d_13' 2-D Convolution 1024 1×1×1024 convolutions with stride [1 1] and padding 'same' (HW Layer) 55 'activation_26' ReLU ReLU (HW Layer) 56 'global_average_pooling2d' 2-D Global Average Pooling 2-D global average pooling (HW Layer) 57 'dense' Fully Connected 8 fully connected layer (HW Layer) 58 'softmax' Softmax softmax (SW Layer) 59 'Sounds' Classification Output crossentropyex with 'Bearing' and 7 other classes (SW Layer) ### Notice: The layer 'input_1' with type 'nnet.cnn.layer.ImageInputLayer' is implemented in software. ### Notice: The layer 'softmax' with type 'nnet.cnn.layer.SoftmaxLayer' is implemented in software. ### Notice: The layer 'Sounds' with type 'nnet.cnn.layer.ClassificationOutputLayer' is implemented in software. ### Compiling layer group: conv2d>>activation_26 ... ### Compiling layer group: conv2d>>activation_26 ... complete. ### Compiling layer group: global_average_pooling2d ... ### Compiling layer group: global_average_pooling2d ... complete. ### Compiling layer group: dense ... ### Compiling layer group: dense ... complete. ### Allocating external memory buffers: offset_name offset_address allocated_space _______________________ ______________ ________________ "InputDataOffset" "0x00000000" "8.0 MB" "OutputResultOffset" "0x00800000" "4.0 MB" "SchedulerDataOffset" "0x00c00000" "4.0 MB" "SystemBufferOffset" "0x01000000" "28.0 MB" "InstructionDataOffset" "0x02c00000" "4.0 MB" "ConvWeightDataOffset" "0x03000000" "32.0 MB" "FCWeightDataOffset" "0x05000000" "4.0 MB" "EndOffset" "0x05400000" "Total: 84.0 MB" ### Network compilation complete.
dnCLE = struct with fields:
weights: [1×1 struct]
instructions: [1×1 struct]
registers: [1×1 struct]
syncInstructions: [1×1 struct]
constantData: {}
ddrInfo: [1×1 struct]
To deploy the network on the Xilinx ZCU102 SoC hardware, run the deploy function of the dlhdl.Workflow object. This function programs the FPGA board with the programming file, downloads the network weights and biases, and displays progress messages and the time required to deploy the network.
deploy(hWCLE)
### Programming FPGA Bitstream using Ethernet...
### Attempting to connect to the hardware board at 192.168.1.101...
### Connection successful
### Programming FPGA device on Xilinx SoC hardware board at 192.168.1.101...
### Copying FPGA programming files to SD card...
### Setting FPGA bitstream and devicetree for boot...
# Copying Bitstream zcu102_int8.bit to /mnt/hdlcoder_rd
# Set Bitstream to hdlcoder_rd/zcu102_int8.bit
# Copying Devicetree devicetree_dlhdl.dtb to /mnt/hdlcoder_rd
# Set Devicetree to hdlcoder_rd/devicetree_dlhdl.dtb
# Set up boot for Reference Design: 'AXI-Stream DDR Memory Access : 3-AXIM'
### Rebooting Xilinx SoC at 192.168.1.101...
### Reboot may take several seconds...
### Attempting to connect to the hardware board at 192.168.1.101...
### Connection successful
### Programming the FPGA bitstream has been completed successfully.
### Loading weights to Conv Processor.
### Conv Weights loaded. Current time is 06-Jul-2023 16:13:55
### Loading weights to FC Processor.
### FC Weights loaded. Current time is 06-Jul-2023 16:13:55
Test Network
Prepare the test data for prediction. Use the entire data set of 88 preprocessed mel spectrograms, and compare the predictions of the quantized network to the predictions from Deep Learning Toolbox.
[imOutPredCLE, imdPredCLE] = getyamnet_CLEInput(88);
Compare the accuracy of the quantized network predictions to the accuracy of the predictions from Deep Learning Toolbox by using the getNetworkAccuracy helper function. See Helper Functions.
quantizedAccuracyCLE = getNetworkAccuracy(hWCLE, imOutPredCLE, netCLE)
### Finished writing input activations. ### Running in multi-frame mode with 88 inputs. Deep Learning Processor Profiler Performance Results LastFrameLatency(cycles) LastFrameLatency(seconds) FramesNum Total Latency Frames/s ------------- ------------- --------- --------- --------- Network 7977295 0.03191 88 701748593 31.4 conv2d 27523 0.00011 depthwise_conv2d 26158 0.00010 conv2d_1 80284 0.00032 depthwise_conv2d_1 27109 0.00011 conv2d_2 69334 0.00028 depthwise_conv2d_2 32244 0.00013 conv2d_3 126831 0.00051 depthwise_conv2d_3 20814 0.00008 conv2d_4 90320 0.00036 depthwise_conv2d_4 27841 0.00011 conv2d_5 171883 0.00069 depthwise_conv2d_5 22036 0.00009 conv2d_6 312914 0.00125 depthwise_conv2d_6 29674 0.00012 conv2d_7 618094 0.00247 depthwise_conv2d_7 29604 0.00012 conv2d_8 617444 0.00247 depthwise_conv2d_8 30064 0.00012 conv2d_9 617944 0.00247 depthwise_conv2d_9 30054 0.00012 conv2d_10 617384 0.00247 depthwise_conv2d_10 29704 0.00012 conv2d_11 617474 0.00247 depthwise_conv2d_11 26122 0.00010 conv2d_12 1207626 0.00483 depthwise_conv2d_12 38466 0.00015 conv2d_13 2408966 0.00964 global_average_pooling2d 20127 0.00008 dense 3179 0.00001 * The clock frequency of the DL processor is: 250MHz
quantizedAccuracyCLE = 0.8636
Using cross-layer equalization improves the prediction accuracy of the network. The accuracy of the network with cross-layer equalization (netCLE) is 86.36%, and the accuracy of the network without cross-layer equalization (net) is 64.77%.
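To visualize the comparison, you can plot the two accuracies side by side, for example:

figure
bar(100*[quantizedAccuracy quantizedAccuracyCLE])   % agreement with Deep Learning Toolbox, in percent
xticklabels({'Without CLE','With CLE'})
ylabel('Agreement with Deep Learning Toolbox predictions (%)')
title('Quantized YAMNet accuracy')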
Helper Functions
The getyamnet_CLEInput helper function obtains preprocessed mel spectrograms of size 96-by-64-by-1.
function [imOut,imds] = getyamnet_CLEInput(NumOfImg)
if NumOfImg > 88
    error('Provide an input less than or equal to the size of this dataset of 88 images.');
end
spectData = load('yamnetInput.mat');
spectData = spectData.melSpectYam;
imOut = spectData(:,:,:,1:NumOfImg);
imds = augmentedImageDatastore([96 64],imOut);
end
The getNetworkAccuracy helper function retrieves predictions from the FPGA and compares them to the predictions from Deep Learning Toolbox™.
function accuracy = getNetworkAccuracy(workFlow, data, network)
% Compute the fraction of inputs for which the FPGA predictions match the
% Deep Learning Toolbox predictions.

% Predictions by the dlhdl.Workflow object
hwPred = workFlow.predict(data,Profile="on");        % Matrix of class probabilities
[hwValue, hwIdx] = max(hwPred,[],2);                 % Value and index of the most probable class
hwClasses = network.Layers(end).Classes(hwIdx);      % Class names from indices

% Predictions by Deep Learning Toolbox
dlPred = predict(network, data);                     % Matrix of class probabilities
[dlValue, dlIdx] = max(dlPred,[],2);                 % Value and index of the most probable class
dlClasses = network.Layers(end).Classes(dlIdx);      % Class names from indices

accuracy = nnz(dlClasses == hwClasses)/size(data,4); % Matching predictions / total predictions
end
References
[1] Verma, Nishchal K., et al. “Intelligent Condition Based Monitoring Using Acoustic Signals for Air Compressors.” IEEE Transactions on Reliability, vol. 65, no. 1, Mar. 2016, pp. 291–309. https://doi.org/10.1109/TR.2015.2459684.
See Also
dlhdl.Target | dlhdl.Workflow | compile | deploy | predict | dlquantizer | calibrate | equalizeLayers