Evaluate Code Generation Inference Time of Compressed Deep Neural Network

Since R2023b

This example shows how to compress a deep neural network for battery state of charge estimation using projection, and how to compare the inference time of code generated from the original and compressed networks.

Battery state of charge (SOC) is the level of charge of an electric battery relative to its capacity, measured as a percentage. SOC is critical information for the vehicle energy management system and must be accurately estimated to ensure reliable and affordable electrified vehicles (xEV). However, due to the nonlinear temperature-, health-, and SOC-dependent behavior of Li-ion batteries, SOC estimation is still a significant automotive engineering challenge. Traditional approaches to this problem, such as electrochemical models, usually require precise parameters and knowledge of the battery composition as well as its physical response. In contrast, using neural networks is a data-driven approach that requires minimal knowledge of the battery or its nonlinear behavior [1].

The compressNetworkUsingProjection function compresses a network by projecting layers into smaller parameter subspaces. For optimal initialization of the projected network, the function projects the learnable parameters of projectable layers into a subspace that maintains the highest variance in neuron activations. After you compress a neural network using projection, you can then fine-tune the network to increase the accuracy.
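To make the idea of projecting into a smaller parameter subspace concrete, the following is a minimal sketch for a single weight matrix. It is not the implementation of compressNetworkUsingProjection: the sizes, the variable names, and the random orthonormal basis Q are assumptions chosen for illustration. In the actual workflow, the basis comes from the PCA of the neuron activations rather than from random numbers.

```matlab
% Illustrative only: project the input of a weight matrix onto a
% k-dimensional subspace and compare parameter counts.
rng(1)                                   % make the sketch repeatable
inputSize = 128;                         % hypothetical layer input size
outputSize = 64;                         % hypothetical layer output size
k = 8;                                   % hypothetical subspace dimension

W = randn(outputSize,inputSize);         % original weights
[Q,~] = qr(randn(inputSize,k),0);        % inputSize-by-k, orthonormal columns

Wreduced = W*Q;                          % weights expressed in the subspace
x = randn(inputSize,1);
yProjected = Wreduced*(Q'*x);            % equals W*(Q*Q')*x

numOriginal = numel(W)                       % 64*128 = 8192
numProjected = numel(Wreduced) + numel(Q)    % 64*8 + 128*8 = 1536
```

The smaller k is, the fewer parameters remain, at the cost of discarding the part of the input that lies outside the subspace.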

In this example, you:

Train a recurrent neural network to predict the state of charge of a Li-ion battery, given time series data representing features of the battery.
Compress the network using projection.
Fine-tune the compressed network.
Generate library-free C++ code for making predictions using the original network and the compressed, fine-tuned network.
Compare the size and the performance of the original and compressed, fine-tuned network.

Compressing a network reduces the size of the network in memory and speeds up inference.

Download Data

Each file in the LG_HG2_Prepared_Dataset_McMasterUniversity_Jan_2020 data set contains a time series X of five predictors (voltage, current, temperature, average voltage, and average current) and a time series Y of one target (SOC). Each file represents data collected at a different ambient temperature. The predictors have been normalized to be in the range [0,1].

Specify the URL from where to download the data set. Alternatively, you can download this data set manually from https://data.mendeley.com/datasets/cp3473x7xv/3.

url = "https://data.mendeley.com/public-files/datasets/cp3473x7xv/files/ad7ac5c9-2b9e-458a-a91f-6f3da449bdfb/file_downloaded";

Set downloadFolder to where you want to download the ZIP file and the outputFolder to where you want to extract the ZIP file.

downloadFolder = tempdir;
outputFolder = fullfile(downloadFolder,"LGHG2@n10C_to_25degC");

Download and extract the LG_HG2_Prepared_Dataset_McMasterUniversity_Jan_2020 data set.

if ~exist(outputFolder,"dir")
    fprintf("Downloading LGHG2@n10C_to_25degC.zip (56 MB) ... ")
    filename = fullfile(downloadFolder,"LGHG2@n10C_to_25degC.zip");
    websave(filename,url);
    unzip(filename,outputFolder)
end

Downloading LGHG2@n10C_to_25degC.zip (56 MB) ...

Prepare Training and Validation Data

Set the subsequence length to 500 and the number of input features to 3. As this example uses an LSTM network that can learn long term trends, the remaining two features in the data set (the average voltage and average current) are not required.

chunkSize = 500;
numFeatures = 3;

Use the setupData function to prepare the training and validation data. The function, provided as a supporting function at the end of this example, splits the sequences into subsequences of a specified length, in this case 500.

trainingFile = fullfile(outputFolder,"Train","TRAIN_LGHG2@n10degC_to_25degC_Norm_5Inputs.mat");
[XTrain,YTrain] = setupData(trainingFile,chunkSize,numFeatures);

validationFile = fullfile(outputFolder,"Validation", "01_TEST_LGHG2@n10degC_Norm_(05_Inputs).mat");
[XVal,YVal] = setupData(validationFile,chunkSize,numFeatures);
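The chunking performed by setupData can be sketched on synthetic data. The array sizes and variable names below are assumptions that stand in for one data file. The sketch shows that only complete subsequences are kept, so any samples left over at the end of a recording are dropped.

```matlab
% Synthetic stand-in for one data file: 5 predictors and 1 target
% over 1234 time steps (sizes are hypothetical).
nSamples = 1234;
Xfull = rand(5,nSamples);
Yfull = rand(1,nSamples);

chunkSize = 500;
numFeatures = 3;
nChunks = floor(nSamples/chunkSize);     % number of complete chunks

X = cell(nChunks,1);
Y = cell(nChunks,1);
for ii = 1:nChunks
    idx = (1+(ii-1)*chunkSize):(ii*chunkSize);
    X{ii} = Xfull(1:numFeatures,idx);    % channels-by-time, 3-by-500
    Y{ii} = Yfull(idx);                  % 1-by-500
end

size(X{1})
numDropped = nSamples - nChunks*chunkSize   % samples that do not fill a chunk
```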

Visualize one of the observations in the training data set by plotting the target SOC and the corresponding predictors.

responseToPreview = 30;

figure
plot(YTrain{responseToPreview})
hold on
plot(XTrain{responseToPreview}(1,:))
plot(XTrain{responseToPreview}(2,:))
plot(XTrain{responseToPreview}(3,:))

legend(["SOC" "Voltage" "Current" "Temperature"])
xlabel("Sample")
hold off

Define Network Architecture

Define the following LSTM network that predicts the battery state of charge. This example uses an LSTM network instead of a network with three hidden layers to demonstrate the effect of projection on a larger network.

For the sequence input, specify a sequence input layer with input size matching the number of features. Rescale the input to be in the range [-1,1] by setting the Normalization name-value argument to "rescale-symmetric".
To learn long-term dependencies in the sequence data, include two LSTM layers with 128 and 64 hidden units respectively.
To reduce overfitting, include two dropout layers with a dropout probability of 0.2.
Include a fully connected layer with a size that matches the size of the output. To bound the output in the interval [0,1], include a sigmoid layer.

numHiddenUnits = 128;
numResponses = 1;
dropoutProb = 0.2;

layers = [...
    sequenceInputLayer(numFeatures,Normalization="rescale-symmetric")
    lstmLayer(numHiddenUnits)
    dropoutLayer(dropoutProb)
    lstmLayer(numHiddenUnits/2)
    dropoutLayer(dropoutProb)
    fullyConnectedLayer(numResponses)
    sigmoidLayer];
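As a rough illustration of symmetric rescaling, the following sketch maps values from [xmin,xmax] to [-1,1]. The limits of 0 and 1 are an assumption based on the predictors being normalized to [0,1]. The input layer determines its own limits from the training data, so this is only a sketch of the mapping, not of the layer internals.

```matlab
% Symmetric rescaling of values from [xmin,xmax] to [-1,1].
xmin = 0;                        % assumed minimum of the predictors
xmax = 1;                        % assumed maximum of the predictors
x = [0 0.25 0.5 1];
xScaled = 2*(x - xmin)/(xmax - xmin) - 1   % returns [-1 -0.5 0 1]
```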

Specify the training options.

Train for 100 epochs with mini-batches of size 64 using the "adam" solver.
Specify an initial learning rate of 0.01, a learning rate drop period of 30 and a learning rate drop factor of 0.1.
To prevent the gradients from exploding, set the gradient threshold to 1.
Shuffle the training data every epoch.
Specify the validation data.
To avoid having to rearrange the training data, specify the input and target data formats as CTB (channel, time, batch).
Return the network with the lowest validation loss.
Display the training progress and suppress the verbose output.

options = trainingOptions("adam", ...
    MiniBatchSize = 64, ...
    MaxEpochs = 100, ...
    InitialLearnRate = 1e-2, ...
    LearnRateSchedule = "piecewise", ...
    LearnRateDropPeriod = 30, ...
    LearnRateDropFactor = 0.1, ...
    GradientThreshold = 1, ...
    Shuffle = "every-epoch", ...
    ValidationData = {XVal,YVal}, ...
    ValidationFrequency = 50, ...
    InputDataFormats="CTB", ...
    TargetDataFormats="CTB", ...
    OutputNetwork="best-validation-loss", ...
    Plots = "training-progress", ...
    Verbose = false);
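The piecewise schedule implied by these options can be tabulated with a short sketch. It assumes that the learning rate is multiplied by the drop factor each time 30 epochs have completed, so the formula below is an interpretation of the options rather than the exact internal logic of the trainer.

```matlab
% Piecewise learning rate schedule for the options above.
initialLearnRate = 1e-2;
dropPeriod = 30;
dropFactor = 0.1;
epochs = 1:100;

% Assumption: one drop after every dropPeriod completed epochs.
learnRate = initialLearnRate * dropFactor.^floor((epochs-1)/dropPeriod);

% Learning rate at the first epoch and around each drop.
learnRate([1 30 31 61 91])
```

Under this assumption, the final 10 epochs run at a learning rate of about 1e-5.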

Train Network

Train the network using the trainnet function, specifying the loss function as mean-squared error.

recurrentNet = trainnet(XTrain,YTrain,layers,"mean-squared-error",options);

Test Network

Evaluate the performance of the network on the test data set and compare the network predictions to the measured values.

testFile = fullfile(outputFolder,"Test","01_TEST_LGHG2@n10degC_Norm_(05_Inputs).mat");
S = load(testFile);

XTest = S.X(1:numFeatures,:);
XTest = dlarray(XTest,"CT");
YPredOriginal = predict(recurrentNet,XTest);

YTest = S.Y;
RMSEOriginal = rmse(YTest,extractdata(YPredOriginal))

RMSEOriginal = single

0.0336

Plot the predictions and the measured values.

figure
plot(YPredOriginal);
hold on;
plot(YTest,'k--',LineWidth=2); 
hold off
xlabel("Sample")
ylabel("Y")
legend("SOC estimated using original network","SOC ground truth",location="best");

Inspect the number of learnables in the network using the numLearnables function. The numLearnables function is provided at the end of this example.

learnables = numLearnables(recurrentNet)

learnables = 
117057
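This count can be cross-checked by hand. An LSTM layer with h hidden units and d inputs has input weights of size 4h-by-d, recurrent weights of size 4h-by-h, and a bias of size 4h-by-1. The helper name lstmCount below is hypothetical and is used only for this check.

```matlab
% Learnables of an LSTM layer: 4*h*(d + h + 1).
lstmCount = @(inputSize,hiddenSize) 4*hiddenSize*(inputSize + hiddenSize + 1);

nLstm1 = lstmCount(3,128)        % 67584
nLstm2 = lstmCount(128,64)       % 49408
nFc = 64*1 + 1                   % weights plus bias, 65

total = nLstm1 + nLstm2 + nFc    % 117057
```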

Save the trained network.

save("recurrentNet.mat","recurrentNet");

Explore Compression Levels

The compressNetworkUsingProjection function uses principal component analysis (PCA) to identify the subspace of learnable parameters that results in the highest variance in neuron activations. To do so, it analyzes the network activations on a representative data set, such as the training data. This analysis requires only the predictors of the training data to compute the network activations. It does not require the training targets.

The PCA step can be computationally intensive. If you expect to compress the same network multiple times (for example, when exploring different levels of compression), then perform the PCA step first and reuse the resulting neuronPCA object.
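The underlying idea of PCA on activations can be sketched on synthetic data. The activation matrix, its scales, and the 95% goal are assumptions for illustration. The neuronPCA object performs a more involved analysis across the layers of the network, so treat this only as a sketch of how an explained variance goal selects a subspace.

```matlab
% Synthetic activations: 1000 observations of 8 neurons with very
% different variances (all values are hypothetical).
rng(0)
numObservations = 1000;
scales = [5 3 1 0.5 0.1 0.1 0.05 0.01];
A = randn(numObservations,numel(scales)) .* scales;

% Eigendecomposition of the activation covariance, largest variance first.
C = cov(A);
[V,D] = eig(C);
[lambda,order] = sort(diag(D),"descend");
V = V(:,order);

% Smallest number of components that reaches the explained variance goal.
explained = cumsum(lambda)/sum(lambda);
goal = 0.95;
k = find(explained >= goal,1)

% Project the centered activations onto the selected subspace.
P = V(:,1:k);
Areduced = (A - mean(A))*P;
retained = sum(var(Areduced))/sum(var(A))   % fraction of variance kept
```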

Create a mini-batch queue containing the training data.

Specify a mini-batch size of 64.
Specify that the output data has format "CTB" (channel, time, batch).
Preprocess the mini-batches by concatenating the sequences over the third dimension.

mbSize = 64;

mbq = minibatchqueue(...
    arrayDatastore(XTrain,OutputType="same",ReadSize=mbSize),...
    MiniBatchSize=mbSize,...
    MiniBatchFormat="CTB",...
    MiniBatchFcn=@(X) cat(3,X{:}));
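The effect of the preprocessing function can be checked in isolation. In this sketch, the cell array of random sequences is a stand-in for one mini-batch read from the datastore, and the batch size of 4 is an arbitrary assumption.

```matlab
% Four hypothetical sequences, each with 3 channels and 500 time steps.
sequences = {rand(3,500); rand(3,500); rand(3,500); rand(3,500)};

% Same anonymous function as in the mini-batch queue above.
miniBatchFcn = @(X) cat(3,X{:});
batch = miniBatchFcn(sequences);

size(batch)    % channel-by-time-by-batch, [3 500 4]
```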

Create the neuronPCA object. To view information about the steps of the neuron PCA algorithm, set the VerbosityLevel option to "steps".

npca = neuronPCA(recurrentNet,mbq,VerbosityLevel="steps");

Using solver mode "direct".
Computing covariance matrices for activations connected to: "lstm_1/in","lstm_1/out","lstm_2/in","lstm_2/out","fc/in","fc/out"
Computing eigenvalues and eigenvectors for activations connected to: "lstm_1/in","lstm_1/out","lstm_2/in","lstm_2/out","fc/in","fc/out"
neuronPCA analyzed 3 layers: "lstm_1","lstm_2","fc"

View the properties of the neuronPCA object.

npca

npca = 
  neuronPCA with properties:

                  LayerNames: ["lstm_1"    "lstm_2"    "fc"]
      ExplainedVarianceRange: [0 1]
    LearnablesReductionRange: [0.6358 0.9770]
            InputEigenvalues: {[3×1 double]  [128×1 double]  [64×1 double]}
           InputEigenvectors: {[3×3 double]  [128×128 double]  [64×64 double]}
           OutputEigenvalues: {[128×1 double]  [64×1 double]  [6.3770]}
          OutputEigenvectors: {[128×128 double]  [64×64 double]  [1]}

The explained variance of a network details how well the space of network activations can capture the underlying features of the data. To explore different amounts of compression, iterate over different values of the ExplainedVarianceGoal option of the compressNetworkUsingProjection function and compare the results.

numValues = 10;
explainedVarGoal = 1 - logspace(-3,0,numValues);
explainedVariance = zeros(1,numValues);
learnablesReduction = zeros(1,numValues);
accuracy = zeros(1,numValues);
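% Concatenating along the third dimension gives channel-by-time-by-batch data.
XValCompression = dlarray(cat(3,XVal{:}),"CTB");
% dlarray stores formatted data in C-B-T order, so arrange the targets to match.
YValCompression = permute(cat(3,YVal{:}),[1 3 2]);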

for i = 1:numValues
    varianceGoal = explainedVarGoal(i);

    [trialNetwork,info] = compressNetworkUsingProjection(recurrentNet,npca, ...
        ExplainedVarianceGoal=varianceGoal, ...
        VerbosityLevel="off");

    explainedVariance(i) = info.ExplainedVariance;
    learnablesReduction(i) = info.LearnablesReduction;

    YPredProjected = predict(trialNetwork,XValCompression);
    YPredProjected = extractdata(YPredProjected);
    accuracy(i) = rmse(YValCompression,YPredProjected,"all");
end
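To see which goals the loop sweeps over, you can display the values directly. This sketch only reproduces the expression used above. The goals are logarithmically spaced so that most of them lie close to 1, and the final goal of 0 corresponds to the lower end of the ExplainedVarianceRange property shown earlier.

```matlab
% Explained variance goals used in the sweep.
numValues = 10;
explainedVarGoal = 1 - logspace(-3,0,numValues);

% Display as a column: starts at 0.999 and decreases to 0.
disp(explainedVarGoal.')
```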

Plot the RMSE and the explained variance of the compressed networks against their learnables reduction.

figure
tiledlayout("flow")
nexttile
plot(learnablesReduction,accuracy,'+-')
ylabel("RMSE")
title("Effect of Different Compression Levels")

nexttile
plot(learnablesReduction,explainedVariance,'+-')
ylim([0.8 1])
ylabel("Explained Variance")
xlabel("Learnable Reduction")

The graph shows that an increase in learnable reduction has a corresponding increase in RMSE (decrease in accuracy). A learnable reduction value of around 94% shows a good compromise between compression amount and RMSE.

Compress Network Using Projection

Compress the network using the neuronPCA object with a learnable reduction goal of 94% using the compressNetworkUsingProjection function. To ensure that the projected network supports library-free code generation, specify that the projected layers are unpacked.

recurrentNetProjected = compressNetworkUsingProjection(recurrentNet,npca, ...
    LearnablesReductionGoal=0.94, ...
    UnpackProjectedLayers=true);

Compressed network has 94.8% fewer learnable parameters.
Projection compressed 3 layers: "lstm_1","lstm_2","fc"

Inspect the number of learnables in the projected network.

learnablesProjected = numLearnables(recurrentNetProjected)

learnablesProjected = 
6092
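The reported reduction follows from the two learnable counts. The memory estimate in this sketch assumes that each learnable is stored in single precision (4 bytes) and ignores any other storage overhead, so it is only a rough guide.

```matlab
% Counts reported above.
learnables = 117057;
learnablesProjected = 6092;

% Fraction of learnables removed, about 0.948.
reduction = 1 - learnablesProjected/learnables

% Rough parameter memory, assuming 4 bytes per learnable.
bytesPerValue = 4;
sizeOriginalKB = learnables*bytesPerValue/1024
sizeProjectedKB = learnablesProjected*bytesPerValue/1024
```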

Evaluate the projected network's performance on the test data set.

YPredProjected = predict(recurrentNetProjected,XTest);
RMSEProjected = rmse(YTest,extractdata(YPredProjected))

RMSEProjected = single

0.0595

Compressing the network has increased the root-mean-square error of the predictions.

Fine-Tune Compressed Network

You can improve the accuracy by retraining the network.

Reduce the number of training epochs and the number of epochs between drops in the learning rate.

options.MaxEpochs = options.MaxEpochs/2;
options.LearnRateDropPeriod = options.LearnRateDropPeriod/2;

Train the network using the trainnet function, specifying the loss function as mean squared error.

recurrentNetProjected = trainnet(XTrain,YTrain,recurrentNetProjected,"mean-squared-error",options);

Evaluate the fine-tuned projected network's performance on the test data set.

YPredProjected = predict(recurrentNetProjected,XTest);
RMSEProjected = rmse(YTest,extractdata(YPredProjected))

RMSEProjected = single

0.0349
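To put the three RMSE values reported in this example into perspective, this sketch computes the relative change with respect to the original network. The variable names are chosen for illustration and the values are the rounded results shown above.

```matlab
% Test RMSE values reported in this example.
rmseOriginal = 0.0336;
rmseProjected = 0.0595;      % after projection, before fine-tuning
rmseFineTuned = 0.0349;      % after fine-tuning

% Relative increase in RMSE compared to the original network, in percent.
increaseAfterProjection = 100*(rmseProjected/rmseOriginal - 1)   % about 77
increaseAfterFineTuning = 100*(rmseFineTuned/rmseOriginal - 1)   % about 3.9
```

Fine-tuning recovers most of the accuracy lost through projection.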

Save the fine-tuned projected network.

save("recurrentNetProjected.mat","recurrentNetProjected");

Generate C++ Code

Generate C++ code based on the original network and the fine-tuned compressed network.

Create Entry-Point Function for Code Generation

An entry-point function is a top-level MATLAB function from which you generate code. Write an entry-point function in MATLAB that:

Uses the coder.loadDeepLearningNetwork function to load a deep learning model from a MAT file. For more information, see Load Pretrained Networks for Code Generation (GPU Coder).
Calls the predict function to predict the responses.

The entry-point functions recurrentNetPredict.m and recurrentNetProjectedPredict.m are provided as supporting files with this example. To access these files, open the example as a live script.

Inspect the entry-point functions.

type recurrentNetPredict.m

function Y = recurrentNetPredict(x)

persistent net;
if isempty(net)
    net = coder.loadDeepLearningNetwork("recurrentNet.mat");
end

Y = predict(net,x);

end

type recurrentNetProjectedPredict.m

function Y = recurrentNetProjectedPredict(x)

persistent net;
if isempty(net)
    net = coder.loadDeepLearningNetwork("recurrentNetProjected.mat");
end

Y = predict(net,x);

end

Generate Code

To configure build settings, such as output file name, location, and type, create a coder configuration object. To create the object, use the coder.config function and specify that the output should be a MEX file.

cfg = coder.config("mex");

Set the language to use in the generated code to C++.

cfg.TargetLang = "C++";

To generate code that does not use any third-party libraries, set the target deep learning library to none.

cfg.DeepLearningConfig = coder.DeepLearningConfig("none");

Create example values that define the size and class of input to the generated code.

matrixInput = coder.typeof(single(XTest),size(XTest),[false false]);

Generate code for the original network and the fine-tuned compressed network. The MEX files recurrentNetPredict_mex and recurrentNetProjectedPredict_mex are created in your current folder.

codegen -config cfg recurrentNetPredict -args {matrixInput}

Code generation successful.

codegen -config cfg recurrentNetProjectedPredict -args {matrixInput}

Code generation successful.

You can view the resulting code generation report by clicking View Report in the MATLAB Command Window. The report is displayed in the Report Viewer window. If the code generator detects errors or warnings during code generation, the report describes the issues and provides links to the problematic MATLAB code.

Run the Generated Code

Run the generated code. To ensure that the generated code performs as expected, check that the root-mean-squared errors are unchanged.

YPredOriginal = recurrentNetPredict_mex(single(XTest));
RMSEOriginal = rmse(YTest,extractdata(YPredOriginal))

RMSEOriginal = single

0.0336

YPredProjected = recurrentNetProjectedPredict_mex(single(XTest));
RMSEProjected = rmse(YTest,extractdata(YPredProjected))

RMSEProjected = single

0.0349

Compare Original and Compressed Networks

Plot the predictions from each network and the measured values.

figure
plot(YPredOriginal);
hold on;
plot(YPredProjected);
plot(YTest,'k--',LineWidth=2); 
hold off
xlabel("Sample")
ylabel("Y")
legend("SOC estimated using original network","SOC estimated using projected network","SOC ground truth",location="best");

Compare the error, model size, and inference time of the networks.

% Plot root-mean-squared errors.
figure
tiledlayout(1,3)
nexttile
bar([RMSEOriginal RMSEProjected])
xticklabels(["Original" "Fine-Tuned Projected"])
ylabel("RMSE")

% Plot numbers of learnables.
nexttile
bar([learnables learnablesProjected])
xticklabels(["Original" "Fine-Tuned Projected"])
ylabel("Number of Learnables")

% Calculate and plot inference time using the generated code.
originalNNCompTime = timeit(@()recurrentNetPredict_mex(single(XTest)));
projectedNNCompTime = timeit(@()recurrentNetProjectedPredict_mex(single(XTest)));
nexttile
bar([originalNNCompTime projectedNNCompTime])
xticklabels(["Original" "Fine-Tuned Projected"])
ylabel("Desktop Codegen Inference Time (s)")
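The two timings can also be summarized as a single speedup factor. The following sketch shows the pattern with two stand-in workloads (matrix products of different sizes) in place of the MEX functions, so the sizes and variable names are assumptions. With the variables from this example, the equivalent ratio would be originalNNCompTime/projectedNNCompTime.

```matlab
% Stand-in workloads: a larger and a smaller matrix product.
A = rand(400);
B = rand(100);

% timeit calls each function handle several times and returns a
% representative run time in seconds.
tLarge = timeit(@() A*A);
tSmall = timeit(@() B*B);

% Values greater than 1 mean the smaller workload is faster.
speedup = tLarge/tSmall
```

Timings depend on the machine and its current load, so expect the ratio to vary between runs.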

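Compared to the original network, the projected network is significantly smaller (6092 learnables instead of 117057, a reduction of about 94.8%) and has reduced inference time, while incurring a minor reduction in prediction accuracy (the test RMSE rises from 0.0336 to 0.0349).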

Networks compressed using projection can be used inside Simulink® models. As there is no equivalent to projected LSTM or projected GRU layers in the MKL-DNN library, networks containing these layers cannot take advantage of code generation to improve simulation speed as described in Improve Performance of Deep Learning Simulations in Simulink. You can take advantage of the reduced network size and reduced inference time when you deploy the generated code to hardware or run software-in-the-loop (SIL) or processor-in-the-loop (PIL) simulations of your model.

Supporting Functions

`numLearnables`

The numLearnables function receives a network as input and returns the total number of learnables in that network.

function N = numLearnables(net)
    N = 0;
    for i = 1:size(net.Learnables,1)
        N = N + numel(net.Learnables.Value{i});
    end
end
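The same count can be written without a loop by using cellfun. In this sketch, the table learnablesTable is a hypothetical stand-in for net.Learnables, filled with arrays sized like the learnables of the first LSTM layer. For a real network, the equivalent expression would be sum(cellfun(@numel,net.Learnables.Value)).

```matlab
% Stand-in for the Value column of net.Learnables: input weights,
% recurrent weights, and bias of an LSTM layer with 128 hidden units
% and 3 inputs.
Value = {rand(512,3); rand(512,128); rand(512,1)};
learnablesTable = table(Value);

% Total number of elements across all learnable arrays.
N = sum(cellfun(@numel,learnablesTable.Value))   % 1536 + 65536 + 512 = 67584
```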

`setupData`

The setupData function loads a structure stored in the MAT file filename, extracts the first numFeatures features of the sequence data and the target values, and splits the data into subsequences of length chunkSize.

function [X,Y] = setupData(filename,chunkSize,numFeatures)

    S = load(filename);
    nSamples = length(S.Y);
    nElems = floor(nSamples/chunkSize);
    X = cell(nElems,1);
    Y = cell(nElems,1);
    
    for ii = 1:nElems
        idxStart = 1+(ii-1)*chunkSize;
        idxEnd = ii*chunkSize;
        X{ii} = S.X(1:numFeatures,idxStart:idxEnd);
        Y{ii} = S.Y(idxStart:idxEnd);
    end

end

References

[1] Kollmeyer, Phillip, Carlos Vidal, Mina Naguib, and Michael Skells. “LG 18650HG2 Li-Ion Battery Data and Example Deep Neural Network XEV SOC Estimator Script.” Mendeley, March 5, 2020. https://doi.org/10.17632/CP3473X7XV.3.