Unable to train() Neural Network in single precision

Hello, I am trying to train a shallow neural network of the fitnet type in single precision. According to an answer on the forums, single-precision training requires passing the output of the nnGPU() function to train().
I have 3 lines of code, the first one is the typical use case and works. The second is set with nnGPU to double precision and works without issue, the third fails. What gives?
[net, ~] = train(net, inputs, targets, 'usegpu', 'yes'); % Works
[net, ~] = train(net, inputs, targets, nnGPU('precision','double')); %WORKS
[net, ~] = train(net, inputs, targets, nnGPU('precision','single')); %Fails
Failure type:
Computing Resources:
GPU device #1, GeForce GTX 1650, CUDA
Warning: Error occurred while executing the listener callback for event TrainingUpdated defined for class nnet.guis.NNTrainToolModel:
Error using matlab.ui.control.internal.model.AbstractProgressIndicator/set.Value
'Value' must be a double scalar within the range of '[0, 1]'.
Error in nnet.guis.EmbeddedTrainToolView/updateTraining (line 333)
this.EpochProgressBar.Value = this.calculateEpochProgress(epoch);
Error in nnet.guis.StandaloneTrainToolView/updateTraining (line 376)
this.EmbeddedView.updateTraining(epoch, time, metrics);
Error in nnet.guis.StandaloneTrainToolPresenter/updateTrainTool (line 93)
this.StandaloneTrainToolView.updateTraining(this.TrainToolModel.Epoch,...
Error in nnet.guis.StandaloneTrainToolPresenter>@(varargin)this.updateTrainTool(varargin{:}) (line 57)
this.TrainToolModelListeners{end+1} = addlistener(this.TrainToolModel, "TrainingUpdated", @this.updateTrainTool);
Error in nnet.guis.NNTrainToolModel/updateValues (line 138)
this.notify("TrainingUpdated");
Error in nnet.train.TrainToolFeedback/updateOutsideSPMDImpl (line 100)
this.TrainToolModel.updateValues(statusValues, net, tr, data);
Error in nnet.train.FeedbackHandler/updateOutsideSPMD (line 40)
this.updateOutsideSPMDImpl(net,tr,options,data,calcLib,calcNet,bestNet,status,statusValues);
Error in nnet.train.MultiFeedback/updateOutsideSPMDImpl (line 47)
this.Handlers{i}.updateOutsideSPMD(net,tr,options,data,calcLib,calcNet,bestNet,status,statusValues);
Error in nnet.train.FeedbackHandler/update (line 25)
this.updateOutsideSPMDImpl(net,tr,options,data,calcLib,calcNet,bestNet,status,statusValues);
Error in nnet.train.trainNetwork>trainNetworkInMainThread (line 60)
feedback.update(archNet,tr, ...
Error in nnet.train.trainNetwork (line 27)
[archNet,tr] = trainNetworkInMainThread(archNet,rawData,calcLib,calcNet,tr,feedback,localFcns);
Error in trainscg>train_network (line 145)
[archNet,tr] = nnet.train.trainNetwork(archNet,rawData,calcLib,calcNet,tr,localfunctions);
Error in trainscg (line 55)
[out1,out2] = train_network(varargin{2:end});
Error in network/train (line 374)
[net,tr] = feval(trainFcn,'apply',net,data,calcLib,calcNet,tr);
Error in train_dummyNN (line 41)
[net, ~] = train(net, inputs, targets, nnGPU('precision','single')); %Fails
> In nnet.guis/NNTrainToolModel/updateValues (line 138)
In nnet.train/TrainToolFeedback/updateOutsideSPMDImpl (line 100)
In nnet.train/FeedbackHandler/updateOutsideSPMD (line 40)
In nnet.train/MultiFeedback/updateOutsideSPMDImpl (line 47)
In nnet.train/FeedbackHandler/update (line 25)
In nnet.train.trainNetwork>trainNetworkInMainThread (line 60)
In nnet.train.trainNetwork (line 27)
In trainscg>train_network (line 145)
In trainscg (line 55)
In network/train (line 374)
In train_dummyNN (line 41)
Elapsed time is 1.478077 seconds.
Performance of the network: 0.638181

Answers (1)

Gayathri on 10 Feb 2025
I do not have access to a GPU to reproduce this issue. However, please confirm that "inputs" and "targets" are single-precision data when using the code below.
[net, ~] = train(net, inputs, targets, nnGPU('precision','single')); %Fails
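One way to run that check, as a minimal sketch. The sample data here are placeholders standing in for the poster's arrays, not the actual data from the question:

```matlab
% Placeholder data (assumption: plain numeric row vectors, one sample per column)
inputs  = linspace(0, 10, 1000);
targets = inputs.^2;

% Report the numeric class of each array before calling train
fprintf('inputs: %s, targets: %s\n', class(inputs), class(targets));

% Cast to single only when the arrays are not already single
if ~isa(inputs, 'single'),  inputs  = single(inputs);  end
if ~isa(targets, 'single'), targets = single(targets); end
```

After this runs, both arrays are of class single and can be passed to the train call above.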
You can also train in single precision on the GPU by converting the data with the "nndata2gpu" function, as shown below.
% Here inputs,targets are original double precision data
net = configure(net,inputs,targets);
sx = nndata2gpu(inputs,'single');
st = nndata2gpu(targets,'single');
[net,~] = train(net,sx,st,'useGPU','yes');
For more information on the "nndata2gpu" function, please refer to the documentation link below.
You can also refer to the MATLAB Answers link below to get a clear idea of the above methods.
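As a follow-up to the "nndata2gpu" approach, here is a hedged sketch of bringing the network output back from the GPU with "gpu2nndata". The data and network size are placeholders, and the script assumes Parallel Computing Toolbox and a supported GPU are available. It has not been run on the GPUs discussed in this thread.

```matlab
% Sketch only: needs Parallel Computing Toolbox and a supported GPU.
x = linspace(0, 10, 1000);           % placeholder inputs, one sample per column
t = x.^2;                            % placeholder targets

net = fitnet(10, 'trainscg');
net.trainParam.showWindow = false;   % training window disabled, as elsewhere in this thread
net = configure(net, x, t);          % configure before training on gpuArray data

sx = nndata2gpu(x, 'single');        % copy to the GPU as single precision
st = nndata2gpu(t, 'single');
[net, tr] = train(net, sx, st, 'useGPU', 'yes');

yg = net(sx);                        % output stays on the GPU
y  = gpu2nndata(yg);                 % convert back to a regular matrix
fprintf('MSE on the training inputs: %g\n', mean((double(y) - t).^2));
```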
  3 Comments
Gayathri on 12 Feb 2025
Edited: Gayathri on 12 Feb 2025
I tried running a test script in MATLAB R2023b, and it runs fine on my end. Here is the code I ran to investigate the issue.
neurons=10;
xvars=rand(700000,6);
yvar=rand(700000,1);
x = single(xvars');
t = single(yvar');
trainFcn='trainscg';
net = fitnet(neurons,trainFcn);
net.input.processFcns = {'removeconstantrows','mapminmax'};
net.output.processFcns = {'removeconstantrows','mapminmax'};
net.trainParam.showWindow = 0;
net.divideFcn = 'dividerand'; % Divide data randomly
net.divideMode = 'sample'; % Divide up every sample
net.divideParam.trainRatio = 60/100;
net.divideParam.valRatio = 20/100;
net.divideParam.testRatio = 20/100;
net.trainParam.max_fail = 10;
net.performFcn = 'mse'; % Mean Squared Error
net.trainParam.epochs=100;
[net,tr] = train(net,x,t,nnGPU('precision','single'));
y = net(x)';
I also tried the other method, labeled "Single B" in your comment, and that worked fine too.
With this code the network trains without any errors. I took this code from the MATLAB Answers question mentioned below.
Ivan Rodionov on 12 Feb 2025
Hello @Gayathri and thank you for your reply. For me this is unfortunately not working:
On the GTX 1650, it works when the training window is disabled:
train_example_single_matlab
Computing Resources:
GPU device #1, GeForce GTX 1650, CUDA
However, this code fails on my K40 GPU:
train_example_single_matlab
Computing Resources:
GPU device #2, Tesla K40c, CUDA
Warning: Encountered unexpected error during CUDA execution. The CUDA error was:
CUDA_ERROR_ILLEGAL_ADDRESS
In
Warning: Encountered unexpected error during CUDA execution. The CUDA error was:
CUDA_ERROR_ILLEGAL_ADDRESS
In
Warning: Encountered unexpected error during CUDA execution. The CUDA error was:
CUDA_ERROR_ILLEGAL_ADDRESS
In
Error using gather
Encountered unexpected error during CUDA execution. The CUDA error was:
CUDA_ERROR_ILLEGAL_ADDRESS
Error in
Perfs_and_N = gather(hints.Perfs_and_N);
Error in
lib.calcMode.perfsGrad(calcNet,lib.calcData,lib.calcHints);
Error in
[worker.perf,worker.vperf,worker.tperf,worker.gWB,worker.gradient] = calcLib.perfsGrad(calcNet);
Error in
worker = localFcns.initializeTraining(archNet,calcLib,calcNet,tr);
Error in
[archNet,tr] = trainNetworkInMainThread(archNet,rawData,calcLib,calcNet,tr,feedback,localFcns);
Error in
[archNet,tr] = nnet.train.trainNetwork(archNet,rawData,calcLib,calcNet,tr,localfunctions);
Error in
[out1,out2] = train_network(varargin{2:end});
Error in
[net,tr] = feval(trainFcn,'apply',net,data,calcLib,calcNet,tr);
Error in
[net,tr] = train(net,x,t,nnGPU('precision','single'));
In addition, disabling the GUI seems like a band-aid over a deeper underlying bug. Is there a way to visualize the training performance and state without the GUI?
The CUDA error persists until I restart MATLAB:
Error using gpuDevice
Encountered unexpected error during CUDA execution. The CUDA error was:
CUDA-capable device(s) is/are busy or unavailable
To complicate matters, my original code runs on the Tesla K40 and, if I ignore the display errors, seems to output a reasonable NN without crashing! I don't understand why.
%% Data Preparation
steps = 1000;
inputs = linspace(0, 10, steps); % Input data
targets = inputs.^2; % Target data (squared values)
% Convert data to double precision -> WORKS
inputs = double(inputs);
targets = double(targets);
%Shuffle data
rand_indices = randperm(steps);
inputs = inputs(rand_indices);
targets = targets(rand_indices);
clear rand_indices;
%% Create and Configure Neural Network
hidden_layer_size = [10];
net = fitnet(hidden_layer_size); % Create network with specified hidden layer size
net.trainParam.showWindow = 0;
% Set up training parameters
net.trainFcn = 'trainscg';
net.trainParam.epochs = 1E3;
net.trainParam.goal = 0;
net.trainParam.max_fail = 6;
% Data division (80:10:10 split for training, validation, and testing)
net.divideParam.trainRatio = 0.8;
net.divideParam.testRatio = 0.1;
net.divideParam.valRatio = 0.1; % 10% held out for validation
%% Train the Network
tic
% %Stock -> works
% [net, ~] = train(net, inputs, targets, 'usegpu', 'yes'); % Works
% %Double -> works
% inputs = double(inputs);
% targets = double(targets);
% [net, ~] = train(net, inputs, targets, nnGPU('precision','double')); %WORKS
%Single A -> FAILS
inputs = single(inputs);
targets = single(targets);
[net, ~] = train(net, inputs, targets, nnGPU('precision','single'));
% %Single B -> FAILS
% net = configure(net,inputs,targets);
% sx = nndata2gpu(inputs,'single');
% st = nndata2gpu(targets,'single');
% [net,~] = train(net,sx,st,'useGPU','yes');
toc
%% Test the Network
% Redefine inputs and targets for testing
inputs_test = linspace(0, 10, steps); % Test input data
targets_test = inputs_test.^2; % Test target data
% Get network outputs
outputs = net(inputs_test);
performance = perform(net, targets_test, outputs); % Evaluate performance
fprintf('Performance of the network: %f', performance); % %f prints 6 decimal places by default
%% Plot Results
figure;
% Plot actual targets (blue line)
plot(inputs_test, targets_test, 'b', 'LineWidth', 2);
hold on;
% Plot network outputs (red line)
plot(inputs_test, outputs, 'r', 'LineWidth', 2);
% Add labels, title, and legend
legend('Target', 'Network Output');
title('Inputs vs Targets and Network Outputs');
xlabel('Input');
ylabel('Output');
grid on;
hold off;
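On the question of visualizing training without the GUI, one possible approach is to keep the training record returned by train and plot it afterwards. The sketch below uses placeholder data and trains on the CPU so that it runs anywhere. It is an illustration of the idea, not a fix for the underlying display error.

```matlab
% Placeholder data, mirroring the toy problem in this thread
x = linspace(0, 10, 1000);
t = x.^2;

net = fitnet(10, 'trainscg');
net.trainParam.showWindow      = false;  % skip the training GUI
net.trainParam.showCommandLine = true;   % print progress to the Command Window instead
net.trainParam.show            = 25;     % epochs between printed updates

% Trained on the CPU here; the GPU argument used in this thread,
% nnGPU('precision','single'), could be appended to this call.
[net, tr] = train(net, x, t);

plotperform(tr);      % training, validation, and test performance per epoch
plottrainstate(tr);   % gradient and validation checks per epoch
fprintf('Best validation performance %g at epoch %d\n', tr.best_vperf, tr.best_epoch);
```

The record tr also holds the raw per-epoch values (for example tr.epoch, tr.perf, tr.vperf, tr.tperf), so custom plots can be built from it if the built-in ones are not suitable.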
