Custom loss function (based on error multiplication rather than sum) in classification neural network

Hi everyone,
First, thank you! This is a fantastic community from which I’m learning so much. This is my first question (hopefully, I’ll be able to contribute answers in the future!).
I have a system consisting of 10 elements, where each element can exist in one of four states (or classes). This means the system has 4^10 possible states. For each element, I have 61 features that can be used to predict its state. I’ve experimented with different neural networks (feedforward networks have worked well so far), mainly focusing on predicting the labels of individual elements.
However, I’ve encountered some challenges:
  1. The classes are naturally imbalanced.
  2. The problem is non-deterministic, meaning two identical feature vectors can correspond to different labels.
I’ve been addressing these issues with relative success by applying techniques such as downsampling, oversampling, data augmentation, and soft labels (the latter has been the most effective).
Now, I want to predict the probability of the entire system being in each of its 4^10 states. One issue I’ve noticed is that a misclassification error of 0.05 has minimal impact when the predicted probability is close to random (e.g., 0.25), but a significant impact when probabilities are close to 1 or 0.
What I’d like to do next is implement a loss function that considers the entire system rather than individual elements, while still being based on predictions for each element. My idea is to:
  1. Take batches of 10 observations (corresponding to the 10 elements of the system).
  2. Compute the probability of each element belonging to each of the 4 classes.
  3. Calculate the probability of the system being in each of its 4^10 possible states based on these predictions.
  4. Sort these probabilities and use the known labels to find the index of the correct state.
  5. Minimize this index.
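For concreteness, steps 2–5 could be sketched as below. All variable names are hypothetical, and note that the 4^10-by-10 state matrix alone takes roughly 80 MB as doubles:

```matlab
% Sketch of steps 2-5 (hypothetical names). P is a 10-by-4 matrix of
% per-element class probabilities for one system observation; trueLabels
% is a 1-by-10 vector of class indices (1..4).
P = rand(10,4);  P = P ./ sum(P,2);        % placeholder predictions
trueLabels = randi(4, 1, 10);              % placeholder ground truth

states = dec2base(0:4^10-1, 4) - '0' + 1;  % 4^10-by-10: every joint state as class indices
rows   = repmat(1:10, 4^10, 1);
idx    = sub2ind(size(P), rows, states);
pState = prod(P(idx), 2);                  % step 3: probability of each joint state

[~, order]  = sort(pState, 'descend');     % step 4: sort
rankOfTrue  = find(all(states(order,:) == trueLabels, 2));  % index of the correct state
% step 5 would then try to minimize rankOfTrue - 1
```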
Does this approach make sense? Is it feasible? And if so, how could it be implemented?
Many thanks!
David

2 Comments

I have a system consisting of 10 elements, where each element can exist in one of four states (or classes). This means the system has 410410 possible states.
4^10
ans = 1048576
You would appear to have over 1 million states, not four hundred thousand.
Absolutely, I tried to copy-paste 4^10 from elsewhere. Sorry for the unchecked mistake.


 Accepted Answer

Using trainnet you can provide any loss function you wish. However, a multiplicative loss function sounds like a doubtful idea. Multiplications of many terms can quickly underflow or overflow, which is why people normally try to optimize loglikelihoods instead of likelihoods.
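The numerical point is easy to demonstrate (illustrative numbers only; with just 10 factors the underflow is milder, but the argument for working in log space still applies):

```matlab
% Illustration of why products of probabilities are avoided:
p = 1e-4 * ones(1,100);   % 100 modest probabilities
prod(p)                   % 1e-400 underflows to 0 in double precision
sum(log(p))               % about -921.03: well within range, same information
```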

3 Comments

Well, technically it would only multiply 10 terms (1,000,000 times) for each mini-batch and find the index with the right label. The loss would be abs(1-index).
It may sound like too much computing, right? Would it be computationally feasible if I aimed for only 5 terms?
It may sound like too much computing, right?
There's no way to tell. You haven't given us any way of estimating the computation time to create your 1e6x10 probability matrix. Certainly the loss computation itself doesn't look very arduous, once that matrix has been calculated.
pmatrix=rand(1e6,10);
tic; [~,index]=min(prod(pmatrix,2)); loss=index-1; toc
Elapsed time is 0.003210 seconds.
However...
The loss would be abs(1-index)
...that is not a differentiable (or even a continuous) loss function. It's piecewise constant. None of the standard gradient-based training algorithms are going to be able to handle it.
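Since the sorted-index loss is piecewise constant, a common differentiable surrogate with the same qualitative effect (pushing the true state toward rank 1) is the negative log-probability of the true joint state. Because the joint probability is a product over elements, its negative log decomposes into a sum of per-element cross-entropies. A sketch, with assumed array layouts — check how trainnet actually passes predictions and targets for your network:

```matlab
% Smooth surrogate: minimize -log p(true joint state). Assumes Y is the
% network's softmax output and T a matching one-hot target, as trainnet
% hands to a custom loss function.
lossFcn = @(Y,T) -sum(T .* log(max(Y, eps)), 'all');
% equivalent, up to normalization, to the built-in:
% lossFcn = @(Y,T) crossentropy(Y,T);
% net = trainnet(features, targets, net, lossFcn, options);  % hypothetical call
```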



Release: R2024b
Asked: 21 Jan 2025
Commented: 22 Jan 2025
