It is said that ResNet models require less training time, but when I used the resnetLayers function in MATLAB to create a residual network, why does it take more time?

It is said that the ResNet model requires less training time because it eliminates the vanishing gradient problem, but when I used the resnetLayers function in MATLAB to create a residual network and trained it, it took more time than a CNN-LSTM model. Why is that?

Answers (1)

Hari on 15 Sep 2023
Hi Debojit,
I understand that you have observed the “ResNet” model taking more time to train than the “CNN-LSTM” model, contrary to the expectation that “ResNet” should train faster because it addresses the vanishing gradient problem.
The “ResNet” architecture mitigates the vanishing gradient problem through its skip connections, which makes very deep networks easier to optimize.
However, easier optimization does not mean cheaper computation: each training iteration still costs whatever the network's size and the data demand. The actual training time of a model is influenced by many factors, including the specific architecture, dataset, hyperparameters, and implementation details, so the “ResNet” architecture does not guarantee faster training than other models such as “CNN-LSTM” in all scenarios.
Here are a few reasons why you might observe longer training times with the “ResNet” model compared to the “CNN-LSTM” model in your specific case:
  1. Model complexity: “ResNet” models can have a larger number of parameters than “CNN-LSTM” models, especially deeper variants such as ResNet-50 or ResNet-101. This increased complexity requires more computation per iteration and therefore more training time (see the sketch after this list for one way to control depth and width with resnetLayers).
  2. Dataset characteristics: The characteristics of your dataset, such as size, complexity, and class imbalance, can affect training time. If your dataset is particularly large or contains complex patterns, it may require more time to train regardless of the model architecture.
  3. Hyperparameters: The choice of hyperparameters, such as learning rate, batch size, and regularization techniques, can impact training time. Suboptimal settings may slow convergence or require more iterations to reach good performance (a trainingOptions sketch follows the documentation links below).
  4. Implementation details: The efficiency of the implementation, including the software framework and hardware used, can affect training time. Different frameworks or hardware configurations may have varying levels of optimization, which can influence the overall training speed.
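As a minimal sketch of point 1, and assuming a hypothetical input size and class count (replace both with your own), resnetLayers lets you shrink the network by reducing the number of residual stacks and the filters per stack, which lowers the parameter count and usually the time per epoch:

  inputSize  = [224 224 3];   % assumed input dimensions; use your own
  numClasses = 10;            % assumed number of classes; use your own

  % Two stacks of two residual blocks with 16 and 32 filters,
  % instead of the deeper defaults, for a much smaller network.
  lgraph = resnetLayers(inputSize, numClasses, ...
      "StackDepth", [2 2], ...
      "NumFilters", [16 32]);

  analyzeNetwork(lgraph)   % inspect layer count and total learnables

analyzeNetwork reports the total number of learnable parameters, which you can compare against your “CNN-LSTM” model to see how much of the time difference is explained by model size.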
Refer to the documentation of “resnetLayers” and the example “Sequence Classification Using CNN-LSTM Network” for more information.
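As a minimal sketch of points 3 and 4, with illustrative values only, these are the trainingOptions settings that most often change wall-clock training time; keeping them identical for both networks makes the comparison fair:

  % Illustrative hyperparameters; tune these for your own dataset.
  options = trainingOptions("adam", ...
      "InitialLearnRate", 1e-3, ...
      "MiniBatchSize", 128, ...           % larger batches mean fewer iterations per epoch
      "MaxEpochs", 10, ...
      "ExecutionEnvironment", "auto", ... % uses a GPU automatically if one is available
      "Plots", "training-progress");

  % net = trainNetwork(XTrain, YTrain, lgraph, options);  % XTrain/YTrain are your data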
I hope this helps.
Thanks,
Hari.
