Why does GoogLeNet take less time than MobileNetv2 and EfficientNet-b0 for feature extraction from a set of images?

I am using three pretrained networks, namely GoogLeNet, MobileNetv2, and EfficientNet-b0, for a feature extraction task on a set of images. Of the three, GoogLeNet takes the least time (in seconds per frame) and EfficientNet-b0 the most, with MobileNetv2 in between. In summary, the time taken (in seconds per frame) by these networks is ordered as follows:
GoogLeNet < MobileNetv2 < EfficientNet-b0
I am trying to find a valid reason for this. Can it be attributed to the fact that the depth of GoogLeNet is 22, which is less than the depth of MobileNetv2 (53) and of EfficientNet-b0 (82)?
Can someone please shed some light on this?
I have also gone through the plot shown in the following link, which shows that the relative prediction time of GoogLeNet is less than that of MobileNetv2, which in turn is less than that of EfficientNet-b0.
Can anyone explain why the relative prediction times of MobileNetv2 and EfficientNet-b0 are higher than that of GoogLeNet?

Answers (1)

Shubham on 27 May 2024
Hi Navneet,
The inference time (time per frame) for feature extraction with pretrained networks such as GoogLeNet, MobileNetv2, and EfficientNet-b0 is influenced by several factors, of which network depth is only one. Here are some reasons why GoogLeNet might be faster than MobileNetv2, and why MobileNetv2 might be faster than EfficientNet-b0:
1. Network Depth:
  • GoogLeNet has a depth of 22 layers, MobileNetv2 has 53 layers, and EfficientNet-b0 has 82 layers. Feature extraction only needs the forward pass, but the layers along a path still execute one after another, so a deeper network has more sequential steps and more per-layer overhead. This tends to increase inference time even when each individual layer is cheap.
2. Model Complexity:
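  • Model Architecture: Beyond depth, the architecture plays a crucial role. GoogLeNet is built from inception modules, which increase the depth and width of the network while keeping the computational budget under control, and it relies mostly on standard convolutions that GPU libraries execute very efficiently. MobileNetv2 uses depthwise separable convolutions, which greatly reduce the number of parameters and arithmetic operations compared to standard convolutions. However, depthwise convolutions do little arithmetic per byte of memory moved, so they are often limited by memory bandwidth rather than compute and can run slower in practice than their operation count suggests. EfficientNet-b0 is the baseline network of the EfficientNet family (compound scaling of width, depth, and resolution produces the larger variants from it); it combines depthwise separable convolutions with squeeze-and-excitation blocks, which adds further small operations that use the hardware poorly.
  • Parameter Count: A higher parameter count can lead to longer inference times, but it does not explain the ordering here. According to the pretrained network table in the Deep Learning Toolbox documentation, GoogLeNet has about 7.0 million parameters, MobileNetv2 about 3.5 million, and EfficientNet-b0 about 5.3 million, so the fastest of the three networks is also the largest. Parameter count mainly determines model size in memory; latency depends more on how many layers there are and how efficiently each one runs.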
3. Computational Efficiency:
  • Operation Types: Different types of operations (e.g., depthwise separable convolutions in MobileNetv2, squeeze-and-excitation blocks in EfficientNet) have different computational requirements, and some are executed more efficiently than others on a given hardware architecture.
  • Optimization and Implementation: The way these models are implemented and optimized for specific hardware (GPUs, CPUs, TPUs) also affects inference time. Some architectures map well onto parallel processing on GPUs, while others do not fully leverage the hardware's capabilities.
4. Input Resolution:
  • Input resolution also affects inference time, because the cost of convolutional layers grows with the number of pixels. The EfficientNet family scales resolution along with depth and width, but only for the larger variants; the pretrained GoogLeNet, MobileNetv2, and EfficientNet-b0 in Deep Learning Toolbox all expect 224-by-224 inputs. Resolution therefore explains the difference only if your pipeline feeds the networks images of different sizes.
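To make the point about operation types concrete, here is a rough sketch in plain Python that counts multiply-accumulate operations (MACs) for a standard convolution and for a depthwise separable one. The layer dimensions are made up for illustration and are not taken from any of the three networks; the formulas are the usual textbook counts and ignore bias, stride, and padding details.

```python
def standard_conv_macs(h, w, c_in, c_out, k):
    """MACs for a k x k standard convolution with an h x w output map."""
    return h * w * c_in * c_out * k * k


def depthwise_separable_macs(h, w, c_in, c_out, k):
    """MACs for a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = h * w * c_in * k * k   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1 x 1 conv that mixes the channels
    return depthwise + pointwise


# Hypothetical layer: 56 x 56 output, 64 -> 128 channels, 3 x 3 kernel.
h = w = 56
c_in, c_out, k = 64, 128, 3

std = standard_conv_macs(h, w, c_in, c_out, k)
sep = depthwise_separable_macs(h, w, c_in, c_out, k)
ratio = std / sep  # how many times fewer MACs the separable version needs

print(f"standard:  {std:,} MACs")
print(f"separable: {sep:,} MACs")
print(f"reduction: {ratio:.2f}x")
```

The separable layer needs several times fewer MACs, yet that saving does not translate one-to-one into speed: the work is split across two layers, and each does less arithmetic per byte of data it reads and writes. This is one reason a network with a lower operation count can still have a longer measured time per frame.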
Summary:
While the depth of the network is a significant factor in determining the inference time, it is not the only one. The architecture, the types of operations used, and how well the model is optimized for the hardware it runs on all play crucial roles. GoogLeNet being faster than MobileNetv2, and MobileNetv2 being faster than EfficientNet-b0, can be attributed to a combination of these factors, with depth being one of several considerations.
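Because the result depends so much on hardware and implementation, it is worth checking how the per-frame time is measured. Below is a minimal, framework-agnostic sketch in Python of one common approach: run a few warm-up frames first, then take the median over several timed passes. The function name `seconds_per_frame` and the stand-in `dummy_extract` are hypothetical; in practice you would replace the stand-in with a call to your own feature extractor.

```python
import statistics
import time


def seconds_per_frame(extract, frames, warmup=3, repeats=5):
    """Median wall-clock seconds per frame for `extract` over `frames`."""
    for frame in frames[:warmup]:
        extract(frame)  # warm-up runs, excluded from the timing
    per_frame = []
    for _ in range(repeats):
        start = time.perf_counter()
        for frame in frames:
            extract(frame)
        elapsed = time.perf_counter() - start
        per_frame.append(elapsed / len(frames))
    return statistics.median(per_frame)


def dummy_extract(frame):
    """Stand-in for a real feature extractor (assumption for illustration)."""
    return sum(x * x for x in frame)


frames = [[float(i)] * 1000 for i in range(20)]
t = seconds_per_frame(dummy_extract, frames)
print(f"{t:.6f} s/frame")
```

The warm-up matters because the first calls often include one-time costs such as model loading or memory allocation. On a GPU, also make sure the computation has actually finished before reading the clock, since many frameworks launch work asynchronously.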
