How to choose the number of filters and the filter size in a CNN

Question

0 votes

I am new to programming and am struggling with the concepts behind CNNs.

I have a question about the example at the link below.

https://jp.mathworks.com/help/deeplearning/examples/create-simple-deep-learning-network-for-classification.html

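In the linked example, I believe the input layer is 28×28×1, the first convolution layer uses 3×3 filters with 8 filters and a stride of 2, the number of filters then doubles in the next layer (16), and the network ends with 32 filters. (There are three convolution layers in total.)

In this case, would it be a problem to use, for example, only two convolution layers and stop at 16 filters? Or is there a rule that, say, the number of filters in the convolution layers must exceed the size of the input layer?

Also, for a larger input layer (for example 300×300×3), could I use a first convolution layer with 3×3 filters, 8 filters, and stride 2, then a second convolution layer with 3×3 filters, 16 filters, and stride 2, and then connect to a fully connected layer? Or must I keep doubling the number of filters up to 512 so that it exceeds the input layer size?

I apologize if my question is unclear. Thank you in advance for your help.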

6 Comments

ssk on 7 Sep 2019

Edited: ssk on 7 Sep 2019

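Sorry for the repeated questions. I have another question about the concepts of stride and number of filters.

For the input layer of size 300×300×3 that I asked about earlier, assume the first convolution layer uses 3×3 filters, 8 filters, and a stride of 2.

When a 3×3 filter slides 2 pixels at a time over the 300×300×3 input layer to build a feature map, is it correct to understand that (for the first filter) the convolution is performed probably several hundred times, generating a feature map each time? I am asking because I was unsure whether a convolution set up this way really covers the entire input layer.

Also, does having 8 filters mean that the convolution is performed 8 times in total, each time starting at a random location different from where the first filter was applied, and that the results are stacked to lengthen the channel dimension, like a microscope? When I ran the code, 300×300×3 became 300×300×8, and I am curious how the number of filters is being used.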

Kenta on 8 Sep 2019

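Hello.

>> Is it correct to understand that a feature map is generated each time?

That may be slightly off. Interpreting it as described below may also help you picture the other point you raised:

"300×300×3 became 300×300×8, and I am curious how the number of filters is being used."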

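For example, consider an input of size 2×2, a filter size of 1×1, a stride of 1, and 3 filters.

If the input values are [1 2; 3 4] and the filter values are 3, 6, and 9 respectively,

then applying the same operation as a convolution gives the outputs [3 6; 9 12], [6 12; 18 24], and [9 18; 27 36].

In other words, if you apply each filter in order starting from the top left, you get as many outputs as there are filters (for example, 300×300×8).

Returning to your example, I think of the flow as: each filter produces one feature map, there are 8 filters, so you get feature maps for 8 channels.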
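The 2×2 example above can be reproduced in a few lines. This is only a minimal sketch in plain Python, not how a deep learning framework implements convolution; the helper name `conv_1x1_single_channel` is hypothetical, and it relies on the fact that a 1×1 filter with stride 1 on a single-channel input simply scales every pixel.

```python
def conv_1x1_single_channel(image, weight):
    """Apply one 1x1 filter (a single scalar weight) to a 2-D image, stride 1."""
    return [[weight * pixel for pixel in row] for row in image]


image = [[1, 2], [3, 4]]       # the 2x2 input from the example
filter_values = [3, 6, 9]      # one scalar weight per filter

# Each filter yields one feature map; stacking them gives a 2x2x3 output.
feature_maps = [conv_1x1_single_channel(image, w) for w in filter_values]

for w, fmap in zip(filter_values, feature_maps):
    print(f"filter value {w}: {fmap}")
```

The number of feature maps equals the number of filters, which is the same reason 8 filters give an output with 8 channels.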

Note that a convolution always passes over the whole region, from the top left to the bottom right (unless there is some particular constraint).
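To get a feel for how many positions one filter visits, the standard output-size formula, floor((input + 2 × padding - filter) / stride) + 1, can be evaluated directly. The sketch below assumes symmetric zero padding given as an explicit pixel count; the function names are hypothetical, and a framework's own padding options may behave differently. Under these assumptions, an observed 300×300×8 output is what a stride of 1 with 1 pixel of padding would produce, while a stride of 2 with no padding would give a smaller map.

```python
def conv_output_size(input_size, filter_size, stride=1, padding=0):
    """Output length along one spatial dimension (symmetric padding assumed)."""
    return (input_size + 2 * padding - filter_size) // stride + 1


def conv_output_shape(height, width, filter_size, num_filters, stride=1, padding=0):
    """Output shape (height, width, channels); channels equals the filter count."""
    out_h = conv_output_size(height, filter_size, stride, padding)
    out_w = conv_output_size(width, filter_size, stride, padding)
    return (out_h, out_w, num_filters)


# 300x300 input, 3x3 filters, 8 filters.
padded_stride1 = conv_output_shape(300, 300, 3, 8, stride=1, padding=1)
unpadded_stride2 = conv_output_shape(300, 300, 3, 8, stride=2, padding=0)

# Number of filter positions for ONE filter in the stride-2 case.
positions_per_filter = unpadded_stride2[0] * unpadded_stride2[1]

print(padded_stride1)        # (300, 300, 8)
print(unpadded_stride2)      # (149, 149, 8)
print(positions_per_filter)  # 22201
```

So each filter is evaluated at tens of thousands of positions rather than a few hundred, and all of those values together form a single feature map. The number of input channels (3 here) does not change the output shape, because each filter spans all input channels.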

ssk on 8 Sep 2019

Edited: ssk on 8 Sep 2019

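Regarding how the convolution works: is it correct to understand that all 8 filters are convolved in order from the top left to the bottom right, but because each filter has different values, you get 8 channels of feature maps that each differ? (Can I set or inspect the values of the 8 filters myself? Or are the filter values assigned randomly?)

I was vague on these concepts, which would likely have affected how I build my model, so learning the details has been very helpful. Thank you very much.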

Kenta on 9 Sep 2019

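>> Is it correct to understand that all 8 filters are convolved in order from the top left to the bottom right, but because each filter has different values, you get 8 channels of feature maps that each differ?

Yes, that is correct.

>> Can I set or inspect the values of the 8 filters myself?

Yes, once the computation has run you can easily check what values they ended up with. https://jp.mathworks.com/matlabcentral/answers/478011-workspace-cnn-bias-weight

In addition, the documentation states: "By default, the initial values of the weights of the convolutional and fully connected layers are randomly generated from a Gaussian distribution with mean 0 and standard deviation 0.01. The initial bias is equal to 0 by default." https://jp.mathworks.com/help/deeplearning/ug/setting-up-parameters-and-training-of-a-convnet.html

You can also customize the initial values yourself, but the defaults should be fine to start with.

>> Are the filter values assigned randomly?

As noted above, they are random values drawn from a normal distribution.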
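As a rough illustration of the initialization quoted above (Gaussian weights with mean 0 and standard deviation 0.01, biases set to 0), the following sketch draws initial values for 8 filters of size 3×3 over 3 input channels. It uses only Python's standard library; `init_conv_weights` and its parameters are hypothetical names for illustration and do not correspond to any toolbox function.

```python
import random


def init_conv_weights(filter_size, in_channels, num_filters, std=0.01, seed=0):
    """Draw initial weights from N(0, std^2) and set every bias to 0.

    Weights are returned as a flat list of length
    filter_size * filter_size * in_channels * num_filters.
    """
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    count = filter_size * filter_size * in_channels * num_filters
    weights = [rng.gauss(0.0, std) for _ in range(count)]
    biases = [0.0] * num_filters
    return weights, biases


weights, biases = init_conv_weights(filter_size=3, in_channels=3, num_filters=8)
mean = sum(weights) / len(weights)

print(len(weights))  # 216 weights: 3 * 3 * 3 * 8
print(biases)        # eight zeros
print(mean)          # close to 0
```

Because every filter starts from different random values, training moves each one in a different direction, which is why the 8 feature maps end up different from one another.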


Answer 1

Dinesh Yadav on 26 Aug 2019

1 vote

The purpose of the convolution operation on images is to preserve spatial information and features. The size and stride of the filter determine how well the spatial information is preserved. For example, a 3x3 filter with stride 1 preserves finer information than a 3x3 filter with stride 2. Similarly, a 3x3 filter captures more spatially correlated information than a 5x5 filter.

The purpose of strided convolutions, or of max pooling after convolutions, is to downsize the image and shrink the input to the rest of the network, which reduces computation while preserving information.

Now, on how to choose the number of filters: as you have observed, the size of the image halves while the number of filters doubles. This is a general rule of thumb, not a hard rule. You can experiment by changing the number of filters.

But let's say I have a 40x40x128 feature map and I must downsize it to 20x20. What is the best way to preserve the information? To preserve it, we increase the number of filters to 256. Even so, the next step has only half as much data to process as the previous step (20x20x256 versus 40x40x128). If you display the feature maps as heatmaps, you can see the features that correspond to an object in the original image.
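The arithmetic behind the 40x40x128 example can be checked with a short sketch. It simply counts the values in each feature map; the helper name `num_values` is hypothetical.

```python
def num_values(height, width, channels):
    """Total number of values in a height x width x channels feature map."""
    return height * width * channels


before = num_values(40, 40, 128)           # feature map before downsizing
after_doubled = num_values(20, 20, 256)    # halve each side, double the filters
after_unchanged = num_values(20, 20, 128)  # halve each side, keep 128 filters

print(before)           # 204800
print(after_doubled)    # 102400
print(after_unchanged)  # 51200
```

Halving both spatial dimensions cuts the number of values to a quarter, so doubling the number of filters brings the total back up to half of the original rather than a quarter.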

3 Comments

Dinesh Yadav on 27 Aug 2019

I doubt that using just 12 or 16 filters would help your cause. Please go through some well-known convolutional neural network architectures such as AlexNet, VGG16/19, and GoogLeNet to get a more in-depth idea. The minimum I have seen is 32 or 64 filters in the first layer itself. However, if you want to reduce the third dimension, say from 40x40x256 to 40x40x128, there is a technique that uses 1x1 convolutions. In this example you would use 128 filters, each of size 1x1x256. If you choose 64 such filters, the output will be 40x40x64. But then again, reducing the third dimension too much can lead to loss of information. Read more about 1x1 convolutions and see how they can fit your cause.
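A 1x1 convolution can be sketched as a weighted sum over the channels at each pixel, repeated once per filter. The example below is a toy illustration in plain Python with a 2x2x4 input reduced to 2x2x2, standing in for the 40x40x256 to 40x40x128 case; the function name `conv_1x1` and the filter values are hypothetical, and bias terms are omitted.

```python
def conv_1x1(feature_map, filters):
    """Apply 1x1 convolution filters to an H x W x C_in feature map.

    feature_map: nested lists indexed as [row][col][channel].
    filters: list of C_out weight vectors, each of length C_in.
    Returns an H x W x C_out nested list.
    """
    return [
        [
            [sum(w * x for w, x in zip(weights, pixel)) for weights in filters]
            for pixel in row
        ]
        for row in feature_map
    ]


feature_map = [
    [[1, 2, 3, 4], [5, 6, 7, 8]],
    [[1, 0, 1, 0], [0, 1, 0, 1]],
]  # shape 2 x 2 x 4

filters = [
    [1, 1, 1, 1],    # filter 1: sum of all four channels
    [1, 0, 0, -1],   # filter 2: first channel minus last channel
]  # two filters, each of size 1 x 1 x 4

reduced = conv_1x1(feature_map, filters)  # shape 2 x 2 x 2
print(reduced)
```

The height and width are unchanged; only the channel dimension shrinks, from the number of input channels to the number of filters.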

ssk on 29 Aug 2019

Hi, Dinesh!

Thanks for your reply. I will try using more filters.
