Split training images based on label in csv sheet

I have collected 1000 images to study deep learning. These images are stored in a single folder and have sequential names:
Image0001.png
Image0002.png
...
Image1000.png
I also have a CSV sheet that records the category of each image. Its contents are:
ImageName Category
Image0001 Dog
Image0002 Cat
Image0003 Panda
...
Image1000 Panda
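As a starting point, the label sheet can be read and grouped by category. The following is a minimal Python sketch (not MATLAB), assuming the sheet is saved as a comma-separated file with a header row `ImageName,Category` and that each image file is named `<ImageName>.png`, as in the listing above. The function name `group_by_category` and the file name `labels.csv` are hypothetical.

```python
import csv
import io
from collections import defaultdict

def group_by_category(csv_file):
    """Return {category: [image file names]} from a file-like CSV object.

    Assumes (hypothetically) a header row 'ImageName,Category' and that
    each image on disk is named '<ImageName>.png'.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        file_name = row["ImageName"].strip() + ".png"
        groups[row["Category"].strip()].append(file_name)
    return dict(groups)

# Usage with a few rows from the sheet; for the real file, pass
# open("labels.csv", newline="") instead of io.StringIO (hypothetical path).
sample = io.StringIO("ImageName,Category\nImage0001,Dog\nImage0002,Cat\nImage0003,Panda\n")
groups = group_by_category(sample)
print({name: len(files) for name, files in groups.items()})
# prints {'Dog': 1, 'Cat': 1, 'Panda': 1}
```

Grouping by category first makes both later steps (equalizing the classes and splitting them) simple per-group operations.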
There are 3 categories in total: Dog, Cat, and Panda. Of the 1000 images, there are
300 dog images
500 cat images
200 panda images
So there are fewer panda images than dog or cat images. To train on a balanced data set, I would like to
  1. Randomly select 200 dog images
  2. Randomly select 200 cat images
Then I will have 200 images for each category. For training, I would like to select 120 images from each category, leaving 80 images of each category for testing.
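The two selection steps plus the 120/80 split amount to per-category random sampling without replacement. The sketch below is Python, assuming the images are already grouped into a `{category: [file names]}` dictionary; the function `balanced_split` is hypothetical, and the usage builds synthetic groups with the stated counts (300/500/200) rather than the real, interleaved file names.

```python
import random
from collections import Counter

def balanced_split(groups, n_per_class, n_train, seed=None):
    """Randomly keep n_per_class images per category, then split each
    category into n_train training and (n_per_class - n_train) test images.

    Returns two lists of (file_name, category) pairs.
    """
    if not 0 <= n_train <= n_per_class:
        raise ValueError("n_train must be between 0 and n_per_class")
    rng = random.Random(seed)
    train, test = [], []
    for category, files in groups.items():
        if len(files) < n_per_class:
            raise ValueError(f"{category} has only {len(files)} images")
        # Sample without replacement; the result is already in random order.
        chosen = rng.sample(files, n_per_class)
        train += [(f, category) for f in chosen[:n_train]]
        test += [(f, category) for f in chosen[n_train:]]
    rng.shuffle(train)  # mix categories so training order is not grouped by class
    rng.shuffle(test)
    return train, test

# Synthetic groups with the counts from the question (file names are made up).
counts = {"Dog": 300, "Cat": 500, "Panda": 200}
groups, start = {}, 1
for category, n in counts.items():
    groups[category] = [f"Image{i:04d}.png" for i in range(start, start + n)]
    start += n

n_per_class = min(len(files) for files in groups.values())  # 200, the panda count
train, test = balanced_split(groups, n_per_class, n_train=120, seed=0)
print(sorted(Counter(c for _, c in train).items()))
# prints [('Cat', 120), ('Dog', 120), ('Panda', 120)]
print(sorted(Counter(c for _, c in test).items()))
# prints [('Cat', 80), ('Dog', 80), ('Panda', 80)]
```

Using the size of the smallest class as `n_per_class` reproduces the "undersample to 200" plan; passing a fixed `seed` makes the selection repeatable between runs.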
My question: is there an easy way (i.e., a simple command) to randomly select an equal number of images from each category according to the CSV sheet? Further, which function should I use to split the selected images into training and testing sets so that each category is represented in the same proportion in both?
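A split that keeps each category's proportion the same in both sets is usually called a stratified split. A minimal Python sketch follows; `stratified_split` is a hypothetical helper, and `train_fraction = 0.6` corresponds to the 120/80 split of 200 images. It does not require the classes to be balanced first. Outside this sketch, MATLAB's `splitEachLabel` (a method of a labeled `imageDatastore`) and scikit-learn's `train_test_split(..., stratify=labels)` provide comparable per-label splits.

```python
import random
from collections import Counter, defaultdict

def stratified_split(items, labels, train_fraction, seed=None):
    """Split items so every label has (about) train_fraction of its members
    in the training set. Returns two lists of (item, label) pairs."""
    if len(items) != len(labels):
        raise ValueError("items and labels must have the same length")
    if not 0.0 <= train_fraction <= 1.0:
        raise ValueError("train_fraction must be in [0, 1]")
    by_label = defaultdict(list)
    for item, label in zip(items, labels):
        by_label[label].append(item)
    rng = random.Random(seed)
    train, test = [], []
    for label, members in by_label.items():
        rng.shuffle(members)  # random order within this label only
        n_train = round(len(members) * train_fraction)
        train += [(m, label) for m in members[:n_train]]
        test += [(m, label) for m in members[n_train:]]
    return train, test

# Synthetic labels with the original, unbalanced counts (300/500/200).
items = [f"Image{i:04d}.png" for i in range(1, 1001)]
labels = ["Dog"] * 300 + ["Cat"] * 500 + ["Panda"] * 200
train, test = stratified_split(items, labels, train_fraction=0.6, seed=1)
print(sorted(Counter(label for _, label in train).items()))
# prints [('Cat', 300), ('Dog', 180), ('Panda', 120)]
```

Because the count is rounded per label, a category whose size times `train_fraction` is not an integer can be off by one image from the exact proportion.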
jonas on 24 Jul 2018
I can help you with the first part of the question if you upload the Excel file with the category strings. To my knowledge there is no single command that does everything for you, but the process should only take a few general commands for grouping and randomization.


Answers (1)

abdallah hegazy on 27 Jan 2020
If you solved this issue, I would appreciate it if you shared your code.