How do I create a cell/struct array from specific lines taken from a text file?

This is the code that I am currently working with:
% open the file
fid = fopen('images_list1.txt');
% initialize the line counter
lineNum = 0;
% keep reading until the end of the file
while true
    % get a line of text
    tline = fgetl(fid);
    lineNum = lineNum + 1;
    % exit at end of file (fgetl then returns -1, which is not char)
    if ~ischar(tline)
        break
    end
    % keep the 2nd and 11th line of every 13-line block
    if rem(lineNum,13) == 2
        tline_2nd = tline;
        %disp(tline);
    elseif rem(lineNum,13) == 11
        tline_11th = tline;
        %disp(tline);
    end
end
% close the file
fclose(fid);
How can I create cell/struct arrays from the two lines that I extract from each block of my text file? The first line is the image name, in the format "12345678.jpg", and the second line is the number of likes, which I would like to store in a matrix if possible. Any help is much appreciated, thanks!

2 Comments

Can you please upload the data file? It is useful for us to try code on. You can upload a file by clicking the paperclip button and then both the Choose file and Attach file buttons.


 Accepted Answer

EDIT: See the improved code at the end of this answer.
% read the file as a string:
str = fileread('images_list1.txt');
% define newline:
nwl = '[\n\r]{1,2}';
% split string into blocks: '#number\nfilename.ext\n...'
fmt1 = ['^#(\d+)',nwl,'(\d+\.[a-zA-Z]{1,4})',nwl,'(.+?)(?=^#)'];
C = regexp(str,fmt1,'lineanchors','tokens');
C = vertcat(C{:});
% convert block number to numeric:
N = cellfun(@str2double,C(:,1));
%find(diff(N)~=1) % to locate incorrectly parsed blocks.
% split all 'The XXX is:' substrings of the remaining text:
fmt2 = ['^The ([^\n]+) is:',nwl,'([^\n]+)'];
D = regexp(C(:,3),fmt2,'lineanchors','tokens');
D = vertcat(D{:})';
D = vertcat(D{:})';
D = reshape(D,[],size(C,1))';
% convert number of likes to numeric:
nLikes = cellfun(@str2double,D(:,8));
This code imports all of the data in your example file, and converts the "number of likes" into a numeric vector:
>> nLikes
nLikes =
25
56
9197
11559
4
8557
2417
13
22
2820
... etc
You can explore the rest of the data in the MATLAB workspace browser. Note that I made some assumptions about the data (e.g. that the filenames consist only of digits), as per your example file.
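To illustrate the digits-only assumption mentioned above, here is a small sketch rather than part of the import code itself. It tests the filename sub-pattern on two names: the first is taken from the example output, the second (photo_01.jpg) is hypothetical. The variable names pat, names, isStrict and isLoose are made up for this illustration.

```matlab
% Filename sub-pattern used in the code above, anchored to the whole string:
pat = '^\d+\.[a-zA-Z]{1,4}$';
% First name is from the example data, second name is hypothetical:
names = {'147028902.jpg'; 'photo_01.jpg'};
% regexp returns an empty string for names that do not match:
isStrict = ~cellfun('isempty', regexp(names, pat, 'match', 'once'));
% A looser pattern that also accepts letters and underscores:
isLoose = ~cellfun('isempty', regexp(names, '^\w+\.\w{1,4}$', 'match', 'once'));
```

With the digits-only pattern only the first name is accepted, so a file containing names like the hypothetical one would need the looser pattern.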
EDIT: fixed a bug so that the last group in the file is also read:
% read file into string:
fid = fopen('images_list1.txt','rt');
str = fscanf(fid,'%c');
fclose(fid);
% break string into blocks:
fmt1 = '^#(\d+)\n(\w+\.\w{1,4})\n([^#]+)';
C = regexp(str,fmt1,'lineanchors','tokens');
C = vertcat(C{:});
% convert block numbers into numeric:
N = cellfun(@str2double,C(:,1));
%find(diff(N)~=1) % to locate incorrectly parsed blocks.
% split block data into fields:
fmt2 = '^The ([^\n]+) is:\n([^\n]+)';
D = regexp(C(:,3),fmt2,'lineanchors','tokens');
D = vertcat(D{:})';
D = vertcat(D{:})';
D = reshape(D,[],size(C,1))';
% convert number of likes into numeric:
nLikes = cellfun(@str2double,D(:,8));
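Since the question asks for a cell or struct array, here is one possible way to package the results. It is a minimal sketch that assumes the filenames are in C(:,2) and the likes in nLikes, as produced by the code above. To keep it self-contained, the first two values from the example output stand in for those variables. The names names, S and T and the field names are only illustrative.

```matlab
% Stand-ins for C(:,2) and nLikes (first two values of the example output):
names  = {'147028902.jpg'; '146157879.jpg'};
nLikes = [25; 56];
% Struct array with one element per image (field names are illustrative):
S = struct('name', names, 'likes', num2cell(nLikes));
% Alternatively, a two-column cell array: filename, number of likes
T = [names, num2cell(nLikes)];
```

With the real data, replacing names by C(:,2) should give one struct element per block of the file, e.g. S(1).name and S(1).likes for the first image.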

9 Comments

Hi Stephen Cobeldick,
Thanks for the answer. The number of likes works fine, but the image names are not shown properly. I thought that this line might be the cause: N = cellfun(@str2double,C(:,1));
However, I changed it to N = cellfun(@str2double,C(:,2)); and the result is still not right :(
The filenames of the images are stored in the second column of the cell array C, which you can access using C(:,2). These strings cannot be trivially converted to numeric because they also include the file extension (e.g. .jpg). What form do you want these filenames in?
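If a numeric form were ever needed, one possibility is to strip the extension first and convert only the remaining digits. This is a sketch under the assumption that every filename is digits followed by an extension, as in the example file. The variables names, stem, ext and ids are made up for the illustration, and names stands in for C(:,2).

```matlab
% Stand-in for C(:,2), using two names from the example output:
names = {'147028902.jpg'; '146157879.jpg'};
% Split each name into its stem and its extension:
[~, stem, ext] = cellfun(@fileparts, names, 'UniformOutput', false);
% Convert the digit-only stems to a numeric column vector:
ids = str2double(stem);
```

The original strings remain untouched in names, so nothing is lost by also keeping a numeric copy.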
I would like to keep them in their original format (i.e. 123456789.jpg). Is that possible?
They will be saved into a MATLAB file to be used for other operations, and the filenames need to be associated with the filenames in another MATLAB file, so they have to be identical. Is that possible?
Does it have to do with the regular expression? I have no knowledge of regular expressions.
It is not clear to me what the problem is.
The filenames have already been extracted by my code. They can be found in C(:,2). You do not need to make any changes, you just need to look at the variables in your workspace (exactly as I wrote before). The regular expression that I used works correctly for the file that you uploaded, and should work for other similar files.
Why do you think that you need to change the regular expression?
Earlier I wrote that the filenames "you can access using C(:,2)": have you tried this? They are all there, as strings in a cell array, which looks like this:
>> C(:,2)
ans =
'147028902.jpg'
'146157879.jpg'
'142012471.jpg'
'141074177.jpg'
'147404147.jpg'
'142276174.jpg'
'143934755.jpg'
... etc
Can you please explain clearly what is incorrect about these filenames. Do you mean that:
  1. the code incorrectly misses some filenames
  2. the code only partially identifies some filenames
  3. you cannot find the filenames
  4. you are not sure how to use the filenames
  5. something else that I missed?
Important: you should use the new code that I put in my answer, because my first attempt misses the last group in the file. The new code is more robust too. Here are its outputs:
>> nLikes
nLikes =
25
56
9197
11559
4
8557
2417
13
22
... etc
>> D
D =
'URL' [1x89 char] 'label' [1x38 char] 'price' '265 SGD' 'Number of likes' '25'
'URL' [1x101 char] 'label' [1x59 char] 'price' '425 SGD' 'Number of likes' '56'
'URL' [1x80 char] 'label' [1x52 char] 'price' '24 SGD' 'Number of likes' '9197'
'URL' [1x82 char] 'label' [1x52 char] 'price' '28 SGD' 'Number of likes' '11559'
'URL' [1x114 char] 'label' [1x48 char] 'price' '170 SGD' 'Number of likes' '4'
'URL' [1x118 char] 'label' [1x32 char] 'price' '82 SGD' 'Number of likes' '8557'
'URL' [1x78 char] 'label' [1x31 char] 'price' '430 SGD' 'Number of likes' '2417'
... etc
and of course the filenames:
>> C(:,2)
ans =
'147028902.jpg'
'146157879.jpg'
'142012471.jpg'
'141074177.jpg'
'147404147.jpg'
'142276174.jpg'
... etc
Hi,
So sorry, I was referring to the N variable in my workspace. It does not show the filenames, but instead just the numbers 1 to 540.
If I want to load this MATLAB file inside another MATLAB file to associate the filenames, how should I use it? Attached is the other MATLAB file, which I use to extract features from images. What I need to achieve is to associate the filenames of the extracted features with the filenames from this text file.
You are quite right that N does not contain the filenames. That is because N contains the block numbers of the file data. I gave you this data because it could be useful to check the file parsing, or for your data processing.
I never stated that N contains the filenames. However I referred to the filenames multiple times, and each time I told you that they are stored in C(:,2). I even printed C(:,2) in my command window and showed you this too.
Please look at C(:,2), not N.
Yes, so sorry. I wrongly assumed that N would contain the filenames.
May I ask another question? If I want to load this MATLAB file inside another MATLAB file to associate the filenames, how should I use it? Attached above (cnn_classify.m) is the other MATLAB file, which I use to extract features from images. What I need to achieve is to associate the filenames of the extracted features with the filenames from this text file.
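One possible approach to this kind of association is ismember, which looks up each filename from one list in the other. The following is only a sketch: listNames and listLikes stand in for C(:,2) and nLikes, and featNames is a hypothetical list of filenames coming from the feature extraction, whose real form in cnn_classify.m is not known here.

```matlab
% Stand-ins for C(:,2) and nLikes (first three values of the example output):
listNames = {'147028902.jpg'; '146157879.jpg'; '142012471.jpg'};
listLikes = [25; 56; 9197];
% Hypothetical filenames belonging to the extracted features:
featNames = {'142012471.jpg'; '147028902.jpg'};
% For each feature filename, find its row in the text-file list:
[found, idx] = ismember(featNames, listNames);
% Likes per feature file, NaN where a filename is not in the list:
featLikes = nan(size(featNames));
featLikes(found) = listLikes(idx(found));
```

The two lists could be passed between scripts with save and load, for example by saving the filenames and likes to a MAT file in one script and loading that file in the other.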
