MATLAB does not calculate all my data

Hi, I am having a problem with my MATLAB code. I have 19 sets of subjects, and each set has 25 data items, each an N x 3 matrix (3D data). The problem arises when I calculate the distances between the data: it only calculates up to the 19th item instead of all 25 in each set. That is, in every set only items 1 to 19 are processed, and items 20 to 25 are not calculated. The calculation is the same for all the data. How can I fix my code, and what should I do? I would really appreciate any help. Thank you so much. This is what my code looks like:
for A = 1: size(set,2)
    for B = 1: size(fil,2)
        i = double(set{:,A}{:,B});
        out{A} = squareform(pdist(i));
        temp{A}= out{A};
    end
    d{A} = temp;
    list{A} = setname{A};
end

18 Comments

dpb
dpb on 29 Apr 2018
Edited: dpb on 30 Apr 2018
for A = 1: size(set,2)
for B = 1: size(fil,2)
i = double(set{:,A}{:,B});
out{A} = squareform(pdist(i));
...
Your loop on A only goes to 19 so you're only going to get 19 elements from
i = double(set{:,A}{:,B});
Okay. Is it possible to calculate up to the 25th item? A has 19 elements, and each of those elements holds 25 items indexed by B. If it is possible, what should I do to fix it? Thank you so much.
Certainly it's possible. It wasn't clear just what set consists of, but as you've written it, you need a double loop inside the outer one for each set, so that it ends up taking all pairs of columns.
I think it's probable that some of the loops can be eliminated using the builtin vector-awareness of pdist, but I need to see an actual data set the way it is stored, and what it is that you're trying to calculate, to write specific code.
A subset of the data would be all that's needed; show a small section of the other data for the specific storage pattern for just a couple of "sets" to get the layout.
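A minimal sketch of the double loop described above, run on synthetic data because the real files are not shown. One likely reading of the original loop is that the results are stored by A only (out{A}, temp{A}), so each pass over B overwrites the same cell and the results never grow past 19 entries. The names subjects and dists and the sizes are assumptions for illustration, not the poster's actual variables; pdist and squareform need the Statistics and Machine Learning Toolbox.

```matlab
% Synthetic stand-in for the real data: 19 subjects, 25 items each, 83x3 points.
nSubj = 19; nItems = 25; nPts = 83;          % assumed sizes
subjects = cell(1, nSubj);                   % hypothetical name (avoids shadowing set)
for A = 1:nSubj
    subjects{A} = cell(1, nItems);
    for B = 1:nItems
        subjects{A}{B} = rand(nPts, 3);      % random points in place of file contents
    end
end

% Store each result under BOTH indices so nothing is overwritten.
dists = cell(nSubj, nItems);
for A = 1:nSubj
    for B = 1:nItems
        xyz = double(subjects{A}{B});
        dists{A,B} = squareform(pdist(xyz)); % nPts-by-nPts Euclidean distance matrix
    end
end
```

With this indexing, dists{A,B} holds the distance matrix for item B of subject A, so all 19*25 combinations are kept.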
Actually, I am trying to calculate the Euclidean distance for my 3D data. This is how I store my 19 sets: [image]
This is how the 25 elements are stored in each of the 19 sets: [image]
This is what the data for each set looks like: [image]
Images are nearly useless; my eyes aren't good enough to read them and can't operate on the data...
Oh, I see. Does that mean I need to give you my data and code files so you can get a better view of how my code works?
dpb
dpb on 30 Apr 2018
Edited: dpb on 30 Apr 2018
Attaching a .mat file with a subset is easiest, yes...there's enough code already posted; having a dataset to operate on that starts in the same form is most helpful. I can make assumptions, but they may not be fulfilled.
Nurul Atifah
Nurul Atifah on 1 May 2018
Edited: Nurul Atifah on 1 May 2018
Here I have attached the workspace I saved in MATLAB, with the data in it, as a .mat file as you requested. Once again, I really appreciate your help. Thank you so much.
OK, got it...wowsers!!! That's deeply nested, indeed!!! Why so many layers down to get to the data? That has a lot to do with how difficult it is to operate on the data.
At the very bottom is a dataset, which is a remnant of an earlier incarnation of what is now the builtin table class; using table instead is recommended. Was this code inherited from some time back, perhaps?
Secondly, maybe we don't have enough code posted after all; I'd prefer to see if couldn't solve the problem at a higher level of revisiting how the data are created instead of leaving as is and having to deal with that...are you allowed to change whatever you want or are there restrictions to what you're allowed to do here?
Let's see the code that loads the data and see about making it more efficient from that point instead...this can be dealt with, but it would essentially be taking what the current storage is and rearranging into something different so why not arrange it that way from the beginning?
OBTW, using set as a variable name is a bad idea; that's the builtin function for munging on objects of all sorts, primarily graphics and to alias it is likely to cause grief elsewhere...
Yes, it is quite nested because the data were in a subfolder of a folder. Here I have attached the code showing how I load the data.
dpb
dpb on 2 May 2018
Edited: dpb on 2 May 2018
No, the data are deeply nested because you kept adding layers of {} on top of existing cell arrays...then topped it off by using dataset. :) Give me a little while to recast that some...I'll start by consolidating the return from textread into array of double instead of a cell array and then consider from there given the rest of the structure in the input files....
Alright. Okay. I just want to say thank you so much for helping me. I have been stuck on this for quite some time. :)
dpb
dpb on 2 May 2018
Edited: dpb on 2 May 2018
Almost there with first cut...do you know the size of each array ahead of time; it appears they're all the same length in the given dataset, can that always be assumed to be true? And, if so, is the length known a priori or must it be determined from the data itself?
If the data are same-sized, I'd suggest using a 3D array of
nObs X 25*3 X nFiles
as the most efficient holding pattern as well as simplest to traverse. That way, there are no cell arrays to dereference and each file is simply one plane of the 3D array and consequently computations across or within files can be far more easily vectorized using the power of Matlab syntax...but if the data arrays aren't all of the same length, then do need a cell array to hold disparate sizes (of course, then you're going to run into issues in comparing across files when/where that happens).
For confirmation, please explain precisely which sets of position measurements are to be cross-compared.
Nurul Atifah
Nurul Atifah on 2 May 2018
Edited: Nurul Atifah on 2 May 2018
First of all, I have 19 subjects, each with 25 expression data files. The amount of data is the same for every subject (25 files), and each file has 83 points in 3D, so each of the 25 files is an 83 x 3 array. I want to calculate the Euclidean distance between those datasets to identify the subject, so that later, when I train on my data, there are predictors and a response to identify the subject. For example, I want to calculate the Euclidean distance of the data to show that it is the anger expression for the subject. Do you want me to give you a sample folder of data to show how I sorted it? And is my explanation clear enough?
pdist computes the pairwise distances between the rows of a given array, so using it on each of the 25 position-data arrays for a given subject will return 25 distance vectors of 83*82/2 = 3403 elements each (ignoring squareform, which effectively doubles the size by creating the symmetric matrix from the upper-triangular elements vector).
Consequently, for 19 (or however many there happened to be) subjects you would end up with 25 such vectors each, correct?
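The size arithmetic above can be checked with a short sketch on random points. The 83 x 3 size comes from the thread; the variable names and the random data are only for illustration.

```matlab
nPts = 83;                       % points per file, as stated in the thread
xyz  = rand(nPts, 3);            % random stand-in for one 83x3 position array
v = pdist(xyz);                  % row vector of pairwise Euclidean distances
D = squareform(v);               % symmetric nPts-by-nPts matrix, zero diagonal
nPairs = nPts*(nPts-1)/2;        % number of distinct point pairs
fprintf('pairs: %d, vector length: %d, matrix: %dx%d\n', ...
        nPairs, numel(v), size(D,1), size(D,2));
```

The vector form holds each pair once, while the squareform matrix stores every distance twice plus the zero diagonal.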
Nurul Atifah
Nurul Atifah on 2 May 2018
Edited: dpb on 2 May 2018
Yes, that's correct. I will end up with 25 for each subject. One more thing: in your opinion, which is better for calculating the distances for this large dataset, pairwise or Euclidean distance? The code I might use for Euclidean is A = bsxfun(@minus,m,[1,2,3]); C = sqrt(sum(A.^2,2));
OK, just wanted to be sure understood specifically which distance measure you actually wanted; within variable or between (or both, maybe).
The latter question is a detail can test when have the rest working; in general it's faster to use straight-ahead vectorized functions over bsxfun but that can be a tried alternative if the first is shown to be performance lacking (I doubt that will be the case albeit when dealing with sizable data sizes, computation time is inevitably going to be noticeable).
I don't have time this instant to finish up; should be able to find a few spare moments later tonight...but I do now think I know the problem definition and believe the implementation is now straightforward and quite a lot simplified from your first try--not that that is intended as criticism; it's easy to get lost in the weeds in cell arrays when textread is insistent upon returning everything as a cell array even when it isn't needed (or would be better if it weren't). Not much in the documentation that helps the beginner understand this (as in nothing :( ).
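For comparison, a small sketch of what the two approaches mentioned above compute, on random data. The bsxfun expression gives the distance from every point to one fixed reference point, whereas pdist gives the distances between all pairs of points; the two agree when the reference is itself one of the points. The names m, ref, dRef and dFirst are assumptions for illustration.

```matlab
m   = rand(83, 3);                                  % random stand-in for one 83x3 array
ref = [1 2 3];                                      % fixed reference point from the comment

% bsxfun form: Euclidean distance from each row of m to ref (83x1 result)
dRef = sqrt(sum(bsxfun(@minus, m, ref).^2, 2));

% pdist form: Euclidean distance between every pair of rows (83x83 result)
D = squareform(pdist(m));

% Using the first point as the reference reproduces the first column of D
dFirst = sqrt(sum(bsxfun(@minus, m, m(1,:)).^2, 2));
maxDiff = max(abs(dFirst - D(:,1)));                % should be at rounding level
```

Timing either variant with tic/toc or timeit on the real data would settle the performance question for a particular machine.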
Nurul Atifah
Nurul Atifah on 3 May 2018
Edited: Nurul Atifah on 3 May 2018
Okay. By straight-ahead vectorized function, do you mean the one I used earlier, out = squareform(pdist(m))? I want to calculate the distances from each point to the other points within the variable. I would like to use the Euclidean distance if possible. One more thing: will it affect the code later if I want to try calculating with another algorithm? It's okay, I can take criticism; I'm quite new to this and that's why I need help from you guys. :)


Answers (1)

dpb
dpb on 3 May 2018
Edited: dpb on 3 May 2018
Was too tired last night; building fence is tough work for old men... :) Got 3 mi in; only 3 mi more to go... :)
Try the following to see if will read your data successfully...you'll end up with a 4D array of a series of 3D arrays which are stacked (planes) of 25 81x3 2D arrays if I didn't screw things up.
% calling folder and data directory
N=81; M=3;                                         % size of data array of position data
projdir = 'FaceData';
d=dir(projdir);                                    % use a name for the dir() struct
d=d(~ismember({d.name},{'.','..'}));               % remove the current, parent references
s=d([d.isdir]);                                    % the subdirectories to traverse (shorter name)
S=length(s);                                       % number subdirs (19)
f=dir(fullfile(s(1).folder,s(1).name,'*.DATA'));   % directory for first to find F
F=length(f);                                       % number files each subject (25)
data=zeros(N,M,F,S);                               % files are the 3rd dimension, subjects the 4th
for j=1:S
  f=dir(fullfile(s(j).folder,s(j).name,'*.DATA')); % directory for each
  for k=1:F                                        % over found files in the folder
    fname=fullfile(f(k).folder,f(k).name);         % fully-qualified filename
    data(:,:,k,j)=dlmread(fname,' ');              % read space-delimited data; save in 4D array
  end
end
dataname={s.name}.';                               % save the subdirectory names
NB: I really don't like the hardcoded sizes as a general rule but it appeared that your cases are pre-ordained to be of a known size so it's probably not too bad. If there is need to have variable numbers/sizes, we can deal with that going forward one way or another; if absolutely had to could go back to cell arrays but just not so deeply entwined; just one level instead of three! :)
To then do the distances, we'll just walk through the array backwards from the last dimension across the 19 subject sets to the 25 observations from which can call pdist for each array in turn.
But, let's fix any typos/etc., I've made here first...
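A minimal sketch of the distance step described above, assuming the 4D array is laid out as N x M x F x S (files in the third dimension, subjects in the fourth) and using random data in place of the files. The name dist and the choice to keep the compact pdist vectors rather than squareform matrices are assumptions, not part of the posted code.

```matlab
N = 81; M = 3; F = 25; S = 19;           % sizes as in the reading code
data = rand(N, M, F, S);                 % random stand-in for the data read from disk

nPairs = N*(N-1)/2;                      % length of each pdist vector
dist = zeros(nPairs, F, S);              % one column per file, one plane per subject
for j = S:-1:1                           % walk subjects from the last dimension
    for k = 1:F                          % then the observations for that subject
        dist(:,k,j) = pdist(data(:,:,k,j)).';   % pairwise Euclidean distances as a column
    end
end
```

squareform(dist(:,k,j)) recovers the full symmetric matrix for file k of subject j whenever it is needed.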

4 Comments

Nurul Atifah
Nurul Atifah on 3 May 2018
Edited: Nurul Atifah on 3 May 2018
Oh, I see. It's okay, take your time. Good luck building the fence; it must be tough work. :) I'm sorry, but may I know where "the following" that you asked me to try actually is? Is it a link or something?
Oh dang!!! You would want it, wouldn't you? :) My bad, I forgot to add the code...it's short enough I'll just paste it into the Answer above.
Thanks; spring has come and we finally got some rain so pastures are beginning to green up and now things are in a rush...and I don't move as quickly as once't upon a time... :(
Nurul Atifah
Nurul Atifah on 3 May 2018
Edited: Nurul Atifah on 3 May 2018
Haha. It's okay if you don't move as quickly as before; you have still done a great job. Your help means so much to me. It's really good that you can help me solve this, as I'm new and facing this kind of problem for the first time. Thank you so much for your time and hard work. I really appreciate it. It is nice to learn from you. Wishing that spring fills each of your days with lots of happiness. Thanks once again and happy spring. :)
OK, came in for a break; try the above and see if it does read your data and return the data and dataname arrays. It would be surprising if I didn't make a typo or other gaffe without having any data to test, but should be close...if it does happen to run, then make a .mat file of that array and attach it and I can work on the next step with real data instead of made up. If it doesn't work and you can't see an obvious error and correct it, attach the full error text and I'll try to look back in later on tonight.


Asked:

on 29 Apr 2018

Commented:

dpb
on 3 May 2018
