Massive minimisation to fill in matrix
Show older comments
I have the following (excel) dataset:
m | r
----|------
2.0 | 3.3
0.8 |
| 4.0
1.3 |
2.1 | 5.2
| 2.3
| 1.9
2.5 |
1.2 | 3.0
2.0 | 2.6
I want to fill in missing values using the following condition:
Denote as R the pairwise correlation between the above two columns (around 0.68). Denote as R* the correlation after the empty cells have been filled in. Fill in the table so that (R - R*)^2 = 0. This is, I want to keep the correlation structure of the data intact.
This is my code:
clear all;
m = xlsread('data.xlsx','A2:A11') ;
r = xlsread('data.xlsx','B2:B11') ;
rho = corr(m,r,'rows','pairwise');
x0 = [1,1,1,1,1,1];
lb = [0,0,0,0,0,0];
f = @(x)my_correl(x,rho);
SOL = fmincon(f,x0,[],[],[],[],lb)
where the function my_correl is:
function X = my_correl(x,rho)
sum_m = (11.9 + x(1) + x(2) + x(3));
sum_r = (22.3 + x(1) + x(2) + x(3));
avg_m = (11.9 + x(1) + x(2) + x(3))/8;
avg_r = (22.3 + x(4) + x(5) + x(6))/8;
rho_num = 8*(26.32 + 4*x(1) + 2.3*x(2) + 1.9*x(3) +...
0.8*x(4) + 1.3*x(5) + 2.5*x(6)) - sum_m*sum_r;
rho_den = sqrt(8*(22.43 + (4*x(1))^2 + (2.3*x(2))^2 + ...
(1.9*x(3))^2) - sum_m^2)*sqrt(8*(78.6 + (0.8*x(4))^2 +...
(1.3*x(5))^ + (2.5*x(6))^2) - sum_r^2);
X = (rho - rho_num/rho_den)^2;
end
This function computes the correlation manually, where every missing data is a variable x(i).
The problem is that my actual dataset has more than 20,000 observations. There is no way I can create that rho formula manually.
How can I fill in my dataset?
Answers (1)
Use vectorized mean and summation commands, e.g.
f = @(x)my_correl(x,rho,m,mIndices, r,rIndices);
function X = my_correl(x,rho,m,mIndices, r,rIndices)
sum_m = sum(m,'omitnan') + sum(x(mIndices));
sum_r = sum(r,'omitnan') + sum(x(rIndices));
...
Categories
Find more on Audio I/O and Waveform Generation in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!