Most efficent way of finding submatrices of a matrix

107 views (last 30 days)
Say we have a matrix of zeros and ones
0 1 1 1 0 0 0
1 1 1 1 0 1 1
0 0 1 0 0 1 0
0 1 1 0 1 1 1
0 0 0 0 0 0 1
0 0 0 0 0 0 1
and we want to find all the submatrices (we just need the row indices and column indices of the corners) with these properties:
  1. contain at least L ones and L zeros
  2. contain max H elements
i.e. take the previous matrix with L=1 and H=5, the submatrix 1 2 1 4 (row indices 1 2 and column indices 1 4)
0 1 1 1
1 1 1 1
satisfies the property 1 but has 8 elements (bigger than 5) so it is not good;
the matrix 4 5 1 2
0 1
0 0
is good because satisfies both the properties.
The objective is then to find all the submatrices with min area 2*L, max area H and containg at least L ones and L zeros.
If we consider a matrix as a rectangle it is easy to find all the possibile subrectangles with max area H and min area 2*L by looking at the divisors of all the numbers from H to 2*L.
For example, with H=5 and L=1 all the possibile subrectangles/submatrices are given by the divisors of
  • H=5 -> divisors [1 5] -> possibile rectangles of area 5 are 1x5 and 5x1
  • 4 -> divisors [1 2 4] -> possibile rectangles of area 4 are 1x4 4x1 and 2x2
  • 3 -> divisors [1 3] -> possibile rectangles of area 3 are 3x1 and 1x3
  • 2*L=2 -> divisors [1 2] -> possibile rectangles of area 2 are 2x1 and 1x2
I wrote this code, which, for each number finds its divisors and cycles over them to find the submatrices. To find the submatrices it does this: take for example a 1x5 submatrix, what the code does is to fix the first line of the matrix and move step by step (along all the columns of the matrix) the submatrix from the left edge of the matrix to the right edge of the matrix, then the code fixes the second row of the matrix and moves the submatrix along all the columns from left to right, and so on until it arrives at the last row.
It does this for all the 1x5 submatrices, then it considers the 5x1 submatrices, then the 1x4, then the 4x1, then the 2x2, etc.
The code do the job in 2 seconds (it finds all the submatrices) but for big matrices, i.e. 200x200, a lot of minutes are needed to find all the submatrices. So I wonder if there are more efficient ways to do the job, and eventually which is the most efficient.
This is my code:
clc;clear all;close all
%%INPUT
P= [0 1 1 1 0 0 0 ;
1 1 1 1 0 1 1 ;
0 0 1 0 0 1 0 ;
0 1 1 0 1 1 1 ;
0 0 0 0 0 0 1 ;
0 0 0 0 0 0 1];
L=1; % a submatrix has to containg at least L ones and L zeros
H=5; % max area of a submatrix
[R,C]=size(P); % rows and columns of P
sub=zeros(1,6); % initializing the matrix containing the indexes of each submatrix (columns 1-4), their area (5) and the counter (6)
counter=1; % no. of submatrices found
%%FIND ALL RECTANGLES OF AREA >= 2*L & <= H
%
% idea: all rectangles of a certain area can be found using the area's divisors
% e.g. divisors(6)=[1 2 3 6] -> rectangles: 1x6 6x1 2x3 and 3x2
tic
for sH = H:-1:2*L % find rectangles of area H, H-1, ..., 2*L
div_sH=divisors(sH); % find all divisors of sH
disp(['_______AREA ', num2str(sH), '_______'])
for i = 1:round(length(div_sH)/2) % cycle over all couples of divisors
div_small=div_sH(i);
div_big=div_sH(end-i+1);
if div_small <= R && div_big <= C % rectangle with long side <= C and short side <= R
for j = 1:R-div_small+1 % cycle over all possible rows
for k = 1:C-div_big+1 % cycle over all possible columns
no_of_ones=length(find(P(j:j-1+div_small,k:k-1+div_big))); % no. of ones in the current submatrix
if no_of_ones >= L && no_of_ones <= sH-L % if the submatrix contains at least L ones AND L zeros
% row indexes columns indexes area position
sub(counter,:)=[j,j-1+div_small , k,k-1+div_big , div_small*div_big , counter]; % save the submatrix
counter=counter+1;
end
end
end
disp([' [', num2str(div_small), 'x', num2str(div_big), '] submatrices: ', num2str(size(sub,1))])
end
if div_small~=div_big % if the submatrix is a square, skip this part (otherwise there will be duplicates in sub)
if div_small <= C && div_big <= R % rectangle with long side <= R and short side <= C
for j = 1:C-div_small+1 % cycle over all possible columns
for k = 1:R-div_big+1 % cycle over all possible rows
no_of_ones=length(find(P(k:k-1+div_big,j:j-1+div_small)));
if no_of_ones >= L && no_of_ones <= sH-L
sub(counter,:)=[k,k-1+div_big,j,j-1+div_small , div_big*div_small, counter];
counter=counter+1;
end
end
end
disp([' [', num2str(div_big), 'x', num2str(div_small), '] submatrices: ', num2str(size(sub,1))])
end
end
end
end
fprintf('\ntime: %2.2fs\n\n',toc)
  2 Comments
cui,xingxing
cui,xingxing on 11 Jul 2020
Edited: cui,xingxing on 11 Jul 2020
good question! Similar to the question I asked, but how to get an analytical solution in a mathematical sense? I think matlab has advantages in symbolic math optimization to solve this kind of problem。
John D'Errico
John D'Errico on 11 Jul 2020
@cui - your question, if about solving a nonlinear system of equations, has nothing to do with the question here. You cannot use symbolic tools to efficiently help solve this problem.

Sign in to comment.

Accepted Answer

John D'Errico
John D'Errico on 16 Apr 2018
Edited: John D'Errico on 16 Apr 2018

Might I suggest you are doing this the wrong way? And fairly massively so?

A = [ 0     1     1     1     0     0     0
   1     1     1     1     0     1     1
   0     0     1     0     0     1     0
   0     1     1     0     1     1     1
   0     0     0     0     0     0     1
   0     0     0     0     0     0     1];

If you want to find all 2x2 submatrices of A that have at least 1 zero and at least one 1, then consider the following matrix, computed from A:

cA = conv2(A,ones(2,2),'valid')
cA =
   3     4     4     2     1     2
   2     3     3     1     2     3
   1     3     2     1     3     3
   1     2     1     1     2     3
   0     0     0     0     0     2

So the upper left corner of all such 2x2 matrices resides at the following locations:

[i,j] = find((cA >= 0) & (cA < 4));
[i,j]
ans =
   1     1
   2     1
   3     1
   4     1
   5     1
   2     2
   3     2
   4     2
   5     2
   2     3
   3     3
   4     3
   5     3
   1     4
   2     4
   3     4
   4     4
   5     4
   1     5
   2     5
   3     5
   4     5
   5     5
   1     6
   2     6
   3     6
   4     6
   5     6

The other corner of those matrices is pretty easy to find now, since you know they are 2x2 sub-matrices.

The point is, even for a relatively large matrix A (I would not even consider 200x200 even remotely large here) the above computation would be trivial, and almost immediate. For example:

A = double(rand(200,200) < 0.5);
timeit(@() conv2(A,ones(2,2),'valid'))
ans =
   0.00021959

The find will be also trivially fast. But the point is, a large amount of the code you painstakingly wrote above could be replaced by two efficient lines.

  1 Comment
giannit
giannit on 17 Apr 2018
Edited: giannit on 17 Apr 2018
I've never heard about matrix convolution, thank you very much for letting me know about this. Your way is incredibly fast and smart.
Using conv2 on a 1000x1000 matrix (with L=6, H=14) it found ~4.5 million submatrices in less than 3 seconds, really impressive! My old code would have took hours.
In the final code I had to add the if-else because I noticed that when the convolution generates a 1 row matrix, the i and j vectors generated by the find function are generated as row vectors instead of columns. Do you think the code is good wrote in this way?
Again thank you so much for the help!
A = round(rand(1000));
L = 6;
H = 14;
sub = [];
for sH = H:-1:2*L
div_sH = divisors(sH);
for k = 1:length(div_sH)
a = div_sH(k);
b = div_sH(end-k+1);
cA = conv2(A,ones(a,b),'valid');
[i,j] = find((cA >= L) & (cA <= sH-L));
if ~isempty([i,j])
if size([i,j],1) ~= 1
% rows columns areas
sub = [sub;[i,i-1+a , j,j-1+b , a*b*ones(numel(i),1)]];
else
sub = [sub;[i',i'-1+a , j',j'-1+b , a*b*ones(numel(i),1)]];
end
end
end
end

Sign in to comment.

More Answers (0)

Tags

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!