Find rows with common values using intersect
10 views (last 30 days)
Show older comments
I have a dataset f with numbers from 1 to 110 and I want to find out which rows have two numbers in common. When I compare the found rows to each other to see which ones have which numbers in common. However, intersect of course also gives me output C when there is only 1 value in common. Is there a different way to do this so I only get the output when the rows have two numbers in common as for example: C = [1,2]?
1 Comment
Matt J
on 28 Mar 2022
Are you only interested in pairs of rows with exactly 2 elements in common? What if two rows have 3 or 4 elements in common?
Accepted Answer
Matt J
on 28 Mar 2022
Edited: Matt J
on 28 Mar 2022
Perahps as follows.
A=readmatrix('example.txt');
[m,n]=size(A);
discard=1:m+1:m^2;
p=perms(1:n); np=size(p,1);
IJ=cell(np,1);
for i=1:np
B=A(:,p(i,:));
count=sum(reshape(A,m,1,n)==reshape(B,1,m,n),3);
count(discard)=0;
[I,J]=find(count==2);
IJ{i}=[I,J];
end
IJ=cell2mat(IJ);
IJ=sort(IJ,2);
IJ=unique(IJ,'rows'); %row pairs
C=cell2mat(arrayfun( @(i,j) intersect(A(i,:),A(j,:)) ,IJ(:,1), IJ(:,2),'uni',0 ));
T=table(IJ,C,'Var',{'Row Indices','Common Elements'})
More Answers (1)
John D'Errico
on 28 Mar 2022
Edited: John D'Errico
on 28 Mar 2022
Intersect is the wrong way to approach this, I think. And at the same time, the solution is probably going to seem easy, once you understand what I write here.
Instead, I would build a matrix, sort of a graph that links rows of your original array, with the numbers found in them. I'll need to zap away the NaNs of course.
I = repmat((1:120)',[1 4]);I = I(:);
J = data(:);
I(isnan(J)) = [];
J(isnan(J)) = [];
Now, build an array. Mine will be sparse, since I don't know how large your real data is, and the sparse function is perfect to buld the array I want. (I could use accumarray too, but I want this to be sparse.)
G = sparse(I,J,1,120,110);
spy(G)
So the rows of the matrix plotted correspond to rows of your original array. The columns of the matrix are the numbers in each row.
Now, how will we find which rows have 2 or MORE rows in common? EASY PEASY! A matrix multiply will do the trick. Essentially, it counts the number of intersections between any pair of rows, then represents that as a matrix. (Think about how this works. It does EXACTLY what you need. Remember that every element of G is either 1 or zero. So the matrix multiply is perfect here.)
GG = G'*G;
So if the (i,j) element of GG is zero, then there were NO common elements between row i and row j. There will be as many as 4 elements in common between rows. But I would guess, without looking carefully at your data, than only the diagonal elements of GG will be as large as 4, or even 3. And a diagonal element of GG is useless, since that tells me how many elements row i has with row i.
GG = GG - diag(diag(GG)); % Kill off all diagonal elements of GG
GG = GG > 1; % we only care about pairs of rows with at least 2 elements in common
spy(GG)
GG is a 120x120 matrix. It is of course symmetric. Every non-zero element of GG indicates a pair of rows that have 2 or more elements in common.
And since GG is symmetric, this tells us that row i and row j are related in the saye way that row j and row i were related. Triu will trash half of those pairs.
[i1,i2] = find(triu(GG));
spy(triu(GG))
And now the solution should be easy to visualize.
rowswithcommonelements = [i1,i2]
rowswithcommonelements =
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
9 10
10 11
1 12
2 12
3 13
12 13
4 14
13 14
5 15
14 15
6 16
15 16
7 17
16 17
8 18
17 18
9 19
18 19
10 20
11 20
19 20
1 21
12 21
13 22
21 22
14 23
22 23
15 24
23 24
16 25
24 25
17 26
25 26
18 27
26 27
19 28
27 28
11 29
20 29
28 29
1 30
21 30
22 31
30 31
23 32
31 32
24 33
32 33
25 34
33 34
26 35
34 35
27 36
35 36
28 37
36 37
11 38
29 38
37 38
1 39
30 39
31 40
39 40
32 41
40 41
33 42
41 42
34 43
42 43
35 44
43 44
36 45
44 45
37 46
45 46
11 47
38 47
46 47
1 48
39 48
40 49
48 49
41 50
49 50
42 51
50 51
43 52
51 52
44 53
52 53
45 54
53 54
46 55
54 55
11 56
47 56
55 56
1 57
48 57
49 58
57 58
50 59
58 59
51 60
59 60
52 61
60 61
53 62
61 62
54 63
62 63
55 64
63 64
11 65
56 65
64 65
1 66
57 66
58 67
66 67
59 68
67 68
60 69
68 69
61 70
69 70
62 71
70 71
63 72
71 72
64 73
72 73
11 74
65 74
73 74
1 75
66 75
67 76
75 76
68 77
76 77
69 78
77 78
70 79
78 79
71 80
79 80
72 81
80 81
73 82
81 82
11 83
74 83
82 83
1 84
75 84
76 85
84 85
77 86
85 86
78 87
86 87
79 88
87 88
80 89
88 89
81 90
89 90
82 91
90 91
11 92
83 92
91 92
1 93
84 93
85 94
93 94
86 95
94 95
87 96
95 96
88 97
96 97
89 98
97 98
90 99
98 99
91 100
99 100
11 101
92 101
100 101
1 102
2 102
93 102
3 103
94 103
102 103
4 104
95 104
103 104
5 105
96 105
104 105
6 106
97 106
105 106
7 107
98 107
106 107
8 108
99 108
107 108
9 109
100 109
108 109
10 110
11 110
101 110
109 110
0 Comments
See Also
Categories
Find more on Creating and Concatenating Matrices in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!