Find sets of consistent patterns with a variable pattern index

Question

Daniel on 25 Jul 2012

Direct link to this question

https://in.mathworks.com/matlabcentral/answers/44556-find-sets-of-consistent-patterns-with-a-variable-pattern-index

Suppose I have a matrix in which each row is one run of a clustering algorithm, each column is a data point, and each entry is the cluster label assigned to that data point in that run. The algorithm clusters the data points but does not use consistent labels across runs: data points that belong in the same cluster will typically be grouped together (with some probability), but, assuming there are 3 clusters, whether that cluster is tagged as 1, 2, or 3 varies from run to run.

For example:

X = [1 1 2 2 1 1 1 3 3 3;
     1 1 2 2 1 1 1 3 3 3;
     2 2 1 1 2 2 2 3 3 3;
     3 3 2 2 3 3 3 1 1 1;
     2 2 3 3 2 2 2 1 1 1];

In this matrix, columns [1 2 5 6 7] are always tagged with the same index number, columns [3 4] are always tagged with the same index number (but a different number from the one used for the other clusters), and columns [8 9 10] are always tagged with the same index number (again different from the other two clusters).

Is there a way to identify which columns are consistently (or are probabilistically more likely to be) clustered together, ignoring the actual index that is used?

I've considered using find to index items within a row that are the same for each different cluster number and then using intersect to find the sets of column indexes which are consistent. I haven't, however, come up with an efficient method. Any suggestions would be greatly appreciated.

Thanks,

Dan
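
One way to sketch the pairwise idea from the question is a co-association matrix: for each pair of columns, count the runs in which both carry the same label, whatever that label happens to be. The snippet below is a minimal sketch under stated assumptions, not an answer posted in this thread. The names `counts`, `C`, `always`, and `groupId` are made up for illustration, and the `'stable'` option of `unique` is assumed to be available (R2012a or later).

```matlab
% Example matrix from the question: rows are runs, columns are data points.
X = [1 1 2 2 1 1 1 3 3 3;
     1 1 2 2 1 1 1 3 3 3;
     2 2 1 1 2 2 2 3 3 3;
     3 3 2 2 3 3 3 1 1 1;
     2 2 3 3 2 2 2 1 1 1];

[nRuns, nPts] = size(X);

% counts(i,j) = number of runs in which points i and j share a label.
counts = zeros(nPts);
for r = 1:nRuns
    row = X(r, :);
    counts = counts + bsxfun(@eq, row', row);  % nPts-by-nPts comparison
end

% Empirical probability that two points are clustered together.
C = counts / nRuns;

% Points that share a label in every run. "Same label in all runs" is
% transitive, so identical rows of this matrix mark the same group.
always = double(counts == nRuns);
[~, ~, groupId] = unique(always, 'rows', 'stable');
groupId = groupId(:)';  % one group number per column of X
```

For the example matrix, `groupId` puts columns [1 2 5 6 7], [3 4], and [8 9 10] into three separate groups. With noisy runs, `C` holds the fraction of runs in which each pair agrees; thresholding it (say `C >= 0.8`, an arbitrary choice) no longer guarantees clean, non-overlapping groups, so a further clustering step on `C` would probably be needed.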

5 Comments

Daniel on 26 Jul 2012

@Image Analyst: the clustering algorithm is probabilistic so the same items will be allocated to the same clusters some (even most) of the time (depending on the variability of the data being clustered) but not necessarily all of the time. Consequently, repmat won't help in this case. The matrix I posted was just a simple example of the problem.

@Matt Kindig: My reply above explains why the rows all come out the same. Nevertheless, I think your method will work so long as the first column is consistently given the same index. A problem arises if the data point indexed by the first column is probabilistically assigned: then you wind up changing the indexes every time column 1 changes clusters. For example, in the following, the first column is always grouped with columns [2 5 6 7] except in the last two rows:

>> X

X =

   1     2     2     1     1     1     3     3     3
   1     2     2     1     1     1     3     3     3
   2     1     1     2     2     2     3     3     3
   3     2     2     3     3     3     1     1     1
   2     3     3     2     2     2     1     1     1
   3     2     2     3     3     3     1     1     1
   2     3     3     2     2     2     1     1     1

>> Xmodified

Xmodified =

   1     2     2     1     1     1     3     3     3
   1     2     2     1     1     1     3     3     3
   1     2     2     1     1     1     3     3     3
   1     2     2     1     1     1     3     3     3
   1     2     2     1     1     1     3     3     3
   2     3     3     2     2     2     1     1     1
   2     1     1     2     2     2     3     3     3

This is a good starting point though. Thanks,

Dan

Matt Kindig on 26 Jul 2012

Well, using my method would force column 1 to always be assigned to cluster 1, by definition. You can then count the number of rows that contain a 1 in columns 2-end to determine the probability of matches with column 1. Similarly, you could count the number of 2's that occur in columns 3-end to get the probability of matches with column 3, and so on.
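
The relabelling-and-counting idea in this comment could be sketched roughly as follows. Matt's original code is in the hidden comments and is not shown here, so this is only one possible reading of it: each row is renumbered in order of first appearance, which forces column 1 into cluster 1, and the 1's in the remaining columns are then counted. The names `Xc` and `pWithCol1` are illustrative, and the `'stable'` option of `unique` is assumed to be available (R2012a or later).

```matlab
% Example matrix from the question: rows are runs, columns are data points.
X = [1 1 2 2 1 1 1 3 3 3;
     1 1 2 2 1 1 1 3 3 3;
     2 2 1 1 2 2 2 3 3 3;
     3 3 2 2 3 3 3 1 1 1;
     2 2 3 3 2 2 2 1 1 1];

% Renumber the labels of each run in order of first appearance,
% so the label of column 1 always becomes 1.
Xc = zeros(size(X));
for r = 1:size(X, 1)
    [~, ~, idx] = unique(X(r, :), 'stable');
    Xc(r, :) = idx(:)';  % idx may be a row or a column depending on release
end

% Fraction of runs in which each of columns 2-end shares a cluster with column 1.
pWithCol1 = mean(Xc(:, 2:end) == 1, 1);
```

On the example matrix every row of `Xc` becomes `[1 1 2 2 1 1 1 3 3 3]`. If the data point in column 1 is itself assigned inconsistently, the renumbering shifts all labels in those runs, which is the limitation Daniel describes above.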

Image Analyst on 26 Jul 2012

Yes, Daniel, your explanation is what I was expecting, though your initial example didn't show that. Your latest example does, however. So for X, if it did pick consistent cluster label numbers, it would have given Xmodified. But it doesn't. So the problem is: for any given row, let's say the last row, how do we know that the 3 in X should really be a 1, the 2 should stay a 2, and the 1 should really be a 3, versus already being the actual numbers they're supposed to be, like the first 2 rows were?

Or take the next-to-last row. It looks like the 1 is right but the 2 and 3 are swapped. Or are they all right, and the 2 and 3 are just misclassifications due to the probabilistic nature of your classification algorithm?
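
One common way to approach the "which labels should be swapped" question is to pick a reference run and, for every other run, choose the relabelling that agrees with the reference on the most data points. The sketch below brute-forces all label permutations, which is only practical for a small number of clusters. The choice of reference, the variable names, and the assumption that labels are exactly 1..k are illustrative and do not come from the thread.

```matlab
% Reference run and a run to align (first and last rows of the X in the question).
ref = [1 1 2 2 1 1 1 3 3 3];
row = [2 2 3 3 2 2 2 1 1 1];
k   = 3;                      % number of clusters, labels assumed to be 1..k

P = perms(1:k);               % every possible relabelling, one per row
bestScore = -1;
for i = 1:size(P, 1)
    candidate = P(i, row);    % relabel: old label j becomes P(i, j)
    score = sum(candidate == ref);
    if score > bestScore      % keep the relabelling with the most agreements
        bestScore = score;
        bestPerm  = P(i, :);
        aligned   = candidate;
    end
end
```

Here the best relabelling maps 2 to 1, 3 to 2, and 1 to 3, and `aligned` matches `ref` on all 10 points. When two relabellings score nearly the same, the data alone cannot distinguish a label swap from a genuine misassignment, which is the ambiguity raised in this comment.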

Answers (0)
