Implementing Breadth first search using connectivity matriX

37 views (last 30 days)
I have a connectivity matrix in which each row corresponds to two connected nodes or one line segment. Some of these points are connected to each other resulting in several number of clusters. I want to find the number of clusters and number of nodes in each cluster. I know this can be done using Breadth-first search(BFS) or Depth first search(DFS) but the available bfs/dfs codes are specifically written for biographs. I can't use biographs and want to use only connectivity matrix to give me the information. connectivity matrix=[1 2; 2 3; 2 4; 3 4; 3 5;4 1; 2 6 ;2 7];

Answers (3)

Walter Roberson
Walter Roberson on 16 May 2015
cmat = [1 2; 2 3; 2 4; 3 4; 3 5;4 1; 2 6 ;2 7];
numnode = max(cmat(:));
adj = zeros(numnode,numnode); %adjacency matrix
%for each pair (i,j) in the connectivity list, set adj(i,j) to 1
fromlist = cmat(:,1);
tolist = cmat(:,2);
fromtoidx = sub2ind([numnode,numnode], fromlist, tolist); %as linear indices
adj(fromtoidx) = 1;
tofromidx = sub2ind([numnode,numnode], tolist, fromlist); %connections are bidirectional
adj(tofromidx) = 1;
You now have a matrix of 0's and 1's, with adj(i,j) = 1 meaning there is a link between i and j.
With this adjacency matrix, you can calculate transitions. consider (adj)^N where ^ is matrix power, adj^N being adj*adj*adj...*adj a total of N times. In that adj^N, if there is a non-zero entry at (i,j) then there is a way to get from i to j in N steps.
An advantage of this representation is that it allows you to perform your searches in parallel.
It has been a good 30 years since I read the algorithm to determine cut-edges from adjacency graphs. Scanning some literature it looks like perhaps it isn't trivial.
I think at this point it is worth confirming your definition of "cluster". Does a cluster occur exactly when there is a single cut-edge that partitions the graph into two pieces? If so, what if there are multiple cut-edges? Do you prefer to define in terms of cut-vertices? If there are two sub-graphs, then what is the maximum number of cut-edges permitted between the two sub-graphs before you determine that they are part of the same cluster?
I worked on data classification for several years. The data we were given did not have explicit connectivity information, just high-dimensional coordinates (e.g., concentration measurements for 500 different chemicals). The algorithms we used to cluster were never able to assign restricted adjacency graphs that could say that there was no connection between any two given vertices; they could only find probabilities that a vertex was part of a cluster with particular attributes, such as finding the hyperplane that best divided the vertices into two clusters.
In this forum we used to semi-regularly get questions about how to determine the "right" number of clusters to ask a routine such as k-means to divide the data into. The answer to that is that the best classification is always achieved by placing every node into its own cluster. For any lesser number of clusters to be considered "better" you needed to define a penalty function that encouraged fewer clusters -- and that penalty function had to be problem-dependent.
If you are lucky enough to have a situation with an actual adjacency list, then before we can tell you how many clusters there are, you need to give us a formal definition of how to tell whether whether two vertices are in different clusters. Because anything beyond there being a single cut-edge or single cut-vertex that partitions the graph is no longer obvious and is probably starting to get problem dependent.
  2 Comments
nitin arora
nitin arora on 19 May 2015
Edited: Walter Roberson on 19 May 2015
Thanks Walter. I have a set of coordinates of object A and B in a given space. I found the distance of all points from each other, i.e A-A, A-B, B-B distance using pdist function. If the distance between two objects i(A or B) and j (A or B) is less than a threshold(manually set), then i connect i and j using a line segment on a figure. So, my definition of cluster is all the points that are connected by line segments. I am running 3 loops(A-A, A-B, B-B) for connecting i and j. In the same loop, I recorded i and j object index values in a connectivity matrix such that each row represents a connection between i and j. I wrote the following BFS(breadth first search) code but it has several limitations such as
  1. Identifying the last node in a cluster: It does not detect 5 as a part of 1st cluster
  2. Repeats some index(It repeats the index 4)
Please advice. I am really stuck and I am on a tight deadline to figure this out.
clc;
clear all;
close all;
conncomp=[1 2; 2 3; 2 4; 3 4; 3 5];
%// Find all possible node IDs
%//Nitin finds unique number in an array without repition
nodeIDs = unique(conncomp(:));
%// Flag that tells us if there are any nodes we should visit
nodeIDList = false(1,numel(nodeIDs));
%// Stores our list of clusters
clusterList = {};
%// Keeps track of how many clusters we have
counter = 1;
%// queue - initialize to empty-keeps track of the connected nodes
queue = [];
nodescluster=[];
while (~all(nodeIDList))
% Find any node and make it currently working node
currentnode = find(nodeIDList == false, 1);
% Initialize our queue to contain this node and add this node to
% cluster
queue=currentnode;
nodescluster=currentnode;
%// While our queue is not empty; iteration stops when queue is empty
%and all the nodes have been visited
while (~isempty(queue))
% Grab the current node from previous iteration and remove from
% queue
queue(1)=[];
if (nodeIDList(currentnode))
continue;
end
%// Mark the current node as visited
nodeIDList(currentnode) = true;
%// Remove those already visited
visitedNodes = nodeIDList(true);
UnvistitedNodes = nodeIDList(false);
%// Retrieve all nodes connected to the current node
connectedNodes = conncomp(conncomp(:,1) == currentnode, 2);
queue = [queue,connectedNodes.'];
nodescluster = [nodescluster,connectedNodes.'];
%move to one of these connected nodes and set it as current node
if(numel(queue)~= 0)
currentnode=queue(1);
end
end
%// Add connected components to its own cluster
clusterList{counter} = nodescluster;
counter = counter + 1;
end
celldisp(clusterList)

Sign in to comment.


Alfonso Nieto-Castanon
Alfonso Nieto-Castanon on 20 May 2015
Edited: Alfonso Nieto-Castanon on 20 May 2015
If I am interpreting correctly, something like the following might be what you are after:
c = [1 2; 2 3; 2 4; 3 4; 3 5;4 1; 2 6 ;2 7]; % edges
n = max(c(:)); % number of nodes
C = sparse(c(:,1),c(:,2),1,n,n); % adjacency matrix
C = C|C'|speye(n); % forces symmetry
[idx,~,p] = dmperm(C); % magic
clusters = mat2cell(idx,1,diff(p)); % clusters
N = numel(clusters); % # of clusters
s = cellfun('length',clusters); % # of nodes per cluster

Shan Sham
Shan Sham on 2 Mar 2018
Now only starts work with matlab. Am a mathematics student. I need a algorithm for the following Question. Let G and H be two directed graph with V(G)=V(H). Let S and T be a source and sink of the directed graph G. Adj(G) and Adj(H) denotes the adjacency matrices of G and H. I want to find out a path from a vertex of S to a vertex of T in such a way that the first arc must be in G and second arc in H and also the last arc must be in G. Suppose $P:x_1,x_2,\dots,x_i $ is a desired path. Then x_1 \in S and x_i \in T. {x_1 x_2, x_3 x_4,...,x_{i-1}x_i} \subseteq A(G) and {x_2 x_3, x_4 x_5,...,x_{i-2}x_{i-1}} \subseteq A(H). For the below graphs, S={1,2} and T={5,3}. Example of a such a desired path is * 2145* here 2 is in s and 5 is in T and 21,45 is in G and 14 is in H.
Help me to find a efficient algorithm using BFS.

Categories

Find more on Startup and Shutdown in Help Center and File Exchange

Products

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!