Disjoint Set

 DISJOINT SETS: 

A disjoint sets data structure represents a collection of sets that are  disjoint: that is, no item is found in more than one set. The collection  of disjoint sets is called a partition, because the items are partitioned  among the sets. A disjoint-set data structure, also called a union–find  data structure or merge–find set, is a data structure that keeps track of  a set of elements partitioned into a number of disjoint (non overlapping)  subsets. 

You have a collection of n items, which you number from 0 to n-1.  These items will be partitioned into some number of sets. The sets  are "disjoint" which means that no item belongs to more than one  set. All items belong to some set though (hence the use of the word  

"partition."). 

Operations on Sets 

Let x be an object. We wish to support the following operations. 

MAKE-SET( x) creates a new set whose only member is pointed by x; Note  that x is not in the other sets. 

UNION( x,y) unites two dynamic sets containing objects x and y,  say Sx and Sy , into a new set that Sx Sy , assuming that Sx ∩ Sy =; FIND-SET( x) returns a pointer to the representative of the set containing x. INSERT( a,S) inserts an object a to S, and returns S{a}. 

DELETE( a,S) deletes an object a from S, and returns S-{a}. 

SPLIT( a,S) partitions the objects of S into two sets S1 and S2 such  that S1 ={b | b≤a & bS}, and S2 =S- S1 . 

MINIMUM( S) returns the minimum object in S. 

Applications of Disjoint-set Data Structures 

Here we show two application examples. 

Connected components (CCs) 

Minimum Spanning Trees (MSTs)

Algorithm for Connected Components 

CONNECTED-COMPONENTS (G) 

1 for each v

2 do MAKE-SET (v) 

3 for every edge (u,v)

4 do if FIND-SET (u) ≠ FIND-SET (v) 

5 then UNION (u,v) 

SAME-COMPONENTS (u,v) 

1 if FIND-SET (u) = FIND-SET (v) 

2 then return TRUE 

3 return FALSE 

Figure 1: A graph with four connected components: {a,b,c,d}, {e,f,g}, {h,i},  and {j}. 

Edge processed Collection of disjoint sets 

initial sets {a} {b} {c} {d} {e} {f} {g} {h} {i} {j} 

(b,d) {a} {b,d} {c} {e} {f} {g} {h} {i} {j} 

(e,g) {a} {b,d} {c} {e,g} {f} {h} {i} {j} 

(a,c) {a,c} {b,d} {e,g} {f} {h} {i} {j} 

(h,i) {a,c} {b,d} {e,g} {f} {h,i} {j} 

(a,b) {a,b,c,d} {e,g} {f} {h,i} {j} 

(e,f) {a,b,c,d} {e,f,g} {h,i} {j} 

(b,c) {a,b,c,d} {e,f,g} {h,i} {j} 

This table shows the state of the collection of disjoint sets as each edge is  processed. The processed edge is listed on the left and the rest of the columns show  the state of the collection. 

The Linked-list Representation 

A set can be represented by a linked list. In this representation, each node has a  pointer to the next node and a pointer to the first node.

Linked-list representation. Each disjoint set can be represented by a linked-list.  Each node has a pointer to the head node and a pointer to the next node 

There are two operations that you can perform: 

int Find(i): This takes the number of an item and returns its "set id." This is  a number 

between zero and n-1. You don't really care what the number is; however,  if i and j belong to the same disjoint set, then Find(i) will equal Find(j). 

int Union(id1, id2): This takes the set id's of two different sets, and performs  the union operation on them, coalescing them into one set. It returns the set id  of this new set. Now, when you call Find() on any item that was previously in  either of these sets, it will return this new set id. 

Example: 

We initialize an instance of disjoint sets with 10 items. Each item is a node  with a number from 0 to 9. Each node has a NULL link, which we depict by  not drawing any arrows from the node: 

Again, each node is in its own set, and each node's set id is its number. Suppose  we call Union(0, 1), Union(2, 3) and Union(4, 5). These will each set one of the  node's link to the other node. We'll talk about how that gets done later. However,  suppose this is the result:

Node 0's link has been set to 1. Both of these nodes' set ids are now 1,  which means Find(0) equals Find(1) equals one. Similarly, Find(2) equalsFind(3)  equals three. This gives you a clue about implementing Find(). When you call  Find(n), what you do is keep setting n to n's link, until n's link is NULL. At that  point, you are at the root of the set, and you return n. 

Union is pretty simple, too, but you have some choices about how to determine  which node sets its link field to the other. We use three methods to do this: 

1. Union by size: Make the node whose set size is smaller point to the node  whose set size is bigger. Break ties arbitrarily. 

2. Union by height: The height of a set is the longest number of links that it  takes to get from one of its items to the root. With union by height, you  make the node whose height is smaller point to the node whose height is  bigger. 

3. Union by rank with path compression: This works exactly like union by  height, except you don't really know the height of the set. Instead, you use  the set's rank, which is what its height would be, were you using union by  height. The reason that you don't know the height is that when you call  Find(i) on an element, you perform "path compression."

There are two sets, with set id's 5 and 9. Now, suppose you call Find(0). It will  return five, but along the way to the root node of its set, it encounters nodes 1 and  3. Before returning five, it sets the links to 0, 1 and 3 to five: 

Running Time: 

We assume the number of items in the instance of disjoint sets is n: Initializing an instance of disjoint sets: O(n) 

Performing Union() in any of the implementations: O(1) 

Performing Find() in Union by size or height: O(log(n)) 

Performing Find() in Union by rank with path compression: O(α(n))

You have to perform two operations: 

1. Union(A, B): Connect two elements A and B 

2. Find(A, B): Find whether the two elements A and B are  connected 

Example You have a set of elements S = {0, 1, 2, 3, 4, 5, 6, 7,  8, 9}. Here there are 10 elements (N = 10). Use an array arr to  manage the connectivity of the elements. Arr[ ] that is indexed  by elements of sets, which are of size N (as N elements in set),  can be used to manage the operations of union and find

Assumption

A and B objects are connected only if arr[ A ] = arr[ B ]. Implementation 

To implement the operations of union and find, do the following: 

1. Find(A, B): Check whether arr[ A ] = arr[ B ] 

2. Union(A, B): Connect A to B and merge the components  that comprise A and B by replacing elements that have a  value of arr[ A ] with the value of arr[ B ]. 

Initially there are 10 subsets and each subset has one element  in it. 

When each subset contains only one element, the array arr is as follows: 

Let’s perform some operations. 

1. Union(2, 1)

The arr is as  

follows: 

1. Union(4, 3) 

2. Union(8, 4) 

3. Union(9, 3) 




The arr is as follows: 

1. Union(6, 5)

The arr is as  

follows: 

After performing some operations of Union (A ,B), there are  now 5 subsets as follows: 

1. First subset comprises the elements {3, 4, 8, 9} 

2. Second subset comprises the elements {1, 2} 

3. Third subset comprises the elements {5, 6} 

4. Fourth subset comprises the elements {0} 

5. Fifth subset comprises the elements {7} 

The elements of a subset, which are connected to each other  directly or indirectly, can be considered as the nodes of a  graph. Therefore, all these subsets are called connected  components

The union-find data structure is useful in graphs for performing  various operations like connecting nodes, finding connected  components etc.

Let’s perform some find(A, B) operations. 

1. Find(0, 7): 0 and 7 are disconnected, and therefore, you  will get a false result 

2. Find(8, 9): Although 8 and 9 are not connected directly,  there is a path that connects both the elements, and  therefore, you will get a true result 

When these operations are viewed in terms of components,  then: 

1. Union(A, B): Replace components that comprise the two  objects A and B with their union 

2. Find(A, B): Check whether the two objects A and B are in  the same component 

If you perform the operation union(5, 2) on the components  mentioned above, then it will be as follows: 




The arr is as follows:



As the loop in the union function iterates through all the N  elements for connecting two elements, performing this  operation on N objects will take O(N2) time, which is quite  inefficient. 

Try another approach 

Arr[ A ] is a parent of A. 

Consider the root element of each subset, which is only a  special element in that subset having itself as the parent.  Assume that R is the root element, then arr[ R ] = R. 

For more clarity, consider the subset S = {0, 1, 2, 3, 4, 5} 

Initially each element is the root of itself in all the subsets  because arr[ i ] = i, where i is the element in the set. Therefore  root(i) = i. 



Performing union(1, 0) will connect 1 to 0 and will set root (0)  as the parent of root (1). As root(1) = 1 and root(0) = 0, the  value of arr[ 1 ] will change from 1 to 0. Therefore, 0 will be the  root of the subset that contains the elements {0, 1}.





Performing union (0, 2), will indirectly connect 0 to 2 by setting  root(2) as the parent of root(0). As root(0) is 0 and root(2) is 2,  it will change value of arr[ 0 ] from 0 to 2. Therefore, 2 will be  the root of the subset that contains the elements {2, 0, 1}. 




Performing union (3, 4) will indirectly connect 3 to 4 by setting  root(4) as the parent of root(3). As root(3) is 3 and root(4) is 4,  it will change the value of arr[ 3 ] from 3 to 4. Therefore, 4 will  be the root of the subset that contains the elements {3, 4}.




Performing union (1, 4) will indirectly connect 1 to 4 by setting  root(4) as the parent of root(1). As root(4) is 4 and root(1) is 2,  it will change value of arr[ 2 ] from 2 to 4. Therefore, 4 will be  the root of the set containing elements {0, 1, 2, 3,4}. 





After each step, you will see the change in the array arr also. 

After performing the required union(A, B) operations, you can  perform the find(A, B) operation easily to check whether A and  B are connected. This can be done by calculating the roots of  both A and B. If the roots of A and B are the same, then it  means that both A and B are in the same subset and are  connected. 

Calculating the root of an element 

Arr[ i ] is the parent of i (where i is the element of the set). The  root of i is Arr[ Arr[ Arr[ …...Arr[ i ]...... ] ] ] until arr[ i ] is not  equal to i. You can run a loop until you get an element that is a  parent of itself.

Note This can only be done when there is no cycle in the  elements of the subset, else the loop will run infinitely. 

1. Find(1, 4): 1 and 4 have the same root i.e. 4. Therefore, it  means that they are connected and this operation will give  the result True

2. Find(3, 5): 3 and 5 do not have same root because root(3)  is 4 and root(5) is 5. This means that they are not  connected and this operation will give the result False.


Comments