Disjoint Set

DISJOINT SETS:

A disjoint sets data structure represents a collection of sets that are disjoint: that is, no item is found in more than one set. The collection of disjoint sets is called a partition, because the items are partitioned among the sets. A disjoint-set data structure, also called a union–find data structure or merge–find set, is a data structure that keeps track of a set of elements partitioned into a number of disjoint (non overlapping) subsets.

You have a collection of n items, which you number from 0 to n-1. These items will be partitioned into some number of sets. The sets are "disjoint" which means that no item belongs to more than one set. All items belong to some set though (hence the use of the word

"partition.").

Operations on Sets

Let x be an object. We wish to support the following operations.

∙ MAKE-SET( x) creates a new set whose only member is pointed by x; Note that x is not in the other sets.

∙ UNION( x,y) unites two dynamic sets containing objects x and y, say Sx and Sy , into a new set that Sx ∪ Sy , assuming that Sx ∩ Sy =∅; ∙ FIND-SET( x) returns a pointer to the representative of the set containing x. ∙ INSERT( a,S) inserts an object a to S, and returns S∪{a}.

∙ DELETE( a,S) deletes an object a from S, and returns S-{a}.

∙ SPLIT( a,S) partitions the objects of S into two sets S1 and S2 such that S1 ={b | b≤a & b∈S}, and S2 =S- S1 .

∙ MINIMUM( S) returns the minimum object in S.

Applications of Disjoint-set Data Structures

Here we show two application examples.

∙ Connected components (CCs)

∙ Minimum Spanning Trees (MSTs)

Algorithm for Connected Components

CONNECTED-COMPONENTS (G)

1 for each v∈V

2 do MAKE-SET (v)

3 for every edge (u,v)∈E

4 do if FIND-SET (u) ≠ FIND-SET (v)

5 then UNION (u,v)

SAME-COMPONENTS (u,v)

1 if FIND-SET (u) = FIND-SET (v)

2 then return TRUE

3 return FALSE

Figure 1: A graph with four connected components: {a,b,c,d}, {e,f,g}, {h,i}, and {j}.

Edge processed Collection of disjoint sets

initial sets {a} {b} {c} {d} {e} {f} {g} {h} {i} {j}

(b,d) {a} {b,d} {c} {e} {f} {g} {h} {i} {j}

(e,g) {a} {b,d} {c} {e,g} {f} {h} {i} {j}

(a,c) {a,c} {b,d} {e,g} {f} {h} {i} {j}

(h,i) {a,c} {b,d} {e,g} {f} {h,i} {j}

(a,b) {a,b,c,d} {e,g} {f} {h,i} {j}

(e,f) {a,b,c,d} {e,f,g} {h,i} {j}

(b,c) {a,b,c,d} {e,f,g} {h,i} {j}

This table shows the state of the collection of disjoint sets as each edge is processed. The processed edge is listed on the left and the rest of the columns show the state of the collection.

The Linked-list Representation

A set can be represented by a linked list. In this representation, each node has a pointer to the next node and a pointer to the first node.

Linked-list representation. Each disjoint set can be represented by a linked-list. Each node has a pointer to the head node and a pointer to the next node

There are two operations that you can perform:

∙ int Find(i): This takes the number of an item and returns its "set id." This is a number

between zero and n-1. You don't really care what the number is; however, if i and j belong to the same disjoint set, then Find(i) will equal Find(j).

int Union(id1, id2): This takes the set id's of two different sets, and performs the union operation on them, coalescing them into one set. It returns the set id of this new set. Now, when you call Find() on any item that was previously in either of these sets, it will return this new set id.

Example:

We initialize an instance of disjoint sets with 10 items. Each item is a node with a number from 0 to 9. Each node has a NULL link, which we depict by not drawing any arrows from the node:

Again, each node is in its own set, and each node's set id is its number. Suppose we call Union(0, 1), Union(2, 3) and Union(4, 5). These will each set one of the node's link to the other node. We'll talk about how that gets done later. However, suppose this is the result:

Node 0's link has been set to 1. Both of these nodes' set ids are now 1, which means Find(0) equals Find(1) equals one. Similarly, Find(2) equalsFind(3) equals three. This gives you a clue about implementing Find(). When you call Find(n), what you do is keep setting n to n's link, until n's link is NULL. At that point, you are at the root of the set, and you return n.

Union is pretty simple, too, but you have some choices about how to determine which node sets its link field to the other. We use three methods to do this:

1. Union by size: Make the node whose set size is smaller point to the node whose set size is bigger. Break ties arbitrarily.

2. Union by height: The height of a set is the longest number of links that it takes to get from one of its items to the root. With union by height, you make the node whose height is smaller point to the node whose height is bigger.

3. Union by rank with path compression: This works exactly like union by height, except you don't really know the height of the set. Instead, you use the set's rank, which is what its height would be, were you using union by height. The reason that you don't know the height is that when you call Find(i) on an element, you perform "path compression."

There are two sets, with set id's 5 and 9. Now, suppose you call Find(0). It will return five, but along the way to the root node of its set, it encounters nodes 1 and 3. Before returning five, it sets the links to 0, 1 and 3 to five:

Running Time:

We assume the number of items in the instance of disjoint sets is n: ∙ Initializing an instance of disjoint sets: O(n)

∙ Performing Union() in any of the implementations: O(1)

∙ Performing Find() in Union by size or height: O(log(n))

∙ Performing Find() in Union by rank with path compression: O(α(n))

You have to perform two operations:

1. Union(A, B): Connect two elements A and B

2. Find(A, B): Find whether the two elements A and B are connected

Example You have a set of elements S = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. Here there are 10 elements (N = 10). Use an array arr to manage the connectivity of the elements. Arr[ ] that is indexed by elements of sets, which are of size N (as N elements in set), can be used to manage the operations of union and find.

Assumption

A and B objects are connected only if arr[ A ] = arr[ B ]. Implementation

To implement the operations of union and find, do the following:

1. Find(A, B): Check whether arr[ A ] = arr[ B ]

2. Union(A, B): Connect A to B and merge the components that comprise A and B by replacing elements that have a value of arr[ A ] with the value of arr[ B ].

Initially there are 10 subsets and each subset has one element in it.

When each subset contains only one element, the array arr is as follows:

Let’s perform some operations.

1. Union(2, 1)

The arr is as

follows:

1. Union(4, 3)

2. Union(8, 4)

3. Union(9, 3)

The arr is as follows:

1. Union(6, 5)

The arr is as

follows:

After performing some operations of Union (A ,B), there are now 5 subsets as follows:

1. First subset comprises the elements {3, 4, 8, 9}

2. Second subset comprises the elements {1, 2}

3. Third subset comprises the elements {5, 6}

4. Fourth subset comprises the elements {0}

5. Fifth subset comprises the elements {7}

The elements of a subset, which are connected to each other directly or indirectly, can be considered as the nodes of a graph. Therefore, all these subsets are called connected components.

The union-find data structure is useful in graphs for performing various operations like connecting nodes, finding connected components etc.

Let’s perform some find(A, B) operations.

1. Find(0, 7): 0 and 7 are disconnected, and therefore, you will get a false result

2. Find(8, 9): Although 8 and 9 are not connected directly, there is a path that connects both the elements, and therefore, you will get a true result

When these operations are viewed in terms of components, then:

1. Union(A, B): Replace components that comprise the two objects A and B with their union

2. Find(A, B): Check whether the two objects A and B are in the same component

If you perform the operation union(5, 2) on the components mentioned above, then it will be as follows:

The arr is as follows:

As the loop in the union function iterates through all the N elements for connecting two elements, performing this operation on N objects will take O(N2) time, which is quite inefficient.

Try another approach

Arr[ A ] is a parent of A.

Consider the root element of each subset, which is only a special element in that subset having itself as the parent. Assume that R is the root element, then arr[ R ] = R.

For more clarity, consider the subset S = {0, 1, 2, 3, 4, 5}

Initially each element is the root of itself in all the subsets because arr[ i ] = i, where i is the element in the set. Therefore root(i) = i.

Performing union(1, 0) will connect 1 to 0 and will set root (0) as the parent of root (1). As root(1) = 1 and root(0) = 0, the value of arr[ 1 ] will change from 1 to 0. Therefore, 0 will be the root of the subset that contains the elements {0, 1}.

Performing union (0, 2), will indirectly connect 0 to 2 by setting root(2) as the parent of root(0). As root(0) is 0 and root(2) is 2, it will change value of arr[ 0 ] from 0 to 2. Therefore, 2 will be the root of the subset that contains the elements {2, 0, 1}.

Performing union (3, 4) will indirectly connect 3 to 4 by setting root(4) as the parent of root(3). As root(3) is 3 and root(4) is 4, it will change the value of arr[ 3 ] from 3 to 4. Therefore, 4 will be the root of the subset that contains the elements {3, 4}.

Performing union (1, 4) will indirectly connect 1 to 4 by setting root(4) as the parent of root(1). As root(4) is 4 and root(1) is 2, it will change value of arr[ 2 ] from 2 to 4. Therefore, 4 will be the root of the set containing elements {0, 1, 2, 3,4}.

After each step, you will see the change in the array arr also.

After performing the required union(A, B) operations, you can perform the find(A, B) operation easily to check whether A and B are connected. This can be done by calculating the roots of both A and B. If the roots of A and B are the same, then it means that both A and B are in the same subset and are connected.

Calculating the root of an element

Arr[ i ] is the parent of i (where i is the element of the set). The root of i is Arr[ Arr[ Arr[ …...Arr[ i ]...... ] ] ] until arr[ i ] is not equal to i. You can run a loop until you get an element that is a parent of itself.

Note This can only be done when there is no cycle in the elements of the subset, else the loop will run infinitely.

1. Find(1, 4): 1 and 4 have the same root i.e. 4. Therefore, it means that they are connected and this operation will give the result True.

2. Find(3, 5): 3 and 5 do not have same root because root(3) is 4 and root(5) is 5. This means that they are not connected and this operation will give the result False.

joy_of_learing

Search This Blog

Disjoint Set

Comments

Post a Comment