03 Jul 2019

Algorithms - Union Find


I used to solve many algorithm problems by using DFS / BFS, but Union-Find can solve many of these problems.

It’s a useful algorithm, and it’s introduced by the book Algorithms in Chapter 1.


Union-Find API

There are some typical problems that we can solve by Union-Find algorithm. For example, Union-Find tag of Leetcode.

Questions are like: we have a set of items, some of the items are connected, for those are connected we call them a component, and the question is how many components we have.

Thus, the Union-Find APIs are:

  • The union() method merges two components if the two items are in different components;

  • The find() method returns an integer component identifier for a given item;

  • The connected() method determines whether two items are in the same component;

  • The count() method returns the number of components.


Union-Find Implementation

Basic API

The codes are simple and direct:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
public class UF {

    // access to component id (site indexed)
    private int[] id;

    // number of components
    private int count;

    /**
     * initialize N sites with integer names (0 to N-1)
     *
     * @param N
     */
    public UF(int N) {
        count = N;
        id = new int[N];
        for (int i = 0; i < N; i++) {
            id[i] = i;
        }
    }

    /**
     * add connection between p and q
     *
     * @param p
     * @param q
     */
    public void union(int p, int q) {
        // TODO: will be introduced below
    }

    /**
     * component identifier for p (0 to N-1)
     *
     * @param p
     * @return
     */
    public int find(int p) {
        // TODO: will be introduced below
    }

    /**
     * return true if p and q are in the same component
     *
     * @param p
     * @param q
     * @return
     */
    public boolean connected(int p, int q) {
        return find(p) == find(q);
    }

    /**
     * number of components
     *
     * @return
     */
    public int count() {
        return count;
    }
}


Quick-Find

One approach is to maintain the invariant that p and q are connected if and only if id[p] is equal to id[q].

In other words, all items in a component must have the same value in id[].

This method is called quick-find because find(p) just returns id[p], which immediately implies that connected(p, q) reduces to just the test id[p] == id[q] and returns true if and only if p and q are in the same component.

Here is the implementation:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
public void union(int p, int q) {  // Put p and q into the same component.
    int pID = find(p);
    int qID = find(q);

    // Nothing to do if p and q are already in the same component.
    if (pID == qID) return;

    // Rename p’s component to q’s name.
    for (int i = 0; i < id.length; i++) {
        if (id[i] == pID) {
            id[i] = qID;
        }
    }    
    count--;
}

public int find(int p) {
    return id[p];
}

The disadvantage of this implementation is that the union() needs to scan through the whole id[] array for each input pair, so it is typically not useful for large problems.


Quick-Union

It is based on the same data structure — the site-indexed id[] array.

Specifically, the id[] entry for each item is the name of another item in the same component (possibly itself) - we refer to this connection as a link.

To implement find(), we start at the given item, follow its link to another site, follow that item’s link to yet another item, and so forth, following links until reaching a root, an item that has a link to itself.

Two items are in the same component if and only if this process leads them to the same root.

Here is the implementation:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
public void union(int p, int q) {
    int pRoot = find(p);
    int qRoot = find(q);
    if (pRoot == qRoot) {
        return;
    }
    // Give p and q the same root.
    id[pRoot] = qRoot;
    count--;
}

public int find(int p) {
    // Find component name.
    while (p != id[p]) {
        p = id[p];
    }
    return p;
}

There are some improvements to Quick-Union algorithm because the link from the root can be tall, which may cause find() expensive.

One of the improvements is path compression, and it only has one extra line of code:

1
2
3
4
5
6
7
public int find(int p) {
    while (p != id[p]) {
        id[p] = id[id[p]]; // please notice this line
        p = id[p];
    }
    return p;
}


References