Kosaraju’s Algorithm

Kosaraju’s algorithm (Wikipedia) finds a directed graph’s strongly connected components in linear time.

The algorithm runs in two phases:

Phase 1:

  1. grab an unvisited node and run a depth-first search from it.
  2. push each node onto a stack once all of its children have been processed.
  3. repeat while unvisited nodes remain.

Phase 2:

  1. pop a node off the stack.
  2. traverse the graph from that node against the direction of the edges.
  3. every node reached in this traversal belongs to the same strongly connected component.
  4. repeat while the stack is not empty.
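The two phases can be sketched as follows. This is a minimal illustration under my own names and representation, not the WEBISL solution itself:

```cpp
#include <functional>
#include <stack>
#include <utility>
#include <vector>

// Minimal Kosaraju sketch: returns a component id per node.
std::vector<int> kosaraju(int n, const std::vector<std::pair<int,int>>& edges) {
    std::vector<std::vector<int>> g(n), gt(n);
    for (const auto& e : edges) {
        g[e.first].push_back(e.second);
        gt[e.second].push_back(e.first);  // transpose: every edge reversed
    }

    // Phase 1: DFS on g; push each node after its children are processed.
    std::vector<char> seen(n, 0);
    std::stack<int> order;
    std::function<void(int)> dfs1 = [&](int u) {
        seen[u] = 1;
        for (int v : g[u]) if (!seen[v]) dfs1(v);
        order.push(u);
    };
    for (int u = 0; u < n; ++u) if (!seen[u]) dfs1(u);

    // Phase 2: pop nodes; a DFS against the edges (on gt) labels one SCC.
    std::vector<int> comp(n, -1);
    std::function<void(int,int)> dfs2 = [&](int u, int c) {
        comp[u] = c;
        for (int v : gt[u]) if (comp[v] == -1) dfs2(v, c);
    };
    int c = 0;
    while (!order.empty()) {
        int u = order.top(); order.pop();
        if (comp[u] == -1) dfs2(u, c++);
    }
    return comp;
}
```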

This algorithm is used in my solution to SPOJ 13990 Web islands (WEBISL); my implementation is in git.

Binary-Indexed Tree

A Binary-Indexed Tree is a standard array with extra range indexes, and looks like the following:


Each single cell (shown as an empty square on each row) is an array item, indexed from [1, n]. Each vertical bar (striped part + non-striped top cell) is an index range.

Index Ranges and Responsibility:
Each index range is named after its top cell, and its responsibility is given by the last 1-digit (and following 0s). Range 01100 has responsibility 100, meaning it covers from 01100 down to 01100 - 100 + 1. In other words, to find the bottom of a range, take the last 1 and move it to the last position, e.g. 01010 to 01001, 10000 to 00001, 00111 to 00111.
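That “last 1-digit” is just the lowest set bit, which two’s-complement arithmetic extracts directly. A quick sketch (the helper names are mine):

```cpp
// Responsibility of index i = its last 1-digit: i & -i.
int responsibility(int i) { return i & -i; }

// Lower bound of the range named i: move the last 1 to the last position.
int range_bottom(int i) { return i - responsibility(i) + 1; }
```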

Then, to add into any range, start by adding to each index range that lies completely within the add range. To add X to [00001, 10000], well, that’s covered completely by [00001, 10000] itself, so X gets added to range 10000, fini. To add X to [00111, 01000], that’s covered by [00111, 00111] ∪ [01000, 01000]. The latter, [01000, 01000], is not a valid range! How’s that handled?

Index Values Too
Each item in the structure is both a range and an individual item, and its actual value is a sum of this item along with all responsible (covering) ranges. E.g. the value of item 01001 is its individual value + range values 01001, 01010, 01100, 10000. (Draw a line from the item straight left and see which ranges are hit.)

So, to answer the previous question, when we encounter our invalid range delim{[}{01000, 01000}{]} we simply add our value to item 01000 itself; that’s the best we can do.

This implementation is suitable for:

– adding values to all indices across a range
– setting individual values
– querying individual values

Not suitable for:
– querying sum totals across a range

– int64 get(id), O(lg n)
– void set(id, int64 value), O(lg n)
– void add(ida, idb, int64 value), O(lg n)
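Put together, here is a minimal sketch of this flipped BIT using the standard range-add/point-query decomposition (a prefix add plus a cancelling prefix subtract). Names are mine; this is an illustration, not the git version:

```cpp
#include <cstdint>
#include <vector>

// Flipped BIT sketch: O(lg n) range add, O(lg n) point get/set.
// Indexing is 1-based, matching the diagram.
struct RangeAddBit {
    std::vector<int64_t> t;
    explicit RangeAddBit(int n) : t(n + 1, 0) {}

    // Add v to every index in [1, r]: strip the last 1 each step,
    // touching each index range that fits entirely inside the prefix.
    void add_prefix(int r, int64_t v) {
        for (; r > 0; r -= r & -r) t[r] += v;
    }

    // Add v to every index in [a, b].
    void add(int a, int b, int64_t v) {
        add_prefix(b, v);
        add_prefix(a - 1, -v);
    }

    // Value of one item: its own cell plus every covering range
    // (walk upward by promoting the last 1).
    int64_t get(int i) const {
        int64_t s = 0;
        for (; i < (int)t.size(); i += i & -i) s += t[i];
        return s;
    }

    // Point assignment, expressed as a width-1 range add.
    void set(int i, int64_t value) { add(i, i, value - get(i)); }
};
```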

Reference: TopCoder – Binary Indexed Trees
N.B. This reference builds a cumulative-sum query on top of individual point additions. My implementation flips this, enabling addition across entire ranges with queries kept to individual values. (I’ll work on the former soon.)

My implementation is in git.

Exact Cover Problem and Knuth’s Algorithm X

I’ve put together an implementation of Knuth’s Algorithm X (Wikipedia), which solves the Exact Cover problem (Wikipedia).

The example from Algorithm X’s Wikipedia page is as follows:

Set A = {1, 4, 7}
Set B = {1, 4}
Set C = {4, 5, 7}
Set D = {3, 5, 6}
Set E = {2, 3, 6, 7}
Set F = {2, 7}

Is there a collection of these sets that covers each element exactly once? In this example, the solution is {B, D, F}, since that yields {1, 4} ∪ {3, 5, 6} ∪ {2, 7}: none left out and none duplicated.

Algorithm X is a recursive back-tracking search; it simply tries one set, prunes down the search space, rolls back if it hits a dead end, and tries another set. The secret sauce, however, is the data structure and the concept called Dancing Links (my little write-up here).
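The search itself can be sketched without Dancing Links by representing each set as a bitmask over the elements; a toy version that handles the example above (the names and the bitmask representation are mine, not the linked implementation):

```cpp
#include <vector>

// Toy Algorithm X over bitmask rows: find a subset of `masks` that ORs
// to `universe` with no overlaps. Not the Dancing Links version.
bool exact_cover(const std::vector<int>& masks, int universe, int covered,
                 std::vector<int>& chosen) {
    if (covered == universe) return true;
    int uncovered = ~covered & universe;
    int elem = uncovered & -uncovered;  // lowest uncovered element
    for (int i = 0; i < (int)masks.size(); ++i) {
        if ((masks[i] & elem) && !(masks[i] & covered)) {
            chosen.push_back(i);
            if (exact_cover(masks, universe, covered | masks[i], chosen))
                return true;
            chosen.pop_back();  // dead end: roll back, try another set
        }
    }
    return false;
}
```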

Implementation reference: Dancing Links, Knuth. If you want to understand my implementation, this reference is a must-read.

My implementation is in git.

Quick Union Find

Let’s say you have 9 vials of liquid and a list of pairs to combine, as follows:

1 & 2 (new batch)
5 & 8 (new batch)
9 & 4 (new batch)
9 & 6 (new 6 goes into existing 9)
3 & 5 (new 3 goes into existing 5)
7 & 1 (new 7 goes into existing 1)
2 & 5 (combine separate existing batches)
2 & 3 (same batch already)

How many separate batches do you have after each move? Are 2 & 4 in the same one?


This is solved with a Quick Union data structure; this structure has the following operations:

insert(a, b): ensures the two IDs are in the same set

find(a): which set does this ID belong to?

count(): how many sets in total?

Where m = links_num and n = items_num, the running time is O(m lg n).
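A minimal weighted quick-union sketch with that API; the weighting (smaller tree hangs under the larger) is what gives the O(m lg n) bound. Names follow the post’s operations, but this is an illustration, not necessarily the git version:

```cpp
#include <utility>
#include <vector>

// Weighted quick-union sketch; 1-based IDs to match the vial example.
struct QuickUnion {
    std::vector<int> parent, size;
    int sets;
    explicit QuickUnion(int n) : parent(n + 1), size(n + 1, 1), sets(n) {
        for (int i = 0; i <= n; ++i) parent[i] = i;
    }
    // Which set does `a` belong to? Follow parents up to the root.
    int find(int a) {
        while (parent[a] != a) a = parent[a];
        return a;
    }
    // Ensure a and b are in the same set: hang the smaller tree
    // under the larger one's root.
    void insert(int a, int b) {
        int ra = find(a), rb = find(b);
        if (ra == rb) return;  // same batch already
        if (size[ra] < size[rb]) std::swap(ra, rb);
        parent[rb] = ra;
        size[ra] += size[rb];
        --sets;
    }
    // How many total sets?
    int count() const { return sets; }
};
```

Running the vial moves through it, the batch count falls 8, 7, 6, 5, 4, 3, 2, and stays 2 on the final no-op merge; 2 & 4 end up in different batches.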

My implementation is in git.

2-Dimensional String Search

In my previous two posts, I implemented KMP search with continuation and the Aho-Corasick dictionary. Now we can put it all together and perform a 2D search, finding a rectangular needle in a larger rectangular haystack. This implementation runs in O(haystack_size + needle_size).

The basic steps:

1. Create Aho-Corasick dictionary from needle rows.

2. Haystack states: for each row of the haystack, run each character through the dictionary and record the resulting state. (That’s one state per character.)

3. Needle fingerprint: for each row of the needle, run it through the dictionary and record the very final state. (That’s one state per row.)

4. Finally, search the haystack states down the columns for the needle fingerprint using KMP.
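As a sketch of steps 2–4: here the Aho-Corasick dictionary is swapped for a plain hash map of needle rows (so this version loses the linear bound), but the fingerprint-plus-vertical-KMP shape is the same. All names are mine:

```cpp
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Find top-left corners of `needle` in `haystack`. Row matching uses a
// hash map instead of Aho-Corasick; KMP then runs down each column of
// row-ids. A sketch of the shape, not the linear-time implementation.
std::vector<std::pair<int,int>>
find2d(const std::vector<std::string>& haystack,
       const std::vector<std::string>& needle) {
    int H = (int)haystack.size(), W = (int)haystack[0].size();
    int h = (int)needle.size(),   w = (int)needle[0].size();

    // Needle fingerprint: one id per needle row (shared by equal rows).
    std::unordered_map<std::string, int> row_id;
    std::vector<int> pat;
    for (const auto& row : needle)
        pat.push_back(row_id.emplace(row, (int)row_id.size()).first->second);

    // KMP failure table for the fingerprint.
    std::vector<int> fail(h, 0);
    for (int i = 1, k = 0; i < h; ++i) {
        while (k > 0 && pat[i] != pat[k]) k = fail[k - 1];
        if (pat[i] == pat[k]) ++k;
        fail[i] = k;
    }

    std::vector<std::pair<int,int>> hits;
    for (int c = 0; c + w <= W; ++c) {
        // The column of row-ids at offset c, searched with KMP.
        for (int r = 0, k = 0; r < H; ++r) {
            auto it = row_id.find(haystack[r].substr(c, w));
            int id = (it == row_id.end()) ? -1 : it->second;  // -1: no needle row
            while (k > 0 && id != pat[k]) k = fail[k - 1];
            if (id == pat[k]) ++k;
            if (k == h) {
                hits.push_back({r - h + 1, c});
                k = fail[k - 1];
            }
        }
    }
    return hits;
}
```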

The implementation is in git.

Aho-Corasick String Matching Dictionary

This is an implementation of the Aho-Corasick String Matching Dictionary (Wikipedia).

This algorithm lets you search for (all instances of) multiple substrings in a single pass through the haystack. This is done by first compiling the substrings into a finite-state machine (a tree with failure links) which is fed each haystack character in turn, updates its state, and reports any matches. This runs in O(haystack_size + reported_size).
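A compact sketch of that machine, with goto and failure links over a map of children. Names are mine, not the git implementation:

```cpp
#include <map>
#include <queue>
#include <string>
#include <vector>

// Minimal Aho-Corasick sketch: build once, then feed the haystack one
// character at a time; matched pattern indices are appended to `hits`.
struct AhoCorasick {
    struct Node {
        std::map<char, int> next;
        int fail = 0;
        std::vector<int> out;  // pattern indices ending at this state
    };
    std::vector<Node> nodes;
    int state = 0;

    explicit AhoCorasick(const std::vector<std::string>& patterns) : nodes(1) {
        // Build the trie of patterns.
        for (int i = 0; i < (int)patterns.size(); ++i) {
            int u = 0;
            for (char ch : patterns[i]) {
                auto it = nodes[u].next.find(ch);
                if (it == nodes[u].next.end()) {
                    nodes[u].next[ch] = (int)nodes.size();
                    nodes.emplace_back();
                    u = (int)nodes.size() - 1;
                } else u = it->second;
            }
            nodes[u].out.push_back(i);
        }
        // BFS to set failure links; merge output lists along them.
        std::queue<int> q;
        for (auto& [ch, v] : nodes[0].next) q.push(v);
        while (!q.empty()) {
            int u = q.front(); q.pop();
            for (auto& [ch, v] : nodes[u].next) {
                int f = nodes[u].fail;
                while (f && !nodes[f].next.count(ch)) f = nodes[f].fail;
                auto it = nodes[f].next.find(ch);
                nodes[v].fail = (it != nodes[f].next.end() && it->second != v)
                                    ? it->second : 0;
                auto& fo = nodes[nodes[v].fail].out;
                nodes[v].out.insert(nodes[v].out.end(), fo.begin(), fo.end());
                q.push(v);
            }
        }
    }

    // Advance on one haystack character, reporting any matches.
    void feed(char ch, std::vector<int>& hits) {
        while (state && !nodes[state].next.count(ch)) state = nodes[state].fail;
        auto it = nodes[state].next.find(ch);
        state = (it != nodes[state].next.end()) ? it->second : 0;
        for (int p : nodes[state].out) hits.push_back(p);
    }
};
```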

The implementation is in git.

Dancing Links

Here is a demonstration of the Dancing Links idea (Wikipedia definition).

The idea is this: if you remove an item from a doubly-linked list, you can put it right back in O(1) as long as you hold on to it. And, if you remove items A B C, you can put them right back as long as you reverse the order exactly, C B A.

This is done as follows:

// Remove: `curr` is forgotten by neighbors
curr->prev->next = curr->next;
curr->next->prev = curr->prev;

// Reinsert: `curr` knows where to go, and steps right back in!
curr->next->prev = curr;
curr->prev->next = curr;

// Works over and over, as long as it's done in stack order:
// remove=push, reinsert=pop.

This works well where you’re drilling into a search but need to backtrack too: you can do so without having to work with multiple copies.
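The snippet above can be exercised end-to-end with a toy circular doubly-linked list (a sketch; the types and helper names are mine):

```cpp
#include <vector>

// Toy circular doubly-linked list to exercise the dance.
struct Node {
    Node *prev, *next;
    int value;
};

void remove_node(Node* curr) {    // neighbors forget curr
    curr->prev->next = curr->next;
    curr->next->prev = curr->prev;
}

void reinsert_node(Node* curr) {  // curr knows where to go
    curr->next->prev = curr;
    curr->prev->next = curr;
}

// Build a circular list: head <-> nodes[0] <-> ... <-> head.
void link(Node* head, std::vector<Node>& nodes) {
    Node* prev = head;
    for (auto& n : nodes) {
        prev->next = &n;
        n.prev = prev;
        prev = &n;
    }
    prev->next = head;
    head->prev = prev;
}

// Read the values back out, starting after the header node.
std::vector<int> to_vector(Node* head) {
    std::vector<int> out;
    for (Node* p = head->next; p != head; p = p->next) out.push_back(p->value);
    return out;
}
```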

And finally, this is demonstrated in git.