Concurrent processing is a quickly becoming one of the most important disciplines in software development. With the advent of commodity (a.k.a. the cloud) and grid computing, it is possible to scale applications vertically across many nodes, instead of horizontally increasing power within a single machine. The rise of Google and fall of Sun have proven there is an enormous efficiency benefit to relying on large numbers of cheap, managed machines versus “big iron” single servers. With the advent of Amazon’s EC2, small players can now harness large volumes of computing resources and only pay for the quantity of services they utilize.
Here at Open Initiative Consulting, we recognize the benefits of these new technologies, but have also encountered a few problems. One problem is determining when a unit of work is complete. A work unit represents the combined total of segmented independent processing tasks, that run on separate machines. These processing tasks have no direct communication with a “parent” once they have been allocated, but can record internal state on their progress. After all of these processing efforts are complete, the results are aggregated at a single “reducer”. Normally, we could record the number of children nodes when they are spawned and consider the unit complete once all children had reported. Unfortunately, our requirements were not so simple; children have the ability to generate more than one additional child. Here is an example (notice the third child on the right):

At first glance, we thought about complex algorithms to trace chains back to the root to determine how many possible children were still processing. However, by tracking the number of children each child spawns, we can use simple math to calculate the “weight” of each independent child against the composite work unit. From our previous example:

Now we can do some simple math to determine the percentage of completed work as each child completes:
| Event | Calculation | Complete Percent |
|---|---|---|
| Start the work unit | No calculations, we are just starting | 0 |
| Child one completes | We take each number in the child’s weight and divide. Child one is [1,3] so our calculation is (1 / 3) * 100 or 33.333 percent. | 0 + 33.333% = 33.333% |
| Child four completes | Child four is [1,3,2] so our calculation is (1 / 3 / 2) * 100 or 16.666 percent. | 33.333% + 16.666 % = 50% |
| Child two completes | Child two is [1,3] so our calculation is (1 / 3) * 100 or 33.333 percent. | 50% + 33.333% = 83.333% |
| Child three completes | Child three is [1,3,2] so our calculation is (1 / 3 / 2) * 100 or 16.666 percent. | 83.333% + 16.666% = 100% |
This calculation does carry some caveats. There is a certain margin of error while performing floating point division in most programming languages. In java for example:
1 2 3 4 5 6 7 | public class Test { public static void main(String[] args) { double calc = (1.0 / 3.0 / 2.0) * 100 + (1.0 / 3.0) * 100 + (1.0 / 3.0) * 100 + (1.0 / 3.0 / 2.0) * 100; System.out.println(calc); } } |
produces “99.99999999999997″. To avoid miscalculation, we multiply by 1000, round the result, and divide by 1000:
1 2 3 4 5 6 7 8 9 | import java.lang.Math; public class Test { public static void main(String[] args) { double calc = (1.0 / 3.0 / 2.0) * 100 + (1.0 / 3.0) * 100 + (1.0 / 3.0) * 100 + (1.0 / 3.0 / 2.0) * 100; System.out.println(Math.round(calc * 1000) / 1000); } } |
