worker completion calculation
August 8, 2010 | by rob @ 11:12 am | comments (0) | filed under [cloud]

Concurrent processing is quickly becoming one of the most important disciplines in software development. With the advent of commodity (a.k.a. the cloud) and grid computing, it is possible to scale applications horizontally across many nodes, instead of vertically increasing power within a single machine. The rise of Google and fall of Sun have shown there is an enormous efficiency benefit to relying on large numbers of cheap, managed machines versus “big iron” single servers. With Amazon’s EC2, small players can now harness large volumes of computing resources and only pay for the quantity of services they utilize.

Here at Open Initiative Consulting, we recognize the benefits of these new technologies, but have also encountered a few problems. One problem is determining when a unit of work is complete. A work unit represents the combined total of segmented, independent processing tasks that run on separate machines. These processing tasks have no direct communication with a “parent” once they have been allocated, but can record internal state on their progress. After all of these processing efforts are complete, the results are aggregated at a single “reducer”. Normally, we could record the number of child nodes when they are spawned and consider the unit complete once all children had reported. Unfortunately, our requirements were not so simple; children have the ability to generate more than one additional child. Here is an example (notice the third child on the right):

At first glance, we thought about complex algorithms to trace chains back to the root to determine how many possible children were still processing. However, by tracking the number of children each child spawns, we can use simple math to calculate the “weight” of each independent child against the composite work unit. From our previous example:

Now we can do some simple math to determine the percentage of completed work as each child completes:

Event | Calculation | Complete Percent
Start the work unit | No calculations, we are just starting | 0%
Child one completes | We take each number in the child’s weight and divide. Child one is [1,3] so our calculation is (1 / 3) * 100 or 33.333 percent. | 0% + 33.333% = 33.333%
Child four completes | Child four is [1,3,2] so our calculation is (1 / 3 / 2) * 100 or 16.666 percent. | 33.333% + 16.666% = 50%
Child two completes | Child two is [1,3] so our calculation is (1 / 3) * 100 or 33.333 percent. | 50% + 33.333% = 83.333%
Child three completes | Child three is [1,3,2] so our calculation is (1 / 3 / 2) * 100 or 16.666 percent. | 83.333% + 16.666% = 100%
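
The walk-through above can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not our production code: the class name WorkUnitProgress and the methods share and complete are hypothetical, and a weight is assumed to be the list of numbers shown for each child (for example [1,3,2]).

```java
public class WorkUnitProgress {
    // Running total of completed work, in percent.
    private double percentComplete = 0.0;

    // Percentage of the work unit owned by one child: divide 1 by every
    // number in its weight, then scale to a percentage.
    // For example [1,3,2] gives (1 / 3 / 2) * 100.
    public static double share(int... weight) {
        double fraction = 1.0;
        for (int n : weight) {
            fraction /= n;
        }
        return fraction * 100;
    }

    // Record a finished child and return the new running total.
    public double complete(int... weight) {
        percentComplete += share(weight);
        return percentComplete;
    }

    public static void main(String[] args) {
        WorkUnitProgress unit = new WorkUnitProgress();
        System.out.println(unit.complete(1, 3));     // child one
        System.out.println(unit.complete(1, 3, 2));  // child four
        System.out.println(unit.complete(1, 3));     // child two
        System.out.println(unit.complete(1, 3, 2));  // child three
    }
}
```

Each child only needs to know its own weight, so no chain has to be traced back to the root. The printed totals are subject to floating point error.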

This calculation does carry some caveats. Floating point division introduces a small margin of error in most programming languages. In Java, for example:

public class Test {
    public static void main(String[] args) {
        double calc = (1.0 / 3.0 / 2.0) * 100 + (1.0 / 3.0) * 100 + (1.0 / 3.0) * 100 +
            (1.0 / 3.0 / 2.0) * 100;
        System.out.println(calc);
    }
}

produces “99.99999999999997”. To avoid miscalculation, we multiply by 1000, round the result, and divide by 1000:

public class Test {
    public static void main(String[] args) {
        double calc = (1.0 / 3.0 / 2.0) * 100 + (1.0 / 3.0) * 100 + (1.0 / 3.0) * 100 +
            (1.0 / 3.0 / 2.0) * 100;
        // Math.round returns a long, so divide by 1000.0 to keep the
        // three decimal places instead of truncating with integer division.
        System.out.println(Math.round(calc * 1000) / 1000.0);
    }
}
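
Rounding hides the error rather than removing it. One alternative, which is not what we describe above and is offered only as a sketch, is to keep the running total as an exact fraction with java.math.BigInteger and convert to a percentage only when reporting. The class name ExactProgress and its methods are hypothetical.

```java
import java.math.BigInteger;

public class ExactProgress {
    // Completed work as an exact fraction: numerator / denominator.
    private BigInteger numerator = BigInteger.ZERO;
    private BigInteger denominator = BigInteger.ONE;

    // Add a finished child's share, 1 / (product of the numbers in its weight).
    public void complete(int... weight) {
        BigInteger product = BigInteger.ONE;
        for (int n : weight) {
            product = product.multiply(BigInteger.valueOf(n));
        }
        // a/b + 1/p = (a*p + b) / (b*p), then reduce by the gcd.
        BigInteger n = numerator.multiply(product).add(denominator);
        BigInteger d = denominator.multiply(product);
        BigInteger gcd = n.gcd(d);
        numerator = n.divide(gcd);
        denominator = d.divide(gcd);
    }

    // True once the recorded shares sum to exactly one.
    public boolean isDone() {
        return numerator.equals(denominator);
    }

    // Percentage for display only; the stored fraction stays exact.
    public double percent() {
        return 100.0 * numerator.doubleValue() / denominator.doubleValue();
    }

    public static void main(String[] args) {
        ExactProgress unit = new ExactProgress();
        unit.complete(1, 3);
        unit.complete(1, 3, 2);
        unit.complete(1, 3);
        unit.complete(1, 3, 2);
        System.out.println(unit.isDone() + " " + unit.percent());
    }
}
```

With this approach the completion check is an integer comparison, so it does not depend on how many decimal places are kept.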
complex network applications on a single ec2 instance
January 23, 2010 | by rob @ 4:52 pm | comments (0) | filed under [cloud]

Under a constrained non-profit budget, we sometimes find it necessary to run complex network services such as media and mail servers on the same EC2 instance. Unfortunately, these systems tend to require overlapping ports, particularly the HTTP port 80. In our older Solaris virtualized world, we would just create two zones that shared processors and memory, allowing each service to access its own virtual network address. To work around the limitation of a single assigned address from Amazon, we create “virtual” internal addresses in the 192.168.1.0/24 range. Obviously these addresses won’t route from Amazon’s network, but we can now start each service on its own address. This also allows us to forward requests with iptables from the original “real” address, reducing the number of accidental unsafe holes in the firewall. Here is a sample script that configures the firewall and forwards a few ports:

http://github.com/greyrl/generaltools/blob/master/firewall.sh

The script also allows SSH connections only from our “safe” networks. With this configuration we rely on iptables for security and open all ports, 0 through 65535, in Amazon’s “Security Groups”.

creating EC2 machine images
January 8, 2010 | by rob @ 12:23 pm | comments (0) | filed under [cloud]

Learning how to create custom Ubuntu machine images or “AMIs” was a tricky process. We couldn’t find a clear series of steps in any of the technical documents. What worked in the East region didn’t seem to work in the West region. This post will hopefully outline the procedure we cobbled together at the Center for HIV Information.

Configure an instance
We started off with the Alestic images, but apparently Ubuntu has since released official images. We also run custom configurations of MySQL, Jetty and Apache, all configured through Upstart, so our first step was to remove the default init symbolic links from the “/etc/rc2.d” directory. Why run custom configurations? For MySQL we use the same scripts to manage replication across several nodes; these are version controlled and stored on an EBS volume. Jetty has several versioned XML configuration files that are custom to our installation. We also keep as much custom data as possible on the EBS volume and keep the operating systems “skeletal”.

Remove sensitive data
Another thing to keep in mind is to scrub or remove sensitive data such as passwords from a variety of places in the file system. Presumably, the AMIs are only available privately, but it is a good practice. Before creating an image, we remove the “/root/.bash_history” file, the “/root/.subversion” directory, and the “/root/.ssh/id_dsa” file, along with several others.

Generate the image
There are a few variables that will be used with several of the commands:

  • arch – the architecture, for example “i386”
  • bucket – the S3 bucket name where the image will be stored
  • prefix – the prefix that will identify this AMI
  • region – the region, currently “us-west-1”, “us-east-1”, or “eu-west-1”
  • AWS_USER_ID – your numeric AWS username, can be found here listed as “Account Number”
  • AWS_ACCESS_KEY_ID – can be found here
  • AWS_SECRET_ACCESS_KEY – can be found here
  • JAVA_HOME – the tools are written in Java and need to know the location of the JVM. Make sure you export this variable
  • You will also need to generate an X.509 certificate, assuming you haven’t done so already. You can manage key pairs here under the “X.509 certificate” tab. The private key and certificate should be transferred to an excluded directory (see the “-e” option below) on the instance where you will build the AMI.

    Now you can run the “ec2-bundle-vol” command to generate the bundle that will be uploaded to S3. Make sure you list any directories you don’t want in the image (such as EBS mounts) with the “-e” option:

    ec2-bundle-vol -r $arch -d /mnt -p $prefix -u $AWS_USER_ID -k <location of your private key> -c <location of your certificate> -e /ebsmount,/mnt,/tmp,/root/.ssh,/root/tmp -s 2048

    This will create the image in the “/mnt” directory. If your instance is not in the “us-east-1” region, you should also run the “ec2-migrate-manifest” command to migrate the kernel and ramdisk to an available option:

    ec2-migrate-manifest -k <location of your private key> -c <location of your certificate> -m $prefix.manifest.xml --region $region -a $AWS_ACCESS_KEY_ID -s $AWS_SECRET_ACCESS_KEY

    Now you can upload the bundle to S3:

    ec2-upload-bundle -b $bucket -m /mnt/$prefix.manifest.xml -a $AWS_ACCESS_KEY_ID -s $AWS_SECRET_ACCESS_KEY --location $region

    And finally register the AMI:

    ec2-register -K <location of your private key> -C <location of your certificate> $bucket/$prefix.manifest.xml --region $region