<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Open Initiative Consulting</title>
	<atom:link href="http://openinit.com/c/feed/" rel="self" type="application/rss+xml" />
	<link>http://openinit.com/c</link>
	<description>Meeting all your technology needs!</description>
	<lastBuildDate>Sun, 07 Nov 2010 22:51:13 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>looking back on six months</title>
		<link>http://openinit.com/c/2010/10/24/looking-back-on-six-months/</link>
		<comments>http://openinit.com/c/2010/10/24/looking-back-on-six-months/#comments</comments>
		<pubDate>Sun, 24 Oct 2010 16:49:52 +0000</pubDate>
		<dc:creator>rob</dc:creator>
				<category><![CDATA[consulting]]></category>

		<guid isPermaLink="false">http://openinit.com/c/?p=659</guid>
		<description><![CDATA[The end of October marks my six month anniversary as an independent consultant. This post is a reflection, with a goal of documenting my many mistakes and ideas on moving forward. If you find yourself in a similar situation, I&#8217;d appreciate any comments about how you approach marketing your consulting or freelance business. My Background [...]]]></description>
			<content:encoded><![CDATA[<p>The end of October marks my six month anniversary as an independent consultant. This post is a reflection, with a goal of documenting my many mistakes and ideas on moving forward. If you find yourself in a similar situation, I&#8217;d appreciate any comments about how you approach marketing your consulting or freelance business.</p>
<p><strong>My Background</strong></p>
<p>I have a background as a Java/J2EE programmer and application server administrator. Since you&#8217;re reading this post, you are probably aware of the issues the Java platform has faced during the slow demise of Sun Microsystems. There is a sense that Oracle is taking the reins to complete a long overdue release of Java 7, but no concrete terms have been established. After such a debacle, it&#8217;s hard to imagine Java being a first choice for any developer with a perspective on the field.</p>
<p>With that uncertainty in mind, I&#8217;ve explored a variety of other languages over the years, including Groovy, Python, C#, and Erlang, completing projects in several. I assume most Java folks have taken this route to remain competitive in the market.</p>
<p>I founded the consulting business hoping to market those broad skills as a general troubleshooter, architect, and developer, someone who could arrive at an organization, then suggest and implement some best practices. My focus was efficiency: cutting development costs by utilizing commoditized resources and streamlining inefficient processes.</p>
<p>However, I didn&#8217;t have much experience in the business and marketing world. Developing people skills can be exceptionally difficult for someone who has spent most of their professional years behind a computer screen in a cubicle. After several months of building a business, I truly believe the sales process is phenomenally more difficult than completing the actual work. </p>
<p>The rest of this article details a few of the key issues I&#8217;ve encountered while learning (or at least attempting to learn) the art of business building.</p>
<p><strong>Start By Leveraging Your Existing Network</strong></p>
<p>A family friend once told me, &#8220;it&#8217;s not what you know, it&#8217;s who you know.&#8221; A different friend mocked this suggestion, believing the commenter was lazy and only capable of succeeding because of his ability to exploit relationships. Looking back, the friend who mocked has excelled in the construction business by leveraging his experiences in prior jobs.</p>
<p>As Americans, we hold dearly the concept of self-determination. Hard work and a go-getter attitude will open every door. However, it is impossible to discount the impact of mutual effort in any significant undertaking. Human beings are social creatures who build networks to better confront the challenges of nature. </p>
<p>I decided to start the business after moving across the country, back to the medium-sized city where I had lived and worked for two years before moving to California. I assumed that my old colleagues would be a great source of new opportunities. Unfortunately, this was an extremely egotistical notion: I believed that people I had known only as acquaintances so many years before would jump at the opportunity to help me succeed, as if they didn&#8217;t have lives, careers, and families of their own. To this day, I&#8217;ve only managed to have lunch with one of those old co-workers. This person was also a business developer and probably more obligated to relationship building; otherwise, I honestly don&#8217;t think he would have taken the time.</p>
<p>My first stable client was my former employer from California. Even with a country separating us, the level of trust was much more established between people with solid, recent memories and associations. A former college roommate gave me an opportunity after reminiscing about college experiences at a wedding. </p>
<p>If you plan to freelance, you will find much more success engaging your current contacts, friends, and family.</p>
<p><strong>Extending Your Network</strong></p>
<p>If you find yourself without a strong network to draw from, you&#8217;ll need to get out and start building one. I started by attending &#8220;networking meetings&#8221;, found on sites like <a href="http://www.meetup.com/">Meetup.com</a>. Some are more informal than others. The most famous (and probably infamous) of these is <a href="http://www.bni.com/">BNI</a>, a group that charges an annual fee and has a strict schedule. Their central premise is that any new business opportunities should be channeled through group members. The members are paying, both in time and money, for the network. I attended a few of these meetings, both with BNI and <a href="http://www.professionalsreferralorganization.com/">PRO</a> (a group local to Richmond).</p>
<p>Unfortunately, I found that most of the services offered by the members were business-to-consumer oriented. For example, there were a lot of insurance salespeople and realtors. These groups make sense for people in those industries, but my technology services were geared more towards the enterprise. It might have been possible to sell to some small and medium-sized businesses, but I would strongly recommend that anyone following this approach look for more business-to-business oriented groups.</p>
<p>In technology, user groups tend to be better sources of contacts. For instance, I&#8217;ve met several folks at Linux User Groups who not only share my interests, but can also provide connections into the companies where they work.</p>
<p><strong>Prioritizing</strong></p>
<p>Another big issue is what I term &#8220;thrashing&#8221;. In technology terms, this indicates a machine that is spending more time managing resources than accomplishing actual work. In the consulting world, I define &#8220;thrashing&#8221; as continuously developing efforts that never turn into billable hours. Many tasks could be included in this category. </p>
<p>Unless you are a full-time business developer, attending networking events that are inappropriate for your offered services is a huge time drain. If you decide to approach these groups, try to talk to some members before spending the one to two hours, travel included, required to participate.</p>
<p>Be very wary when approaching new clients. In today&#8217;s economy, people are looking to squeeze dollars. I&#8217;ve found that a strict limit on the amount of non-billable development time is absolutely necessary in a new engagement. Presenting a good strategy and starting point quickly in the process, along with clear expectations, will save a lot of potentially wasted time.</p>
<p>Don&#8217;t get stuck with a large number of small projects. I think two to three clients, certainly as a sole proprietor, should be the limit for any responsible consultant. If you find that you start and finish a large number of engagements within two or three weeks and spend more time developing relationships than billing, you should reconsider your strategy. Of course, if you have a process or product geared towards that strategy, you might find more success. However, trying to take on too much is a disservice to both you and your customers.</p>
<p><strong>Find a Niche</strong></p>
<p>The roles of contract programmer and systems administrator are largely commoditized. International freelance websites and large full service IT shops have largely cornered these markets. As a consultant, you need to find a specialty that is not widely available and will greatly impact the bottom line. Examples might be reducing redundant processes or support costs.</p>
<p>If you are having problems defining that niche, find ways to communicate with business leaders in your field of expertise. I spent several years working in healthcare, mostly around HIV and the associated research. A <a href="http://www.score.org/index.html">SCORE</a> counselor suggested I contact local hospitals to determine where they are lacking in service providers. Of course, you will need to define what you plan to offer, but after a few engagements you might identify an option worth developing.</p>
<p>Once you’ve determined that niche, target your marketing and development efforts. My business offering is very general, not targeted to efficiency or cloud processes, and dilutes my pitch behind unrelated services. To date, I haven&#8217;t turned down any clients, even when the work doesn’t relate to my business. Starting to do this has been difficult to consider, certainly with rent bills due, but should strengthen my brand and expertise over the long haul.</p>
<p><strong>Accept Defeat</strong></p>
<p>Stubborn people have a hard time accepting failure and learning from their mistakes. I certainly fall into this category. However, I am now searching for contract work through recruiters so I can begin to retire clients that don’t fit my business model. The long term goal is to reduce my client load to one while I redesign my web site, business cards, and sales strategy. </p>
<p>I hope some of these ideas will help focus your business initiatives. Feel free to leave a comment if you have any additional ideas.</p>
]]></content:encoded>
			<wfw:commentRss>http://openinit.com/c/2010/10/24/looking-back-on-six-months/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>worker completion calculation</title>
		<link>http://openinit.com/c/2010/08/08/worker-completion-calculation/</link>
		<comments>http://openinit.com/c/2010/08/08/worker-completion-calculation/#comments</comments>
		<pubDate>Sun, 08 Aug 2010 18:12:08 +0000</pubDate>
		<dc:creator>rob</dc:creator>
				<category><![CDATA[cloud]]></category>

		<guid isPermaLink="false">http://openinit.com/c/?p=631</guid>
		<description><![CDATA[Concurrent processing is quickly becoming one of the most important disciplines in software development. With the advent of commodity (a.k.a. the cloud) and grid computing, it is possible to scale applications horizontally across many nodes, instead of vertically increasing power within a single machine. The rise of Google and fall of Sun have proven [...]]]></description>
			<content:encoded><![CDATA[<p>Concurrent processing is quickly becoming one of the most important disciplines in software development. With the advent of commodity (a.k.a. the cloud) and grid computing, it is possible to scale applications horizontally across many nodes, instead of vertically increasing power within a single machine. The rise of Google and fall of Sun have proven there is an enormous efficiency benefit to relying on large numbers of cheap, managed machines versus &#8220;big iron&#8221; single servers. With the advent of Amazon&#8217;s EC2, small players can now harness large volumes of computing resources and only pay for the quantity of services they utilize.</p>
<p>Here at Open Initiative Consulting, we recognize the benefits of these new technologies, but have also encountered a few problems. One problem is determining when a unit of work is complete. A work unit represents the combined total of segmented, independent processing tasks that run on separate machines. These processing tasks have no direct communication with a &#8220;parent&#8221; once they have been allocated, but can record internal state on their progress. After all of these processing efforts are complete, the results are aggregated at a single &#8220;reducer&#8221;. Normally, we could record the number of child nodes when they are spawned and consider the unit complete once all children had reported. Unfortunately, our requirements were not so simple: any child can itself spawn further children. Here is an example (notice the third child on the right):</p>
<div style="text-align: center"><img src="http://openinit.com/c/wp-content/uploads/2010/08/step_one.png"/></div>
<p>At first glance, we thought about complex algorithms to trace chains back to the root to determine how many possible children were still processing. However, by tracking the number of children each child spawns, we can use simple math to calculate the &#8220;weight&#8221; of each independent child against the composite work unit. From our previous example:</p>
<div style="text-align: center"><img src="http://openinit.com/c/wp-content/uploads/2010/08/step_two.png"/></div>
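<p>The weights in the diagram can be derived mechanically as children are spawned. The following is a minimal Python sketch, assuming that each child inherits its parent's weight list with the number of siblings spawned alongside it appended. The <code>spawn</code> helper is a hypothetical name used only for illustration:</p>

```python
def spawn(parent_weight, count):
    """Return one weight list per child: the parent's weight plus the sibling count."""
    return [parent_weight + [count] for _ in range(count)]

root = [1]                               # the work unit itself
first_level = spawn(root, 3)             # three children, each weighted [1, 3]
second_level = spawn(first_level[2], 2)  # the third child spawns two more: [1, 3, 2]
```

<p>Because every child carries its own weight, no node needs to trace a chain back to the root to know how much of the work unit it represents.</p>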
<p>Now we can do some simple math to determine the percentage of completed work as each child completes:</p>
<table class="table">
<thead>
<tr class="row">
<th class="tablecolumn">Event</th>
<th class="tablecolumn">Calculation</th>
<th class="tablecolumn">Complete Percent</th>
</tr>
</thead>
<tbody>
<tr class="tablerow">
<td class="tablecolumn">Start the work unit</td>
<td class="tablecolumn">No calculations, we are just starting</td>
<td class="tablecolumn">0</td>
</tr>
<tr class="tablerow">
<td class="tablecolumn">Child one completes</td>
<td class="tablecolumn">We take each number in the child&#8217;s weight and divide. Child one is [1,3] so our calculation is (1 / 3) * 100 or 33.333 percent.</td>
<td class="tablecolumn">0 + 33.333% = 33.333%</td>
</tr>
<tr class="tablerow">
<td class="tablecolumn">Child four completes</td>
<td class="tablecolumn">Child four is [1,3,2] so our calculation is (1 / 3 / 2) * 100 or 16.666 percent.</td>
<td class="tablecolumn">33.333% + 16.666% = 50%</td>
</tr>
<tr class="tablerow">
<td class="tablecolumn">Child two completes</td>
<td class="tablecolumn">Child two is [1,3] so our calculation is (1 / 3) * 100 or 33.333 percent.</td>
<td class="tablecolumn">50% + 33.333% = 83.333%</td>
</tr>
<tr class="tablerow">
<td class="tablecolumn">Child three completes</td>
<td class="tablecolumn">Child three is [1,3,2] so our calculation is (1 / 3 / 2) * 100 or 16.666 percent.</td>
<td class="tablecolumn">83.333% + 16.666% = 100%</td>
</tr>
</tbody>
</table>
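<p>The running total in the table can be reproduced in a few lines. This is a sketch in Python rather than a definitive implementation: <code>share</code> and <code>completed_percent</code> are hypothetical names, and we assume a node that spawns children hands its whole share to them and reports nothing itself, which is what the table implies:</p>

```python
from functools import reduce
from operator import truediv

def share(weight):
    """Percentage of the work unit owned by one child, e.g. [1, 3, 2] is (1 / 3 / 2) * 100."""
    return reduce(truediv, weight) * 100

def completed_percent(finished_weights):
    """Sum the shares of every child that has reported completion."""
    return sum(share(w) for w in finished_weights)

# Completion order from the table: child one, four, two, three.
events = [[1, 3], [1, 3, 2], [1, 3], [1, 3, 2]]

# progress[i] is the completed percentage after the first i events.
progress = [completed_percent(events[:i]) for i in range(len(events) + 1)]
```

<p>The last entry of <code>progress</code> is only approximately 100 because of floating point division, which is the caveat discussed next.</p>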
<p>This calculation does carry some caveats. Floating point division carries a small margin of error in most programming languages. In Java, for example:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="java" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">class</span> Test <span style="color: #009900;">&#123;</span>
    <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">static</span> <span style="color: #000066; font-weight: bold;">void</span> main<span style="color: #009900;">&#40;</span><span style="color: #003399;">String</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> args<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
        <span style="color: #000066; font-weight: bold;">double</span> calc <span style="color: #339933;">=</span> <span style="color: #009900;">&#40;</span><span style="color: #cc66cc;">1.0</span> <span style="color: #339933;">/</span> <span style="color: #cc66cc;">3.0</span> <span style="color: #339933;">/</span> <span style="color: #cc66cc;">2.0</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">*</span> <span style="color: #cc66cc;">100</span> <span style="color: #339933;">+</span> <span style="color: #009900;">&#40;</span><span style="color: #cc66cc;">1.0</span> <span style="color: #339933;">/</span> <span style="color: #cc66cc;">3.0</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">*</span> <span style="color: #cc66cc;">100</span> <span style="color: #339933;">+</span> <span style="color: #009900;">&#40;</span><span style="color: #cc66cc;">1.0</span> <span style="color: #339933;">/</span> <span style="color: #cc66cc;">3.0</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">*</span> <span style="color: #cc66cc;">100</span> <span style="color: #339933;">+</span>
            <span style="color: #009900;">&#40;</span><span style="color: #cc66cc;">1.0</span> <span style="color: #339933;">/</span> <span style="color: #cc66cc;">3.0</span> <span style="color: #339933;">/</span> <span style="color: #cc66cc;">2.0</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">*</span> <span style="color: #cc66cc;">100</span><span style="color: #339933;">;</span>
        <span style="color: #003399;">System</span>.<span style="color: #006633;">out</span>.<span style="color: #006633;">println</span><span style="color: #009900;">&#40;</span>calc<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p>produces &#8220;99.99999999999997&#8221;. To avoid miscalculation, we multiply by 1000, round the result, and divide by 1000:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="java" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">java.lang.Math</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">class</span> Test <span style="color: #009900;">&#123;</span>
    <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">static</span> <span style="color: #000066; font-weight: bold;">void</span> main<span style="color: #009900;">&#40;</span><span style="color: #003399;">String</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> args<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
        <span style="color: #000066; font-weight: bold;">double</span> calc <span style="color: #339933;">=</span> <span style="color: #009900;">&#40;</span><span style="color: #cc66cc;">1.0</span> <span style="color: #339933;">/</span> <span style="color: #cc66cc;">3.0</span> <span style="color: #339933;">/</span> <span style="color: #cc66cc;">2.0</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">*</span> <span style="color: #cc66cc;">100</span> <span style="color: #339933;">+</span> <span style="color: #009900;">&#40;</span><span style="color: #cc66cc;">1.0</span> <span style="color: #339933;">/</span> <span style="color: #cc66cc;">3.0</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">*</span> <span style="color: #cc66cc;">100</span> <span style="color: #339933;">+</span> <span style="color: #009900;">&#40;</span><span style="color: #cc66cc;">1.0</span> <span style="color: #339933;">/</span> <span style="color: #cc66cc;">3.0</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">*</span> <span style="color: #cc66cc;">100</span> <span style="color: #339933;">+</span> 
            <span style="color: #009900;">&#40;</span><span style="color: #cc66cc;">1.0</span> <span style="color: #339933;">/</span> <span style="color: #cc66cc;">3.0</span> <span style="color: #339933;">/</span> <span style="color: #cc66cc;">2.0</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">*</span> <span style="color: #cc66cc;">100</span><span style="color: #339933;">;</span>
        <span style="color: #003399;">System</span>.<span style="color: #006633;">out</span>.<span style="color: #006633;">println</span><span style="color: #009900;">&#40;</span><span style="color: #003399;">Math</span>.<span style="color: #006633;">round</span><span style="color: #009900;">&#40;</span>calc <span style="color: #339933;">*</span> <span style="color: #cc66cc;">1000</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">/</span> <span style="color: #cc66cc;">1000.0</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>
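<p>An alternative to rounding, sketched here as an illustration and not as the approach described above, is to keep each share as an exact fraction and convert to a percentage only for display. A minimal Python example using the standard library's <code>fractions</code> module (<code>exact_share</code> is a hypothetical helper name):</p>

```python
from fractions import Fraction

def exact_share(weight):
    """Exact share of the work unit for a weight such as [1, 3, 2]."""
    value = Fraction(weight[0])
    for divisor in weight[1:]:
        value /= divisor
    return value

weights = [[1, 3, 2], [1, 3], [1, 3], [1, 3, 2]]
total = sum(exact_share(w) for w in weights)

is_complete = (total == 1)    # exact comparison, no rounding step needed
percent = float(total * 100)  # convert only when a display value is required
```

<p>The trade-off is that fractions are slower than floating point values and their denominators grow with the depth of the tree, so this suits bookkeeping more than heavy numeric work.</p>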

]]></content:encoded>
			<wfw:commentRss>http://openinit.com/c/2010/08/08/worker-completion-calculation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>pylons dynamic xml/json web service</title>
		<link>http://openinit.com/c/2010/07/12/pylons-dynamic-xmljson-web-service/</link>
		<comments>http://openinit.com/c/2010/07/12/pylons-dynamic-xmljson-web-service/#comments</comments>
		<pubDate>Tue, 13 Jul 2010 01:49:57 +0000</pubDate>
		<dc:creator>rob</dc:creator>
				<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://openinit.com/c/?p=615</guid>
		<description><![CDATA[While creating some REST style web services, I started to notice a pattern. Most of the controller actions generated dictionaries that were manually manipulated by separate Mako templates. It finally made more sense to automatically generate JSON and XML from the structure of the dictionaries. Here are the methods used to generate JSON output: [...]]]></description>
			<content:encoded><![CDATA[<p>While creating some REST style web services, I started to notice a pattern. Most of the controller actions generated dictionaries that were manually manipulated by separate Mako templates. It finally made more sense to automatically generate JSON and XML from the structure of the dictionaries. Here are the methods used to generate JSON output:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> jsonarray<span style="color: black;">&#40;</span><span style="color: #dc143c;">array</span><span style="color: black;">&#41;</span>:
    <span style="color: #483d8b;">''</span><span style="color: #483d8b;">'create and encode a json array'</span><span style="color: #483d8b;">''</span>
    val = <span style="color: black;">&#40;</span>jsonvalue<span style="color: black;">&#40;</span>i<span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #dc143c;">array</span><span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #483d8b;">&quot;[&quot;</span> + <span style="color: #483d8b;">&quot;,&quot;</span>.<span style="color: black;">join</span><span style="color: black;">&#40;</span>val<span style="color: black;">&#41;</span> + <span style="color: #483d8b;">&quot;]&quot;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> jsondate<span style="color: black;">&#40;</span>date<span style="color: black;">&#41;</span>:
    <span style="color: #483d8b;">''</span><span style="color: #483d8b;">'string format date'</span><span style="color: #483d8b;">''</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> jsonencode<span style="color: black;">&#40;</span>date.<span style="color: black;">isoformat</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> jsondict<span style="color: black;">&#40;</span>d<span style="color: black;">&#41;</span>:
    <span style="color: #483d8b;">''</span><span style="color: #483d8b;">'create and encode a json object'</span><span style="color: #483d8b;">''</span>
    res = <span style="color: #ff7700;font-weight:bold;">lambda</span> k,v : <span style="color: #483d8b;">&quot;%s : %s&quot;</span> <span style="color: #66cc66;">%</span> \
        <span style="color: black;">&#40;</span>jsonencode<span style="color: black;">&#40;</span>k<span style="color: black;">&#41;</span>, jsonvalue<span style="color: black;">&#40;</span>v<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
    res = <span style="color: #483d8b;">&quot;,&quot;</span>.<span style="color: black;">join</span><span style="color: black;">&#40;</span><span style="color: black;">&#40;</span>res<span style="color: black;">&#40;</span>key, value<span style="color: black;">&#41;</span> \
        <span style="color: #ff7700;font-weight:bold;">for</span> key, value <span style="color: #ff7700;font-weight:bold;">in</span> d.<span style="color: black;">iteritems</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #483d8b;">&quot;{%s}&quot;</span> <span style="color: #66cc66;">%</span> res
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> jsonint<span style="color: black;">&#40;</span>i<span style="color: black;">&#41;</span>: <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #008000;">str</span><span style="color: black;">&#40;</span>i<span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> jsonsimple<span style="color: black;">&#40;</span>i<span style="color: black;">&#41;</span>: <span style="color: #ff7700;font-weight:bold;">return</span> jsonencode<span style="color: black;">&#40;</span><span style="color: #008000;">str</span><span style="color: black;">&#40;</span>i<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
&nbsp;
_jsonhandlers = <span style="color: black;">&#123;</span>
    <span style="color: #dc143c;">datetime</span> : jsondate,
    <span style="color: #008000;">dict</span> : jsondict,
    <span style="color: #008000;">int</span> : jsonint,
    <span style="color: #008000;">list</span> : jsonarray,
    <span style="color: #008000;">tuple</span> : jsonarray,
<span style="color: black;">&#125;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> jsonvalue<span style="color: black;">&#40;</span>value<span style="color: black;">&#41;</span>:
    <span style="color: #483d8b;">''</span><span style="color: #483d8b;">'handle json value conversion'</span><span style="color: #483d8b;">''</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> _jsonhandlers.<span style="color: black;">get</span><span style="color: black;">&#40;</span><span style="color: #008000;">type</span><span style="color: black;">&#40;</span>value<span style="color: black;">&#41;</span>, jsonsimple<span style="color: black;">&#41;</span><span style="color: black;">&#40;</span>value<span style="color: black;">&#41;</span></pre></td></tr></table></div>
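
<p>To show what the dispatch table produces, here is a self-contained Python 3 sketch of the same idea. It is an approximation and not the code above: the listing relies on a <code>jsonencode</code> helper that is not shown, so this sketch assumes the standard library's <code>json.dumps</code> as the string encoder, and the function names are illustrative:</p>

```python
import json
from datetime import datetime

def json_array(items):
    """Encode a list or tuple as a JSON array."""
    return "[" + ",".join(json_value(i) for i in items) + "]"

def json_dict(d):
    """Encode a dictionary as a JSON object; keys are stringified and quoted."""
    pairs = ("%s : %s" % (json.dumps(str(k)), json_value(v)) for k, v in d.items())
    return "{" + ",".join(pairs) + "}"

def json_simple(value):
    """Fallback for unregistered types: stringify, then quote and escape."""
    return json.dumps(str(value))

_JSON_HANDLERS = {
    datetime: lambda d: json.dumps(d.isoformat()),
    dict: json_dict,
    int: str,
    list: json_array,
    tuple: json_array,
}

def json_value(value):
    """Look up a handler by the exact type of the value."""
    return _JSON_HANDLERS.get(type(value), json_simple)(value)

encoded = json_dict({"id": 7, "tags": ["a", "b"], "when": datetime(2010, 7, 12)})
decoded = json.loads(encoded)  # round-trips only if the output is valid JSON
```

<p>Because the lookup uses the exact type, subclasses and unregistered types such as <code>float</code> or <code>bool</code> fall through to the quoted-string fallback.</p>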

<p>and the XML:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> xmlarray<span style="color: black;">&#40;</span><span style="color: #dc143c;">array</span>, key=<span style="color: #483d8b;">&quot;result&quot;</span><span style="color: black;">&#41;</span>:
    <span style="color: #483d8b;">''</span><span style="color: #483d8b;">'create and encode an xml array'</span><span style="color: #483d8b;">''</span>
    vals = <span style="color: #483d8b;">&quot;&lt;/%s&gt;&lt;%s&gt;&quot;</span> <span style="color: #66cc66;">%</span> <span style="color: black;">&#40;</span>key, key<span style="color: black;">&#41;</span>
    vals = vals.<span style="color: black;">join</span><span style="color: black;">&#40;</span><span style="color: black;">&#40;</span>xmlvalue<span style="color: black;">&#40;</span>i<span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #dc143c;">array</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #483d8b;">&quot;&lt;%s&gt;%s&lt;/%s&gt;&quot;</span> <span style="color: #66cc66;">%</span> <span style="color: black;">&#40;</span>key, vals, key<span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> xmldate<span style="color: black;">&#40;</span>date<span style="color: black;">&#41;</span>:
    <span style="color: #483d8b;">''</span><span style="color: #483d8b;">'string format date'</span><span style="color: #483d8b;">''</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> date.<span style="color: black;">isoformat</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> xmldict<span style="color: black;">&#40;</span>d<span style="color: black;">&#41;</span>:
    <span style="color: #483d8b;">''</span><span style="color: #483d8b;">'create and encode xml elements from a dict'</span><span style="color: #483d8b;">''</span>
    res = <span style="color: #ff7700;font-weight:bold;">lambda</span> k,v : <span style="color: #483d8b;">&quot;&lt;%s&gt;%s&lt;/%s&gt;&quot;</span> <span style="color: #66cc66;">%</span> <span style="color: black;">&#40;</span>k, xmlvalue<span style="color: black;">&#40;</span>v<span style="color: black;">&#41;</span>, k<span style="color: black;">&#41;</span>
    res = <span style="color: #483d8b;">&quot;&quot;</span>.<span style="color: black;">join</span><span style="color: black;">&#40;</span><span style="color: black;">&#40;</span>res<span style="color: black;">&#40;</span>key, value<span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">for</span> key, value <span style="color: #ff7700;font-weight:bold;">in</span> d.<span style="color: black;">iteritems</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> res
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> xmlsimple<span style="color: black;">&#40;</span>i<span style="color: black;">&#41;</span>: <span style="color: #ff7700;font-weight:bold;">return</span> i
&nbsp;
_xmlhandlers = <span style="color: black;">&#123;</span>
    <span style="color: #dc143c;">datetime</span> : xmldate,
    <span style="color: #008000;">dict</span> : xmldict,
    <span style="color: #008000;">list</span> : xmlarray,
    <span style="color: #008000;">tuple</span> : xmlarray,
<span style="color: black;">&#125;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> xmlvalue<span style="color: black;">&#40;</span>value<span style="color: black;">&#41;</span>:
    <span style="color: #483d8b;">''</span><span style="color: #483d8b;">'handle xml value conversion'</span><span style="color: #483d8b;">''</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> _xmlhandlers.<span style="color: black;">get</span><span style="color: black;">&#40;</span><span style="color: #008000;">type</span><span style="color: black;">&#40;</span>value<span style="color: black;">&#41;</span>, xmlsimple<span style="color: black;">&#41;</span><span style="color: black;">&#40;</span>value<span style="color: black;">&#41;</span></pre></td></tr></table></div>
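<p>One caveat with the helpers above: <code>xmlsimple</code> returns the value untouched, so a string containing markup characters would produce broken XML once the template is rendered with the <code>n</code> filter. A minimal sketch of an escaping variant follows. The name <code>xmlsimple_escaped</code> is hypothetical, and it assumes plain stringification is acceptable for your values:</p>

```python
from xml.sax.saxutils import escape


def xmlsimple_escaped(value):
    '''hypothetical variant of xmlsimple: stringify, then escape markup characters'''
    # escape() replaces the ampersand and both angle brackets with entities
    return escape(str(value))
```

<p>Swapping this in as the fallback passed to <code>_xmlhandlers.get</code> would be one way to wire it up. Values that are already XML fragments, such as the output of <code>xmldict</code>, must not be escaped a second time.</p>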

<p>Calling these helpers from the templates is straightforward. Once the output type has been determined (I&#8217;ve been using the &#8220;Accept&#8221; header), the JSON template looks like this:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="bash" style="font-family:monospace;"><span style="color: #800000;">${h.jsondict(c.result) | n}</span></pre></td></tr></table></div>

<p>and XML:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="bash" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">&lt;</span>result<span style="color: #000000; font-weight: bold;">&gt;</span><span style="color: #800000;">${h.xmldict(c.result) | n}</span><span style="color: #000000; font-weight: bold;">&lt;/</span>result<span style="color: #000000; font-weight: bold;">&gt;</span></pre></td></tr></table></div>
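<p>The &#8220;Accept&#8221; check itself is not shown in this post. A minimal sketch of how it might look is below. The function name <code>pick_format</code> and the list of media types are assumptions, and quality values are ignored, so this is simpler than full content negotiation:</p>

```python
def pick_format(accept_header, default="json"):
    """Return "json" or "xml" based on a raw Accept header value.

    Hypothetical helper: quality values (q=...) are ignored and the first
    recognised media type wins.
    """
    known = {
        "application/json": "json",
        "application/xml": "xml",
        "text/xml": "xml",
    }
    for part in (accept_header or "").split(","):
        # drop parameters such as ";q=0.8" and normalise case
        media_type = part.split(";")[0].strip().lower()
        if media_type in known:
            return known[media_type]
    return default
```

<p>The controller could then render the JSON or the XML template depending on the returned string.</p>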

]]></content:encoded>
			<wfw:commentRss>http://openinit.com/c/2010/07/12/pylons-dynamic-xmljson-web-service/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>simple cron-style batch scheduling</title>
		<link>http://openinit.com/c/2010/06/19/simple-batch-algorithm/</link>
		<comments>http://openinit.com/c/2010/06/19/simple-batch-algorithm/#comments</comments>
		<pubDate>Sat, 19 Jun 2010 14:22:29 +0000</pubDate>
		<dc:creator>rob</dc:creator>
				<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://openinit.com/c/?p=592</guid>
		<description><![CDATA[Some projects require background processes to run on a schedule, independent of the requesting client. For example, statistical data might need to be extracted nightly from a large dataset. There are multiple job scheduling packages well suited for the task, but they might be overkill for simple applications. Also, they are often locked into a [...]]]></description>
			<content:encoded><![CDATA[<p>Some projects require background processes to run on a schedule, independent of the requesting client. For example, statistical data might need to be extracted nightly from a large dataset. There are <a href="http://en.wikipedia.org/wiki/Job_scheduler">multiple job scheduling packages</a> well suited for the task, but they might be overkill for simple applications. Also, they are often locked into a particular programming language or platform.</p>
<p>As an avid unix user, I have found the cron daemon to be an irreplaceable tool. Its parameters are intuitive: options to set the minute, hour, day of month, month, and day of week allow nearly any pattern of scheduling. It is possible to build a similar, simple scheduling system for an application with two functions: one adds a schedule and the other retrieves the current schedules. These examples rely on Python and MongoDB, but can be easily adjusted for a different language or a relational database.</p>
<p>Here is the scheduler with constant definitions for the field keys. I use the required &#8220;params&#8221; key as a dictionary holding the task configuration. This code of course doesn&#8217;t handle all cron options, such as ranges and step values, but should suit many tasks. If a value isn&#8217;t specified by a named parameter, it becomes a wildcard match, similar to cron:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="python" style="font-family:monospace;">_datefields = <span style="color: black;">&#91;</span><span style="color: #483d8b;">&quot;minute&quot;</span>, <span style="color: #483d8b;">&quot;hour&quot;</span>, <span style="color: #483d8b;">&quot;day&quot;</span>, <span style="color: #483d8b;">&quot;month&quot;</span>, <span style="color: #483d8b;">&quot;dayofweek&quot;</span><span style="color: black;">&#93;</span>
<span style="color: #ff7700;font-weight:bold;">def</span> schedule<span style="color: black;">&#40;</span><span style="color: #dc143c;">user</span>, params, minute=<span style="color: #483d8b;">&quot;*&quot;</span>, hour=<span style="color: #483d8b;">&quot;*&quot;</span>, day=<span style="color: #483d8b;">&quot;*&quot;</span>, month=<span style="color: #483d8b;">&quot;*&quot;</span>, dow=<span style="color: #483d8b;">&quot;*&quot;</span><span style="color: black;">&#41;</span>:
    <span style="color: #483d8b;">''</span><span style="color: #483d8b;">'add a schedule item to the database'</span><span style="color: #483d8b;">''</span>
    schedule = <span style="color: black;">&#123;</span>Extract.<span style="color: black;">USER</span> : <span style="color: #dc143c;">user</span>, S.<span style="color: black;">PARAMS</span> : params<span style="color: black;">&#125;</span>
    vals = <span style="color: black;">&#91;</span>minute, hour, day, month, dow<span style="color: black;">&#93;</span>
    <span style="color: #ff7700;font-weight:bold;">for</span> cnt, key <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">enumerate</span><span style="color: black;">&#40;</span>_datefields<span style="color: black;">&#41;</span>: schedule<span style="color: black;">&#91;</span>key<span style="color: black;">&#93;</span> = vals<span style="color: black;">&#91;</span>cnt<span style="color: black;">&#93;</span>
    mongodb.<span style="color: black;">schedules</span>.<span style="color: black;">save</span><span style="color: black;">&#40;</span>schedule<span style="color: black;">&#41;</span></pre></td></tr></table></div>
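<p>To make the wildcard defaults concrete, here is a small sketch that builds the same kind of document without touching the database. The helper <code>build_schedule</code> and the literal <code>"user"</code> key are assumptions, since the constant definitions are not shown here. The <code>"params"</code> key matches what the client code reads later:</p>

```python
_datefields = ["minute", "hour", "day", "month", "dayofweek"]


def build_schedule(user, params, **when):
    """Return a schedule document; any date field left out becomes "*"."""
    doc = {"user": user, "params": params}  # "user" key name is an assumption
    for key in _datefields:
        doc[key] = when.get(key, "*")
    return doc


# run a hypothetical extract task every day at 02:30
nightly = build_schedule("rob", {"task": "extract_stats"}, minute=30, hour=2)
```

<p>Only the minute and hour are pinned, so the day, month, and day-of-week fields stay as wildcards and the entry matches once per day.</p>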

<p>Here is the schedule retrieval code, which returns the entries that are due now. For each date field, it looks for a match on either the &#8216;*&#8217; wildcard or the corresponding value from the current timestamp:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> current_schedules<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
    <span style="color: #483d8b;">''</span><span style="color: #483d8b;">'find schedule items that need to be run'</span><span style="color: #483d8b;">''</span>
    dt = <span style="color: #dc143c;">datetime</span>.<span style="color: #dc143c;">datetime</span>.<span style="color: black;">today</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
    look = <span style="color: black;">&#123;</span><span style="color: black;">&#125;</span>
    vals = <span style="color: black;">&#91;</span>dt.<span style="color: black;">minute</span>, dt.<span style="color: black;">hour</span>, dt.<span style="color: black;">day</span>, dt.<span style="color: black;">month</span>, dt.<span style="color: black;">weekday</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#93;</span>
    <span style="color: #ff7700;font-weight:bold;">for</span> c, k <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">enumerate</span><span style="color: black;">&#40;</span>_datefields<span style="color: black;">&#41;</span>: look<span style="color: black;">&#91;</span>k<span style="color: black;">&#93;</span> = <span style="color: black;">&#123;</span> <span style="color: #483d8b;">&quot;$in&quot;</span> : <span style="color: black;">&#91;</span><span style="color: #483d8b;">&quot;*&quot;</span>, vals<span style="color: black;">&#91;</span>c<span style="color: black;">&#93;</span><span style="color: black;">&#93;</span><span style="color: black;">&#125;</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> mongodb.<span style="color: black;">schedules</span>.<span style="color: black;">find</span><span style="color: black;">&#40;</span>look<span style="color: black;">&#41;</span></pre></td></tr></table></div>
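<p>The <code>$in</code> query above can be mirrored in plain Python, which is handy for unit tests or for porting to another datastore. This is only a sketch: <code>matches</code>, <code>current_values</code>, and <code>cron_dow</code> are hypothetical names, not part of the original code:</p>

```python
import datetime

_datefields = ["minute", "hour", "day", "month", "dayofweek"]


def current_values(dt):
    """Values compared against a schedule; weekday() counts Monday as 0."""
    return [dt.minute, dt.hour, dt.day, dt.month, dt.weekday()]


def matches(schedule, dt):
    """True when every date field is the "*" wildcard or equals the current value."""
    return all(
        schedule.get(key, "*") in ("*", value)
        for key, value in zip(_datefields, current_values(dt))
    )


def cron_dow(dt):
    """Convert to cron numbering, where Sunday is 0 and Saturday is 6."""
    return (dt.weekday() + 1) % 7
```

<p>Note that <code>weekday()</code> numbers the days from Monday as 0, while cron numbers them from Sunday as 0. Day-of-week values copied from a crontab would need a conversion such as <code>cron_dow</code> before they are compared.</p>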

<p>Here is some simple client code:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> run<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">try</span>:
        <span style="color: #ff7700;font-weight:bold;">while</span> <span style="color: #ff4500;">1</span>:
            <span style="color: #ff7700;font-weight:bold;">for</span> schedule <span style="color: #ff7700;font-weight:bold;">in</span> current_schedules<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
                _log.<span style="color: black;">info</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;batch item at [%s]&quot;</span> <span style="color: #66cc66;">%</span> <span style="color: #dc143c;">datetime</span>.<span style="color: #dc143c;">datetime</span>.<span style="color: black;">today</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
                some_execution_method<span style="color: black;">&#40;</span>schedule<span style="color: black;">&#91;</span><span style="color: #483d8b;">&quot;params&quot;</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>
            <span style="color: #808080; font-style: italic;"># try to wake up one second after the start of the next minute</span>
            <span style="color: #dc143c;">time</span>.<span style="color: black;">sleep</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">60</span> - <span style="color: #dc143c;">datetime</span>.<span style="color: #dc143c;">datetime</span>.<span style="color: black;">today</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>.<span style="color: black;">second</span> + <span style="color: #ff4500;">1</span><span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">except</span> <span style="color: #008000;">KeyboardInterrupt</span>: <span style="color: #ff7700;font-weight:bold;">print</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;killed...&quot;</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">if</span> __name__ == <span style="color: #483d8b;">&quot;__main__&quot;</span>: run<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></pre></td></tr></table></div>

<p>Of course, this code isn&#8217;t thread-safe or &#8220;multiple-client-safe&#8221;: every client polling the collection would pick up and run the same schedules, so running multiple clients is probably not a good idea.</p>
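<p>The sleep arithmetic in the loop is easy to get wrong, so here it is pulled out into a function that can be tested in isolation. This is a sketch, and the name <code>seconds_until_next_minute</code> is hypothetical:</p>

```python
import datetime


def seconds_until_next_minute(now, offset=1):
    """Seconds to sleep so the loop wakes `offset` seconds into the next minute."""
    return 60 - now.second + offset


# at 14:22:29 the loop should sleep 32 seconds and wake at 14:23:01
now = datetime.datetime(2010, 6, 19, 14, 22, 29)
wake = now + datetime.timedelta(seconds=seconds_until_next_minute(now))
```

<p>Because the tasks run inline before the sleep, a batch that takes longer than a minute means the schedules for the skipped minutes are never looked up. Handing the work off to a separate process would be one way around that.</p>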
]]></content:encoded>
			<wfw:commentRss>http://openinit.com/c/2010/06/19/simple-batch-algorithm/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>distributed processing with python and rabbitmq</title>
		<link>http://openinit.com/c/2010/06/06/distributed-processing-with-python-and-rabbitmq/</link>
		<comments>http://openinit.com/c/2010/06/06/distributed-processing-with-python-and-rabbitmq/#comments</comments>
		<pubDate>Mon, 07 Jun 2010 02:27:28 +0000</pubDate>
		<dc:creator>rob</dc:creator>
				<category><![CDATA[python]]></category>
		<category><![CDATA[web]]></category>

		<guid isPermaLink="false">http://openinit.com/c/?p=574</guid>
		<description><![CDATA[Most software developer types are aware of &#8220;MapReduce&#8221;, made famous by Google. Basically, a large set of data is split into pieces and distributed over a network to multiple &#8220;workers&#8221;, who process in parallel. The processed data is then returned to a &#8220;reducer&#8221; who aggregates the results into a final dataset. There are several implementations [...]]]></description>
			<content:encoded><![CDATA[<p>Most software developer types are aware of &#8220;MapReduce&#8221;, made famous by Google. Basically, a large set of data is split into pieces and distributed over a network to multiple &#8220;workers&#8221;, who process in parallel. The processed data is then returned to a &#8220;reducer&#8221; who aggregates the results into a final dataset. </p>
<p>There are <a href="http://en.wikipedia.org/wiki/MapReduce#Implementations">several implementations</a> of this strategy, of which Hadoop seems to be the most respected in the open source world. Yahoo has fully <a href="http://developer.yahoo.com/hadoop">embraced</a> the project.</p>
<p>However, Hadoop does have some caveats. Configuring a cluster requires editing a multitude of XML files. The Hadoop filesystem has to be installed. The master node is not fault tolerant. Typically the &#8220;slave&#8221; node locations are stored in a file, though new ones can be added dynamically. Hadoop also expects jobs to be written in languages that run on the JVM, unless you go through Hadoop Streaming.</p>
<p>When evaluating Hadoop, we decided it was a little too complex for our first proof-of-concept for a client. Plus, we wanted to use tools like <a href="http://codespeak.net/pypy/dist/pypy/doc/">pypy</a> with our Python application to boost performance. But what was the alternative? <a href="http://discoproject.org">Disco</a>? It seemed reasonable. Unfortunately, messages had to be pushed via SSH to &#8220;workers&#8221;, making them difficult to organize within firewalls, and additional workers had to be added manually through the web interface.</p>
<p>Sometimes square pegs just won&#8217;t fit into round holes. So, what technology was round enough for our needs? We knew distributing tasks was naturally solved by queues: tasks go in one end and come out the other until the queue is empty. In the networked world, message queues can spread job information across machines and geography. AMQP allows advanced routing of these messages through <a href="http://rajith.2rlabs.com/2007/10/13/amqp-in-10-mins-part4-standard-exchange-types-and-supporting-common-messaging-use-cases/">exchanges</a> and is supported by client libraries in many languages. Message queuing systems are also designed to be highly fault tolerant and distributed.</p>
<p>And it worked pretty well. We used <a href="http://barryp.org/software/py-amqplib/">py-amqplib</a> as a client attached to <a href="http://www.rabbitmq.com/">rabbitmq</a> with some fairly basic code:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">from</span> amqplib <span style="color: #ff7700;font-weight:bold;">import</span> client_0_8 <span style="color: #ff7700;font-weight:bold;">as</span> amqp
<span style="color: #ff7700;font-weight:bold;">from</span> dobie.<span style="color: black;">extract</span> <span style="color: #ff7700;font-weight:bold;">import</span> configval
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">logging</span>
<span style="color: #ff7700;font-weight:bold;">import</span> uuid
&nbsp;
log = <span style="color: #dc143c;">logging</span>.<span style="color: black;">getLogger</span><span style="color: black;">&#40;</span>__name__<span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">class</span> Connection<span style="color: black;">&#40;</span><span style="color: #008000;">object</span><span style="color: black;">&#41;</span>:
    <span style="color: #483d8b;">''</span><span style="color: #483d8b;">'basic connection to mq server'</span><span style="color: #483d8b;">''</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #008000;">self</span>.<span style="color: black;">host</span> = configval<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;host&quot;</span><span style="color: black;">&#41;</span>
        <span style="color: #008000;">self</span>.<span style="color: black;">user_id</span> = configval<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;user_id&quot;</span><span style="color: black;">&#41;</span>
        <span style="color: #008000;">self</span>.<span style="color: black;">password</span> = configval<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;password&quot;</span><span style="color: black;">&#41;</span>
        <span style="color: #008000;">self</span>.<span style="color: black;">port</span> = configval<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;port&quot;</span><span style="color: black;">&#41;</span>
        <span style="color: #008000;">self</span>.<span style="color: black;">connect</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> connect<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #008000;">self</span>.<span style="color: black;">connection</span> = amqp.<span style="color: black;">Connection</span><span style="color: black;">&#40;</span>
            host=<span style="color: #483d8b;">'%s:%s'</span> <span style="color: #66cc66;">%</span> <span style="color: black;">&#40;</span><span style="color: #008000;">self</span>.<span style="color: black;">host</span>, <span style="color: #008000;">self</span>.<span style="color: black;">port</span><span style="color: black;">&#41;</span>,
            userid=<span style="color: #008000;">self</span>.<span style="color: black;">user_id</span>, password=<span style="color: #008000;">self</span>.<span style="color: black;">password</span>
        <span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">class</span> Base<span style="color: black;">&#40;</span><span style="color: #008000;">object</span><span style="color: black;">&#41;</span>:
    <span style="color: #483d8b;">''</span><span style="color: #483d8b;">'shared code for publisher and consumer'</span><span style="color: #483d8b;">''</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, queue, connection=<span style="color: #008000;">None</span><span style="color: black;">&#41;</span>:
        <span style="color: #008000;">self</span>.<span style="color: black;">connection</span> = connection <span style="color: #ff7700;font-weight:bold;">or</span> Connection<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
        <span style="color: #008000;">self</span>.<span style="color: black;">channel</span> = <span style="color: #008000;">self</span>.<span style="color: black;">connection</span>.<span style="color: black;">connection</span>.<span style="color: black;">channel</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
        <span style="color: #008000;">self</span>.<span style="color: black;">ex</span> = configval<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;exchange&quot;</span><span style="color: black;">&#41;</span>
        <span style="color: #008000;">self</span>.<span style="color: black;">queue</span> = queue
        params = <span style="color: black;">&#40;</span><span style="color: #008000;">self</span>.<span style="color: black;">queue</span>, <span style="color: #008000;">self</span>.<span style="color: black;">ex</span>, <span style="color: #008000;">self</span>.<span style="color: black;">queue</span><span style="color: black;">&#41;</span>
        log.<span style="color: black;">debug</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;queue [%s], exchange [%s], routing [%s]&quot;</span> <span style="color: #66cc66;">%</span> params<span style="color: black;">&#41;</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> setup<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #483d8b;">''</span><span style="color: #483d8b;">'configure queue and exchange'</span><span style="color: #483d8b;">''</span>
        <span style="color: #008000;">self</span>.<span style="color: black;">channel</span>.<span style="color: black;">queue_declare</span><span style="color: black;">&#40;</span>
            queue=<span style="color: #008000;">self</span>.<span style="color: black;">queue</span>, durable=<span style="color: #008000;">True</span>, exclusive=<span style="color: #008000;">False</span>, auto_delete=<span style="color: #008000;">False</span>
        <span style="color: black;">&#41;</span>
        <span style="color: #008000;">self</span>.<span style="color: black;">channel</span>.<span style="color: black;">exchange_declare</span><span style="color: black;">&#40;</span>
            exchange=<span style="color: #008000;">self</span>.<span style="color: black;">ex</span>, <span style="color: #008000;">type</span>=<span style="color: #483d8b;">'direct'</span>, durable=<span style="color: #008000;">True</span>, auto_delete=<span style="color: #008000;">False</span>
        <span style="color: black;">&#41;</span>
        <span style="color: #008000;">self</span>.<span style="color: black;">channel</span>.<span style="color: black;">queue_bind</span><span style="color: black;">&#40;</span>
            queue=<span style="color: #008000;">self</span>.<span style="color: black;">queue</span>, exchange=<span style="color: #008000;">self</span>.<span style="color: black;">ex</span>, routing_key=<span style="color: #008000;">self</span>.<span style="color: black;">queue</span>
        <span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #008000;">self</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> close<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #483d8b;">''</span><span style="color: #483d8b;">'close connections'</span><span style="color: #483d8b;">''</span>
        <span style="color: #808080; font-style: italic;"># TODO consider pooling</span>
        <span style="color: #008000;">self</span>.<span style="color: black;">channel</span>.<span style="color: black;">close</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
        <span style="color: #008000;">self</span>.<span style="color: black;">connection</span>.<span style="color: black;">connection</span>.<span style="color: black;">close</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> __enter__<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #008000;">self</span>.<span style="color: black;">setup</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> __exit__<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, _type, value, <span style="color: #dc143c;">traceback</span><span style="color: black;">&#41;</span>:
        <span style="color: #008000;">self</span>.<span style="color: black;">close</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">class</span> Consumer<span style="color: black;">&#40;</span>Base<span style="color: black;">&#41;</span>:
    <span style="color: #483d8b;">''</span><span style="color: #483d8b;">'consumer of messages'</span><span style="color: #483d8b;">''</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, queue, callback, connection=<span style="color: #008000;">None</span><span style="color: black;">&#41;</span>:
        Base.<span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, queue, connection<span style="color: black;">&#41;</span>
        <span style="color: #008000;">self</span>.<span style="color: black;">callback</span> = callback
        <span style="color: #008000;">self</span>.<span style="color: black;">cid</span> = <span style="color: #008000;">str</span><span style="color: black;">&#40;</span>uuid.<span style="color: black;">uuid4</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> __enter__<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">def</span> _callback<span style="color: black;">&#40;</span>message<span style="color: black;">&#41;</span>:
            <span style="color: #008000;">self</span>.<span style="color: black;">callback</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, message<span style="color: black;">&#41;</span>
        Base.__enter__<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>
        <span style="color: #008000;">self</span>.<span style="color: black;">channel</span>.<span style="color: black;">basic_consume</span><span style="color: black;">&#40;</span>
            queue=<span style="color: #008000;">self</span>.<span style="color: black;">queue</span>, callback=_callback, consumer_tag=<span style="color: #008000;">self</span>.<span style="color: black;">cid</span>
        <span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #008000;">self</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> wait<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, count=-<span style="color: #ff4500;">1</span><span style="color: black;">&#41;</span>:
        <span style="color: #483d8b;">''</span><span style="color: #483d8b;">'wait for messages. if count is set, only consume a limited number
           before exiting'</span><span style="color: #483d8b;">''</span>
        <span style="color: #ff7700;font-weight:bold;">while</span> count <span style="color: #66cc66;">!</span>= <span style="color: #ff4500;">0</span>:
            <span style="color: #008000;">self</span>.<span style="color: black;">channel</span>.<span style="color: black;">wait</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
            <span style="color: #ff7700;font-weight:bold;">if</span> count <span style="color: #66cc66;">&gt;</span> <span style="color: #ff4500;">0</span>: count = count - <span style="color: #ff4500;">1</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">class</span> Publisher<span style="color: black;">&#40;</span>Base<span style="color: black;">&#41;</span>:
    <span style="color: #483d8b;">''</span><span style="color: #483d8b;">'publisher of messages'</span><span style="color: #483d8b;">''</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, queue, connection=<span style="color: #008000;">None</span><span style="color: black;">&#41;</span>:
        Base.<span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, queue, connection<span style="color: black;">&#41;</span>
        <span style="color: #008000;">self</span>.<span style="color: black;">delivery_mode</span> = <span style="color: #008000;">int</span><span style="color: black;">&#40;</span>configval<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;delivery_mode&quot;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> publish<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, message<span style="color: black;">&#41;</span>:
        message = amqp.<span style="color: black;">Message</span><span style="color: black;">&#40;</span>message<span style="color: black;">&#41;</span>
        message.<span style="color: black;">properties</span><span style="color: black;">&#91;</span><span style="color: #483d8b;">'delivery_mode'</span><span style="color: black;">&#93;</span> = <span style="color: #008000;">self</span>.<span style="color: black;">delivery_mode</span>
        <span style="color: #008000;">self</span>.<span style="color: black;">channel</span>.<span style="color: black;">basic_publish</span><span style="color: black;">&#40;</span>
            message, exchange=<span style="color: #008000;">self</span>.<span style="color: black;">ex</span>, routing_key=<span style="color: #008000;">self</span>.<span style="color: black;">queue</span>
        <span style="color: black;">&#41;</span></pre></td></tr></table></div>

<p>There are some potential problems with this code. Connections aren&#8217;t pooled and it only uses a &#8220;direct&#8221; exchange. However, it suited our needs. </p>
<p>We relied on dictionaries to transfer data. Since we planned to stick with Python, we pickled them before sending them over the wire, and rabbitmq was happy to accept the resulting byte strings.</p>
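<p>The pickling step is small enough to sketch here. The helper names <code>encode_task</code> and <code>decode_task</code> are hypothetical, and the task contents are made up for illustration:</p>

```python
import pickle


def encode_task(task):
    """Serialise a task dictionary into a byte string for the message body."""
    return pickle.dumps(task, pickle.HIGHEST_PROTOCOL)


def decode_task(body):
    """Rebuild the dictionary on the worker; only do this for trusted senders."""
    return pickle.loads(body)


# illustrative task; the keys are not taken from the real application
body = encode_task({"job": "extract", "ids": [1, 2, 3]})
task = decode_task(body)
```

<p>The publisher would pass <code>body</code> to <code>publish</code>, and the consumer callback would call <code>decode_task</code> on the message body. Unpickling can execute arbitrary code, so this only makes sense when every publisher on the queue is trusted.</p>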
<p>There is one problem which we have yet to resolve: if a client is connected and processing messages, newly connected clients will only receive messages that are added later. Will post a fix if we figure it out.</p>
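<p>A possible cause, not verified against the setup described here: with no prefetch limit, the broker may push every queued message to the first consumer, leaving nothing for consumers that connect afterwards. The sketch below limits each consumer to one unacknowledged message at a time. It assumes <code>channel</code> behaves like an amqplib channel with <code>basic_qos</code>, <code>basic_consume</code>, and <code>basic_ack</code>; the names <code>consume_one_at_a_time</code> and <code>handler</code> are hypothetical:</p>

```python
def consume_one_at_a_time(channel, queue, handler):
    """Register handler so the broker sends this consumer one message at a time.

    channel is expected to behave like an amqplib channel; handler is a
    hypothetical function that processes a single message.
    """
    # ask the broker to hold back further messages until the previous one is acked
    channel.basic_qos(prefetch_size=0, prefetch_count=1, a_global=False)

    def _callback(message):
        handler(message)
        # acknowledge only after the work is done, which releases the next message
        channel.basic_ack(message.delivery_tag)

    channel.basic_consume(queue=queue, callback=_callback, no_ack=False)
    return _callback
```

<p>With this in place, messages that have not yet been delivered stay on the queue, so a newly connected worker should be able to pick them up. Treat it as something to try rather than a confirmed fix.</p>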
]]></content:encoded>
			<wfw:commentRss>http://openinit.com/c/2010/06/06/distributed-processing-with-python-and-rabbitmq/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>the disconnect between complex web apps and HTTP</title>
		<link>http://openinit.com/c/2010/05/14/http-disconnect/</link>
		<comments>http://openinit.com/c/2010/05/14/http-disconnect/#comments</comments>
		<pubDate>Fri, 14 May 2010 21:35:42 +0000</pubDate>
		<dc:creator>rob</dc:creator>
				<category><![CDATA[web]]></category>

		<guid isPermaLink="false">http://openinit.com/c/?p=543</guid>
		<description><![CDATA[fundamental disconnect Any web application that approaches moderate complexity almost always stumbles across a data transmission problem. Modeling relationships that need to be stored back at the server, such as parent-child or groups, with key-value pairs is not a natural fit. Anyone who worked with struts in the early part of the decade is very [...]]]></description>
			<content:encoded><![CDATA[<h2 class="header">fundamental disconnect</h2>
<p>Any web application that approaches moderate complexity almost always stumbles across a data transmission problem. Modeling relationships that need to be stored back at the server, such as parent-child or groups, with key-value pairs is not a natural fit. Anyone who worked with <a href="http://struts.apache.org/">Struts</a> in the early part of the decade is very aware of this problem.</p>
<p>On the other hand, the DOM and more generally XML feature a natural method of designating these associations by embedding tags. For instance, in HTML one could code:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="html4strict" style="font-family:monospace;"><span style="color: #009900;">&lt;<span style="color: #000000; font-weight: bold;">div</span> <span style="color: #000066;">id</span><span style="color: #66cc66;">=</span><span style="color: #ff0000;">&quot;parent&quot;</span>&gt;</span>
  <span style="color: #009900;">&lt;<span style="color: #000000; font-weight: bold;">span</span> <span style="color: #000066;">class</span><span style="color: #66cc66;">=</span><span style="color: #ff0000;">&quot;name&quot;</span>&gt;</span>jerry<span style="color: #009900;">&lt;<span style="color: #66cc66;">/</span><span style="color: #000000; font-weight: bold;">span</span>&gt;</span>
  <span style="color: #009900;">&lt;<span style="color: #000000; font-weight: bold;">div</span> <span style="color: #000066;">class</span><span style="color: #66cc66;">=</span><span style="color: #ff0000;">&quot;child&quot;</span>&gt;</span>
    <span style="color: #009900;">&lt;<span style="color: #000000; font-weight: bold;">input</span> <span style="color: #000066;">class</span><span style="color: #66cc66;">=</span><span style="color: #ff0000;">&quot;name&quot;</span> <span style="color: #000066;">value</span><span style="color: #66cc66;">=</span><span style="color: #ff0000;">&quot;joey&quot;</span>&gt;&lt;<span style="color: #66cc66;">/</span><span style="color: #000000; font-weight: bold;">input</span>&gt;</span>
  <span style="color: #009900;">&lt;<span style="color: #66cc66;">/</span><span style="color: #000000; font-weight: bold;">div</span>&gt;</span>
<span style="color: #009900;">&lt;<span style="color: #66cc66;">/</span><span style="color: #000000; font-weight: bold;">div</span>&gt;</span></pre></td></tr></table></div>

<p>which indicates &#8220;joey&#8221; is the child of &#8220;jerry&#8221;. Now imagine trying to transmit this relationship with HTTP parameters. This might work:</p>
<p><code>parent=jerry&#038;jerrychild=joey</code></p>
<p>but deciphering such a statement on the receiving end would be complex and fragile. A document database like CouchDB might fit the task, but then you would have to write JavaScript code to generate JSON from the child elements. In other words, you would code everything twice: once for the page generation and once for the JavaScript that &#8220;serializes&#8221; the objects.</p>
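<p>To make the fragility concrete, here is a short Python sketch of what the receiving end would have to do. The &#8220;parent name plus child&#8221; key convention is only the ad hoc scheme from the example above, not any standard.</p>

```python
from urllib.parse import parse_qs, urlencode

# Build the flat query string from the example above.
query = urlencode([("parent", "jerry"), ("jerrychild", "joey")])

params = parse_qs(query)
parent = params["parent"][0]

# The relationship lives only in the key name, so it has to be
# reassembled by string concatenation.
children = params.get(parent + "child", [])
family = {parent: children}
```

<p>A second level of nesting, or two parents with the same name, already breaks this scheme.</p>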
<h2 class="header">real world experience</h2>
<p>At a recent client, we faced this problem in a dramatic fashion. The group consisted mostly of front-end experts, who took XML data from a custom CMS and transformed it with XSL to produce their final content. They couldn&#8217;t write the server code to store the submitted data back into the database. And they wanted to create very complex relationships in their data models.</p>
<h2 class="header">modeling data in the DOM</h2>
<p>To mitigate this skill-set disconnect, we created a JavaScript library on top of jQuery that generates an XML document from marked-up elements in the DOM. The library walks a specified element and its children, looking for specific classes. If an element has the class &#8220;complex&#8221;, a tag named after the value of its &#8220;id&#8221; or &#8220;name&#8221; attribute is generated before its children are processed. If an element has the class &#8220;primitive&#8221;, a simple element is generated from its &#8220;id&#8221; or &#8220;name&#8221; attribute and its value. An attribute on the enclosing tag is created by giving an element an &#8220;id&#8221; or &#8220;name&#8221; attribute and the class &#8220;att&#8221;.</p>
<p>Following our earlier example:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="html4strict" style="font-family:monospace;"><span style="color: #009900;">&lt;<span style="color: #000000; font-weight: bold;">div</span> <span style="color: #000066;">id</span><span style="color: #66cc66;">=</span><span style="color: #ff0000;">&quot;parent&quot;</span> <span style="color: #000066;">class</span><span style="color: #66cc66;">=</span><span style="color: #ff0000;">&quot;complex&quot;</span>&gt;</span>
  <span style="color: #009900;">&lt;<span style="color: #000000; font-weight: bold;">span</span> <span style="color: #000066;">name</span><span style="color: #66cc66;">=</span><span style="color: #ff0000;">&quot;name&quot;</span> <span style="color: #000066;">class</span><span style="color: #66cc66;">=</span><span style="color: #ff0000;">&quot;primitive&quot;</span>&gt;</span>jerry<span style="color: #009900;">&lt;<span style="color: #66cc66;">/</span><span style="color: #000000; font-weight: bold;">span</span>&gt;</span>
  <span style="color: #009900;">&lt;<span style="color: #000000; font-weight: bold;">div</span> <span style="color: #000066;">name</span><span style="color: #66cc66;">=</span><span style="color: #ff0000;">&quot;child&quot;</span> <span style="color: #000066;">class</span><span style="color: #66cc66;">=</span><span style="color: #ff0000;">&quot;complex&quot;</span>&gt;</span>
      <span style="color: #009900;">&lt;<span style="color: #000000; font-weight: bold;">span</span> <span style="color: #000066;">name</span><span style="color: #66cc66;">=</span><span style="color: #ff0000;">&quot;birth-order&quot;</span> <span style="color: #000066;">class</span><span style="color: #66cc66;">=</span><span style="color: #ff0000;">&quot;att&quot;</span>&gt;</span>1<span style="color: #009900;">&lt;<span style="color: #66cc66;">/</span><span style="color: #000000; font-weight: bold;">span</span>&gt;</span>
      <span style="color: #009900;">&lt;<span style="color: #000000; font-weight: bold;">input</span> <span style="color: #000066;">name</span><span style="color: #66cc66;">=</span><span style="color: #ff0000;">&quot;name&quot;</span> <span style="color: #000066;">class</span><span style="color: #66cc66;">=</span><span style="color: #ff0000;">&quot;primitive&quot;</span> <span style="color: #000066;">value</span><span style="color: #66cc66;">=</span><span style="color: #ff0000;">&quot;joey&quot;</span><span style="color: #66cc66;">/</span>&gt;</span>
  <span style="color: #009900;">&lt;<span style="color: #66cc66;">/</span><span style="color: #000000; font-weight: bold;">div</span>&gt;</span>
<span style="color: #009900;">&lt;<span style="color: #66cc66;">/</span><span style="color: #000000; font-weight: bold;">div</span>&gt;</span></pre></td></tr></table></div>

<p>would produce:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="xml" style="font-family:monospace;"><span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;parent<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>jerry<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;child</span> <span style="color: #000066;">birth-order</span>=<span style="color: #ff0000;">&quot;1&quot;</span><span style="color: #000000; font-weight: bold;">&gt;</span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>joey<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/child<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/parent<span style="color: #000000; font-weight: bold;">&gt;</span></span></span></pre></td></tr></table></div>
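<p>The real library is written against jQuery, but the traversal rules can be sketched in a few lines of Python with <code>xml.etree.ElementTree</code>. This is only an illustration of the rules described above: the function names are mine, the input tree is built by hand to stand in for the DOM, and preferring &#8220;id&#8221; over &#8220;name&#8221; when both are present is an assumption.</p>

```python
import xml.etree.ElementTree as ET

def tag_name(node):
    # Output names come from "id" or "name"; this sketch prefers "id".
    return node.get("id") or node.get("name")

def node_value(node):
    # Form inputs carry their value in an attribute, other elements in text.
    return node.get("value") or (node.text or "").strip()

def convert(node, out_parent=None):
    """Convert a "complex" element and its descendants into an XML element."""
    classes = (node.get("class") or "").split()
    if "complex" in classes:
        if out_parent is None:
            out = ET.Element(tag_name(node))
        else:
            out = ET.SubElement(out_parent, tag_name(node))
        for child in node:
            convert(child, out)
        return out
    if out_parent is None:
        return None  # this sketch expects the starting element to be "complex"
    if "primitive" in classes:
        ET.SubElement(out_parent, tag_name(node)).text = node_value(node)
    elif "att" in classes:
        out_parent.set(tag_name(node), node_value(node))
    else:
        # Unmarked wrapper elements are skipped, but their children are visited.
        for child in node:
            convert(child, out_parent)
    return None

# Hand-built stand-in for the DOM fragment in the example above.
root = ET.Element("div", {"id": "parent", "class": "complex"})
ET.SubElement(root, "span", {"name": "name", "class": "primitive"}).text = "jerry"
child = ET.SubElement(root, "div", {"name": "child", "class": "complex"})
ET.SubElement(child, "span", {"name": "birth-order", "class": "att"}).text = "1"
ET.SubElement(child, "input", {"name": "name", "class": "primitive", "value": "joey"})

result = ET.tostring(convert(root), encoding="unicode")
```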

<p>This data can be submitted (via Ajax) to the backend for processing, updating, validation, whatever. You can download the library here:</p>
<p><a href="http://github.com/greyrl/relationalxml/blob/master/js/jquery.dom.js">http://github.com/greyrl/relationalxml/blob/master/js/jquery.dom.js</a></p>
<p>You will also need the custom jQuery &#8220;<a href="http://github.com/greyrl/relationalxml/blob/master/js/jquery.class.js">class</a>&#8221; and &#8220;<a href="http://github.com/greyrl/relationalxml/blob/master/js/jquery.util.js">util</a>&#8221; packages. After importing the libraries, you can generate XML Documents or strings:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="javascript" style="font-family:monospace;"><span style="color: #006600; font-style: italic;">// documents</span>
<span style="color: #003366; font-weight: bold;">var</span> docs <span style="color: #339933;">=</span> $<span style="color: #009900;">&#40;</span><span style="color: #3366CC;">&quot;#parent&quot;</span><span style="color: #009900;">&#41;</span>.<span style="color: #660066;">xmlGen</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #006600; font-style: italic;">// strings</span>
<span style="color: #003366; font-weight: bold;">var</span> docStrings <span style="color: #339933;">=</span> $<span style="color: #009900;">&#40;</span><span style="color: #3366CC;">&quot;#parent&quot;</span><span style="color: #009900;">&#41;</span>.<span style="color: #660066;">xmlGen</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>.<span style="color: #660066;">innerXML</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></td></tr></table></div>

<p>If you want to send that data back to the server, you should also consider <a href="http://github.com/greyrl/relationalxml/blob/master/js/jquery.base64.js">Base64 encoding</a>.</p>
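<p>For illustration, the round trip might look like this from the Python side, assuming the client sends the XML string Base64-encoded as UTF-8. The tiny document built here is just a stand-in for the string produced in the browser.</p>

```python
import base64
import xml.etree.ElementTree as ET

# Stand-in for the XML string produced in the browser.
doc = ET.Element("parent")
ET.SubElement(doc, "name").text = "jerry"
xml_string = ET.tostring(doc, encoding="unicode")

# What the client would transmit: plain ASCII with no angle brackets or quotes.
encoded = base64.b64encode(xml_string.encode("utf-8")).decode("ascii")

# What the server does on receipt: decode, then parse.
decoded = base64.b64decode(encoded).decode("utf-8")
root = ET.fromstring(decoded)
```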
<h2 class="header">caveats</h2>
<p>If the id attribute for a group cannot be unique, this library does require a user to bastardize the DOM by adding the unsupported (at least on most elements) &#8220;name&#8221; attribute. However, we feel it is a small price to pay for flexibility.</p>
]]></content:encoded>
			<wfw:commentRss>http://openinit.com/c/2010/05/14/http-disconnect/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>extracting data with selenium</title>
		<link>http://openinit.com/c/2010/04/26/extracting-data-with-selenium/</link>
		<comments>http://openinit.com/c/2010/04/26/extracting-data-with-selenium/#comments</comments>
		<pubDate>Mon, 26 Apr 2010 15:15:08 +0000</pubDate>
		<dc:creator>rob</dc:creator>
				<category><![CDATA[python]]></category>
		<category><![CDATA[web]]></category>

		<guid isPermaLink="false">http://openinit.com/c/?p=533</guid>
		<description><![CDATA[If you&#8217;ve spent any time developing for the web as an independent contractor, you&#8217;ve probably run into a client who wants to programmatically extract data from a web site. Typically the site is a front end to a database, modified by search terms or configurations. You might also find this client needs to extract a [...]]]></description>
			<content:encoded><![CDATA[<p>If you&#8217;ve spent any time developing for the web as an independent contractor, you&#8217;ve probably run into a client who wants to programmatically extract data from a web site. Typically the site is a front end to a database, modified by search terms or configurations. You might also find this client needs to extract a variety of data from a multitude of paths. <a href="http://seleniumhq.org/">Selenium</a>, traditionally used as a testing tool, can be a great asset in such a situation. </p>
<p>The &#8220;development&#8221; portion of Selenium is a Firefox plugin that records clicks, paths, and form entries as the user interacts with the browser. Normally, this &#8220;recording&#8221; is converted to a programming language (by Selenium) and tweaked for automated, continuous testing during web development. However, generating code from a Selenium script also opens up all the flexibility of the underlying programming language. For example, we could use loop constructs to continuously click on a &#8220;next&#8221; button while searching for a specific piece of data. This data could then be stored in a database or file, or even sent to a web service.</p>
<p>Here is a very basic example of such functionality:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">from</span> selenium <span style="color: #ff7700;font-weight:bold;">import</span> selenium
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">unittest</span>, <span style="color: #dc143c;">time</span>, <span style="color: #dc143c;">re</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">class</span> sample<span style="color: black;">&#40;</span><span style="color: #dc143c;">unittest</span>.<span style="color: black;">TestCase</span><span style="color: black;">&#41;</span>:
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> setUp<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #008000;">self</span>.<span style="color: black;">verificationErrors</span> = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
        <span style="color: #008000;">self</span>.<span style="color: black;">selenium</span> = selenium<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;localhost&quot;</span>, <span style="color: #ff4500;">4444</span>, <span style="color: #483d8b;">&quot;*chrome&quot;</span>, <span style="color: #483d8b;">&quot;http://your.site.here&quot;</span><span style="color: black;">&#41;</span>
        <span style="color: #008000;">self</span>.<span style="color: black;">selenium</span>.<span style="color: black;">start</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">with</span> <span style="color: #008000;">open</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;parts.txt&quot;</span><span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">as</span> f: <span style="color: #008000;">self</span>.<span style="color: black;">parts</span> = f.<span style="color: black;">readlines</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> testSample<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        sel = <span style="color: #008000;">self</span>.<span style="color: black;">selenium</span>
        <span style="color: #ff7700;font-weight:bold;">for</span> part <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">self</span>.<span style="color: black;">parts</span>:
            sel.<span style="color: #008000;">open</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;/database.html&quot;</span><span style="color: black;">&#41;</span>
            sel.<span style="color: #008000;">type</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;partnumber&quot;</span>, part.<span style="color: black;">strip</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
            sel.<span style="color: black;">click</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;go&quot;</span><span style="color: black;">&#41;</span>
            sel.<span style="color: black;">wait_for_page_to_load</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;30000&quot;</span><span style="color: black;">&#41;</span>
            <span style="color: #808080; font-style: italic;"># look for a numeric pattern</span>
            sel.<span style="color: black;">click</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;link=regexp:[0-9]{7}&quot;</span><span style="color: black;">&#41;</span>
            <span style="color: #ff7700;font-weight:bold;">while</span> <span style="color: #ff4500;">1</span>:
                <span style="color: #ff7700;font-weight:bold;">try</span>: 
                    sel.<span style="color: black;">wait_for_page_to_load</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;30000&quot;</span><span style="color: black;">&#41;</span>
                    <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">1</span>,<span style="color: #ff4500;">8</span><span style="color: black;">&#41;</span>:
                        <span style="color: #008000;">id</span> = sel.<span style="color: black;">get_table</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;//table[3].%s.1&quot;</span> <span style="color: #66cc66;">%</span> i<span style="color: black;">&#41;</span>
                        status = sel.<span style="color: black;">get_table</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;//table[3].%s.5&quot;</span> <span style="color: #66cc66;">%</span> i<span style="color: black;">&#41;</span>
                        <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">&quot;id %s, status %s &quot;</span> <span style="color: #66cc66;">%</span> <span style="color: black;">&#40;</span><span style="color: #008000;">id</span>, status<span style="color: black;">&#41;</span>
                        <span style="color: #ff7700;font-weight:bold;">if</span> status == <span style="color: #483d8b;">&quot;READY&quot;</span>: <span style="color: #008000;">self</span>._processHit<span style="color: black;">&#40;</span><span style="color: #008000;">id</span><span style="color: black;">&#41;</span>
                    sel.<span style="color: black;">click</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;next&quot;</span><span style="color: black;">&#41;</span>
                <span style="color: #ff7700;font-weight:bold;">except</span> <span style="color: #008000;">Exception</span> <span style="color: #ff7700;font-weight:bold;">as</span> e:
                    <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">&quot;error or end of processing encountered&quot;</span>
                    <span style="color: #808080; font-style: italic;">#print e</span>
                    <span style="color: #ff7700;font-weight:bold;">break</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> _processHit<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, <span style="color: #008000;">id</span><span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">&quot;process %s&quot;</span> <span style="color: #66cc66;">%</span> <span style="color: #008000;">id</span>
        sel = <span style="color: #008000;">self</span>.<span style="color: black;">selenium</span>
        sel.<span style="color: black;">click</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;link=%s&quot;</span> <span style="color: #66cc66;">%</span> <span style="color: #008000;">id</span><span style="color: black;">&#41;</span>
        sel.<span style="color: black;">wait_for_page_to_load</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;30000&quot;</span><span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">with</span> <span style="color: #008000;">open</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;result.txt&quot;</span>, <span style="color: #483d8b;">&quot;a&quot;</span><span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">as</span> f:
            f.<span style="color: black;">write</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;id %s, name [%s]<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span> <span style="color: #66cc66;">%</span> <span style="color: black;">&#40;</span><span style="color: #008000;">id</span>, sel.<span style="color: black;">get_table</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;//table[3].6.2&quot;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> tearDown<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #008000;">self</span>.<span style="color: black;">selenium</span>.<span style="color: black;">stop</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
        <span style="color: #008000;">self</span>.<span style="color: black;">assertEqual</span><span style="color: black;">&#40;</span><span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>, <span style="color: #008000;">self</span>.<span style="color: black;">verificationErrors</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">if</span> __name__ == <span style="color: #483d8b;">&quot;__main__&quot;</span>:
    <span style="color: #dc143c;">unittest</span>.<span style="color: black;">main</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></pre></td></tr></table></div>

<p>Basically the order of operations is:</p>
<ol>
<li>load the part numbers from a file</li>
<li>for each part, go to the &#8220;database&#8221; page and perform a part search</li>
<li>follow the id pattern from the regular expression in the search result</li>
<li>search through the tables in the result page, looking for an item that&#8217;s &#8220;READY&#8221;</li>
<li>follow a ready link to extract more detail and record information in the &#8220;result.txt&#8221; file</li>
</ol>
<p>It probably isn&#8217;t a great idea to open and close the &#8220;result.txt&#8221; file for each result, but this is a pretty basic example.</p>
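<p>One way around that, sketched below, is to collect the hits first and write them out through a single file handle. The <code>write_results</code> helper and the sample hits are hypothetical, and the temporary directory only keeps the example self-contained.</p>

```python
import os
import tempfile

def write_results(path, hits):
    # Open the file once and append one line per hit.
    with open(path, "a") as f:
        for hit_id, name in hits:
            f.write("id %s, name [%s]\n" % (hit_id, name))

# Hypothetical hits, as (id, name) pairs gathered during the crawl.
hits = [("1234567", "widget"), ("7654321", "gadget")]
result_path = os.path.join(tempfile.mkdtemp(), "result.txt")
write_results(result_path, hits)
```

<p>The trade-off is that hits held in memory are lost if the script dies before they are written, which may matter on a long crawl.</p>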
]]></content:encoded>
			<wfw:commentRss>http://openinit.com/c/2010/04/26/extracting-data-with-selenium/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>does an open source, versioning, indexed, scalable datastore exist?</title>
		<link>http://openinit.com/c/2010/04/20/datastore/</link>
		<comments>http://openinit.com/c/2010/04/20/datastore/#comments</comments>
		<pubDate>Tue, 20 Apr 2010 15:01:23 +0000</pubDate>
		<dc:creator>rob</dc:creator>
				<category><![CDATA[databases]]></category>

		<guid isPermaLink="false">http://openinit.com/c/?p=445</guid>
		<description><![CDATA[As a software developer, I&#8217;ve spent a large portion of my career dealing with data storage and retrieval. As many will attest, the relational model leaves a lot to be desired. It feels like an outdated format, certainly with the advent of the web, commodity computing, and dynamic languages. We end up using caches and [...]]]></description>
			<content:encoded><![CDATA[<p>As a software developer, I&#8217;ve spent a large portion of my career dealing with data storage and retrieval. As many will attest, the relational model leaves a lot to be desired. It feels like an outdated format, certainly with the advent of the web, commodity computing, and dynamic languages. We end up using caches and search indexes to extract subsets of important data. Sometimes the structure must be denormalized. Or, even worse, we have to resort to proprietary stored procedures and triggers. </p>
<p>Also, the translation from row to object is cumbersome and just doesn&#8217;t seem to completely fit. Object-relational mappers do a fairly good job, but still require a lot of work and tuning. Have you ever tried to update an object containing a child collection using Hibernate? It is not a straightforward operation.</p>
<h2 class="header">what would be better?</h2>
<p>Alternatives have cropped up over the years to fill this void: XML and object databases in the early part of the decade, and &#8220;NoSQL&#8221; solutions more recently. I&#8217;ll describe what I would like to see in a &#8220;full featured&#8221; data store in today&#8217;s world and what systems come close to meeting those needs.</p>
<h3 class="subheader">fine-grained caching</h3>
<p>Firstly, fine-grained caching or the ability to manage cache content; for example, keeping thousands of user logins cached but only the first fifteen blog posts. Ehcache in conjunction with Hibernate is a good example. In relatively complex applications this is important: you don&#8217;t want to fill memory with items that just aren&#8217;t going to get accessed that often.</p>
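<p>As a rough sketch of what that kind of control could look like, here is a small Python cache with a separate capacity per &#8220;region&#8221;. The class name, the region names, and the least-recently-used eviction policy are all assumptions for illustration; this is not how Ehcache is configured.</p>

```python
from collections import OrderedDict

class RegionCache:
    """One LRU cache per named region, each with its own capacity."""

    def __init__(self, capacities):
        self.capacities = dict(capacities)
        self.regions = {name: OrderedDict() for name in self.capacities}

    def put(self, region, key, value):
        entries = self.regions[region]
        entries[key] = value
        entries.move_to_end(key)
        # Evict least recently used entries beyond this region's limit.
        while len(entries) > self.capacities[region]:
            entries.popitem(last=False)

    def get(self, region, key):
        entries = self.regions[region]
        if key not in entries:
            return None
        entries.move_to_end(key)
        return entries[key]

# Plenty of room for logins, but only fifteen blog posts.
cache = RegionCache({"logins": 1000, "posts": 15})
for n in range(20):
    cache.put("posts", n, "post %d" % n)
```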
<h3 class="subheader">searching</h3>
<p>Next is searching for data. Searching gives us the ability to add <a href="http://en.wikipedia.org/wiki/Faceted_classification">facets</a> and complete user-generated lookups. Lucene is a great example of a full-fledged searching product. I don&#8217;t think a data store needs to be quite as feature rich, but it seems like a natural way to find data.</p>
<h3 class="subheader">flexible schema</h3>
<p>Thirdly, a less rigid data definition system would also make sense. In other words, I&#8217;d like to be able to add or remove a column of data without restarting the system. Most relational systems can handle this, but are not well suited for adding columns on-the-fly at the application level.</p>
<h3 class="subheader">versioning / transactions</h3>
<p>I&#8217;d also like to see an &#8220;immutable&#8221; or &#8220;versioning&#8221; approach to follow the current trend of non-locking in the multicore and functional programming worlds. Basically this means a new copy of the &#8220;tuple&#8221; (or at least the changed data) with each save, with &#8220;rollbacks&#8221; occurring during contention situations. This would allow the data to be stored and extracted against a timeline, much like version control software works today. This technique would also create a simple transaction model.</p>
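<p>A toy Python sketch of the idea follows. Every save appends a fresh copy instead of overwriting, and a save based on a stale version is rejected so the caller can re-read and retry. The class and method names are hypothetical, plain version numbers stand in for a timeline, and rejecting the stale write is just one simple way to model the &#8220;rollback&#8221;.</p>

```python
class ConflictError(Exception):
    """Raised when a save is based on an out-of-date version."""

class VersionedStore:
    def __init__(self):
        self.history = {}  # key -> list of snapshots, oldest first

    def save(self, key, data, expected_version):
        versions = self.history.setdefault(key, [])
        # Contention check: someone else saved since the caller last read.
        if expected_version != len(versions):
            raise ConflictError("expected version %d, store is at %d"
                                % (expected_version, len(versions)))
        versions.append(dict(data))  # store a copy, never modify in place
        return len(versions)

    def load(self, key, version=None):
        versions = self.history[key]
        if version is None:
            version = len(versions)  # default to the latest snapshot
        return dict(versions[version - 1]), version

store = VersionedStore()
v1 = store.save("rob", {"title": "developer"}, expected_version=0)
v2 = store.save("rob", {"title": "consultant"}, expected_version=v1)

old, _ = store.load("rob", version=1)       # read against the timeline
current, current_version = store.load("rob")

try:
    store.save("rob", {"title": "stale"}, expected_version=v1)
    conflict = False
except ConflictError:
    conflict = True
```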
<h3 class="subheader">scaling</h3>
<p>Scaling and redundancy are another important consideration. Servers are now commodities. Virtualization and the cloud permit massive scaling. Any new data storage solution has to take this mentality into account.</p>
<h3 class="subheader">security</h3>
<p>Finally, a more robust security model would be beneficial. Most relational databases allow security at the database, table, and even column level. Configuring such security from the application is much like adding or removing columns: it just isn&#8217;t feasible. Maybe creating separate &#8220;stores&#8221; with their own set of security credentials (i.e. a store for each user&#8217;s payment data) might reduce the potential damage an attacker could inflict?</p>
<h2 class="header">what else is out there?</h2>
<p><b>Many things</b>. There are a ton of systems, so I will attempt to categorize the most recent &#8220;NoSQL&#8221; solutions and how they fit (or don&#8217;t fit) into the basic needs described above.</p>
<table class="table">
<thead>
<tr class="row">
<th class="tablecolumn">types</th>
<th class="tablecolumn">key-value store</th>
<th class="tablecolumn">document database</th>
<th class="tablecolumn">column database</th>
</tr>
</thead>
<tbody>
<tr class="tablerow">
<td class="tablecolumn">examples</td>
<td class="tablecolumn">Tokyo Cabinet, Redis, Project Voldemort, Amazon S3</td>
<td class="tablecolumn">MongoDB, CouchDB</td>
<td class="tablecolumn">Cassandra</td>
</tr>
<tr class="tablerow">
<td class="tablecolumn">fine-grained caching</td>
<td class="tablecolumn">These systems basically <em>are</em> caches. Typically you can set expiration on an individual key. This really isn&#8217;t a &#8220;datastore&#8221; mentality though, as the data should still persist.</td>
<td class="tablecolumn">Both seem to have their own caching mechanisms internally, but very little user control over what stays in memory. CouchDB does support views, which index and presumably speed data retrieval.</td>
<td class="tablecolumn">No. Though it appears to be a <a href="https://issues.apache.org/jira/browse/CASSANDRA-688">concern</a>.</td>
</tr>
<tr class="tablerow">
<td class="tablecolumn">querying</td>
<td class="tablecolumn">Tokyo Cabinet seems to have simple querying, but it is not (nor should it be) a focus. Applications and data must be designed with a key/value &#8220;tree&#8221; structure in mind.</td>
<td class="tablecolumn">MongoDB has a query mechanism that feels like an ORM style. CouchDB uses JavaScript but seems to require views and indexes for searches.</td>
<td class="tablecolumn">Apparently there is a <a href="http://github.com/tjake/Lucandra">Lucene integration plugin</a>, although the query language is robust.</td>
</tr>
<tr class="tablerow">
<td class="tablecolumn">schema modification</td>
<td class="tablecolumn">There is no schema to modify, just place your data and get it back.</td>
<td class="tablecolumn">Seems like it. Although I wonder about view regeneration in products like CouchDB.</td>
<td class="tablecolumn">Adding columns requires a restart.</td>
</tr>
<tr class="tablerow">
<td class="tablecolumn">versioning / transactions</td>
<td class="tablecolumn">No versioning, but some of the solutions seem to support transactions. Tokyo Cabinet locks the database on transactions.</td>
<td class="tablecolumn">MongoDB and CouchDB do support basic transactions. Neither supports versioning (see <a href="http://news.ycombinator.com/item?id=926732">here</a>).</td>
<td class="tablecolumn">No.</td>
</tr>
<tr class="tablerow">
<td class="tablecolumn">scaling</td>
<td class="tablecolumn">Many of these options support sharding in the client, but not real-time increases in capacity.</td>
<td class="tablecolumn">Yes.</td>
<td class="tablecolumn">Yes.</td>
</tr>
<tr class="tablerow">
<td class="tablecolumn">security</td>
<td class="tablecolumn">Some options allow authentication, but not on a resource level.</td>
<td class="tablecolumn">CouchDB has a very fine-grained security model. MongoDB seems a little more limited, though it was hard to find any information beyond authentication.</td>
<td class="tablecolumn">Cassandra doesn&#8217;t seem to support authorization, <a href="https://issues.apache.org/jira/browse/CASSANDRA-900">yet</a>.</td>
</tr>
</tbody>
</table>
<h2 class="header">the proposal</h2>
<p>Not being able to find a perfect fit to the requirements listed above, I propose a data system that stores and indexes maps of data. A structure like this:<br />
<code><br />
name: Mr. Data<br />
title: Data Master<br />
description: I am a block of data, please don't lose me or my changes<br />
</code></p>
<p>Would be found in this search:<br />
<code><br />
description: bl*<br />
</code></p>
<p>or this search:<br />
<code><br />
title: "Data Master"<br />
</code></p>
<p>and would produce a language-specific construct like a java.util.Map or Python dictionary. These maps would be packed into lists.</p>
<h3 class="subheader">query system</h3>
<p>Queries would be handled in a functional style with chained manipulations of data. For example:<br />
<code><br />
query("occupation: student").sort().limit(5)<br />
</code></p>
<p>would query for records by occupation, sort the results, and limit the size of the results to five objects. This &#8220;chaining&#8221; would allow for customized extensions.</p>
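<p>A minimal Python sketch of how such chaining might work is shown here. The &#8220;Query&#8221; class, the exact-match parsing of &#8220;field: value&#8221;, and the default sort key are all assumptions made for illustration; the proposal does not define them.</p>

```python
class Query:
    """Hypothetical chainable query over a list of dict records."""

    def __init__(self, records):
        self.records = list(records)

    def query(self, expression):
        # Only the exact-match form "field: value" is handled in this sketch.
        field, _, value = expression.partition(":")
        field, value = field.strip(), value.strip()
        return Query(r for r in self.records if str(r.get(field)) == value)

    def sort(self, key="name"):
        # The proposal does not say what sort() orders by; "name" is an assumption.
        return Query(sorted(self.records, key=lambda r: str(r.get(key, ""))))

    def limit(self, count):
        return Query(self.records[:count])


people = [
    {"name": "Zed", "occupation": "student"},
    {"name": "Amy", "occupation": "student"},
    {"name": "Bob", "occupation": "teacher"},
]
result = Query(people).query("occupation: student").sort().limit(5).records
print([r["name"] for r in result])
```

<p>Because every step returns a new &#8220;Query&#8221;, a customized extension could be added as one more method on a subclass without touching the existing steps.</p>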
<h3 class="subheader">saving data</h3>
<p>&#8220;Tuples&#8221; might be directories with metadata files. The directory name would be a unique identifier. So a record with four saves would look like this:<br />
<code><br />
afm2acaa-asd2/<br />
afm2acaa-asd2/metadata<br />
afm2acaa-asd2/4<br />
afm2acaa-asd2/2<br />
afm2acaa-asd2/3<br />
afm2acaa-asd2/1<br />
</code></p>
<p>The metadata file might contain the pointer to the latest save and a data version number:<br />
<code><br />
latest: 4<br />
version: 0.1<br />
</code></p>
<p>If two threads attempted to save new versions concurrently, they would produce temporary files:<br />
<code><br />
afm2acaa-asd2/thread1<br />
afm2acaa-asd2/thread2<br />
</code></p>
<p>The only synchronized code would be updating the &#8220;metadata&#8221; file. If &#8220;thread2&#8221; finished after &#8220;thread1&#8221;, its &#8220;commit&#8221; operation would fail because it referenced an older version. The &#8220;thread1&#8221; changes would be moved into place as:<br />
<code><br />
afm2acaa-asd2/5<br />
</code></p>
<p>and the metadata file could be updated accordingly. This separate file based structure would allow for grepping and easier backups at the file system level. Unfortunately, there would be a large number of files and directories, which could dramatically impact performance with big datasets. </p>
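<p>The save-then-commit idea can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, each writer passes the version it started from, and the lock only protects threads inside one process (separate processes would need a file lock instead).</p>

```python
import os
import tempfile
import threading

_metadata_lock = threading.Lock()  # the only synchronized section


def read_latest(record_dir):
    """Return the committed version number, or 0 if nothing was saved yet."""
    path = os.path.join(record_dir, "metadata")
    if not os.path.exists(path):
        return 0
    with open(path) as handle:
        for line in handle:
            key, _, value = line.partition(":")
            if key.strip() == "latest":
                return int(value)
    return 0


def save(record_dir, base_version, content, worker):
    """Write to a temporary file, then commit only if base_version is still latest."""
    os.makedirs(record_dir, exist_ok=True)
    temp_path = os.path.join(record_dir, worker)
    with open(temp_path, "w") as handle:
        handle.write(content)
    with _metadata_lock:
        if read_latest(record_dir) != base_version:
            os.remove(temp_path)  # stale: another save was committed first
            return None
        new_version = base_version + 1
        os.rename(temp_path, os.path.join(record_dir, str(new_version)))
        with open(os.path.join(record_dir, "metadata"), "w") as handle:
            handle.write("latest: %d\nversion: 0.1\n" % new_version)
        return new_version


record = os.path.join(tempfile.mkdtemp(), "afm2acaa-asd2")
first = save(record, 0, "name: Mr. Data\n", "thread1")
second = save(record, 0, "name: Mr. Data II\n", "thread2")
print(first, second, sorted(os.listdir(record)))
```

<p>Both writers start from version 0, so the first commit succeeds and the second is rejected, leaving only the committed save and the metadata file in the directory.</p>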
<h3 class="subheader">caching</h3>
<p>Caching could be optimized by additions to the query parameters. For example:</p>
<p><code><br />
cache("five.students").query("occupation: student").sort().limit(5)<br />
</code></p>
<p>would cache the results of the query under the name &#8220;five.students&#8221;. </p>
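<p>As a rough Python sketch of named result caching, assuming a hypothetical &#8220;CachedStore&#8221; class that folds the whole chain into one method and ignores cache invalidation entirely:</p>

```python
class CachedStore:
    """Hypothetical store that remembers query results under a caller-chosen name."""

    def __init__(self, records):
        self.records = records
        self.cache = {}
        self.scans = 0  # counts how often the records were actually searched

    def cached_query(self, name, field, value, limit):
        if name not in self.cache:
            self.scans += 1
            matches = [r for r in self.records if r.get(field) == value]
            self.cache[name] = sorted(matches, key=lambda r: r["name"])[:limit]
        return self.cache[name]


store = CachedStore([
    {"name": "Amy", "occupation": "student"},
    {"name": "Bob", "occupation": "teacher"},
])
first = store.cached_query("five.students", "occupation", "student", 5)
second = store.cached_query("five.students", "occupation", "student", 5)
print(first == second, store.scans)
```

<p>The second call is answered from the cache, so the records are only scanned once. Deciding when a cached name should be refreshed after a save is left open here, as it is in the proposal.</p>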
<h3 class="subheader">relationships</h3>
<p>In theory, the results could be combined to produce &#8220;relationships&#8221;:<br />
<code><br />
cache("five.students").combine(<br />
    "class", query("occupation: student").limit(5),<br />
    "name", query("type: class")<br />
)<br />
</code></p>
<p>would inject the &#8220;class&#8221; map into the &#8220;student&#8221; where the &#8220;class&#8221; name matches the &#8220;student&#8221; class attribute. There would be no foreign keys, so data consistency would be up to the application.</p>
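<p>One way to read that &#8220;combine&#8221; step is the Python sketch below. The function signature and the sample records are assumptions for illustration only.</p>

```python
def combine(attribute, left_records, key, right_records):
    """Inject the right-hand map whose `key` equals the left record's `attribute`."""
    by_key = {r[key]: r for r in right_records if key in r}
    combined = []
    for record in left_records:
        merged = dict(record)  # copy so the stored record is left untouched
        match = by_key.get(record.get(attribute))
        if match is not None:
            merged[attribute] = match
        combined.append(merged)
    return combined


students = [{"name": "Amy", "occupation": "student", "class": "Algebra"}]
classes = [{"type": "class", "name": "Algebra", "room": "101"}]
result = combine("class", students, "name", classes)
print(result[0]["class"]["room"])
```

<p>A student whose class name matches nothing simply keeps the plain string, which is what the absence of foreign keys looks like in practice.</p>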
<h3 class="subheader">replication</h3>
<p>Replication is another thorny problem. I think a &#8220;shard manager&#8221; might be an appropriate solution. It would need to run queries against all the database instances and combine the results, or maybe be configured with some kind of &#8220;lookup&#8221; algorithm. This might also be a good location to handle caching. Of course, this approach would have to be transparent to the client and easily configured. Managers would also need to accept new stores on the fly, and we would want to be able to run multiple managers to avoid a single point of failure. </p>
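<p>A very small Python sketch of the fan-out idea, assuming a hypothetical &#8220;ShardManager&#8221; where each store is just an in-memory list of records:</p>

```python
class ShardManager:
    """Hypothetical manager that queries every store and merges the answers."""

    def __init__(self):
        self.stores = []

    def add_store(self, records):
        # Stores can be registered while the manager is running.
        self.stores.append(records)

    def query(self, field, value):
        merged = []
        for records in self.stores:
            merged.extend(r for r in records if r.get(field) == value)
        return merged


manager = ShardManager()
manager.add_store([{"name": "Amy", "occupation": "student"}])
before = len(manager.query("occupation", "student"))
manager.add_store([{"name": "Zed", "occupation": "student"}])
after = len(manager.query("occupation", "student"))
print(before, after)
```

<p>This leaves out everything that makes the problem thorny (network calls, failed stores, running several managers), but it shows the shape of the client-facing interface.</p>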
<h3 class="subheader">security and &#8220;stores&#8221;</h3>
<p>Keeping identifying information for &#8220;tuples&#8221; could be handled with subdirectory &#8220;stores&#8221;. So our earlier example would look like this:<br />
<code><br />
b2asco21-rjx2/<br />
b2asco21-rjx2/metadata<br />
b2asco21-rjx2/afm2acaa-asd2/metadata<br />
b2asco21-rjx2/afm2acaa-asd2/4<br />
b2asco21-rjx2/afm2acaa-asd2/2<br />
b2asco21-rjx2/afm2acaa-asd2/3<br />
b2asco21-rjx2/afm2acaa-asd2/1<br />
</code></p>
<p>The store metadata could contain a user, password combination:<br />
<code><br />
user: test@test.com<br />
password: 1234<br />
</code></p>
<p>or even a private key:<br />
<code><br />
certificate: ...<br />
</code></p>
<p>A client would have to provide credentials before accessing the store data or caches.</p>
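<p>The credential check might look something like this Python sketch. The helper names are hypothetical, and the metadata format is assumed to be the same &#8220;key: value&#8221; lines shown above.</p>

```python
import hmac


def parse_metadata(text):
    """Parse the 'key: value' lines used by the metadata files."""
    pairs = {}
    for line in text.splitlines():
        key, separator, value = line.partition(":")
        if separator:
            pairs[key.strip()] = value.strip()
    return pairs


def may_access(store_metadata, user, password):
    # compare_digest avoids leaking how many leading characters matched.
    expected_user = store_metadata.get("user", "").encode()
    expected_password = store_metadata.get("password", "").encode()
    user_ok = hmac.compare_digest(expected_user, user.encode())
    password_ok = hmac.compare_digest(expected_password, password.encode())
    return user_ok and password_ok


metadata = parse_metadata("user: test@test.com\npassword: 1234\n")
print(may_access(metadata, "test@test.com", "1234"))
print(may_access(metadata, "test@test.com", "wrong"))
```

<p>Storing a salted hash instead of the password itself would be a natural refinement, but that goes beyond what the store metadata above describes.</p>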
<h2 class="header">what do you think?</h2>
<p>Is this a boneheaded idea? Is there something already out there? Maybe this approach lacks features that would complete the &#8220;picture&#8221;? If this <em>is</em> actually interesting to you, please let me know. Maybe we can form a new project around it.</p>
]]></content:encoded>
			<wfw:commentRss>http://openinit.com/c/2010/04/20/datastore/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>insert lines with sed</title>
		<link>http://openinit.com/c/2010/04/13/insert-lines-with-sed/</link>
		<comments>http://openinit.com/c/2010/04/13/insert-lines-with-sed/#comments</comments>
		<pubDate>Tue, 13 Apr 2010 19:16:28 +0000</pubDate>
		<dc:creator>rob</dc:creator>
				<category><![CDATA[unix]]></category>

		<guid isPermaLink="false">http://openinit.com/c/?p=429</guid>
		<description><![CDATA[On occasion, I need to bulk insert text at specific locations in certain files. Sometimes at a line number. More often at a location matching a pattern. Let&#8217;s say you had an original file: [...]]]></description>
			<content:encoded><![CDATA[<p>On occasion, I need to bulk insert text at specific locations in certain files. Sometimes at a line number. More often at a location matching a pattern. Let&#8217;s say you had an original file:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="txt" style="font-family:monospace;">text text text 
&nbsp;
keyword
&nbsp;
text text text 
text text text 
text text text 
text text text 
text text text 
text text text 
text text text 
&nbsp;
insert below
&nbsp;
text text text 
text text text 
text text text 
text text text 
text text text 
text text text</pre></td></tr></table></div>

<p>and you needed to insert text from file &#8220;ins.txt&#8221; at the location &#8220;insert below&#8221;. All you need to do is run this series of commands:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="bash" style="font-family:monospace;"><span style="color: #c20cb9; font-weight: bold;">find</span> . <span style="color: #660033;">-name</span> <span style="color: #ff0000;">&quot;*.txt&quot;</span> <span style="color: #660033;">-exec</span> <span style="color: #c20cb9; font-weight: bold;">grep</span> <span style="color: #660033;">-l</span> keyword <span style="color: #7a0874; font-weight: bold;">&#123;</span><span style="color: #7a0874; font-weight: bold;">&#125;</span> \; <span style="color: #000000; font-weight: bold;">|</span> <span style="color: #c20cb9; font-weight: bold;">xargs</span> <span style="color: #c20cb9; font-weight: bold;">sed</span> <span style="color: #660033;">-i</span> <span style="color: #ff0000;">'/insert below/r ins.txt'</span></pre></td></tr></table></div>

<p>This command breaks down as:</p>
<p>1. in the current directory and its subdirectories, find all the files with extension &#8220;txt&#8221;<br />
2. from those files, keep only the ones containing the phrase &#8220;keyword&#8221;<br />
3. insert the text from &#8220;ins.txt&#8221; below each line matching &#8220;insert below&#8221; in those files</p>
<p>Of course, patterns work across the board for this type of operation: &#8220;find&#8221; accepts glob patterns with &#8220;-name&#8221;, while &#8220;grep&#8221; and &#8220;sed&#8221; take regular expressions.</p>
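<p>For cases where the shell pipeline is awkward (file names with spaces, for example), the same steps can be sketched in Python. The function names are hypothetical, the inserted text is passed as a string rather than read from &#8220;ins.txt&#8221;, and plain substring matching stands in for the regular expressions.</p>

```python
import os
import tempfile


def insert_after(path, marker, insert_text):
    """Append insert_text after every line containing marker (like sed's r command)."""
    with open(path) as handle:
        lines = handle.readlines()
    output = []
    for line in lines:
        output.append(line)
        if marker in line:
            output.append(insert_text)
    with open(path, "w") as handle:
        handle.writelines(output)


def bulk_insert(root, keyword, marker, insert_text):
    """Walk root, pick .txt files containing keyword, and insert below marker."""
    changed = []
    for folder, _, names in os.walk(root):
        for name in sorted(names):
            path = os.path.join(folder, name)
            if not name.endswith(".txt"):
                continue
            with open(path) as handle:
                if keyword not in handle.read():
                    continue
            insert_after(path, marker, insert_text)
            changed.append(name)
    return changed


root = tempfile.mkdtemp()
with open(os.path.join(root, "a.txt"), "w") as handle:
    handle.write("keyword\ninsert below\ntail\n")
with open(os.path.join(root, "b.txt"), "w") as handle:
    handle.write("insert below\ntail\n")
changed = bulk_insert(root, "keyword", "insert below", "NEW\n")
with open(os.path.join(root, "a.txt")) as handle:
    updated = handle.read().splitlines()
print(changed, updated)
```

<p>Only &#8220;a.txt&#8221; contains the keyword, so it is the only file modified; &#8220;b.txt&#8221; is left as it was.</p>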
]]></content:encoded>
			<wfw:commentRss>http://openinit.com/c/2010/04/13/insert-lines-with-sed/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>complete ws-security example with x509 certificates</title>
		<link>http://openinit.com/c/2010/03/31/complete-ws-security-example/</link>
		<comments>http://openinit.com/c/2010/03/31/complete-ws-security-example/#comments</comments>
		<pubDate>Wed, 31 Mar 2010 16:01:02 +0000</pubDate>
		<dc:creator>rob</dc:creator>
				<category><![CDATA[java]]></category>

		<guid isPermaLink="false">http://openinit.com/c/?p=326</guid>
		<description><![CDATA[After a few days of wrangling with the JAX-WS implementation baked into Java 6, this post is intended to streamline certain processes that didn&#8217;t seem straightforward. Using the annotations and ant tasks was not difficult, but &#8220;extension&#8221; options required quite a bit of tweaking and guesswork. For this project our needs were fairly basic: [...]]]></description>
			<content:encoded><![CDATA[<p>After a few days of wrangling with the JAX-WS implementation baked into Java 6, this post is intended to streamline certain processes that didn&#8217;t seem straightforward. Using the annotations and ant tasks was not difficult, but &#8220;extension&#8221; options required quite a bit of tweaking and guesswork.</p>
<p>For this project our needs were fairly basic: sensitive information and functionality needed to be isolated from our production applications, to safeguard from a potentially compromised server. Web Services wouldn&#8217;t typically be a first choice, considering options like REST or even a simple servlet, but integration with the standard runtime gave hope that they would be easier to manage and might provide some additional utilities for free.</p>
<p>The architecture is simple: a few JPA annotated beans managed by Hibernate, utility classes, and a single service class to expose the functionality. We use Jetty for most of our deployments, seeing JBoss or even Tomcat as adding unnecessary complexity. </p>
<h2 class="header">getting started</h2>
<p>Adding the appropriate JSR-181 annotations was not difficult; the only &#8220;tweaking&#8221; we did was to manipulate the default namespace with the &#8220;targetNamespace&#8221; attribute on the &#8220;WebService&#8221; annotation. We also decorated the methods we wanted exposed as services, although that might not be a necessary step.</p>
<p>To begin deployment, we first used the jars from the <a href="https://jax-ws.dev.java.net/">GlassFish JAX-WS reference implementation</a>. The &#8220;WSServlet&#8221; class is included, which can be configured in web.xml to handle incoming requests. For basic functionality, the reference implementation worked well. It would even generate a WSDL on the fly, reducing the much-maligned configuration hassles. </p>
<p>We had to add a &#8220;sun-jaxws.xml&#8221; file to our &#8220;WEB-INF&#8221; directory:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="xml" style="font-family:monospace;"><span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;?xml</span> <span style="color: #000066;">version</span>=<span style="color: #ff0000;">&quot;1.0&quot;</span> <span style="color: #000066;">encoding</span>=<span style="color: #ff0000;">&quot;UTF-8&quot;</span><span style="color: #000000; font-weight: bold;">?&gt;</span></span>
<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;endpoints</span> <span style="color: #000066;">version</span>=<span style="color: #ff0000;">&quot;2.0&quot;</span> <span style="color: #000066;">xmlns</span>=<span style="color: #ff0000;">&quot;http://java.sun.com/xml/ns/jax-ws/ri/runtime&quot;</span><span style="color: #000000; font-weight: bold;">&gt;</span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;endpoint</span> <span style="color: #000066;">implementation</span>=<span style="color: #ff0000;">&quot;(full class name)&quot;</span> <span style="color: #000066;">name</span>=<span style="color: #ff0000;">&quot;calculator&quot;</span> </span>
<span style="color: #009900;">    <span style="color: #000066;">url-pattern</span>=<span style="color: #ff0000;">&quot;/ws/Calculator&quot;</span><span style="color: #000000; font-weight: bold;">/&gt;</span></span>
<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/endpoints<span style="color: #000000; font-weight: bold;">&gt;</span></span></span></pre></td></tr></table></div>

<p>After firing up Jetty, we used <a href="https://addons.mozilla.org/en-US/firefox/addon/2691">poster</a> to test the service, by POSTing a simple SOAP request similar to this:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="xml" style="font-family:monospace;"><span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;?xml</span> <span style="color: #000066;">version</span>=<span style="color: #ff0000;">&quot;1.0&quot;</span><span style="color: #000000; font-weight: bold;">?&gt;</span></span>
<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;soapenv:Envelope</span> <span style="color: #000066;">xmlns:soapenv</span>=<span style="color: #ff0000;">&quot;http://schemas.xmlsoap.org/soap/envelope/&quot;</span></span>
<span style="color: #009900;">    <span style="color: #000066;">xmlns:xsd</span>=<span style="color: #ff0000;">&quot;http://www.w3.org/2001/XMLSchema&quot;</span><span style="color: #000000; font-weight: bold;">&gt;</span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;soapenv:Body<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;calculator:add</span> <span style="color: #000066;">xmlns:calculator</span>=<span style="color: #ff0000;">&quot;http://calculator.fake.org/&quot;</span><span style="color: #000000; font-weight: bold;">&gt;</span></span>
      <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;first<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>1<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/first<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
      <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;second<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>2<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/second<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/calculator:add<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/soapenv:Body<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/soapenv:Envelope<span style="color: #000000; font-weight: bold;">&gt;</span></span></span></pre></td></tr></table></div>
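
<p>For a scripted alternative to a browser add-on, the same request could be built with a short Python sketch like the one below. The host, port, and context path in the URL are assumptions (a local Jetty on port 8080); the namespaces and element names are taken from the request above.</p>

```python
import urllib.request
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
CALC_NS = "http://calculator.fake.org/"


def build_add_request(url, first, second):
    """Build (but do not send) a POST request carrying the SOAP envelope."""
    ET.register_namespace("soapenv", SOAP_NS)
    ET.register_namespace("calculator", CALC_NS)
    envelope = ET.Element("{%s}Envelope" % SOAP_NS)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_NS)
    add = ET.SubElement(body, "{%s}add" % CALC_NS)
    ET.SubElement(add, "first").text = str(first)
    ET.SubElement(add, "second").text = str(second)
    payload = ET.tostring(envelope, encoding="utf-8")
    headers = {"Content-Type": "text/xml; charset=utf-8", "SOAPAction": '""'}
    return urllib.request.Request(url, data=payload, headers=headers)


# The URL is an assumption: Jetty on localhost:8080 with the url-pattern above.
request = build_add_request("http://localhost:8080/ws/Calculator", 1, 2)
print(request.get_method())
# urllib.request.urlopen(request) would send it to a running service.
```

<p>Building the envelope with ElementTree rather than string concatenation keeps the namespace prefixes on the opening and closing tags consistent.</p>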

<p>Here is a portion of our web.xml:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="xml" style="font-family:monospace;"><span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;?xml</span> <span style="color: #000066;">version</span>=<span style="color: #ff0000;">&quot;1.0&quot;</span> <span style="color: #000066;">encoding</span>=<span style="color: #ff0000;">&quot;UTF-8&quot;</span><span style="color: #000000; font-weight: bold;">?&gt;</span></span>
<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;web-app</span> <span style="color: #000066;">version</span>=<span style="color: #ff0000;">&quot;2.5&quot;</span> <span style="color: #000066;">xmlns</span>=<span style="color: #ff0000;">&quot;http://java.sun.com/xml/ns/javaee&quot;</span> </span>
<span style="color: #009900;">    <span style="color: #000066;">xmlns:xsi</span>=<span style="color: #ff0000;">&quot;http://www.w3.org/2001/XMLSchema-instance&quot;</span> </span>
<span style="color: #009900;">    <span style="color: #000066;">xsi:schemaLocation</span>=<span style="color: #ff0000;">&quot;http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd&quot;</span><span style="color: #000000; font-weight: bold;">&gt;</span></span>
<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;display-name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>ws<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/display-name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;listener<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;listener-class<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
      com.sun.xml.ws.transport.http.servlet.WSServletContextListener<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/listener-class<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/listener<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;servlet<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;servlet-name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>ws<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/servlet-name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;servlet-class<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>com.sun.xml.ws.transport.http.servlet.WSServlet<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/servlet-class<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;load-on-startup<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>1<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/load-on-startup<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/servlet<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;servlet-mapping<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;servlet-name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>ws<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/servlet-name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;url-pattern<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>/ws/*<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/url-pattern<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/servlet-mapping<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;session-config<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;session-timeout<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>30<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/session-timeout<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/session-config<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/web-app<span style="color: #000000; font-weight: bold;">&gt;</span></span></span></pre></td></tr></table></div>

<h2 class="header">adding security</h2>
<p>When we moved beyond the basics, requiring authentication to secure our service, it quickly became apparent we would need to switch to the <a href="https://metro.dev.java.net/">full Metro stack</a>. Metro version 1.4 operated differently in some places; for instance, any classes backed by interfaces had duplicate &#8220;message&#8221;, &#8220;portType&#8221;, and &#8220;binding&#8221; elements when using the &#8220;wsgen&#8221; utility.</p>
<p>We decided x509 &#8220;mutual certificate&#8221; security would meet our needs by both encrypting messages and requiring a trusted key for authentication. Unfortunately, there are no annotations to configure this service, so the WSDL files now had to be updated manually. This seems to defeat the original purpose of annotations, but was necessary to get the &#8220;free&#8221; WSIT security features. To use predefined WSDL files, we added &#8220;wsdl&#8221; attributes to the appropriate services in &#8220;sun-jaxws.xml&#8221;:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="xml" style="font-family:monospace;"><span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;?xml</span> <span style="color: #000066;">version</span>=<span style="color: #ff0000;">&quot;1.0&quot;</span> <span style="color: #000066;">encoding</span>=<span style="color: #ff0000;">&quot;UTF-8&quot;</span><span style="color: #000000; font-weight: bold;">?&gt;</span></span>
<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;endpoints</span> <span style="color: #000066;">version</span>=<span style="color: #ff0000;">&quot;2.0&quot;</span> <span style="color: #000066;">xmlns</span>=<span style="color: #ff0000;">&quot;http://java.sun.com/xml/ns/jax-ws/ri/runtime&quot;</span><span style="color: #000000; font-weight: bold;">&gt;</span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;endpoint</span> <span style="color: #000066;">implementation</span>=<span style="color: #ff0000;">&quot;(full class name)&quot;</span> <span style="color: #000066;">name</span>=<span style="color: #ff0000;">&quot;calculator&quot;</span> </span>
<span style="color: #009900;">      <span style="color: #000066;">url-pattern</span>=<span style="color: #ff0000;">&quot;/ws/Calculator&quot;</span> <span style="color: #000066;">wsdl</span>=<span style="color: #ff0000;">&quot;WEB-INF/wsdl/CalculatorService.wsdl&quot;</span><span style="color: #000000; font-weight: bold;">/&gt;</span></span>
<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/endpoints<span style="color: #000000; font-weight: bold;">&gt;</span></span></span></pre></td></tr></table></div>

<p>We used NetBeans to generate a sample security configuration that was transplanted into an existing WSDL, following the steps in this <a href="https://metro.dev.java.net/guide/Example_Applications.html#ahiem">link</a>. The &#8220;wsp:Policy&#8221; element and its children at the bottom of the generated WSDL needed to be added, along with the correct namespace references from the top level &#8220;definitions&#8221; element. Here is an example of integrating the security references in the binding section of the document (see generated WSDL for examples). This:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="xml" style="font-family:monospace;"><span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;operation</span> <span style="color: #000066;">name</span>=<span style="color: #ff0000;">&quot;register&quot;</span><span style="color: #000000; font-weight: bold;">&gt;</span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;soap:operation</span> <span style="color: #000066;">soapAction</span>=<span style="color: #ff0000;">&quot;&quot;</span> <span style="color: #000000; font-weight: bold;">/&gt;</span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;input<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;soap:body</span> <span style="color: #000066;">use</span>=<span style="color: #ff0000;">&quot;literal&quot;</span> <span style="color: #000000; font-weight: bold;">/&gt;</span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/input<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;output<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;soap:body</span> <span style="color: #000066;">use</span>=<span style="color: #ff0000;">&quot;literal&quot;</span> <span style="color: #000000; font-weight: bold;">/&gt;</span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/output<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/operation<span style="color: #000000; font-weight: bold;">&gt;</span></span></span></pre></td></tr></table></div>

<p>will become:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="xml" style="font-family:monospace;"><span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;operation</span> <span style="color: #000066;">name</span>=<span style="color: #ff0000;">&quot;register&quot;</span><span style="color: #000000; font-weight: bold;">&gt;</span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;soap:operation</span> <span style="color: #000066;">soapAction</span>=<span style="color: #ff0000;">&quot;&quot;</span><span style="color: #000000; font-weight: bold;">/&gt;</span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;input<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;soap:body</span> <span style="color: #000066;">use</span>=<span style="color: #ff0000;">&quot;literal&quot;</span><span style="color: #000000; font-weight: bold;">/&gt;</span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;wsp:PolicyReference</span> <span style="color: #000066;">URI</span>=<span style="color: #ff0000;">&quot;#CalculatorPortBinding_Input_Policy&quot;</span><span style="color: #000000; font-weight: bold;">/&gt;</span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/input<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;output<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;soap:body</span> <span style="color: #000066;">use</span>=<span style="color: #ff0000;">&quot;literal&quot;</span><span style="color: #000000; font-weight: bold;">/&gt;</span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;wsp:PolicyReference</span> <span style="color: #000066;">URI</span>=<span style="color: #ff0000;">&quot;#CalculatorPortBinding_Output_Policy&quot;</span><span style="color: #000000; font-weight: bold;">/&gt;</span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/output<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/operation<span style="color: #000000; font-weight: bold;">&gt;</span></span></span></pre></td></tr></table></div>

<p>NetBeans can also be used to generate the client WSDL files (i.e. &#8220;Calculator.xml&#8221;) and the &#8220;wsit-client.xml&#8221; configuration file. See <a href="http://netbeans.org/kb/61/websvc/client.html">here</a>. </p>
<h2 class="header">testing it out</h2>
<p>Now we needed to generate keystores and client libraries to test the security configuration. Creating keystores for Metro was confusing; apparently they had to be generated with &#8220;SubjectKeyIdentifier&#8221;s as described <a href="http://www.jroller.com/gmazza/entry/using_openssl_to_create_certificates">here</a>. In a nutshell, we used this script:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="bash" style="font-family:monospace;"><span style="color: #666666; font-style: italic;">#!/bin/sh</span>
&nbsp;
<span style="color: #666666; font-style: italic;"># create service keystore</span>
<span style="color: #c20cb9; font-weight: bold;">rm</span> <span style="color: #000000; font-weight: bold;">*</span>.p12 <span style="color: #000000; font-weight: bold;">*</span>.pem <span style="color: #000000; font-weight: bold;">*</span>.jks <span style="color: #000000; font-weight: bold;">*</span>.cer
openssl req <span style="color: #660033;">-x509</span> <span style="color: #660033;">-days</span> <span style="color: #000000;">3650</span> <span style="color: #660033;">-newkey</span> rsa:<span style="color: #000000;">1024</span> <span style="color: #660033;">-keyout</span> servicekey.pem \
    <span style="color: #660033;">-out</span> servicecert.pem <span style="color: #660033;">-passout</span> pass:changeit
openssl pkcs12 <span style="color: #660033;">-export</span> <span style="color: #660033;">-inkey</span> servicekey.pem <span style="color: #660033;">-in</span> servicecert.pem <span style="color: #660033;">-out</span> service.p12 \
    <span style="color: #660033;">-name</span> myservicekey <span style="color: #660033;">-passin</span> pass:changeit <span style="color: #660033;">-passout</span> pass:changeit
keytool <span style="color: #660033;">-importkeystore</span> <span style="color: #660033;">-destkeystore</span> servicestore.jks <span style="color: #660033;">-deststorepass</span> changeit \
    <span style="color: #660033;">-srckeystore</span> service.p12 <span style="color: #660033;">-srcstorepass</span> changeit <span style="color: #660033;">-srcstoretype</span> pkcs12
keytool <span style="color: #660033;">-list</span> <span style="color: #660033;">-keystore</span> servicestore.jks <span style="color: #660033;">-storepass</span> changeit <span style="color: #000000; font-weight: bold;">|</span> <span style="color: #c20cb9; font-weight: bold;">less</span>
keytool <span style="color: #660033;">-exportcert</span> <span style="color: #660033;">-alias</span> myservicekey <span style="color: #660033;">-storepass</span> changeit \
    <span style="color: #660033;">-keystore</span> servicestore.jks <span style="color: #660033;">-file</span> service.cer
keytool <span style="color: #660033;">-printcert</span> <span style="color: #660033;">-file</span> service.cer <span style="color: #000000; font-weight: bold;">|</span> <span style="color: #c20cb9; font-weight: bold;">less</span>
<span style="color: #c20cb9; font-weight: bold;">rm</span> <span style="color: #000000; font-weight: bold;">*</span>.pem <span style="color: #000000; font-weight: bold;">*</span>.p12
&nbsp;
<span style="color: #666666; font-style: italic;"># create client keystore</span>
openssl req <span style="color: #660033;">-x509</span> <span style="color: #660033;">-days</span> <span style="color: #000000;">3650</span> <span style="color: #660033;">-newkey</span> rsa:<span style="color: #000000;">1024</span> <span style="color: #660033;">-keyout</span> clientkey.pem \
    <span style="color: #660033;">-out</span> clientcert.pem <span style="color: #660033;">-passout</span> pass:changeit
openssl pkcs12 <span style="color: #660033;">-export</span> <span style="color: #660033;">-inkey</span> clientkey.pem <span style="color: #660033;">-in</span> clientcert.pem <span style="color: #660033;">-out</span> client.p12 \
    <span style="color: #660033;">-name</span> myclientkey <span style="color: #660033;">-passin</span> pass:changeit <span style="color: #660033;">-passout</span> pass:changeit
keytool <span style="color: #660033;">-importkeystore</span> <span style="color: #660033;">-destkeystore</span> clientstore.jks <span style="color: #660033;">-deststorepass</span> changeit \
    <span style="color: #660033;">-srckeystore</span> client.p12 <span style="color: #660033;">-srcstorepass</span> changeit <span style="color: #660033;">-srcstoretype</span> pkcs12
keytool <span style="color: #660033;">-list</span> <span style="color: #660033;">-keystore</span> clientstore.jks <span style="color: #660033;">-storepass</span> changeit <span style="color: #000000; font-weight: bold;">|</span> <span style="color: #c20cb9; font-weight: bold;">less</span>
keytool <span style="color: #660033;">-exportcert</span> <span style="color: #660033;">-alias</span> myclientkey <span style="color: #660033;">-storepass</span> changeit <span style="color: #660033;">-keystore</span> clientstore.jks \
     <span style="color: #660033;">-file</span> client.cer
keytool <span style="color: #660033;">-printcert</span> <span style="color: #660033;">-file</span> client.cer <span style="color: #000000; font-weight: bold;">|</span> <span style="color: #c20cb9; font-weight: bold;">less</span>
<span style="color: #c20cb9; font-weight: bold;">rm</span> <span style="color: #000000; font-weight: bold;">*</span>.pem <span style="color: #000000; font-weight: bold;">*</span>.p12
&nbsp;
<span style="color: #666666; font-style: italic;"># add certificates to corresponding stores</span>
keytool <span style="color: #660033;">-import</span> <span style="color: #660033;">-noprompt</span> <span style="color: #660033;">-trustcacerts</span> <span style="color: #660033;">-alias</span> myclientkey <span style="color: #660033;">-file</span> client.cer \
    <span style="color: #660033;">-keystore</span> servicestore.jks <span style="color: #660033;">-storepass</span> changeit
keytool <span style="color: #660033;">-import</span> <span style="color: #660033;">-noprompt</span> <span style="color: #660033;">-trustcacerts</span> <span style="color: #660033;">-alias</span> myservicekey <span style="color: #660033;">-file</span> service.cer \
    <span style="color: #660033;">-keystore</span> clientstore.jks <span style="color: #660033;">-storepass</span> changeit
<span style="color: #c20cb9; font-weight: bold;">rm</span> <span style="color: #000000; font-weight: bold;">*</span>.cer</pre></td></tr></table></div>
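
<p>To double-check what the script produced, each store can also be inspected from Java rather than with keytool. The following is a minimal sketch, not part of our original setup: the class name &#8220;KeystoreCheck&#8221; and its &#8220;describe&#8221; helper are hypothetical, and it assumes the store file name and the &#8220;changeit&#8221; password used in the script above. It lists every alias and whether it holds a key or only a trusted certificate:</p>

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.security.KeyStore;
import java.util.Arrays;
import java.util.Collections;

// Hypothetical helper, for illustration only.
final class KeystoreCheck {

    // Returns one "alias=key" or "alias=trusted-cert" line per entry, sorted by alias.
    public static String describe(KeyStore store) throws Exception {
        String[] aliases = Collections.list(store.aliases()).toArray(new String[0]);
        Arrays.sort(aliases);
        StringBuilder out = new StringBuilder();
        for (String alias : aliases) {
            String kind = store.isKeyEntry(alias) ? "key" : "trusted-cert";
            out.append(alias).append('=').append(kind).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Assumed defaults: the client store and password created by the script.
        String path = args.length == 0 ? "clientstore.jks" : args[0];
        KeyStore store = KeyStore.getInstance("JKS");
        try (InputStream in = new FileInputStream(path)) {
            store.load(in, "changeit".toCharArray());
        }
        System.out.print(describe(store));
    }
}
```

<p>If the script ran cleanly, clientstore.jks should show &#8220;myclientkey&#8221; as a key entry and &#8220;myservicekey&#8221; as a trusted certificate, with the roles reversed in servicestore.jks.</p>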

<p>Each resulting store trusts only its paired certificate, so attempts to connect with any other client certificate would fail. We also had to generate the client code, using the following Ant target:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="xml" style="font-family:monospace;"><span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;target</span> <span style="color: #000066;">name</span>=<span style="color: #ff0000;">&quot;ws.import&quot;</span> <span style="color: #000066;">depends</span>=<span style="color: #ff0000;">&quot;build.etc.path&quot;</span><span style="color: #000000; font-weight: bold;">&gt;</span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;taskdef</span> <span style="color: #000066;">name</span>=<span style="color: #ff0000;">&quot;wsimport&quot;</span> <span style="color: #000066;">classpathref</span>=<span style="color: #ff0000;">&quot;build.classpath&quot;</span></span>
<span style="color: #009900;">      <span style="color: #000066;">classname</span>=<span style="color: #ff0000;">&quot;com.sun.tools.ws.ant.WsImport&quot;</span><span style="color: #000000; font-weight: bold;">/&gt;</span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;wsimport</span> <span style="color: #000066;">debug</span>=<span style="color: #ff0000;">&quot;true&quot;</span> <span style="color: #000066;">destdir</span>=<span style="color: #ff0000;">&quot;${build.dir}/classes&quot;</span></span>
<span style="color: #009900;">      <span style="color: #000066;">wsdl</span>=<span style="color: #ff0000;">&quot;${ws.wsdl}&quot;</span> <span style="color: #000066;">keep</span>=<span style="color: #ff0000;">&quot;false&quot;</span> <span style="color: #000066;">package</span>=<span style="color: #ff0000;">&quot;${ws.package}&quot;</span><span style="color: #000000; font-weight: bold;">&gt;</span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;produces</span> <span style="color: #000066;">dir</span>=<span style="color: #ff0000;">&quot;${build.dir}/classes/${ws.build.dir}&quot;</span> <span style="color: #000066;">includes</span>=<span style="color: #ff0000;">&quot;*&quot;</span> <span style="color: #000000; font-weight: bold;">/&gt;</span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/wsimport<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/target<span style="color: #000000; font-weight: bold;">&gt;</span></span></span></pre></td></tr></table></div>

<h2 class="header">making the client flexible</h2>
<p>Finally, we wanted to configure the client to work in our QA and Live environments, each with its own trust store and service endpoint. The &#8220;wsp:Policy&#8221; element at the bottom of the client WSDL (a.k.a. Calculator.xml):</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="xml" style="font-family:monospace;"><span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;wsp:Policy</span> <span style="color: #000066;">wsu:Id</span>=<span style="color: #ff0000;">&quot;CalculatorPortBindingPolicy&quot;</span><span style="color: #000000; font-weight: bold;">&gt;</span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;wsp:ExactlyOne<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;wsp:All<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
      <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;sc:KeyStore</span> ...<span style="color: #000000; font-weight: bold;">/&gt;</span></span>
      <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;sc:TrustStore</span> ...<span style="color: #000000; font-weight: bold;">/&gt;</span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/wsp:All<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/wsp:ExactlyOne<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/wsp:Policy<span style="color: #000000; font-weight: bold;">&gt;</span></span></span></pre></td></tr></table></div>

<p>can be modified to look for a properties file:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="xml" style="font-family:monospace;"><span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;wsp:Policy</span> <span style="color: #000066;">wsu:Id</span>=<span style="color: #ff0000;">&quot;CalculatorPortBindingPolicy&quot;</span><span style="color: #000000; font-weight: bold;">&gt;</span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;wsp:ExactlyOne<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;wsp:All<span style="color: #000000; font-weight: bold;">&gt;</span></span></span><span style="color: #808080; font-style: italic;">&lt;!-- handled by client-security-env.properties --&gt;</span><span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/wsp:All<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/wsp:ExactlyOne<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/wsp:Policy<span style="color: #000000; font-weight: bold;">&gt;</span></span></span></pre></td></tr></table></div>

<p>A &#8220;client-security-env.properties&#8221; file was then required on the classpath, with entries similar to:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="ini" style="font-family:monospace;">keystore.url<span style="color: #000066; font-weight:bold;">=</span><span style="color: #660066;">etc/clientstore.jks</span>
keystore.type<span style="color: #000066; font-weight:bold;">=</span><span style="color: #660066;">JKS</span>
keystore.password<span style="color: #000066; font-weight:bold;">=</span><span style="color: #660066;">changeit</span>
my.alias<span style="color: #000066; font-weight:bold;">=</span><span style="color: #660066;">myclientkey</span>
truststore.url<span style="color: #000066; font-weight:bold;">=</span><span style="color: #660066;">etc/clientstore.jks</span>
truststore.type<span style="color: #000066; font-weight:bold;">=</span><span style="color: #660066;">JKS</span>
truststore.password<span style="color: #000066; font-weight:bold;">=</span><span style="color: #660066;">changeit</span></pre></td></tr></table></div>
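
<p>A missing or misspelled entry in this file can be hard to spot when switching between QA and Live, so a small pre-flight check may help. This is only a sketch under assumptions: the class &#8220;ClientSecurityEnvCheck&#8221; is hypothetical, and its list of required keys is simply the seven entries shown above, not an authoritative list from Metro.</p>

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

// Hypothetical helper, for illustration only.
final class ClientSecurityEnvCheck {

    // Assumption: these are the seven keys from the example file above.
    static final String[] REQUIRED = {
        "keystore.url", "keystore.type", "keystore.password", "my.alias",
        "truststore.url", "truststore.type", "truststore.password"
    };

    // Returns a comma-separated list of required keys that are absent or blank.
    public static String missingKeys(Properties props) {
        StringBuilder missing = new StringBuilder();
        for (String key : REQUIRED) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                if (missing.length() != 0) {
                    missing.append(',');
                }
                missing.append(key);
            }
        }
        return missing.toString();
    }

    public static void main(String[] args) throws IOException {
        InputStream in = ClientSecurityEnvCheck.class.getClassLoader()
                .getResourceAsStream("client-security-env.properties");
        if (in == null) {
            System.err.println("client-security-env.properties is not on the classpath");
            return;
        }
        Properties props = new Properties();
        try (InputStream stream = in) {
            props.load(stream);
        }
        String missing = missingKeys(props);
        System.out.println(missing.isEmpty() ? "all entries present" : "missing: " + missing);
    }
}
```

<p>Running it with the same classpath as the client reports either that all entries are present or which ones are missing.</p>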

<p>You can also change the endpoint URL when instantiating the service, with code similar to:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="java" style="font-family:monospace;">java.<span style="color: #006633;">net</span>.<span style="color: #003399;">URL</span> url <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> <span style="color: #003399;">URL</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;http://localhost:8080/ws/calculator&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
javax.<span style="color: #006633;">xml</span>.<span style="color: #006633;">namespace</span>.<span style="color: #006633;">QName</span> qname <span style="color: #339933;">=</span> 
    <span style="color: #000000; font-weight: bold;">new</span> QName<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;http://calculator.fake.org/&quot;</span>, <span style="color: #0000ff;">&quot;CalculatorService&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
Calculator calc <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> CalculatorService<span style="color: #009900;">&#40;</span>url, qname<span style="color: #009900;">&#41;</span>.<span style="color: #006633;">getCalculatorPort</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></td></tr></table></div>
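
<p>To keep that URL out of the code, it could be read from configuration, with the local address as a fallback. A minimal sketch, assuming a hypothetical property name &#8220;calculator.endpoint&#8221; and a hypothetical helper class &#8220;EndpointConfig&#8221; (neither is part of JAX-WS or Metro):</p>

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.Properties;

// Hypothetical helper, for illustration only.
final class EndpointConfig {

    // Fallback taken from the snippet above.
    static final String DEFAULT_ENDPOINT = "http://localhost:8080/ws/calculator";

    // Returns the configured endpoint, or the local default when the
    // (hypothetical) "calculator.endpoint" property is not set.
    public static URL resolve(Properties props) throws MalformedURLException {
        String address = props.getProperty("calculator.endpoint", DEFAULT_ENDPOINT);
        return URI.create(address).toURL();
    }

    public static void main(String[] args) throws MalformedURLException {
        // Example: java -Dcalculator.endpoint=http://qa.example.org/ws/calculator EndpointConfig
        System.out.println(resolve(System.getProperties()));
    }
}
```

<p>The returned URL can then be passed to the CalculatorService constructor exactly as in the snippet above.</p>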

<h2 class="header">additional options</h2>
<p>We were getting some strange stack traces during the client configuration, caused by references to a <a href="http://schemas.xmlsoap.org/ws/2004/09/mex/">&#8220;MEX&#8221;</a> service. &#8220;MEX&#8221; was enabled by adding this line:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="java" style="font-family:monospace;"><span style="color: #339933;">&lt;</span>endpoint name<span style="color: #339933;">=</span><span style="color: #0000ff;">&quot;sts_mex&quot;</span> url<span style="color: #339933;">-</span>pattern<span style="color: #339933;">=</span><span style="color: #0000ff;">&quot;/ws/calculator/mex&quot;</span>
  implementation<span style="color: #339933;">=</span><span style="color: #0000ff;">&quot;com.sun.xml.ws.mex.server.MEXEndpoint&quot;</span> 
  binding<span style="color: #339933;">=</span><span style="color: #0000ff;">&quot;http://www.w3.org/2003/05/soap/bindings/HTTP/&quot;</span><span style="color: #339933;">/&gt;</span></pre></td></tr></table></div>

<p>to sun-jaxws.xml. Unfortunately, we still got stack traces and had to download and modify &#8220;com/sun/xml/ws/mex/client/MetadataClient.java&#8221;. There was a &#8220;suffixes&#8221; String array that needed to have its elements switched from { &#8220;&#8221;, &#8220;mex&#8221; } to { &#8220;mex&#8221;, &#8220;&#8221; }. We then packaged the resulting class with our client code.</p>
<p>For peace of mind, we also used <a href="http://www.wireshark.org/">Wireshark</a> to verify that the SOAP messages were actually being encrypted.</p>
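<p>The same check can be scripted against a message saved from the capture. This is a rough sketch rather than what we actually ran: the class &#8220;SoapEncryptionCheck&#8221; is hypothetical, and it assumes the message was saved to a file and that body encryption shows up as an &#8220;EncryptedData&#8221; element, in the standard XML Encryption namespace, inside the SOAP Body.</p>

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, for illustration only.
final class SoapEncryptionCheck {

    // W3C XML Encryption namespace.
    public static final String XENC_NS = "http://www.w3.org/2001/04/xmlenc#";

    // True when the first SOAP Body (any SOAP version) contains an EncryptedData element.
    public static boolean bodyIsEncrypted(Document doc) {
        NodeList bodies = doc.getElementsByTagNameNS("*", "Body");
        if (bodies.getLength() == 0) {
            return false;
        }
        Element body = (Element) bodies.item(0);
        return body.getElementsByTagNameNS(XENC_NS, "EncryptedData").getLength() != 0;
    }

    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for namespace-based lookups
        // args[0]: path to the saved SOAP message (an assumption of this sketch)
        Document doc = factory.newDocumentBuilder().parse(new File(args[0]));
        System.out.println(bodyIsEncrypted(doc) ? "body is encrypted" : "body is plain text");
    }
}
```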
]]></content:encoded>
			<wfw:commentRss>http://openinit.com/c/2010/03/31/complete-ws-security-example/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

