Thursday, September 6, 2012

Open-Source and the 80/20 Rule

I have any number of articles that I have drafted and never published.  Why?  Because sometimes it is cathartic to rant privately and get over it.  Sometimes, though, it needs to be said.  This is one of those times...

As a consultant, I typically work with organizations that have very high system availability and resiliency requirements or SLAs.  These organizations are in the top 20, 10 or even 1 percent of the market: enterprise organizations that are striving for some level of zero downtime, immediate recovery and zero data loss on high-volume systems.  These same organizations are also starting to embrace Free and Open-Source Software (FOSS) to replace their current Commercial Off-The-Shelf (COTS) products.  And after they have completed 90% of their new project using FOSS, I am typically called in and asked to solve the following: How do I make this "free" stuff meet my current SLA needs?

My answer is typically "you can't".

How did we get here?

It usually starts with a developer or architect who has been given the green light to develop a new application with FOSS.  They read articles and see documented capabilities that will help them avoid writing large amounts of code to gain the same functionality.  They download the product, run through some demos and, bam, it delivers as expected.  Then they start the prototype...

The prototype is done in a vacuum, many times on a desktop, an environment that can in no way replicate the daily stress of production.  Then this person demonstrates the prototype to management, extrapolates some numbers and gets the green light.  And the project takes off.

Then, after months of development, they drop their newly minted product into a test environment and it doesn't come close to failing over fast enough, recovering fast enough or performing fast enough.  They tweak and turn knobs, and then they call.

"This product is awful.  Come make this work!"

I typically arrive and find that there is little I can do for them in the short term.  You know, turn a knob and all is well.  Unfortunately, the issue is usually either a misapplication of the capability or a capability that just isn't working as expected.  So while it has shaved significant amounts of time off of the development cycle, the capability is now under tremendous stress and cracks begin to show.  Why did it work in my prototype and not in production?  Typically it is because you are using a product that has been written by the masses, for the masses: the other 80 percent of the market.

That doesn't mean it won't work.  It is just important to keep in mind that FOSS projects are almost always on the bleeding edge of their given scope.  They are in a perpetual state of forward motion, and as new capabilities are added it takes time before they are fully baked.  Remember, someone out there with the best of intentions and a genuine desire to help has added that handy new widget, one that you fully expect to make a living on.  Sometimes it is a single use case, sometimes it is a full-blown API.  If it is a new capability, hopefully someone other than yourself needed it and went through the pains of hardening it for those in the top 20 percent.  If not, it is now up to you and/or me.  And you should never assume.

So how do you avoid this scenario?

Get Educated

Don't just read some articles and run through some demos.  Download the sources and familiarize yourself with the unit tests that cover the capability you were planning on using.  There are times you will find that what is being tested is not completely in line with what is documented.  Again, FOSS is in a constant state of motion so there are going to be times when what is on paper and what is in the codebase don't align.

Get Involved

Are all your use cases covered?  If not, write some tests and try them out.  Make sure the capability works the way you were expecting.  And when you are done, offer those test cases back to the community so others don't have to go through what you just did.  Also, with all those new test cases you have just written, if you find an issue, report it and follow through.  That last part is so important.  I see so many people jump in and say they have an issue, or open up a Jira ticket, and then just disappear.  There is no indication as to whether the problem was solved or not.  I know it takes time, but you chose to use the free product.  So don't be surprised when you are expected to give back.
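
To make that concrete, here is a minimal sketch in Python of a test case that checks a failover time against an SLA budget.  Everything in it is an assumption for illustration: `FakeBroker` is a hypothetical stand-in for whatever component you are evaluating, and `measure_recovery_seconds` and `meets_failover_sla` are made-up helper names, not part of any FOSS project mentioned here.  A real test would kill and probe the actual product in an environment that resembles production.

```python
import time
from typing import Callable


def measure_recovery_seconds(
    is_healthy: Callable[[], bool],
    timeout_s: float,
    poll_interval_s: float = 0.01,
) -> float:
    """Poll is_healthy() until it returns True and return the elapsed seconds.

    Raises TimeoutError if the component has not recovered within timeout_s.
    """
    start = time.monotonic()
    while True:
        if is_healthy():
            return time.monotonic() - start
        if time.monotonic() - start >= timeout_s:
            raise TimeoutError(f"no recovery within {timeout_s} s")
        time.sleep(poll_interval_s)


class FakeBroker:
    """Hypothetical stand-in for the real component under test.

    After kill(), it reports unhealthy until recovers_after_s seconds pass.
    """

    def __init__(self, recovers_after_s: float) -> None:
        self._recovers_after_s = recovers_after_s
        self._killed_at = None

    def kill(self) -> None:
        self._killed_at = time.monotonic()

    def is_healthy(self) -> bool:
        if self._killed_at is None:
            return True
        return time.monotonic() - self._killed_at >= self._recovers_after_s


def meets_failover_sla(broker: FakeBroker, sla_s: float) -> bool:
    """Kill the broker and report whether it recovers within the SLA budget."""
    broker.kill()
    try:
        elapsed = measure_recovery_seconds(broker.is_healthy, timeout_s=sla_s)
    except TimeoutError:
        return False
    return elapsed <= sla_s


if __name__ == "__main__":
    # A broker that recovers in 50 ms against a 1 s budget passes;
    # one that needs 500 ms against a 50 ms budget does not.
    print(meets_failover_sla(FakeBroker(recovers_after_s=0.05), sla_s=1.0))
    print(meets_failover_sla(FakeBroker(recovers_after_s=0.5), sla_s=0.05))
```

The point of the sketch is that the SLA number is written into the test itself, so a capability that "works" in a demo but recovers too slowly shows up as a failure long before production.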

Get Help

Finally, get those with experience involved early.  Don't be afraid to ask questions on the mailing lists.  You will get answers from a great many qualified folks who love to help.  It's in their nature.  They spend their free time writing software to give away for free.  That being the case, I bet they would love to sit and talk with you about it.

There are also commercial organizations that provide services and support for all sorts of FOSS products.  They have highly technical folks who are specialized in FOSS and how to get the best out of it.  Spending a week with one of them in the early stages of your application may save you many weeks or months of issues later.

Having worked with a great number of developers on open-source projects like ActiveMQ, Camel, CXF, ServiceMix and Karaf, I have gotten to know these folks, and one thing should be understood: they are very good at what they do.  But there are only so many of them out there and so many use cases to cover.  They need you to be involved to deliver that first-class product.

Besides, it is in your own best interest.  Your application may depend on it.
