
David Rice

Mingle, Automated Testing, and Cruise

The Mingle team is absolutely crazy about testing and automation. Outside of testing installer integrity and production data migrations, we've totally replaced the manual regression test phase of the release cycle with a massive suite of automated acceptance tests. I'm mostly going to write about how the Mingle team has overcome some of the difficulties teams encounter when working with large acceptance suites, but, before I do that, I want to talk about our attitude towards testing. Why? Because without the right approach to testing, automation is pretty much useless.

Mingle testers start working with new features even before they are built. They participate in the requirement and scope discussions. When development starts, they sit right there with the developers, analysts, and product manager. They do not go up to the 2nd floor to work on 327 Excel worksheets full of test matrices based upon some pre-written specification. Rather, they start testing the actual feature as soon as development begins. This allows our testers to be extremely well informed when they test features. They're actually talking to the team and they're using the development builds. They have tremendous insight into what's most important and what's most likely to break. They explore the right areas of the application in all the right ways. This allows them to write the correct automated tests. It's one thing to say you automate tests; it's quite another to build a robust, well-organized, well-targeted automated suite that actually gives the entire team confidence. We've done that.

So, back to Mingle's acceptance suite... We use Selenium RC, with tests written in Ruby. At last check, the suite was over 31,000 LOC and required over 12 hours of execution time for a given platform. Multiply that by the number of platforms we support (database, operating system, Java version, ...) and you see a real problem. How do we get feedback to the team in a timely fashion? If it takes a day to get feedback on a commit, the CI (Continuous Integration) cycle is completely broken. The team will stop authoring and maintaining tests. This is always the rub with a large end-to-end acceptance suite. It is slow. The team stops caring. CI practices break down.

The first step for us to get faster feedback was to break the tests up into smaller suites that could be run in parallel. This presented two immediate problems. First, how would we maintain the suites? Second, what CI tool would be able to run the suites in parallel and then roll them up into a single result?

The suite maintenance problem we knew how to tackle. One of our testers had already implemented a means of tagging tests such that we could associate our tests with specific parts of the domain. This made it possible for a developer or tester to run small, targeted sets of tests that correlated to a specific feature. So long as we appropriately tag tests we will always have nice options for grouping tests. To extend this concept to our CI build, we just wrote a little bit of rake code that would take advantage of those same tags to build sensible suites at runtime.
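
As a rough illustration of the tagging idea, here is a minimal Ruby sketch. It is not the Mingle team's actual rake code: the file names, the tag names, and the helpers `tests_for` and `ci_suites` are all hypothetical. It assumes each test carries a list of domain tags and, for the CI build, that each test is assigned to the suite named by its first tag so no test runs twice.

```ruby
# Hypothetical registry: each acceptance test file mapped to its domain tags.
# The file and tag names are invented for illustration.
TAGGED_TESTS = {
  'card_edit_test.rb'      => %w[cards],
  'card_list_view_test.rb' => %w[cards views],
  'project_admin_test.rb'  => %w[admin],
  'user_profile_test.rb'   => %w[admin users]
}.freeze

# Targeted local run: every test that carries at least one of the wanted tags.
def tests_for(tagged_tests, *wanted)
  tagged_tests.select { |_file, tags| (tags & wanted).any? }.keys
end

# CI run: assign each test to exactly one suite, named after its first tag
# (an assumption of this sketch), so the suites can run in parallel
# without any test executing twice.
def ci_suites(tagged_tests)
  tagged_tests.keys.group_by { |file| tagged_tests[file].first }
end

ci_suites(TAGGED_TESTS).each do |tag, files|
  puts "#{tag}: #{files.join(', ')}"
end
```

A rake task could wrap `ci_suites` and define one target per suite, which is roughly the shape of what is described above.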

Now, how to run these suites in a CI build? Initially, for each test platform, we set up 8 CC (open source CruiseControl) boxes, each running a different acceptance target. Each of those boxes was writing results to a shared CC Dashboard. 12 hours of tests split across 8 boxes was getting us feedback in approximately 90 minutes for the entire acceptance suite.
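
The arithmetic behind that number is worth spelling out. The sketch below uses the 12 hour and 8 box figures from above; the uneven split is invented for illustration. Feedback time for suites running in parallel is set by the slowest suite, so 90 minutes is the best case for an evenly balanced split.

```ruby
# Wall-clock feedback time for suites run in parallel is the slowest suite.
def feedback_minutes(suite_minutes)
  suite_minutes.max
end

total_minutes = 12 * 60   # total execution time, from the post
boxes = 8                 # number of CC boxes, from the post

# Best case: the work divides evenly across the boxes.
ideal = total_minutes / boxes.to_f
puts "ideal feedback: #{ideal} minutes"

# Hypothetical uneven split of the same 720 minutes across 8 suites.
uneven = [120, 100, 95, 90, 85, 80, 80, 70]
puts "uneven feedback: #{feedback_minutes(uneven)} minutes"
```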

This was certainly a huge improvement, but, as you might imagine, this was far from ideal. The different CC projects were building independently and never really rolled up into a single result. There was also no guarantee that the different projects ever ran against the same source. There was quite a bit of detective work when determining a green revision or exactly when the application broke. And we were maintaining a separate CC project (along with its very own server) for each individual acceptance target. And all of this multiplied by the number of platforms we were targeting. Clearly, our CI infrastructure was not good enough.

Around 2 months ago, the Mingle team moved its CI build to a new internal Studios build grid that was running early versions of our newest product, Cruise. Given the challenges of the existing infrastructure, you can imagine that the team was quite excited to move to the new product. Today, I can confidently state that Cruise changes the game in a big way for teams running large acceptance suites. Cruise's concepts of pipelines, stages, and jobs allowed us to implement our CI build exactly as we see it. There are no kludges.

To start, we defined a pipeline where our 'Acceptance' stage only runs after the 'Units and Functionals' stage has passed. Cruise makes this part quite easy. We do not need to hack around with special build targets and tasks outside of the tool to implement the pipeline concept. Cruise supports the pipeline as a first-class concept. Better yet, in our 'Acceptance' stage, we can specify any number of the smaller, split-out acceptance suites as the jobs from which that stage is composed. Cruise manages executing those jobs in parallel and rolls up the results to a single result at the build plan level. For a given pipeline run, everything builds against the same source. We're no longer wasting time manually rolling up a bunch of loosely-tied results. Cruise is managing and reporting on our build exactly as we need.
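
To show the shape of the pipeline, stage, and job concepts, here is a small Ruby model. It is a toy under stated assumptions, not Cruise's configuration format or API: `Job`, `Stage`, `run_pipeline`, the job names, and the revision label are all hypothetical. It captures three behaviours described above: jobs within a stage run in parallel, their results roll up to one stage status, and a later stage runs only if the earlier one passed, all against a single revision.

```ruby
# Hypothetical model of pipelines, stages, and jobs. Not Cruise's API.
Job = Struct.new(:name, :work) do
  # A job passes when its work returns a truthy value without raising.
  def run
    work.call ? :passed : :failed
  rescue StandardError
    :failed
  end
end

Stage = Struct.new(:name, :jobs) do
  # Run every job in its own thread, then roll the results up to one status.
  def run
    results = jobs.map { |job| Thread.new { [job.name, job.run] } }
                  .map(&:value)
                  .to_h
    status = results.values.all? { |r| r == :passed } ? :passed : :failed
    { status: status, jobs: results }
  end
end

# Run stages in order against one revision; stop at the first failed stage.
def run_pipeline(revision, stages)
  report = { revision: revision, stages: {} }
  stages.each do |stage|
    result = stage.run
    report[:stages][stage.name] = result
    break if result[:status] == :failed
  end
  report
end

stages = [
  Stage.new('Units and Functionals', [Job.new('units', -> { true })]),
  Stage.new('Acceptance', [
    Job.new('acceptance_cards', -> { true }),
    Job.new('acceptance_admin', -> { true })
  ])
]

report = run_pipeline('r1234', stages)
report[:stages].each { |name, result| puts "#{name}: #{result[:status]}" }
```

If any job in 'Units and Functionals' fails in this model, the 'Acceptance' stage never starts, and the report still names the one revision that was built.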

[Figure: Pipelines in Cruise. See which stages have failed in the project pipeline.]

We've also placed a manually triggered 'Installers' stage after the 'Acceptance' stage in our pipeline. Any team member can look for a label where the 'Acceptance' stage has passed and kick off the 'Installers' stage. This will make those installers available to the entire team.

The Cruise Grid is a huge advantage as well. Agent resource tagging allows us to more efficiently share the agent hardware across target platforms, e.g., if an agent is tagged as providing 'Ubuntu' and 'Firefox 3' it can pick up any scheduled job that requires those resources. And when the suites grow larger, or we want to test more platforms, it's fairly trivial for us to add agents to the grid and increase our capacity.
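
The matching rule behind resource tagging can be sketched in a few lines of Ruby. This is an illustration only, not how Cruise implements scheduling: `Agent`, `ScheduledJob`, `eligible?`, the agent names, and every resource other than 'Ubuntu' and 'Firefox 3' are hypothetical. The assumption is that an agent may pick up a job when it provides every resource the job requires.

```ruby
require 'set'

# Hypothetical types for illustration. Not Cruise's API.
Agent = Struct.new(:name, :resources)
ScheduledJob = Struct.new(:name, :required)

# An agent can take a job when it provides every resource the job requires.
def eligible?(agent, job)
  job.required.to_set.subset?(agent.resources.to_set)
end

def eligible_agents(agents, job)
  agents.select { |agent| eligible?(agent, job) }
end

agents = [
  Agent.new('agent-1', ['Ubuntu', 'Firefox 3']),
  Agent.new('agent-2', ['Windows', 'IE 7']),
  Agent.new('agent-3', ['Ubuntu', 'Firefox 3', 'PostgreSQL'])
]
job = ScheduledJob.new('acceptance_cards', ['Ubuntu', 'Firefox 3'])

puts eligible_agents(agents, job).map(&:name).join(', ')
```

Adding capacity in this model is just appending another `Agent` with the right resources to the list.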

The Mingle team is extremely excited about the new Cruise product. This post only begins to dive into what Cruise offers. As you move forward with the pipeline concept and manual gates, it's quite easy to see how Cruise can extend typical CI activities all the way into the deployment phase. And the dynamic build properties and RESTful API will allow emergent uses we've yet to even envision.


Comments

  1. Lance Fisher
    July 30th, 2008 @ 10:15 PM

    Cruise looks really cool. I like that it incorporates the deployment into the build process. I downloaded it yesterday, and I'm planning on trying it out soon. I'm currently using MbUnit for my .NET unit tests, and running them on my CI server. Does Cruise support running MbUnit tests? What .NET test suites does it support?

  2. David
    July 31st, 2008 @ 08:17 AM

    Lance - Cruise supports a nant builder so you should be fine. If you're not using nant, Cruise also supports an exec (cmd line) builder.

  3. Bryan
    August 3rd, 2008 @ 03:15 AM

    Great post, and it's encouraging to hear that creating this size of automated testing suite is possible. Thanks for sharing this level of detail. Can you shed any light on how this has increased the speed of getting new features to market and reduced overall defects after a release?

  4. Philippe Hanrigou
    August 3rd, 2008 @ 05:00 AM

    Hi David, Sounds cool. Just curious though, since you are only parallelizing your Selenium test runs at a very high level (the test suite split by categories) there is a limit to how much you can parallelize these runs and leverage your hardware/agents. 90 minutes is a great achievement, but it still sounds like quite a long time for agile feedback (how many new checkins by the Mingle team in 90 minutes? ;-) Did you ever experiment with more aggressive parallelization down to the test level? For instance using Selenium Grid and DeepTest distributed test runner? (http://selenium-grid.openqa.org, http://deep-test.rubyforge.org) I bet that even with the same hardware you could cut down the time of your build even further! Cheers, Philippe hanrigou

  5. David
    August 12th, 2008 @ 07:05 PM

    @Bryan - I'm not sure I can quantify any increase in productivity as that is always difficult, if not futile. However, I can state that the build spends a lot more time in a green state and the developers and testers spend a lot less time fixing it. Perhaps more importantly, the developers no longer see build maintenance as a burden. It's no longer de-motivating. So clearly there is an increase in productivity.

    @Philippe - The team did investigate DTR for units and functionals and found that the pros of optimized parallelization were outweighed by the cons of too many moving parts, too much configuration, and a bit of instability. For one reason or another, DTR required a lot of work to get going. I'm sure other teams have had more positive experiences -- clearly everyone is going to have a different set of variables and issues in this sort of build. You are correct that the Cruise solution is not fully optimizing available hardware, but it's proving to be a much simpler and more stable solution than anything previously attempted. As to how many commits the team makes per 90 minutes... given that it's a fairly small team, and the team is working across 2 time zones that are 9 hours apart, it's actually not as many as you might think. Execution speed is not the sole problem with these types of test suites. Brittleness and instability tend to be massive issues as well. While the team is by no means satisfied with 90 minutes, it does feel it's found a sweet spot with regard to both speed and stability. I will keep you posted on future improvements.

  6. nowdeste
    September 28th, 2008 @ 04:16 PM

    thanks much, guy

Copyright 2008 ThoughtWorks, Inc.