Sunday, May 06, 2012

Google Automation of my QA Strategy

I have consistently found in my own companies and Openview Venture Partners companies that I coach that carefully prioritized implementation of acceptance tests produces higher quality faster than anything I have seen.

An Open-E in Poland they implemented continuous integration and immediately increased the velocity of over 50 developers by 20%. The next step was to improve the quality of their open source network storage server software so it could be released faster with fewer support problems. The software had to run on 80 different hardware platforms and needed thorough acceptance testing by a quality assurance team after development had completed a release. This acceptance test phase took 4-6 weeks per release.

I told them to proritize the problems encountered by the quality assurance team and implement tests in the build process to prevent these problems from ever appearing again. After 3 weeks of work by one developer there were 130 tests were running in the build process. The next release cycle cut quality assurance to 2 weeks and reduced support calls from the field by 50% saving millions of euros of development time and generating millions of euros in increased sales. Nothing I have seen works better than this.

Google has developed a brilliant automated strategy called "Bug Prediction" which does essentially the same thing. Seems like every team should be doing something like this.


Bug Prediction at Google

What's the problem?
Here at Google, we have thousands of engineers working on our code base every day. In fact, as previously noted, 50% of the Google code base changes every month. That’s a lot of code and a lot of people. In order to ensure that our code base stays healthy, Google primarily employs unit testing and code review for all new check-ins. When a piece of code is ready for submission, not only should all the current tests pass, but new tests should also be written for any new functionality. Once the tests are green, the code reviewer swoops in to make sure that the code is doing what it is supposed to, and stamps the legendary “LGTM” (Looks Good To Me) on the submission, and the code can be checked in.

However, Googlers work every day on increasingly more complex problems, providing the features and availability that our users depend on. Some of these problems are necessarily difficult to grapple with, leading to code that is unavoidably difficult. Sometimes, that code works very well, and is deployed without incident. Other times, the code creates issues again and again, as developers try to wrestle with the problem. For the sake of this article, we'll call this second class of code “hot spots”. Perhaps a hot spot is resistant to unit testing, or maybe a very specific set of conditions can lead the code to fail. Usually, our diligent, experienced, and fearless code reviewers are able to spot any issues and resolve them. That said, we're all human, and sneaky bugs are still able to creep in. We found that it can be difficult to realize when someone is changing a hot spot versus generally harmless code. Additionally, as Google's code base and teams increase in size, it becomes more unlikely that the submitter and reviewer will even be aware that they're changing a hot spot.

In order to help identify these hot spots and warn developers, we looked at bug prediction. Bug prediction uses machine-learning and statistical analysis to try to guess whether a piece of code is potentially buggy or not, usually within some confidence range. Source-based metrics that could be used for prediction are how many lines of code, how many dependencies are required and whether those dependencies are cyclic. These can work well, but these metrics are going to flag our necessarily difficult, but otherwise innocuous code, as well as our hot spots. We're only worried about our hot spots, so how do we only find them? Well, we actually have a great, authoritative record of where code has been requiring fixes: our bug tracker and our source control commit log! The research (for example, FixCache) indicates that predicting bugs from the source history works very well, so we decided to deploy it at Google.


For details click here ...

No comments: