Cloud Scale Testing

‘Cloud Scale Testing’ has two main dimensions: testing whole infrastructures operating at global Cloud scale, and using the Cloud to implement testing practices at an equivalent scale.

Chaos Engineering

In his blog post for DevOps.com, Casey Rosenthal provides a brief introduction to Chaos Engineering and explores the key dynamics of the practice, in particular sharing a key insight: the difference between testing and experimentation. Testing is usually based on a very specific understanding of what is being tested and the anticipated outcomes, whereas, as the name suggests, Chaos Engineering is about experimenting with unknowns.

Netflix offers a great tutorial on this in their 2016 presentation Microservices at Netflix Scale.

What this demonstrates is that testing practices can be applied to the whole IT environment, not just isolated application code. The key to this is the assumption that failures will occur, so the system is engineered with a capacity to rapidly adapt to them. Netflix pioneered and operates an approach of ‘circuit breakers’, implemented through their Hystrix component.
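The circuit-breaker idea behind Hystrix can be sketched in a few lines: after a run of consecutive failures the breaker ‘opens’ and short-circuits calls to a fallback until a cooldown elapses, protecting the rest of the system from a failing dependency. This is a minimal illustrative sketch of the pattern, not the Hystrix API; the class name, thresholds, and timeout values are assumptions.

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: opens after a threshold of consecutive
    failures, then returns the fallback until a cooldown period elapses."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, fallback=None, **kwargs):
        # While open, short-circuit to the fallback until the timeout passes.
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                return fallback
            self.opened_at = None  # half-open: allow one trial call through
            self.failures = 0
        try:
            result = func(*args, **kwargs)
            self.failures = 0  # a success resets the failure count
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()  # trip the breaker
            return fallback
```

The caller supplies a fallback (a cached or degraded response), so a failing downstream service degrades gracefully instead of cascading.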

Cloud guru David Linthicum makes the point that Cloud Native efforts won’t succeed without a suitable test automation capability like this. The Cloud Native QA guide echoes the Netflix philosophy, notably: “It is important to not only design for failure but test for recovery”.

Again, automation of these adaptations is critical, as is ongoing destructive testing of the environment to prove the capability. Netflix implements these tests at very large scale, simulating the failure of entire AWS regions, demonstrated in visual form at 35m:35s.
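The shape of such a destructive test can be sketched as a simple experiment: terminate a random subset of instances, then check whether the survivors can still meet demand (the ‘steady state’ hypothesis of Chaos Engineering). This is an illustrative model only, assuming uniform per-instance capacity; the function name and parameters are not from any specific tool.

```python
import random

def chaos_experiment(instances, capacity_per_instance, demand,
                     kill_fraction=0.3, seed=None):
    """Simulate an outage by 'terminating' a random subset of instances,
    then test whether the survivors can still carry the demand."""
    rng = random.Random(seed)
    kill_count = int(len(instances) * kill_fraction)
    killed = set(rng.sample(instances, kill_count))
    survivors = [i for i in instances if i not in killed]
    remaining_capacity = len(survivors) * capacity_per_instance
    return {
        "killed": sorted(killed),
        "survivors": survivors,
        "hypothesis_holds": remaining_capacity >= demand,
    }
```

A real experiment would inject failures into live infrastructure and observe user-facing metrics, but the structure — inject a fault, then verify the steady-state hypothesis — is the same.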

Cloud Based Testing

Speaking at Cloud Next, the founders of Mabl provide an in-depth review of the second aspect: harnessing the Cloud for enhanced testing capabilities. They talk through deployment across Google Cloud and how this enables large-scale ‘intelligent testing’ features.

Primarily this intelligence refers to the adaptability and automation of high-volume testing functions. For example, tests can adapt to changed forms and code so they don’t break when these changes are made, and data analysis surfaces actionable insights for testers and developers, eliminating much of the repetitive grunt work. All of this can be integrated into the CI/CD workflow so that it happens automatically as part of the build process.
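The ‘adapting to changed forms’ idea can be sketched as a self-healing locator: a test tries an ordered list of selectors, so when the primary one changes the test falls back to an alternative instead of failing. This is a simplified sketch of the general technique, not Mabl’s implementation; the `find_element` helper and the dict-based page model are illustrative assumptions.

```python
def find_element(page, locators):
    """Try an ordered list of locators so a test survives selector changes;
    returns the first matching element and the locator that worked."""
    for locator in locators:
        element = page.get(locator)  # page modeled as a dict for the sketch
        if element is not None:
            return element, locator
    raise LookupError("no locator matched: %r" % (locators,))
```

A real implementation would rank candidate locators by historical reliability and record which one matched, feeding that data back to testers as an insight.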

At the start of the talk Dan Belcher sets the scene for why these new capabilities are needed:

“Development cycles are getting shorter and new features are coming faster than ever… There simply isn’t enough time to test. The old ways that we had thought about QA are simply not holding up anymore due to continuous development and delivery. More and more releases are coming our way, and the challenge to us as an industry is how we can achieve that velocity while still being able to deliver high-quality products and build our users’ trust.”

His presentation sets out the key improvements needed for software testing in the Cloud era: the ability to deliver tests that:

  • Adapt seamlessly to rapidly changing applications.
  • Run in the cloud with exceptional scale and speed.
  • Produce actionable insights from test output.
  • Integrate with automated CI/CD pipelines.