Risk lives in the details…

Risks to your business are found in the details, so get deep into them with your teams. I cannot bang on about this enough, and I am consistently surprised by how little it’s talked about in the software testing industry.

I still read EVERYTHING produced by a project and review the controls, processes, culture, and anything else I can get my hands on to see how its approach to testing is putting the business at risk.

I’m currently working on an enterprise AI transformation, and between the rush to get things done, GenAI mania, and a healthy dose of FOMO, a fair bit of risk management is being missed or glossed over. No one thing is at fault, but all of them working in concert are punching holes in risk management.

AI systems are inherently complex, and with many producing non-deterministic output they are very difficult (if not impossible) to test, so a clear-eyed view of risk is paramount.

If you’re not familiar with “How Complex Systems Fail” by Richard Cook, check it out today, as it’s a goldmine of ideas for reviewing systemic risk. Here are some examples you can use as a lens to view your work:

  • Catastrophe requires multiple failures – single point failures are not enough.

The array of defences works. System operations are generally successful. Overt catastrophic failure occurs when small, apparently innocuous failures join to create opportunity for a systemic accident. Each of these small failures is necessary to cause catastrophe but only the combination is sufficient to permit failure.

  • Complex systems contain changing mixtures of failures latent within them.

The complexity of these systems makes it impossible for them to run without multiple flaws being present. Because these are individually insufficient to cause failure they are regarded as minor factors during operations. Eradication of all latent failures is limited primarily by economic cost but also because it is difficult before the fact to see how such failures might contribute to an accident.

  • Complex systems run in degraded mode.

A corollary to the preceding point is that complex systems run as broken systems. The system continues to function because it contains so many redundancies and because people can make it function, despite the presence of many flaws. System operations are dynamic, with components failing and being replaced continuously.
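The arithmetic behind “each small failure is necessary but only the combination is sufficient” is worth seeing concretely. A minimal Python sketch (the layer count and failure rates here are made-up illustrations, not figures from Cook’s paper) shows how independently rare defence misses compound into a far rarer catastrophe:

```python
import random

# Hypothetical defence layers: each value is the per-operation chance
# that one layer of defence fails to catch a fault. These numbers are
# illustrative assumptions, not measurements.
LAYER_FAILURE_RATES = [0.05, 0.02, 0.01]
TRIALS = 1_000_000

random.seed(42)  # reproducible runs

catastrophes = 0        # every layer failed at once
any_layer_missed = 0    # at least one layer failed (usually harmless)

for _ in range(TRIALS):
    misses = [random.random() < p for p in LAYER_FAILURE_RATES]
    if any(misses):
        any_layer_missed += 1
    if all(misses):  # only the combination is sufficient for catastrophe
        catastrophes += 1

print(f"at least one layer missed:      {any_layer_missed / TRIALS:.4f}")
print(f"all layers missed (catastrophe): {catastrophes / TRIALS:.6f}")
```

Under independence the catastrophe rate is just the product of the layer rates (0.05 × 0.02 × 0.01 = 0.00001), which is why small individual failures look innocuous day to day: the system absorbs them until they line up.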

Good questions to be answered by test management…

  • Does our test approach look at systemic risk and where multiple individual vulnerabilities could compound into catastrophic failure?
  • Have we accounted for changes that fall outside our test plan but could trigger latent issues?
  • Do we understand where our systems are ALREADY broken (defect data), and where weaknesses could exist that are not readily apparent through our test strategy?

Good luck and keep testing!

