There’s a conversation about software testing we need to have despite the noise and hype around AI productivity gains, coverage expansion, and “AI-assisted quality”:
What kind of reality are we shaping when GenAI writes our tests?
As I’ve said before, testing isn’t neutral. When, under the guise of progress, we outsource testing to systems that don’t understand intent, context, or consequence, we introduce new risks that look like progress until they aren’t.
What we choose to test (and not to test) helps determine what we believe to be true, and our choices become part of the mental model the business uses to make decisions. As projects evolve, those beliefs harden into assumptions with embedded risks.
Our approach to testing shapes reality.
But because most tests reflect only assumptions rather than risks, the reality testing shapes about system quality is often inaccurate and generates unwarranted confidence. And confidence should be fragile; it should be grounded in and bounded by objective reality.
Good testing, then, is not about confidence. It’s about continual re-alignment with reality to inform your business decisions. Additionally, volumes of shallow checks reduce a team’s ability to notice signals of systemic failure, which should further erode confidence but typically has the reverse effect.
Now add to that equation the mother of all automated-test-generator-algorithmic-defect-predict-O-nators and I fear where we’re heading…
In the era of continuous everything and automate-first, we generate more tests than ever, but I still find very few test teams that understand what reality their tests represent or whether they’re still meaningful. When tests accumulate faster than teams can understand them, fragility increases within our tests, systems, and teams.
A useful metaphor I use when talking to management about testing is comparing it to a lens. A lens can focus your attention on one thing and blur other details, but ultimately it’s a way of looking at the system you’re building and what matters to your business.
When test generation becomes cheap and abundant, the shaping force of those tests – the lens through which we view quality – increases dramatically.
What I’m seeing from most of the marketing and demos for GenAI testing tools is a lens that focuses volumes of tests on what’s easy to see while blurring what matters to human experience: user workflows, organizational incentives, operational stress, and socio-technical interactions.
If your organization is considering (as most are) using GenAI to generate tests, here are some considerations I believe should be at the top of your evaluation process:
- Do they create tests explicitly and visibly tied to user harm, business impact, operational failure, or regulatory exposure?
- Do they build entropy into their tests, with assumptions that expire as confidence decays over time unless renewed by new evidence? (A sketch of this idea follows this list.)
- What agency do testers have over test selection and execution in relation to coverage and analysis?
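
To make the second consideration concrete, here is a minimal sketch of what “assumptions that expire” could look like in practice. It uses Python and pytest; the `assumption` helper, the review date, and the `apply_discount` stand-in are all hypothetical illustrations for this post, not a feature of any tool or vendor.

```python
from datetime import date

import pytest


def assumption(description: str, review_by: str) -> None:
    """Fail the test once the belief it encodes passes its review date."""
    if date.today() > date.fromisoformat(review_by):
        pytest.fail(
            f"Assumption expired on {review_by}: {description!r}. "
            "Renew it with fresh evidence or retire the test."
        )


def apply_discount(total: float, code: str) -> float:
    # Stand-in for the system under test, purely for illustration.
    return total * 0.9 if code == "SAVE10" else total


def test_checkout_applies_discount():
    # State the belief this test encodes, and give it a shelf life.
    assumption(
        "Discount rules live in the pricing service, not the checkout UI",
        review_by="2026-06-30",
    )
    assert apply_discount(total=100.0, code="SAVE10") == pytest.approx(90.0)
```

The design choice is the point: each test names the belief it depends on and carries an expiry date, so a stale assumption fails loudly and demands renewal instead of silently lending confidence.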
We’re just at the beginning of a new chapter in the long story of software quality and testing, and I believe that how we frame the problem and how we view reality will be a key differentiator in the success or failure of our business.
Testing has always shaped reality, and GenAI has the potential to have a powerful warping effect on that view, along with the propensity to move risk into areas we no longer look at due to volume, noise, and over-confidence from our leaders and vendors.
Daniel Kahneman famously stated, “Overconfident professionals sincerely believe they have expertise, act as experts, and look like experts. You will have to struggle to remind yourself that they may be in the grip of an illusion.”
Clarity is required to dispel that illusion.
Clarity on intent, function, and the reality we’re shaping.