CI was green. Tests were passing. PRs were merging. The system was, by all outward appearances, humming along perfectly. Except it wasn’t. Not even close. And the logs? Utterly useless in revealing the rot.
This is the tale of a JavaScript to TypeScript migration that, instead of streamlining things, managed to double the test runs without a single visible error. A cautionary episode for anyone who blindly trusts that green checkmark. We all assumed TypeScript compilation would add a little overhead. Normal. A speed bump, not a hidden disaster.
But then came the accidental discovery. Near the end of the migration, as old .js files were scrubbed, a peculiar thing happened. The test count plummeted by nearly half. From roughly 240 tests down to a mere 120. This wasn’t a minor discrepancy; it was a structural implosion. How could deleting supposedly dead files halve the workload? It couldn’t, unless those files had been running all along.
Suddenly, performance debugging took a backseat to a far more insidious problem: duplicated reality. Playwright, in its infinite wisdom, was happily picking up both .spec.js and .spec.ts files. Every single test was executing twice: the same setup, the same assertions, the same teardown. A silent, invisible doubling. The real kicker? CI painted a picture of health. Every run stayed green, and the slowing runtime read as ‘normal post-migration overhead.’ A perfectly plausible, utterly false narrative.
The culprit was deceptively simple. The playwright.config.ts file lacked an explicit testMatch directive. Playwright’s default glob pattern, bless its heart, happily slurped up both .js and .ts files. Everything. The fix? A single, elegant line:
testMatch: ['**/*.spec.ts']
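In context, that line might sit in a config like the one below. This is a minimal sketch, not the project’s actual file: the testDir value is an assumption, and any other options the real config carried are left out.

```typescript
// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Assumed directory; point this at wherever the specs actually live.
  testDir: './tests',
  // Collect only TypeScript specs, so leftover .spec.js twins are ignored.
  testMatch: ['**/*.spec.ts'],
});
```

Running npx playwright test --list afterwards prints the tests Playwright would collect without executing them, which is a quick way to confirm that each spec now shows up once.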
Of course, getting to that one line was a protracted exercise in navigating a landscape of false positives. It’s a stark reminder of one basic fact:
CI does not validate correctness. It validates execution. Period. A green CI pipeline simply means nothing crashed during the run. It offers no assurance that the right tests ran, in the correct quantity, with the intended environmental assumptions. It’s a hollow victory.
This particular mess could have been flagged by a simple discovered-tests counter within CI. A deviation from the expected number would trigger an explicit failure, rather than a deafening silence. That counter is now standard procedure. The faulty configuration that allowed this duplication is also preserved, a tangible artifact for learning and reproduction.
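A counter like that can be very small. The sketch below is one possible shape, not the author’s actual implementation: checkTestCount and its tolerance parameter are hypothetical names, and the discovered count is assumed to come from the runner (for example, by parsing the output of npx playwright test --list), which is left out here.

```typescript
// Outcome of comparing the discovered test count with the expected one.
interface CountCheck {
  ok: boolean;
  message: string;
}

// Compare the number of discovered tests against an expected baseline.
// `tolerance` is the allowed absolute difference before the check fails.
function checkTestCount(
  discovered: number,
  expected: number,
  tolerance: number = 0,
): CountCheck {
  const delta = discovered - expected;
  if (Math.abs(delta) <= tolerance) {
    return { ok: true, message: `discovered ${discovered} tests, expected ${expected}` };
  }
  const direction = delta > 0 ? 'more' : 'fewer';
  return {
    ok: false,
    message: `discovered ${discovered} tests, expected ${expected} (${Math.abs(delta)} ${direction})`,
  };
}

// Example using the counts from this incident: 240 discovered, 120 expected.
const countResult = checkTestCount(240, 120);
console.log(countResult.ok ? 'OK' : 'FAIL', countResult.message);
```

In a pipeline, a failed check would end the job with a non-zero exit code, so the mismatch shows up as a red build instead of a slightly slower green one.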
Most issues lurking in test systems don’t manifest as outright failures. They slither in as:
- Duplicated execution cycles.
- Performance degradation that’s too subtle to immediately alarm.
- Runner behavior shifts that are completely masked by seemingly unchanged tests.
And critically, none of these have built-in alerts because we simply don’t design our systems to look for them. We design for the obvious crash, not the silent deception.
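Looking for the first item on that list does not take much. The following sketch flags spec files that exist under more than one extension, such as login.spec.js next to login.spec.ts. The function name and the file-name pattern are assumptions made for illustration; the list of paths would come from whatever file listing the project already has.

```typescript
// Group spec files by their path without the final extension and return
// only the groups that contain more than one file (e.g. a .js and .ts twin).
function findDuplicateSpecs(files: string[]): Record<string, string[]> {
  // Matches names like foo.spec.ts, foo.test.js, foo.spec.mjs, foo.spec.tsx.
  const specPattern = /^(.*\.(?:spec|test))\.(?:[cm]?[jt]sx?)$/;
  const byStem: Record<string, string[]> = {};

  for (const file of files) {
    const match = specPattern.exec(file);
    if (!match) continue; // not a spec file, ignore it
    const stem = match[1];
    if (!byStem[stem]) byStem[stem] = [];
    byStem[stem].push(file);
  }

  const duplicates: Record<string, string[]> = {};
  for (const stem of Object.keys(byStem)) {
    if (byStem[stem].length > 1) duplicates[stem] = byStem[stem];
  }
  return duplicates;
}

// Example: one migrated spec whose old .js twin was never deleted.
const twins = findDuplicateSpecs([
  'tests/login.spec.js',
  'tests/login.spec.ts',
  'tests/cart.spec.ts',
]);
console.log(Object.keys(twins));
```

A check like this only catches same-name twins. It says nothing about tests duplicated under different file names, so it complements a count check rather than replacing it.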
The Hidden Deception of Green CI
This situation perfectly encapsulates a fundamental misunderstanding of what CI truly guarantees. The system reported success, but the underlying reality was twice the work for exactly the same coverage. The author’s quote nails it:
“I assumed a slower CI run meant normal post-migration overhead. The runner had been doing twice the work for weeks — silently, without a single warning.”
This isn’t about blame; it’s about systemic flaws. We’ve become so accustomed to the binary of pass/fail that we’ve neglected the spectrum of “pass but is wrong.” The migration itself wasn’t the issue; it was the lack of observability into the test execution process that allowed the problem to fester.
Why Does This Matter for Developers?
Because too many of us treat CI as a black box that magically ensures quality. When that box turns green, we nod and move on. This experience is a harsh wake-up call. It suggests that our test suites, our CI configurations, and our expectations might be fundamentally misaligned. We need to build more explicit checks for the process of testing, not just the outcome of individual tests.
This isn’t just about Playwright or TypeScript. It’s a universal problem in automation. If your CI environment can double its workload silently, what else is it doing without your knowledge? Think about it.
Failure Signature:
- CI status: Green
- Observed runtime: Doubled
- Reported test count: Doubled
- Warnings or errors: Zero
This pattern is the siren song of silent failure. It’s the engineering equivalent of a doctor telling you your vital signs look great while you’re actually experiencing a heart attack. It requires a shift in perspective – from merely executing tests to actively monitoring the execution of tests.
Is Test Automation Still Worth It?
Absolutely. But with a critical caveat: we must evolve how we monitor and validate our automation. This isn’t a unique bug; it’s a common blind spot. The solution lies in augmenting basic execution with checks for unexpected behavior, test count anomalies, and performance drift. The goal isn’t just to run tests, but to ensure the right tests run, correctly, and efficiently.
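Performance drift can be watched in the same spirit. The sketch below compares the current run’s duration with the median of recent runs and flags it when it exceeds a chosen ratio. The function names, the 1.3 default ratio, and the idea of keeping a list of recent durations are all assumptions for illustration, not something the original setup is known to use.

```typescript
// Median of a list of numbers (the list must not be empty).
function median(values: number[]): number {
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0
    ? (sorted[mid - 1] + sorted[mid]) / 2
    : sorted[mid];
}

// Flag the current run when it is slower than the recent median by more
// than `maxRatio`. With no history there is no baseline, so nothing is flagged.
function isRuntimeDrift(
  recentSeconds: number[],
  currentSeconds: number,
  maxRatio: number = 1.3,
): boolean {
  if (recentSeconds.length === 0) return false;
  return currentSeconds > median(recentSeconds) * maxRatio;
}

// Example: recent runs around five minutes, current run at ten.
console.log(isRuntimeDrift([300, 310, 290], 600));
```

A rolling median has a known weakness: a slowdown that creeps in over many runs drags the baseline up with it. During a gradual migration like this one, comparing against a fixed baseline recorded before the migration started may be the safer choice.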
This experience, part of the Silent Failures in Test Automation series, is a vital data point. It forces us to ask: what other subtle deceptions are lurking within our CI pipelines, masquerading as progress?
Frequently Asked Questions
What was the main problem the developer encountered?
The developer found that their test suite was running twice as many tests as intended after a migration, doubling CI runtime without any reported errors. This was due to Playwright’s default configuration picking up both .js and .ts files simultaneously.
How did the developer fix the duplicated test runs?
The fix involved adding an explicit testMatch setting to Playwright’s config file, instructing it to collect only .spec.ts files and thereby exclude the now-redundant .spec.js files.
What is the key takeaway about CI status?
The key takeaway is that a “green” CI status only indicates that tests executed without crashing. It does not guarantee that the correct tests ran, that no tests were duplicated, or that performance metrics are as expected.