Where is the wisdom we have lost in knowledge?
Where is the knowledge we have lost in information?
– T. S. Eliot
Let me walk you through a problem we ran into and how we solved it. AI has been incredibly useful in speeding up both development and validation. But that same acceleration created a new bottleneck in our CI/CD pipelines, where running near-100% coverage across unit and integration tests started taking too long.
The real challenge was with integration testing and AI evaluation flows. We manage close to 16,000 UI and API integration tests for a large application that has evolved over 10 to 15 years. More recently, AI enablement added another 200+ automated evaluation flows. Running all of them on every PR merge consumed significant time, infrastructure, and cost, slowing release timelines.
The problem was that many PRs only touched a small part of the system, but we still had to run everything because there was no practical way to confidently identify only the impacted tests and flows. Manually deciding what to run was simply not scalable. That is the problem I set out to solve with the solution below.
Here’s a number that will feel familiar: 12 hours.
That’s how long our regression suite took. Not because we were sloppy, but because we were thorough: 16,000 tests, every API, every UI flow, every edge case. And every time someone pushed a change, we ran all of them.

Nobody ran the full suite locally. Feedback came the next morning. Context was lost. Bugs stayed unfixed. Engineers stopped caring.
I wanted something different: don’t speed up the tests; run fewer of them. Not randomly fewer. Intelligently fewer: only the tests relevant to the specific code change.
Your Coverage Data Already Has the Answer
You’re probably already collecting code coverage. JaCoCo, coverage.py, Istanbul — most teams run it.
That data tells you:
- For each test → which source code did it touch?
Flip that map, and you get something far more useful:
- For each piece of source code → which tests touch it?
That’s the entire idea. Build the inverted map once. Then, on every code change, look up which tests cover the changed packages. Run only those.
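As a sketch, the inversion is a single pass over your existing coverage data (the names and the example coverage entries here are hypothetical):

```python
from collections import defaultdict

def invert_coverage(test_to_packages: dict) -> dict:
    """Flip a test → packages coverage map into package → tests."""
    package_to_tests = defaultdict(set)
    for test, packages in test_to_packages.items():
        for package in packages:
            package_to_tests[package].add(test)
    # Sorted lists keep the committed JSON file diff-friendly.
    return {pkg: sorted(tests) for pkg, tests in package_to_tests.items()}

# Example: what a coverage report boils down to after parsing.
coverage = {
    "CheckoutTest": ["com.shop.cart", "com.shop.payment"],
    "CartTest": ["com.shop.cart"],
}
inverted = invert_coverage(coverage)
# inverted["com.shop.cart"] → ["CartTest", "CheckoutTest"]
```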
The Idea Diagram

The inversion is a single pass that builds a hash map. Each lookup afterward is O(1). The impact is enormous.
How It Works: Two Phases

Phase 1 runs periodically — once a week or on major releases. You run your full suite with the coverage agent, record which packages each test covers, and save the inverted map as a plain JSON file committed to your repo.
Phase 2 runs on every CI trigger in under a second. Get the changed files from version control, look up the packages in the JSON map, union the test sets, generate a test runner config. Done.

The map is a plain JSON file: source package → list of test classes. Human-readable, version-controlled, and debuggable: if a test is selected or skipped unexpectedly, you can see exactly why.
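The Phase 2 lookup can be sketched in a few lines. This is a hedged example, not our exact implementation: the function names and the `src/main/java/<package>/<Class>.java` layout assumption are mine.

```python
from pathlib import Path

def select_tests(changed_files: list, package_to_tests: dict) -> list:
    """Union the test sets of every package touched by the change."""
    selected = set()
    for path in changed_files:
        # Layout assumption: src/main/java/<package dirs>/<Class>.java,
        # so the package is everything between the source root and the file.
        pkg = ".".join(Path(path).parts[3:-1])
        selected |= set(package_to_tests.get(pkg, []))
    return sorted(selected)

# The map is the committed JSON artifact from Phase 1, e.g.:
#   import json
#   package_to_tests = json.loads(Path("coverage-map.json").read_text())
```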
The Results
The solution has been implemented across multiple product domains, delivering significant savings in both time and cost.
| Where | Full Suite | Selected | Time | Cost |
|:---:|:---:|:---:|:---:|:---:|
| Enterprise storage platform | 6,000 | ~300 | 10–12 hrs → 30 min | −97% |
| Enterprise eSignature platform | 10,000 | ~500 | 8 hrs → 30–60 min | −$300K / 3 yrs |
| Enterprise AI eval platform | 16,000 | ~500 | 12 hrs → 30 min | −$250K+/year |
:::tip
Try It: You need four things: a test listener, a full-suite run to build the map, a VCS hook for changed files, and a test config generator from the lookup result. Under 300 lines of code total. The map probably already exists in your coverage data. You just need to flip it.
:::
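For the VCS hook, one `git diff` call is enough. A minimal sketch; the base branch name is an assumption, adjust for your workflow:

```python
import subprocess

def parse_name_only(output: str) -> list:
    """Split `git diff --name-only` output into a clean file list."""
    return [line.strip() for line in output.splitlines() if line.strip()]

def changed_files(base: str = "origin/main") -> list:
    """Files changed on this branch relative to the base branch."""
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    )
    return parse_name_only(out.stdout)
```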