Scaling Availability With Distributed E2E Testing

When people say “CI/CD” as a single word, most of the time they’re actually describing only CI with a deploy step glued on at the end. I made this mistake for a long time too.

In one of the projects I’m working on at Bitboundaire, I had to truly separate the two in practice. The moment I started to treat E2E tests as a deployment gate instead of “just another job in CI”, the gap between CI and CD became very clear.

Different repos and layers move independently: frontend, backend, auth, IaC, etc. Each part evolves separately, which creates a real risk: a small, local change can silently break a global user flow. I wanted a setup where any change in these parts could be validated from the user-journey perspective before production, in a way that makes engineers feel responsible for platform quality end to end.

At Bitboundaire we don’t have a dedicated Test team. The same people who write features are the ones who write and update tests, own failures, and fix the bugs. This shapes a lot of how we design our pipelines: quality is not something we throw over the wall; it’s something we bake into the release process.

The incident that pushed us to distributed E2E

One day in this project, we made a “simple” infrastructure change that went wrong.

A small update in our IaC indirectly affected Cognito and blocked new users from signing up. All CI was green. Unit and integration tests were fine. The issue only appeared when the whole ecosystem was wired together. We didn’t detect it from our pipelines. We detected it after people started complaining in support.

No unit test or local integration test would have caught that before production. It was not a “bug in a single service”; it was an ecosystem side effect.

That’s exactly why we moved to a distributed E2E approach.

In this project, we now make sure that:

End-to-end flows are validated across the whole ecosystem, not just per repo
Changes in one area (like IaC or auth) are tested together with the rest of the platform
We don’t depend on a dedicated Test team or manual regression cycles to protect core journeys

The result is not just “more tests”. The result is fewer surprises: when something breaks, it tends to break in a controlled environment first, not in front of a new customer trying to sign up. This is very aligned with how we think at Bitboundaire: if we say we truly care, it has to show up in the way we release software, not in a slide deck.

How I separate CI and CD in my head

My mental model is:

CI (Continuous Integration) is where I try to break code as quickly and cheaply as possible.
CD (Continuous Delivery/Deployment) is where I try to break releases before users see them.

In this project:

CI lives inside each repo: frontend, backend, auth, infra.
CD is cross-repo and is orchestrated through a dedicated E2E repository, using GitHub Actions and repository_dispatch.

On every PR and merge, CI runs the usual stack inside each repository: lint, type checks, unit tests, integration tests. If any of that is red, the change doesn’t even touch staging.

CD starts only after a change is already merged and deployed to staging. That’s where the distributed E2E comes in, in a separate repo, validating the whole platform from the outside.

This separation is important because it changes behavior in the team:

CI failures feel like “I broke my own code.”
CD / E2E failures feel like “I broke the platform experience.”

Same engineer, different level of responsibility.

What we keep in CI vs what we moved to CD

To avoid repeating the same things in ten sections, I’ll summarize the layers, then talk about how they show up in the process.

Unit tests in CI

Unit tests live entirely in CI:

They validate small pieces of logic in isolation.
They are cheap and fast, so we run them all the time.
When they fail, the signal is very direct: you know exactly where to look.

They are the first line of defense. If we’re breaking basic logic and letting it reach integration or E2E, something is wrong in how we write code.

Integration tests in CI

Integration tests also live in CI:

They cover interactions between parts inside a repo: controller + service + DB, or a React tree + mocked API, etc.
They are slower, but still acceptable for PRs.
When they fail, you still have a pretty localized blast radius.

They give us service-level guarantees. For a single repo, we want to know: “internally, this thing is consistent”.

E2E tests in CD (staging)

E2E tests live in CD and run in staging:

They cover real user journeys across multiple components: login, onboarding, main workflows, billing, etc.
They are slower and more expensive to maintain, but they reflect reality from the user point of view.
When they fail, they normally mean: “someone just broke something important that a real user will feel.”

The important part is when each layer runs:

Unit + integration: gate for merging (CI).
E2E: gate for releasing (CD).
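The two gates above can be sketched as a pair of small predicates. This is only an illustration of the mental model, not the actual pipeline code: the type names, the gate lists, and the `afterCi` report are all hypothetical.

```typescript
// Hypothetical names: none of these types come from the real pipeline.
type Layer = "lint" | "typecheck" | "unit" | "integration" | "e2e";
type Result = "passed" | "failed" | "not_run";
type Report = Record<Layer, Result>;

// Layers that must be green before a PR can be merged (CI).
const MERGE_GATE: Layer[] = ["lint", "typecheck", "unit", "integration"];
// Layers that must additionally be green before a release (CD).
const RELEASE_GATE: Layer[] = ["e2e"];

function gatePasses(gate: Layer[], report: Report): boolean {
  return gate.every((layer) => report[layer] === "passed");
}

// CI: everything inside the repo must be green before merging.
function canMerge(report: Report): boolean {
  return gatePasses(MERGE_GATE, report);
}

// CD: a release needs the merge gate and the E2E run on staging.
function canRelease(report: Report): boolean {
  return canMerge(report) && gatePasses(RELEASE_GATE, report);
}

// A change that is merged and deployed to staging, with E2E not yet run.
const afterCi: Report = {
  lint: "passed",
  typecheck: "passed",
  unit: "passed",
  integration: "passed",
  e2e: "not_run",
};

console.log(canMerge(afterCi), canRelease(afterCi)); // true false
```

The point of keeping the two lists separate is that a green merge gate says nothing about the release gate: the change can reach staging, but not production, until the E2E layer reports back.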

So if something slips and E2E catches it, it becomes feedback for us to improve the lower layers inside the repos. Since the same engineer owns feature + tests + bugfix, this feedback loop is actually tight and practical.

The architecture: separate E2E repo + repository_dispatch

Now the fun part: how this is wired up in GitHub Actions.

We have multiple repos:

frontend
backend
auth
infra / IaC
e2e-tests (this one is special)

Every “feature repo” (frontend, backend, etc.) has a similar shape:

1. CI pipeline on PR:

lint
type checking
unit tests
integration tests

2. Merge → deploy to staging:

after CI is green, we deploy the new version to staging.

3. After staging deploy completes, the workflow triggers a repository_dispatch event targeting the e2e-tests repo.
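Step 3 boils down to one call to GitHub's REST API: `POST /repos/{owner}/{repo}/dispatches` with an `event_type` and a `client_payload`. Below is a minimal sketch of that call, assuming it is made from a script rather than from a prebuilt action. The payload fields, the `staging-deployed` event name, and the `example-org` repositories are assumptions made for illustration; only the endpoint and the body shape are GitHub's.

```typescript
// Hypothetical names throughout: DispatchPayload, "staging-deployed" and
// "example-org" are illustrations. Only the endpoint and the
// { event_type, client_payload } body shape come from GitHub's REST API.
interface DispatchPayload {
  sourceRepo: string; // repo that was just deployed, e.g. "example-org/backend"
  sha: string; // commit now running on staging
  environment: string;
}

interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Injected so the sketch does not depend on a specific HTTP client.
type Send = (req: HttpRequest) => Promise<{ status: number }>;

function buildDispatchRequest(
  e2eRepo: string,
  token: string,
  payload: DispatchPayload,
): HttpRequest {
  return {
    url: `https://api.github.com/repos/${e2eRepo}/dispatches`,
    method: "POST",
    headers: {
      Accept: "application/vnd.github+json",
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ event_type: "staging-deployed", client_payload: payload }),
  };
}

async function triggerE2E(req: HttpRequest, send: Send): Promise<void> {
  const res = await send(req);
  // GitHub answers a successful dispatch with 204 No Content.
  if (res.status !== 204) {
    throw new Error(`repository_dispatch failed with HTTP ${res.status}`);
  }
}

const dispatchRequest = buildDispatchRequest("example-org/e2e-tests", "<token>", {
  sourceRepo: "example-org/backend",
  sha: "0123abc",
  environment: "staging",
});
console.log(dispatchRequest.url);
```

In a real workflow, `send` could be a thin wrapper around the global `fetch` available in recent Node versions, and the token would need permission to trigger workflows in the e2e-tests repo.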

The E2E repo has its own workflow:

It listens to repository_dispatch.
Reads the payload (which repo, SHA, environment info, etc.).
Runs the Playwright E2E suite against the staging environment.
Reports status back (through GitHub checks / statuses), which the original pipeline uses to decide whether the production deploy is allowed.
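The receiving side can be sketched in two small functions: one that reads the `client_payload` from the event JSON that GitHub Actions hands to the workflow, and one that builds a commit status for the original repo and SHA. This is a sketch under assumptions, not the real workflow: the payload fields, the `e2e/staging` context name, and the example URLs are hypothetical. The `POST /repos/{owner}/{repo}/statuses/{sha}` endpoint and its `state`, `context`, `description` and `target_url` fields are GitHub's commit status API.

```typescript
// Hypothetical payload shape; it must match whatever the feature repos send.
interface ClientPayload {
  sourceRepo: string; // "owner/name" of the repo that triggered the run
  sha: string;
  environment: string;
}

// rawEventJson is the content of the event file GitHub Actions provides
// for a repository_dispatch run (client_payload sits at the top level).
function parseDispatchEvent(rawEventJson: string): ClientPayload {
  const event = JSON.parse(rawEventJson) as { client_payload?: Partial<ClientPayload> };
  const p = event.client_payload;
  if (!p || !p.sourceRepo || !p.sha || !p.environment) {
    throw new Error("repository_dispatch event is missing client_payload fields");
  }
  return { sourceRepo: p.sourceRepo, sha: p.sha, environment: p.environment };
}

type CommitState = "success" | "failure";

// Builds the commit status that the original pipeline can read before
// deciding whether the production deploy is allowed.
function buildStatusRequest(payload: ClientPayload, suitePassed: boolean, runUrl: string) {
  const state: CommitState = suitePassed ? "success" : "failure";
  return {
    url: `https://api.github.com/repos/${payload.sourceRepo}/statuses/${payload.sha}`,
    method: "POST" as const,
    body: {
      state,
      context: "e2e/staging", // hypothetical status name
      description: suitePassed ? "E2E suite passed on staging" : "E2E suite failed on staging",
      target_url: runUrl, // link back to the E2E run for debugging
    },
  };
}

const exampleEvent = JSON.stringify({
  action: "staging-deployed",
  client_payload: { sourceRepo: "example-org/backend", sha: "0123abc", environment: "staging" },
});
const statusRequest = buildStatusRequest(
  parseDispatchEvent(exampleEvent),
  false,
  "https://example.invalid/e2e-run/1",
);
console.log(statusRequest.url, statusRequest.body.state);
```

Attaching the status to the exact SHA that was deployed is what lets the original pipeline tie “E2E passed” to one specific change rather than to whatever happens to be on staging at that moment.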

This gives us a few nice properties:

The E2E logic is centralized in a single repo, not duplicated in every service.
When we change the definition of “platform is healthy”, we do it once.
We can evolve the E2E suite and its infra without touching feature repos.

And again, there is no QA department “owning” this. The same engineers that ship features contribute to the E2E repo when they add new flows or change existing behavior. Responsibility is not outsourced.

Why a dedicated E2E repository instead of mixing everything

I see a lot of setups where E2E tests live inside the frontend repo or backend repo. I used to do that too. It works at first, but it scales poorly in a modular monolith with multiple repos around it.

With a dedicated e2e-tests repo, a few things are easier:

Clear system-level view

The tests are written from the perspective of “the platform”, not from “the frontend project”. The mental model is the user journey: “sign up”, “reset password”, “upgrade plan”, etc.

Explicit contract of what “healthy” means

The E2E repo becomes the contract that describes what must be true before we allow a prod deploy. When we intentionally change behavior, we change the tests together.

Ownership model aligned with responsibility

Since engineers own quality, the rule is simple: touched a critical flow? Update or add the E2E test that protects it. The repo structure makes that obvious and visible.

Playwright as the main E2E tool

For the E2E layer we picked Playwright. The reasons are pretty pragmatic:

Strong browser automation support.
Good parallelization and stability.
Good DX with TypeScript, which fits the rest of our stack.
Tracing, screenshots and videos, which help a lot when an engineer needs to debug a failing run.

The repo is roughly structured by domains of the product, not by technical layers. So instead of “/login-page” and “/user-service”, we have things like:

tests/
  auth/
    login.spec.ts
    password-reset.spec.ts
  onboarding/
    signup.spec.ts
    first-session.spec.ts
  billing/
    checkout.spec.ts
    change-plan.spec.ts
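For a layout like this, the Playwright features mentioned earlier (parallel runs, traces, screenshots, videos) are switched on in `playwright.config.ts`. The fragment below is a sketch of what such a config might look like, not the project's actual file: the `STAGING_BASE_URL` variable name and the specific retry and capture settings are assumptions, chosen as common starting points.

```typescript
// playwright.config.ts (illustrative sketch; values are assumptions)
import { defineConfig } from "@playwright/test";

export default defineConfig({
  // Domain folders (auth/, onboarding/, billing/) live under tests/.
  testDir: "./tests",
  // Run test files and the tests inside them in parallel.
  fullyParallel: true,
  // One retry helps separate flaky runs from real regressions.
  retries: 1,
  reporter: [["list"], ["html", { open: "never" }]],
  use: {
    // Hypothetical env var pointing the suite at the staging environment.
    baseURL: process.env.STAGING_BASE_URL,
    // Keep debugging artifacts only when something goes wrong.
    trace: "on-first-retry",
    screenshot: "only-on-failure",
    video: "retain-on-failure",
  },
});
```

Capturing traces and videos only on failure or retry keeps passing runs fast while still leaving an engineer enough evidence to debug a red run.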

Each test is written with a story in mind:

“User creates an account, confirms email, and lands in the dashboard.”
“Existing user upgrades a plan and still sees all their data.”
“User in paid tier sees X features, others don’t.”

From the engineer's perspective, this makes the suite easier to reason about: when a product requirement changes, they go to the test that describes that journey and adapt it. Same person, full loop: code → tests → deploy → E2E → fix.

Staging must be close to prod or E2E becomes theatre

All of this only works because staging is not a random sandbox. We treat staging as “prod with a smaller audience”:

Same infra topology and configuration style.
Same authentication paths (SSO, tokens, etc).
Same feature flags model, maybe with a bit more enabled for internal testing.

If staging diverges too much from prod, E2E passing means nothing. So we push strongly to keep them aligned. Since engineers own both code and quality, this alignment is something they feel in the day-to-day, not a checkbox.
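One way to make that alignment checkable is a small drift report that compares a description of each environment and lists the keys that differ, apart from an explicit allow-list. The article does not say how parity is enforced, so this is purely a hypothetical sketch: the descriptor keys, the values, and the `findDrift` helper are all invented for illustration.

```typescript
// Hypothetical flat description of an environment (topology, auth, flags).
type EnvDescriptor = Record<string, string | number | boolean>;

// Returns the sorted keys whose values differ between prod and staging,
// skipping keys that are explicitly allowed to differ.
function findDrift(
  prod: EnvDescriptor,
  staging: EnvDescriptor,
  allowedToDiffer: string[] = [],
): string[] {
  // Union of keys, so a key missing on one side also counts as drift.
  const keys = Object.keys(prod).concat(
    Object.keys(staging).filter((key) => !(key in prod)),
  );
  return keys
    .filter((key) => !allowedToDiffer.includes(key) && prod[key] !== staging[key])
    .sort();
}

// Example values are invented for illustration.
const prodEnv: EnvDescriptor = {
  authProvider: "cognito",
  signupEnabled: true,
  internalBetaFlags: false,
};
const stagingEnv: EnvDescriptor = {
  authProvider: "cognito",
  signupEnabled: false, // accidental divergence
  internalBetaFlags: true, // intentional: more flags enabled for internal testing
};

// Only signupEnabled is reported; internalBetaFlags is on the allow-list.
console.log(findDrift(prodEnv, stagingEnv, ["internalBetaFlags"]));
```

Keeping the allow-list explicit turns “staging has a bit more enabled” into a visible decision instead of silent drift.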

What we actually traded: fewer deploys, more confidence

One consequence of this setup is visible on paper: production deploys got a bit slower.

Before this, we could deploy more times per day. After adding the full E2E gate, we added around 30 minutes to the path from “merge” to “prod” because we run the Playwright suite on staging.

If you look only at “deploys per day”, it looks like a slowdown.

But fewer deploys in a day doesn’t mean we became slower as a team. In practice, there is a speed effect that matters a lot more:

We spend far less time chasing production bugs.
We roll back and hotfix less often.
We avoid a lot of painful context switches caused by incidents.

So yes, we added automated E2E checks into the release path. But in practice, we sped up the overall process, because we’re not constantly paying the tax of shipping things we don’t trust. We release with more confidence and less firefighting, which means more real delivery.

This matches how we think about engineering at Bitboundaire: shipping fast is good, but shipping something people can trust is non-negotiable. The structure of the pipeline is just the reflection of that belief.

Engineers owning quality end-to-end

One last point that I think is important: this setup only works because engineers own quality.

There is no “throw to QA” step. The main principles are:

If you ship a feature, you own the necessary unit and integration tests for it.
If your change affects a critical flow, you also own the corresponding E2E updates.
If the pipeline is red, you don’t look around waiting for someone from “the QA team” to fix it. You fix it.

The CI/CD design reinforces this: failures are very visible, and they block merges or releases. But the process is not there to punish anyone; it’s there to support the kind of engineer we want to be: the one that cares about what happens after git push.

And for this project, the combination of strong CI inside each repo and a distributed E2E gate in staging gave us a good balance between speed and safety. It lets us move a bit more deliberately, but with the feeling that what reaches production is something we can stand behind.

At Bitboundaire, “we truly care” means treating availability and user trust as product requirements, not as afterthoughts. This project is one concrete example of that: when independent pieces started to create invisible risk, we responded by tightening our end-to-end guarantees instead of accepting “support will catch it”.