Germany's FHIR mandates, ISiK on the supply side and the ePA on the demand side, share a property that turns out to be the whole problem: conformance is certified or tested at a point in time, but the systems behind it operate continuously, at national scale. So how do you know your conformance holds between those checkpoints? The answer most teams reach for is: "We validate in CI."

It's a reasonable answer. Running a FHIR validator in a pipeline and catching profile violations before they reach production is a meaningful improvement over manual release checks. For many teams it's the first serious quality gate they've added.

This is not an argument against CI validation. CI validation is necessary. A fast, embeddable validator such as @records-fhir/validator makes that layer easier to add to GitHub Actions, local development, and automated checks. That is where the validator belongs: close to code, fixtures, pull requests, and controlled changes.

But CI still answers one class of question: did this controlled change break the examples we tested?

That question matters. It is not the same as asking whether live production data is still conformant after profiles, terminology, or infrastructure changed independently. The limitation is not the validator. It is the measurement context. CI runs at the boundary of a code change; FHIR data quality changes outside that boundary. It's the same shape as the certification gap behind both ISiK and the ePA: a model built to verify a point-in-time state, applied to a system whose state changes between measurements.

What CI is designed for

CI — continuous integration — is designed to verify that software changes don't break things. A developer commits code. The pipeline runs. Tests pass or fail. If they fail, the change doesn't ship.

The assumption baked into this model is that the system is deterministic: given the same inputs, it produces the same outputs. Regressions come from code changes. If no code changed, nothing broke.

FHIR data quality doesn't work this way.

The three things that change without a code change

FHIR data validity is a relationship between three moving parts: the data, the profiles applied to it, and the terminology sources those profiles reference. All three can change independently of your codebase.

The profiles change. An Implementation Guide publishes a new version. A cardinality constraint tightens. A new must-support element is added. Resources that conformed to the previous version may now fail — and no line of your code changed.

The terminology changes. A SNOMED CT release retires codes your data uses. A ValueSet expansion shifts because the underlying hierarchy was updated. A CodeSystem version is pinned in your validator but not in your production server. Again: no code change, new failures.

The infrastructure changes. Your FHIR server gets a security patch. The validator dependency in your pipeline is pinned to one version while a partner, staging job, or server-side validator uses another. Your staging environment drifts from prod. These are software changes — but they're not your application code.

None of these changes trigger a CI run. None of them will be caught by a pipeline that only runs when someone pushes a commit.
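
To make the terminology case concrete, here is a minimal TypeScript sketch. It assumes a deliberately simplified model in which a ValueSet expansion is nothing more than a set of codes. The type names, the `nonConformantIds` helper, the version labels, and the placeholder codes `A` and `B` are all illustrative; none of them come from a FHIR library or a real code system.

```typescript
// Simplified, hypothetical model: an expansion is a versioned set of codes.
interface Expansion {
  version: string;
  codes: ReadonlySet<string>;
}

interface CodedRecord {
  id: string;
  code: string;
}

// Returns the ids of records whose code is absent from the given expansion.
function nonConformantIds(records: CodedRecord[], expansion: Expansion): string[] {
  return records.filter(r => !expansion.codes.has(r.code)).map(r => r.id);
}

// The stored data never changes in this example.
const records: CodedRecord[] = [
  { id: "obs-1", code: "A" },
  { id: "obs-2", code: "B" },
];

// Two snapshots of the same ValueSet; the later one no longer contains "B".
const earlier: Expansion = { version: "release-1", codes: new Set(["A", "B"]) };
const later: Expansion = { version: "release-2", codes: new Set(["A"]) };

const failingBefore = nonConformantIds(records, earlier); // nothing fails
const failingAfter = nonConformantIds(records, later);    // obs-2 now fails

console.log({ failingBefore, failingAfter });
```

The records and the function are identical in both calls. Only the expansion differs, which is exactly the kind of change a commit-triggered pipeline never sees.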

The test fixture problem

Even when CI does run, it runs against the wrong data.

CI pipelines run against test fixtures — curated, controlled resources that cover the cases you thought to test. They're constructed to be valid, or to represent known failure modes you've already handled.

They don't represent your live production data. They don't contain the edge cases that emerge at scale. They don't include the patient records created two years ago against an older profile version. They don't include the resources ingested from a partner system with different terminology binding assumptions.

This matters because FHIR data quality failures often live in the long tail. Rare code combinations. Resources from systems that implement the spec differently. Legacy data that was valid at ingestion time and has since drifted. A test suite, however thorough, won't find what it wasn't designed to find.

The only data that tells you whether your production system is healthy is your production data.

The signal problem

When CI validation catches an error, it produces a pass/fail signal. The pipeline goes red. A developer investigates. The issue gets fixed. The pipeline goes green.

This is the right model for software correctness. For data quality it is insufficient, for one reason: the signal is build-scoped. It records whether one build passed against its fixtures; even when stored as a CI log, it is not a comparable, reproducible history of your production data.

Three months from now, when a partner rejects a batch of resources and asks when the problem started, your CI build history doesn't answer that question. This isn't primarily about audit evidence — it's a debugging problem. You can't determine whether a failure was introduced last week or six months ago if your only record is whether each individual build passed.

Evidence is not the same as a signal. Evidence is a timestamped, reproducible artifact tied to specific, immutable inputs — validator version, IG packages, terminology snapshot, environment — that anyone can independently verify. A signal tells you the current state; evidence tells you whether the state changed, when it changed, and what changed with it.

CI produces signals. Evidence requires a different mechanism.
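
As a rough illustration of the difference, the sketch below models an evidence record in TypeScript and uses a small run history to answer "when did this start, and what changed with it?". The field names, the `changedInputs` and `firstFailingRun` helpers, and all sample values (package id, snapshot labels, resource ids, dates) are assumptions made for the example, not the Records data model.

```typescript
// Hypothetical shapes for illustration only.
interface RunInputs {
  validatorVersion: string;
  igPackages: string[];        // for example "package-id#version"
  terminologySnapshot: string;
  environment: string;
}

interface EvidenceRecord {
  takenAt: string;             // ISO 8601 UTC timestamp of the run
  inputs: RunInputs;
  failingResourceIds: string[];
}

// Names the inputs that differ between two runs.
function changedInputs(a: RunInputs, b: RunInputs): string[] {
  const changed: string[] = [];
  const packages = (p: string[]) => [...p].sort().join("|");
  if (a.validatorVersion !== b.validatorVersion) changed.push("validatorVersion");
  if (packages(a.igPackages) !== packages(b.igPackages)) changed.push("igPackages");
  if (a.terminologySnapshot !== b.terminologySnapshot) changed.push("terminologySnapshot");
  if (a.environment !== b.environment) changed.push("environment");
  return changed;
}

// Returns the earliest run in which the resource is recorded as failing.
function firstFailingRun(history: EvidenceRecord[], resourceId: string): EvidenceRecord | undefined {
  return [...history]
    .sort((x, y) => x.takenAt.localeCompare(y.takenAt))
    .find(run => run.failingResourceIds.includes(resourceId));
}

const base: RunInputs = {
  validatorVersion: "0.3.0",
  igPackages: ["example.ig#1.0.0"],
  terminologySnapshot: "snapshot-1",
  environment: "prod",
};
const updated: RunInputs = { ...base, terminologySnapshot: "snapshot-2" };

const history: EvidenceRecord[] = [
  { takenAt: "2025-01-06T02:00:00Z", inputs: base, failingResourceIds: [] },
  { takenAt: "2025-01-13T02:00:00Z", inputs: updated, failingResourceIds: ["Observation/42"] },
  { takenAt: "2025-01-20T02:00:00Z", inputs: updated, failingResourceIds: ["Observation/42"] },
];

const firstBad = firstFailingRun(history, "Observation/42");
const changed = changedInputs(history[0].inputs, history[1].inputs);
console.log(firstBad?.takenAt, changed);
```

A pass/fail build log cannot support either lookup, because it records neither the production resource ids nor the inputs each result was produced against.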

The coverage gap

There's a final structural problem: CI covers the path forward, not the data already in production.

When you add a new validation rule to your CI pipeline, it protects future data. It doesn't tell you whether the data already in your system conforms to that rule. Production datasets accumulate over years. An ISiK profile revision from six months ago may have introduced a constraint that none of your existing patient records satisfy — and your CI pipeline will never surface that, because those records were ingested before the rule existed.

Validating new data on the way in is necessary. It's not the same as knowing the state of the data already at rest.
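
The gap can be shown with a few lines of TypeScript. This is a toy model under stated assumptions: `conformsToNewRule` stands in for the result of a real validation, and the resource ids, dates, and `ruleAddedAt` value are invented for the example.

```typescript
// Hypothetical, simplified shape; not a FHIR library API.
interface StoredResource {
  id: string;
  ingestedAt: string;          // ISO 8601 date
  conformsToNewRule: boolean;  // stand-in for a real validation result
}

// The date the new constraint was added to the ingest-time gate.
const ruleAddedAt = "2025-03-01";

const atRest: StoredResource[] = [
  { id: "Patient/1", ingestedAt: "2023-05-10", conformsToNewRule: false },
  { id: "Patient/2", ingestedAt: "2024-11-02", conformsToNewRule: true },
  { id: "Patient/3", ingestedAt: "2025-04-15", conformsToNewRule: true },
];

// Everything the ingest-time gate has ever evaluated under the new rule.
const checkedAtIngest = atRest.filter(r => r.ingestedAt >= ruleAddedAt);

// What a scan over the whole store finds.
const violations = atRest.filter(r => !r.conformsToNewRule);

// Violations the ingest-time gate never had a chance to see.
const unseenViolations = violations.filter(r => r.ingestedAt < ruleAddedAt);

console.log({
  checkedAtIngest: checkedAtIngest.map(r => r.id),
  unseenViolations: unseenViolations.map(r => r.id),
});
```

The gate reports a clean record for everything it checked, while the only violation in the store predates the rule and is found only by scanning the data at rest.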

What continuous validation actually means

The observability community solved this problem for application performance a decade ago. You don't only measure response times when you deploy code. You monitor them continuously — because the system degrades in ways that have nothing to do with code changes. Server load increases. External dependencies slow down. Data patterns shift. You need a continuous signal, not just a pre-deploy check.

FHIR data quality needs the same model.

Continuous validation means:

Running on a schedule against live production data — not triggered by commits, not limited to fixtures
Comparing against a baseline — so you know immediately when conformance status changes, not just whether something is currently failing
Operating independently of your pipeline — so terminology changes, profile updates, and infrastructure drift are caught regardless of whether anyone touched the code
Producing evidence, not signals — timestamped, reproducible runs tied to locked inputs that survive past the moment
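
The baseline comparison in the list above can be sketched as a set difference over failing resource ids. This is a minimal TypeScript illustration, assuming each scheduled run yields a list of failing ids; the `Delta` shape, the `diffAgainstBaseline` helper, and the sample ids are hypothetical.

```typescript
// Hypothetical result shape for comparing one run against a baseline run.
interface Delta {
  newlyFailing: string[];
  resolved: string[];
  stillFailing: string[];
}

// Compares the failing ids of the current run with those of the baseline.
function diffAgainstBaseline(baseline: string[], current: string[]): Delta {
  const before = new Set(baseline);
  const now = new Set(current);
  return {
    newlyFailing: current.filter(id => !before.has(id)),
    resolved: baseline.filter(id => !now.has(id)),
    stillFailing: current.filter(id => before.has(id)),
  };
}

// True when the conformance status moved in either direction.
function conformanceChanged(delta: Delta): boolean {
  return delta.newlyFailing.length > 0 || delta.resolved.length > 0;
}

const delta = diffAgainstBaseline(
  ["Patient/7", "Observation/42"],    // failing in the baseline run
  ["Observation/42", "Encounter/9"],  // failing in the current run
);

console.log(delta, conformanceChanged(delta));
```

Reporting the delta rather than the raw failure count is the design choice that matters: a run with the same number of failures as last week can still hide one resolved and one new problem.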

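This isn't a replacement for CI validation. CI still catches the regressions you introduce. Continuous validation catches what changes without a commit: profile revisions, terminology updates, infrastructure drift, and the data already at rest.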

For teams that want to tighten CI first, the validation engine inside Records is published as @records-fhir/validator on npm under Apache-2.0. The 0.3.0 release is built for that developer-tooling layer: instant startup, embeddable in any Node.js pipeline, with fix suggestions for every error code. For a concrete setup, see the GitHub Actions validation gate.

Records uses the same validation discipline for a different operational question: scheduled runs, baselines, deltas, and reproducible evidence against real environments. The CI gap and the continuous validation gap are different problems; closing them well requires different layers.

The question CI can't answer

CI answers: "Does this code change break validation?"

That's a useful question. It's not the only question.

The question CI can't answer: "Is my production data valid right now — and has that changed since last week?"

For a FHIR server supporting clinical workflows, that's the question that matters. It's the question your partners will ask when a data exchange fails. It's the question your auditors will ask when they review your compliance posture. It's the question you need to answer before promoting a server upgrade or a profile revision to production.

CI was built for a model where correctness is a property of code. FHIR data quality is a property of a relationship — between data, profiles, and terminology — that changes on its own schedule.

Treating CI as your quality gate means treating that relationship as static. It isn't.

Why CI Validation Is Not Enough for FHIR