Six weeks inside live WISeR production.
Field notes from running a deployment-flexible, agent-decomposable prior authorization platform processing Medicare Fee-for-Service under the CMS WISeR Model. The first six weeks of production. What worked exactly as we predicted. What surprised us. What we would change if we were starting over today.
Live in CMS Medicare since January 2026.
Genzeon Platforms is one of six commercially available platforms processing Medicare Fee-for-Service prior authorizations under the CMS WISeR Model since January 2026. Our deployment is in MAC JL (New Jersey) with Novitas Solutions. As of the period covered by this note: 15,000+ authorizations processed, 100% compliance with the CMS three-day turnaround time, zero auto-denials by architecture. The deployment is the first commercial-AI-assisted PA pilot the federal government has run, and the most likely model to inform broader Medicare PA policy.
The architecture earned its keep.
A few things we built defensively and were glad we had. None of these were heroic; all of them prevented production incidents that would have been operationally expensive to debug after the fact.
Audit packet generation as a synchronous step.
We made the audit packet a blocking step on every determination, not a background job. This caught three classes of bugs in the first week — cases where a determination was being issued without an evidence chain that could defend it. Every one of those cases would have shown up later as an audit finding; instead they showed up immediately as failed determinations that we could investigate and fix.
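The shape of this pattern can be sketched in a few lines. This is an illustrative sketch only, assuming a simple determination object; the names (`Determination`, `build_audit_packet`, `IncompleteEvidenceError`) are hypothetical, not the platform's actual API.

```python
# Hypothetical sketch of audit-packet-as-blocking-step; names are invented.
from dataclasses import dataclass, field


class IncompleteEvidenceError(Exception):
    """Raised when a determination lacks a defensible evidence chain."""


@dataclass
class Determination:
    request_id: str
    outcome: str                      # e.g. "affirmed" / "non-affirmed"
    evidence: list = field(default_factory=list)


def build_audit_packet(det: Determination) -> dict:
    # Refuse to emit any determination whose evidence chain cannot
    # defend it. A failure here surfaces immediately as a failed
    # determination, not later as an audit finding.
    if not det.evidence:
        raise IncompleteEvidenceError(det.request_id)
    return {"request_id": det.request_id,
            "outcome": det.outcome,
            "evidence_chain": list(det.evidence)}


def issue_determination(det: Determination) -> dict:
    # Synchronous, not a background job: the determination is not
    # released until the packet exists.
    packet = build_audit_packet(det)
    return packet
```

The design choice is the ordering: the packet is built on the request path, so an evidence gap blocks the determination instead of being discovered downstream.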
Typed state and replayability.
The state object's persistence at every agent boundary meant we could reproduce production failures locally with high fidelity. The mean time to debug a production issue in the first six weeks was 47 minutes. Comparable shops without typed-state architectures spend hours or days on the same class of issue.
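A minimal sketch of the checkpoint-and-replay idea, assuming a JSON-serializable state object; the fields, stage names, and in-memory log are invented for illustration (production would use a durable store).

```python
# Hypothetical sketch of typed state persisted at agent boundaries.
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class PAState:
    request_id: str
    stage: str            # which agent boundary produced this snapshot
    payload: dict


def checkpoint(state: PAState, log: list) -> None:
    # Serialize a snapshot at every agent boundary.
    log.append(json.dumps(asdict(state), sort_keys=True))


def replay(log: list) -> list:
    # Rehydrate the exact sequence of states to reproduce a
    # production failure locally, step by step.
    return [PAState(**json.loads(line)) for line in log]
```

Because every boundary snapshot is typed and serialized, a production incident reduces to pulling the log and stepping through the same states locally.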
The no-auto-deny invariant.
Every clinical non-affirmation routed through a human reviewer. Zero auto-denials. The invariant was tested repeatedly by edge cases that would have looked, to a less-disciplined system, like reasonable candidates for auto-denial. The architecture's rejection of those temptations was a feature.
Rule pack versioning with content hashes.
Every audit packet records the rule pack version applied. When a rule pack updated mid-month, every determination was traceable to the exact rule pack content active at the moment of decision. This is going to matter at audit time. It already mattered when a provider questioned a determination and we could pull the exact rule text in seconds.
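Content hashing of this kind can be sketched as follows, assuming rule packs are canonically serializable as JSON; the field names are hypothetical.

```python
# Hypothetical sketch of rule-pack content hashing; field names invented.
import hashlib
import json


def rule_pack_hash(pack: dict) -> str:
    # Canonical serialization (sorted keys, fixed separators) so the same
    # content always yields the same hash regardless of key order.
    canonical = json.dumps(pack, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def stamp_packet(packet: dict, pack: dict, version: str) -> dict:
    # Record both the human-readable version label and the content hash,
    # so a determination stays traceable even if a label is reused.
    packet["rule_pack_version"] = version
    packet["rule_pack_sha256"] = rule_pack_hash(pack)
    return packet
```

The hash, not the version label, is what makes "the exact rule text active at the moment of decision" recoverable in seconds.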
The model was the easy part.
Three things we did not predict, in roughly the order they showed up.
Volume of administratively-correctable submissions.
A non-trivial fraction of incoming submissions in the first six weeks had administrative issues — wrong code-set version, missing required field, expired beneficiary identifier — that we caught in intake validation. Higher than we had budgeted for. The intake agent's value was greater than its training-data evaluation predicted.
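The checks involved can be sketched like this, mirroring the three issue classes above; the field names, supported code sets, and rules are invented for illustration, not the platform's actual intake schema.

```python
# Hypothetical intake-validation sketch; fields and rules are invented.
from datetime import date

REQUIRED_FIELDS = {"beneficiary_id", "service_code", "code_set"}
SUPPORTED_CODE_SETS = {"ICD-10-CM", "HCPCS"}


def validate_intake(submission: dict, today: date) -> list:
    """Return a list of administratively-correctable issues (empty = clean)."""
    issues = []
    for name in sorted(REQUIRED_FIELDS - submission.keys()):
        issues.append(f"missing required field: {name}")
    if submission.get("code_set") not in SUPPORTED_CODE_SETS:
        issues.append("unsupported or wrong code-set version")
    expiry = submission.get("beneficiary_id_expiry")
    if expiry is not None and expiry < today:
        issues.append("expired beneficiary identifier")
    return issues
```

Returning the full issue list, rather than failing on the first check, is what lets providers correct a submission in one round trip.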
Request pattern variation within a single MAC.
We had assumed the request distribution would be roughly uniform across the MAC. It was not. Provider-mix and population-mix differences across regions within MAC JL produced meaningful variation in the kinds of services being requested. Rule pack tuning that worked for one region needed adjustment for another.
Integration was the hard part. Not the model.
The hardest production failures were not AI failures. They were FHIR endpoint behavior, X12 278 parsing variations, eligibility data feed edge cases. Model behavior was largely as predicted by pre-production evaluation. The model was the easy part. If we were starting over, we would lead with integration testing rather than model evaluation.
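The harness shape we wish we had led with looks roughly like this. The parser here is a trivial stand-in, not a real X12 278 parser, and the fixtures are toy strings; the point is the pattern: replay every captured production-shaped message and collect every failure instead of stopping at the first.

```python
# Harness-shape sketch only; parse_x12_278 is a hypothetical stub.
def parse_x12_278(raw: str) -> dict:
    # Stand-in for a real X12 278 parser.
    if not raw.startswith("ISA"):
        raise ValueError("missing ISA envelope")
    return {"segments": raw.count("~")}


def run_harness(fixtures: dict) -> dict:
    """Replay named fixtures through the parser; return {name: error}."""
    failures = {}
    for name, raw in fixtures.items():
        try:
            parse_x12_278(raw)
        except Exception as exc:      # record, don't abort the sweep
            failures[name] = str(exc)
    return failures
```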
Three things, if we were starting today.
Useful for any team about to start a similar deployment, or about to scale to additional MACs.
One. We would lead the integration sprint with a hardened test harness that exercised every FHIR endpoint and every X12 278 variant against the customer's actual production data, not synthetic data. We did exercise the endpoints; we did not exercise them at production-data scale until the first weeks of go-live. Most of our early operational firefighting traced back to production-data realism the test harness had never seen.
Two. We would invest more in upfront rule-pack tuning by region within the MAC, rather than deploying a uniform pack and tuning post-go-live. This would have caught the regional pattern-variation surprise before it became operational noise. The cost of pre-deployment regional tuning is a few engineer-weeks; the cost of post-deployment tuning is reviewer time and explanation calls.
Three. We would deploy the audit-packet validator (a separate service that re-checks every emitted audit packet for completeness) before production go-live, not in the second week. We had it on the roadmap; we deferred it under deployment pressure; the result was three days of post-hoc audit-packet review when we caught a packet-completeness issue that had affected a handful of determinations. Cheap to ship pre-deployment, expensive to ship post-deployment.
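A validator of that kind can be sketched in a few lines. The required-key list here is invented, and a real validator would also check schema conformance and evidence-chain depth; this shows only the re-check pattern.

```python
# Hypothetical audit-packet completeness validator; keys are invented.
REQUIRED_KEYS = {"request_id", "outcome", "evidence_chain", "rule_pack_version"}


def packet_gaps(packet: dict) -> set:
    """Return the set of completeness gaps in one emitted packet."""
    gaps = REQUIRED_KEYS - packet.keys()
    if "evidence_chain" in packet and not packet["evidence_chain"]:
        gaps.add("evidence_chain (empty)")
    return gaps


def validate_stream(packets: list) -> dict:
    # Re-check every emitted packet as a separate service would, so a
    # completeness regression surfaces before an auditor finds it.
    return {p.get("request_id", "?"): sorted(gaps)
            for p in packets if (gaps := packet_gaps(p))}
```

Running this over every emitted packet is cheap; the three days of post-hoc review described above is what it replaces.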
Why this deployment matters beyond Genzeon Platforms.
WISeR is the first time the federal government has fielded commercial AI in Medicare PA. Every commercial platform in the model is a data point CMS will use to inform broader policy.
The Wasteful and Inappropriate Service Reduction Model is a six-year program (2026–2031) operating in six states (New Jersey, Ohio, Oklahoma, Texas, Arizona, Washington). The architectures the participating commercial platforms ship — the auto-deny disciplines, the audit-packet patterns, the human-routing protocols — are going to inform Medicare PA policy beyond WISeR. The CMS Innovation Center treats the model as a learning environment.
The architectural commitments we made — no auto-deny, audit-packet-as-primary-output, decomposed agents — are not just our preferences. They are bets about what CMS will eventually require of every commercial AI platform operating in Medicare PA. Six weeks in, those bets look correct. The next six years will say more.
FAQ.
What is the CMS WISeR Model?
The Wasteful and Inappropriate Service Reduction (WISeR) Model is a six-year CMS Innovation Center model running 2026–2031 that uses AI alongside human clinical review to streamline prior authorization for selected Medicare Fee-for-Service items and services in six states. Genzeon Platforms is one of six commercial platforms participating.
How is Genzeon Platforms participating in WISeR?
Genzeon Platforms operates in MAC JL (New Jersey) with Novitas Solutions, processing Medicare FFS prior authorization since January 2026. Production performance: 15,000+ authorizations processed, 100% compliance with the CMS three-day turnaround time, zero auto-denials by architecture.
What surprised the team in the first six weeks of production?
Three things: the volume of administratively-correctable submissions (intake validation caught more than expected), the request-pattern variation by region within a single MAC, and how much of the work was integration validation versus model behavior. The model was the easy part.
What would the team do differently?
Lead with integration testing rather than model evaluation. The hardest production failures were not AI failures — they were FHIR endpoint behavior, X12 278 parsing variations, and edge cases in the eligibility data feed. Model behavior was largely as predicted by pre-production evaluation.
Continue the cluster.
WISeR Live Deployment. The full deployment overview: architecture, partners, performance metrics, MAC JL specifics. (See the deployment)
Note · Engineering: PA Agent Architecture. The reference architecture that powers this deployment: agents, state, audit packets, failure modes. (Read the note)
Note · Position: The Auto-Deny Problem. The architectural argument behind the zero-auto-denial outcome of this deployment. (Read the note)
Note · Architecture: How PA Automation Actually Works. The seven-step workflow this deployment operationalizes. (Read the note)
Audience: For Government. The public-sector view: CMS, federal health programs, sovereign deployment. (Read the gov view)
Outcomes: Customer Outcomes. Production deployments and measurable outcomes across HIP One and PES One. (See outcomes)
A walkthrough of the live deployment. A 30-45 minute conversation with the operations team running the WISeR deployment.