Vol. 1 · Study 1 · Protocol v2.0 · 22 April 2026 · Status: Draft for IRB submission
Call for Physician Adjudicators

Agentic AI for Automation of Medicare Prior Authorization Workflows in Primary Care

A Three-Arm In-Silico Comparative Effectiveness Study

We are recruiting four board-certified Family Medicine or Internal Medicine physicians to serve as independent reference-standard adjudicators for Study 1 of the PACE-AI research program. Adjudicators will establish the ground-truth labels against which an agentic AI system, an unassisted large language model, and the HL7 Da Vinci FHIR-standard reference implementation are compared on Medicare primary-care prior authorization cases. Primary publication target: NEJM AI.
Three-arm randomized in-silico simulation
2,340 synthetic PA cases
4 adjudicators sought
Months 3–4 · ≈2–3 weeks part-time
OSF preregistration · Zenodo deposit
Structured Abstract
Design
Three-arm parallel-group randomized in-silico simulation. Unit of randomization: the individual synthetic prior authorization case, stratified by CMS service category, documentation completeness, and criteria complexity.
Setting
Synthetic Medicare primary-care population (Synthea-generated), evaluated against four payer exemplars: CMS Fee-for-Service, UnitedHealthcare Medicare Advantage, BCBS Medicare Advantage, and a Regional MA plan.
Interventions
Arm A: agentic AI workflow (multi-agent OpenClaw orchestration with retrieval-augmented generation, a human-in-the-loop checkpoint, and chain-of-thought prompting). Arm B: unassisted single LLM prompted to mimic a clinical PA coordinator. Arm C: HL7 Da Vinci Prior Authorization Reference Implementation (the FHIR-based, non-AI approach CMS recommends for the Prior Authorization API that CMS-0057-F requires of impacted payers by 2027).
Sample
Approximately 780 cases per arm, 2,340 cases total, distributed across five CMS service categories: advanced imaging, durable medical equipment, Part B drugs, specialty referral, and other (PT/OT, home health, skilled nursing, speech).
Reference standard
Blinded dual-adjudicator rating protocol with third-adjudicator tiebreaking, per Clinical Events Committee–style methodology adapted for PA appropriateness determination. Inter-rater reliability monitored per batch (Cohen's κ ≥ 0.70 required).
Primary outcome
Concordance with the reference-standard adjudicated label (non-inferiority of Arm A versus Arm B). Co-primary: end-to-end processing time per case.
Status
Protocol v2.0 in draft. Pending IRB submission, OSF preregistration, and methodological review. Adjudicator recruitment open. Target submission to NEJM AI; backup venue Lancet Primary Care. Preprint to medRxiv concurrent with submission.
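The primary outcome can be made concrete with a short sketch. This section does not state the non-inferiority margin or the interval method, so both are assumptions below: a hypothetical margin of 5 percentage points and a Wald 95% interval for the difference in concordance proportions. The function names are illustrative and are not the study's analysis code.

```python
import math

def concordance(arm_labels, reference_labels):
    """Share of cases whose arm output equals the adjudicated reference label."""
    if not arm_labels or len(arm_labels) != len(reference_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(a == r for a, r in zip(arm_labels, reference_labels))
    return matches / len(arm_labels)

def non_inferior(p_a, n_a, p_b, n_b, margin=0.05, z=1.96):
    """Wald-interval non-inferiority check of Arm A versus Arm B (assumed method).

    Arm A is declared non-inferior if the lower 95% bound of (p_a - p_b)
    lies above -margin. The 0.05 default margin is hypothetical.
    """
    diff = p_a - p_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    lower = diff - z * se
    return lower > -margin, diff, lower
```

With made-up concordances of 0.90 (Arm A) and 0.88 (Arm B) at 780 cases per arm, `non_inferior(0.90, 780, 0.88, 780)` reports non-inferiority, because the lower bound (about -0.011) sits above -0.05.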

Background & Rationale

Why this study, why now

Prior authorization imposes one of the largest administrative burdens in US primary care and is a leading contributor to clinician burnout. Published analyses indicate that more than 93% of Medicare Advantage prior authorization requests are ultimately approved, while 82% of appealed denials are overturned — a pattern suggesting that a substantial fraction of the administrative process does not alter clinical decision-making and is a candidate for automation.

The regulatory environment is converging on the specific FHIR-based standards that constitute Study 1's Arm C. The CMS Interoperability and Prior Authorization Final Rule (CMS-0057-F), finalized in January 2024, requires impacted Medicare Advantage, Medicaid, CHIP, and Qualified Health Plan payers to implement FHIR-based Prior Authorization APIs. Operational provisions took effect in January 2026; full API compliance is required by January 2027. CMS explicitly recommends the HL7 Da Vinci implementation guides (Coverage Requirements Discovery, Documentation Templates and Rules, and Prior Authorization Support) as the implementation path. Arm C therefore represents the approach impacted payers are being directed to adopt.

Our structured literature review of NEJM AI, Lancet Primary Care, JAMA, JAMA Network Open, JAMIA, and adjacent venues through April 2026 identified no published original research evaluating agentic AI for provider-side Medicare prior authorization automation. Study 1 is designed to address this gap.

Without rigorous, blinded, board-certified physician adjudication, no comparison between an AI system and a standards-body reference implementation is scientifically valid. The reference standard is the science. — Protocol v2.0, §3.7 Reference Standard Adjudication

Study 1 is designed in tandem with a planned follow-on study (Study 1b) that will extend the comparison to a real human clinic-staff arm under a full human-subjects regulatory pathway. Every methodological decision in this protocol — case construction, adjudication rubric, reference-standard derivation, data storage, and infrastructure — is made so the case bank and findings can be carried forward to Study 1b without re-deriving ground truth or re-adjudicating cases. Adjudicators recruited now are the foundational cohort for both studies.

Methods Overview

The three arms

Each synthetic case is randomized, within strata, to one of three intervention arms and processed end-to-end by its assigned arm in a fully automated experimental harness. The arm's output is then scored against the blinded, physician-adjudicated reference-standard label.

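Arm A · Intervention

Agentic AI Workflow

Multi-agent OpenClaw orchestration with retrieval-augmented generation, a human-in-the-loop checkpoint, and chain-of-thought prompting, built on the same underlying language model as Arm B.

claude-haiku-4-5-20251001 · multi-agent orchestration

Arm B · Active Comparator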

Unassisted Single LLM

Same underlying language model as Arm A, prompted in a single turn to mimic a clinical prior authorization coordinator. No tool use, no retrieval, no multi-agent scaffolding. Isolates the contribution of the agentic architecture itself.

claude-haiku-4-5-20251001 · single-call · frozen prompt

Arm C · Standards-Body Reference

HL7 Da Vinci FHIR

Coverage Requirements Discovery, Documentation Templates and Rules, and Prior Authorization Support reference implementations, deployed per the HL7 Da Vinci specifications. Represents the non-AI, rule-based approach CMS recommends for the FHIR Prior Authorization API that CMS-0057-F requires of impacted payers by 2027.

HL7-DaVinci/CRD · prior-auth · CDS-Library · CQL rules
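The case-level, stratified randomization described in the Design can be sketched as permuted blocks of three within each stratum. The protocol does not specify the block scheme, seed, or data layout, so the dictionary keys, the fixed seed, and the function name below are hypothetical. Blocks of three keep the arms balanced inside every stratum (counts differ by at most one), which keeps arm sizes near the planned 780 each.

```python
import random
from collections import defaultdict

ARMS = ("A", "B", "C")

def stratified_assign(cases, seed=2026):
    """Assign cases to arms with permuted blocks of three inside each stratum.

    `cases` is a list of dicts with hypothetical keys: "case_id",
    "service_category", "doc_completeness", "criteria_complexity".
    """
    rng = random.Random(seed)  # fixed seed makes the allocation reproducible
    strata = defaultdict(list)
    for case in cases:
        key = (case["service_category"], case["doc_completeness"], case["criteria_complexity"])
        strata[key].append(case["case_id"])

    assignment = {}
    for key in sorted(strata):  # deterministic stratum order
        ids = strata[key]
        rng.shuffle(ids)  # random case order within the stratum
        for start in range(0, len(ids), len(ARMS)):
            block = list(ARMS)
            rng.shuffle(block)  # random arm order within each block of three
            for case_id, arm in zip(ids[start:start + len(ARMS)], block):
                assignment[case_id] = arm
    return assignment
```

A seeded generator keeps the allocation reproducible for the preregistration, while shuffling both the case order and each block hides the next assignment.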

The case bank

The case bank consists of approximately 2,340 synthetic Medicare primary-care prior authorization cases generated using the Synthea open-source patient simulation platform, structured as FHIR R4 bundles with realistic clinical documentation. Cases are stratified across four payer exemplars (CMS Fee-for-Service, UnitedHealthcare Medicare Advantage, BCBS Medicare Advantage, and a Regional MA plan) and five CMS service categories (advanced imaging, durable medical equipment, Part B drugs, specialty referral, and other).

Upon completion of reference-standard adjudication, the case bank is frozen and deposited to Zenodo with a permanent DOI before any arm begins processing. This deposit is referenced in the preregistration and manuscript and is a non-negotiable methodological commitment of the protocol.
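One way to make the freeze commitment independently checkable is to publish a content-hash manifest alongside the Zenodo deposit, so anyone can confirm that the bank processed by the arms is byte-identical to the one deposited. The protocol does not describe such a manifest; the directory layout (one FHIR JSON bundle per file) and the function names below are assumptions.

```python
import hashlib
import json
from pathlib import Path

def build_manifest(case_dir):
    """Map each case file (path relative to case_dir) to its SHA-256 digest."""
    root = Path(case_dir)
    manifest = {}
    for path in sorted(root.rglob("*.json")):
        manifest[path.relative_to(root).as_posix()] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest

def manifest_digest(manifest):
    """Single fingerprint for the whole bank: SHA-256 of the canonical JSON manifest."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Recording the single digest in the preregistration would let any later change to any case file be detected by recomputing it.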

The Adjudicator's Role

What you'll do

Adjudication is conducted entirely remotely via a secure web interface. Each case is presented with all materials needed to make a determination. Your independent determination establishes the reference-standard label.

Review the case packet

Each case displays on a single screen: the synthetic patient record (FHIR bundle rendered as a clinic-style chart), the prior authorization order, the applicable payer coverage criteria (verbatim and codified), and the programmatic initial determination produced by the case-derivation module. No tab switching. No external lookups required.

Record your independent determination

Classify each case as Should Approve, Should Deny, or Ambiguous / Insufficient Information against the payer's stated coverage criteria. Optionally annotate the specific criterion driving your decision. All ratings are timestamped and locked. You cannot see another adjudicator's ratings.
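The three permitted determinations and the timestamp-and-lock rule map naturally onto an immutable record. This is an illustration only: the field names and the `own_ratings` helper are hypothetical, and a frozen dataclass merely models "locked" in memory, whereas the actual interface presumably enforces it in storage.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import List, Optional

class Determination(Enum):
    SHOULD_APPROVE = "Should Approve"
    SHOULD_DENY = "Should Deny"
    AMBIGUOUS = "Ambiguous / Insufficient Information"

@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class Rating:
    case_id: str
    adjudicator_id: str
    determination: Determination
    driving_criterion: Optional[str] = None  # optional annotation of the deciding criterion
    # Timestamp captured automatically (UTC) when the rating is created.
    recorded_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

def own_ratings(ratings: List[Rating], adjudicator_id: str) -> List[Rating]:
    """Blinding rule: an adjudicator's view contains only their own ratings."""
    return [r for r in ratings if r.adjudicator_id == adjudicator_id]
```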

Disagreements escalate to a third reviewer

Cases on which two adjudicators disagree are automatically routed to a third adjudicator for tiebreaking. You are not asked to revise your rating based on what another adjudicator decided — independence is the entire point. Inter-rater reliability (Cohen's κ) is computed per batch. If κ drops below 0.70, a brief recalibration discussion is scheduled before the next batch.
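The escalation rule reduces to a small decision function. It assumes, as the text implies, that a third rating matching either primary rating settles the case. The protocol does not say what happens if all three determinations differ, so returning `None` (leaving the case unresolved) is an assumption, as is the function name.

```python
from typing import Optional

def reference_label(first: str, second: str, third: Optional[str] = None) -> Optional[str]:
    """Derive the reference-standard label for one case from independent ratings."""
    if first == second:
        return first  # primary adjudicators agree: label is settled
    if third is None:
        return None  # disagreement: route to a third adjudicator
    if third in (first, second):
        return third  # tiebreaker sides with one primary rating (2-of-3 majority)
    return None  # all three differ: handling unspecified (assumption: unresolved)
```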

Work in batches of 100, at your own pace

Cases are released in batches of 100. There is no per-session minimum: log in when convenient, work through as many cases as you wish, and log out. The interface tracks your queue and resumes exactly where you left off. We expect most adjudicators to complete a batch of 100 in 3–5 sittings spread over one to two weeks.

Calibration first, primary adjudication after

Before primary adjudication begins, all adjudicators independently rate the same 10 pilot cases. The PI computes pairwise κ. If κ ≥ 0.70 across all adjudicator pairs, the rubric is frozen and primary adjudication begins. If κ falls short, discrepant cases are reviewed jointly, the rubric is refined, and calibration repeats.
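The calibration check can be expressed directly with Cohen's κ, κ = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is the agreement expected from each rater's marginal category frequencies. The sketch computes κ for every adjudicator pair and applies the 0.70 threshold; the function names and the short category codes in the usage example are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cohens_kappa(r1, r2):
    """Cohen's kappa for two raters who labeled the same cases in the same order."""
    n = len(r1)
    if n == 0 or n != len(r2):
        raise ValueError("ratings must be non-empty and aligned")
    p_o = sum(a == b for a, b in zip(r1, r2)) / n  # observed agreement
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / (n * n)  # chance agreement
    if p_e == 1.0:
        return float("nan")  # both raters used one identical category: kappa undefined
    return (p_o - p_e) / (1 - p_e)

def calibration_passes(ratings_by_adjudicator, threshold=0.70):
    """Return (passed, kappas); passing requires every pair to reach the threshold.

    A NaN kappa compares False against the threshold, so that pair never passes.
    """
    kappas = {
        (a, b): cohens_kappa(ratings_by_adjudicator[a], ratings_by_adjudicator[b])
        for a, b in combinations(sorted(ratings_by_adjudicator), 2)
    }
    return all(k >= threshold for k in kappas.values()), kappas
```

For instance, two raters who agree on 9 of 10 pilot cases, with category counts (5, 4, 1) and (4, 5, 1), have p_e = 0.41 and κ ≈ 0.83, which clears the 0.70 threshold.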

Time commitment

Adjudication is designed to be realistic for an active clinician. The estimates below are conservative; experienced reviewers of Medicare PA criteria typically accelerate after the first 50 cases.

Onboarding session: 1 hour, one-time
Calibration run: 10 cases · ~2.5 hrs
Primary adjudication: ≈585 cases · 15–20 min each
Total estimated effort: ≈150–200 hours
Adjudication window: Months 3–4 · ≈2–3 wks
Modality: 100% remote · async
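The total follows from the per-case figures in the time-commitment table. The sketch below reproduces that arithmetic and converts it to an average weekly load for a chosen window; the function names are illustrative, and the onboarding and calibration hours are taken from the table.

```python
def adjudication_hours(n_cases, minutes_per_case, onboarding_h=1.0, calibration_h=2.5):
    """Total effort in hours: primary adjudication plus one-time onboarding and calibration."""
    return n_cases * minutes_per_case / 60 + onboarding_h + calibration_h

def hours_per_week(total_hours, weeks):
    """Average weekly load if the total is spread evenly over `weeks` weeks."""
    if weeks <= 0:
        raise ValueError("weeks must be positive")
    return total_hours / weeks
```

`adjudication_hours(585, 15)` gives 149.75 and `adjudication_hours(585, 20)` gives 198.5, matching the ≈150–200 hour estimate. Spread over three weeks, 198.5 hours is about 66 hours per week; spread over the roughly eight weeks of Months 3–4, it is about 25 hours per week.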

Eligibility

Who we're recruiting

Required criteria are non-negotiable. Preferred qualifications strengthen your application but are not disqualifying in their absence.

Required criteria

  • MD or DO degree: from an accredited institution. Equivalent international degrees (MBBS, MBChB) accepted with valid clinical licensure.
  • Board certification in Family Medicine or Internal Medicine: ABFM, ABIM, or equivalent international board. This aligns with the Medicare primary-care PA case population.
  • Active or recent clinical practice: currently practicing, or within the past three years. Independent practitioners and academic clinicians both welcome.
  • Clinical English proficiency: case materials, payer criteria, and the rubric are in English. Reading-level fluency is sufficient; there is no presentation requirement.
  • Reliable internet access: the adjudication interface is browser-based. No software installation. Standard modern browser required.
  • Availability for 1-hour onboarding: conducted by video call before your first batch. Weekday evenings and weekend slots available across multiple time zones.

Preferred qualifications

  • Prior authorization experience: direct experience submitting, appealing, or peer-reviewing PA requests as a clinician, peer reviewer, or medical director.
  • Medicare Advantage familiarity: working knowledge of UnitedHealthcare, BCBS, or other MA plan coverage policies for imaging, DME, or specialty referrals.
  • Clinical research background: experience as an endpoint adjudicator or in clinical trial data review. ICH-GCP or CITI training is a plus but not required.
  • Interest in healthcare AI: engagement with AI-in-medicine literature or policy. No technical AI background required and no AI evaluation skills assumed.

What you receive

Recognition & professional benefits

We are transparent about what participation involves — and what it offers. No monetary compensation is provided. The benefits below are professional and recognitional.

i.

Named acknowledgment in the manuscript

All adjudicators are named in the supplementary acknowledgments of the primary manuscript (target journal: NEJM AI). Adjudicators meeting all four ICMJE criteria (substantial contribution to data acquisition, drafting or revising, final approval, and accountability) are eligible for co-authorship, discussed individually with the PI before submission.

ii.

Structured training in Medicare PA criteria

The onboarding session and adjudication rubric provide expert-guided exposure to CMS LCD/NCD criteria, UnitedHealthcare MA policy, and BCBS MA coverage standards across imaging, DME, Part B drugs, specialty referrals, and PT/OT/home health.

iii.

Frontline view of clinical AI in practice

You gain an early, first-hand view of how agentic AI, an unassisted LLM, and a FHIR-standard rule-based system handle realistic PA complexity, with access to the findings before they are published. Arm outputs are not shown during adjudication, which remains blinded. This is a substantive, early perspective on technology actively reshaping primary-care workflows.

iv.

Documented adjudicator service

Upon study completion, the PI provides a formal letter documenting adjudicator service for academic CVs, promotion files, and clinical research portfolios. Useful for institutional research credit.

v.

Continued collaboration in Study 1b

PACE-AI is a four-study program. Study 1b — a four-arm extension adding human clinic-staff comparators under a full human-subjects regulatory pathway — uses the same adjudication infrastructure. Study 1 adjudicators are prioritized for continued involvement.

vi.

GCP-aligned, IRB-overseen process

The study operates under an IRB protocol (pending approval) and follows ICH-GCP principles for data handling and adjudication. A data-handling agreement is provided before onboarding. Your participation is formally documented and auditable.

On compensation. Study 1 does not provide monetary compensation to adjudicators. Anthropic API costs for Arms A and B are funded separately by the sponsoring nonprofit; Anthropic has no role in study design, conduct, analysis, or publication decisions. We are stating this openly because we believe physicians evaluating whether to volunteer their expertise deserve a clear picture of the arrangement before they apply.

Application Process

From application to first batch

We aim to complete screening within five business days of application and have all adjudicators trained before the case bank is finalized.

1

Submit your expression of interest

Complete the form below. The PI reviews every application personally — no algorithmic screening.

5–10 minutes

2

Eligibility review & reply

If you meet criteria, you receive a personal email from the PI with a scheduling link for the onboarding call. Otherwise we explain why and, where appropriate, suggest a future opportunity.

Within 5 business days

3

Data-handling agreement

A brief data-handling agreement is sent for e-signature. The patient records you review are fully synthetic — Synthea-generated, not de-identified real records — but this agreement is a protective formality consistent with GCP standards.

~10 minutes · e-sign

4

One-hour onboarding session

A video call with the PI covering the adjudication rubric, payer criteria reference materials, web-interface navigation, and instructions for the 10-case calibration run. Inter-rater reliability is computed after calibration.

1 hour · Zoom or Google Meet

5

Primary adjudication begins

Once κ ≥ 0.70 is confirmed across all adjudicator pairs, your queue is populated with the first batch of 100 cases. We anticipate that most adjudicators will complete all batches in 2–3 weeks of part-time work.

Months 3–4 of the study

Frequently Asked Questions

Questions we hear most

Is the patient data real?

No. All cases use fully synthetic patient records generated with the Synthea open-source patient simulation platform. There is no real patient data anywhere in the case bank. The synthetic records are designed to be clinically plausible for Medicare-aged primary-care patients, not to represent any real individual.

Do I need to know about AI to participate?

No. Your role is to adjudicate the underlying PA cases using clinical judgment and payer coverage criteria, exactly as you would in clinical practice. You are not evaluating AI outputs directly: the three arms process the cases separately, and their outputs are not shown to you during adjudication.

What specialties qualify?

The protocol specifies board certification in Family Medicine or Internal Medicine, aligning with the Medicare primary-care PA case population. Physicians with dual certification, or with a related subspecialty (e.g., geriatric medicine) held on an FM or IM primary board, are encouraged to apply and will be reviewed individually.

What if I disagree with my co-adjudicator?

Disagreements are expected and built into the protocol. When two adjudicators reach different determinations, the case is automatically flagged for a third-adjudicator tiebreaker. You are never asked to change your rating based on what another adjudicator decided. Independence is the entire scientific point.

Will I be an author on the paper?

Authorship follows ICMJE criteria: substantial contribution to data acquisition or analysis, drafting or critical revision, final approval, and accountability. Adjudicators meeting all four criteria are eligible for co-authorship and are discussed individually with the PI before submission. All adjudicators are named in the supplementary acknowledgments regardless of authorship status.

Is there any compensation?

No. Adjudicators are not paid; the sponsoring nonprofit funds the study, including the Anthropic API costs for Arms A and B. We are transparent about this: what we offer is professional recognition, structured training in Medicare PA criteria, early access to the study findings, and the opportunity to contribute to research that addresses one of primary care's most pressing administrative burdens.

What if my κ doesn't reach 0.70 during calibration?

The calibration threshold exists to ensure adjudication quality, not to screen out good physicians. If κ is below 0.70 after the 10-case pilot, the PI schedules a brief discussion to review discrepant cases and refine the rubric — calibration is then repeated. The rubric is intentionally adjustable at this stage.

Can I withdraw after I start?

Yes. Participation is entirely voluntary. If you withdraw, notify the PI; your completed cases are retained with your consent or removed from the dataset if not. We ask only that you communicate promptly so a replacement adjudicator can be onboarded if needed.

What is Anthropic's role in the study?

Anthropic API services are used for Arms A and B (the agentic AI arm and the unassisted-LLM arm both use Anthropic's claude-haiku-4-5-20251001). Anthropic has no role in study design, conduct, analysis, or publication decisions. API costs are funded by the sponsoring nonprofit. This is disclosed in full in the manuscript at submission.

Apply

Expression of interest

Complete the form below. The PI reviews every application personally and replies within five business days. Thorough answers in the specialty and motivation fields significantly speed up screening.

Personal information

Medical credentials

Study fit & availability

Information submitted here is used solely for adjudicator screening and is not shared with any third party. Stored on a secure server consistent with GCP data-handling standards.