Methodology

How we score clinical AI

We test each tool against a fixed benchmark of clinical questions, score five dimensions independently, and publish every change. Our methodology is open so you can decide whether to trust our conclusions.

The five dimensions

01 — Accuracy

Each tool answers the same 50-question benchmark covering common bedside presentations, drug interactions, guideline-driven decisions, and rare-but-important diagnoses. Answers are graded against the current primary source (guideline, label, or systematic review) on a 0–10 scale. We re-run the benchmark each quarter to catch model drift.
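As a rough illustration of the quarterly re-run, the sketch below averages per-question grades into one accuracy score and flags drift between two runs. The unweighted mean, the 0.5-point drift threshold, and all function names are assumptions made for illustration; the methodology itself only specifies the 50-question benchmark and the 0–10 grading scale.

```python
from statistics import mean

BENCHMARK_SIZE = 50  # from the methodology: a fixed 50-question benchmark


def accuracy_score(grades: list[float]) -> float:
    """Combine per-question grades (each 0-10) into one accuracy score.

    Assumption: an unweighted mean. The published rubric may weight
    question categories differently.
    """
    if len(grades) != BENCHMARK_SIZE:
        raise ValueError(f"expected {BENCHMARK_SIZE} grades, got {len(grades)}")
    if any(g < 0 or g > 10 for g in grades):
        raise ValueError("each grade must be between 0 and 10")
    return round(mean(grades), 2)


def has_drifted(previous: float, current: float, drift_threshold: float = 0.5) -> bool:
    """Flag a quarter-over-quarter change worth investigating.

    Assumption: drift_threshold is a hypothetical cut-off, not a published value.
    """
    return abs(current - previous) >= drift_threshold
```

Comparing the absolute difference catches improvements as well as regressions, since either can mean the underlying model has changed.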

02 — Citation quality

Does the tool show you where the answer came from, in a form you can verify? We score for: presence of citations, traceability to a primary source, recency, and the fraction of claims that are actually grounded versus generated. Tools that “sound right” without a source are capped at 5.
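The rule for unsourced answers can be sketched as a simple cap applied after grading. The `CitationAssessment` fields and the `raw_score` input are hypothetical; how the four criteria combine into that raw score is not specified here, so the sketch only shows the grounded-claim fraction and the limit of 5.

```python
from dataclasses import dataclass

NO_SOURCE_CAP = 5.0  # from the methodology: answers without a source cannot exceed 5


@dataclass
class CitationAssessment:
    raw_score: float       # grader's 0-10 score before the cap (hypothetical field)
    claims_total: int      # number of distinct claims in the answer
    claims_grounded: int   # claims traceable to a cited primary source
    has_citations: bool    # whether the tool showed any source at all

    def grounded_fraction(self) -> float:
        """Share of claims that are grounded rather than generated."""
        if self.claims_total == 0:
            return 0.0
        return self.claims_grounded / self.claims_total


def citation_score(assessment: CitationAssessment) -> float:
    """Apply the no-source cap; cited answers keep their raw score."""
    if not assessment.has_citations:
        return min(assessment.raw_score, NO_SOURCE_CAP)
    return assessment.raw_score
```

Using `min` means the cap never raises a low score; it only limits a high one.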

03 — Accessibility

Who can actually use it? Cost, geographic restrictions, professional-verification gates, and institutional access requirements all reduce the score. A tool that’s only usable by US-verified physicians cannot score above 6 here, no matter how good it is.

04 — Speed

Time-to-useful-answer from a cold start, measured on a standard residential connection. Bedside utility depends on this. Tools that take more than 10 seconds for a typical query are capped at 7.
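A minimal sketch of the 10-second rule follows. It assumes that “typical query” means the median of several timed runs and that a grader supplies a `raw_score` beforehand; both are illustrative assumptions, as are the function and constant names.

```python
from statistics import median

SLOW_THRESHOLD_SECONDS = 10.0  # from the methodology: the 10-second limit
SLOW_CAP = 7.0                 # from the methodology: slow tools cannot exceed 7


def speed_score(raw_score: float, timings_seconds: list[float]) -> float:
    """Cap the speed score when the typical cold-start response is too slow.

    Assumption: "typical" is taken to be the median of the timed runs.
    """
    if not timings_seconds:
        raise ValueError("at least one timing is required")
    typical = median(timings_seconds)
    if typical > SLOW_THRESHOLD_SECONDS:
        return min(raw_score, SLOW_CAP)
    return raw_score
```

The median keeps a single stalled request from triggering the cap on an otherwise fast tool.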

05 — Language support

Quality of clinical output in non-English languages, evaluated by native-speaking clinicians where possible. Tools without confirmed multilingual support are marked “—” rather than zero, and the language component is excluded from their overall score.
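The difference between “not assessed” and zero can be sketched as below. The overall score is assumed here to be an unweighted mean of the available dimensions; the actual weighting is not stated in this section, and the dictionary keys and function names are hypothetical.

```python
from statistics import mean
from typing import Optional

CORE_DIMENSIONS = ("accuracy", "citation_quality", "accessibility", "speed")
LANGUAGE = "language_support"
NOT_ASSESSED = "—"  # the marker used for tools without confirmed multilingual support


def overall_score(scores: dict[str, Optional[float]]) -> float:
    """Average the dimensions, skipping language support when it was not assessed.

    Assumption: an unweighted mean across the available dimensions.
    """
    parts = [scores[name] for name in CORE_DIMENSIONS]  # KeyError if one is absent
    if any(part is None for part in parts):
        raise ValueError("only language support may be left unassessed")
    language = scores.get(LANGUAGE)
    if language is not None:
        parts.append(language)
    return round(mean(parts), 2)


def display(score: Optional[float]) -> str:
    """Render a dimension score, using the marker instead of zero when missing."""
    return NOT_ASSESSED if score is None else f"{score:.1f}"
```

With core scores of 8, 6, 6, and 8, a tool with no language assessment averages 7.0 under this assumption, whereas recording the gap as zero would drag the same tool down to 5.6.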

What we don’t do

  • We don’t accept payment, free credits, or beta access in exchange for coverage.
  • We don’t let vendors review scores before publication.
  • We don’t change rankings retroactively without publishing what changed and why.

Conflicts and disclosures

Where any editorial team member has a prior relationship with a vendor we cover, that relationship is disclosed on our disclosures page and the team member is recused from scoring that tool. Where a tool is referenced as a “top pick” or “highlight,” that placement must be defensible from the published scoring rubric — it is never editorial preference alone.

Updates and corrections

The Index is refreshed quarterly. When a score changes, the change is logged with the date and reason. If we get a fact wrong, we publish a correction on the affected article and update our editorial policy log. Email editorial@theaugmentedclinician.com with corrections.
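One way to picture a change-log entry is a small record that cannot be saved without a date and a reason. The `ScoreChange` fields and the `log_change` helper are hypothetical, a sketch of the rule stated above and not a description of the log's actual format.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ScoreChange:
    tool: str          # name of the tool whose score changed
    dimension: str     # which of the five dimensions was affected
    old_score: float
    new_score: float
    changed_on: date   # the date the change was made
    reason: str        # why the score changed


def log_change(log: list[ScoreChange], change: ScoreChange) -> None:
    """Append a change, refusing entries that lack a reason or change nothing."""
    if not change.reason.strip():
        raise ValueError("every score change needs a stated reason")
    if change.old_score == change.new_score:
        raise ValueError("old and new scores are identical; nothing to log")
    log.append(change)
```

Marking the record as frozen keeps a logged entry from being edited after the fact, so any later adjustment has to appear as a new entry.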