How we score clinical AI
We test each tool against a fixed benchmark of clinical questions, score five dimensions independently, and publish every change. Our methodology is open so you can decide whether to trust our conclusions.
The five dimensions
01 — Accuracy
Each tool answers the same 50-question benchmark covering common bedside presentations, drug interactions, guideline-driven decisions, and rare-but-important diagnoses. Answers are graded against the current primary source (guideline, label, or systematic review) on a 0–10 scale. We re-run the benchmark each quarter to catch model drift.
02 — Citation quality
Does the tool show you where the answer came from, in a form you can verify? We score four things: presence of citations, traceability to a primary source, recency, and the fraction of claims that are grounded in a cited source rather than generated without one. Tools that “sound right” without citing a source are capped at 5.
03 — Accessibility
Who can actually use it? Cost, geographic restrictions, professional-verification gates, and institutional access requirements all reduce the score. A tool that’s only usable by US-verified physicians cannot score above 6 here, no matter how good it is.
04 — Speed
Time-to-useful-answer from a cold start, measured on a standard residential connection. Bedside utility depends on this. Tools that take more than 10 seconds for a typical query are capped at 7.
05 — Language support
Quality of clinical output in non-English languages, evaluated by native-speaking clinicians where possible. Tools without confirmed multilingual support are marked “—” rather than zero, and are excluded from the language component of the overall score.
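To make the caps and the language exclusion concrete, here is a minimal sketch of how five dimension scores might be combined. The three cap values come from the descriptions above. Everything else is an assumption for illustration: the function and parameter names are hypothetical, and the unweighted mean stands in for a weighting that this page does not specify.

```python
from typing import Optional

# Cap values as stated in the dimension descriptions; constant names are illustrative.
CITATION_CAP_UNSOURCED = 5.0   # answers that "sound right" without a source
ACCESS_CAP_US_ONLY = 6.0       # usable only by US-verified physicians
SPEED_CAP_SLOW = 7.0           # typical query takes more than 10 seconds
SLOW_QUERY_SECONDS = 10.0


def cap(score: float, limit: float, applies: bool) -> float:
    """Return the score, limited to `limit` when the cap condition applies."""
    return min(score, limit) if applies else score


def overall_score(
    accuracy: float,
    citation: float,
    accessibility: float,
    speed: float,
    language: Optional[float],
    *,
    unsourced: bool = False,
    us_verified_only: bool = False,
    typical_query_seconds: float = 0.0,
) -> float:
    """Combine 0-10 dimension scores into a single number.

    ASSUMPTION: an unweighted mean. The real weighting is not stated.
    """
    scores = [
        accuracy,
        cap(citation, CITATION_CAP_UNSOURCED, unsourced),
        cap(accessibility, ACCESS_CAP_US_ONLY, us_verified_only),
        cap(speed, SPEED_CAP_SLOW, typical_query_seconds > SLOW_QUERY_SECONDS),
    ]
    # None corresponds to the dash mark for unconfirmed language support:
    # the dimension is left out of the average instead of counting as zero.
    if language is not None:
        scores.append(language)
    return sum(scores) / len(scores)


# A US-only tool with 12-second answers and no confirmed language support.
print(overall_score(8, 9, 9, 9, None, us_verified_only=True, typical_query_seconds=12))
```

Under these assumptions, leaving the language dimension out averages the remaining four scores. Counting it as zero would instead pull the overall score down for a gap that was never measured.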
What we don’t do
- We don’t accept payment, free credits, or beta access in exchange for coverage.
- We don’t let vendors review scores before publication.
- We don’t change rankings retroactively without publishing what changed and why.
Conflicts and disclosures
Where any editorial team member has a prior relationship with a vendor we cover, that relationship is disclosed on our disclosures page and the team member is recused from scoring that tool. Where a tool is referenced as a “top pick” or “highlight,” that placement must be defensible from the published scoring rubric — it is never editorial preference alone.
Updates and corrections
The Index is refreshed quarterly. When a score changes, the change is logged with the date and reason. If we get a fact wrong, we publish a correction on the affected article and update our editorial policy log. Email editorial@theaugmentedclinician.com with corrections.