Ontario’s Auditor General found this month that AI note-taking systems approved for doctors were routinely getting basic clinical facts wrong, raising a fairly awkward question for a program meant to save time without adding risk.
In tests of 20 approved AI scribe systems, auditors found that 12 inserted the wrong drug information into notes, 17 missed key mental-health details, and 9 fabricated information or suggested treatment-plan changes that were never discussed.
The findings come from the Office of the Auditor General of Ontario’s May 12 report on AI use in provincial public services, which examined the Ministry of Health’s AI Scribe program for physicians, nurse practitioners and other clinicians. Supply Ontario launched the vendor-of-record arrangement in April 2025, saying approved products met provincial requirements for clinical functions, data security and privacy.
Auditors assessed the systems using two simulated doctor-patient recordings. Medical professionals then compared the AI-generated notes against the recordings to judge accuracy. This is about as close to a basic sanity check as procurement gets, and most of the tools still failed at least part of it: 17 of the 20 missed key mental-health details.
“AI may sometimes produce false or inaccurate content,” the College of Physicians and Surgeons of Ontario says in its guidance to doctors, adding that physicians must review all AI-generated information for accuracy and completeness.
That warning looks less like boilerplate after the audit. OntarioMD, which supported the procurement process, has also recommended manual review of AI notes, but auditors found there was no mandatory attestation feature in approved systems to confirm that a clinician had actually checked the output.
How the AI Scribe program was evaluated
The sharper finding was not just that the tools made mistakes, but how little note accuracy appeared to matter in the scoring. The Auditor General’s report, as described by The Register and Ars Technica, said 30 percent of a vendor’s evaluation score depended on whether it had a domestic presence in Ontario, while medical-note accuracy counted for only 4 percent.
Other safeguards were weighted lightly too. Bias controls counted for 2 percent of the score, while threat, risk and privacy assessments counted for another 2 percent, with SOC 2 Type 2 compliance adding 4 percent. If you were trying to design a process that optimized for looking institutionally comfortable rather than being correct, this would at least be competitive.
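To make the arithmetic concrete, here is a rough sketch in Python using only the weights reported above. It rests on assumptions the report coverage does not confirm: that the criteria were combined as a simple weighted sum, and that the remaining 58 percent of the score (not itemized in the coverage) can be lumped into a single "other" bucket. The two vendors are hypothetical.

```python
# Weights (percent of total score) as reported in coverage of the audit.
REPORTED_WEIGHTS = {
    "ontario_presence": 30,
    "note_accuracy": 4,
    "bias_controls": 2,
    "threat_risk_privacy": 2,
    "soc2_type2": 4,
}

# ASSUMPTION: the rest of the score is not itemized in the coverage,
# so it is lumped into one bucket here purely for illustration.
OTHER_WEIGHT = 100 - sum(REPORTED_WEIGHTS.values())  # 58


def weighted_score(ratings, other_rating=1.0):
    """Return a score out of 100.

    ratings maps each criterion to the fraction of its points earned
    (0.0 to 1.0). ASSUMPTION: a plain linear weighted sum.
    """
    itemized = sum(
        weight * ratings.get(name, 0.0)
        for name, weight in REPORTED_WEIGHTS.items()
    )
    return itemized + OTHER_WEIGHT * other_rating


# Hypothetical vendor A: based in Ontario, earns zero points for accuracy.
local_but_inaccurate = {name: 1.0 for name in REPORTED_WEIGHTS}
local_but_inaccurate["note_accuracy"] = 0.0

# Hypothetical vendor B: full marks for accuracy, no Ontario presence.
accurate_but_remote = {name: 1.0 for name in REPORTED_WEIGHTS}
accurate_but_remote["ontario_presence"] = 0.0

print(weighted_score(local_but_inaccurate))  # 96.0
print(weighted_score(accurate_but_remote))   # 70.0
```

Under those assumptions, a vendor that earned nothing for note accuracy would still score 96 out of 100, while one with full accuracy marks but no Ontario presence would top out at 70.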
That scoring sat inside a larger provincial rollout. Supply Ontario’s vendor page says the AI Scribe arrangement runs from April 27, 2025 to April 27, 2028, and the current roster lists 25 qualified vendors. A June 2025 launch notice said the program was intended to reduce administrative burden and give clinicians more time with patients.
The attraction is not hard to see. An OntarioMD evaluation report from Women’s College Hospital and partners said AI scribes cut documentation time during encounters by 69.5 percent in lab settings and reduced after-hours administrative work by about three hours per week in routine practice among more than 150 primary care providers. The problem is that saving time only helps if the saved time is not later spent catching invented medications.
A related problem, and one well known with large language models, is that errors often arrive sounding confident rather than uncertain. In healthcare, that is not a cosmetic issue. As we wrote recently, AI in healthcare systems can shift trust rather than remove it: someone still has to absorb the risk when the machine is wrong.
What the province said about the rollout
The Ontario Ministry of Health said, in comments reported by The Register via CBC, that more than 5,000 physicians are participating in the AI Scribe program and that there were no known reports of patient harms linked to it so far.
There is a meaningful limit on what the audit shows. The exercise used simulated conversations rather than live patient encounters, and the publicly reported findings do not establish that the documented errors caused harm in practice. They do show that approved systems were capable of making clinically important mistakes before reaching routine use.
The next milestone is likely to be the province’s formal response to the Auditor General’s recommendations and any changes to the vendor-of-record criteria before the current arrangement expires in April 2028.
Key Takeaways
- Ontario’s Auditor General found major factual errors in tests of 20 approved AI scribe systems.
- Twelve of 20 systems inserted wrong drug information, 17 missed key mental-health details, and 9 fabricated information or treatment suggestions.
- The procurement scoring reportedly gave 30 percent of the total to Ontario presence and only 4 percent to medical-note accuracy.
- Supply Ontario’s AI Scribe vendor-of-record arrangement runs from April 2025 to April 2028 and currently lists 25 qualified vendors.
- Ontario’s medical regulator tells physicians they remain responsible for reviewing all AI-generated notes for accuracy and completeness.
Further Reading
- Sick and wrong: Ontario auditors find doctors’ AI note takers routinely blow basic facts, The Register’s summary of the audit findings on errors, hallucinations, and procurement scoring.
- Artificial Intelligence Solutions-AI Scribe Vendor of Record, Supply Ontario’s vendor-of-record page with dates and the current vendor list.
- Enhancing patient care with Ontario AI Scribe VOR arrangement launch, Supply Ontario’s launch announcement for the program.
- Using Artificial Intelligence in Clinical Practice, CPSO guidance warning that AI-generated clinical content can be false or inaccurate.
