“What is the fate of medicine in the time of artificial intelligence (AI)?
Our fate is to change”
- The Lancet, 392, 2331, 2018
For decades, physicians have been evolving the way they summarize their hospitalized patients’ care in the paper medical record as a “discharge summary”. Initially, they sat in a tiny cubicle in the Cape Cod Hospital Medical Records Department and talked into a Dictaphone machine. The “transcribers” of the tape, either down the hall or across town, would type it out and mail it back to the physician for proofing days or even weeks later. It was not a paperwork job cherished by physicians, who needed constant encouragement to “complete their records”. Hence, the wall above the cubicles bore a poster of a monkey sitting on a toilet, entangled in toilet paper, with the caption: “The job is not finished until the paperwork is done.”
Eventually some of the specialty surgeons (urologists, ophthalmologists), who were performing basically the same procedure every day, recorded templates of their procedures and dictated only the individual patient’s details (“number of grams of tissue removed”, “right or left eye”) to shorten the task. Of course, they almost always ended the report with “no complications occurred”. When the internet appeared, physicians could dictate to a transcriber in India, and the turnaround somehow got shorter. Later, some physicians brought a “transcriber” into their exam rooms: a specially trained human, a “medical scribe” (sometimes a moonlighting medical student), who wrote down the physician’s words during the examination, not exactly like a court reporter but pretty close. The next step replaced the human scribe with a recording device that captured what the physician said, or even just the spoken data unique to that patient (“angle of knee flexion is . . .”, “no hip joint tenderness noted . . .”, etc.), inserted into a template preloaded into the device by the physician. The result was printed out right then for the medical record. Now there is artificial intelligence (AI) that can generate narrative text without any need for dictation or recording.
Everyone is impressed by the potential of AI programs (large collections of computer code built on mathematical calculations) with their ability to scan huge medical databases of text, numbers, and images, “learn” correlations, patterns, and associations, and generate a relevant narrative text. This scanning and “learning” is referred to as “training the AI”. Medical researchers immediately began exploring AI as a diagnostic tool and specialized treatment planner for physicians. Some enthusiastic AI pioneers even suggested that AI programs could substitute for (“replace”) physicians in diagnosis and treatment decisions. There are multiple, nay a huge number, of studies looking at how well AI generates diagnostic decisions and treatment plans across a whole variety of clinical specialties.
How is that going?
Accuracy of AI-generated diagnoses
A recent meta-analysis reviewed eighty-three (83) studies of 30 different diagnostic AI models (programs) in healthcare, spanning 17 clinical specialties (general medicine, radiology, etc.), to validate their accuracy in generating a diagnosis for a patient based on that patient’s medical data. The most frequent AI model (program) in the analysis was GPT-4, but THIRTY different AI models were identified and tested; that is, thirty different computer programs, and no one is really clear how any one of them actually works.
The overall diagnostic accuracy of the AI programs was 52%. No one AI model was significantly more accurate than another. Physician-based diagnostic accuracy was close to 70% in the same study. “Although our meta-analysis shows no significant difference between the pooled accuracy of models and that of physicians, recent research demonstrates that generative AI may perform significantly worse than physicians in more complex scenarios where models are provided with detailed information from electronic health records. Generative AI models are not yet reliable substitutes for expert physicians but may serve as valuable aids in non-expert scenarios and as educational tools for medical trainees.”
One research area of particular interest is how AI might help physicians complete their hospitalized patients’ medical records (remember “discharge summaries” above?). With almost all of an inpatient’s data (doctors’ and nurses’ notes, lab results, image reports, treatment plans, medication lists, referrals, follow-up appointments, etc.) in the EMR (electronic medical record), and with most offices and clinics fully implemented with EMRs, why can’t AI generate discharge summaries?
How is that going?
AI-generated discharge summaries for inpatients
“High-quality discharge summaries are associated with improved patient outcomes, but contribute to clinical documentation burden. Large language models (LLMs) of AI provide an opportunity to support physicians by drafting discharge summary narratives.”
In one study, all the hospital care records of 100 inpatients with stays of 3-6 days, cared for by hospitalist physicians, were scanned by an AI program to generate a discharge summary in narrative text. Twenty-two attending physicians then compared each AI discharge summary with the usual discharge summary dictated by the hospitalist, rating overall quality, reviewer preference, comprehensiveness, concision, coherence, and 3 error types (inaccuracies, omissions, and hallucinations). “Hallucinations” is the apt AI industry term for unintentional misinformation or made-up data produced by the AI program. You have probably already seen a hallucination like six fingers on a hand or an extra arm behind a back in an AI-generated image.
AI discharge summaries were more concise and coherent, but less comprehensive, than the physician ones. Both had about the same low number of individual errors, and almost all errors had a very low potential for causing harm. There was no difference between AI and physician potential harm scores, but one of the 100 AI summaries had an error with a potential permanent harm score of 4 (out of 7). “AI-generated discharge summary narratives were of comparable quality, and were preferred equally, to those generated by physicians. AI-generated narratives were more likely to contain errors but had low overall harmfulness scores. These results suggest that, in clinical practice, using such narratives after human review may provide a viable option for hospitalists.”
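For readers curious what “drafting a discharge summary narrative” with a large language model actually looks like behind the scenes, here is a minimal sketch in Python. It is only an illustration, not the program used in the study: the openai package, the model name, and the sample chart notes are all assumptions chosen for the example.

```python
# A minimal, illustrative sketch of LLM-drafted discharge summaries.
# Assumes the `openai` Python package; the model name and all clinical
# text below are placeholders, not details from the study described above.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable


def draft_discharge_summary(chart_notes: str) -> str:
    """Ask an LLM to draft a discharge summary narrative from chart notes.

    The draft is only a starting point: as the studies above suggest,
    a physician must review it before it enters the medical record.
    """
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; any capable model could be substituted
        messages=[
            {
                "role": "system",
                "content": (
                    "You are drafting a hospital discharge summary narrative. "
                    "Use only facts present in the notes; do not invent "
                    "(hallucinate) any findings, medications, or dates."
                ),
            },
            {"role": "user", "content": chart_notes},
        ],
    )
    return response.choices[0].message.content


# Hypothetical, de-identified example input:
notes = (
    "Day 1: 72 y.o. admitted with community-acquired pneumonia. "
    "Day 3: afebrile, oxygen weaned. Day 4: discharged on oral antibiotics."
)
print(draft_discharge_summary(notes))
```

Note that the instructions explicitly tell the model not to invent findings; in practice such warnings reduce, but do not eliminate, hallucinations, which is exactly why the studies above insist on physician review before a draft enters the medical record.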
Translating discharge summaries into patient-friendly language
The physician-generated discharge summaries of fifty general medicine inpatients (65 y.o.) were translated by an AI program into “patient-friendly language”. The translations were reviewed by two physicians, and over half were determined to be more readable and more understandable than the physician-generated ones in the EMR. “However, implementation will require improvements in accuracy, completeness, and safety. Given the safety concerns, initial implementation will require physician review.”
AI bias
One of the concerns about relying on AI-generated medical narratives is the presence of unintended, hidden bias baked in during the training of the AI program. Since there are multiple AI models, and since few of them explain their methods of AI training, inherent hidden bias is a real concern. For instance, “[Early (2021)] AI algorithms used health care costs as a proxy for health needs and falsely concluded that Black patients are healthier than equally sick white patients, as less money was spent on them.”
Conclusion: It is obvious from these studies that generative AI is currently just one tool for the physician (or health care provider; I didn’t even search for studies of AI use by nurse practitioners and physician assistants). AI is neither a replacement for the provider nor a trusted substitute for the patient seeking care, and an AI narrative is safe only after human review. AI infrastructure requires a lot of physical space and an enormous amount of electricity, so it costs a lot. We will soon see how expansion of AI capability may compete for our healthcare dollar the way EMRs do, and whether generative AI provides equivalent benefits to our medical outcomes and to our health care processes.
Bonus Tip: If you want to try out a free AI model (like Google’s) to see what AI-generated narratives look like, do so, and then be like a patient and go for a second opinion to Perplexity. Comparing the two AI-generated narratives for any differences might suggest an AI model’s bias. As all the AI usage studies indicate, “User beware”.

Posted by hubslist