It’s funny how, in the span of just two or three years, artificial intelligence exploded and is suddenly everywhere. In the beginning, studies of AI’s abilities trickled into journals one at a time. Now, however, reports of AI’s latest triumphs are coming so fast you need an AI to keep up with them.

A study from the University of Cambridge in the U.K. gave mock ophthalmology exam questions to several AI systems, expert ophthalmologists and trainees. The answers were graded by a masked panel of ophthalmologists. The researchers found that the AI GPT-4 outperformed the trainees, and even compared favorably to the expert ophthalmologists.1

Another study, from Mt. Sinai in New York City, tested an AI and fellowship-trained ophthalmologists on a set of ophthalmological questions and patient cases from the realms of glaucoma and retina. The ophthalmologists serving as raters scored the machine’s answers higher than the physicians’ for both accuracy and completeness.2

In medicine more broadly, Google’s medical-oriented AI, Med-PaLM 2, recently became the first artificial intelligence to rank as an “expert” in performance on a MedQA dataset of U.S. Medical Licensing Exam-style questions, achieving an accuracy of more than 85 percent.3 It was also the first AI to score 72.3 percent on Indian AIIMS and NEET medical examination questions.3

So, AI is everywhere, including in the popular document creation/editing software Adobe Acrobat in the form of an “AI Assistant.” Since Acrobat is one of the programs we use extensively here at Review, the staff naturally wanted to test the AI Assistant to see what it could do.

One of the Assistant’s functions lets you “feed” it a journal article, which it will then summarize for you, as well as answer queries about it. Perfect! Who wouldn’t want a quicker way to digest all the data we’re bombarded with daily?

So, with visions of the Supreme Intelligence that’s beating physicians all over the place, our editors fed some articles to the AI Assistant—and got a reality check. 

Though some results were useful, the Assistant also did things like refer to a treatment in the “subconscious intellispace,” rather than the subconjunctival space. As one editor put it, the errors it makes are subtle, so you have to go through its output line by line to check for accuracy. (At that point, you might as well summarize the article yourself.) “Sometimes the AI extrapolates too much from a single sentence it’s identified as being important,” she said. “It writes very well, so it’s easy to read its output and believe it’s true.”

Let these words ring in your ears the next time you’re analyzing the output of a chatbot, Large Language Model or other AI system. As the saying goes: “Trust—but verify.”

— Walter Bethke
Editor in Chief

1. Thirunavukarasu AJ, Mahmood S, Malem A, Foster WP, Sanghera R, Hassan R, et al. Large language models approach expert-level clinical knowledge and reasoning in ophthalmology: A head-to-head cross-sectional study. PLOS Digit Health 2024;3:4:e0000341. https://doi.org/10.1371/journal.pdig.0000341.

2. Huang AS, Hirabayashi K, Barna L, et al. Assessment of a large language model’s responses to questions and cases about glaucoma and retina management. JAMA Ophthalmol 2024;142:4:371-375.

3. Gupta A, Waldron A. A responsible path to generative AI in healthcare [online article]. April 13, 2023. https://cloud.google.com/blog/topics/healthcare-life-sciences/sharing-google-med-palm-2-medical-large-language-model. Accessed April 19, 2024.