OpenAI's Whisper AI Has Hallucination Issues in Medical Transcriptions
Image Source: ChatGPT-4o
OpenAI's transcription model, Whisper, is increasingly used in the healthcare industry to transcribe and summarize patient visits. Nabla, a company that builds its documentation tool on Whisper, reports that more than 30,000 clinicians across 40 health systems rely on it, and that the tool has transcribed over 7 million medical conversations. However, researchers and healthcare professionals have raised concerns about Whisper's reliability, specifically its tendency to produce "hallucinations," or fabricated passages, especially during moments of silence in recordings.
Research Exposes Whisper's Hallucination Problem
A study led by researchers from Cornell University, the University of Washington, and other institutions highlighted Whisper's hallucination issue, finding that the model produced entirely fabricated sentences in about 1% of transcriptions. These hallucinations included phrases irrelevant to the context, some of them violent or nonsensical. The researchers drew audio samples from TalkBank's AphasiaBank, a resource for studying language disorders such as aphasia, in which moments of silence are common. During these pauses, Whisper tended to invent content unrelated to the actual conversation.
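The failure mode the researchers describe, fabricated text appearing where the recording goes quiet, can be probed with the open-source whisper Python package, which reports per-segment confidence signals. The sketch below is illustrative only and is not the researchers' methodology: the file name visit_audio.wav and the thresholds are hypothetical, and flagging segments that Whisper itself marks as likely non-speech is just one rough heuristic for spotting text invented during pauses.

```python
# Minimal sketch: transcribe with the open-source whisper package and flag
# segments that may have been hallucinated during silence.
# Heuristic only; file name and thresholds are hypothetical examples.
import whisper

model = whisper.load_model("base")            # small model, for illustration
result = model.transcribe("visit_audio.wav")  # hypothetical input recording

for seg in result["segments"]:
    # no_speech_prob: Whisper's own estimate that the segment contains no speech.
    # avg_logprob: average token log-probability; very low values suggest the
    # decoder was effectively guessing. Cutoffs below are arbitrary.
    suspicious = seg["no_speech_prob"] > 0.5 or seg["avg_logprob"] < -1.0
    flag = "REVIEW" if suspicious else "ok"
    print(f'[{seg["start"]:7.2f}-{seg["end"]:7.2f}] {flag}: {seg["text"].strip()}')
```

A real deployment would tune such thresholds against labeled transcripts; the point here is only that silence-heavy stretches are where extra scrutiny is warranted.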
Allison Koenecke of Cornell University, a researcher involved in the study, shared specific examples of Whisper's fabricated outputs. These included imagined medical conditions and phrases like "Thank you for watching!", language more typical of YouTube videos than of medical dialogue, and possibly a trace of the more than a million hours of YouTube videos that OpenAI reportedly transcribed with Whisper while training GPT-4.
Whisper's Ongoing Use and Nabla’s Response
Despite these concerns, Nabla continues to deploy Whisper in medical settings. The company has acknowledged the hallucination issue and says it is actively "addressing the problem." OpenAI, meanwhile, has taken steps to manage Whisper's use in sensitive contexts: spokesperson Taya Christianson said the company is committed to reducing hallucinations and that its policies restrict Whisper's use for high-stakes decision-making on its API platform.
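For teams that call Whisper through OpenAI's hosted API rather than running it locally, the same caution applies: in clinical settings a transcript is a draft until a human has checked it. The following is a minimal sketch of that pattern, assuming the current openai Python SDK; clinician_review and visit_audio.wav are hypothetical placeholders, and nothing here represents Nabla's or OpenAI's actual pipeline.

```python
# Sketch: hosted Whisper transcription followed by mandatory human review.
# clinician_review() stands in for whatever review workflow an organization
# uses; it is a placeholder, not a real library function.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def transcribe_for_review(path: str) -> str:
    """Send a recording to the hosted Whisper model and return the raw draft."""
    with open(path, "rb") as audio_file:
        transcript = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
        )
    return transcript.text


def clinician_review(draft: str) -> str:
    """Placeholder: a clinician compares the draft against the recording."""
    return draft  # in practice, edits and sign-off happen here


draft = transcribe_for_review("visit_audio.wav")  # hypothetical file
final_note = clinician_review(draft)
```

Keeping the original audio alongside the draft is what makes such review possible; one criticism of some deployments is that recordings are deleted after transcription.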
OpenAI expressed appreciation for researchers who brought attention to the model's limitations and emphasized ongoing efforts to mitigate these issues.
About the Study
The study was presented at the Association for Computing Machinery's FAccT conference in Brazil in June, and it is not clear whether it has been peer-reviewed. That uncertainty feeds into ongoing conversations about the oversight and accountability needed for AI applications in healthcare.
Future Considerations for AI in Medical Transcription
The Whisper model’s challenges highlight important considerations for AI’s role in healthcare, where accuracy is paramount. While Whisper offers promising advancements in transcription efficiency, the hallucination issue emphasizes the need for stringent safeguards and ongoing research. OpenAI and Nabla are working to address these concerns, aiming to reduce inaccuracies in high-stakes settings like medicine. As AI tools continue to evolve, their responsible use in sensitive fields will be essential for building trust and ensuring that these technologies genuinely support healthcare professionals in delivering accurate, reliable patient care.