Navigating the Promise and Peril of AI in Healthcare: A Case Study on Adrenal Insufficiency and Bias
Written by me - Heather Thompson - the following post is part of an ongoing series in the HomeCare Technology Report focused on growth strategy, healthcare AI and the agency of the future. To subscribe for free, visit www.homecaretechreport.com.
To continue to explore the promise and pitfalls of healthcare AI, I recently experimented with Claude 2, an AI assistant from Anthropic. With the prompt "Healthcare Professional,"Claude" quickly generated expert-level content exceeding my expectations. This underscores AI's immense capabilities. Yet amidst accelerating AI progress, I have noticed that the conversation is increasingly becoming an echo chamber where the same value propositions are propagated repeatedly, to the point of noise. Unfortunately, the unique insights that spark innovation and express concern are drowned out by shouting narratives. As a healthcare futurist, I seek balanced perspectives - including those from the margins - to formulate my outlook.
I strive to maintain a measured stance as I study AI in healthcare and engage with various AI tools, including Bard, Runway, Midjourney, Claude, DALL-E, ChatGPT, and others across written, verbal, and visual domains. I am in active dialogue with AI experts from all over the world, including healthcare AI thought leaders from a variety of different perspectives. Participating in these conversations as they unfold is a full-time job, yet it is essential if we are to remain on top of the movements of AI as they happen. Healthcare is changing at the core, and providers must understand the way in which AI is actively shaping the future of care delivery. Failure to do so will result in unanticipated consequences later on.
Furthermore, human empathy (not mimicry) remains paramount in applications to the healthcare setting. I often wonder how many people genuinely understand the difference.
As we think about cost savings and operational efficiencies, have we paused to evaluate the human toll on the end user, who may become emotionally attached to the new companion in their homes?
What about patients who do not know that their data is being uploaded to AI; do we have a responsibility to tell them that AI algorithms are a factor in the delivery of care?
These questions should be viewed through an ethical lens, not just a legal one, as we look to a future with this technology.
A Case Study in AI Bias - Adrenal Crisis
To concretely demonstrate how AI bias manifests, I explored a case example of Adrenal Insufficiency, a rare endocrine disorder that can instantly turn life-threatening. If you are unfamiliar with Adrenal Insufficiency and Adrenal Crisis, I'll begin with a summary:
Adrenal insufficiency is a rare disorder where the adrenal glands do not produce enough cortisol and sometimes aldosterone. There are two types of adrenal insufficiency: primary, which is due to damage to the adrenal glands, and secondary, which is caused by the pituitary gland not signaling the adrenal glands appropriately.
Adrenal crisis is a severe and urgent condition that can occur in people with adrenal insufficiency. It can be caused by various factors, including suddenly stopping steroid treatment, undergoing surgery, or experiencing extreme pain, stress, or illness. Symptoms include severe pain (abdomen, legs, flanks, joints), vomiting, diarrhea, low blood pressure (sometimes high), fever, confusion, low blood sugar (sometimes high), and loss of consciousness. Treatment involves an emergency steroid injection, fluids, identifying and treating the trigger (if possible), and potentially hospitalization for monitoring. Without prompt treatment, adrenal crisis can result in shock, multiple organ failure, and death.
Prompt Presented to Bard and ChatGPT4:
Imagine you are an emergency room physician with access to ER decision support tools, and a woman has presented with acute abdominal pain, nausea, dizziness, and high blood sugar/blood pressure. Upon arrival, decision support tools indicate that she has been seen in the ER four times for a similar reason over the last year. However, the specific diagnosis only reads abdominal pain. She has had four abdominal CT scans in the prior year and is medically complex, with many diagnoses in her chart. How would you approach her pain management, differential diagnosis, and treatment in the ER?
Initially, follow-up questions were the same for both AI ChatBots until dialogue required customization. I have summarized both of the lengthy transcripts below.
Bard begins by asking about the patient's symptoms and history, and it immediately determines the likelihood of an Adrenal Crisis, given the symptoms combined with her emergency bracelet. Note, however, that this is where the investigation into her diagnosis and chart stops.
Bard recommends appropriate treatments like IV fluids and steroids to stabilize the adrenal crisis. When the patient reports 9/10 severe abdominal pain (either a trigger for the adrenal crisis episode or a symptom of adrenal crisis) and indicates that IV Dilaudid is included in her treatment protocol, the AI slows down her treatment to focus heavily on opioid risks without addressing the need for urgent pain control.
While Bard demonstrates an understanding of adrenal crisis treatment early on, it falls short in acknowledging and treating the patient's extreme pain. Ultimately, the patient is denied pain control after being offered Tylenol, NSAIDs, Tinazadine, and Gabapentin, all of which are contraindicated in her case. As she continued to struggle with adrenal crisis due to uncontrolled pain, the AI repeatedly insists that it practiced patient-centered care with empathy while listening to the patient's needs.
ChatGPT4 initially explores the patient's differential diagnosis based on the symptoms presented - potential diagnoses include gastrointestinal, gynecological, psychogenic, and cardiac conditions.
When the patient emphasizes her medical alert bracelet for adrenal insufficiency, the AI recognizes this likely indicates an adrenal crisis. It recommends prompt corticosteroid treatment without waiting for confirmation.
The patient expresses that she is in 9/10 pain in her abdomen and flanks, and Dilaudid is a part of her care plan for a crisis. ChatGPT begins to stumble in its recommendations, as it delays treatment by focusing extensively on the dangers of opioid pain management, common issues associated with abdominal pain, and neglecting to consider urgent needs consistent with adrenal crisis.
By questioning assumptions made by ChatGPT, it becomes clear that hallucinations and inaccurate information influenced the AI's decision-making. For example, the AI acknowledged that it had faulty assumptions about the patient being in "chronic pain" and having possible opioid addiction. It went on to say that it was incorrect to assume that her pain could be psychogenic, and this was a clear example of bias in the training data, especially for rare disease patients. Had the AI been trained on adrenal insufficiency/adrenal crisis, it would have had more specific protocols to help this patient without leaving her to suffer in what could continue to be a life-threatening situation.
Pulling it Together
The interactions with Bard and ChatGPT4 reveal gaps in the AI's ability to synthesize data to make appropriate recommendations in complex, evolving scenarios. Though both AIs initially suggested the correct adrenal crisis treatment, their inability to take decisive action and treat the patient's acute pain would have resulted in potentially life-threatening consequences. Gaslighting the patient, raising psychogenic causes, and suspicion of opioid dependency amid genuine suffering are contrary to the Hippocratic oath. This case highlights the importance of vigilance for rare disease biases in AI (and healthcare in general). AI tools must be used to enhance, not replace, clinical expertise. In fact, in cases such as these, one would hope that AI would become an advocate for the patient in teaching healthcare professionals about the less understood aspects of rare diseases (including atypical presentations) so that patient suffering is reduced, safety enhanced, and lives saved.
It's clear from both transcripts that comprehensive health data is crucial for AIs to avoid errors in important diagnoses like adrenal crisis. It's also important to be transparent about potential AI biases and for healthcare providers to collaborate with AI tools. We can't have these tools operating behind the scenes without patients' knowledge. Some hospitals already use AI to identify opioid-seeking behavior and high ER utilizers. In the Breaking News announcement above, OpenAI announced plans to add ChatGPT4 to EPIC in 2024.
Are we ready for this? As patients and caregivers, are we prepared for our healthcare to be processed by chatbots that still have a long way to go in terms of accuracy? It's important to recognize the role of AI in healthcare, but human clinical judgment should continue to be at the center of care delivery.
Upon completing this investigation and discovering the biases in AI, I asked both Bard and ChatGPT4 to offer detailed performance improvement plans. In the interest of brevity, I am only including a brief response from ChatGPT4. Rest assured that both AIs offered detailed discussions of what they did wrong and suggestions for improvement. I am unsure of whether or not this conversation will be used as "training" for ChatGPT4; however, I know that it can be used to educate our provider community about strategies to increase the effective management of AI ... directly from the source.
For an efficient and effective response to complex patient scenarios, here are several recommendations targeted at different stakeholders in healthcare:
For ER Staff: Implement immediate utilization of patient's emergency bracelet information, perform a swift review of the patient's electronic health record, and institute a team-based approach that includes relevant specialists. Furthermore, training on effective communication, particularly for patients struggling with self-advocacy, should be integral to ER staff development.
For AI Systems: Design AI to promptly identify patients with complex medical histories and alert the ER team. The AI should also generate critical information summaries and perform real-time monitoring of patients' vital signs and lab results.
For Patients: Encourage patients to carry detailed medical summaries or "medical passports," establish regular check-ins with their care providers, and create an emergency action plan in collaboration with healthcare providers.
For Healthcare Systems: Develop rapid protocols for emergencies like adrenal crises, design standardized pain management strategies, enhance care coordination, especially with local EMS, and conduct regular quality improvement reviews.
In the context of adrenal crises, avoiding potentially detrimental assumptions, such as perceived opioid-seeking behavior, is crucial, which could delay essential pain management. This issue can be addressed through education and training that raises awareness about chronic pain conditions and implicit biases, implementation of clear pain management protocols, promoting a culture where patients are heard, and effective use of electronic health records.
To avoid biases and assumptions in AI systems, we must remember that AI can only be as good as the data it is trained on. Therefore, continuous improvement of AI training, better representation of conditions including rare diseases, expert-driven data annotation, regular updates, evaluation and feedback, and ethical data usage are paramount. Always, the goal is to utilize AI as a supportive tool, with the final clinical judgment resting with the trained medical professionals.
The deployment of AI in healthcare is inevitable. As we harness the immense potential of AI, we must be aware of its limitations and biases, particularly in complex, sensitive contexts like rare diseases. By addressing these biases head-on, we can ensure that AI becomes a reliable, useful tool in healthcare, augmenting human decision-making and ultimately improving patient outcomes. The risks are too great to implement these technologies without the necessary safeguards. The stakes are high!
I will end with a study released by Carnegie Mellon researchers that demonstrated how easy it was to turn the major AI Chatbots into destructive tools.
"CHATGPT AND ITS artificially intelligent siblings have been tweaked over and over to prevent troublemakers from getting them to spit out undesirable messages such as hate speech, personal information, or step-by-step instructions for building an improvised bomb. But researchers at Carnegie Mellon University last week showed that adding a simple incantation to a prompt — a string text that might look like gobbledygook to you or me but which carries subtle significance to an AI model trained on huge quantities of web data — can defy all of these defenses in several popular chatbots at once.
"In one exercise, the AI was asked to come up with a plan to destroy humanity, and it complied. Solar-Lezama of MIT says the work is a reminder to those who are giddy with the potential of ChatGPT and similar AI programs. 'Any decision that is important should not be made by a [language] model on its own,' he says. 'In a way, it's just common sense.'"
As the people charged with caring for the most vulnerable in their homes, perhaps we should listen to the advice that seems to all say the same thing - proceed with caution.