Empowering Clinicians with AI: Why Thoughtful Governance Matters
Healthcare organizations are turning to artificial intelligence to address rising costs and staff shortages, and to help clinicians work efficiently.[1]
In a 2024 survey by the American Medical Association (AMA), 68% of physicians said AI is an advantage in patient care, and 66% of physicians currently use AI in their practice, up from 38% just the year before. The survey also found that physicians want AI tools to be validated by a trusted entity, with safeguards for data privacy and a designated channel for feedback.[2] Similarly, a 2024 survey by McKinsey & Company and the American Nurses Foundation found that 64% of nurses were interested in using more AI tools as part of their work, but they were also concerned about the accuracy of those tools and wanted clear guidelines and training.[3] Clinicians seek accuracy, transparency, safety, and value when using artificial intelligence. That’s what we wanted to provide with our governance structure.
Here at UW Health, we’ve created a governance structure that brings the right questions to the right people to help us evaluate and turn on AI use cases at the right pace, letting us balance rapid advancements with thoughtful education. That structure is anchored by a multidisciplinary, enterprise-wide group, the Clinical AI and Predictive Analytics (CAIPA) Committee, which has helped us move forward confidently with both machine learning models and generative AI in ways that benefit our patients, our staff, and our operations.
Our CAIPA Committee predates the current explosion of generative AI and large language models (LLMs). It grew out of the need to establish enterprise-wide governance for machine learning models. The idea for CAIPA took shape after Brian, who wanted to implement a model in the emergency department to identify older adults at risk of falls, reached out to Frank, who was looking to move the needle with machine learning, data science, and predictive models. As we implemented this model, we recognized that, as an organization, we needed a system for evaluating and implementing these models.
The CAIPA Committee, established in 2021 as an evolution of the algorithm workgroups that we started in 2018, is a multidisciplinary group with about 40 members with expertise in technology, clinical care, operations, compliance, and ethics. The full group meets every other month and spins off workgroups as needed to focus on specific areas. When we explain the committee’s work to our peers, we often use the analogy of a Pharmacy and Therapeutics (P&T) committee, which evaluates which drugs and therapeutics to include on a hospital formulary. Similarly, the CAIPA Committee evaluates which AI tools to use at the organization and educates staff on when and how to use them.
When we started, the committee focused on evaluating and approving machine learning models based on their performance, safety, clinical utility, and fairness. We also set standards for ongoing monitoring to ensure continued effectiveness and safety, and maintained an online inventory of the models in use that staff could reference.
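To give a concrete picture of what that ongoing monitoring can look like, the sketch below shows a periodic performance and fairness check for a deployed classification model. It is a minimal Python illustration; the monitor_model helper, the thresholds, and the field names are assumptions made for this example, not UW Health's actual tooling.

```python
# Illustrative sketch only: a periodic monitoring check for a deployed
# classification model, with made-up thresholds and field names.
from dataclasses import dataclass

import numpy as np
from sklearn.metrics import roc_auc_score


@dataclass
class MonitoringReport:
    overall_auc: float
    subgroup_auc: dict
    alerts: list


def monitor_model(y_true, y_score, subgroups, min_auc=0.75, max_subgroup_gap=0.05):
    """Check overall discrimination and subgroup parity for one review period."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    subgroups = np.asarray(subgroups)

    overall = roc_auc_score(y_true, y_score)

    by_group = {}
    for group in np.unique(subgroups):
        mask = subgroups == group
        # Skip groups too small (or too uniform) to score reliably.
        if mask.sum() < 30 or len(np.unique(y_true[mask])) < 2:
            continue
        by_group[group] = roc_auc_score(y_true[mask], y_score[mask])

    alerts = []
    if overall < min_auc:
        alerts.append(f"Overall AUC {overall:.3f} is below the threshold of {min_auc}")
    for group, auc in by_group.items():
        if overall - auc > max_subgroup_gap:
            alerts.append(f"Subgroup '{group}' AUC {auc:.3f} lags the overall AUC by more than {max_subgroup_gap}")

    return MonitoringReport(overall_auc=overall, subgroup_auc=by_group, alerts=alerts)
```

A check like this can run on a schedule for each model in the inventory, with any alerts routed to the committee or workgroup responsible for that model.

Then came generative AI.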
When the first generative AI large language models were released, there was a widespread sense in the medical community that they would drastically change medicine for the better, but there was also a lack of consensus on how to monitor their safety or effectiveness. The CAIPA Committee needed to figure out how our governance framework applied to a fundamentally different type of AI, one that was inherently more difficult to measure. In doing so, we put in place new methods of identifying and evaluating AI tools and new ways of training our staff on the ones we adopted at our organization.
Create a thoughtful intake process
We want to put new solutions out there only if they are going to make the lives of clinicians and patients better. We considered how we could make the adoption of generative AI as frictionless as possible, and we adopted a “go slow to go fast” mentality: we are thoughtful in how we evaluate and implement AI solutions so that the ones we do roll out have an immediate, positive effect on our clinicians and our patients.
To do this, we need to ask the right questions. What are you trying to do with this solution? Is it safe? Does it make clinicians’ lives better? Does it save time? Does it save money? Is there a specific outcome that you’re trying to change? While answering these questions and working with the CAIPA Committee to establish methods to measure the success of the AI solution is a sort of tax on the system, it’s the tax that needs to be paid to have effective and meaningful adoption.
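One lightweight way to make sure those questions get asked every time is to capture them in a structured intake record. The sketch below is a hypothetical illustration of that idea; the AIUseCaseIntake class, its fields, and the missing_answers check are invented for this example and are not UW Health's actual intake form.

```python
# Hypothetical sketch of a structured intake record for a proposed AI use case.
# Field names and required questions are illustrative, not an actual form.
from dataclasses import dataclass, field


@dataclass
class AIUseCaseIntake:
    title: str
    intended_use: str            # What are you trying to do with this solution?
    target_outcome: str          # Is there a specific outcome you're trying to change?
    safety_considerations: str   # Is it safe? Known failure modes, patient-risk review
    clinician_benefit: str       # Does it make clinicians' lives better?
    time_or_cost_savings: str    # Does it save time or money, and how will we measure it?
    success_metrics: list = field(default_factory=list)

    def missing_answers(self) -> list:
        """Return required questions left blank, so the request can be
        sent back to the proposer before formal committee review."""
        required = {
            "intended_use": self.intended_use,
            "target_outcome": self.target_outcome,
            "safety_considerations": self.safety_considerations,
            "clinician_benefit": self.clinician_benefit,
            "time_or_cost_savings": self.time_or_cost_savings,
        }
        return [name for name, value in required.items() if not value.strip()]
```

However it is captured, the point is the same: the answers, and the agreed-upon success metrics, exist before the solution goes live.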
Use the right criteria to evaluate different AI tools
With machine learning models, we can stress test a model until we’re confident that we have a tool that never gives bad advice. That is not possible with generative AI, and that’s okay. Making these AI solutions perfectly safe risks making them much less useful. In the early days of our Augmented Response Technology (ART) rollout, we tested some prompts intended to ensure the LLM wouldn’t say anything that a clinician might disagree with. The typical response to these prompts was to ask the patient to come to the office or schedule an appointment. That’s a safe message, but it’s not useful for clinicians or patients.
We’d like to believe that everything in medicine is correct 100% of the time, but we know that’s not the case. That’s part of the reason we have a healthcare team, so no single member of the team needs to be infallible. As an academic medical center, we have trainees and interns on our teams. People who just graduated from medical school are not going to be right 100% of the time, but they’re still extremely useful to have on the team. Similarly, AI solutions can contribute to the team without the need for them to be perfect. AI solutions are often best thought of as option generators; they provide a lot of things that you could do, but only you can decide whether you should do them.
We sometimes use the analogy of clinical trials to describe both the evaluation process that generative AI tools go through and the uncertainties that still remain in the end. In the clinical trials framework, a new treatment is first researched in the lab and then goes through various phases of testing with more patients and more rigorous requirements before the FDA approves the treatment. However, even with that testing and approval, that treatment isn’t going to be the right answer for every patient with the condition it is meant to treat. Clinicians need to rely on their own knowledge and experience to determine whether it’s the right approach for an individual patient.
Similarly, at UW Health, a generative AI model goes through initial development and then gets rolled out to and evaluated by larger groups of people before CAIPA approval and operational prioritization. But the options provided by that AI model won’t always be the right path, so the clinician needs to take ownership over the right way to use the information provided.
Educate staff on the promise and limitations of AI
If generative AI solutions can’t be right 100% of the time, clinicians and other users need to understand how to work with imperfect AI. We’ve invested deeply in AI literacy to help make sure that our employees understand that generative AI isn’t foolproof. Treating AI like a crystal ball introduces two risks: users might unknowingly follow bad advice, which can lead them down the wrong path, or they might recognize the incorrect information and then never trust or use the AI solution again.
A more effective paradigm for generative AI solutions is to treat them as a copilot. The clinician is still in charge, but there’s a system next to you helping you out and giving you input. You trust your copilot, but no one is infallible, and you’re the one who is flying the plane and responsible for the outcomes, so you’re going to evaluate the copilot’s advice based on your own expertise.
To help our staff understand the promise and limitations of AI, we have five key points of AI literacy that we expect everyone to know:
- Ownership. You are responsible for anything an LLM puts out as though you wrote it yourself.
- Privacy. If you’re using a third-party service, such as ChatGPT, that hasn’t been embedded in Epic or otherwise approved by UW Health, assume that whatever text you type into it goes somewhere unsafe.
- Hallucinations/Confabulations. Generative AI sometimes provides incorrect information. You need to check every word that comes out. (This falls within Ownership, but we think it’s important enough to emphasize separately.)
- Recency. AI isn’t always completely up to date. In medicine, we like to be on the cutting edge when we’re using AI tools to draft replies and craft medical advice, but a model’s training data has a cutoff, so it may not know about the latest guidance or an emerging outbreak. You need to think about those blind spots.
- Utility. Prompt engineering (that is, how you ask the model a question) is important. What you get back depends on what you ask, so experiment and build literacy with these tools (see the sketch after this list).
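To make the Utility point concrete, the sketch below contrasts a vague prompt with a structured one for the same task. Both prompts are hypothetical illustrations written for this post, not prompts we use in production.

```python
# Hypothetical example of how phrasing changes what a general-purpose LLM returns.
# Neither prompt is a production prompt; both are for illustration only.

vague_prompt = "Tell me about statins."

structured_prompt = (
    "You are drafting a patient-portal reply for a clinician to review and edit.\n"
    "Patient question: 'Do I need to stop my statin before my colonoscopy?'\n"
    "Constraints:\n"
    "- Write at a 6th-grade reading level, under 120 words.\n"
    "- Do not give a definitive medication instruction; defer to the care team.\n"
    "- End by inviting the patient to call the clinic with questions.\n"
)

# The second prompt tells the model its role, the actual question, the audience,
# and the guardrails, so the draft it returns is far more likely to be usable
# after clinician review than whatever the open-ended first prompt produces.
```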
We put these five points in our annual compliance training, we put them in bulletins, and, most importantly, we put them in the user onboarding that happens before someone gets access to AI solutions, where people must sign off that they understand them.
Getting started with AI governance
Our CAIPA Committee was already in place when generative AI came on the scene, but it didn’t start from scratch: it grew out of our clinical decision support and algorithm workgroups. If your organization doesn’t have an AI governance group today, look to your current clinical decision support or quality groups to get started.
You can build on the organizational strengths that already exist in other areas to move generative AI forward. In some ways, generative AI seems very different from the technologies that came before it. But in other ways, it’s just another new technology, and we’ve all been rolling out new technologies in healthcare for decades. We’re looking forward to seeing what else AI brings our way, confident that we have a foundation in place with our committee and within our staff to evaluate and deploy it successfully.
By Brian Patterson, MD, MPH, Physician Administrative Director for Clinical AI, Medical Informatics Director for Predictive Analytics at UW Health, and Frank Liao, Ph.D., Senior Director of Digital Health and Emerging Technologies at UW Health