The current practice of medicine is deeply biased because its policies, procedures, technologies and people all carry implicit bias. Although explicitly biased individuals and processes in healthcare have received ongoing attention, long-standing policies, procedures, and technologies have also ingrained implicit bias.
Recently, many have wondered whether the introduction of artificial intelligence and machine learning (AI/ML) technologies in the healthcare setting will increase bias and harm. It is possible: when AI/ML solutions use inherently biased studies, policies or processes as inputs, the technology will produce biased outputs. However, AI/ML technology can also be key to making the practice of medicine more fair and equitable. When done right, it has the potential to greatly reduce bias in medicine by flagging insights or critical moments that a clinician might not see. To create technology that better serves at-risk and underserved individuals and communities, technologists and healthcare organizations must actively work to minimize bias when creating and deploying AI/ML solutions. They can do so by leveraging the following three strategies:
Creating a checklist that evaluates potential sources of bias and what groups may be at risk for inequity;
Proactively evaluating models for bias and robustness; and
Continuously monitoring results and outputs over time.
Understanding why healthcare is biased and the sources of bias
Bias enters healthcare in a variety of ways. Depending on how medical instruments were developed, they may not perform equally well for patients of different races. For example, pulse oximetry is more likely to miss hypoxemia (as measured by arterial blood gas) in Black patients than in white patients. This is because pulse oximeters were developed and calibrated with light-skinned individuals; and since a pulse oximeter reads light passing through the skin, it’s not surprising that skin color could affect readings.
Policies and processes can also hold inherent bias. Many organizations prioritize patients for care management using models that predict a patient’s future cost, based on the assumption that patients with the highest healthcare costs also have the greatest needs. The issue with this assumption is that Black patients tend to generate lower healthcare costs than white patients with the same level of comorbidities, likely because they face more barriers to accessing health care. As a result, resources might be misallocated to patients with lower needs (but higher predicted cost).
Historical studies have also led to inequities in care. Interpretation of spirometry data (for lung capacity) creates unfairness because Black people are assumed to have 15% lower lung capacity than white people, and Asian people are assumed to have 5% lower lung capacity. These “correction factors” are based on historical studies that conflated average lung capacity with healthy lung capacity, without accounting for socioeconomic distinctions. For example, lung capacity tends to be reduced in individuals who live near roads, and living near roads is correlated with belonging to a disadvantaged ethnic group.
These care disparities have a significant impact. For example, sepsis, a condition that causes over 300,000 deaths per year, disproportionately affects minority communities. According to the Sepsis Alliance, Black and Hispanic patients have a higher incidence of severe sepsis than white patients; Black children are 30% more likely than white children to develop sepsis after surgery; and Black women have more than twice the risk of severe maternal sepsis compared with white women.
For health systems, creating tools that actively work to combat these disparities in care isn’t a nice-to-have but a mission-critical must-have. Health systems have a responsibility to provide equitable, safe care, and AI/ML technologies hold the promise of helping them do so.
What can be done to combat bias and promote equity in AI/ML technology?
Health organizations can implement these three strategies when launching AI/ML technologies to drive better, more equitable care outcomes.
Create a checklist that evaluates potential sources of bias and what groups may be at risk for inequity. Prior to validating or deploying a predictive model, it is worthwhile to clearly describe the clinical/business driver(s) for the intended predictive model and how the model will be used. Given the intended use, is there a risk that the model might perform unequally across subgroups and/or result in an unequal allocation of resources or outcomes for specific subgroups? If the prediction target is only a proxy for the outcome of interest, could that lead to unintended disparities between subgroups?
Once the objectives are clearly determined, it is possible to identify potential sources of bias in a given model. Some example questions to address include:
- Are there inputs that might be predictive of the outcome for some subgroups (e.g., socioeconomic status) that are not included in the model?
- Is the prediction target measured in the same way for all subgroups?
- Are input variables more likely to be missing in one subgroup than another?
- Could end users use the model outputs differently for specific subgroups?
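One of the checklist questions above, whether input variables are more likely to be missing in one subgroup than another, can be checked directly before validation. The following is a minimal sketch, assuming records are available as simple dictionaries; the function name, the `group` and `lactate` fields, and the toy records are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def missing_rate_by_subgroup(records, group_key, field):
    """Return the fraction of records with a missing `field` for each subgroup."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for rec in records:
        group = rec[group_key]
        totals[group] += 1
        if rec.get(field) is None:  # treat absent or None as missing
            missing[group] += 1
    return {g: missing[g] / totals[g] for g in totals}

# Hypothetical toy records for illustration only; not real patient data.
records = [
    {"group": "A", "lactate": 1.8},
    {"group": "A", "lactate": 2.4},
    {"group": "A", "lactate": None},
    {"group": "A", "lactate": 1.1},
    {"group": "B", "lactate": None},
    {"group": "B", "lactate": None},
    {"group": "B", "lactate": 3.0},
    {"group": "B", "lactate": 2.2},
]

rates = missing_rate_by_subgroup(records, "group", "lactate")
print(rates)
```

A large gap in missingness between subgroups does not prove the model is biased, but it is a signal that the input may carry less information for one group and deserves a closer look.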
Proactively evaluate models for bias and robustness. Identifying subgroups at risk of bias or inequity facilitates explicit testing for differences in model performance between subgroups. Understanding differences in performance is necessary to avoid and mitigate bias, but it is not sufficient, because the validation data may still differ in important ways from the environment in which the model is ultimately deployed. Fortunately, new machine learning techniques can evaluate whether models are robust to differences in data and can also identify the conditions under which a model will no longer perform well and may become unsafe.
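Explicit testing for differences in performance between subgroups can start with something as simple as computing the same metric separately for each group. The sketch below compares sensitivity (the share of true cases the model flags) across two subgroups. It is an illustration under stated assumptions, not a complete fairness audit: the function name, the binary labels, and the toy data are hypothetical, and sensitivity is only one of several metrics worth comparing.

```python
def sensitivity_by_subgroup(y_true, y_pred, groups):
    """Return sensitivity (true positive rate) per subgroup.

    Subgroups with no positive cases map to None.
    """
    counts = {}  # group -> [true positives, actual positives]
    for truth, pred, group in zip(y_true, y_pred, groups):
        tally = counts.setdefault(group, [0, 0])
        if truth == 1:
            tally[1] += 1
            if pred == 1:
                tally[0] += 1
    return {g: (tp / pos if pos else None) for g, (tp, pos) in counts.items()}

# Hypothetical labels and predictions for illustration only.
y_true = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 1, 1]
groups = ["A"] * 5 + ["B"] * 5

sens = sensitivity_by_subgroup(y_true, y_pred, groups)
observed = [s for s in sens.values() if s is not None]
gap = max(observed) - min(observed)
print(sens, gap)
```

In this toy data the model misses more true cases in group B than in group A. How large a gap is acceptable is a clinical and organizational judgment, not something the code can decide.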
Continuously monitor results and outputs over time. Even if models are free from bias when initially validated and deployed, it is essential to continue monitoring model performance to ensure it does not degrade over time. Without such monitoring, we risk harming patients, making care less safe and potentially exacerbating bias. Models are particularly susceptible to failure after unanticipated changes in technology (e.g., new devices, new code sets), population (e.g., demographic shifts, new diseases), or behavior (e.g., practice patterns, reimbursement incentives). These changes are collectively referred to as dataset shift because the data used in clinical practice differ from the data used to train the predictive model. Although clinicians, administrators, or IT teams can mitigate changes in performance by explicitly identifying scenarios in which dataset shift is likely, it is equally important that solution vendors monitor model performance on an ongoing basis and update the models when needed.
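One common way to watch for dataset shift is to compare the distribution of an input in recent data against its distribution in the training data. The sketch below uses the population stability index (PSI), a technique not named in this article and swapped in here purely as an example. The bin edges, the `eps` floor for empty bins, and the toy values are assumptions for illustration.

```python
import math

def population_stability_index(expected, actual, edges, eps=1e-6):
    """Compute PSI between two samples of one variable using fixed bin edges.

    `edges` are ascending cut points; values fall into len(edges) + 1 bins.
    Empty bins are floored at `eps` so the logarithm stays defined.
    """
    def proportions(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            idx = sum(v >= e for e in edges)  # number of cut points at or below v
            counts[idx] += 1
        return [max(c / len(values), eps) for c in counts]

    p_exp = proportions(expected)
    p_act = proportions(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(p_exp, p_act))

# Hypothetical values of one model input (for example, patient age).
training = [55, 60, 62, 65, 70, 72, 75, 80]
recent = [70, 72, 75, 78, 80, 82, 85, 90]
edges = [60, 70]

psi_same = population_stability_index(training, training, edges)
psi_shift = population_stability_index(training, recent, edges)
print(psi_same, psi_shift)
```

A PSI of zero means the binned distributions match. A frequently cited rule of thumb treats values above roughly 0.25 as a substantial shift, but that threshold is a convention rather than a clinical standard, and a check like this complements, rather than replaces, monitoring of actual model performance within each subgroup.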
As more health systems and healthcare organizations implement AI/ML technology to enable patient-specific insights and drive improved care, they need to actively work to reduce bias and provide better, more equitable care by implementing three key strategies. Understanding the potential sources of bias, proactively evaluating models for bias, and monitoring results over time will help reduce differential treatment of patients by race, gender, weight, age, language and income.