Artificial intelligence (AI) healthcare technology can solve our most complex and intractable health issues. It has the potential to enhance healthcare quality, improve access, reduce cost, and deliver highly personalized care. However, delivering those solutions requires large, historical, broadly representative, and well-organized data from an affected population as well as from the individual who is seeking care. Without representative data and algorithms that render fair, accurate results, AI will exacerbate, rather than solve, longstanding health issues, especially for individuals who suffer from healthcare disparities.
Data Set Curation
The data sets that currently inform AI healthcare—upon which most new data sets are created—are frequently gathered from high-resourced, predominantly white communities, which may not be representative of treatment populations. Organizations that aim to improve AI fairness by using representative data to create and improve algorithms may use, combine, duplicate, or share individually identifiable health data collected from communities where more data are needed, without those communities' informed consent. For example, an organization may purchase data sets from a healthcare organization and enhance them with data obtained from public sources, exchanged with partner organizations, or purchased from third parties. Often, there is no requirement that the details of these relationships be disclosed to the individuals who provided their data.
Clinical trial data may be collected, used, and shared to maximize the representativeness of data sets. In the United States, the Common Rule governs federally supported human subjects research, such as clinical trials that test the safety and efficacy of medical devices powered by AI. Under the Common Rule, the ethical collection and use of data is predominantly determined by an Institutional Review Board (IRB). This means that which data will be collected and how those data will be used—for example, whether data will be shared with third parties or reused for other research after the trial has concluded—may be approved by the IRB but not included in consent forms or disclosed to trial participants in any detail.
Data Collection and Consent
Data collected, used, and shared as part of healthcare treatment and insurance coverage pose many of the same issues. For example, the Health Insurance Portability and Accountability Act (HIPAA) only requires that covered entities, including healthcare providers, health plans, and healthcare clearinghouses, disclose their information handling practices at a high level to patients or health plan members. Disclosure of such practices is not subject to consent under HIPAA; instead, organizations are only required to make a reasonable effort to document acknowledgement that individuals received a notice of their privacy practices. Consent is required only when organizations seek to share information with third parties outside of treatment, payment, and healthcare operations activities. These information sharing practices are subject to an authorization form signed by the patient as part of the HIPAA authorization process. The disclosures that organizations choose to put in the authorization form are not subject to any ethical oversight, as would be common under an IRB. Rather, HIPAA only requires that organizations follow what is disclosed in the authorization, which could include disclosure to third parties that use identifiable health data for commercial gain, as is the case in commercial AI development.
HIPAA further permits covered entities to collect and use identifiable health data without the informed consent of patients for purposes of “treatment, payment, or healthcare operations.” However, “healthcare operations” is a broad term that could include the use of AI for diagnostic, treatment, or illness management protocols. This means that collecting and using identifiable health data (called Protected Health Information, or PHI, under HIPAA when collected by a covered entity) to run and feed an AI system does not require consent. Some healthcare privacy laws in individual states may require consent. Even where consent is required, however, detailed disclosures explaining potential risks, or identifying the third parties who will receive these data, often are not.
In healthcare, consent is not truly representative of choice because most health plan members receive insurance from their employers, and most health plans limit coverage to a small number of healthcare providers. Even if individuals object to certain forms of data collection, use, or sharing, they are unlikely to rank privacy considerations over receiving a treatment that an insurer or government program will cover. This ranking of preferences, in a context in which patients are encouraged to trust their healthcare providers, means that individuals are unlikely to change providers simply because they do not like something they see in a privacy notice.
In 2019, the Center for Open Data Enterprise (CODE) and the U.S. Department of Health and Human Services’ (HHS) Office of the Chief Technology Officer produced a report acknowledging the desirability of increasing the volume of health data and the need to facilitate contractual data sharing between organizations while protecting patient privacy. The report proposes that organizations reduce the identifiability of data sets to maximize data availability. Although HHS created the Safe Harbor de-identification method to enable covered entities to use PHI without restriction, this method does not necessarily protect patient privacy interests. Big data sets, which are used to generate algorithms, are often created by combining de-identified data sets with other data sets and public data. As data volume increases, it becomes more likely that these data sets—despite meeting HHS’ de-identification Safe Harbor—include characteristics that would enable the identification of individuals, especially when powerful AI algorithms analyze these data in one or more big data sets.
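The re-identification risk described above can be illustrated with a minimal sketch. All records, names, and quasi-identifiers below are invented for illustration; the point is that a Safe Harbor-compliant data set (which may retain, for example, a three-digit ZIP prefix, year of birth, and sex) can be linked against a public record set when a combination of those fields singles out one individual.

```python
# Hypothetical sketch: linking a Safe Harbor-style de-identified data set
# with a public record set on shared quasi-identifiers. All data invented.

deidentified_records = [
    {"zip3": "481", "birth_year": 1967, "sex": "F", "diagnosis": "C50.9"},
    {"zip3": "481", "birth_year": 1992, "sex": "M", "diagnosis": "E11.9"},
    {"zip3": "606", "birth_year": 1967, "sex": "F", "diagnosis": "I10"},
]

public_records = [  # e.g., a voter roll purchased from a data broker
    {"name": "A. Smith", "zip3": "481", "birth_year": 1967, "sex": "F"},
    {"name": "B. Jones", "zip3": "606", "birth_year": 1967, "sex": "F"},
]

def reidentify(deid, public):
    """Return (name, diagnosis) pairs where the quasi-identifiers of a
    de-identified row match exactly one public record."""
    matches = []
    for row in deid:
        candidates = [p for p in public
                      if (p["zip3"], p["birth_year"], p["sex"])
                      == (row["zip3"], row["birth_year"], row["sex"])]
        if len(candidates) == 1:  # a unique match re-identifies the patient
            matches.append((candidates[0]["name"], row["diagnosis"]))
    return matches

print(reidentify(deidentified_records, public_records))
# Two of the three "de-identified" rows link back to named individuals.
```

In practice, linkage attacks operate at scale across many purchased and public data sets, but the mechanism is the same combination of quasi-identifiers shown here.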
Big Data Fairness
Determining whether AI health technology is fair and safe will require regulators to examine which data were used to create algorithms that yield unfair results and to ensure that organizations have sufficiently tested their algorithms to reveal and correct unfair results. When data elements such as race, ethnicity, gender identity, sexual identity, religion, and disability status are stored and used to create and optimize algorithms, it is obvious that the algorithm may differentiate results based on these elements and create a risk of unfairness in their application. Less obvious unfairness risks arise when AI decisions are based on proxies for the above data elements.
For example, physical location data can indicate an individual’s religion with high probability if that individual routinely visits a place of worship. Dietary data could similarly indicate diagnoses or co-morbidities. Proxies can be created from de-identified and seemingly innocuous data sets, and larger data sets combined with substantial computing power can increase the likelihood of AI decisions based on proxies. When proxies are correct and used as the basis for a decision, unfairness, and potentially even discrimination, can result. Unfair algorithms do not just lead to discriminatory outcomes; in healthcare, they lead to poor health outcomes, even death. Unfairness is not just an ethical and moral issue; it is a quality and safety issue, too.
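The proxy mechanism described above can be sketched in a few lines. The patients, places, and thresholds below are entirely hypothetical; the sketch simply shows how visit-frequency data, which contains no explicit health or religion field, can stand in for both.

```python
# Hypothetical sketch of proxy inference: estimating sensitive attributes
# from location-visit counts alone. All names and data are invented.

weekly_visits = {
    "patient_a": {"grocery": 3, "worship_center": 4, "pharmacy": 1},
    "patient_b": {"grocery": 2, "gym": 5},
    "patient_c": {"worship_center": 1, "dialysis_clinic": 3},
}

def infer_from_visits(visits, place, threshold):
    """Flag individuals whose visit pattern to one place meets a threshold,
    treating the pattern as a proxy for a sensitive attribute."""
    return {pid: v.get(place, 0) >= threshold for pid, v in visits.items()}

# Routine worship attendance proxies religious affiliation...
religion_proxy = infer_from_visits(weekly_visits, "worship_center", 2)
# ...and repeated clinic visits proxy a diagnosis, with no health data field.
diagnosis_proxy = infer_from_visits(weekly_visits, "dialysis_clinic", 2)

print(religion_proxy)   # only patient_a is flagged
print(diagnosis_proxy)  # only patient_c is flagged
```

A real AI system would learn such correlations implicitly from large data sets rather than through an explicit rule, which is precisely what makes proxy-driven unfairness harder to detect.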
Solving Fairness Issues
Regulators must ensure that organizations develop fair, high-quality algorithms through ethical data collection, population of data sets representative of the treatment community, and counterfactual testing to ensure that AI healthcare technology does not render discriminatory or unsafe results. In addition to regulation, organizations can leverage Algorithmic Impact Assessments (AIAs) to demonstrate that their technologies pose very low risk to individuals, which will enable regulators like the U.S. Food and Drug Administration (FDA), or certified third parties, to ensure safety and fairness. As organizations not typically regulated by the FDA enter the healthcare marketplace, Congress will likely need to determine whether the FDA should be the primary organization to regulate healthcare AI and, if not, how to coordinate between various regulatory agencies.
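Counterfactual testing of the kind described above can be sketched as follows. The `risk_model` below is a deliberately biased stand-in, not any real clinical model, and the attribute names and coefficients are invented; the test varies only a protected attribute and measures whether the output changes.

```python
# Hypothetical counterfactual fairness test: vary only one protected
# attribute and check whether the model's output shifts. All invented.

def risk_model(patient):
    # Stand-in for a trained model; deliberately unfair for illustration.
    score = 0.3 * patient["age"] / 100 + 0.5 * patient["bp"] / 200
    if patient["race"] == "black":   # a direct, unfair dependence
        score -= 0.1
    return score

def counterfactual_gap(model, patient, attribute, alternatives):
    """Largest change in model output when only one protected attribute
    is swapped for an alternative value, all else held fixed."""
    baseline = model(patient)
    gaps = []
    for value in alternatives:
        variant = dict(patient, **{attribute: value})
        gaps.append(abs(model(variant) - baseline))
    return max(gaps)

patient = {"age": 60, "bp": 140, "race": "white"}
gap = counterfactual_gap(risk_model, patient, "race", ["black", "asian"])
print(f"counterfactual gap: {gap:.2f}")  # a nonzero gap flags unfairness
```

For a fair model the gap should be (near) zero across a representative panel of patients; a regulator or certified third party could require such panels as part of pre-market and post-market testing.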
A central question for any agency regulating AI is exactly how to regulate highly complex algorithms. Unfortunately, the “black-box,” or opaque, nature of AI healthcare algorithms makes unfair outcomes difficult to predict. Opacity can be technical: the technology may be so complex that exactly how it renders a result or recommendation is impossible to identify, or not terribly helpful to explain (inscrutability). Moreover, “unlocked” or continuously learning algorithms, such as unsupervised machine learning systems, routinely change how decisions are rendered; unlocked algorithms are dynamic. This dynamic inscrutability makes it difficult to determine how a decision or recommendation was rendered and whether it is unfair or unsafe. Legal choices by organizations, such as a decision to protect an AI technology with trade secrecy or confidentiality limitations, can further limit an organization’s ability to disclose information associated with the technology.
Whether changes to AI healthcare technologies are issued as scheduled code releases and updates to an existing product or the algorithm is designed to change and learn as it is used, regulating AI development alone will fail to anticipate downstream fairness and safety issues. The FDA could require organizations to flag potential downstream issues, just as it does for the devices it currently regulates. Continuous testing, which could be performed by a second AI utility, could also detect unfairness and safety issues. Organizations could also be required to deposit a functioning, updated AI algorithm in a publicly available location, which would enable both competitors and advocacy groups to test it and report issues. The U.S. could also pass a law requiring organizations using or creating healthcare AI to explain their AI decisions at the demand of patients, like the European Union’s automated decision-making explanation requirement under the General Data Protection Regulation. Although explanations or interpretations may not provide the type of information that could reveal fairness or safety issues, organizations could potentially leverage algorithms specially designed to provide explanation or interpretation. Such laws should also address discretionary legal choices that otherwise limit explanations or other technology disclosures.
Transforming the quality and efficacy of healthcare experiences for the most vulnerable and under-resourced of our communities requires a carefully crafted approach, beginning with data selection and extending to the activities of regulators. This collection explores algorithmic bias in relation to data collection, the resulting impacts on individual interests, and potential avenues for creating fair, efficacious technologies that will improve healthcare.