Summaries of AI Bias Papers – June 25, 2023

Machine Learning for MDs Weekly Digest

The mission of ML for MDs is to connect physicians interested in machine learning. This newsletter shares insights from the intersection of medicine and machine learning.

Fun Fact

  • The first message over ARPANET was sent on Oct. 29, 1969. Charley Kline, a student at UCLA, tried to log in to the mainframe at the Stanford Research Institute. He successfully typed the characters L and O, but the computer crashed when he typed the G of the command LOGIN.

News and Stories of the Week

  • From Eric Topol’s Substack: “In this week’s JAMA, Kanjee, Crowe, and Rodman published a comparison of 70 NEJM CPCs [clinical patient conferences/grand rounds] for the medical expert diagnosis compared with GPT-4.” 
  • A review of top-down vs. bottom-up approaches to using EHR data
  • Nice summary by the team at AI Checkup of the current sprawling health AI landscape with really great graphics.
  • A research group created ClinicalGPT using “medical records, domain-specific knowledge, and multi-round dialogue consultations in the training process” and reports that it significantly outperforms other models in “medical knowledge question-answering, medical exams, patient consultations, and diagnostic analysis of medical records”
  • This perspective paper on foundation models for medical images offers a nice framework: a “spectrum” of medical foundation models ranging from general vision models to modality-specific and organ/task-specific models

Weekly Summary

Bias in AI has long been a concern, and its effects were first seen in non-medical settings: risk-assessment algorithms that rated Black defendants as higher risk, and ad-delivery systems that showed postings for highly paid jobs primarily to men.

Bias studies in healthcare are still developing, but the representative studies below illustrate two recurring problems: the lack of publicly available representative data, and the bias introduced when algorithms rely on proxies.

Obermeyer and colleagues published one of the first studies of AI bias in healthcare in Science:

Setting: Large academic center
Timeframe: 2013-2015
Patients: Primary care patients enrolled in risk-based insurance contracts; about 6,000 Black patients and 43,000 White patients
What the investigators did: Analyzed racial differences in a real algorithm, used by many insurers and healthcare systems, designed to identify high-risk, complex patients who would benefit from additional healthcare resources
What the investigators found: This widely used algorithm used healthcare costs as a proxy for high healthcare needs. However, at the same risk score, Black patients were sicker, because Black patients use less healthcare than White patients at the same level of illness. As a result, Black patients were less likely to qualify for a program offering additional support.
Key takeaway: Proxies used in machine learning algorithms can introduce hidden bias
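The proxy problem is easy to reproduce with synthetic data. In the sketch below (purely illustrative; the group sizes, spending gap, and cutoff are invented, not taken from the paper), ranking a simulated population by cost under-selects the group that spends less at the same level of illness, while ranking by true illness does not:

```python
import random

random.seed(0)

# Synthetic population: illness severity is identically distributed in both
# groups, but group B incurs lower costs at the same severity (an assumed
# utilization gap, mirroring the mechanism Obermeyer et al. describe).
patients = []
for group in ("A", "B"):
    for _ in range(5000):
        illness = random.gauss(50, 10)               # true health need
        cost_factor = 1.0 if group == "A" else 0.7   # assumed spending gap
        cost = illness * cost_factor + random.gauss(0, 5)
        patients.append({"group": group, "illness": illness, "cost": cost})

def flagged_share(key, group, top_frac=0.1):
    """Share of the top-ranked patients (by `key`) who belong to `group`."""
    ranked = sorted(patients, key=lambda p: p[key], reverse=True)
    top = ranked[: int(len(ranked) * top_frac)]
    return sum(p["group"] == group for p in top) / len(top)

# Ranking by the cost proxy under-selects group B; ranking by the true
# target (illness) selects both groups roughly equally.
print(f"Group B share, cost proxy:   {flagged_share('cost', 'B'):.2f}")
print(f"Group B share, true illness: {flagged_share('illness', 'B'):.2f}")
```

Because both groups are equally sick by construction, any shortfall in group B's share under the cost ranking is bias introduced entirely by the choice of proxy.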

Also in 2019, Tomasev et al published in Nature a model trained to predict kidney injury that showed significant gender bias in performance:

Setting: US
Timeframe: 2011-2015
Patients: 703,782 inpatients and outpatients from VA hospitals, 94% men
What the investigators did: Developed a deep learning algorithm (a recurrent neural network) to predict future kidney injury
What the investigators found: The model could predict 90% of future dialysis needs, but performed significantly worse for women
Key takeaway: Models trained on datasets that are not representative of a population will likely perform worse on that population
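Gaps like this surface only when performance is reported per subgroup, not just overall. A minimal sketch of such an audit, using made-up predictions rather than data from the paper:

```python
# Toy predictions: (subgroup, true_label, predicted_label). The numbers are
# invented to illustrate a subgroup audit, not taken from Tomasev et al.
predictions = [
    ("men",   1, 1), ("men",   1, 1), ("men",   0, 0), ("men",   1, 1),
    ("men",   0, 0), ("men",   1, 1), ("men",   0, 0), ("men",   1, 0),
    ("women", 1, 0), ("women", 1, 1), ("women", 0, 0), ("women", 1, 0),
]

def sensitivity(rows):
    """True-positive rate: of the truly positive cases, the share caught."""
    positives = [(truth, pred) for _, truth, pred in rows if truth == 1]
    return sum(pred == 1 for _, pred in positives) / len(positives)

# Reporting the metric per subgroup reveals a gap that a single overall
# number would hide.
for subgroup in ("men", "women"):
    rows = [r for r in predictions if r[0] == subgroup]
    print(subgroup, f"sensitivity = {sensitivity(rows):.2f}")
```

The same pattern applies to any metric: compute it separately for each group that matters clinically, not just for the dataset as a whole.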

In 2021, Celi et al described the bias present in data availability in PLoS Digital Health:

Setting: Clinical papers published in PubMed
Timeframe: 2019
Patients: n/a
What the investigators did: Used a machine learning algorithm to determine where clinical machine learning databases originated
What the investigators found: 40% of the databases were from the US and 13% from China, and 40% of authors were American or Chinese
Key takeaway: Countries with better datasets are likely to benefit more from AI, and current datasets and authors come overwhelmingly from the US and China

In 2022, Wen et al published a paper in Lancet Digital Health showing that publicly available datasets are not representative of many parts of the world:

Setting: Datasets published in MEDLINE, Google, and Google Dataset Search
Timeframe: 2020-2021
Patients: 100,000 images
What the investigators did: Searched for all publicly available dermatology image datasets of skin cancer
What the investigators found: 79%-88% of image sets were from Europe, Oceania, and North America. Only one dataset originated from Asia, two from South America, and none from Africa.
Key takeaway: The images demonstrated a “massive under-representation of skin lesion images from darker skinned populations”

In 2021, the WHO published “Ethics and governance of artificial intelligence for health”, a 165-page document detailing:

  • Laws and policies related to AI in healthcare
  • Key ethical principles for use of AI in healthcare
    • Protect autonomy
    • Promote human well-being, safety, and the public good
    • Ensure transparency, explainability and intelligibility
    • Foster responsibility and accountability
    • Ensure inclusiveness and equity
    • Promote AI that is responsive and sustainable
  • Ethical challenges to use of artificial intelligence for health care
    • Bias, cybersecurity, data collection, accountability, etc.
  • Building an ethical approach to use of AI for health
    • Transparent design, impact assessment, public engagement
  • Liability regimes for AI in health
    • Liability, compensation for errors, regulatory agencies
  • Elements of a framework for governance of AI for health
    • Data governance, regulatory considerations, model legislation

Community News

  • If you haven’t introduced yourself, please do so in the #intros channel.

Thanks for being a part of this community! As always, please let me know if you have questions/ideas/feedback.

Sarah

Sarah Gebauer, MD

MlforMDs.com
