Machine Learning for MDs Weekly Digest
The mission of ML for MDs is to connect physicians interested in machine learning. This newsletter provides learnings at the intersection of medicine and machine learning.
Fun Fact
- The first message sent over ARPANET was transmitted on Oct. 29, 1969. Charley Kline, a student at UCLA, tried to log in to the mainframe at the Stanford Research Institute. He successfully typed the characters L and O, but the system crashed when he typed the G of the command LOGIN.
News and Stories of the Week
- From Eric Topol’s Substack: “In this week’s JAMA, Kanjee, Crowe, and Rodman published a comparison of 70 NEJM CPCs [clinicopathological conferences/grand rounds] for the medical expert diagnosis compared with GPT-4.”
- A review of top-down vs. bottom-up approaches to working with EHR data
- Nice summary by the team at AI Checkup of the current sprawling health AI landscape with really great graphics.
- A research group created ClinicalGPT using “medical records, domain-specific knowledge, and multi-round dialogue consultations in the training process” and reports that it significantly outperforms other models in “medical knowledge question-answering, medical exams, patient consultations, and diagnostic analysis of medical records.”
- This perspective paper on foundation models for medical images gives a nice framework of the “spectrum” of medical foundation models, “ranging from general vision models, modality-specific models, to organ/task-specific models.”
Weekly Summary
Bias in AI has long been a concern, and its effects were first documented in non-medical settings: recidivism algorithms that scored Black defendants as higher risk, and ads for highly paid jobs shown primarily to men.
Bias studies in healthcare are still developing, but below are some representative examples. They illustrate both the scarcity of representative, publicly available data and the bias introduced when algorithms rely on proxy variables.
Obermeyer and colleagues published one of the first studies of AI bias in healthcare in Science:
| Setting | Large academic center |
|---|---|
| Timeframe | 2013–2015 |
| Patients | Primary care patients enrolled in risk-based insurance contracts: about 6,000 Black patients and 43,000 White patients |
| What the investigators did | Analyzed racial differences in a real algorithm, used by many insurers and healthcare systems, designed to identify high-risk, complex patients who would benefit from additional healthcare resources |
| What the investigators found | The algorithm used healthcare costs as a proxy for healthcare needs. At the same risk score, however, Black patients were sicker, because Black patients generate lower healthcare costs than White patients at the same level of illness. As a result, they were less likely to qualify for the program offering additional support |
| Key takeaway | Proxies used in machine learning algorithms can introduce hidden bias |
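The cost-as-proxy failure mode is easy to reproduce in a toy simulation. The sketch below (all numbers are assumptions chosen for illustration, not figures from the study) gives two groups identical illness levels, makes one group incur lower costs at the same level of illness, and then flags the top 10% of patients by cost:

```python
import random

random.seed(0)

# Toy simulation (illustrative only, not the study's actual data or model):
# every patient has a true illness level; at the same illness level,
# group B incurs lower healthcare costs than group A (assumed utilization gap).
def simulate(group, n=10_000):
    patients = []
    for _ in range(n):
        illness = random.uniform(0, 10)                # true health need
        access = 1.0 if group == "A" else 0.7          # assumed cost gap
        cost = illness * access + random.gauss(0, 0.5)
        patients.append((illness, cost))
    return patients

group_a = simulate("A")
group_b = simulate("B")

# The "algorithm" flags the highest-cost 10% of all patients for extra resources.
threshold = sorted(c for _, c in group_a + group_b)[-2000]

def flagged_rate(patients, min_illness=8):
    """Share of genuinely sick patients (illness >= min_illness) who get flagged."""
    sick = [(i, c) for i, c in patients if i >= min_illness]
    return sum(c >= threshold for _, c in sick) / len(sick)

print(f"Group A sick patients flagged: {flagged_rate(group_a):.0%}")
print(f"Group B sick patients flagged: {flagged_rate(group_b):.0%}")
```

Although the two groups are equally sick by construction, the cost-based cutoff flags far fewer patients from the lower-cost group, which is the mechanism the study describes.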
Also in 2019, Tomašev et al published in Nature a model trained to predict acute kidney injury, and found that it performed significantly worse for women:
| Setting | US Department of Veterans Affairs hospitals |
|---|---|
| Timeframe | 2011–2015 |
| Patients | 703,782 inpatients and outpatients; 94% men |
| What the investigators did | Developed a deep learning algorithm (a recurrent neural network) to predict future acute kidney injury |
| What the investigators found | The model could predict 90% of cases requiring future dialysis, but performed significantly worse for women |
| Key takeaway | Models trained on datasets that under-represent specific populations will likely perform worse on those populations |
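A minimal sketch of why a skewed training set hurts the under-represented group (all numbers are assumptions for illustration; this is not the study's model): a single decision cutoff is fit to a 94%-male training set in which a disease marker has a different baseline by sex, so the cutoff that works for men misclassifies women:

```python
import random

random.seed(1)

# Toy sketch (assumed numbers): a disease marker has a different baseline in
# men and women; with a 94%-male training set, one learned cutoff fits men.
def make_patients(sex, n):
    baseline = 5.0 if sex == "M" else 3.5   # assumed sex-specific baseline
    out = []
    for _ in range(n):
        has_disease = random.random() < 0.3
        marker = baseline + (1.5 if has_disease else 0.0) + random.gauss(0, 0.5)
        out.append((marker, has_disease))
    return out

train = make_patients("M", 9_400) + make_patients("F", 600)   # 94% men

def accuracy(patients, cutoff):
    """Fraction correctly classified by 'predict disease if marker >= cutoff'."""
    return sum((m >= cutoff) == y for m, y in patients) / len(patients)

# "Train" by picking the cutoff that maximizes accuracy on the skewed set.
cutoff = max((c / 10 for c in range(20, 80)), key=lambda c: accuracy(train, c))

men_test = make_patients("M", 2_000)
women_test = make_patients("F", 2_000)
print(f"Accuracy on men:   {accuracy(men_test, cutoff):.0%}")
print(f"Accuracy on women: {accuracy(women_test, cutoff):.0%}")
```

The learned cutoff sits near the male baseline, so diseased women rarely cross it; accuracy on the held-out female group drops well below accuracy on men, mirroring the representativeness problem.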
In 2022, Celi et al described the bias present in data availability in PLOS Digital Health:
| Setting | Clinical papers indexed in PubMed |
|---|---|
| Timeframe | 2019 |
| Patients | n/a |
| What the investigators did | Used a machine learning algorithm to determine where the databases used in clinical machine learning studies originated |
| What the investigators found | 40% of the databases were from the US and 13% from China, and 40% of authors were American or Chinese |
| Key takeaway | Current datasets and authors are overwhelmingly from the US and China, and countries with better datasets are likely to benefit more from AI |
In 2022, Wen et al published a paper in Lancet Digital Health showing that publicly available datasets are not representative of many parts of the world:
| Setting | Datasets identified through MEDLINE, Google, and Google Dataset Search |
|---|---|
| Timeframe | 2020–2021 |
| Patients | More than 100,000 images |
| What the investigators did | Systematically searched all publicly available dermatology image datasets of skin cancer |
| What the investigators found | 79–88% of image datasets were from Europe, Oceania, and North America; only one dataset originated from Asia, two from South America, and none from Africa |
| Key takeaway | The images demonstrated a “massive under-representation of skin lesion images from darker skinned populations” |
In 2021, the WHO published “Ethics and governance of artificial intelligence for health,” a 165-page document detailing:
- Laws and policies related to AI in healthcare
- Key ethical principles for use of AI in healthcare
  - Protect autonomy
  - Promote human well-being, safety, and the public good
  - Ensure transparency, explainability, and intelligibility
  - Foster responsibility and accountability
  - Ensure inclusiveness and equity
  - Promote AI that is responsive and sustainable
- Ethical challenges to the use of artificial intelligence for healthcare
  - Bias, cybersecurity, data collection, accountability, etc.
- Building an ethical approach to the use of AI for health
  - Transparent design, impact assessment, public engagement
- Liability regimes for AI in health
  - Liability, compensation for errors, regulatory agencies
- Elements of a framework for governance of AI for health
  - Data governance, regulatory considerations, model legislation
Community News
- If you haven’t introduced yourself, please do so under the #intros channel.
Thanks for being a part of this community! As always, please let me know if you have questions/ideas/feedback.
Sarah
Sarah Gebauer, MD