Machine Learning for MDs Weekly Digest


What’s New in ML for MDs
Welcome to the ML for MDs Newsletter. The mission of ML for MDs is to connect physicians interested in machine learning. This newsletter provides the most relevant news, journal articles, and jobs at the intersection of medicine and machine learning.
Fun Facts
- The UK’s National Health Service has estimated that by 2040 it will require 90% of its staff base to be data literate. (This may be more sad than fun – why wait 17 years? Why not 100%?)
- Bachelor’s degrees in data science were practically non-existent five years ago; now over 50 higher education institutions in the US offer one.
- Data scientists spend 80% of their time cleaning and organizing data.
This Week’s Top Stories
- Insilico is combining generative AI and quantum computing to accelerate drug discovery.
- “The quantum discriminator of GAN with only tens of learnable parameters can generate valid molecules and outperforms the classical counterpart with tens of thousands parameters in terms of generated molecule properties”
- Only half of survey respondents chose an in-person physician rather than AI. To me, this signals a pretty high willingness to choose a computer (relatively unknown quantity) compared to a known quantity.
- Respondents who were older, politically conservative, or more religious were less likely to choose AI.
Weekly summary
Open Source AI 101 (Part 1) – a basic introduction to open source resources
What does open source mean?
Open source refers to software that:
- Is freely available to use, modify, and distribute
- Is often developed by a community of developers who collaborate on the project
- Allows for greater flexibility and customization than proprietary software
What are the disadvantages of open source software?
- Can be hard to use or not user-friendly
- Depending on the platform, it may be hard to get help if you get stuck. Some open source platforms have active communities, but there’s not always someone to directly answer your questions.
It seems like some companies are open source. How do they make money?
- Usually by charging businesses for custom products or support built on the open source software. Companies that want to use open source software still have to integrate it into their systems and customize it to fit their specific needs. Hugging Face, for example, is a venture-backed startup that provides “enterprise support” for its open source NLP software.
What is GitHub and why do I see it everywhere?
- GitHub allows developers to store and manage their code repositories in the cloud and collaborate with other developers on projects. It is used by millions of developers around the world and is an essential tool for open source software development.
- It makes money by charging for large files, private software repositories, and extra security for companies.
What is the most popular use of open source software in healthcare?
- By far the most common way open source software is used in healthcare is by non-profits or low-resource countries for EHR use. Although they’re open source, most non-profits still have to hire programmers to customize the software and make it functional for their organization, so it’s not to be mistaken for a “free” EHR. Some popular examples are:
- OpenEMR
- OpenMRS
- OpenEHR
- GNU Health
- FreeMedForms
- OpenClinic GA
- OSCAR EMR is used in Canada; the US used to rely on VistA for the VA system (and kind of still does)
- SMART is a platform for developing healthcare applications using FHIR (Fast Healthcare Interoperability Resources), an open standard for exchanging healthcare information electronically.
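For the curious, here is a rough sketch of what "exchanging healthcare information" with FHIR can look like. FHIR data is commonly sent as JSON "resources," and the example below reads one shaped like a FHIR Patient resource using only Python's built-in json module. The field names (resourceType, name, gender, birthDate) come from the FHIR standard; the patient values and the summarize_patient helper are made up for illustration and are not part of SMART or any particular EHR.

```python
import json

# Made-up sample record, shaped like a FHIR "Patient" resource.
# Field names follow the FHIR standard; the values are invented.
patient_json = """
{
  "resourceType": "Patient",
  "id": "example-001",
  "name": [{"family": "Rivera", "given": ["Alex", "J."]}],
  "gender": "female",
  "birthDate": "1980-04-12"
}
"""


def summarize_patient(resource: dict) -> str:
    """Build a one-line summary from a FHIR-style Patient resource.

    Hypothetical helper for illustration only.
    """
    if resource.get("resourceType") != "Patient":
        raise ValueError("expected a Patient resource")
    # A patient can have several names; this sketch just takes the first.
    name = resource["name"][0]
    full_name = " ".join(name["given"] + [name["family"]])
    return f"{full_name} ({resource['gender']}, born {resource['birthDate']})"


patient = json.loads(patient_json)
summary = summarize_patient(patient)
print(summary)
```

Because every FHIR-compliant system uses the same field names, the same few lines could in principle read a Patient record from any of them, which is the main appeal of an open standard.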
What are the most popular AI open source programs overall?
- TensorFlow: Widely used framework for building and training neural networks.
- Keras: Python interface for building neural networks; runs on top of TensorFlow and offers lots of easy-to-use shortcuts.
- PyTorch: Deep learning framework that is very popular for research and for teaching natural language processing.
- Scikit-learn: Very popular for teaching machine learning. It features various classification, regression and clustering algorithms including support vector machines, random forests, gradient boosting, k-means and DBSCAN.
- Caffe: Deep learning framework developed by the Berkeley Vision and Learning Center (BVLC); used mainly to build convolutional neural networks (CNNs) for image classification, segmentation, and object detection.
- Theano: Python library for working efficiently with multi-dimensional arrays; used to build deep neural networks.
- Torch: Machine learning framework that uses the Lua programming language; the predecessor of PyTorch.
- MXNet: Lets you define, train, and deploy deep neural networks on a wide array of devices.
- H2O.ai: No-code machine learning algorithms for big data analysis with a fairly easy-to-use interface.
- Apache Mahout: Algorithms for clustering, classification, and collaborative filtering.
Community News
- If you haven’t introduced yourself, please do so in the #intros channel.
Thanks for being a part of this community! As always, please let me know if you have questions/ideas/feedback.
Sarah
Sarah Gebauer, MD