June 5, 2023

Machine Learning for MDs Weekly Digest

What’s New in ML for MDs

Welcome to the ML for MDs Newsletter. The mission of ML for MDs is to connect physicians interested in machine learning. This newsletter provides the most relevant news, journal articles, and jobs at the intersection of medicine and machine learning. 

Fun Facts

  • The UK’s National Health Service has estimated that by 2040 it will require 90% of its staff base to be data literate. (This may be more sad than fun: why wait 17 years? Why not 100%?)
  • Bachelor’s degrees in data science were practically non-existent five years ago; over 50 higher education institutions in the US now offer one.
  • By a commonly cited estimate, data scientists spend about 80% of their time cleaning and organizing data.

This Week’s Top Stories

Weekly summary

Open Source AI 101 (Part 1) – a basic introduction to open source resources

What does open source mean?

Open source refers to software that is:

  • Freely available to use, modify, and distribute
  • Often developed by a community of developers who collaborate on the project
  • More flexible and customizable than proprietary software

What are the disadvantages of open source software?

  • It can be hard to use and is not always user-friendly
  • Depending on the platform, it may be hard to get help if you get stuck. Some open source platforms have active communities, but there’s not always someone to answer your questions directly

It seems like some companies are open source. How do they make money?

  • Usually by charging businesses for custom products and support built on the open source software. Companies that want to use open source software still have to integrate it into their systems and customize it to fit their specific needs. Hugging Face, for example, is a venture-backed startup that provides “enterprise support” for its open source NLP software

What is GitHub and why do I see it everywhere?

  • GitHub allows developers to store and manage their code repositories in the cloud and collaborate with other developers on projects. It is used by millions of developers around the world and is an essential tool for open source software development.
  • It makes money by charging for paid plans and add-ons, such as large file storage, advanced features for private repositories, and extra security for companies

What is the most popular use of open source software in healthcare?

  • By far the most common use of open source software in healthcare is as an EHR for non-profits and low-resource countries. Although the software is open source, most non-profits still have to hire programmers to customize it and make it functional for their organization, so it should not be mistaken for a “free” EHR. Some popular examples are:
    • OpenEMR
    • OpenMRS
    • OpenEHR
    • GNU Health
    • FreeMedForms
    • OpenClinicGA
  • OSCAR EMR is used in Canada; in the US, the VA system has long used VistA (and, to some extent, still does)
  • SMART is a platform for developing healthcare applications using FHIR (Fast Healthcare Interoperability Resources), an open standard for exchanging healthcare information electronically.
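
To make FHIR a little more concrete, here is a minimal Python sketch of what an application might do with a single FHIR record. It only reads a Patient resource that is already in hand as JSON; it does not talk to a real server or use the SMART launch process. The patient is entirely made up, and the function name summarize_patient is a hypothetical helper for illustration, not part of any FHIR or SMART library.

```python
import json

# Made-up Patient resource for illustration only (not real patient data).
# The field names follow the FHIR Patient resource: resourceType, id, name,
# gender, and birthDate.
patient_json = """
{
  "resourceType": "Patient",
  "id": "example-123",
  "name": [{"use": "official", "family": "Doe", "given": ["Jane", "Q"]}],
  "gender": "female",
  "birthDate": "1980-04-12"
}
"""


def summarize_patient(resource: dict) -> str:
    """Return a one-line summary of a FHIR Patient resource (hypothetical helper)."""
    if resource.get("resourceType") != "Patient":
        raise ValueError("expected a Patient resource")
    # "name" is a list because a patient can have several names on record.
    names = resource.get("name", [])
    if names:
        first = names[0]
        parts = first.get("given", []) + [first.get("family", "")]
        full_name = " ".join(parts).strip()
    else:
        full_name = "(no name recorded)"
    gender = resource.get("gender", "unknown")
    birth_date = resource.get("birthDate", "unknown")
    return f"{full_name}, {gender}, born {birth_date}"


patient = json.loads(patient_json)
print(summarize_patient(patient))  # prints: Jane Q Doe, female, born 1980-04-12
```

The point of the standard is that any system speaking FHIR would send a Patient in this same shape, so the same few lines could work regardless of which EHR the record came from.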

What are the most popular AI open source programs overall?

  1. TensorFlow: Widely used library for building and training neural networks.
  2. Keras: Python interface for building neural networks; runs on top of TensorFlow and offers many easy-to-use shortcuts.
  3. PyTorch: Deep learning framework that is very popular for research and teaching, including natural language processing.
  4. Scikit-learn: Very popular for teaching machine learning. It features various classification, regression, and clustering algorithms, including support vector machines, random forests, gradient boosting, k-means, and DBSCAN.
  5. Caffe: Deep learning framework developed by the Berkeley Vision and Learning Center (BVLC), used mainly for convolutional neural networks (CNNs) in image classification, segmentation, and object detection.
  6. Theano: Python library for efficiently evaluating mathematical expressions involving multi-dimensional arrays; used to build deep neural networks.
  7. Torch: Machine learning framework that uses the Lua programming language; the predecessor of PyTorch.
  8. MXNet: Framework to define, train, and deploy deep neural networks on a wide array of devices.
  9. H2O.ai: Machine learning platform for big data analysis with a fairly easy-to-use interface that requires little or no coding.
  10. Apache Mahout: Library for clustering, classification, and collaborative filtering.
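
For a feel of what one of these algorithms does, here is a rough sketch of k-means, one of the clustering methods listed under Scikit-learn above. It is written in plain Python so it runs without installing anything; it is not scikit-learn's implementation, which is faster and handles many more cases. The function name kmeans_1d, the starting centers, and the blood pressure readings are all made up for illustration.

```python
def kmeans_1d(values, centers, iterations=10):
    """Group numbers around the given starting centers (1-D k-means sketch)."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: each value joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: each center moves to the mean of its cluster.
        # An empty cluster keeps its previous center.
        new_centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # nothing moved, so the grouping is stable
        centers = new_centers
    return centers, clusters


# Made-up systolic blood pressure readings (mmHg), for illustration only.
readings = [110, 112, 118, 120, 150, 155, 160, 162]
centers, clusters = kmeans_1d(readings, centers=[110, 162])
print(centers)   # prints: [115.0, 156.75]
print(clusters)  # prints: [[110, 112, 118, 120], [150, 155, 160, 162]]
```

Nobody told the algorithm which readings were "normal" or "high"; it found the two groups on its own, which is what makes clustering an unsupervised method.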

Community News

  • If you haven’t introduced yourself, please do so in the #intros channel.

Thanks for being a part of this community! As always, please let me know if you have questions/ideas/feedback.

Sarah

Sarah Gebauer, MD

MlforMDs.com
