Summary of new NEJM article about AI in molecular medicine – July 3, 2023

Machine Learning for MDs Weekly Digest

The mission of ML for MDs is to connect physicians interested in machine learning. This newsletter provides learnings at the intersection of medicine and machine learning. 

Fun Fact

  • The Human Genome Project cost several billion dollars and took more than a decade to sequence one genome. In 2022, a more complete genome was sequenced in 5 hours for a few hundred dollars.

AI Words of the Week

  • Recurrent neural networks (RNNs) are usually used for sequential data – weather patterns are a good example
    • They work by using previous data to predict future outcomes, and don’t look at future data
    • Drawbacks: they are limited in how much context they can “remember,” and they can be slow to process data
    • Long short-term memory (LSTM) networks are a subtype of RNN that can use longer stretches of data as context
  • Convolutional neural networks (CNNs) are usually used for image data
    • They work by putting a filter over small squares of pixels and converting that information into numbers
  • Support vector machines (SVMs) are used to classify data, often by separating groups of similar data points
    • An SVM draws a dividing line (a hyperplane, in more dimensions) between groups of data points on a graph; the support vectors are the data points closest to that line
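
To make the CNN filter idea above concrete, here is a minimal sketch in plain Python. It is an illustration only, not how any production library implements it: the function name convolve2d, the tiny 4x4 "image," and the edge-detecting filter values are all assumptions made up for this example.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map.

    Strictly speaking this is cross-correlation, which is what CNN layers
    typically compute under the name "convolution".
    """
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    feature_map = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            # Multiply each pixel in the small square by the matching filter
            # weight and add the results up into a single number.
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hypothetical 4x4 "image": dark (0) on the left, bright (1) on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hypothetical 2x2 filter that responds to a dark-to-bright vertical edge.
edge_filter = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, edge_filter)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The middle column of the output lights up because that is where the filter sits across the dark-to-bright boundary. In a real CNN the filter values are learned from data rather than written by hand.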

Weekly Summary

The NEJM came out with a review article about AI in molecular medicine this week. It describes the different fields of molecular medicine and lists the recent advances and the types of AI used for each field. 

Some context: Looking at the definitions above, it makes sense that RNNs, and LSTMs especially, would be used to analyze genetic code: a genetic sequence is basically a long list of bases, and you don’t need to “look ahead” for context to predict the next base the way you do to predict the next word in language. Convolutional neural networks in genomics are an interesting use of a technology that is almost always applied to images; here they analyze an image of the data produced by RNNs. SVMs are an older technology, but they are good at separating data into groups, so it makes sense to use them to find which genes and markers are similar to each other.
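
As a rough illustration of the "no look-ahead" point for RNNs, here is a minimal sketch of a plain RNN reading a DNA string one base at a time. Everything in it is an assumption for demonstration: the function names, the hidden size of 2, and the hand-picked weights (a real model would learn its weights from data and would usually be an LSTM built with a deep learning library).

```python
import math

BASES = "ACGT"


def one_hot(base):
    """Encode a base as a 4-element vector, e.g. 'C' -> [0, 1, 0, 0]."""
    return [1.0 if base == b else 0.0 for b in BASES]


def rnn_states(sequence, w_in, w_rec):
    """Return the hidden state after each base, reading left to right.

    Each new state is tanh(w_in * x + w_rec * previous_state), so it depends
    only on the current base and on what came before it.
    """
    hidden_size = len(w_rec)
    h = [0.0] * hidden_size
    states = []
    for base in sequence:
        x = one_hot(base)
        h = [
            math.tanh(
                sum(w_in[k][i] * x[i] for i in range(len(x)))
                + sum(w_rec[k][j] * h[j] for j in range(hidden_size))
            )
            for k in range(hidden_size)
        ]
        states.append(h)
    return states


# Hypothetical fixed weights, chosen by hand for this example only.
W_IN = [[0.5, -0.5, 0.25, -0.25], [0.1, 0.2, -0.3, 0.4]]
W_REC = [[0.3, -0.1], [0.2, 0.4]]

states_a = rnn_states("ACGT", W_IN, W_REC)
states_b = rnn_states("ACGA", W_IN, W_REC)

# The two sequences differ only in the last base, so the first three
# hidden states are identical and only the final state changes.
print(states_a[:3] == states_b[:3])  # True
print(states_a[3] == states_b[3])    # False
```

Changing a later base never affects an earlier state, which is the sense in which the network does not look at future data.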

Background: 

  • Genetic sequencing used to be done by looking at DNA or RNA sequences that were up to a few hundred bases long. 
  • In the early 2000s, sequencing by synthesis led to billions of DNA templates being synthesized and read at the same time
  • The computational needs of genome processing are huge, and the resulting files are millions of rows long
  • “The most important advances in the application of machine learning to genomics…[are] the process of determining where the…patient sample varies from the reference sequence”
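
To give a feel for what "determining where the patient sample varies from the reference sequence" means, here is a deliberately simplified sketch. It assumes two already-aligned sequences of equal length and only finds single-base substitutions; the function name and the toy sequences are made up for illustration. Real variant calling works on many overlapping, error-prone reads and is where the machine learning methods described in the article come in.

```python
def find_substitutions(reference, sample):
    """List (position, reference_base, sample_base) wherever the two differ.

    Positions are 1-based. Assumes the sequences are already aligned and the
    same length, which real sequencing data is not.
    """
    if len(reference) != len(sample):
        raise ValueError("this sketch only handles equal-length, pre-aligned sequences")
    return [
        (pos, ref_base, sample_base)
        for pos, (ref_base, sample_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != sample_base
    ]


# Hypothetical toy sequences.
reference = "ACGTACGT"
sample = "ACGAACGT"

variants = find_substitutions(reference, sample)
print(variants)  # [(4, 'T', 'A')]
```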

Machine learning and neural networks help identify genetic causes of rare diseases through:

  • Improved haplotyping (“mapping of DNA strands to the parental chromosome of origin”) 
  • “Reading the transcriptome (the sum of all the RNA transcripts in an organism)” helps identify genes that cause rare diseases
  • Matching phenotypes to gene candidates for rare diseases, i.e., working out which gene disorders are likely to cause this kind of problem
  • Epigenetic evaluation – newer field, previously difficult due to size
  • Methods used: recurrent neural networks (for sequential data) and convolutional neural networks (for image recognition of the maps produced)

Machine learning has helped proteomics and metabolomics through:

  • Improved prediction of candidate peptides with LSTM (long short-term memory)
  • Prediction of when a peptide will be eluted from a liquid chromatography column through convolutional neural networks (CNNs)
  • New peptide sequencing, protein identification, and protein function prediction with CNNs and LSTM
  • Metabolomics measures small molecules like lipids, amino acids, and carbohydrates to detect inborn errors of metabolism, which have traditionally had a low diagnostic rate

Community News

  • If you haven’t introduced yourself, please do so in the #intros channel.

Thanks for being a part of this community! As always, please let me know if you have questions/ideas/feedback.

Sarah

Sarah Gebauer, MD

MlforMDs.com
