Neural Networks for Genomics

When we first set out to apply deep learning to genomics, we asked ourselves what the current state of the art is. What problems are researchers working on and what approaches are they using? This post contains a summary of what we found — an overview of popular network architectures in genomics, the types of data used to train deep models, and the outcomes predicted or inferred.

Despite being able to sequence the genome at nucleotide-level resolution, and the abundance of publicly available labeled datasets from sources like the 1000-genome project, ENCODE and GEO, we are still far from bridging the genotype-phenotype divide or predicting disease from genome sequences. This talk by Brendan Frey puts the deep learning-and-genomics problem in context, explaining why sequencing more genomes may not be the answer. The genome is complex and contains many interacting information layers. Most current approaches involve developing a system to interpret the genomic code or a part of it, rather than directly training a network that predicts phenotype from sequence. Below are some of the ways that deep learning has been used for genomics, with emphasis on implementations for the human genome or transcriptome.

Source: Neural Networks for Genomics