Predicting cell-type specific RNA splicing from gene sequence using deep learning

We are sorry, this position has been filled.

RNA splicing, the cellular process by which "junk" intronic regions are removed from initially transcribed RNA, is tightly regulated in healthy human development but frequently dysregulated in genetic disease. Splicing is a complex process involving hundreds of proteins and non-coding RNAs, so predicting the effects of mutations on splicing outcomes is beyond current physical models. We instead take a data-driven approach, leveraging large-scale RNA-seq and massively parallel reporter assay data to train deep neural networks that predict splicing directly from gene sequence. This project will explore extensions to cell-type specific prediction and to multitask prediction of splicing together with RNA-binding protein interactions. Time permitting, the improved model will be used to predict the effects of mutations from ALS and autism whole-genome sequencing cohorts that we have access to through collaborators at the New York Genome Center.

Students should have Python programming experience; some exposure to machine learning or deep learning would be very helpful. Some computational biology knowledge would be useful but is not required.

Lab: Knowles Lab

Direct Supervisor: David Knowles

Position Dates: 6/1/2020 - 9/4/2020

Hours per Week: 35

Paid Position: Yes

Credit: Yes

Number of positions: 1

Qualifications: Students should have Python programming experience; some exposure to machine learning or deep learning would be very helpful. Some computational biology knowledge would be useful but is not required.

Eligibility: Freshman, Sophomore, Junior, Senior, Master's (SEAS only)

David Knowles, dak2173@columbia.edu