Data Science Associate (machine learning and statistical genetics), Knowles Lab

New York Genome Center
Computer Science
New York, NY, United States


The Knowles lab at the New York Genome Center (NYGC) and the Columbia University Departments of Computer Science and Systems Biology is seeking a Data Science Associate to work on NIH-funded projects using machine learning to understand the genetic underpinnings of Alzheimer’s disease (AD) and Parkinson’s disease. These multidisciplinary projects are a collaboration with Dr. Towfique Raj’s group in the Departments of Neuroscience and Genetics and Genomic Sciences at the Icahn School of Medicine at Mount Sinai and will be part of the National Institute on Aging’s Alzheimer’s Disease Sequencing Project (ADSP) consortium. The project will involve applying deep learning, network, and causal inference methods (primarily developed in the lab) to large-scale whole-genome and transcriptome sequencing data, with the aim of identifying the genetic variants, regulatory elements, genes, pathways, and cell types involved in neurodegenerative disease pathogenesis.

The Knowles lab aims to understand the role of transcriptomic dysregulation across the spectrum from rare to common genetic disease. This involves better characterization of the genetic and environmental factors contributing to variation in mRNA expression and splicing. Beyond these specific projects, there are opportunities for close collaboration with diverse research groups at NYGC that are collecting large-scale genomics datasets in the context of neurological disease and developing novel genomic technologies, including single-cell methods, forward genetic screens, and long-read transcriptomics.

We anticipate that this position will be performed remotely through July 2021, unless the successful candidate prefers to work on site.

Key Responsibilities

Implement, execute and document data analysis pipelines and workflows.
Benchmark new tools for sequencing data analysis and propose improvements.
Document and present results in written or oral reports to other lab members and external collaborators.
Assist other lab members and collaborators in preparing text and figures for manuscripts and external presentations.
Assist other lab members in management of internally and externally produced data sets.
Other tasks, as assigned.


Qualifications

BS in computer science, statistical genetics, bioinformatics, or a related quantitative field required; MS preferred.
Strong interest in genetics and genomics.
1+ years of programming experience in Python or R (preferably both).
Understanding of Unix systems/command line/bash.
Some experience with NGS data analysis, functional genomics and/or human genetics preferred.
An understanding of standard software engineering practices (e.g., version control, code review, unit testing) and a willingness to learn these in practice.

Start date

October 19, 2020

How to Apply

For more details and to apply: