Data Engineer

Icahn School of Medicine at Mount Sinai
Icahn Institute for Genomics & Multiscale Biology
New York, NY, United States
icahn.mssm.edu/research/genomics

Description

The Icahn Institute for Genomics & Multiscale Biology is looking for talented data engineers with the responsibilities and qualifications listed below. At the Icahn Institute, our vision is to transform biomedical research and healthcare delivery into a data-driven, evidence-based, patient-tailored discipline. The Icahn Institute was founded in 2011 to help advance precision medicine with cutting-edge technologies, novel partnerships between the public and private sectors, and world-class computational and analytical resources. By maximally leveraging information from patients around the world, we deliver premier precision care optimized for each patient, while discovering breakthrough, next-generation treatments through insights derived from cutting-edge analytics applied to unprecedented amounts of patient-derived data. We promote a core set of values designed to advance our vision of team-oriented, data-driven global biomedical research: 1) Do good for the patient, 2) Simplify, 3) Share openly, 4) Focus, 5) Synergize, 6) Contributions, not politics, and 7) Deliver.

What You’ll Do

Some of the major responsibilities of Data Engineers are to collect, clean, transform, and structure messy data, and create robust, reproducible models that produce potentially valid scientific inferences. Depending on expertise and interests, our data engineers spend their time either developing pipelines or wrangling datasets.

Pipeline development includes the creation and deployment of computational pipelines. Examples of computational pipelines currently used at the Institute are designed for:
o processing terabytes of sequence data (e.g. DNA FASTQ read files to DNA variant calls, RNA FASTQ read files to transcript counts).
o converting free text in electronic medical records (EMRs) to feature vectors using open-source natural language processing packages.
o converting ICD-9 codes to ICD-10 ones for millions of electronic medical records.
o transforming and loading MIMIC-III data into a PostgreSQL OHDSI Common Data Model instance.

Dataset wrangling includes collecting, indexing, and cleaning in-house and publicly available datasets. Examples of current datasets in need of an expert data engineer include:
o Mount Sinai electronic medical records (EMRs).
o Publicly available genome-phenome data: UK Biobank, Simons Genome Diversity Panel, gnomAD, and dbGaP.
o Publicly available cancer genomics data: TCGA and CCLE.
o Publicly available data from public cohorts/studies: ClinicalTrials.gov, NHLBI’s BioLINCC, NIDDK Central Repository, and ImmPort.
o Publicly available population health data: NHANES and CMS provider-level Part B and Part D.

Data Engineers at Mount Sinai are critical to the successful development and maintenance of systems on which thousands of patients, as well as myriad cutting-edge research projects, depend every day.

Duties and Responsibilities

The ideal candidate will have strong software development and deployment proficiencies and be passionate about working at the intersection of biomedical research and healthcare provision. Specific duties and responsibilities include:
• Collect, clean, transform, and structure disorganized data into analysis-ready datasets.
• Help design, build, run, monitor, and maintain informatics pipelines.
• Work concurrently on a variety of data engineering projects.
• Integrate with services across computational platforms and programming languages.
• Architect and maintain databases, filesystems (local, remote, and distributed), and custom file-formats for storing/serializing clean, transformed data.
• Participate in the preparation of manuscripts, grants, and presentations.
• Foster a data-centric culture through the use of best practices for data provenance, structuring, and access.


Qualifications

• Bachelor’s, Master’s, or PhD in Bioinformatics, Computer Science, Statistics, or a related quantitative discipline.
• Good organization and communication skills, with a demonstrated ability to work productively as a member of a team.
• Strong verbal and written communication skills in English.
• 3+ years of experience in data engineering and/or software development.
• Proficiency with UNIX systems and several programming languages.
• Knowledge of and experience with software engineering best practices, such as version control, code review, unit/regression testing, and continuous development/continuous integration.


Start date

As soon as possible

How to Apply

If interested and qualified, please apply at careers.mountsinai.org/jobs/2311556?lang=en-us.


Contact

data@mssm.edu