Data Scientist for Machine Learning Applications in Genomics and Clinical Data

UCLA Dept of Computational Medicine
Computational Medicine
Los Angeles, California, United States
recruit.apo.ucla.edu/JPF05170

Description

Job description: The Department of Computational Medicine, jointly with the Institute of Precision Health, aims to leverage the UCLA biobank (AtLAs) to improve prediction and diagnosis of clinical outcomes using algorithms that combine genetic data with medical records, physiological waveforms, and imaging data. Achieving this goal requires novel and creative machine learning and statistical analyses. The candidate will be in charge of developing and implementing such algorithms on a variety of data sources and clinical outcomes. The role includes developing entire pipelines, from quality control procedures to the applied machine learning algorithms.

Work environment: Our projects involve interdisciplinary teams of machine learning researchers, data scientists, computational biologists, and clinicians. The candidate is expected to be a good communicator, able to work effectively in an interdisciplinary environment where individuals have different areas of expertise. The work is highly practical and translational: the goal is to use existing data to develop improved computational diagnostic tools that can be incorporated across UCLA Health and beyond.


Qualifications

Job requirements:
● Advanced degree in computer science, statistics, engineering, biomedical informatics, or a related field.
● Programming competence demonstrated in at least one of these programming languages: Python, R, Java, C++, MATLAB.
● Knowledge of, and experience applying, advanced statistical and machine learning concepts used in big data analysis, including nonparametric tests, ANOVA, mixed models, and modern supervised and unsupervised machine learning algorithms such as SVM, random forest, PCA, clustering, and neural networks.
● Ability to communicate clearly in an interdisciplinary environment.
● Knowledge and experience working with genomic data or with medical records data - advantage.
● Ability to work in a Linux environment - advantage.
● Software tool development experience: source control (git), packaging, documentation - advantage.

Requirement self-testing: To gauge whether you are qualified for this work, see if you can answer the following questions:

Explain what a Bonferroni correction for multiple hypotheses is.
Let X,Y be two independent standard Normal random variables. What is the expectation of X*Y? What is the variance of X*Y? Is X*Y also normally distributed?
You have an array of 1,000,000,000 numbers. Find an efficient algorithm that returns the 100th largest number of the array.
If v is an eigenvector of a matrix X as well as an eigenvector of a matrix Y, is it also an eigenvector of X+Y?
In the context of a machine learning problem, briefly explain the following terms: test error, training error, overfitting, cross-validation.


Start date

As soon as possible

How to Apply

Apply at: recruit.apo.ucla.edu/JPF05170