Senior Associate Machine Learning Data Engineering

Description

Role Responsibilities

The successful candidate will work with ML research scientists to enable our proprietary data and external datasets to be leveraged for ML modeling. This will be accomplished by implementing end-to-end data workflows for large-scale data ingestion, processing, tagging, and publishing, with an eye towards improving ML model performance over time.

Basic Qualifications

Bachelor’s degree in Computer Science, Statistics, Applied Mathematics, Chemistry, Physics, a life science discipline, or a related technical discipline.
Training or work experience in Python, Java, Scala, C++, or SQL.
Training or work experience in software design, development, and algorithm-related solutions for production-grade systems using machine learning.
Knowledge of one or more scientific data types (e.g., biomedical images, biomedical text, large-scale, multidimensional 'omics, large- or small-molecule therapeutics, clinical or Real-World Data).

Preferred Qualifications

MS or 2 years of relevant research experience.
Familiarity with high-performance computing (HPC) environments (Slurm/LSF/SGE schedulers).
Familiarity with cloud computing infrastructure, including Amazon Web Services (AWS), and distributed computing libraries (e.g., Spark, Hive, Impala, Kafka).
Understanding of containerization and orchestration tools (e.g., Docker, Singularity, Airflow, Luigi, Kubernetes).
Basic knowledge of CI/CD and automation tools (e.g., Terraform, CloudFormation, Jenkins, Ansible).
Passion for and curiosity about data, and a proven ability to take ideas from prototype to production.

Technologies We Use:

Python (NumPy, pandas, Dask, PyTorch, TensorFlow, scikit-learn, RDKit, Weights & Biases, etc.), Java, C++, Slurm-based on-premises compute clusters, Google Cloud Platform, AWS, Docker, Singularity, Kubernetes.


Start date

July 07, 2023

How to Apply

www.pfizer.com/careers