Data Scientist for AI-powered Drug Discovery Tools

Insilicom
United States
insilicom.com/

Description

We are seeking a highly skilled and motivated Data Scientist with a background in computational biology, biophysics, bioinformatics, or computational chemistry to join our innovative team. The successful candidate will focus on developing and maintaining deep learning models, including natural language processing models and/or graph neural networks, for link prediction and other machine learning tasks that support our drug discovery tools. This role requires a deep understanding of machine learning, deep learning, and graph-based models. The candidate will leverage state-of-the-art architectures, including various open-source transformer-based models.

Responsibilities:
• Design and implement advanced machine learning models and algorithms to solve complex business problems.
• Develop and optimize models for tasks such as link prediction, entity recognition, and relation extraction.
• Engage in end-to-end development, from data preprocessing through model training and evaluation to deployment in production.
• Contribute to developing and fine-tuning large language models (LLMs), integrating them into our existing frameworks.
• Collaborate with cross-functional teams, including data engineers, software developers, and domain experts, to integrate models into production environments.
• Perform rigorous evaluation and experimentation to ensure model accuracy and robustness.
• Stay current with the latest research in machine learning, deep learning, graph neural network models, and NLP to continuously improve model performance.
• Document model development processes and experimental results, and maintain code repositories using best practices in version control.


Qualifications

Basic Qualifications:
• PhD in computational biology, biophysics, or computational chemistry.
• 2+ years of experience in developing machine learning models and algorithms.
• Proficiency in Python and experience with machine learning frameworks such as TensorFlow, PyTorch, or similar.
• Strong knowledge of graph theory and experience with graph-based models and algorithms.
• Proven experience with large language models (LLMs) and familiarity with various open-source transformer-based architectures.
• Excellent understanding of algorithms, data structures, and complexity analysis.
• Strong analytical and problem-solving skills, with keen attention to detail.
• Ability to work independently and collaboratively within a team setting.

Preferred Qualifications:
• Experience with cloud computing platforms such as AWS, GCP, or Azure.
• Advanced knowledge of statistical analysis and data visualization tools, such as R, SAS, or Matplotlib.
• Experience with natural language processing (NLP) techniques and applications.
• Familiarity with version control systems, particularly Git.
• Understanding of MLOps principles and best practices for deploying machine learning models in production environments.
• Experience in feature engineering, model selection, and hyperparameter tuning.
• Strong publication record in machine learning, data science, or related fields, including contributions to conferences and journals.
• Excellent communication skills, with the ability to explain complex technical concepts to non-technical stakeholders.
• Experience as a prompt engineer, including crafting and optimizing prompts for LLMs to improve model performance and relevance.

This is a remote position; however, candidates must be legally authorized to work and receive a salary in the United States. This includes individuals on OPT, H-1B visas, or other valid work authorizations.


Start date

As soon as possible

How to Apply

Please send your CV as a PDF file to Ms. Yi Wu (wuy@insilicom.com). In your email, please clearly address the following questions:

1) Do you hold a PhD degree in computational biology, biophysics, or computational chemistry?
2) Do you have at least 2 years of experience developing machine learning models and algorithms?
3) Are you proficient in Python, and do you have experience with machine learning frameworks such as TensorFlow, PyTorch, or similar tools?
4) Are you legally authorized to work in the United States? This includes individuals on OPT, H-1B visas, or other valid work authorizations.

Applications will be reviewed on a rolling basis, and early submissions are strongly encouraged, as we aim to begin the review and decision-making process as soon as possible.