Predictive models built with artificial intelligence (AI) methods are powerful tools for discovering molecules with the potential to become drugs for a given disease. These models can leverage training datasets to identify such drug leads by computational (virtual) screening of massive libraries of molecules.
In particular, AI models can be trained on atomic-resolution structures of macromolecular targets and the activities of their cognate molecules to predict the activities of other molecules across targets.
Despite important successes, major challenges still limit the potential of such AI models. Some are specific to this problem (e.g. how to augment training datasets in a way that improves the performance of these models). Others are shared with supervised learning problems more broadly (e.g. anticipating how well the models perform outside their applicability domain).
This PhD project aims to make progress toward overcoming these challenges using both synthetic and real datasets. The successful applicant will join the group of Pedro Ballester at Imperial College London, and the PhD will be carried out under his direct supervision.
Relevant papers from the group
• www.nature.com/articles/s41596-023-00885-w
• www.nature.com/articles/d41586-023-03948-w
• www.sciencedirect.com/science/article/pii/S2090123224000377
• link.springer.com/article/10.1186/s13321-024-00832-1
• onlinelibrary.wiley.com/doi/abs/10.1002/wcms.1478
• onlinelibrary.wiley.com/doi/10.1002/wcms.1465
• wires.onlinelibrary.wiley.com/doi/10.1002/wcms.1225
• doi.org/10.1016/j.ddtec.2020.09.001
• dx.doi.org/10.1093/bioinformatics/btz183
What We Offer
The studentship covers:
• Living expenses at an enhanced tax-free rate of £23,805 per year.
• PhD tuition fees of £31,100 per year.
Funding is for three years, with the possibility of extension to a fourth year.
This is an exciting opportunity for a bright and motivated scientist to work on a timely and important data science problem with strong therapeutic relevance.
The student will join the Ballester Group (ballestergroup.github.io/) at the Department of Bioengineering at Imperial College London, which provides an international and stimulating research environment.
In terms of student experience, London has been ranked the best city in the world for university students (www.topuniversities.com/city-rankings/2026).
Selection Criteria
Essential
• University degree(s) awarded in an area directly relevant to the project.
• Completed courses in the application of machine learning algorithms to scientific problems.
• Excellent grades in first and/or master’s degrees, especially in research projects with a strong focus on computational data analysis.
• Skill in writing Python or R code for scientific data analysis.
• English language proficiency that meets the entry requirements (see: www.imperial.ac.uk/study/pg/apply/requirements/english/).
Desirable
• Research projects applying supervised learning to solve real-world biomedical problems, especially virtual screening.
• Experience with open-source chemical informatics toolkits (e.g. RDKit, OpenBabel).
• Experience with machine learning platforms (e.g. DeepChem, TorchDrug, Scikit-Learn, Caret).
• Exposure to structural biology databases (e.g. PDBe, AlphaFold, PDBbind).
• Experience with medicinal chemistry databases (e.g. ChEMBL, SureChEMBL, PubChem, ZINC).
• Exposure to machine learning methods applied to drug design (e.g. QSAR).
• Experience with computational chemistry software (e.g. Vina, DOCK).
Candidates must send an email including:
• CV.
• Grades for each completed university degree.
• A covering letter (maximum two pages).
Please send the application to: p.ballester@imperial.ac.uk
Subject line: “PhD in AI for SBVS”
The covering letter must explain:
• How you meet the essential selection criteria.
• Any desirable criteria that you also meet.
• How this position fits with your future career plans.
The email must also include the names and email addresses of two academic referees who can comment on your academic performance.
Please also mention where you saw this position advertised.