Computational pangenomics

Wageningen University
Bioinformatics Group
Netherlands GE Wageningen


Are you looking for a PhD position and interested in algorithm development for genomics? Do you want to innovate methods for computational pangenomics and mine massive amounts of genomics data? Then this PhD vacancy may be of interest to you.

In bioinformatics, traditional approaches to compare genomes, centered on a single reference, no longer suffice when faced with hundreds to thousands of genomes. The field of genomics is therefore switching to so-called pangenomics. PanTools is a platform developed in the Bioinformatics Group that uniquely integrates a sequence-based and a gene-based pangenome representation. Resting on a compacted De Bruijn graph sequence representation, PanTools has been extended with functions to accommodate comparative genomic analyses at different levels of genomic organization, from genes to gene clusters and syntenic regions. However, the single underlying sequence representation is increasingly becoming a bottleneck in the development of new, scalable algorithms. There is a need for a novel, space-efficient sequence representation of genomes at various levels.

As a PhD student you will work on addressing this challenge, by:
- performing independent research, supported by your supervisors and the Comparative Pangenomics team;
- developing software contributing to the improvement of PanTools;
- writing publications on your scientific work, culminating in a PhD thesis.
In particular, you will investigate, select and implement an efficient representation of DNA and protein sequences in PanTools; extend this representation to allow the incorporation of incomplete or fragmented genome representations; and add layers to this representation to represent genome similarity over longer evolutionary distances.

The research is embedded in the Comparative Pangenomics team led by dr. Sandra Smit. The PhDs and postdocs in the team work on different organisms, but all contribute to the common aim of method development for pangenomics. The Bioinformatics Group focuses on fundamental and applied bioinformatics research in the green life sciences. In particular, the group develops and applies novel computational methods for the analysis and integration of -omics data. The group has a strong track record in genome analysis (including evolutionary genomics), algorithm development (including the development and application of machine learning approaches to study proteins), and tool construction. There are many national and international collaborations with researchers studying molecular evolution and the application of machine learning in bioinformatics.


You are driven to solve biological challenges using your computer science expertise. You are an enthusiastic team player with well-developed communication and collaboration skills. You also possess:
- a successfully completed MSc degree in computer science, bioinformatics, or a related discipline;
- experience in the development of algorithms, preferably for the analysis of (gen)omics data;
- a demonstrable proficiency in programming (in Java and Python);
- a collaborative attitude towards algorithm/software development;
- affinity for the green life sciences;
- perseverance in problem solving;
- good writing and oral communication skills in English.
For this position your command of the English language is expected to be at C1 level. In some cases you may be asked to submit an internationally recognised certificate of proficiency in the English language.

Start date

As soon as possible

How to Apply

For more information and an online application form (closing April 15, 2024), please visit


Dick de Ridder