The Cancer Data Science Laboratory (www.caravagnalab.org) at the University of Trieste, Italy, is offering a 2-year position to support the research project: Machine learning algorithms for single-cell genomics from long-read sequencing
= Application details
- Eligibility: Scholars with a scientific-professional curriculum suitable for research activities.
- Requirements: A master's degree; the committee determines the equivalence of foreign degrees based on Italian regulations. A PhD is not compulsory, but is preferred for this post. Applicants are not required to be proficient in Italian.
- Procedure: The selection is managed by the University of Trieste, and the application should be submitted through the PICA system following this URL: pica.cineca.it/units/23ar1247-20prinpnrr/. Please provide at least 3 potential contacts for references in the CV submitted with your PICA application.
- Deadline: Strictly before 20th January 2024.
- Duration and start: The post is covered by 2 years of funding; it is initially offered for 1 year and can be extended for a second year. The selection process will take about one month, and the successful candidate is expected to start as soon as the selection procedure is finished.
- Salary: 26.022,88 EUR/year gross, which should give a net of ~1600 EUR/month.
Inquiries: gcaravagna@units.it
= The project
This Computer Science project is developed by Prof Giulio Caravagna (scholar.google.com/citations?user=iktXWosAAAAJ) and Dr Alberto Cazzaniga (co-PI) from the Laboratory for Data Engineering of Area Science Park (Trieste, Italy), and is funded with ~250.000 EUR by the 2022 PNRR call of the Ministry of University and Research, Italy.
We seek to develop Machine Learning (ML) algorithms to exploit the recent convergence of single-cell (SC, PMID: 32855414) and long-read (LR, PMID: 32033565) sequencing technologies, which we expect to revolutionise our understanding of diseases like cancer. However, these new technologies come with many unsolved analysis challenges for which efficient, technology-specific ML tools do not yet exist. We therefore focus this project on the development of new SC/LR ML tools, which we will apply to new data generated within the project thanks to a close partnership with the Genomics and Epigenomics Laboratory of Area Science Park. The project is extremely timely, well-organised, and aims at creating a new generation of ML tools for a groundbreaking technology.
Previous experience with SC/LR data is not mandatory, but will be a strong plus if available.
= Your role
- develop new ML models for LR/SC sequencing technologies, under the supervision of the PI and co-PI, and in collaboration with another scientist hired for this same project by the co-PI's unit;
- gather, curate and maintain data-analysis pipelines for data generated in this project, and for data collected from the public domain or collaborators;
- participate in the project organisation, carrying out tasks as part of a team;
- contribute to the overall laboratory activity, in collaboration with other scientists and students.
= What you will learn
- to master state-of-the-art computational modelling of sequencing data;
- to develop advanced probabilistic ML models in probabilistic programming languages such as Pyro and Stan;
- to develop professional software tools in R/Python;
- based on seniority, to co-mentor data science students (from MSc to PhD level).
= The hosting Lab
The Caravagna and Cazzaniga labs have top-notch expertise in Bayesian probabilistic modelling, Deep Learning and software development for bioinformatics. The Cancer Data Science Laboratory started in summer 2020, and has 7 PhD students and 2 postdocs involved in a number of ongoing collaborative projects, as well as several grants to sustain data-intensive research. We develop, beginning to end, data science and artificial intelligence methods to detect statistical patterns across time and space from sequencing data. The most successful tools we have developed include those that combine Machine Learning and mathematical modelling. We work with a variety of collaborators in Italy and abroad, and we are involved in many graduate and post-graduate initiatives in data science in Trieste. Engaging with strong experimental units, we work on real-world problems and data, contributing from experimental design to data analysis.
= The Trieste ecosystem
Trieste is an excellent place for research not only because of its prestigious institutions (UNITS, Area, SISSA, ICTP), but also due to its high quality of life and low cost of living. Researchers enjoy a vibrant intellectual atmosphere amidst the city's historical charm, cultural diversity, and relaxed Mediterranean lifestyle. The city's seaside location offers beautiful landscapes and outdoor activities, enhancing the overall living experience. Additionally, Trieste's location at the crossroads of Latin, Germanic, and Slavic cultures makes it a unique melting pot of ideas and influences, fostering a creative and dynamic research environment. This blend of professional opportunities and enjoyable living conditions makes Trieste an appealing destination for scientists and academics worldwide.
= The ideal candidate
- a STEM-trained computational scientist with a strong interest in applying their skills to real biological data;
- capable of inventing, implementing and deploying ML models using frameworks such as PyTorch;
- capable of carrying out bioinformatics analysis of next-generation sequencing data using a high-performance computing environment.
Evidence of these qualities should be made clear in the application, in the interview phase, and through reference letters.