As a Data Engineer, you will play a crucial role in building and scaling data operations for a pioneering ML-driven target discovery platform in the plant sciences domain. Your primary responsibility will be designing and building scalable ETL pipelines for diverse data sources, including academic literature, multi-omics, and population genetics. Key aspects of the role include managing large-scale Neo4j graph databases, ensuring data quality, and facilitating secure database access. You will collaborate closely with internal ML and experimental teams, as well as external stakeholders, to ensure seamless integration between data generation, validation, and analytics tools.
What the Company Hiring is Looking For
The ideal candidate will hold an MSc or PhD in Computer Science, Engineering, Computational Biology, or a related field. You should have significant experience with contemporary data engineering tools and practices, including data governance and versioning. Proficiency in building scalable ETL pipelines for multi-modal and large datasets, along with strong scripting skills in Bash and Python, is required. Experience with deep learning frameworks, particularly PyTorch, and cloud environments is essential. Excellent communication skills and the ability to collaborate effectively with interdisciplinary teams are a must.
Desirable skills
3 years' experience managing ML or AI teams in the life sciences industry
A proven track record of successful leadership
Familiarity with various neural network architectures
Expertise in protein bioinformatics or molecular modeling software
Knowledge of graph algorithms, graph convolutional neural networks, data pipelining, cloud-based ML model deployment, and Agile systems
A relevant publication record
Experience with Neo4j and Cypher
What the Company Hiring is Offering
A competitive salary and generous stock option plan
Opportunity for leadership from day 1 and career development within our growing start-up
The opportunity to direct the creation of the world’s largest perturbative genomics dataset in the plant sciences
Ownership of fast-moving, ambitious projects with real-world impact
A fun, flexible and supportive work environment in the centre of London with an emphasis on team performance and personal development
Conferences, events, and training resources
Resources to build a network of advisors and peers who are leaders in the field of graph-ML and life sciences
Unlimited holiday allowance