Principal Software Engineer
Waterford City, Ireland
InstructLab Data Ingestion Team
This is a software engineering role that involves designing, extending, improving, and maintaining the InstructLab project's open source codebases that assist in preprocessing data. A key project for this role is the docling project. Experience working in upstream, open source, community-based projects, ideally as a project maintainer, is important. Engineers in this role will need to adhere to coding best practices and standards, producing well-documented, scalable, and efficient code. Experience with tooling for data collection, data streaming APIs, data preprocessing, data cleansing and formatting, working with large datasets, and distributed and cloud-based data processing is ideal. Experience with ML/AI frameworks and vector databases is helpful.
Note: “Apply Now” job descriptions are the official job postings.