Tag: Data Ingestion Team
This team works on tooling for the preprocessing and ingestion of data into InstructLab, including documents (docling, vision support), code, multimedia, and third-party plugins.
-
Software Quality Engineer
Waterford City, Ireland
InstructLab Data Ingestion Team
This quality engineer will focus on the quality of the software that handles data preprocessing and data ingestion for InstructLab. They will be responsible for building a test suite and test automation for data ingestion tooling, and for evaluating how changes to data ingestion codebases affect the quality of the synthetic data output. Quality engineers with an interest in or familiarity with data science and/or machine learning would be amazing candidates for this role.
Note: “Apply Now” job descriptions are the official job postings.
-
Principal Software Engineer
Waterford City, Ireland
InstructLab Data Ingestion Team
This is a software engineering role that involves designing, extending, improving, and maintaining open source codebases for the InstructLab project that assist in the preprocessing of data for InstructLab. A key project for this role is the docling project. Experience working in upstream, open source, community-based projects, ideally as a project maintainer, is key. Engineers in this role will need to adhere to coding best practices and standards, including writing well-documented, scalable, and efficient code. Experience with tooling for data collection, data streaming APIs, data preprocessing, data cleansing and formatting, working with large datasets, and distributed and cloud-based data processing will be ideal. Experience with ML/AI frameworks and vector databases will be helpful for this role.
Note: “Apply Now” job descriptions are the official job postings.