InstructLab

A new community-based approach to build truly open-source LLMs

InstructLab Architecture & Implementation Overview


Introduction

InstructLab is a complex project that spans multiple components, each serving a different part of the workflow. This document aims to provide a high-level overview of the various components, how they are currently organized and related, and the overall flow of control between them.

High-Level Overview

[Diagram: high-level overview of the InstructLab components]

Repositories

InstructLab is spread across multiple repositories based on function. These repositories all reside within the InstructLab GitHub organization and can be viewed here: InstructLab Repositories.

Here is a quick overview of them:

  • instructlab/taxonomy – This repository holds the taxonomy, where user data that needs to be taught to the model is organized.
  • instructlab/instructlab – This is the Command Line Interface (CLI) repository for InstructLab.
  • instructlab/sdg – This repository contains the synthetic data generation engine, which is responsible for producing training data in the workflow.
  • instructlab/training – This repository contains the main training logic used for multi-phase training of models.
  • instructlab/eval – This repository contains the logic for the evaluation component, responsible for running benchmarks and producing scores to evaluate the model’s performance after training.
  • instructlab/quantize – This is a helper repository used to quantize (shrink) models.

Workflow

[Diagram: InstructLab workflow]

  • The InstructLab taxonomy is the first point of entry for users interacting with InstructLab. Users make a contribution to their local taxonomy clone in the form of a skill or knowledge that they want their model to learn (a sketch of such a contribution appears after this list).
    • A UI is under development to allow users to interact with their taxonomy more intuitively.
  • Once the taxonomy is updated, users can then use the CLI to interact with the other components.
  • The first step is to initiate synthetic data generation. This step takes the seed examples provided by the user in their taxonomy contribution and generates synthetic data samples based on them.
  • Once the data is generated, users can start training their model using that data. This process produces several checkpoints based on the number of epochs run during training.
  • Finally, users can run an evaluation on a chosen checkpoint to gauge objective performance. The evaluation suite includes standardized benchmarks that allow comparison of the model’s performance against other models evaluated against those benchmarks.
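
For concreteness, a taxonomy contribution is a qna.yaml file containing a handful of seed examples. The sketch below builds a minimal skill contribution programmatically; the field names follow the taxonomy’s qna.yaml layout, but the schema has evolved across taxonomy versions, so treat the exact fields as illustrative rather than authoritative.

```python
# Rough sketch of a skill contribution's qna.yaml content, built and written
# from Python. The exact schema has changed across taxonomy versions, so treat
# the field names here as illustrative rather than authoritative.
import yaml

contribution = {
    "task_description": "Answer simple geography questions.",  # illustrative
    "created_by": "your-github-username",                      # placeholder
    "seed_examples": [
        {
            "question": "What is the capital of France?",
            "answer": "The capital of France is Paris.",
        },
        {
            "question": "What is the capital of Japan?",
            "answer": "The capital of Japan is Tokyo.",
        },
    ],
}

with open("qna.yaml", "w") as f:
    yaml.safe_dump(contribution, f, sort_keys=False)
```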

Component-Wise Breakdown

Serving

[Diagram: serving component]

The serving component is responsible for starting an OpenAI-compatible server that hosts the model, allowing users to interact with it, e.g., for chatting. InstructLab supports two serving backends: llama.cpp and vLLM.

  • llama.cpp: Designed to be laptop-friendly, as it is less resource-intensive. It is supported on both macOS and Linux and serves models of type .gguf.
  • vLLM: More compute-intensive and supports serving models across multiple GPUs. It is the preferred runtime for model serving on server-grade hardware, supported only on Linux, and serves models of type .safetensors.

There is existing logic within the InstructLab CLI to automatically pick the right serving backend based on the supplied model and the environment it is running in.
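
As a rough illustration of that selection logic (a simplified sketch, not the CLI’s actual implementation), the choice can be thought of as keying off the model file format and the host platform:

```python
# Simplified sketch of serving-backend selection (not the CLI's actual code):
# .gguf models are served with llama.cpp, .safetensors models with vLLM (Linux only).
import platform
from pathlib import Path

def pick_backend(model_path: str) -> str:
    suffix = Path(model_path).suffix
    if suffix == ".gguf":
        return "llama-cpp"
    if suffix == ".safetensors" and platform.system() == "Linux":
        return "vllm"
    raise ValueError(f"No suitable serving backend for {model_path} on {platform.system()}")

print(pick_backend("granite-7b-lab-Q4_K_M.gguf"))  # -> "llama-cpp"
```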

Once a model is served, it can be used either for chatting or data generation.
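
Because the served endpoint speaks the OpenAI API, any OpenAI-compatible client can talk to it. Below is a minimal sketch using the openai Python client; the base URL, port, and model name are assumptions that depend on your local configuration.

```python
# Chatting with a locally served model through its OpenAI-compatible endpoint.
# The base URL/port and model name are assumptions -- check your local config.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="no-key-required")

response = client.chat.completions.create(
    model="granite-7b-lab-Q4_K_M.gguf",  # placeholder: whichever model is being served
    messages=[{"role": "user", "content": "What is InstructLab?"}],
)
print(response.choices[0].message.content)
```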

Data Generation

[Diagram: data generation component]

The synthetic data generation logic is contained almost entirely within instructlab/sdg. The process requires supplying a "teacher" model, responsible for generating the synthetic data based on the seed examples provided in the taxonomy.

This step depends on the serving module, as the teacher model needs to be hosted and accessible via a server before data generation can occur.

This step also includes data mixing. Its output is a set of knowledge and skills datasets that are fed into the training module.
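
Conceptually, generation boils down to prompting the served teacher model with seed examples and asking for more samples in the same style. The toy sketch below illustrates that idea using an OpenAI-compatible client; it is not the instructlab/sdg pipeline, and the endpoint, model name, and prompt are placeholders.

```python
# Toy illustration of synthetic data generation: prompt the served teacher model
# with a seed example and ask for new question/answer pairs in the same style.
# This is NOT the instructlab/sdg pipeline; endpoint, model name, and prompt are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="no-key-required")

seed_question = "What is the capital of France?"  # placeholder seed example
seed_answer = "The capital of France is Paris."

prompt = (
    "You are generating training data.\n"
    f"Seed question: {seed_question}\n"
    f"Seed answer: {seed_answer}\n"
    "Write three new question/answer pairs on the same topic, in the same style."
)

response = client.chat.completions.create(
    model="teacher-model",  # placeholder: the served teacher model
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)  # raw synthetic samples, later post-processed and mixed
```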

Training

[Diagram: training component]

The training module is somewhat split, with some parts of the overall training logic contained within the CLI itself and some within a separate repository.

The training logic is currently broadly divided between CPU and GPU-enabled training loops.
The training module expects the training data as input, along with the model that needs to be trained.

Consumer-Grade Training

This side focuses more on the laptop use case. On macOS, training is performed using Apple’s MLX library, which is optimized for Apple silicon. As of newer releases, InstructLab has focused on the PyTorch MPS device instead; using MPS is a more sustainable practice, as it integrates with more typical training loops. As such, the recommendation is that macOS users use MacBooks with M1 or newer chips.

On Linux, training relies on Hugging Face’s SFT Trainer implementation.
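
As a minimal sketch of that path, the snippet below uses TRL’s SFTTrainer; it is not InstructLab’s actual training code, the model name, dataset file, and hyperparameters are placeholders, and argument names vary across trl versions.

```python
# Minimal supervised fine-tuning sketch with Hugging Face TRL's SFTTrainer.
# NOTE: illustrative only, not InstructLab's training code. Paths, model name,
# and hyperparameters are placeholders; argument names vary across trl versions.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Training data produced by the data-generation step (assumed to have a "text" column).
train_dataset = load_dataset("json", data_files="generated_train.jsonl", split="train")

trainer = SFTTrainer(
    model="ibm-granite/granite-7b-base",  # placeholder student model
    train_dataset=train_dataset,
    args=SFTConfig(
        output_dir="checkpoints",         # checkpoints are written here
        dataset_text_field="text",
        num_train_epochs=1,
        per_device_train_batch_size=1,
    ),
)
trainer.train()
```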

Windows is currently not supported.

Server-Grade Training

Users with access to GPU-accelerated hardware can leverage the full fine-tuning training loop contained in instructlab/training. This repository uses PyTorch for the training loop and is optimized for Nvidia hardware by leveraging CUDA kernels and APIs. It also uses DeepSpeed to perform distributed training across all available GPUs.

The result after training is typically several checkpoints captured at specified intervals during the training process.
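
As a conceptual sketch of that setup (not the actual instructlab/training code), DeepSpeed wraps the model so that optimizer state and gradients are sharded across all visible GPUs, and checkpoints can be saved at fixed step intervals; the model name, DeepSpeed config, and data below are placeholders.

```python
# Conceptual sketch of DeepSpeed-based distributed training with periodic
# checkpointing. NOT the instructlab/training code; model name, config values,
# and data are placeholders. Launch with one process per GPU, e.g.
# `deepspeed train_sketch.py`.
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ibm-granite/granite-7b-base"  # placeholder student model
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

ds_config = {  # placeholder DeepSpeed config
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 4,
    "bf16": {"enabled": True},
    "zero_optimization": {"stage": 2},
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-5}},
}

# deepspeed.initialize wraps the model and shards optimizer state across all GPUs.
model_engine, _, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

# Stand-in for a real dataloader over the tokenized synthetic training data.
batch = tokenizer("Example training sample.", return_tensors="pt").to(model_engine.device)
batch["labels"] = batch["input_ids"].clone()

for step in range(1, 2001):
    loss = model_engine(**batch).loss
    model_engine.backward(loss)
    model_engine.step()
    if step % 1000 == 0:  # checkpoints captured at specified intervals
        model_engine.save_checkpoint("checkpoints", tag=f"step_{step}")
```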

Evaluation

[Diagram: evaluation component]

The evaluation logic is entirely contained within instructlab/eval. The evaluation component is responsible for performing a number of standardized benchmarks against the chosen checkpoint after training and producing objective scores to compare the model’s performance before and after training.

InstructLab evaluation includes four benchmarks: MMLU and MMLU Branch for knowledge, and MT-Bench and MT-Bench Branch for skills.

This component borrows code from FastChat and also leverages lm-evaluation-harness for running MMLU tasks.
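
As a rough illustration of the MMLU piece, a checkpoint can be scored directly with lm-evaluation-harness as shown below; this is not the instructlab/eval wrapper itself, and the checkpoint path and task selection are placeholders.

```python
# Rough illustration of scoring a checkpoint on MMLU with lm-evaluation-harness,
# the library leveraged for MMLU tasks. This is not the instructlab/eval API;
# the checkpoint path and task selection are placeholders.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                              # Hugging Face model backend
    model_args="pretrained=./checkpoints/checkpoint-1000",   # placeholder checkpoint path
    tasks=["mmlu"],                                          # the MMLU task group
    num_fewshot=5,
)
print(results["results"])
```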

Model Conversion and Quantization (Optional)

All training takes place on models in .safetensors format. Once a trained and evaluated model is available, users on macOS or users with limited compute resources may want to serve the trained model and chat with it. In this case, users may convert their trained models back to .gguf format and optionally quantize them to 4 bits. Quantization is handled by instructlab-quantize, which contains pre-compiled binaries of llama.cpp’s quantization script for various platforms. The appropriate binary is chosen based on the user’s environment, and the supplied model is quantized using the Q4_K_M method by default.
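
As a sketch of what that quantization step amounts to, the snippet below invokes a llama.cpp quantization binary on a converted .gguf model; the binary name and paths are assumptions (the binary shipped and selected by instructlab-quantize, and its name, vary by platform and llama.cpp version).

```python
# Rough sketch of the quantization step: invoke a llama.cpp quantization binary
# on a converted .gguf model. The binary name/path is an assumption (it differs
# across llama.cpp releases and is normally selected for you by instructlab-quantize).
import subprocess

subprocess.run(
    [
        "./quantize",               # placeholder: pre-compiled llama.cpp binary
        "trained-model-f16.gguf",   # full-precision GGUF produced by conversion
        "trained-model-Q4_K_M.gguf",# 4-bit output
        "Q4_K_M",                   # default quantization method mentioned above
    ],
    check=True,
)
```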


Join the InstructLab Community!

Come learn how to customize your own LLM, play with and learn about AI technology, and help us build the open source tools that make it all possible!