CogitatorTech/zigformer
An educational transformer-based LLM in pure Zig
ZigFormer is a fully functional implementation of a transformer-based large language model (LLM) written in the Zig programming language. It aims to provide a clean, easy-to-understand LLM implementation with no large dependencies like PyTorch or TensorFlow. ZigFormer is mainly intended for learning how a conventional transformer-based LLM works under the hood. It is inspired by Andrej Karpathy's nanoGPT and nanochat projects and follows the architecture described in the "Attention Is All You Need" and "Language Models are Unsupervised Multitask Learners" papers. It can be used as a Zig library for building LLMs or as a standalone application for training, inference, and chatting with the model.
The diagrams below show the high-level architecture and its core components.
See the ROADMAP.md for the list of implemented and planned features.
IMPORTANT ZigFormer is in early development, so bugs and breaking changes are expected. Please use the issues page to report bugs or request features.
You can get started with ZigFormer by following the steps below.
git clone https://github.com/CogitatorTech/zigformer.git
cd zigformer
zig build
IMPORTANT ZigFormer is developed and tested with Zig 0.15.2. It should work with newer versions, but this is not guaranteed.
zig build run -- --save-model model.bin
This will train the model on the default dataset (datasets/simple_dataset) and save the weights to model.bin.
Training parameters can be given through a configuration file or CLI arguments.
zig build run -- --config my_config.json
Sample CLI Configuration:
{
"pretrain_path": "datasets/simple_dataset/pretrain.json",
"train_path": "datasets/simple_dataset/train.json",
"pre_epochs": 10,
"chat_epochs": 10,
"batch_size": 32,
"accumulation_steps": 1,
"pre_lr": 0.0005,
"chat_lr": 0.0001,
"save_model_path": "model.bin",
"interactive": true
}
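A note on two of the fields above: assuming accumulation_steps is standard gradient accumulation (gradients from several mini-batches are summed before each optimizer step), the number of examples seen per optimizer step is batch_size multiplied by accumulation_steps. A quick illustration of the arithmetic:

```python
# Illustration only (assumes accumulation_steps is standard gradient accumulation).
batch_size = 32
accumulation_steps = 1  # values from the sample configuration above
effective_batch = batch_size * accumulation_steps
print(effective_batch)  # 32

# Raising accumulation_steps to 4 would quadruple the effective batch size
# without increasing per-step memory use.
print(batch_size * 4)  # 128
```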
IMPORTANT A saved model can only be loaded by a model with the same configuration (architecture and hyperparameters) it was trained with.
You can run the web-based UI to chat with the trained model:
zig build run-gui -- --load-model model.bin
The UI can be accessed at http://localhost:8085 by default.
You can also provide a configuration file for the UI:
zig build run-gui -- --config gui_config.json
Sample Web UI Configuration:
{
"port": 8085,
"host": "0.0.0.0",
"pretrain_path": "datasets/simple_dataset/pretrain.json",
"train_path": "datasets/simple_dataset/train.json",
"load_model_path": "model.bin",
"max_request_size": 1048576,
"max_prompt_length": 1000,
"timeout_seconds": 30
}
zig build run -- --help
zig build run -- predict --help
zig build run-gui -- --help
# Train the model (using the default dataset and save the weights to 'model.bin')
zig build run -- --save-model model.bin
# Generate coherent text (using beam search with a beam width of 5)
zig build run -- predict --prompt "How do mountains form?" --beam-width 5
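The --beam-width flag selects beam search decoding: at each step the decoder keeps the beam-width highest-scoring partial sequences (by cumulative log-probability) instead of committing to a single token. This toy Python sketch illustrates the idea only; it is not ZigFormer's implementation, and the vocabulary and probability table are invented for the example:

```python
import math

# Toy next-token distribution (hypothetical, for illustration only):
# prefer "a" early, then strongly prefer ending the sequence.
def next_probs(seq):
    if len(seq) >= 3:
        return {"a": 0.1, "b": 0.1, "<eos>": 0.8}
    return {"a": 0.6, "b": 0.3, "<eos>": 0.1}

def beam_search(beam_width, max_len=5):
    beams = [([], 0.0)]  # (sequence, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq and seq[-1] == "<eos>":
                candidates.append((seq, score))  # finished beams carry over
                continue
            for tok, p in next_probs(seq).items():
                candidates.append((seq + [tok], score + math.log(p)))
        # Keep only the beam_width highest-scoring candidates.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0][0]

print(beam_search(beam_width=2))  # ['a', 'a', 'a', '<eos>']
```

Larger beam widths explore more alternatives per step and tend to produce more coherent (but less diverse) text, at a higher decoding cost.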
# Generate more diverse text (using top-k sampling with k=5)
zig build run -- predict --prompt "How do mountains form?" --top-k 5 --load-model model.bin
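Top-k sampling, by contrast, draws the next token at random from only the k most likely candidates, which trades some coherence for diversity. A minimal Python sketch of the general technique (again, not ZigFormer's code; the function name and logits here are invented for illustration):

```python
import math
import random

def top_k_sample(logits, k, rng=random):
    # Keep the k highest logits, renormalize them with a softmax,
    # and sample one token from the truncated distribution.
    top = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:k]
    m = max(v for _, v in top)  # subtract max for numerical stability
    exps = {tok: math.exp(v - m) for tok, v in top}
    total = sum(exps.values())
    r = rng.random() * total
    for tok, e in exps.items():
        r -= e
        if r <= 0:
            return tok
    return top[-1][0]  # guard against floating-point round-off

# With k=1 this reduces to greedy decoding (always the argmax token).
print(top_k_sample({"a": 2.0, "b": 1.0, "c": 0.0}, k=1))  # a
```

With --top-k 5, each generated token is sampled from the model's five most probable candidates, so repeated runs on the same prompt can produce different text.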
# Launch the web UI server and chat with the trained model on http://localhost:8085
zig build run-gui -- --load-model model.bin
You can find the full API documentation for the latest release of ZigFormer here.
Contributions are always welcome! See CONTRIBUTING.md for details on how to make a contribution.
ZigFormer is licensed under the MIT License (see LICENSE).