CogitatorTech/zigformer
An educational transformer-based LLM in pure Zig
ZigFormer is a fully functional implementation of a transformer-based large language model (LLM) written in the Zig programming language. It aims to provide a clean, easy-to-understand LLM implementation with no large dependencies like PyTorch or TensorFlow. ZigFormer is mainly intended for learning how a conventional transformer-based LLM works under the hood. It is inspired by Andrej Karpathy's nanoGPT and nanochat projects and follows the architecture described in the "Attention Is All You Need" and "Language Models are Unsupervised Multitask Learners" papers. It can be used as a Zig library for building LLMs or as a standalone application for training, inference, and chatting with the model.
The diagrams below show the high-level architecture and its core components.
See the ROADMAP.md for the list of implemented and planned features.
IMPORTANT ZigFormer is in early development, so bugs and breaking changes are expected. Please use the issues page to report bugs or request features.
You can get started with ZigFormer by following the steps below.
git clone https://github.com/CogitatorTech/zigformer.git
cd zigformer
zig build
IMPORTANT ZigFormer is developed and tested with Zig 0.15.2. It should work with newer versions, but this is not guaranteed.
zig build run -- --save-model model.bin
This will train the model on the default dataset (datasets/simple_dataset) and save the weights to model.bin.
Training parameters can be given through a configuration file or CLI arguments.
zig build run -- --config my_config.json
Sample CLI Configuration:
{
"pretrain_path": "datasets/simple_dataset/pretrain.json",
"train_path": "datasets/simple_dataset/train.json",
"pre_epochs": 10,
"chat_epochs": 10,
"batch_size": 32,
"accumulation_steps": 1,
"pre_lr": 0.0005,
"chat_lr": 0.0001,
"save_model_path": "model.bin",
"interactive": true
}
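A note on two of the fields above: assuming accumulation_steps is standard gradient accumulation (gradients from several mini-batches are summed before each optimizer step), the number of examples seen per optimizer step is batch_size multiplied by accumulation_steps. A quick illustration of the arithmetic:

```python
# Illustration only (assumes accumulation_steps is standard gradient accumulation).
batch_size = 32
accumulation_steps = 1  # values from the sample configuration above
effective_batch = batch_size * accumulation_steps
print(effective_batch)  # 32

# Raising accumulation_steps to 4 would quadruple the effective batch size
# without increasing per-step memory use.
print(batch_size * 4)  # 128
```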
IMPORTANT A saved model can only be loaded by a model with the same configuration (architecture and hyperparameters) it was trained with.
You can run the web-based UI to chat with the trained model:
zig build run-gui -- --load-model model.bin
The UI can be accessed at http://localhost:8085 by default.
You can also provide a configuration file for the UI:
zig build run-gui -- --config gui_config.json
Sample Web UI Configuration:
{
"port": 8085,
"host": "0.0.0.0",
"pretrain_path": "datasets/simple_dataset/pretrain.json",
"train_path": "datasets/simple_dataset/train.json",
"load_model_path": "model.bin",
"max_request_size": 1048576,
"max_prompt_length": 1000,
"timeout_seconds": 30
}
zig build run -- --help
zig build run -- predict --help
zig build run-gui -- --help
# Train the model (using the default dataset and save the weights to 'model.bin')
zig build run -- --save-model model.bin
# Generate coherent text (using beam search with a beam width of 5)
zig build run -- predict --prompt "How do mountains form?" --beam-width 5
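The --beam-width flag selects beam search decoding: at each step the decoder keeps the beam-width highest-scoring partial sequences (by cumulative log-probability) instead of committing to a single token. This toy Python sketch illustrates the idea only; it is not ZigFormer's implementation, and the vocabulary and probability table are invented for the example:

```python
import math

# Toy next-token distribution (hypothetical, for illustration only):
# prefer "a" early, then strongly prefer ending the sequence.
def next_probs(seq):
    if len(seq) >= 3:
        return {"a": 0.1, "b": 0.1, "<eos>": 0.8}
    return {"a": 0.6, "b": 0.3, "<eos>": 0.1}

def beam_search(beam_width, max_len=5):
    beams = [([], 0.0)]  # (sequence, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq and seq[-1] == "<eos>":
                candidates.append((seq, score))  # finished beams carry over
                continue
            for tok, p in next_probs(seq).items():
                candidates.append((seq + [tok], score + math.log(p)))
        # Keep only the beam_width highest-scoring candidates.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0][0]

print(beam_search(beam_width=2))  # ['a', 'a', 'a', '<eos>']
```

Larger beam widths explore more alternatives per step and tend to produce more coherent (but less diverse) text, at a higher decoding cost.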
# Generate more diverse text (using top-k sampling with k=5)
zig build run -- predict --prompt "How do mountains form?" --top-k 5 --load-model model.bin
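Top-k sampling, by contrast, draws the next token at random from only the k most likely candidates, which trades some coherence for diversity. A minimal Python sketch of the general technique (again, not ZigFormer's code; the function name and logits here are invented for illustration):

```python
import math
import random

def top_k_sample(logits, k, rng=random):
    # Keep the k highest logits, renormalize them with a softmax,
    # and sample one token from the truncated distribution.
    top = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:k]
    m = max(v for _, v in top)  # subtract max for numerical stability
    exps = {tok: math.exp(v - m) for tok, v in top}
    total = sum(exps.values())
    r = rng.random() * total
    for tok, e in exps.items():
        r -= e
        if r <= 0:
            return tok
    return top[-1][0]  # guard against floating-point round-off

# With k=1 this reduces to greedy decoding (always the argmax token).
print(top_k_sample({"a": 2.0, "b": 1.0, "c": 0.0}, k=1))  # a
```

With --top-k 5, each generated token is sampled from the model's five most probable candidates, so repeated runs on the same prompt can produce different text.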
# Launch the web UI server and chat with the trained model on http://localhost:8085
zig build run-gui -- --load-model model.bin
You can find the full API documentation for the latest release of ZigFormer here.
Contributions are always welcome! See CONTRIBUTING.md for details on how to make a contribution.
ZigFormer is licensed under the MIT License (see LICENSE).