theseyan/bufzilla
Fast & compact serialization format in Zig.
buffer • zilla
A compact and fast binary encoding format in pure Zig. Originally based on rxi's article - "A Simple Serialization System".
bufzilla is ideal for serializing JSON-like objects and arrays. Its qualities include an Inspect API that renders encoded data as JSON.
Zig version: 0.15.2
Fetch the package, replacing {VERSION} with the release tag you want:
zig fetch https://github.com/theseyan/bufzilla/archive/refs/tags/{VERSION}.tar.gz
Copy the generated hash and add bufzilla to your build.zig.zon:
.{
    .dependencies = .{
        .bufzilla = .{
            .url = "https://github.com/theseyan/bufzilla/archive/refs/tags/{VERSION}.tar.gz",
            .hash = "{HASH}",
        },
    },
}
bufzilla takes a std.Io.Writer interface and writes encoded data to it. Such a writer can be backed by a growing buffer, a fixed array, a file, a network socket, and so on.
Use std.Io.Writer.Allocating when you need a dynamically growing buffer:
const std = @import("std");
const Io = std.Io;
const Writer = @import("bufzilla").Writer;
// Create an allocating writer
var aw = Io.Writer.Allocating.init(allocator);
defer aw.deinit();
// Initialize bufzilla writer
var writer = Writer.init(&aw.writer);
const DataType = struct {
    a: i64,
    b: struct {
        c: bool,
    },
    d: []const union(enum) {
        null: ?void,
        f64: f64,
        string: []const u8,
    },
};
const data = DataType{
    .a = 123,
    .b = .{ .c = true },
    .d = &.{ .{ .f64 = 123.123 }, .{ .null = null }, .{ .string = "value" } },
};
try writer.writeAny(data);
// Get the encoded bytes
const encoded = aw.written();
std.debug.print("Encoded {d} bytes\n", .{encoded.len});
Use std.Io.Writer.fixed to avoid dynamic allocation when you know the maximum size up front:
var buffer: [1024]u8 = undefined;
var fixed = Io.Writer.fixed(&buffer);
var writer = Writer.init(&fixed);
try writer.writeAny("hello");
try writer.writeAny(@as(i64, 42));
const encoded = fixed.buffered();
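Because bufzilla only needs a std.Io.Writer, the same calls should work with a file-backed writer. The following is a minimal sketch under that assumption, not code taken from the project: the output path "out.bin" and the 4096-byte buffer size are hypothetical, and it relies on the Zig 0.15 std.fs.File.Writer type, which exposes its std.Io.Writer through the `interface` field.

```zig
const std = @import("std");
const Writer = @import("bufzilla").Writer;

pub fn main() !void {
    // "out.bin" is a hypothetical output path chosen for illustration.
    const file = try std.fs.cwd().createFile("out.bin", .{});
    defer file.close();

    // std.fs.File.Writer buffers through this array and exposes
    // a std.Io.Writer via its `interface` field (Zig 0.15).
    var io_buffer: [4096]u8 = undefined;
    var file_writer = file.writer(&io_buffer);

    // Same bufzilla calls as in the fixed-buffer example above.
    var writer = Writer.init(&file_writer.interface);
    try writer.writeAny("hello");
    try writer.writeAny(@as(i64, 42));

    // Buffered bytes only reach the file after an explicit flush.
    try file_writer.interface.flush();
}
```

Forgetting the final flush is the usual pitfall with buffered writers: the encoded bytes would stay in `io_buffer` and never be written to disk.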
You can also build messages incrementally:
var writer = Writer.init(&aw.writer);
try writer.startObject();
try writer.writeAny("name");
try writer.writeAny("Alice");
try writer.writeAny("scores");
try writer.startArray();
try writer.writeAny(@as(i64, 100));
try writer.writeAny(@as(i64, 95));
try writer.endContainer(); // end array
try writer.endContainer(); // end object
The Inspect API renders encoded bufzilla data as pretty-printed JSON:
const Inspect = @import("bufzilla").Inspect;
// Output to an allocating writer
var aw = Io.Writer.Allocating.init(allocator);
defer aw.deinit();
var inspector = Inspect(.{}).init(encoded_bytes, &aw.writer, .{});
try inspector.inspect();
std.debug.print("{s}\n", .{aw.written()});
Or output directly to a fixed buffer:
var buffer: [4096]u8 = undefined;
var fixed = Io.Writer.fixed(&buffer);
var inspector = Inspect(.{}).init(encoded_bytes, &fixed, .{});
try inspector.inspect();
std.debug.print("{s}\n", .{fixed.buffered()});
Output:
{
    "a": 123,
    "b": {
        "c": true
    },
    "d": [
        123.12300000000000,
        null,
        "value"
    ]
}
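The writing and inspecting snippets above leave out setup such as the allocator. A minimal end-to-end sketch that combines them might look like the following. It uses only the calls already shown in this README, and assumes that bufzilla is available as a module named "bufzilla" and that std.heap.page_allocator is acceptable for a short demo.

```zig
const std = @import("std");
const bufzilla = @import("bufzilla");

pub fn main() !void {
    // page_allocator keeps the demo short; pick a better allocator in real code.
    const allocator = std.heap.page_allocator;

    // Growing buffer that receives the encoded message.
    var encoded_buf = std.Io.Writer.Allocating.init(allocator);
    defer encoded_buf.deinit();

    // Build an object with a "name" string and a "scores" array.
    var writer = bufzilla.Writer.init(&encoded_buf.writer);
    try writer.startObject();
    try writer.writeAny("name");
    try writer.writeAny("Alice");
    try writer.writeAny("scores");
    try writer.startArray();
    try writer.writeAny(@as(i64, 100));
    try writer.writeAny(@as(i64, 95));
    try writer.endContainer(); // end array
    try writer.endContainer(); // end object

    // Second growing buffer that receives the JSON rendering.
    var json_buf = std.Io.Writer.Allocating.init(allocator);
    defer json_buf.deinit();

    var inspector = bufzilla.Inspect(.{}).init(encoded_buf.written(), &json_buf.writer, .{});
    try inspector.inspect();

    std.debug.print("{s}\n", .{json_buf.written()});
}
```

Two separate Allocating writers are used so that the encoded bytes stay valid while the inspector reads them.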
The Reader provides zero-copy access to encoded data:
const Reader = @import("bufzilla").Reader;
var reader = Reader(.{}).init(encoded_bytes);
// Read values sequentially
const val = try reader.read();
switch (val) {
    .object => { /* iterate object */ },
    .array => { /* iterate array */ },
    .i64 => |n| std.debug.print("int: {d}\n", .{n}),
    .bytes => |s| std.debug.print("string: {s}\n", .{s}),
    else => {}, // ... other types
}
// Or iterate containers
while (try reader.iterateObject(obj)) |kv| {
    _ = kv; // use kv.key and kv.value here
}
You can find more examples in the unit tests.
When reading untrusted data, bufzilla provides limits, configurable at compile time, that guard against unbounded recursion and stack overflow with negligible performance cost.
const Reader = @import("bufzilla").Reader;
// Default limits
var reader = Reader(.{}).init(data);
// Custom limits
var reader = Reader(.{
    .max_depth = 50, // Max nesting depth
    .max_bytes_length = 1024 * 1024, // Max string/binary blob size
    .max_array_length = 10_000, // Max array elements
    .max_object_size = 10_000, // Max object key-value pairs
}).init(data);
// Unlimited depth
var reader = Reader(.{ .max_depth = null }).init(data);
| Limit | Default | Error |
|---|---|---|
| max_depth | 2048 | MaxDepthExceeded |
| max_bytes_length | unlimited | BytesTooLong |
| max_array_length | unlimited | ArrayTooLarge |
| max_object_size | unlimited | ObjectTooLarge |
Notes:
- max_array_length and max_object_size require max_depth to be set. Setting them with max_depth = null is a compile error.
- Memory for iteration counters scales with max_depth when array/object limits are enabled. Keep max_depth reasonable (default 2048 uses ~16KB).
The Inspect API also accepts limits as a parameter:
var inspector = Inspect(.{ .max_depth = 100 }).init(data, &writer, .{});
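When the input is untrusted, it can be useful to catch the limit errors rather than propagate them. The sketch below is illustrative only. The helper name `dumpUntrusted` is hypothetical, and it assumes that limit violations surface as errors from `inspect()`. It prints the error name generically instead of matching specific errors, since the exact error set of `inspect()` is not documented here.

```zig
const std = @import("std");
const Inspect = @import("bufzilla").Inspect;

// Hypothetical helper: renders `data` as JSON, or reports why it was rejected.
fn dumpUntrusted(allocator: std.mem.Allocator, data: []const u8) void {
    var aw = std.Io.Writer.Allocating.init(allocator);
    defer aw.deinit();

    // Same limit configuration as the one-line example above.
    var inspector = Inspect(.{ .max_depth = 100 }).init(data, &aw.writer, .{});

    inspector.inspect() catch |err| {
        // Covers limit errors such as MaxDepthExceeded as well as any
        // other failure (malformed input, out of memory, ...).
        std.debug.print("rejected input: {s}\n", .{@errorName(err)});
        return;
    };

    std.debug.print("{s}\n", .{aw.written()});
}
```

Rendering into an in-memory buffer first means no partial JSON is printed when the input is rejected partway through.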
Comprehensive unit tests are present in the test/ directory.
zig build test
Run the benchmark suite with zig build bench -Doptimize=ReleaseFast.
bufzilla is competitive in performance with similar self-describing binary formats and, in most cases, much faster.
[Chart: average throughput (ops/sec), bufzilla vs zbor vs zig-msgpack]
Results on x86_64 Linux, Ryzen 7 9700X CPU:
Basic Types:
--------------------------------------------------------------------------------
Null Write | 1000000 iterations | 0 ns/op | 0 ops/sec
Null Read | 1000000 iterations | 2 ns/op | 500000000 ops/sec
Bool Write | 1000000 iterations | 0 ns/op | 0 ops/sec
Bool Read | 1000000 iterations | 2 ns/op | 500000000 ops/sec
Small Int Write | 1000000 iterations | 2 ns/op | 500000000 ops/sec
Small Int Read | 1000000 iterations | 2 ns/op | 500000000 ops/sec
Large Int Write | 1000000 iterations | 2 ns/op | 500000000 ops/sec
Large Int Read | 1000000 iterations | 2 ns/op | 500000000 ops/sec
Float Write | 1000000 iterations | 2 ns/op | 500000000 ops/sec
Float Read | 1000000 iterations | 1 ns/op | 1000000000 ops/sec
Strings:
--------------------------------------------------------------------------------
Short String Write (5 bytes) | 500000 iterations | 4 ns/op | 250000000 ops/sec
Short String Read (5 bytes) | 500000 iterations | 2 ns/op | 500000000 ops/sec
Medium String Write (~300 bytes) | 100000 iterations | 4 ns/op | 250000000 ops/sec
Medium String Read (~300 bytes) | 100000 iterations | 2 ns/op | 500000000 ops/sec
Binary Data:
--------------------------------------------------------------------------------
Small Binary Write (32 bytes) | 500000 iterations | 6 ns/op | 166666666 ops/sec
Small Binary Read (32 bytes) | 500000 iterations | 2 ns/op | 500000000 ops/sec
Large Binary Write (1KB) | 100000 iterations | 9 ns/op | 111111111 ops/sec
Large Binary Read (1KB) | 100000 iterations | 2 ns/op | 500000000 ops/sec
Arrays:
--------------------------------------------------------------------------------
Small Array Write (10 elements) | 100000 iterations | 25 ns/op | 40000000 ops/sec
Small Array Read (10 elements) | 100000 iterations | 27 ns/op | 37037037 ops/sec
Medium Array Write (100 elements) | 50000 iterations | 260 ns/op | 3846153 ops/sec
Medium Array Read (100 elements) | 50000 iterations | 243 ns/op | 4115226 ops/sec
Objects (Maps):
--------------------------------------------------------------------------------
Small Object Write (10 entries) | 100000 iterations | 143 ns/op | 6993006 ops/sec
Small Object Read (10 entries) | 100000 iterations | 116 ns/op | 8620689 ops/sec
Medium Object Write (50 entries) | 50000 iterations | 712 ns/op | 1404494 ops/sec
Medium Object Read (50 entries) | 50000 iterations | 629 ns/op | 1589825 ops/sec
Complex Structures:
--------------------------------------------------------------------------------
Nested Structure Write | 50000 iterations | 36 ns/op | 27777777 ops/sec
Nested Structure Read | 50000 iterations | 71 ns/op | 14084507 ops/sec
Mixed Types Write | 50000 iterations | 31 ns/op | 32258064 ops/sec
Mixed Types Read | 50000 iterations | 57 ns/op | 17543859 ops/sec
Struct Serialization:
--------------------------------------------------------------------------------
Simple Struct Write | 100000 iterations | 29 ns/op | 34482758 ops/sec
Simple Struct Read | 100000 iterations | 46 ns/op | 21739130 ops/sec
Complex Struct Write | 50000 iterations | 136 ns/op | 7352941 ops/sec
Complex Struct Read | 50000 iterations | 272 ns/op | 3676470 ops/sec
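The ops/sec column in the results above is consistent with integer division of one second (1,000,000,000 ns) by the ns/op figure, with rows measured at 0 ns/op reported as 0 ops/sec. That reading is inferred from the numbers, not from the benchmark source, and suggests the 0 ns/op rows rounded down to zero rather than measuring zero throughput. A small sketch of that relationship, with a hypothetical helper `opsPerSec`:

```zig
const std = @import("std");

/// Integer operations per second for a given ns/op figure.
/// Returns 0 when ns_per_op is 0, mirroring the "0 ops/sec" rows above.
fn opsPerSec(ns_per_op: u64) u64 {
    if (ns_per_op == 0) return 0;
    return std.time.ns_per_s / ns_per_op;
}

pub fn main() void {
    // Sample ns/op values taken from the result tables above.
    const samples = [_]u64{ 0, 2, 143, 272 };
    for (samples) |ns| {
        std.debug.print("{d} ns/op -> {d} ops/sec\n", .{ ns, opsPerSec(ns) });
    }
}
```

For example, 143 ns/op gives 6993006 ops/sec and 272 ns/op gives 3676470 ops/sec, matching the Small Object Write and Complex Struct Read rows.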