pyk/bench
Fast & Accurate Benchmarking for Zig
Fetch latest version:
zig fetch --save=bench https://github.com/pyk/bench/archive/main.tar.gz
Then add this to your build.zig:
const bench = b.dependency("bench", .{
    .target = target,
    .optimize = optimize,
});

// Use it on a module
const mod = b.createModule(.{
    .target = target,
    .optimize = optimize,
    .imports = &.{
        .{ .name = "bench", .module = bench.module("bench") },
    },
});

// Or executable
const my_bench = b.addExecutable(.{
    .name = "my-bench",
    .root_module = b.createModule(.{
        .root_source_file = b.path("bench/my-bench.zig"),
        .target = target,
        .optimize = .ReleaseFast,
        .imports = &.{
            .{ .name = "bench", .module = bench.module("bench") },
        },
    }),
});
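To run the executable through the build system (as in the sample output further below), wire up a run step. A minimal sketch; the step name my-bench is just an example:

// Hypothetical run step so the benchmark can be invoked with `zig build my-bench`.
const run_bench = b.addRunArtifact(my_bench);
const bench_step = b.step("my-bench", "Run the my-bench benchmark");
bench_step.dependOn(&run_bench.step);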
If you use this dependency only for tests/benchmarks, it is recommended to mark it as lazy in your build.zig.zon:
.dependencies = .{
    .bench = .{
        .url = "...",
        .hash = "...",
        .lazy = true, // here
    },
}
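With .lazy = true, build.zig should request the dependency through b.lazyDependency, which returns null until the dependency is actually needed, so the tarball is only fetched for benchmark builds. A sketch that reuses the mod module from the earlier snippet:

// Lazy variant: only fetched/resolved when this code path is reached.
if (b.lazyDependency("bench", .{
    .target = target,
    .optimize = optimize,
})) |bench_dep| {
    mod.addImport("bench", bench_dep.module("bench"));
}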
To benchmark a single function, pass the allocator, a name, and the function
pointer to run.
const res = try bench.run(allocator, "My Function", myFn, .{});
try bench.report(.{ .metrics = &.{res} });
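Putting it together, a minimal self-contained benchmark file could look like the sketch below; myFn is a placeholder workload, and the skeleton assumes only the run/report calls shown above:

const std = @import("std");
const bench = @import("bench");

// Placeholder workload; replace with the function you actually want to measure.
fn myFn() void {
    var sum: u64 = 0;
    for (0..1_000) |i| sum +%= i;
    std.mem.doNotOptimizeAway(sum);
}

pub fn main() !void {
    const allocator = std.heap.page_allocator;

    const res = try bench.run(allocator, "My Function", myFn, .{});
    try bench.report(.{ .metrics = &.{res} });
}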
You can generate test data before the benchmark starts and pass it via a tuple. This ensures the setup cost doesn't pollute your measurements.
// Setup data outside the benchmark
const input = try generateLargeString(allocator, 10_000);
// Pass input as a tuple
const res = try bench.run(allocator, "Parser", parseFn, .{input}, .{});
You can run multiple benchmarks and compare them against a baseline. The
baseline_index determines which result is used as the reference (1.00x).
const a = try bench.run(allocator, "Implementation A", implA, .{});
const b = try bench.run(allocator, "Implementation B", implB, .{});
try bench.report(.{
    .metrics = &.{ a, b },
    // Use the first metric (Implementation A) as the baseline
    .baseline_index = 0,
});
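Here implA and implB stand for any two functions with the same signature; a pair of illustrative placeholders:

const std = @import("std");

// Two interchangeable placeholder implementations to compare.
fn implA() void {
    var sum: u64 = 0;
    var i: u64 = 0;
    while (i < 10_000) : (i += 1) sum +%= i;
    std.mem.doNotOptimizeAway(sum);
}

fn implB() void {
    // Same sum, iterating downwards.
    var sum: u64 = 0;
    var i: u64 = 10_000;
    while (i > 0) : (i -= 1) sum +%= i - 1;
    std.mem.doNotOptimizeAway(sum);
}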
If your function processes data (like copying memory or parsing strings),
provide bytes_per_op to get throughput metrics (MB/s or GB/s).
const size = 1024 * 1024;
const res = try bench.run(allocator, "Memcpy 1MB", copyFn, .{
.bytes_per_op = size,
});
// Report will now show GB/s instead of just Ops/s
try bench.report({ .metrics = &.{res} });
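Since copyFn takes no arguments in this call, the buffers it copies have to live outside the function. A sketch using two static 1 MiB buffers (names and sizes are illustrative):

const std = @import("std");

// Static 1 MiB source/destination buffers; the contents are irrelevant for a copy benchmark.
var src: [1024 * 1024]u8 = undefined;
var dst: [1024 * 1024]u8 = undefined;

fn copyFn() void {
    @memcpy(&dst, &src);
    std.mem.doNotOptimizeAway(&dst);
}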
You can tune the benchmark behavior by modifying the Options struct.
const res = try bench.run(allocator, "Heavy Task", heavyFn, .{
.warmup_iters = 10, // Default: 100
.sample_size = 50, // Default: 1000
});
The default bench.report prints a human-readable table to stdout. It handles
units (ns, us, ms, s) and coloring automatically.
$ zig build quicksort
Benchmarking Sorting Algorithms Against Random Input (N=10000)...
Benchmark Summary: 3 benchmarks run
├─ Unsafe Quicksort (Lomuto)   358.64us   110.98MB/s   1.29x faster
│  └─ cycles: 1.6M  instructions: 1.2M  ipc: 0.75  miss: 65
├─ Unsafe Quicksort (Hoare)    383.02us   104.32MB/s   1.21x faster
│  └─ cycles: 1.7M  instructions: 1.3M  ipc: 0.76  miss: 56
└─ std.mem.sort                462.25us    86.45MB/s   [baseline]
   └─ cycles: 2.0M  instructions: 2.6M  ipc: 1.30  miss: 143
The run function returns a Metrics struct containing all raw data (min, max,
median, variance, cycles, etc.). You can use this to generate JSON, CSV, or
assert performance limits in CI.
const metrics = try bench.run(allocator, "MyFn", myFn, .{});

// Access raw fields directly
std.debug.print("Median: {d}ns, Max: {d}ns\n", .{
    metrics.median_ns,
    metrics.max_ns,
});
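As a sketch of the CI use case, a simple guard can fail the run when the median exceeds a budget; the 500,000 ns threshold below is an arbitrary example:

// Fail the benchmark binary if the median regresses past the chosen budget.
if (metrics.median_ns > 500_000) {
    std.debug.print("regression: median {d}ns exceeds the 500000ns budget\n", .{metrics.median_ns});
    std.process.exit(1);
}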
The run function returns a Metrics struct containing the following data
points:
| Category | Metric | Description |
|---|---|---|
| Meta | name | The identifier string for the benchmark. |
| Time | min_ns | Minimum execution time per operation (nanoseconds). |
| Time | max_ns | Maximum execution time per operation (nanoseconds). |
| Time | mean_ns | Arithmetic mean execution time (nanoseconds). |
| Time | median_ns | Median execution time (nanoseconds). |
| Time | std_dev_ns | Standard deviation of the execution time. |
| Meta | samples | Total number of measurement samples collected. |
| Throughput | ops_sec | Calculated operations per second. |
| Throughput | mb_sec | Data throughput in MB/s (populated if bytes_per_op > 0). |
| Hardware* | cycles | Average CPU cycles per operation. |
| Hardware* | instructions | Average CPU instructions executed per operation. |
| Hardware* | ipc | Instructions Per Cycle (efficiency ratio). |
| Hardware* | cache_misses | Average cache misses per operation. |
*Hardware metrics are currently available on Linux only. They will be null
on other platforms or if permissions are restricted.
This library is designed to show you "what", not "why". To answer "why", I recommend
a proper profiling tool such as perf on Linux together with the Firefox Profiler.
doNotOptimizeAway is your friend. For example, if you are benchmarking a
scanner/tokenizer:
while (true) {
    const token = try scanner.next();
    if (token == .end) break;
    total_ops += 1;
    std.mem.doNotOptimizeAway(token); // CRITICAL
}
To get the cycles, instructions, ipc (instructions per cycle), and
cache_misses metrics on Linux, you may need to lower the
kernel.perf_event_paranoid sysctl (see the commands below).
Install the Zig toolchain via mise (optional):
mise trust
mise install
Run tests:
zig build test --summary all
Build library:
zig build
Adjust kernel.perf_event_paranoid to allow or restrict access to hardware performance counters:
# Restrict unprivileged access (hardware metrics unavailable)
sudo sysctl -w kernel.perf_event_paranoid=2
# Allow full access (hardware metrics available)
sudo sysctl -w kernel.perf_event_paranoid=-1
MIT. Use it for whatever.