aleh/bluepill-zig
Bare Metal Zig on STM32
This is about using Zig alone to directly program boards based on the STM32F103xx MCU, such as "Blue Pill" clones of Maple Mini.
A Blue Pill board or similar, see here for general and physical info.
Zig to build our examples. (I used version 0.13.0 here.)
ST-Link Tools to flash them.
STM32 Cortex®-M3 Programming Manual for general info on Cortex-M3.
STM32F10x Reference Manual to know how to program all available peripherals.
STM32F103x8 Datasheet to know what exactly is available in our MCU as the above reference manual describes the whole family.
Info on Linker Scripts to be able to describe memory layout to the linker.
One of the cool things about Zig is that (thanks to LLVM) it can compile for many architectures out of the box and yet manages to keep its MacOS package size under 50MB without any external dependencies!
Zig expects its -target command line switch to be a dash-separated triple identifying the architecture, operating system and ABI (application binary interface) of the target system. See the whole list by running:
zig targets | less
Our architecture is ARM, we don't have any OS, and we don't care about any particular ABI, so we are going to use arm-freestanding-none as our -target.
Another command line switch, -mcpu, should be used to specify the processor to generate code for. STM32F103xx is based on Cortex-M3, so, consulting the list of supported targets, the most logical choice seemed to be cortex_m3. However, I am getting build errors when using it (something about an instruction in an IT block), so I decided to step back to cortex_m23, which seems to be mostly a subset of Cortex-M3 instruction-wise and, notably, lacks the IT instruction. (I don't expect the compiler to use the "TrustZone" security extensions. We could also use cortex_m0 to be 100% sure it won't generate something unsupported.)
Before we go to the remaining compiler settings, let's type in some code first. Let's make the classic "Blink" example toggling the on-board LED (see v0/main.zig). It's not going to be nice, but we'll improve it later.
Looking at the Blue Pill Schematic, we see that the LED is attached to PC13 of the MCU, which can be controlled via port 13 of GPIO bank C. GPIO and the corresponding registers are described in Chapter 9 of the Reference Manual, while the base address of bank C, 0x4001_1000, can be found in the STM32F103x8 Datasheet.
Before we can use GPIO bank C, however, we need to enable its clock via one of the Reset and Clock Control (RCC) registers (base 0x4002_1000, see chapter 7.3). GPIO bank C is controlled by bit 4 (IOPCEN) of RCC_APB2ENR, offset 0x18:
reg(0x4002_1000 + 0x18).* |= 1 << 4;
Here reg() is a simple wrapper that gives us a volatile pointer. We need one when working with memory-mapped registers, so that Zig's optimizer does not try to remove or reorder our reads and writes.
fn reg(comptime address: u32) *volatile u32 {
    return @ptrFromInt(address);
}
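The following minimal host-side sketch shows the read-modify-write without a board. It assumes a hypothetical runtime variant of reg(), called regAt() here, pointed at an ordinary variable that stands in for RCC_APB2ENR. The names regAt, IOPCEN and enableGpioC are illustrative only and not part of the project.

```zig
const std = @import("std");

// Hypothetical host-side variant of reg(): the address is a runtime value,
// so it can point at an ordinary variable instead of a peripheral register.
fn regAt(address: usize) *volatile u32 {
    return @ptrFromInt(address);
}

// IOPCEN, bit 4 of RCC_APB2ENR, enables the clock of GPIO bank C.
const IOPCEN: u32 = 1 << 4;

// Same read-modify-write as `reg(0x4002_1000 + 0x18).* |= 1 << 4;`:
// a volatile load, an OR, then a volatile store.
fn enableGpioC(apb2enr_address: usize) void {
    regAt(apb2enr_address).* |= IOPCEN;
}
```

On the MCU, the address would be the comptime constant 0x4002_1000 + 0x18. The runtime parameter exists only so that the same volatile load/OR/store can run inside a test.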
Before we can start toggling port 13, however, we need to configure it as an output via the GPIOx_CRH register (offset 0x04, see chapter 9.2.2). Each nibble in this register configures one of ports 8-15 (CNF in its upper 2 bits, MODE in its lower 2 bits):
reg(0x4001_1000 + 0x04).* |= 0b01_10 << (4 * (13 - 8)); // CNF = 0b01 (open-drain output), MODE = 0b10 (2 MHz); relies on the reset value being 0b01_00.
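The |= in that line is only correct because of the reset state. The small host-side sketch below uses hypothetical helper names. It computes the resulting GPIOx_CRH value and compares the one-liner with a variant that clears the pin's nibble first. If the nibble started as 0b10_00 (input with pull-up/pull-down) instead of the reset value, OR-ing would produce 0b11_10, which is an alternate function open-drain output:

```zig
const std = @import("std");

// Reset value of GPIOx_CRH: every nibble is 0b0100 (CNF = 01 floating input, MODE = 00).
const CRH_RESET: u32 = 0x4444_4444;

// CNF = 0b01 (general purpose open-drain output), MODE = 0b10 (max 2 MHz).
const OPEN_DRAIN_2MHZ: u32 = 0b01_10;

// Bit position of the nibble configuring `pin` (8-15) within GPIOx_CRH.
fn crhShift(comptime pin: u4) u5 {
    if (pin < 8) @compileError("GPIOx_CRH only configures pins 8-15");
    return 4 * (@as(u5, pin) - 8);
}

// What the v0 one-liner does: OR the new bits in, relying on the old nibble.
fn crhOrIn(old: u32, comptime pin: u4, comptime bits: u32) u32 {
    return old | (bits << crhShift(pin));
}

// A variant that clears the pin's nibble first, so it works from any state.
fn crhReplace(old: u32, comptime pin: u4, comptime bits: u32) u32 {
    const shift = crhShift(pin);
    return (old & ~(@as(u32, 0xF) << shift)) | (bits << shift);
}
```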
We'll be using the GPIOx_BSRR register (offset 0x10 from the base, see chapter 9.2.5) to control the output state of our port. Setting bits 0-15 here sets the output on ports 0-15 to 1, while setting bits 16-31 resets the output on the same ports. (This register is more convenient than the more traditional GPIOx_ODR, because there is no need to read its current state to modify a single bit.)
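const GPIOC_BSRR = reg(0x4001_1000 + 0x10);
while (true) {
    // The on-board LED is active-low: resetting PC13 turns it on.
    GPIOC_BSRR.* = 1 << (13 + 16);
    delay(50);
    // Setting PC13 turns it off for the rest of the second.
    GPIOC_BSRR.* = 1 << 13;
    delay(950);
}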
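The BSRR semantics described above can also be modelled on a host. This is only a sketch. bsrrValue and applyBsrr are hypothetical helpers, and applyBsrr imitates what the hardware does to GPIOx_ODR. That includes the Reference Manual rule that the set bit takes priority if both the set and reset bits are written for the same pin.

```zig
const std = @import("std");

// Value to write into GPIOx_BSRR to drive `pin` high (BSy, bits 0-15)
// or low (BRy, bits 16-31).
fn bsrrValue(comptime pin: u4, high: bool) u32 {
    const bit: u5 = if (high) pin else @as(u5, pin) + 16;
    return @as(u32, 1) << bit;
}

// Host-side model of a BSRR write: bits in the upper half clear ODR bits,
// bits in the lower half set them, and a set bit wins over a reset bit.
fn applyBsrr(odr: u16, bsrr: u32) u16 {
    const set: u16 = @truncate(bsrr);
    const reset: u16 = @truncate(bsrr >> 16);
    return (odr & ~reset) | set;
}
```

No read of the current state is needed to change a single pin, which is the point made above about GPIOx_ODR.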
To implement delay(), we'll just spin in a long loop. (We'll return to a better implementation in the next version of the example.)
fn delay_ticks(ticks: u32) void {
    var i = ticks;
    while (i > 0) {
        // Reading any location to prevent the loop from being optimized out.
        _ = reg(0x2000_0000).*;
        i -= 1;
    }
}
To calculate how many ticks we need for a millisecond delay, we need to know two things. After reset, our CPU runs at approximately 8MHz (from the internal RC oscillator). Every iteration of the above loop takes 6 CPU cycles (more on this below):
fn delay(comptime ms: u32) void {
    delay_ticks(ms * 8_000 / 6);
}
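The sketch below spells out the arithmetic behind delay() using the assumptions from the text: about 8 MHz after reset and 6 cycles per iteration. delayIterations is a hypothetical variant. It names the constants and computes the cycle count in u64, so a delay that is too long produces a descriptive compile error:

```zig
const std = @import("std");

// Assumptions from the text: ~8 MHz internal RC clock after reset,
// 6 CPU cycles per iteration of delay_ticks().
const cpu_hz: u32 = 8_000_000;
const cycles_per_iteration: u32 = 6;

// Same arithmetic as delay(): ms * 8_000 / 6.
fn delayIterations(comptime ms: u32) u32 {
    const cycles = @as(u64, ms) * (cpu_hz / 1000);
    if (cycles > std.math.maxInt(u32)) @compileError("delay too long for a u32 cycle count");
    return @intCast(cycles / cycles_per_iteration);
}
```

The two delays in the loop add up to 1_333_332 iterations, or 7_999_992 cycles. One blink period therefore comes out at about one second, as long as the clock really is 8 MHz.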
OK, now that we have a basic program (see v0/main.zig), we can try to compile it:
zig build-exe -target arm-freestanding-none -mcpu cortex_m23 -O ReleaseSmall -femit-asm main.zig
The extra -femit-asm flag makes Zig produce assembly output, which is handy when examining our code. For example, this is how we can calculate how many CPU clock cycles delay_ticks() spends per tick (comments added by me using info on this page):
main.delay_ticks: ; r0 contains the number of ticks already.
movs r1, #1
lsls r1, r1, #29 ; (1 << 29) is this 0x20000000 address we are reading from below.
.LBB1_1:
cbz r0, .LBB1_3 ; 1 cycle, branch not taken. (Jump out of the loop if tick counter in r0 is zero.)
ldr r2, [r1] ; 2 cycles. (Our fake read.)
subs r0, r0, #1 ; 1 cycle. (Decrement the tick counter in r0.)
b .LBB1_1 ; 2 cycles, branch is taken. (Repeat the loop.)
.LBB1_3:
bx lr ; Return from the function.
Speaking of assembly, we could also disassemble the output file directly with objdump -d main, but it might be harder to see what's going on. Here is the same delay_ticks() function:
20130: 2101 movs r1, #1
20132: 0749 lsls r1, r1, #29
20134: b110 cbz r0, 0x2013c <.text+0x50> @ imm = #4
20136: 680a ldr r2, [r1]
20138: 1e40 subs r0, r0, #1
2013a: e7fb b 0x20134 <.text+0x48> @ imm = #-10
2013c: 4770 bx lr
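To read the branch targets in such a listing, you need to know that in Thumb state the PC reads as the address of the current instruction plus 4. The small sketch below uses hypothetical helper names. It reproduces both targets from the listing above and decodes the 11-bit offset of the 16-bit unconditional branch e7fb:

```zig
const std = @import("std");

// In Thumb state the PC reads as the instruction address + 4,
// so a PC-relative branch lands at (address + 4 + imm).
fn thumbBranchTarget(instr_address: u32, imm: i32) u32 {
    return @intCast(@as(i64, instr_address) + 4 + imm);
}

// Offset of a 16-bit unconditional branch (encoding T2: 0b11100 followed by imm11):
// imm11 is sign-extended and multiplied by 2, since Thumb instructions are halfword aligned.
fn thumbBOffset(instr: u16) i32 {
    const imm11: u11 = @truncate(instr);
    const signed: i11 = @bitCast(imm11);
    return @as(i32, signed) * 2;
}
```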
Another thing we can see with objdump is that our code starts at address 000200ec. That is quite wrong for our MCU, where flash memory begins at 0x08000000:
main: file format elf32-littlearm
Disassembly of section .text:
000200ec <.text>:
200ec: 480c ldr r0, [pc, #48] @ 0x20120 <.text+0x34>
200ee: 6801 ldr r1, [r0]
200f0: 2210 movs r2, #16
...
Well, this is logical because Zig does not really know much about our MCU. We need to help it by writing a "linker script". The official documentation on the topic mentioned above is easy to read and actual scripts are fairly self-explanatory.
The first thing we do in our script (v0/bluepill.ld) is describe the relevant memory regions. This is quite simple in our case, as we have 128K of flash memory starting at 0x08000000 and 20K of RAM starting at 0x20000000 (see chapter 4 in the Datasheet):
MEMORY {
    flash (rx) : o = 0x08000000, l = 128K
    sram (rw) : o = 0x20000000, l = 20K
}
(The names of the regions can be arbitrary here, the linker does not know what the "flash" is.)
The next part of the script tells what should be placed into the flash memory:
SECTIONS {
    .text : {
        ...
    } >flash
We cannot tell it to begin filling with code from the very start, as the first word has to be the initial value of the main stack pointer (MSP), as per chapter 2.1.2 of the Programming Manual:
On reset, the processor loads the MSP with the value from address 0x00000000.
(When booting from the main flash, i.e. with BOOT0 low, address 0x00000000 is aliased to the start of the flash, 0x08000000 in our case.)
Next come the interrupt vectors (see table 63 in the Reference Manual). We are only interested in the first one, Reset, as we don't use interrupts just yet:
.text : {
    /* The initial value of SP, past the end of RAM. */
    LONG(ORIGIN(sram) + LENGTH(sram))
    /* Reset vector. */
    LONG(_start)
    /* We should put a bunch of other vectors here, but since none are used yet we can use the space. */
    /* So now goes our code. */
    *(.text)
    /* Then read-only data. */
    *(.rodata.*)
    *(.rodata)
} >flash
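The first two words can be predicted by hand. Here is a minimal sketch with the numbers from bluepill.ld; the constant and function names are made up for illustration. The initial stack pointer is the first address past the RAM, which is fine because a push decrements SP before storing. The reset vector is the address of _start with bit 0 set to mark Thumb code. The linker takes care of that bit because the symbol value of a Thumb function already carries it. The sketch assumes that _start lands right after the two words, at 0x08000008.

```zig
const std = @import("std");

// Memory regions from bluepill.ld.
const flash_origin: u32 = 0x0800_0000;
const sram_origin: u32 = 0x2000_0000;
const sram_length: u32 = 20 * 1024;

// Word 0: initial MSP, i.e. LONG(ORIGIN(sram) + LENGTH(sram)).
// The stack is full-descending, so the first push goes just below this address.
const initial_sp: u32 = sram_origin + sram_length;

// Word 1: the Reset vector, with bit 0 set to indicate Thumb state.
fn resetVector(start_address: u32) u32 {
    return start_address | 1;
}
```

Both values should match the first two words that objdump shows for the linked binary.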
Next, we tell the linker that our writable data (static variables) is expected in RAM. We don't have such variables in our simple example yet, but that'll be handy later.
.bss : {
    *(.bss)
} >sram
And finally, we discard a few extra sections that would otherwise increase the size of our binary:
/DISCARD/ : {
    /* I don't want to keep sections needed only when printing stack traces. */
    *(.ARM.*)
}
Let's compile using our linker script now:
zig build-exe -target arm-freestanding-none -mcpu cortex_m23 -femit-asm -O ReleaseSmall --script bluepill.ld main.zig
Disassembling with objdump shows that the addresses are correct now:
main: file format elf32-littlearm
Disassembly of section .text:
08000000 <.text>:
8000000: 20005000 andhs r5, r0, r0
8000004: 08000009 stmdaeq r0, {r0, r3}
8000008: 6801480c stmdavs r1, {r2, r3, r11, lr}
...
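The first word appears to be the desired stack pointer just past the end of RAM. It is followed by the Reset vector pointing to the next word; the number is odd, i.e. its lowest bit is set, to indicate Thumb mode. objdump, however, decodes these words as ARM instructions, so let's add --mcpu=cortex-m23 to the objdump command to force Thumb mode: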
08000000 <.text>:
8000000: 5000 str r0, [r0, r0]
8000002: 2000 movs r0, #0
8000004: 0009 movs r1, r1
8000006: 0800 lsrs r0, r0, #32
8000008: 480c ldr r0, [pc, #48] @ 0x800003c <.text+0x3c>
800000a: 6801 ldr r1, [r0]
800000c: 2210 movs r2, #16
...
OK, now the part starting at 0x8000008 looks like the code in our .s file.
We'll be using the st-flash utility. It expects either a raw binary with the starting address passed separately, or an Intel hex file that already contains the addresses. Let's use the latter by converting our build to .hex with Zig:
zig objcopy -O hex main main.hex
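An Intel HEX file stores addresses per record. Each data record carries only a 16-bit address, so a file for our flash will need an Extended Linear Address record (type 04) that supplies the upper 16 bits, 0x0800. The sketch below uses a hypothetical ihexRecord helper that is not part of Zig's objcopy. It formats single records, including the checksum, which is the two's complement of the sum of all other record bytes:

```zig
const std = @import("std");

// Formats one Intel HEX record ":LLAAAATT<data>CC" into `buf`.
fn ihexRecord(buf: []u8, record_type: u8, address: u16, data: []const u8) ![]u8 {
    var stream = std.io.fixedBufferStream(buf);
    const w = stream.writer();
    const len: u8 = @intCast(data.len); // a record holds at most 255 data bytes
    const addr_hi: u8 = @intCast(address >> 8);
    const addr_lo: u8 = @truncate(address);
    try w.print(":{X:0>2}{X:0>4}{X:0>2}", .{ len, address, record_type });
    var sum: u8 = len +% addr_hi +% addr_lo +% record_type;
    for (data) |byte| {
        try w.print("{X:0>2}", .{byte});
        sum +%= byte;
    }
    // Checksum: two's complement of the 8-bit sum of all preceding bytes.
    try w.print("{X:0>2}", .{-%sum});
    return stream.getWritten();
}
```

For instance, the initial stack pointer 0x20005000 stored little-endian at offset 0 would be the data record ":04000000005000208C".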
Flashing is then as simple as:
st-flash --reset --format ihex write main.hex
You'll see something like this and your LED will hopefully start blinking every second:
st-flash 1.8.0
2024-08-18T22:01:38 INFO common.c: STM32F1xx_MD: 20 KiB SRAM, 128 KiB flash in at least 1 KiB pages.
2024-08-18T22:01:38 INFO common_flash.c: Attempting to write 90 (0x5a) bytes to stm32 address: 134217728 (0x8000000)
-> Flash page at 0x8000000 erased (size: 0x400)
2024-08-18T22:01:38 INFO flash_loader.c: Starting Flash write for VL/F0/F3/F1_XL
2024-08-18T22:01:38 INFO flash_loader.c: Successfully loaded flash loader in sram
2024-08-18T22:01:38 INFO flash_loader.c: Clear DFSR
1/1 pages written
2024-08-18T22:01:38 INFO common_flash.c: Starting verification of write complete
2024-08-18T22:01:38 INFO common_flash.c: Flash written and verified! jolly good!
Also, as you can see, our code is just 90 bytes, which is quite nice given all the required setup instructions.
Now let's improve the example, showing some of the power of Zig:
export fn _start() noreturn {
    const bankC = GPIOBank(.C);
    bankC.init();
    const led = bankC.port(13);
    led.setOutput(.openDrain, .max2MHz);
    while (true) {
        led.reset();
        delay(50);
        led.set();
        delay(950);
    }
}
The GPIOBank abstraction is more readable and more reusable (we can use all banks/ports), but it does not add any overhead, as the selection of the bank and port all happens at compile time. Our program is 94 bytes now. It is just 4 bytes larger, and only because we no longer rely on the reset values when writing to GPIOC_CRH, since we want to be able to change the pin configuration at runtime:
pub fn GPIOBank(comptime bank: GPIOBankIndex) type {
    return struct {
        // ...
        pub fn port(comptime pin: u4) type {
            return struct {
                // Address of a register of this bank, resolved at compile time.
                fn reg(comptime offset: u32) *volatile u32 {
                    return @ptrFromInt(switch (bank) {
                        .A => 0x4001_0800,
                        .B => 0x4001_0C00,
                        .C => 0x4001_1000,
                        .D => 0x4001_1400,
                        .E => 0x4001_1800,
                    } + offset);
                }
                // Replaces this pin's 4 configuration bits (CNF + MODE) in CRL or CRH.
                fn setModeBits(comptime bits: u32) void {
                    const CRx = reg(if (pin >= 8) 0x04 else 0x00);
                    const shift = 4 * @as(u8, if (pin >= 8) pin - 8 else pin);
                    CRx.* = CRx.* & ~(@as(u32, 0xF) << shift) | (bits << shift);
                }
                const BSRR = reg(0x10);
                pub fn set() void {
                    BSRR.* = 1 << pin;
                }
                // ...
            };
        }
    };
}
As you can see, reg() depends on the bank, but both bank and pin are comptime. Because of that, both GPIOx_CRx (L or H) and GPIOx_BSRR are picked at compile time as well, and we always get highly optimized code.
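The same comptime selection can be tried in isolation. In this sketch, Bank is a hypothetical stand-in for GPIOBankIndex, and crAddress/crShift mirror what reg() and setModeBits() compute. Forcing the calls with comptime shows that both results are known at compile time:

```zig
const std = @import("std");

// Hypothetical stand-in for GPIOBankIndex.
const Bank = enum { A, B, C, D, E };

// Base addresses of the GPIO banks, as in reg() above.
fn gpioBase(comptime bank: Bank) u32 {
    return switch (bank) {
        .A => 0x4001_0800,
        .B => 0x4001_0C00,
        .C => 0x4001_1000,
        .D => 0x4001_1400,
        .E => 0x4001_1800,
    };
}

// GPIOx_CRL (offset 0x00) covers pins 0-7, GPIOx_CRH (offset 0x04) pins 8-15.
fn crAddress(comptime bank: Bank, comptime pin: u4) u32 {
    const offset: u32 = if (pin >= 8) 0x04 else 0x00;
    return gpioBase(bank) + offset;
}

// Position of the pin's 4-bit configuration field within its CRx register.
fn crShift(comptime pin: u4) u5 {
    return 4 * @as(u5, pin % 8);
}
```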
Here we are adding a SysTick timer for a better delay(), along with a USART to actually say Hello. (I've moved all helpers into a module called z41 here.)
Our binary is 792 bytes now. It shrinks to 314 bytes if we completely remove the line with usart.writer.print, and to 432 bytes if we keep it but don't output the number. In other words, the use of std.fmt adds overhead only when format specifiers are actually used, something that would be hard to achieve with a printf()-style C/C++ function.
const std = @import("std");
const z41 = @import("z41");

export fn _start() noreturn {
    const rcc = z41.RCC(.internalRC);
    rcc.init();

    const SysTick = z41.SysTick(rcc.SYSCLK, 50);
    SysTick.init();

    const led = z41.GPIO(rcc, .C).port(13);
    led.Bank.init();
    led.setOutput(.openDrain, .max2MHz);

    const usart = z41.USART(rcc, .usart1);
    usart.init(115200);
    usart.writeBytes("\nHello! It's V2\n\n");

    while (true) {
        led.reset();
        SysTick.delay(50);
        led.set();
        SysTick.delay(950);
        try usart.writer.print("\rUptime: {}s", .{SysTick.milliseconds() / 1000});
    }
}
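std.fmt functions take the format string as a comptime parameter, which is likely why the binary size above depends on which specifiers are used. You can exercise the exact same format string on a host with std.fmt.bufPrint; uptimeLine is a hypothetical helper:

```zig
const std = @import("std");

// The same format call as in the example, writing into a buffer instead of the USART.
fn uptimeLine(buf: []u8, milliseconds: u32) ![]u8 {
    return std.fmt.bufPrint(buf, "\rUptime: {}s", .{milliseconds / 1000});
}
```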
I've added RegisterSet under the hood to help with the definition of hardware registers. It's similar to the reg() helper from v0, but allows using packed structs as well. For example, this is how STK_CTRL is described in SysTick (you should appreciate Zig allowing anonymous enums, like the one in CLKSOURCE here):
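const STK_CTRL = regs.at(0, packed struct(u32) {
    /// Counter enable.
    ENABLE: bool,
    /// Request the SysTick exception when the counter reaches 0.
    TICKINT: bool,
    /// Clock source selection.
    CLKSOURCE: enum(u1) {
        /// AHB/8.
        AHB_8 = 0,
        /// Processor clock (AHB).
        AHB = 1,
    },
    _r1: u13 = 0,
    /// Set when the counter has reached 0 since the last read of this register.
    COUNTFLAG: bool = false,
    _r2: u15 = 0,
});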
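Fields of a packed struct are laid out starting from the least significant bit, so the definition above maps directly onto the register bits. The host-side copy of the type below is named StkCtrl only for the test. It checks a few bit positions and converts values to and from u32 with @bitCast:

```zig
const std = @import("std");

// Copy of the STK_CTRL layout above, as a plain type so it can be tested on a host.
const StkCtrl = packed struct(u32) {
    ENABLE: bool, // bit 0
    TICKINT: bool, // bit 1
    CLKSOURCE: enum(u1) { AHB_8 = 0, AHB = 1 }, // bit 2
    _r1: u13 = 0, // bits 3-15
    COUNTFLAG: bool = false, // bit 16
    _r2: u15 = 0, // bits 17-31
};
```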
You can still use raw u32 registers where needed:
const STK_LOAD = regs.at(4, u32);
The helper checks the type you pass to make sure it's either u32 or a u32-backed packed struct:
pub fn at(comptime offset: u32, comptime reg_type: type) *volatile reg_type {
    const valid = switch (@typeInfo(reg_type)) {
        .Struct => |s| switch (s.layout) {
            .@"packed" => s.backing_integer == u32,
            else => false,
        },
        // Check signedness too, so that i32 is rejected.
        .Int => |i| i.bits == 32 and i.signedness == .unsigned,
        else => false,
    };
    if (!valid) {
        @compileError("Expected `reg_type` to be u32 or a packed struct backed by u32");
    }
    return @ptrFromInt(base + offset);
}
The SysTick timer is described in chapter 4.5 of the Programming Manual and is common to all processors based on Cortex®-M3. It's a simple counter that is decremented on every CPU clock cycle (or on every 8th one). When it reaches zero, it generates an interrupt and reloads itself from STK_LOAD. It can be used to implement a notion of system time (e.g. milliseconds since system start) along with better delays, where we don't have to rely on exactly how our code is compiled.
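The text does not show how z41 programs the counter, so the following is only a plausible sketch. It assumes the processor clock is selected as the source (CLKSOURCE = AHB). In that case, the reload value is the number of cycles per tick minus one, because the counter goes from the reload value down to 0. The value has to fit into the 24-bit RELOAD field of STK_LOAD. sysTickReload is a hypothetical helper:

```zig
const std = @import("std");

// Hypothetical helper: STK_LOAD value for a tick of `ms_per_tick` milliseconds,
// assuming CLKSOURCE = processor clock. One period is RELOAD + 1 cycles.
fn sysTickReload(comptime cpu_hz: u32, comptime ms_per_tick: u32) u24 {
    const cycles = @as(u64, cpu_hz) * ms_per_tick / 1000;
    if (cycles == 0 or cycles - 1 > std.math.maxInt(u24)) {
        @compileError("SysTick period does not fit into the 24-bit RELOAD field");
    }
    return @intCast(cycles - 1);
}
```

With these assumptions, sysTickReload(72_000_000, 500) would need 36_000_000 cycles per tick and would be rejected at compile time.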
We need to be able to handle interrupts for this helper, and this is where our linker script needs to change. The handler itself is simple:
pub fn SysTick(comptime cpuFreq: u32, comptime msPerTick: u32) type {
    return struct {
        export fn SysTick_Vector() void {
            total_ms +%= msPerTick;
        }
        var total_ms: u32 = undefined;
        ...
It needs to be exported so that our linker script can place a pointer to it at the appropriate location. (Note that the export only happens when SysTick is used, something that would be hard to achieve in C/C++ without macros.)
Other than export, no other attributes are needed here, thanks to the clever way interrupts ("exceptions") are handled (see chapter 2.3.7 in the Programming Manual):
registers r0-r3, r12, LR and the return address are automatically pushed onto the stack along with the flags (xPSR) when an interrupt occurs, while the remaining registers are already expected to be preserved by the compiler, even in regular functions;
unlike on other architectures, no special "return from interrupt" instruction is needed, because on exception entry LR is set to a special value (EXC_RETURN), so any regular return from a function (bx lr) is recognized as a return from an interrupt, restoring r0-r3, etc.
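For reference, the frame that the Cortex-M3 pushes on exception entry can be described as an extern struct. The M3 has no floating point unit, so there are no FP registers in it. The type and constant names below are made up for illustration. The layout follows the architectural stacking order, lowest address first. Besides r0-r3, the frame includes r12, LR, the return address and xPSR.

```zig
const std = @import("std");

// Registers stacked by the hardware on exception entry, lowest address first.
const ExceptionFrame = extern struct {
    r0: u32,
    r1: u32,
    r2: u32,
    r3: u32,
    r12: u32,
    lr: u32, // LR of the interrupted code
    pc: u32, // return address
    xpsr: u32, // flags and execution state
};

// EXC_RETURN value placed in LR when the exception was taken from Thread mode using the MSP.
const EXC_RETURN_THREAD_MSP: u32 = 0xFFFF_FFF9;
```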
So we need to add a pointer to our handler into the interrupt vector table at the start of our code (see table 63 in the Reference Manual again):
.text : {
    ...
    LONG(_start);
    /* We don't use any interrupt vectors before SysTick, so let's just fill. */
    FILL(0); . = ADDR(.text) + 0x003C;
    LONG(DEFINED(SysTick_Vector) ? SysTick_Vector : 0xDEAD);
    /* Other vectors follow, but since we are not using them we can just start our code earlier. */
    ...
Note the use of DEFINED: it allows correct linking even when the target program does not need the SysTick timer. (By the way, the use of 0xDEAD for undefined handlers is temporary here; a central "panic" handler halting the MCU would be a better option eventually.)
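The 0x003C in the linker script is the SysTick exception number, 15, times 4 bytes per vector, with word 0 holding the initial MSP. Here is a small sketch listing the Cortex-M3 system exceptions; the enum is illustrative and not part of the project:

```zig
const std = @import("std");

// Cortex-M3 system exception numbers; word N of the vector table
// holds the handler address for exception N.
const Exception = enum(u32) {
    Reset = 1,
    NMI = 2,
    HardFault = 3,
    MemManage = 4,
    BusFault = 5,
    UsageFault = 6,
    SVCall = 11,
    DebugMonitor = 12,
    PendSV = 14,
    SysTick = 15,
};

fn vectorOffset(e: Exception) u32 {
    return @intFromEnum(e) * 4;
}
```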
build.zig
I've been using a simple shell script to build and flash the first 2 examples:
#!/bin/sh -e
zig build-exe \
    -target arm-freestanding-none \
    -mcpu cortex_m23 \
    -femit-asm \
    -O ReleaseSmall \
    --script bluepill.ld \
    main.zig
zig objcopy -O hex main main.hex
rm main main.o
st-flash --reset --format ihex write main.hex
However, in this one we want to be able to pull our helpers from a "module" in ./lib. This could still be done in a shell script, of course, but I was also curious about Zig's build system, so I've added build.zig.
Now the example can be compiled with zig build or flashed with zig build flash.