
Zig Bare Metal Programming on STM32F103 — Booting up

13 min read · Oct 31, 2022

When talking about embedded software development one cannot consider all the programming languages that are normally available on major platforms.

This is because most languages rely to some extent on a supporting runtime — a tool that is usually easily provided by any operating system. But a fully fledged operating system is typically not something a microcontroller can provide.

Note: while I have personally witnessed developers (myself included) work with high-level languages like Java, Python, Lua or JavaScript on an embedded target without an operating system, those scenarios require an extra layer of work to set up the runtime. Even when a specific runtime implementation is available (or the job is worth the effort of developing one yourself), using it usually entails some drawbacks in terms of resource efficiency.

What is left is the set of programming languages that compile to a standalone executable, maybe with an included runtime of negligible size and complexity.

For the most part this translates to a single option, dominating the industry: C. It is an exceedingly successful language thanks to the underlying simplicity that allows it to be easily ported to any required target.

In the last decade, however, a few other competitors have emerged that can be considered serious, practical tools for embedded development. One of them, the basis for this tutorial, is Zig.

The Target

As is common for this series of articles I’m going to work on a Blue Pill, the cheap and small devkit built around an STM32F103 microcontroller.

The ARM architecture is pivotal here: it is well supported by LLVM, which makes it an easy target for any programming language that relies on LLVM for compilation, such as Zig.

The Language

Zig is a beautiful language aiming to occupy the spot of low-level, efficiency-centered development with a modern approach. You can read all about it here.

Even with no prior Zig experience, any programmer should find themself understanding the code that follows with no issue.

As a language, it’s particularly well suited for embedded software development: For starters, it is a fully compiled language; it doesn’t (necessarily) depend on a standard library, it has explicit memory allocation and doesn’t have any runtime requirement. At the same time it brings a lot of features typical of a modern programming language, and then some more.

To top it all off, I find its syntax extremely simple and refreshing. I haven’t found the occasion to write much Zig code, but so far it has been a treat.

The Tools

For our task, grab:

Any editor of your choice. I like a mix of VSCode and NeoVim, but anything that can read and write a Zig source file is fair game.
The Zig compiler.
Either arm-none-eabi-binutils or llvm. They are both available from the official repositories of all major distributions.
An external tool to load firmware onto the Blue Pill and the related software; see this article for an in-depth guide on flashing.

Note: In addition to Zig you will also need some tool to convert the ELF file that Zig outputs into a binary format that you can load onto the Blue Pill. Unfortunately Zig does not support this feature out of the box yet, so an objcopy-capable tool is required (hence the binutils or llvm item above).

Zig is a young project and still in the early stages of development, but we won’t be using many of its most advanced features: I hope everything written here will stay mostly relevant regardless of the version of the language. Still, at the time of writing this is the version I’m working with:

$ zig version
0.9.1

The Project

Before jumping in, make sure to download the official datasheet and reference manual for the STM32F103 (the processor installed on the Blue Pill) and to keep them at hand.

Don’t let their sheer size scare you! Most of the time you don’t need to read more than a few pages if you know where to look, and I’ll guide you through all the required information.
For ST devices the datasheet contains notions that are specific to the device itself, whereas the reference manual has the bulk of the information that is valid for all parts of the same family.

The Startup Procedure

I already explained the basic details of the STM32F103 initialization process in this article.
I will cover some parts again and skim over others; in general, all of it applies to Zig as well as C, so refer to it if you’re interested in an in-depth explanation.

To get started it’s sufficient to know that there are two specific locations in memory that must hold the initial value of the stack pointer and the address of the code (i.e. the function) to execute.

Those two locations are 0x08000000 and 0x08000004, respectively.
Those are locations in the flash portion of the memory and should be ready before any code has seen execution: the tool used to prepare them ahead of time is the linker script.

The Linker Script

When we compile any program to machine code the compiler passes through a series of steps. Among those there is the linking step, which takes the compiled binaries from each separate source file and fuses them together into the final binary executable.

A linker script is an artifact that precisely describes how this final result is composed. We can use one to explicitly position a stack pointer and reset handler at the beginning of the flash memory.
Create the file memory.ld with the following contents, and let’s analyze it step by step:

MEMORY
{
    FLASH (rx)  : ORIGIN = 0x08000000, LENGTH = 128K
    RAM   (rwx) : ORIGIN = 0x20000000, LENGTH = 20K
}

ENTRY(main);
__reset_stack_pointer = ORIGIN(RAM) + LENGTH(RAM);

SECTIONS {
    .text : {
        LONG(__reset_stack_pointer);
        LONG(main | 1);
        /* The whole interrupt table is 332 bytes long. Advance to that position. */
        . += 332;
        /* And here comes the rest of the code, ie. all symbols starting with .text */
        *(.text)
        *(.rodata*)
        . = ALIGN(4);
    } > FLASH  /* Put this in the flash memory region */

    .ARM.exidx : {
        *(.ARM.exidx*)
        . = ALIGN(4);
    } > FLASH
}

The syntax of the linker script might be unfamiliar, but you don’t have to know every nuance to simply understand what’s going on. If you are interested you can consult this manual for reference.

First of all, the MEMORY command defines the overall memory layout the program will be loaded into. It is made up of a FLASH region that starts at 0x08000000, is 128 KB long and can be read and executed (the rx bit) and a RAM region that starts at 0x20000000, is 20 KB long and can be read, written and executed (rwx).
The size and addresses must match the information found in the device’s memory map.

Next, we specify the entry point of our program: conventionally, the main function. If this is omitted the Zig compiler will complain that no start symbol was found.

Then, we declare a symbol holding the address just past the end of RAM, to be used as the starting point for the stack (which starts from the far end of RAM and grows downwards as the program nests more function calls).
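As a quick sanity check, the arithmetic behind __reset_stack_pointer can be reproduced on the host. This is only a sketch: the constant names are mine, the values are copied from the MEMORY block above, and the file is meant to be run with zig test rather than flashed.

```zig
const std = @import("std");

// Values copied from the MEMORY block of memory.ld (names are illustrative).
const ram_origin: u32 = 0x2000_0000;
const ram_length: u32 = 20 * 1024; // 20K

// Mirrors `__reset_stack_pointer = ORIGIN(RAM) + LENGTH(RAM);`
const reset_stack_pointer: u32 = ram_origin + ram_length;

test "initial stack pointer sits just past the end of RAM" {
    // 20 KiB is 0x5000 bytes, so the stack starts at 0x20005000 and grows downwards.
    try std.testing.expect(ram_length == 0x5000);
    try std.testing.expect(reset_stack_pointer == 0x2000_5000);
}
```

The address itself is one byte beyond the last valid RAM location; that is fine because the stack pointer is decremented before anything is stored.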

This was just a warm up. The layout of the binary we want to create is specified by the SECTIONS block.

A program is made up of several different sections; the placement of some is optional, while others must be explicitly set by the developer.

Note: interestingly, there are a few sections that weren’t needed for the C bare metal example but must be specified here, like rodata and ARM.exidx.

The first section is .text, which contains the compiled Zig code to be executed. By adding it first and appending it to FLASH memory we make sure it’s loaded at address 0x08000000, where the stack pointer and reset handler are supposed to be.

The stack pointer is provided by the LONG(__reset_stack_pointer) directive, which puts the 4-byte wide value of __reset_stack_pointer at the very start of our program.

Just after that we place the address of the main function with its least significant bit set to one.
The Cortex-M3 only executes Thumb instructions, and the manual states that every entry in the vector table must have bit 0 set to mark its target as Thumb code. Function addresses are always even (at least 2-byte aligned), so OR-ing in a 1 does not lose any information.

Once those two values are set, skip ahead by the total size of the exception vector table (332 bytes) by increasing the value of the current address:
in a linker script the dot (.) corresponds to the current memory address at any given moment.
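To make the resulting layout concrete, here is a small host-side sketch that decodes the first two words of a binary image the way the hardware would at reset. The helper names (readWord, parseHeader) and the sample bytes are hypothetical; the sample assumes main ends up right after the 8 header bytes plus the 332 skipped bytes, i.e. at offset 0x154, which may not match your build.

```zig
const std = @import("std");

/// First two words of the image: initial stack pointer and reset vector.
const VectorHeader = struct {
    initial_sp: u32,
    reset_vector: u32,
};

/// Reads a little-endian 32-bit word from the first four bytes of a slice.
fn readWord(bytes: []const u8) u32 {
    return @as(u32, bytes[0]) |
        (@as(u32, bytes[1]) << 8) |
        (@as(u32, bytes[2]) << 16) |
        (@as(u32, bytes[3]) << 24);
}

fn parseHeader(image: []const u8) VectorHeader {
    return VectorHeader{
        .initial_sp = readWord(image[0..4]),
        .reset_vector = readWord(image[4..8]),
    };
}

test "decode a hypothetical image header" {
    // Made-up sample: stack at 0x20005000, main at 0x08000154 with the Thumb bit set.
    const image = [_]u8{ 0x00, 0x50, 0x00, 0x20, 0x55, 0x01, 0x00, 0x08 };
    const header = parseHeader(&image);

    try std.testing.expect(header.initial_sp == 0x2000_5000);
    // Bit 0 marks the target as Thumb code...
    try std.testing.expect((header.reset_vector & 1) == 1);
    // ...and masking it off gives back the real address of the function.
    try std.testing.expect((header.reset_vector & ~@as(u32, 1)) == 0x0800_0154);
}
```

Running zig test on this file checks the decoding logic only; to inspect a real build you would feed it the first eight bytes of application.bin instead of the sample array.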

This is pretty much it for the most part. The two remaining pieces, the .rodata* input sections (read-only data such as string constants, e.g. .rodata.str1.1) and .ARM.exidx (stack unwinding tables), are emitted by the Zig compiler; it will throw a fit at build time if they are not given a place in the script.

I must say that we have gone deep enough in this programming tutorial without writing any code. Let’s put down a minimal Zig program to be paired with this linker script, calling it main.zig.

export fn main() void {
    while (true) {
    }
}

This main function is quite unremarkable; it just idly loops forever without doing anything, but it will suffice for now.

Compilation

We are now ready to compile the first binary. The Zig compiler lets us do this with an elegant one-liner:

$ zig build-exe -target thumb-freestanding -mcpu cortex_m3 --name application.elf --script memory.ld main.zig

If you already worked a little with Zig you should be familiar at least with the build-exe command. The remaining flags specify that we want to cross compile for the Blue Pill:

-target thumb-freestanding is the ARM32 target architecture of our hardware. Freestanding means that we have no default libraries other than what we provide (the lone main.zig).
-mcpu cortex_m3 specifies that the target microcontroller belongs to the Cortex-M3 family.
--name application.elf is simply the name of the output.
With --script memory.ld we specify an explicit linker script.
Finally, main.zig is the compilation input.

If you followed every step correctly you should find yourself with application.elf. That is not ready to be loaded though: the ELF format contains a lot of metadata that is not needed for execution.
It must be converted to a binary file either with an llvm or gcc toolchain:

# pick one
$ arm-none-eabi-objcopy -O binary application.elf application.bin
$ llvm-objcopy -O binary application.elf application.bin

Now application.bin is really ready. We just have to push this binary into the flash memory of the Blue Pill.

Blink

If you correctly followed all the steps from the flashing tutorial, the firmware should now be loaded on the Blue Pill.

Congratulations, you are running an embedded program! Is it exciting?

What do you mean, nothing happened?

Well, nothing is happening that you can notice, but I guarantee that under the hood the STM32F103 is buzzing with your program, hastily stepping through the empty loop that we set up.

If you’re thinking “that’s quite boring” you are right. Let’s look for a sign of life now.

Memory Mapped Peripherals

The Blue Pill is an affordable and elegant design but doesn’t include many interesting peripherals out of the box. Without additional hardware the only way we have to get a proof of our work is to blink the green LED positioned just below the red power indicator.

However, blinking an LED is conventionally considered the “Hello world” of embedded systems, so we are on the right track.

It’s time to talk about how peripherals work on an STM32 microcontroller. Perhaps you expect to find some kind of API providing a turn_on_led function, but remember that we are alone here.
There are no libraries: it’s just us and the memory space… and that’s exactly where we should look.

Most microcontrollers interact with peripherals by mapping specific memory addresses to registers that control their operation. Think of it like “write 1 to this address to turn on the light”, only slightly more complex.

The green LED on the board is connected internally to the PC13 GPIO (General Purpose Input/Output). You can find all the details on how those work in section 9 of the reference manual, but I’ll give you a summary.

GPIOs are split into separate ports named with letters (A, B, C and so on). Each port can handle up to 16 pins, numbered from 0 to 15, and the one we’re interested in happens to be pin 13 of port C.

For each port there are several registers that manage the operation of its IOs; we are interested in three of them:

The APB2 peripheral clock enable register (RCC_APB2ENR), described by section 8.3.7 of the reference manual and found at address 0x40021018.
Every peripheral on the chip must be fed a clock signal that has to be enabled before we are able to use it. In the case of the GPIOC port this means setting the fifth bit (bit 4, IOPCEN) of this register.

The port configuration register high (GPIOC_CRH), described by section 9.2.2 of the reference manual and found at address 0x40011004.
This is the high counterpart in a pair of registers that decide the GPIO configuration. For each IO there are 4 bits, and for GPIOC13 the relevant ones are from 20 to 23 (included).

The port output data register (GPIOC_ODR), described by section 9.2.4 of the reference manual and found at address 0x4001100C. By setting or clearing a bit in this register the corresponding pin will be held high or low (which means the LED will be turned on or off).

The address for each register can be found by following the memory map described in section 3.3 of the reference manual. Each register is 32 bits wide, so its address can be considered as the pointer to a u32 in Zig, and that’s exactly how we are going to implement it.
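The three addresses are simply a peripheral base address plus a register offset. The sketch below redoes that sum on the host; the constant names are mine, while the base addresses and offsets are the ones I read from the reference manual memory map and register descriptions, so double check them against your copy.

```zig
const std = @import("std");

// Peripheral base addresses (memory map in the reference manual).
const RCC_BASE: u32 = 0x4002_1000;
const GPIOC_BASE: u32 = 0x4001_1000;

// Register offsets inside each peripheral.
const APB2ENR_OFFSET: u32 = 0x18;
const CRH_OFFSET: u32 = 0x04;
const ODR_OFFSET: u32 = 0x0C;

test "base plus offset gives the addresses used in main.zig" {
    try std.testing.expect(RCC_BASE + APB2ENR_OFFSET == 0x4002_1018);
    try std.testing.expect(GPIOC_BASE + CRH_OFFSET == 0x4001_1004);
    try std.testing.expect(GPIOC_BASE + ODR_OFFSET == 0x4001_100C);
}
```

Keeping base and offset separate pays off as soon as a second port is involved, since only the base changes from one GPIO port to the next.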

Add these three definitions to main.zig:

pub const RCC_APB2ENR = @intToPtr(*volatile u32, 0x40021018);
pub const GPIOC_CRH = @intToPtr(*volatile u32, 0x40011004);
pub const GPIOC_ODR = @intToPtr(*volatile u32, 0x4001100C);

Now the three constants hold pointers to each register, which makes them a little easier to access.

When we use @intToPtr to make the cast we specify *volatile u32 as destination type: the “pointer to u32” part should be clear by now; volatile is a keyword that lets the compiler know that the variable we are using might change without any apparent reason, and that reading and writing from and to it should never be optimized out.

For example, writing the same value twice to a memory mapped register might have a special meaning for the hardware, but if the compiler deems that the second write would have no effect (as it should be for normal memory locations) it might decide to remove it. With this we make sure this doesn’t happen.

Let’s use these registers to setup and periodically blink the green LED:

pub const RCC_APB2ENR = @intToPtr(*volatile u32, 0x40021018);
pub const GPIOC_CRH = @intToPtr(*volatile u32, 0x40011004);
pub const GPIOC_ODR = @intToPtr(*volatile u32, 0x4001100C);

export fn main() void {
    RCC_APB2ENR.* |= @as(u32, 0x10);           // Enable GPIOC clk

    GPIOC_CRH.* &= ~@as(u32, 0b1111 << 20);  // Clear all the bits relative to PC13
    GPIOC_CRH.* |= @as(u32, 0b0010 << 20);   // Now set the desired configuration: Out PP, 2MHz

    while (true) {
        var i: u32 = 0;
        GPIOC_ODR.* ^= @as(u32, 0x2000);     // Toggle the bit corresponding to GPIOC13
        while (i < 100_000) {                // Wait a bit
            i += 1;
        }
    }
}

Notice how all the register accesses are modifications of the existing contents (|=, &=, ^=) rather than direct writes.
This is because the registers manage many different peripherals (all the IOs of port C), and we are only interested in changing the values of a few bits.
Leaving everything else how it was before is unnecessary here (because we are only using IO13), but is still good manners.

We use handy hexadecimal and binary literals to express the changes to the registers: for example, a logic OR with 0x10 sets the fifth bit (bit 4), which in turn enables the GPIOC clock when it is applied to the memory pointed to by RCC_APB2ENR.
The explicit cast to u32 is unnecessary but makes all the steps unambiguous to the reader.

The real action happens with the assignment GPIOC_ODR.* ^= @as(u32, 0x2000), which toggles bit 13 of the register and the LED with it. Afterwards we count up to 100_000 to wait some time before flipping it again.
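If the masks feel opaque, the same read-modify-write steps can be tried on plain integers on the host. This is a sketch under a few assumptions: the constant and function names are mine, and 0x44444444 is the reset value the reference manual gives for the configuration registers (every pin a floating input).

```zig
const std = @import("std");

// Bit 4 of RCC_APB2ENR (IOPCEN in the reference manual) gates the GPIOC clock.
const IOPCEN: u32 = 1 << 4;
// Bit 13 of GPIOC_ODR drives PC13, where the LED is attached.
const PIN13: u32 = 1 << 13;
// CRH covers pins 8..15 with 4 bits each, so PC13 starts at bit (13 - 8) * 4.
const PC13_SHIFT = (13 - 8) * 4;

/// Same clear-then-set sequence the article applies to GPIOC_CRH.
fn configurePc13(crh: u32) u32 {
    const cleared = crh & ~@as(u32, 0b1111 << PC13_SHIFT);
    return cleared | @as(u32, 0b0010 << PC13_SHIFT);
}

test "read-modify-write on plain integers" {
    try std.testing.expect(IOPCEN == 0x10);
    try std.testing.expect(PIN13 == 0x2000);
    try std.testing.expect(PC13_SHIFT == 20);

    // Only the PC13 field changes; the other seven pins keep their configuration.
    try std.testing.expect(configurePc13(0x4444_4444) == 0x4424_4444);

    // XOR flips only the selected bit, so two toggles restore the original value.
    var odr: u32 = 0;
    odr ^= PIN13;
    try std.testing.expect(odr == 0x2000);
    odr ^= PIN13;
    try std.testing.expect(odr == 0);
}
```

The test works on ordinary variables, so it says nothing about volatile semantics or timing; it only confirms that the literals in main.zig touch the bits we expect.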

Note: this is not a very good way to wait because we don’t actually know how long it takes the CPU to complete that loop, but it’s good enough for now.

If you did everything properly you can now compile and load the code again:

$ zig build-exe -target thumb-freestanding -mcpu cortex_m3 --name application.elf --script memory.ld main.zig
$ arm-none-eabi-objcopy -O binary application.elf application.bin
# Your favourite method for loading the binary on the target

The green LED should periodically turn on and off! The actual frequency can be tuned by reducing or increasing the amount to be counted to in between each loop cycle.

Conclusion

This was quite a journey. We built an embedded application from the ground up without assistance from third-party tools like a manufacturer-provided IDE.

It was educational, but I think we can all agree it would be unwieldy to repeat the process for every new embedded project.

The good news is that we can have it easier. In the future there will be tools centered around Zig to streamline the process, so that all you need to do is specify what device you are targeting; then, the linker script, startup procedure and peripheral access will be generated automatically.

In fact, Zig as a language provides features like comptime instructions that should make this process fairly easy.
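As a small taste of what comptime could do here, the following sketch computes at compile time which configuration register and bit offset belong to a given pin. The PinField type and pinField function are hypothetical names of mine, not part of any existing library, and the CRL/CRH split (pins 0 to 7 and 8 to 15, four bits each) follows the GPIO chapter of the reference manual.

```zig
const std = @import("std");

/// Where the 4-bit configuration field of a pin lives (illustrative type).
const PinField = struct {
    high_register: bool, // true: CRH (pins 8..15), false: CRL (pins 0..7)
    shift: u5, // bit offset of the field inside that register
};

/// Because `pin` is a comptime parameter, the result is computed by the
/// compiler and no arithmetic is left to run on the microcontroller.
fn pinField(comptime pin: u4) PinField {
    return PinField{
        .high_register = pin >= 8,
        .shift = @as(u5, pin % 8) * 4,
    };
}

comptime {
    // Checked while compiling: PC13 lives in CRH at bits 20..23.
    std.debug.assert(pinField(13).high_register);
    std.debug.assert(pinField(13).shift == 20);
}

test "pin field lookup" {
    const pc13 = pinField(13);
    try std.testing.expect(pc13.high_register);
    try std.testing.expect(pc13.shift == 20);
}
```

A generated peripheral layer would go much further than this, but the principle is the same: the magic numbers from main.zig become values derived from the pin name.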

Once the foundations are solid, more complex structures can be built upon them. The next natural steps would be:

Splitting the project into more files (one just for the startup, one for the main function, libraries, …) and using the Zig build library instead of the command line.
Configuring the device in more detail, for example by setting a clock speed different from the default.
Making peripheral access more convenient.
Keeping precise track of elapsed time.
Using more complex peripherals.