UBMSTM32 Part 1 - Build Process

Intro

The initial idea for this post was to write about the startup code, but in the process I noticed that it’s necessary to first address the elephant in the room: the memory layout of a program. My goal for this entry is to explain it by showing how each step of the build process shapes the final executable binary.

Disclaimer

I’ve decided to skip the preprocessor in this post because I don’t find it that important from the perspective of the memory layout. The only part needed here is how the #include directive works: it puts the content of the specified file in place of the directive.

Paradigm shift

When programming in high-level languages (and that includes C), we are used to the concepts of variables and functions. For the running program, however, those are just addresses in memory. Calling a function means jumping to a specific place in the program, and accessing a variable means reading or writing memory at a specific address.

Some addresses are runtime dependent, like those of local variables on the stack or of memory allocated on the heap. The real address is determined during execution and can vary depending on different conditions.

Global and static variables (from here on called static storage duration variables), along with functions, have addresses that are runtime independent. These are determined during the build process in 3 steps:

Compilation

The compiler generates a symbol (basically a labeled address) for each function and static storage duration variable. Those are placed inside a specific section of an assembly file. The ELF file format describes, among others, these sections:

.text - contains program code (instructions), can also contain read-only data
.rodata - read-only data
.data - read & write data with initial values
.bss - read & write data without explicit initial values (zeroed before the program starts)

The goal of such an assembly file is to make an initial description of the desired layout. The file specifies symbols, and each of them is assigned to a specific section. A symbol marks the beginning of a certain area, which then either reserves space, is filled with data or, in the case of .text, is filled with assembly instructions.

Let’s take a C file:

// main.c
#include <stdint.h>

uint32_t global_initialized_var = 60;
uint32_t global_uninitialized_var;

static uint32_t static_initialized_var = 65;
static uint32_t static_uninitialized_var;

int main() {
  uint32_t local_initialized_var = 44;
  uint32_t local_uninitialized_var;
  return 0;
}

and compile it with command:

gcc -S main.c

That will produce an assembly file called main.s. You can see how the assembly code and variables are assigned to different sections and wrapped with some metadata. The indentation of this generated assembly is quite weird and doesn’t really show the boundaries of symbols. Don’t let it fool you.

# main.s
	.file	"main.c"
	.text
	.globl	global_initialized_var
	.data
	.align 4
	.type	global_initialized_var, @object
	.size	global_initialized_var, 4
global_initialized_var:
	.long	60
	.globl	global_uninitialized_var
	.bss
	.align 4
	.type	global_uninitialized_var, @object
	.size	global_uninitialized_var, 4
global_uninitialized_var:
	.zero	4
	.data
	.align 4
	.type	static_initialized_var, @object
	.size	static_initialized_var, 4
static_initialized_var:
	.long	65
	.local	static_uninitialized_var
	.comm	static_uninitialized_var,4,4
	.text
	.globl	main
	.type	main, @function
main:
.LFB0:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	movl	$44, -4(%rbp)
	movl	$0, %eax
	popq	%rbp
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE0:
	.size	main, .-main
	.ident	"GCC: (GNU) 15.2.1 20251112"
	.section	.note.GNU-stack,"",@progbits

Assembly

After the assembler processes the assembly file, it produces an object file. This is a binary file that contains machine code. However, it is still not the final executable, but something called a relocatable object. That means its symbols don’t have fixed addresses yet. Instead, each symbol’s address is an offset from the beginning of the section it is assigned to. The reason is that we may want to combine multiple object files into a single executable, for example to link against an external library or to implement an incremental build.

In an incremental build, every source file, together with the headers it includes, forms a translation unit, and we compile each translation unit independently into a separate object file. Once we have an object file for each translation unit, a change in the source only requires recompiling the affected translation units and linking them with the unchanged object files. This can improve build time by skipping a lot of unnecessary work.

Let’s add new files that will act as a separate translation unit: hello.h, hello.c:

// hello.h
#include <stdint.h>

void increment(uint32_t * value);

// hello.c
#include "hello.h"

void increment(uint32_t * value) {
  *(value) += 1;
}

File main.c after modification:

// main.c
#include <stdint.h>
#include "hello.h"

uint32_t global_initialized_var = 60;
uint32_t global_uninitialized_var;

static uint32_t static_initialized_var = 65;
static uint32_t static_uninitialized_var;

int main() {
  uint32_t local_initialized_var = 44;
  uint32_t local_uninitialized_var;
  increment(&local_initialized_var);
  return 0;
}

Now let’s compile the translation units separately. The -c flag tells gcc to compile and assemble, but not to link:

gcc hello.c -c -o hello.o
gcc main.c -c -o main.o

We can use a program called objdump to extract information from object files. The -t flag helps us to see symbols.

objdump -t hello.o

hello.o:     file format elf64-x86-64

SYMBOL TABLE:
0000000000000000 l    df *ABS*  0000000000000000 hello.c
0000000000000000 l    d  .text  0000000000000000 .text
0000000000000000 g     F .text  000000000000001a increment

objdump -t main.o

main.o:     file format elf64-x86-64

SYMBOL TABLE:
0000000000000000 l    df *ABS*  0000000000000000 main.c
0000000000000000 l    d  .text  0000000000000000 .text
0000000000000004 l     O .data  0000000000000004 static_initialized_var
0000000000000004 l     O .bss   0000000000000004 static_uninitialized_var
0000000000000000 g     O .data  0000000000000004 global_initialized_var
0000000000000000 g     O .bss   0000000000000004 global_uninitialized_var
0000000000000000 g     F .text  0000000000000045 main
0000000000000000         *UND*  0000000000000000 increment
0000000000000000         *UND*  0000000000000000 __stack_chk_fail

The first column shows a symbol’s offset from the beginning of the section it’s assigned to. Notice that inside the .data and .bss sections, the static variable is placed after the global one. It’s also interesting what calling increment, which is only declared in the header file, caused inside main.o: the function is listed as an undefined symbol (*UND*), and it’s up to the linker to find its definition.

Linking

Once we have relocatable objects, we can use a linker to merge these objects into a single executable. This is done by merging together sections, and relocating symbols to different addresses. It’s possible to customize the exact behavior of the linker by feeding it a linker script, which is unnecessary in this example, but will be covered in more detail in a future article.

We can create the executable from object files with this command:

gcc hello.o main.o -o output.elf

Now let’s inspect it with objdump. In the executable, symbol addresses are no longer offsets from the beginning of a section, but addresses within the whole program image (counted from its base address). That means we can sort the output of objdump by this address for easier examination.

objdump -t output.elf | sort

0000000000000000       F *UND*	0000000000000000              __libc_start_main@GLIBC_2.34
0000000000000000       F *UND*	0000000000000000              __stack_chk_fail@GLIBC_2.4
0000000000000000 l    df *ABS*	0000000000000000              
0000000000000000 l    df *ABS*	0000000000000000              hello.c
0000000000000000 l    df *ABS*	0000000000000000              main.c
0000000000000000  w    F *UND*	0000000000000000              __cxa_finalize@GLIBC_2.2.5
0000000000000000  w      *UND*	0000000000000000              __gmon_start__
0000000000000000  w      *UND*	0000000000000000              _ITM_deregisterTMCloneTable
0000000000000000  w      *UND*	0000000000000000              _ITM_registerTMCloneTable
0000000000001000 g     F .init	0000000000000000              .hidden _init
0000000000001040 g     F .text	0000000000000026              _start
0000000000001139 g     F .text	000000000000001a              increment
0000000000001153 g     F .text	0000000000000045              main
0000000000001198 g     F .fini	0000000000000000              .hidden _fini
0000000000002000 g     O .rodata	0000000000000004              _IO_stdin_used
0000000000002004 l       .eh_frame_hdr	0000000000000000              __GNU_EH_FRAME_HDR
0000000000003de0 l     O .dynamic	0000000000000000              _DYNAMIC
0000000000003fe8 l     O .got.plt	0000000000000000              _GLOBAL_OFFSET_TABLE_
0000000000004008 g       .data	0000000000000000              __data_start
0000000000004008  w      .data	0000000000000000              data_start
0000000000004010 g     O .data	0000000000000000              .hidden __dso_handle
0000000000004018 g     O .data	0000000000000004              global_initialized_var
000000000000401c l     O .data	0000000000000004              static_initialized_var
0000000000004020 g       .bss	0000000000000000              __bss_start
0000000000004020 g       .data	0000000000000000              _edata
0000000000004020 g     O .data	0000000000000000              .hidden __TMC_END__
0000000000004024 g     O .bss	0000000000000004              global_uninitialized_var
0000000000004028 l     O .bss	0000000000000004              static_uninitialized_var
0000000000004030 g       .bss	0000000000000000              _end
output.elf:     file format elf64-x86-64
SYMBOL TABLE:

We can see that the .bss, .data & .text sections now have concrete addresses (marked respectively by __bss_start, __data_start and _start, the program’s entry point, which is the first symbol in .text in this listing). There’s also a bunch of other stuff we haven’t specified ourselves. That’s because I’m using gcc for Linux, which automatically adds startup code and links against glibc. We don’t have to worry about it, because future articles will show examples processed with gcc for embedded devices, where glibc is not needed.

Outro

I hope that I was able to explain the general idea behind the build process, and how each phase structures the final binary. I know that some parts may be confusing, especially the linking part, which was quite a hard nut to crack for me. In the next post I’m going to elaborate on that topic but in the context of embedded programming.

If you have any questions, or you would like to correct me on the subject, don’t hesitate to check the contact page!