Intro
My initial idea for this post was to write about startup code, but in the process I noticed that it’s necessary to first address the elephant in the room: the memory layout of a program. My goal for this entry is to explain it by showing how each step of the build process shapes the final executable binary.
Disclaimer
I’ve decided to skip the topic of the preprocessor in this post because I don’t
find it that important from the perspective of memory layout. The only part
relevant to this post is how the #include directive works: it puts the
content of the specified file in place of the directive.
Paradigm shift
While programming in high-level languages (and that includes C), we are used to the concepts of variables and functions. For the program, however, those are just addresses in memory. Calling a function means jumping to a specific place in the program, and accessing a variable means reading from or writing to memory at a specific address.
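To illustrate, here is a minimal C sketch in which a function is called through a pointer holding its address and a variable is read through its address. The helpers call_at and read_at are hypothetical names made up for this example; increment mirrors the function defined in hello.c later in this post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same body as increment() from hello.c. */
void increment(uint32_t *value) {
    *value += 1;
}

/* Hypothetical helper: jump to the code at the address stored in fn.
   Without optimizations this typically becomes an indirect call instruction. */
void call_at(void (*fn)(uint32_t *), uint32_t *arg) {
    assert(fn != NULL);
    fn(arg);
}

/* Hypothetical helper: read the 32-bit value stored at the given address. */
uint32_t read_at(const uint32_t *address) {
    assert(address != NULL);
    return *address;
}
```

Calling call_at(increment, &v) with v equal to 41 leaves 42 in v, and read_at(&v) then returns 42.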
Some addresses are runtime-dependent, for example those of local variables allocated on the stack or of objects allocated on the heap. Their real address is determined during execution and can vary depending on different conditions.
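The difference between the two kinds of storage can be sketched as follows. In this hypothetical example, call_counter returns the address of a function-scope static, which is the same object on every call, while make_heap_counter only obtains its address from malloc at run time. Returning the address of an ordinary local variable instead would not work, because that variable lives on the stack only for the duration of the call.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: a function-scope static has static storage duration,
   so there is exactly one `calls` object at one location for the whole run. */
uint32_t *call_counter(void) {
    static uint32_t calls; /* zero-initialized before main runs */
    calls += 1;
    return &calls;
}

/* Hypothetical helper: a heap object only gets an address when malloc runs,
   and that address may differ between runs. Returns NULL on failure. */
uint32_t *make_heap_counter(uint32_t initial) {
    uint32_t *counter = malloc(sizeof *counter);
    if (counter != NULL) {
        *counter = initial;
    }
    return counter;
}
```

Two calls to call_counter return the same pointer, and after the second call the counter holds 2; each heap counter must be released with free.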
Global and static variables (hereafter called static storage duration variables), along with functions, have addresses that don’t depend on runtime conditions (apart from the base address the program is loaded at). These are determined during the build process in three steps:
Compilation
The compiler generates a symbol (basically a labeled address) for each function and static storage duration variable. Each symbol is placed inside a specific section of the assembly file. The ELF file format defines many sections; the ones that matter here are:
.text - program code (instructions); can also contain read-only data
.rodata - read-only data
.data - read & write data with initial values
.bss - read & write data without initial values (zero-filled before main runs)
The goal of such an assembly file is to give an initial description of the desired
layout. The file defines symbols, each of which can be assigned to a specific section.
A symbol marks the beginning of an area, which can then either just reserve space,
fill the space with data or, in the case of .text, fill it with instructions.
Let’s take a C file:
// main.c
#include <stdint.h>
uint32_t global_initialized_var = 60;
uint32_t global_uninitialized_var;
static uint32_t static_initialized_var = 65;
static uint32_t static_uninitialized_var;
int main(void) {
    uint32_t local_initialized_var = 44;
    uint32_t local_uninitialized_var;
    return 0;
}
and compile it with the following command:
gcc -S main.c
That will produce an assembly file called main.s. You can see how the code
and variables are assigned to different sections and wrapped with some
metadata. The indentation of the generated assembly is quite weird: only labels
start at the beginning of a line, so it doesn’t really show where one symbol
ends and the next begins. Don’t let it fool you.
# main.s
	.file "main.c"
	.text
	.globl global_initialized_var
	.data
	.align 4
	.type global_initialized_var, @object
	.size global_initialized_var, 4
global_initialized_var:
	.long 60
	.globl global_uninitialized_var
	.bss
	.align 4
	.type global_uninitialized_var, @object
	.size global_uninitialized_var, 4
global_uninitialized_var:
	.zero 4
	.data
	.align 4
	.type static_initialized_var, @object
	.size static_initialized_var, 4
static_initialized_var:
	.long 65
	.local static_uninitialized_var
	.comm static_uninitialized_var,4,4
	.text
	.globl main
	.type main, @function
main:
.LFB0:
	.cfi_startproc
	pushq %rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq %rsp, %rbp
	.cfi_def_cfa_register 6
	movl $44, -4(%rbp)
	movl $0, %eax
	popq %rbp
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE0:
	.size main, .-main
	.ident "GCC: (GNU) 15.2.1 20251112"
	.section .note.GNU-stack,"",@progbits
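The listing above has no .rodata section because nothing in main.c is read-only. As a sketch (all names below are made up for illustration), the following file should exercise all four sections when compiled with gcc on x86-64 Linux; the comments give the section gcc typically chooses, which you can verify with gcc -S or objdump -t.

```c
#include <stdint.h>

const uint32_t read_only_var = 42;  /* typically .rodata: initialized, never written */
uint32_t initialized_var = 60;      /* .data: the initial value is stored in the file */
uint32_t uninitialized_var;         /* .bss: only the size is recorded, zeroed at startup */
uint32_t zero_initialized_var = 0;  /* usually .bss too: all-zero data needs no storage */

/* .text: the machine code of this function */
uint32_t sum_of_globals(void) {
    return read_only_var + initialized_var + uninitialized_var + zero_initialized_var;
}
```

Note the last variable: although it has an explicit initializer, gcc by default places zero-initialized objects in .bss, since storing a block of zeros in the file would be wasteful.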
Assembly
After the assembler processes the assembly file, it produces an object file. This is a binary file that contains machine code; however, it is still not the final executable, but something called a relocatable object. This means that instead of having fixed addresses, its symbols are described by their offset from the beginning of the section they are assigned to.
The reason for that is that we may want to combine multiple object files into a single executable, for example to link against an external library or to implement incremental builds. In the latter approach, each source file together with the headers it includes forms a translation unit, and each translation unit is compiled independently into a separate object file. Once we have an object file for every translation unit, a change in the source only requires recompiling the affected translation units and linking them with the unchanged object files. This can improve build time by skipping a lot of unnecessary work.
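The decision of what to recompile can be sketched in C as a simplified, make-like rule: recompile a translation unit when its object file is missing or older than its source. The struct translation_unit and both functions are hypothetical, and the sketch ignores header dependencies, which a real build system must track (changing hello.h should trigger recompiling both hello.c and main.c).

```c
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical bookkeeping for one translation unit. */
struct translation_unit {
    const char *source;   /* e.g. "main.c" */
    time_t source_mtime;  /* last modification time of the source file */
    bool has_object;      /* does the matching .o file exist? */
    time_t object_mtime;  /* last modification time of the .o file */
};

/* make-style rule: rebuild when the object is missing or out of date. */
bool needs_recompile(const struct translation_unit *tu) {
    return !tu->has_object || tu->object_mtime < tu->source_mtime;
}

/* Count the units that must be recompiled; all others are only re-linked. */
size_t count_recompiles(const struct translation_unit *units, size_t n) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (needs_recompile(&units[i])) {
            count++;
        }
    }
    return count;
}
```

If only main.c was edited after the last build, only main.o is rebuilt, and hello.o is reused as-is during linking.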
Let’s add two new files, hello.h and hello.c, where hello.c will be compiled as a separate translation unit:
// hello.h
#ifndef HELLO_H
#define HELLO_H
#include <stdint.h>
void increment(uint32_t *value);
#endif
// hello.c
#include "hello.h"
void increment(uint32_t *value) {
    *value += 1;
}
The main.c file after the modification:
// main.c
#include <stdint.h>
#include "hello.h"
uint32_t global_initialized_var = 60;
uint32_t global_uninitialized_var;
static uint32_t static_initialized_var = 65;
static uint32_t static_uninitialized_var;
int main(void) {
    uint32_t local_initialized_var = 44;
    uint32_t local_uninitialized_var;
    increment(&local_initialized_var);
    return 0;
}
Now let’s compile each translation unit separately. The -c flag tells gcc to compile and assemble the source, but not to link it:
gcc hello.c -c -o hello.o
gcc main.c -c -o main.o
We can use a program called objdump to extract information from object files. The -t flag prints the symbol table:
objdump -t hello.o
hello.o: file format elf64-x86-64
SYMBOL TABLE:
0000000000000000 l df *ABS* 0000000000000000 hello.c
0000000000000000 l d .text 0000000000000000 .text
0000000000000000 g F .text 000000000000001a increment
objdump -t main.o
main.o: file format elf64-x86-64
SYMBOL TABLE:
0000000000000000 l df *ABS* 0000000000000000 main.c
0000000000000000 l d .text 0000000000000000 .text
0000000000000004 l O .data 0000000000000004 static_initialized_var
0000000000000004 l O .bss 0000000000000004 static_uninitialized_var
0000000000000000 g O .data 0000000000000004 global_initialized_var
0000000000000000 g O .bss 0000000000000004 global_uninitialized_var
0000000000000000 g F .text 0000000000000045 main
0000000000000000 *UND* 0000000000000000 increment
0000000000000000 *UND* 0000000000000000 __stack_chk_fail
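The first column shows each defined symbol’s offset from the start of the section
it’s assigned to. Notice that inside both the .data and .bss sections, the static
variable comes after the global one, at offset 4. It’s also interesting what the
declaration of increment in hello.h caused inside main.o: it shows up as an
undefined symbol (*UND*), and the compiler trusts the linker to find its definition.
The same goes for __stack_chk_fail, which comes from gcc’s stack protector
instrumentation (most likely added because main now takes the address of a local
variable) and is provided by the C library.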
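Why does the static variable land at offset 4? Roughly speaking, the assembler places the objects of each section one after another in the order it meets them in main.s, rounding each offset up to the alignment requested by .align (or by the last argument of .comm). Here is a minimal sketch of that bookkeeping, with hypothetical function names and ignoring everything else an assembler does:

```c
#include <assert.h>
#include <stddef.h>

/* Round offset up to the next multiple of align (align must be a power of two). */
size_t align_up(size_t offset, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

/* Place n objects back to back in one section, in order of appearance,
   writing each object's offset from the section start into offsets[].
   Returns the resulting section size. */
size_t layout_section(const size_t *sizes, const size_t *aligns, size_t n,
                      size_t *offsets) {
    size_t offset = 0;
    for (size_t i = 0; i < n; i++) {
        offset = align_up(offset, aligns[i]);
        offsets[i] = offset;
        offset += sizes[i];
    }
    return offset;
}
```

Two 4-byte objects with 4-byte alignment end up at offsets 0 and 4, which matches the global and static variables in main.o’s .data and .bss.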
Linking
Once we have relocatable objects, we can use a linker to merge them into a
single executable. This is done by merging sections with the same name (for
example, all .text sections) and relocating symbols to their final addresses.
It’s possible to customize the exact behavior of the linker by feeding it a
linker script; that’s unnecessary in this example, but will be covered in more
detail in a future article.
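Before relocating anything, the linker has to match each undefined symbol, such as increment in main.o, with a definition from another object file. The following is a simplified sketch of that lookup, not how ld is actually implemented: struct symbol and resolve are hypothetical, the table mirrors the objdump -t columns shown earlier, and duplicate definitions, weak symbols and libraries are ignored.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified symbol table entry, modeled on the objdump -t output. */
struct symbol {
    const char *name;
    const char *object_file; /* which .o the entry comes from */
    bool global;             /* 'g' (global) vs 'l' (local) binding */
    bool defined;            /* false for *UND* entries */
};

/* Find the definition an undefined reference binds to: only defined,
   global symbols are visible to other object files. */
const struct symbol *resolve(const struct symbol *table, size_t n,
                             const char *name) {
    for (size_t i = 0; i < n; i++) {
        if (table[i].defined && table[i].global &&
            strcmp(table[i].name, name) == 0) {
            return &table[i];
        }
    }
    return NULL; /* a real linker reports an "undefined reference" error */
}
```

Because static_initialized_var has local binding (l in objdump), it can’t satisfy a reference from another object file; hiding a symbol from other translation units is exactly what static does at file scope.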
We can create the executable from the object files with this command (gcc runs the linker for us):
gcc hello.o main.o -o output.elf
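Now let’s inspect it with objdump. In the executable, symbol addresses are no longer offsets from sections, but virtual addresses. Because gcc on most modern Linux distributions produces position-independent executables (PIE) by default, these addresses are relative to the base address at which the program gets loaded. That means we can sort the output of objdump by address for easier examination (sort also moves objdump’s header lines to the end of the listing).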
objdump -t output.elf | sort
0000000000000000 F *UND* 0000000000000000 __libc_start_main@GLIBC_2.34
0000000000000000 F *UND* 0000000000000000 __stack_chk_fail@GLIBC_2.4
0000000000000000 l df *ABS* 0000000000000000
0000000000000000 l df *ABS* 0000000000000000 hello.c
0000000000000000 l df *ABS* 0000000000000000 main.c
0000000000000000 w F *UND* 0000000000000000 __cxa_finalize@GLIBC_2.2.5
0000000000000000 w *UND* 0000000000000000 __gmon_start__
0000000000000000 w *UND* 0000000000000000 _ITM_deregisterTMCloneTable
0000000000000000 w *UND* 0000000000000000 _ITM_registerTMCloneTable
0000000000001000 g F .init 0000000000000000 .hidden _init
0000000000001040 g F .text 0000000000000026 _start
0000000000001139 g F .text 000000000000001a increment
0000000000001153 g F .text 0000000000000045 main
0000000000001198 g F .fini 0000000000000000 .hidden _fini
0000000000002000 g O .rodata 0000000000000004 _IO_stdin_used
0000000000002004 l .eh_frame_hdr 0000000000000000 __GNU_EH_FRAME_HDR
0000000000003de0 l O .dynamic 0000000000000000 _DYNAMIC
0000000000003fe8 l O .got.plt 0000000000000000 _GLOBAL_OFFSET_TABLE_
0000000000004008 g .data 0000000000000000 __data_start
0000000000004008 w .data 0000000000000000 data_start
0000000000004010 g O .data 0000000000000000 .hidden __dso_handle
0000000000004018 g O .data 0000000000000004 global_initialized_var
000000000000401c l O .data 0000000000000004 static_initialized_var
0000000000004020 g .bss 0000000000000000 __bss_start
0000000000004020 g .data 0000000000000000 _edata
0000000000004020 g O .data 0000000000000000 .hidden __TMC_END__
0000000000004024 g O .bss 0000000000000004 global_uninitialized_var
0000000000004028 l O .bss 0000000000000004 static_uninitialized_var
0000000000004030 g .bss 0000000000000000 _end
output.elf: file format elf64-x86-64
SYMBOL TABLE:
We can see that the .bss, .data and .text sections now have concrete addresses
(see __bss_start, __data_start and the program’s entry point _start in .text).
There’s also a bunch of other symbols we never defined. That’s because I’m using
gcc for Linux, which automatically links in startup code and the GNU C library
(glibc). We don’t have to worry about them here, because future articles will
show examples built with gcc for embedded devices, where glibc is not needed.
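Comparing the two symbol tables shows how relocation works for our own symbols: each final address is the address at which the linker placed an object file’s piece of an output section, plus the symbol’s offset within that piece. The placements used below (0x4018 for main.o’s .data, 0x1139 for hello.o’s .text) are inferred from the listings rather than printed by objdump, hello.o’s .text is assumed to be exactly the 0x1a bytes of increment, and the names are hypothetical.

```c
#include <stdint.h>

/* A symbol as seen in a relocatable object: an offset inside its section. */
struct object_symbol {
    const char *name;
    uint64_t section_offset; /* first column of objdump -t on the .o file */
};

/* Final address = where the linker placed this object file's part of the
   output section + the symbol's offset inside that part. */
uint64_t final_address(uint64_t placement, const struct object_symbol *sym) {
    return placement + sym->section_offset;
}

/* The next object file's part of the same output section follows the
   previous one (a real linker may insert alignment padding in between). */
uint64_t next_placement(uint64_t placement, uint64_t size) {
    return placement + size;
}
```

This reproduces the output.elf values: global_initialized_var at 0x4018, static_initialized_var at 0x401c, and main at 0x1153, right after increment. The same arithmetic works for .bss, where main.o’s part appears to start at 0x4024.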
Outro
I hope I was able to explain the general idea behind the build process and how each phase shapes the final binary. I know that some parts may be confusing, especially linking, which was quite a hard nut to crack for me. In the next post I’m going to elaborate on that topic, but in the context of embedded programming.
If you have any questions, or you would like to correct me on the subject, don’t hesitate to check the contact page!