In this article we look at the C build process and its stages, that is, how we get from C source files to executable code programmed on the target. Modern IDEs are making this knowledge ever more arcane.
Embedded Systems Build Process
The embedded systems build process usually involves a host machine (a powerful computer) and a resource-constrained target device such as a microcontroller. The tools required to build embedded software are installed on the host machine, because the target device does not have enough resources to run an operating system or build tools such as the GNU toolchain. We therefore compile the application code on the host machine, using various tools, and generate a binary image or executable file for the target device.
What is Cross Compilation?
Cross-compilation is the process of generating executable code with a cross toolchain on the host machine, where the resulting executable runs on a different machine: the target device, for example a microcontroller. The main difference between native compilation and cross-compilation is that in native compilation the generated executable runs on the same machine that compiled it.
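One way to see the host/target distinction from inside the code is to look at the macros the compiler predefines for the architecture it is generating code for. The sketch below assumes the macro names used by GCC and Clang (for example, a cross compiler such as arm-none-eabi-gcc defines `__arm__`); other compilers may use different names. The function names are hypothetical and only for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Returns a name for the architecture this file was compiled FOR,
 * which in a cross build differs from the machine it was compiled ON.
 * The macros checked are the ones GCC and Clang predefine. */
const char *target_arch(void)
{
#if defined(__arm__)
    return "arm (32-bit)";
#elif defined(__aarch64__)
    return "aarch64";
#elif defined(__x86_64__)
    return "x86_64";
#elif defined(__i386__)
    return "x86 (32-bit)";
#elif defined(__riscv)
    return "riscv";
#else
    return "unknown";
#endif
}

/* Prints the target architecture chosen at compile time. */
void print_target_arch(void)
{
    const char *name = target_arch();
    assert(name != NULL);
    printf("compiled for: %s\n", name);
}
```

Built natively on a desktop PC, `print_target_arch()` would typically report the PC's own architecture. The same source built with a cross compiler reports the target's architecture, even though the compiler itself ran on the host.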
GNU Toolchain Introduction
The GNU toolchain consists of many of the tools required for the embedded build process, such as the preprocessor, compiler, assembler, linker, locator, loader, and debugger. In embedded systems, we usually write applications in the C programming language. Your application code passes through several of these tools to become an executable file for the target device. A GNU cross-compilation toolchain is a collection of these tools. The GNU toolchain is also sometimes referred to as the GCC toolchain.
Now let's discuss the function of each build step, one by one:
We start with a C source file with the extension .c, for example blink.c.
The preprocessor processes the application source code (C files and header files) by handling the preprocessor directives, which are the lines starting with #. It expands macros, includes header files, and performs conditional compilation.
Some of the most common directives are #define, #include, #ifdef, #if, and #else. The output of the preprocessor is still plain text: C source code in which all directives have been processed.
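The directives above can be sketched in a few lines. In this minimal example the names `LED_PIN`, `BIT`, `DEBUG_BUILD`, and `BLINK_DELAY_MS` are hypothetical and chosen only for illustration; `DEBUG_BUILD` stands for a switch that a build might define on the compiler command line.

```c
#include <assert.h>            /* #include: the header's text is inserted here */
#include <limits.h>

#define LED_PIN 13u            /* object-like macro: every LED_PIN becomes 13u */
#define BIT(n)  (1u << (n))    /* function-like macro: expanded with its argument */

/* Conditional compilation: only one of the two definitions survives
 * preprocessing. DEBUG_BUILD is a hypothetical build switch. */
#ifdef DEBUG_BUILD
#define BLINK_DELAY_MS 1000u
#else
#define BLINK_DELAY_MS 250u
#endif

/* Bit mask for the LED pin. */
unsigned led_mask(void)
{
    /* The shift count must be smaller than the width of unsigned int. */
    assert(LED_PIN < sizeof(unsigned) * CHAR_BIT);
    return BIT(LED_PIN);       /* after preprocessing: (1u << (13u)) */
}

/* Blink delay selected at preprocessing time. */
unsigned blink_delay_ms(void)
{
    return BLINK_DELAY_MS;
}
```

With GCC, the -E option stops after preprocessing and prints the resulting text, which is a convenient way to inspect what the compiler proper actually receives.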
The compiler takes the preprocessed source code and generates architecture-specific assembly (.s or .asm). The compiler itself is normally broken down into two parts:
- The front end, responsible for parsing the source code.
- The back end, responsible for code optimization and generation.
Front End Processing:
- The first stage of the front end is scanning (lexical analysis): the compiler reads the input text, strips out all whitespace, and splits the text into tokens such as
keywords (for example, ‘while’)
operators (for example, ‘*’)
identifiers (for example, a variable name)
literals (for example, 10 or “my string”)
comments (which are discarded rather than passed on; in C, each comment is replaced by a single space early in translation)
The scanned tokens are then passed to the parser, which checks that they are arranged according to the rules of the C grammar. Any violation is reported to the programmer at this stage as a syntax error.
- The second stage of the front end is semantic analysis: checking that a statement that has parsed correctly also has a valid meaning. If this check fails, you get a semantic error, for example when a variable is used without being declared.
- A key part of semantic analysis concerns the variables in the program. The compiler keeps information about every declared variable in a structure called the symbol table. When a variable is looked up, the table supplies its attributes, such as its type and scope.
- When a statement is found to be semantically correct, the compiler translates it into an internal form called the intermediate representation. The idea is to convert the high-level language construct, regardless of the source language, into a form that is closer to assembly, so that the same compiler can handle different languages and different targets.
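The scanning stage described above can be sketched as a toy tokenizer. This is a simplified illustration under stated assumptions, not how GCC is implemented: the names `Token`, `TokenKind`, and `next_token` are hypothetical, only a handful of keywords are recognized, every operator is treated as a single character, and string literals are not handled.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

typedef enum { TOK_KEYWORD, TOK_IDENTIFIER, TOK_NUMBER, TOK_OPERATOR, TOK_END } TokenKind;

typedef struct {
    TokenKind kind;
    char text[32];   /* longer names are split; acceptable for a sketch */
} Token;

/* Checks a word against a small, deliberately incomplete keyword list. */
static int is_keyword(const char *s)
{
    static const char *const keywords[] = { "while", "if", "else", "int", "return" };
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(s, keywords[i]) == 0)
            return 1;
    return 0;
}

/* Reads one token starting at *src and advances *src past it. */
Token next_token(const char **src)
{
    assert(src != NULL && *src != NULL);
    Token tok = { TOK_END, "" };
    const char *p = *src;
    size_t n = 0;

    while (isspace((unsigned char)*p))   /* whitespace separates tokens, then is dropped */
        p++;

    if (*p == '\0') {
        /* end of input: tok is already TOK_END */
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        while ((isalnum((unsigned char)*p) || *p == '_') && n < sizeof tok.text - 1)
            tok.text[n++] = *p++;
        tok.text[n] = '\0';
        tok.kind = is_keyword(tok.text) ? TOK_KEYWORD : TOK_IDENTIFIER;
    } else if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p) && n < sizeof tok.text - 1)
            tok.text[n++] = *p++;
        tok.text[n] = '\0';
        tok.kind = TOK_NUMBER;
    } else {
        tok.text[0] = *p++;              /* any other character: one-character operator */
        tok.text[1] = '\0';
        tok.kind = TOK_OPERATOR;
    }
    *src = p;
    return tok;
}
```

Calling `next_token` repeatedly on the text `while (count < 10)` yields a keyword, an operator, an identifier, an operator, a number, an operator, and finally `TOK_END`. A parser would then check that this sequence fits the grammar of a while statement.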
Back End Processing:
- The first stage of the back end is optimization; modern compilers do not just translate the code, they also optimize it.
- There are many forms of optimization, all of which transform the code into a smaller or faster but functionally equivalent form. Examples include inline expansion of functions, dead code removal, loop unrolling, and register allocation.
- The last stage of the back end is code generation, in which the compiler converts the optimized intermediate representation into assembly code.
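To make "functionally equivalent" concrete, the sketch below shows a function as a programmer might write it, next to a hand-written picture of what inlining, dead code removal, and loop unrolling could turn it into. The second function is an illustration only: a real compiler performs these transformations on its intermediate representation, and what it actually does depends on the compiler and the optimization level. All function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

static int square(int x) { return x * x; }

/* As written by the programmer. */
int sum_squares_plain(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        total += square(a[i]);   /* candidate for inline expansion */
        if (0)                   /* dead code: this branch can never run */
            total = -1;
    }
    return total;
}

/* Hand-transformed equivalent: square() inlined, dead branch removed,
 * loop unrolled by a factor of four with a cleanup loop for the remainder. */
int sum_squares_transformed(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    int total = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        total += a[i]     * a[i];
        total += a[i + 1] * a[i + 1];
        total += a[i + 2] * a[i + 2];
        total += a[i + 3] * a[i + 3];
    }
    for (; i < n; i++)           /* handles the last n % 4 elements */
        total += a[i] * a[i];
    return total;
}
```

Both functions return the same result for every input, which is the property an optimizer must preserve. The transformed version executes fewer loop iterations and no function calls, at the cost of larger code.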
The output of the compiler is fed into the assembler, which converts the assembly code into object code. Object code is machine code, generated according to the instruction set architecture of the selected microcontroller or microprocessor. Your embedded application may reference libraries and consist of more than one source file, so the build typically produces multiple object files, each with the extension .o or .obj.
The linker takes multiple object files as input and produces a single object file, which is also called relocatable code. The object files produced by the assembler are incomplete in the sense that they may contain references to variables and functions defined in other object files, and these references need to be resolved. The job of the linker is to combine the object files and resolve these unresolved symbols.
The linker does this by merging the corresponding sections, such as text, data, and bss, of the individual object files. Its output is a single file in which the text section contains all of the machine code from all of the input object files, and the data and bss sections contain all of the initialized and uninitialized variables, respectively.
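As a rough guide to what ends up in each section, the sketch below annotates a few C definitions with the section a GNU toolchain typically places them in. The variable and function names are hypothetical. The `.rodata` section is not discussed above, but it is where GCC usually puts read-only constants. Exact placement can vary with the compiler, its options, and the target.

```c
#include <assert.h>

int boot_count = 3;          /* initialized global    -> typically .data */
int error_count;             /* uninitialized global  -> typically .bss, zeroed before main runs */
static int call_count;       /* uninitialized static  -> typically .bss */
const int max_retries = 5;   /* read-only constant    -> typically .rodata */

/* The machine code of every function goes into .text. */
int record_call(void)
{
    int local_step = 1;      /* automatic variable: stack or register, not part of any section */
    call_count += local_step;
    return call_count;
}

/* Also .text; reads the constant from read-only data. */
int retries_left(int used)
{
    assert(used >= 0 && used <= max_retries);
    return max_retries - used;
}
```

If several source files each contain definitions like these, the linker concatenates all of their .text contributions into one text section, all .data into one data section, and so on. The GNU `size` tool reports the resulting text, data, and bss sizes of an object or executable file.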
The role of the locator is to place the program according to the memory map of the selected microcontroller. The locator takes the output file of the linker and assigns actual addresses to the code and data, which determines the memory region of the microcontroller in which each will be stored. In the GNU toolchain, this step is performed by the linker itself, guided by a linker script. The output of this stage is the ready-to-use executable code.