Elements of Parsing an ELF Header

by Gene Cooperman
(E-mail: first name in lowercase and "at" ccs.neu.edu)
Copyright (c) 2017, Gene Cooperman
This work may be freely copied and modified as long as
(a) this copyright notice remains, (b) this author is credited, and (c) it is not sold.

The high-level principles behind ELF are often left undocumented, because each group documents only its own piece of the puzzle. This document is intended to "put it all together". Below are the ELF format specification and some well-commented files that describe the link map: a doubly linked list in which each node describes a different loaded library. The order of this list gives rise to the library search order used by mechanisms such as LD_PRELOAD (see man ld.so).
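As a minimal sketch of that doubly linked list, assuming Linux with glibc and a dynamically linked program, the C code below starts from the r_map field of _r_debug (the debugger rendezvous structure declared in <link.h>) and walks the struct link_map nodes through l_next, checking each l_prev back-link along the way. The function name walk_link_map is hypothetical.

```c
#include <link.h>
#include <stdio.h>

/* Hypothetical helper: print every node of the run-time linker's link map.
 * Assumes glibc and a dynamically linked program, where <link.h> declares
 * `extern struct r_debug _r_debug;` and r_map points at the first node
 * (the main program, whose l_name is the empty string).
 * Returns the number of nodes, or -1 if an l_prev back-link is inconsistent. */
int walk_link_map(void)
{
    int count = 0;
    for (struct link_map *m = _r_debug.r_map; m != NULL; m = m->l_next) {
        /* In a doubly linked list, the next node must point back at us. */
        if (m->l_next != NULL && m->l_next->l_prev != m)
            return -1;
        /* l_addr: difference between the addresses recorded in the ELF
         * file and the addresses where the object was loaded in memory. */
        printf("%#18lx  %s\n", (unsigned long)m->l_addr,
               m->l_name[0] != '\0' ? m->l_name : "(main program)");
        count++;
    }
    return count;
}
```

Called from main() in an ordinary program, this should list the main program first, followed by objects such as libc.so.6 and the loader itself. Running the same program with LD_PRELOAD set should show the preloaded library ahead of the libraries the program names itself, which is the search order described above.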

ELF_Format.pdf
/usr/include/link.h (user-visible portion of the doubly linked list of libraries; the rest is defined in the glibc source)
/usr/include/elf.h
util-plugin.c (example of parsing ELF, not intended as a standalone program)
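A minimal sketch of the very first parsing step, assuming a 64-bit ELF file and the Elf64_Ehdr struct and macros (ELFMAG, SELFMAG, EI_CLASS, ELFCLASS64) from /usr/include/elf.h: read the fixed-size ELF header at offset 0, check the magic number, and report the fields that locate the program headers and section headers. The function names are hypothetical; this is not the code in util-plugin.c.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read the ELF header at offset 0 of `path`.
 * Returns 0 on success, or -1 if the file cannot be read or is not a
 * 64-bit ELF file (32-bit files would need Elf32_Ehdr instead). */
int read_elf_header(const char *path, Elf64_Ehdr *eh)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t n = fread(eh, 1, sizeof *eh, f);
    fclose(f);
    if (n != sizeof *eh)
        return -1;
    /* e_ident begins with the 4-byte magic "\177ELF" (SELFMAG == 4). */
    if (memcmp(eh->e_ident, ELFMAG, SELFMAG) != 0)
        return -1;
    if (eh->e_ident[EI_CLASS] != ELFCLASS64)
        return -1;
    return 0;
}

/* Print the header fields that locate the rest of the file. */
void print_elf_header(const Elf64_Ehdr *eh)
{
    printf("type=%u machine=%u entry=%#llx\n", (unsigned)eh->e_type,
           (unsigned)eh->e_machine, (unsigned long long)eh->e_entry);
    printf("%u program headers at offset %llu (%u bytes each)\n",
           (unsigned)eh->e_phnum, (unsigned long long)eh->e_phoff,
           (unsigned)eh->e_phentsize);
    printf("%u section headers at offset %llu; names in section %u\n",
           (unsigned)eh->e_shnum, (unsigned long long)eh->e_shoff,
           (unsigned)eh->e_shstrndx);
}
```

For example, read_elf_header("/bin/ls", &eh) followed by print_elf_header(&eh) should report e_type ET_DYN (3) for a position-independent executable, or ET_EXEC (2) for an older fixed-address one.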

Next, recall that readelf (see man readelf) will tell you almost anything you want to know about an ELF file on disk. In particular, an ELF ".o" file includes a symbol table and a relocation table. For full details, see ELF_Format.pdf, above.
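To see roughly what `readelf -S` does under the hood, here is a hedged sketch (64-bit ELF only, whole file read into memory, function names hypothetical). The ELF header gives the offset of the section header table (e_shoff) and the index of the section holding all section names (e_shstrndx); each section's sh_name is a byte offset into that name table.

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read a whole file into a malloc'ed buffer; returns NULL on failure. */
static unsigned char *load_file(const char *path, size_t *len)
{
    FILE *f = fopen(path, "rb");
    unsigned char *buf = NULL;
    long size;
    if (f == NULL)
        return NULL;
    if (fseek(f, 0, SEEK_END) == 0 && (size = ftell(f)) > 0 &&
        fseek(f, 0, SEEK_SET) == 0 && (buf = malloc((size_t)size)) != NULL) {
        if (fread(buf, 1, (size_t)size, f) == (size_t)size) {
            *len = (size_t)size;
        } else {
            free(buf);
            buf = NULL;
        }
    }
    fclose(f);
    return buf;
}

/* Print each section's index, name, type, and size (roughly `readelf -S`).
 * Returns the number of sections, or -1 if the file is not 64-bit ELF. */
int list_sections(const char *path)
{
    size_t len = 0;
    unsigned char *buf = load_file(path, &len);
    if (buf == NULL)
        return -1;
    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)buf;
    if (len < sizeof *eh || memcmp(eh->e_ident, ELFMAG, SELFMAG) != 0 ||
        eh->e_ident[EI_CLASS] != ELFCLASS64 || eh->e_shstrndx >= eh->e_shnum ||
        eh->e_shoff + (Elf64_Off)eh->e_shnum * sizeof(Elf64_Shdr) > len) {
        free(buf);
        return -1;
    }
    const Elf64_Shdr *sh = (const Elf64_Shdr *)(buf + eh->e_shoff);
    /* The section named by e_shstrndx (.shstrtab) holds all section names. */
    const char *names = (const char *)(buf + sh[eh->e_shstrndx].sh_offset);
    for (int i = 0; i < eh->e_shnum; i++)
        printf("[%2d] %-20s type=%-3u size=%llu\n", i, names + sh[i].sh_name,
               (unsigned)sh[i].sh_type, (unsigned long long)sh[i].sh_size);
    int n = eh->e_shnum;
    free(buf);
    return n;
}
```

Running list_sections on a ".o" file should show, among others, .symtab, .strtab, and one or more .rela.* relocation sections.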

Roughly, a symbol table is a table pairing each symbol with the address where that symbol is stored. ELF splits this into two sections: a symtab section (.symtab) and a strtab section (.strtab). The symtab is an array of structs (Elf64_Sym on 64-bit systems), one per symbol; each struct records the symbol's value (its address), size, type, and binding, but not its name. The strtab holds the names, as a sequence of strings, each terminated by the traditional null character. To find a symbol's name, take the st_name field of its symtab entry: it is a byte offset into the strtab, pointing at the first character of the name. The section header of the symtab identifies which strtab to use through its sh_link field.
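A minimal sketch of a symbol lookup by name, assuming a 64-bit ELF file that still has its .symtab (stripped files keep only .dynsym). It follows exactly the path described above: find the SHT_SYMTAB section, use its sh_link to find the strtab, and compare strtab + st_name against the wanted name. The helper names are hypothetical.

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read a whole file into a malloc'ed buffer; returns NULL on failure. */
static unsigned char *load_file(const char *path, size_t *len)
{
    FILE *f = fopen(path, "rb");
    unsigned char *buf = NULL;
    long size;
    if (f == NULL)
        return NULL;
    if (fseek(f, 0, SEEK_END) == 0 && (size = ftell(f)) > 0 &&
        fseek(f, 0, SEEK_SET) == 0 && (buf = malloc((size_t)size)) != NULL) {
        if (fread(buf, 1, (size_t)size, f) == (size_t)size) {
            *len = (size_t)size;
        } else {
            free(buf);
            buf = NULL;
        }
    }
    fclose(f);
    return buf;
}

/* Hypothetical helper: look up `name` in the .symtab of a 64-bit ELF file
 * and store its st_value (its address, for a defined symbol) in *value.
 * Returns 0 if found, -1 otherwise. */
int find_symbol(const char *path, const char *name, Elf64_Addr *value)
{
    size_t len = 0;
    unsigned char *buf = load_file(path, &len);
    if (buf == NULL)
        return -1;
    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)buf;
    if (len < sizeof *eh || memcmp(eh->e_ident, ELFMAG, SELFMAG) != 0 ||
        eh->e_ident[EI_CLASS] != ELFCLASS64 ||
        eh->e_shoff + (Elf64_Off)eh->e_shnum * sizeof(Elf64_Shdr) > len) {
        free(buf);
        return -1;
    }
    const Elf64_Shdr *sh = (const Elf64_Shdr *)(buf + eh->e_shoff);
    int result = -1;
    for (int i = 0; i < eh->e_shnum && result != 0; i++) {
        if (sh[i].sh_type != SHT_SYMTAB)
            continue;
        const Elf64_Sym *syms = (const Elf64_Sym *)(buf + sh[i].sh_offset);
        size_t nsyms = sh[i].sh_size / sizeof(Elf64_Sym);
        /* sh_link of the symtab section is the index of its strtab. */
        const char *strtab = (const char *)(buf + sh[sh[i].sh_link].sh_offset);
        for (size_t j = 0; j < nsyms; j++) {
            /* st_name is a byte offset into strtab, not a symbol index. */
            if (strcmp(strtab + syms[j].st_name, name) == 0) {
                *value = syms[j].st_value;
                result = 0;
                break;
            }
        }
    }
    free(buf);
    return result;
}
```

The value found for "main" in an unstripped executable should match the one shown by `readelf -s` or `nm` for the same file.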

Roughly, a relocation table is a list of entries, each pairing an address (for example, the location of the operand inside a machine instruction such as "jmp foo") with the symbol ("foo") used by that instruction. The relocation process consists of looking at each relocation entry, finding the corresponding symbol in the symbol table, and writing the symbol's address (or, for PC-relative instructions, its distance from the instruction) into the machine instruction at the address specified in the relocation entry. This type of relocation is the essence of static linking.
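Here is a hedged sketch of that relocation loop for x86-64 RELA entries, handling only two relocation types: R_X86_64_64 writes S + A (an absolute address), and R_X86_64_PC32 writes S + A - P (a 32-bit PC-relative distance, as used by "jmp foo"), where S is the symbol's value, A the addend, and P the address being patched. It assumes a little-endian host and a section already copied to memory; the function name apply_rela is hypothetical, and a real linker handles many more types plus overflow checks.

```c
#include <elf.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: apply `nrela` relocations to the section bytes in
 * `sec`, which will live at virtual address `sec_addr`. r_offset is the
 * offset of the field to patch within the section. Assumes a little-endian
 * host, so memcpy stores values in the target's byte order.
 * Returns 0 on success, -1 on an unsupported relocation type. */
int apply_rela(unsigned char *sec, Elf64_Addr sec_addr,
               const Elf64_Rela *rela, size_t nrela, const Elf64_Sym *symtab)
{
    for (size_t i = 0; i < nrela; i++) {
        unsigned char *where = sec + rela[i].r_offset;
        Elf64_Addr P = sec_addr + rela[i].r_offset;                 /* place */
        Elf64_Addr S = symtab[ELF64_R_SYM(rela[i].r_info)].st_value; /* symbol */
        Elf64_Sxword A = rela[i].r_addend;                          /* addend */
        switch (ELF64_R_TYPE(rela[i].r_info)) {
        case R_X86_64_64: {
            uint64_t v = S + (uint64_t)A;
            memcpy(where, &v, sizeof v);   /* memcpy: field may be unaligned */
            break;
        }
        case R_X86_64_PC32: {
            /* Truncate to 32 bits; a real linker would check for overflow. */
            int32_t v = (int32_t)(S + (uint64_t)A - P);
            memcpy(where, &v, sizeof v);
            break;
        }
        default:
            return -1;
        }
    }
    return 0;
}
```

For "jmp foo" encoded as the opcode 0xe9 followed by a 4-byte displacement at address 0x1000, the entry has r_offset 1 and addend -4, because the CPU measures the jump from the end of the instruction (0x1005). If foo is at 0x2000, the patched displacement is 0x2000 - 4 - 0x1001 = 0xffb, and 0x1005 + 0xffb = 0x2000.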

Next, a statically linked executable will use libraries like libmyfiles.a, which is simply an archive of ".o" files (plus, typically, an index of the symbols they define) and nothing more. Note that each ".o" file is a separate compilation unit, compiled without any knowledge of the other ".o" files. When we statically link, we use the symbol table and relocation table of each ".o" file to patch the machine instruction for something like "jmp foo" so that it contains the actual address of foo. This is done at compile/link time.
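To make "an archive of .o files" concrete, here is a minimal sketch that walks the members of a Unix ar archive held in memory. It assumes the common ar layout: the magic string "!<arch>\n", then for each member a 60-byte text header (name, date, uid, gid, mode, decimal size, and the two bytes "`\n") followed by the member data padded to an even length. The function name is hypothetical; special members such as "/" (the symbol index) and "//" (long file names) are simply counted and printed like any other member.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: print and count the members of an ar archive.
 * Header layout (offsets in bytes): name 0-15, date 16-27, uid 28-33,
 * gid 34-39, mode 40-47, size 48-57, terminator "`\n" at 58-59.
 * Returns the member count, or -1 if the buffer is not a valid archive. */
int count_ar_members(const unsigned char *buf, size_t len)
{
    static const char magic[] = "!<arch>\n";
    size_t pos = sizeof magic - 1;            /* skip the 8-byte magic */
    int count = 0;
    if (len < pos || memcmp(buf, magic, pos) != 0)
        return -1;
    while (pos + 60 <= len) {
        const unsigned char *hdr = buf + pos;
        if (hdr[58] != '`' || hdr[59] != '\n')
            return -1;
        char size_field[11];
        memcpy(size_field, hdr + 48, 10);     /* decimal, space-padded */
        size_field[10] = '\0';
        size_t size = (size_t)strtoul(size_field, NULL, 10);
        printf("member %d: %.16s (%zu bytes)\n", count, (const char *)hdr, size);
        count++;
        pos += 60 + size + (size & 1);        /* data is padded to even size */
    }
    return count;
}
```

With GNU ar, the first member of an archive built with a symbol index is usually "/", followed by the ".o" files themselves; `ar t libmyfiles.a` lists just the ".o" names.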

Similarly, a dynamically linked library, libmyfiles.so, is built from a bunch of ".o" files, but it also has its own ELF header, some program headers after the ELF header (describing the memory segments), and section headers, typically at the end of the file. The ".o" files in libmyfiles.so have already been linked together at compile/link time. If we declare symbols as __attribute__((visibility("hidden"))) in the C/C++ code, then those symbols are no longer visible or linkable from other ".so" files, because they are omitted from the dynamic symbol table (.dynsym) of libmyfiles.so.
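A small illustration of hidden visibility, using hypothetical function names: square_helper is marked hidden, so it can still be called from inside the library, while myfiles_area keeps the default visibility and is exported. The build and inspection commands in the comment assume gcc and GNU readelf.

```c
/* myfiles.c (hypothetical). Build a shared library with
 *     gcc -shared -fPIC -o libmyfiles.so myfiles.c
 * and inspect its dynamic symbol table with
 *     readelf --dyn-syms libmyfiles.so
 * Of the two functions below, only myfiles_area should be listed;
 * square_helper is hidden, so other .so files cannot link against it. */

__attribute__((visibility("hidden")))
int square_helper(int x)
{
    return x * x;
}

/* Default visibility: exported from libmyfiles.so. */
int myfiles_area(int side)
{
    return square_helper(side);   /* calls within the library still work */
}
```

Hidden visibility only affects what is exported from the finished ".so"; code in the same library can call square_helper directly, and the compiler can bind that call without going through the dynamic linker.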

Next comes the hard part. Suppose we want to load the various ".so" files into RAM and even link symbols across these libraries. At this point, you should read Figure 1-1 of the Introduction in ELF_Format.pdf.

The segments specified by the program headers are loaded into memory by the loader. Some sections are also loaded into memory, because they lie inside those segments (for example, .dynamic, .dynsym, and .dynstr), and the run-time linker uses them to finish linking between ".so" files. The loader itself is specified in the executable, in its PT_INTERP program header. Try doing
vi /bin/ls
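and you will notice something like the string /lib64/ld-linux-x86-64.so.2. The kernel finds this loader by parsing the ELF header, which leads it to the program headers and their PT_INTERP entry. The kernel then maps and starts the loader, which has roughly the same effect as executing: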
/lib64/ld-linux-x86-64.so.2 /bin/ls
Try executing the above line yourself, and you'll see that it works!
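The kernel's lookup can be imitated in a few lines. This is a minimal sketch, assuming a 64-bit ELF executable: read the ELF header, seek to e_phoff, scan the program headers for PT_INTERP, and read the path stored at that entry's p_offset. The function name read_interp is hypothetical.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: find the program interpreter (loader) requested by
 * an executable: ELF header -> e_phoff -> program headers -> PT_INTERP.
 * Copies the NUL-terminated path into buf and returns 0, or returns -1
 * (for example, for a statically linked executable with no PT_INTERP). */
int read_interp(const char *path, char *buf, size_t bufsize)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    Elf64_Ehdr eh;
    int result = -1;
    if (fread(&eh, sizeof eh, 1, f) == 1 &&
        memcmp(eh.e_ident, ELFMAG, SELFMAG) == 0 &&
        eh.e_ident[EI_CLASS] == ELFCLASS64) {
        for (int i = 0; i < eh.e_phnum; i++) {
            Elf64_Phdr ph;
            long off = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);
            if (fseek(f, off, SEEK_SET) != 0 || fread(&ph, sizeof ph, 1, f) != 1)
                break;
            if (ph.p_type != PT_INTERP)
                continue;
            /* p_filesz counts the terminating NUL byte of the path. */
            if (ph.p_filesz == 0 || ph.p_filesz > bufsize ||
                fseek(f, (long)ph.p_offset, SEEK_SET) != 0 ||
                fread(buf, 1, ph.p_filesz, f) != ph.p_filesz)
                break;
            buf[ph.p_filesz - 1] = '\0';
            result = 0;
            break;
        }
    }
    fclose(f);
    return result;
}
```

On a typical x86-64 glibc system, read_interp("/bin/ls", buf, sizeof buf) should produce the same /lib64/ld-linux-x86-64.so.2 string seen in the file, and `readelf -l /bin/ls` reports it as "Requesting program interpreter".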

So the loader works on its own: it loads /bin/ls and then the libraries that /bin/ls requires. Now try executing:
ldd /bin/ls
You'll see the extra ".so" libraries to be loaded. One of them is a variant of "ld.so" (on x86-64 Linux, typically the same /lib64/ld-linux-x86-64.so.2 seen above). Once it is loaded in memory, it also serves as the runtime dynamic linker.
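The libraries that an executable names directly are recorded as DT_NEEDED entries (dynamic tags) in its .dynamic section. Here is a hedged sketch, assuming a 64-bit ELF file and hypothetical helper names, that reads them from disk: the .dynamic section's sh_link gives the index of .dynstr, and each DT_NEEDED entry's d_val is an offset into .dynstr. Note that ldd shows more than this, because it also follows the dependencies of those libraries and includes the loader.

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read a whole file into a malloc'ed buffer; returns NULL on failure. */
static unsigned char *load_file(const char *path, size_t *len)
{
    FILE *f = fopen(path, "rb");
    unsigned char *buf = NULL;
    long size;
    if (f == NULL)
        return NULL;
    if (fseek(f, 0, SEEK_END) == 0 && (size = ftell(f)) > 0 &&
        fseek(f, 0, SEEK_SET) == 0 && (buf = malloc((size_t)size)) != NULL) {
        if (fread(buf, 1, (size_t)size, f) == (size_t)size) {
            *len = (size_t)size;
        } else {
            free(buf);
            buf = NULL;
        }
    }
    fclose(f);
    return buf;
}

/* Hypothetical helper: print the DT_NEEDED entries of a 64-bit ELF file.
 * Returns how many were found, or -1 if the file is not 64-bit ELF. */
int list_needed(const char *path)
{
    size_t len = 0;
    unsigned char *buf = load_file(path, &len);
    if (buf == NULL)
        return -1;
    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)buf;
    if (len < sizeof *eh || memcmp(eh->e_ident, ELFMAG, SELFMAG) != 0 ||
        eh->e_ident[EI_CLASS] != ELFCLASS64 ||
        eh->e_shoff + (Elf64_Off)eh->e_shnum * sizeof(Elf64_Shdr) > len) {
        free(buf);
        return -1;
    }
    const Elf64_Shdr *sh = (const Elf64_Shdr *)(buf + eh->e_shoff);
    int count = 0;
    for (int i = 0; i < eh->e_shnum; i++) {
        if (sh[i].sh_type != SHT_DYNAMIC)
            continue;
        const Elf64_Dyn *dyn = (const Elf64_Dyn *)(buf + sh[i].sh_offset);
        size_t ndyn = sh[i].sh_size / sizeof(Elf64_Dyn);
        /* sh_link of .dynamic is the index of .dynstr, where names live. */
        const char *dynstr = (const char *)(buf + sh[sh[i].sh_link].sh_offset);
        for (size_t j = 0; j < ndyn && dyn[j].d_tag != DT_NULL; j++) {
            if (dyn[j].d_tag == DT_NEEDED) {
                /* For DT_NEEDED, d_val is an offset into .dynstr. */
                printf("NEEDED %s\n", dynstr + dyn[j].d_un.d_val);
                count++;
            }
        }
    }
    free(buf);
    return count;
}
```

On a typical glibc system, list_needed("/bin/ls") should print a short list ending in libc.so.6, comparable to the "(NEEDED)" lines of `readelf -d /bin/ls`.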

Next, if one library makes a call to foo in a different library, then it actually calls a stub function (a PLT entry) that calls a patch-up routine in the runtime dynamic linker, ld.so. TODO: FINISH THIS, AND THEN THE HIGH-LEVEL VIEW, WITH DYNAMIC TAGS.