ELF relative relocations explained
2023-09-18Background
Consider this code, where one piece of global data is a pointer to another piece of global data:
void *foo;
void *bar = &foo;
During program execution, the memory at bar
contains an
address of foo
. One way to achieve this is to require all
the data to be loaded at a certain address. Then the linker can simply
pre-calculate the address at link time and fill in the address for
bar
.
Having a program runnable from only one fixed address may be undesirable, however:
- If we’re a shared library, this would require a global registry of library addresses to avoid collision, and would be a massive pain.
- ASLR by definition requires randomized addresses and is in direct conflict of fixed addresses.
- In early boot scenarios such as bootloaders or EFI applications or drivers, virtual memory is not available, and it’s desirable to place executables out of the way of one another at runtime.
If something can work at any (suitably aligned) address, it is position-independent. Otherwise it’s position-dependent.
By compiling code to use PC-relative addressing, executable code can be made position-independent relatively easily. (Ha!) However the problem with global pointers is more fundamental. Again, take a look back at the code:
void *foo;
void *bar = &foo;
If we don’t want to change the representation of a
void *
, and the address of global data is only known at
runtime, something has to, at runtime, figure out the address
of foo
and put it in bar
before user code can
see it.
Relative relocations
On ELF systems, this problem is handled with dynamic relocations. We’ll tackle the most simple case here: relative relocations.
We’ll handle the simplest case and assume everything (code, data) in an executable must still be contiguous and cannot move relative to one another. The entire executable will move as a whole to some “base address” that is only known at runtime. Then everything in the executable will be at a fixed offset from the base address.
We will also not need to link to dynamic shared libraries, and everything we need is in the executable. This is known as a statically-linked position-independent executable, or static-pie.
Conceptually, the three major steps that occur to make the code example work are:
Compiler: Generates a relocatable file (object file) with static relocations saying this:
(8 bytes) // foo (8 bytes) // bar, Please fill in the address of foo
Linker: Places everything the compiler says to place at offsets from the base and generates a static-pie with dynamic relocations:
offset contents 0x100 (8 bytes) 0x108 (8 bytes) // Please fill in value of (base + 0x100)
Startup code: Figures out
base
, looks at the dynamic relocations, and fills in the correct addresses:addr contents 0xabc100 0 0xabc108 0xabc100
We’ll ignore the relocatable file and focus on the static-pie.
RELA and REL
An instruction like “please fill in value of
(base + one_offset)
at
(base + another_offset)
” is called a relocation.
The most obvious way to encode such a relocation is to just have a table
of offsets. Like maybe an array of “RELA” entries:
struct {
uintptr_t offset;
uintptr_t info;
intptr_t addend;
}
info
encodes information. The low 8 (for 32-bit ELF) or
32 (for 64-bit ELF) bits encodes the type of relocation and higher bits
is a symbol index. For our purposes the type is just
R_*_RELATIVE
, and since it’s just an offset and there’s no
symbol, all the higher bits are 0.
(In elf.h
speak, ELF*_R_INFO(sym, type)
encodes the symbol index and relocation type into info
(*
is 32
or 64
), and
ELF*_R_TYPE(info)
, ELF*_R_SYM(info)
unpacks
the fields back.)
offset
is the offset from the base address to which the
entry applies to, and the addend is… the value to be added, in this case
added to the base address.
(Note: The same structure is used for static relocations but the meanings of fields are not identical. That’s irrelevant to this article.)
To put everything together again, for each entry in the array, if
info == R_*_RELATIVE
, then it means “Please fill in the
value of (base + addend)
at (base + offset)
”.
The algorithm to run to apply these relocations is also pretty
simple:
uintptr_t base = ...;
for (rela = ...) {
if (ELF*_R_TYPE(rela->info) == R_*_RELATIVE) {
*(uintptr_t *)(base + rela->offset) = base + rela->addend;
} else ...
}
On some architectures a simplified structure “REL” is used, like:
struct {
uintptr_t offset;
uintptr_t info;
}
The addend
field is gone from the table, and is just
stashed into where offset
points. The meaning of entries is
otherwise the same.
for (rel = ...) {
if (ELF*_R_TYPE(rel->info) == R_*_RELATIVE) {
*(uintptr_t *)(base + rel->offset) += base;
} else ...
}
The .rel[a].dyn
section, and DT_REL[A]*
The linker synthesizes an array of tag-value pairs of dynamic linking
information, and also synthesizes a symbol _DYNAMIC
to
point to the start of it:
struct {
uintptr_t tag;
uintptr_t value;
} _DYNAMIC[];
The format of the dynamic array does not have an explicit size and is
terminated by an entry with tag
as
DT_NULL = 0
. The relevant tags for us are:
tag
isDT_RELA = 7
,value
is offset of array of RELA entriestag
isDT_RELASZ = 8
,value
is total size in bytes of the array of RELA entriestag
isDT_RELAENT = 9
,value
is the size in bytes of one RELA entry. If there ever is a future need for extending the contents RELA entry, this will grow larger.
This allows us to find where the array of RELA entries is. We can
just find _DYNAMIC
and the base address using PC-relative
addressing, and we can run the relocation algorithm from the previous
section.
// Find these from _DYNAMIC
size_t dt_rela, dt_relasz, dt_relaent;
for (char *ptr = (char*)(dt_rela + base);
ptr < (char*)(base + dt_rela + dt_relasz);
ptr += dt_relaent) {
Rela *rela = (Rela*)ptr;
// Handle rela
}
Perhaps just for convenience, the linker puts the
_DYNAMIC
array in a section called .dynamic
,
and the dynamic RELA relocations in .rela.dyn
.
(If you have an operating system giving you ELF auxillary vectors,
you can look up the base address as AT_BASE
, and look up
where _DYNAMIC
is by looking for AT_PHDR
to
find the PT_DYNAMIC
segment. I suppose this is more
portable.)
A similar thing happens for REL, with the .rel.dyn
section, and tags DT_REL = 17
, DT_RELSZ = 18
,
DT_RELENT = 19
. Not much more to say about those.
Linker script
trickery for the .rel[a].dyn
section
If you’re building a bootloader or something and just want to, you
don’t have to use _DYNAMIC
. You can just use the
linker script to give you symbols that bound the .rel[a]
array.
.rela.dyn : {
__rela_start = .;
*(.rela*)
__rela_end = .;
}
RELR
The RELA format is grossly inefficient space-wise at representing relocations. REL is 1/3 better but is only really a thing on some architectures. Can we do even better?
For position independent executables it is often the case that there are a lot of contiguous addresses needing relative relocation. For example, this is one relocation per every string:
const char *string_table = {
"one string",
"another string",
"a third string",
...
};
RELR is a new packed format to store dynamic relative relocations more efficiently with bitmaps. From what I can tell it’s not really well documented, but support exists in e.g. glibc and musl. The linker LLD supports it fully. BFD ld supports it for x86 only.
The closest to an official documentation is one email on the general-abi mailing list that is a proposal for the RELR format.
The idea originates from Google for their Android stuff, but the format eventually settled on is this:
- Each entry is
uintptr_t
long. - If an entry is even, it means the
uintptr_t
at this address needs relocating. - If an entry is odd, the bits except the least significant bit, from
lowest to highest, is a bitmap encoding which of the
sizeof(uintptr_t) * 8 - 1
uintptr_t
values after the last address handled needs relocating.
(It is simply assumed that all offsets needing relocation is aligned.)
For example, on a 64-bit system, if we want 64 contiguous relocations, we can just represent it with two entries:
0x0000_0000_0001_0000 The one uintptr_t at offset 0x10000 needs relocating
0xffff_ffff_ffff_ffff The next 63 uintptr_t values after that needs relocating
(Offset 0x10008 through 0x101f8)
The last address handled is 0x101f8
, so if the next
entry is still a bitmap it would start from 0x10200
. If we
want one more relocation, making it 65 instead of 64:
0x0000_0000_0000_0003 Of the next 63 words, only the first one needs relocating
(Offset 0x10200)
(Remember that the least significant bit isn’t part of the bitmap.)
Even though bitmap only has one bit set, it still handles 63
uintptr_t
values, so if the next entry is still a bitmap it
would start from 0x103f8
.
If we want to skip to another address, just put in the address. In fact if you want to be lazy and don’t want to figure out how to generate bitmaps, you can skip this altogether and just put down a list of addresses, and it’s still valid.
As simple as RELR is, the savings are remarkable. For our example, 65 relocations are represented with 24 bytes with RELR. In RELA that’s 1560 bytes. This is a space saving of 98%. According to some measurements this saves around 5% to 20% of binary size for PIEs. On most modern computers it is likely that the more compact in-memory representation is also faster due to less memory accesses.
The algorithm in more detail is like this:
uintptr_t next; // Next address to relocate
for (uintptr_t entry ...) {
if ((entry & 1) == 0) {
*(uintptr_t *)(base + entry) += base; // Like REL
// Next to relocate is the word after the one pointed to by entry
next = base + entry + sizeof(uintptr_t);
} else {
// The bitmap handles the next (sizeof(uintptr_t) * 8 - 1) words
for (i = 0; i < sizeof(uintptr_t) * 8 - 1; i ++) {
if ((entry >> (i + 1)) & 1) {
// If bit (i + 1) is set, word i needs relocating
*(uintptr_t *)(next + sizeof(uintptr_t) * i) += base;
}
}
// Next to relocate is after that
next += sizeof(uintptr_t) * (8 * sizeof(uintptr_t) - 1);
}
}
For the bitmap case, it might be more convenient to write it like this:
uintptr_t tmp = next;
for (; entry >>= 1; tmp += sizeof(uintptr_t)) {
if (entry & 1) {
*(uintptr_t *)tmp += base;
}
}
next += sizeof(uintptr_t) * (8 * sizeof(uintptr_t) - 1);
RELR uses .relr.dyn
section and tags
DT_RELR = 36
, DT_RELRSZ = 35
,
DT_RELRENT = 37
. (Note: The values aren’t sequential!)
Contrary to RELA, and most struct arrays in ELF, for that matter, RELR
doesn’t seem to designed to be extensible, since it’s just a compact
form of REL.
Since only dynamic relative relocations are represented in RELR, it does not replace RELA/REL. The proposal states that dynamic RELR relocations should be processed before REL/RELA.
A note on “base address”
The base address mentioned above is not actually the start of the
binary, but the offset of which the segments in a PIE (or shared
library) has been moved, so real_address - segment_address
.
Since PIEs are usually linked to starting at 0
this checks
out. If your linker script links everything starting at somewhere else,
like:
SECTIONS {
. = 0x80000000;
__executable_start = .;
// ...
}
The base address is actually
real __executable_start - 0x80000000
. No idea why you would
do that though, since the whole point of PIE is to not have a fixed
start address…