0x01 Before we start
Last year, I spent most of my time developing on macOS. However, I won’t be continuing with it, so I decided to write a series of blog posts documenting some of the interesting things I learned. This is the first post, and it covers how to hook syscalls in the macOS kernel.
By the way, it’s been a long time since I’ve written anything in English, so this post may not read smoothly. Please bear with me.
0x02 Why hook in the kernel
As you may know, macOS’s dynamic loader (dyld) provides a simple way to inject code, which Apple calls DYLD_INTERPOSE. All we need to do is declare and implement a replacement function, declare a structure that holds both the original function and the replacement function, and use attribute syntax to store that structure in the __DATA,__interpose section. Once the program is loaded by dyld, the hook takes effect automatically.
For example, to hook the open syscall:
1 | /* |
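As a rough, self-contained sketch of the interposing idea described above (not necessarily identical to the listing in this post), the code below defines a replacement for open and a structure pairing it with the original. The names my_open, interpose_tuple, and interpose_open are hypothetical. The section attribute is guarded by __APPLE__ because only dyld gives that section any meaning.

```c
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical replacement for open(2): log the path, then forward the call. */
int my_open(const char *path, int flags, ...)
{
    mode_t mode = 0;
    if (flags & O_CREAT) {              /* the mode argument only exists with O_CREAT */
        va_list ap;
        va_start(ap, flags);
        mode = (mode_t)va_arg(ap, int);
        va_end(ap);
    }
    fprintf(stderr, "[interpose] open(\"%s\")\n", path);
    return open(path, flags, mode);     /* forward to the original open */
}

/* One (replacement, original) pair, stored in the interpose section. */
struct interpose_tuple {
    const void *replacement;
    const void *replacee;
};

#ifdef __APPLE__                        /* the section only matters to dyld */
__attribute__((used, section("__DATA,__interpose")))
#endif
static const struct interpose_tuple interpose_open = {
    (const void *)my_open,
    (const void *)open,
};
```

On macOS, a file like this would typically be built as a dynamic library (for example with clang -dynamiclib) so that dyld can pick up the section when the library is loaded.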
In addition, to hook processes that are not developed by us, we can build a shared library containing a hook and a DYLD_INTERPOSE structure, then inject the library into the process using the DYLD_INSERT_LIBRARIES environment variable. This allows us to replace the specific function with the hook from the library.
Compared to kernel hooking, this method can hook C functions in userspace and is not limited to syscalls. However, it also has the limitation of being unable to hook system processes, which makes kernel hooking sometimes necessary.
0x01 sysent
System calls are defined by a sysent structure in the kernel, which can be found in the <sys/sysent.h> header as follows:
1 | struct sysent { |
In macOS, the sysent table is simply an array of sysent structures. It is declared in the <sys/sysent.h> header as follows:
extern struct sysent sysent[];
When a userspace process makes a syscall, the syscall number is used to find the corresponding syscall implementation.
For example, the number of the exit syscall is 1.
#define SYS_exit 1
The corresponding entry for the exit syscall can be found in sysent[1].
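To make the lookup concrete, here is a small userspace model of it. This is a sketch under assumptions, not kernel code: the struct is a simplified version of sysent (the real one may carry extra fields depending on the kernel configuration), and mock_sysent, mock_nosys, mock_exit, and dispatch are hypothetical names standing in for the real table and handlers.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified handler type: process, argument block, return-value slot. */
typedef int32_t sy_call_t(void *proc, void *uap, int *retval);

/* Simplified model of a table entry. */
struct sysent {
    sy_call_t *sy_call;         /* implementing function */
    int32_t    sy_return_type;  /* return type of the syscall */
    int16_t    sy_narg;         /* number of arguments */
    uint16_t   sy_arg_bytes;    /* argument size for 32-bit callers */
};

#ifndef SYS_exit
#define SYS_exit 1
#endif

/* Mock handlers standing in for the kernel's implementations. */
static int32_t mock_nosys(void *proc, void *uap, int *retval)
{
    (void)proc; (void)uap; (void)retval;
    return ENOSYS;
}

static int32_t mock_exit(void *proc, void *uap, int *retval)
{
    (void)proc; (void)uap;
    *retval = 1;                /* marker so a caller can see which handler ran */
    return 0;
}

/* Mock table: the index is the syscall number. */
static struct sysent mock_sysent[] = {
    { mock_nosys, 0, 0, 0 },    /* 0 */
    { mock_exit,  0, 1, 4 },    /* 1 == SYS_exit */
};
static const size_t mock_nsysent = sizeof(mock_sysent) / sizeof(mock_sysent[0]);

/* Look up the entry by number and call its sy_call. */
int32_t dispatch(unsigned int code, void *uap, int *retval)
{
    if (code >= mock_nsysent)
        return ENOSYS;          /* simplification: reject out-of-range numbers */
    return mock_sysent[code].sy_call(NULL, uap, retval);
}
```

Because the dispatcher only ever goes through sy_call, whoever controls that pointer controls what the syscall does, which is what the rest of this post relies on.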
Another important point is that kernel extensions (kexts) share the same address space as the kernel. So we can build a kext, load it into the system, locate the sysent table from inside the kext, and replace the sy_call address of the target syscall with the address of our hook.
0x02 Find the sysent table
The first step is to find the sysent table.
On macOS 10.13 - 10.14, the sysent table is stored in the __CONST,__constdata section.
On macOS 10.15 - 14, the sysent table is stored in the __DATA_CONST,__const section.
Find the section address
To find the __CONST,__constdata or __DATA_CONST,__const section that contains the sysent table, we need to load the kernel Mach-O file and parse its Mach-O header in our kext.
Pseudocode as follows:
1 | // convert kernel path into a vnode. |
In parser_kernel_header, we can get the following important information:
1 | The VM address, VM size, file offset, file size of the __TEXT segment and its section. |
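The header walk itself can be sketched in plain C. The version below is an assumption-laden illustration rather than the parser used in this post: it works on a buffer that already holds a 64-bit Mach-O header and its load commands, and it declares its own minimal copies of the Mach-O structures (mh64, lcmd, seg64, sect64, hypothetical names mirroring the layouts in <mach-o/loader.h>) so that it compiles anywhere. find_section is also a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MH64_MAGIC      0xfeedfacfu   /* 64-bit Mach-O magic */
#define LCMD_SEGMENT_64 0x19u         /* 64-bit segment load command */

struct mh64 {                         /* Mach-O 64-bit header */
    uint32_t magic; int32_t cputype; int32_t cpusubtype; uint32_t filetype;
    uint32_t ncmds; uint32_t sizeofcmds; uint32_t flags; uint32_t reserved;
};
struct lcmd { uint32_t cmd; uint32_t cmdsize; };
struct seg64 {                        /* segment command, followed by its sections */
    uint32_t cmd; uint32_t cmdsize; char segname[16];
    uint64_t vmaddr; uint64_t vmsize; uint64_t fileoff; uint64_t filesize;
    int32_t maxprot; int32_t initprot; uint32_t nsects; uint32_t flags;
};
struct sect64 {                       /* one section inside a segment */
    char sectname[16]; char segname[16];
    uint64_t addr; uint64_t size;
    uint32_t offset; uint32_t align; uint32_t reloff; uint32_t nreloc;
    uint32_t flags; uint32_t reserved1; uint32_t reserved2; uint32_t reserved3;
};

/* Walk the load commands and return the named section, or NULL. */
const struct sect64 *find_section(const void *image,
                                  const char *segname, const char *sectname)
{
    const struct mh64 *mh = image;
    if (mh->magic != MH64_MAGIC)
        return NULL;

    const uint8_t *p = (const uint8_t *)(mh + 1);   /* first load command */
    for (uint32_t i = 0; i < mh->ncmds; i++) {
        const struct lcmd *lc = (const struct lcmd *)p;
        if (lc->cmd == LCMD_SEGMENT_64) {
            const struct seg64 *sg = (const struct seg64 *)p;
            if (strncmp(sg->segname, segname, 16) == 0) {
                const struct sect64 *s = (const struct sect64 *)(sg + 1);
                for (uint32_t j = 0; j < sg->nsects; j++)
                    if (strncmp(s[j].sectname, sectname, 16) == 0)
                        return &s[j];
            }
        }
        p += lc->cmdsize;                           /* next load command */
    }
    return NULL;
}
```

The addr and size fields of the returned section are unslid values taken from the file, so they still need the slide applied before they can be used as runtime addresses.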
Note that the VM addresses in the Mach-O header do not include the KASLR slide (a security mitigation that randomizes the kernel’s load address). So we need to calculate the slide and add it to each VM address to get the real address.
Apple provides a convenient function, vm_kernel_unslide_or_perm_external, which takes a slid address and outputs the corresponding unslid address. We can use it to compute the KASLR slide. The pseudocode is as follows:
1 | vm_offset_t func_address = (vm_offset_t)vm_kernel_unslide_or_perm_external; |
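The arithmetic behind this is simply "slid address minus unslid address". The sketch below shows it in userspace with a mock in place of the kernel function. kaddr_t, MOCK_SLIDE, mock_unslide, and compute_kernel_slide are hypothetical names, and the callback is assumed to have the same shape as vm_kernel_unslide_or_perm_external (an input address and an output pointer).

```c
#include <stdint.h>

typedef uintptr_t kaddr_t;            /* stand-in for the kernel's vm_offset_t */

/* Assumed shape of the unslide routine: input address, output pointer. */
typedef void (*unslide_fn)(kaddr_t addr, kaddr_t *up_addr);

/* Mock: pretend the kernel was loaded with this slide. */
#define MOCK_SLIDE ((kaddr_t)0x200000)

static void mock_unslide(kaddr_t addr, kaddr_t *up_addr)
{
    *up_addr = addr - MOCK_SLIDE;
}

/* slide = (address as seen at runtime) - (address as linked) */
kaddr_t compute_kernel_slide(kaddr_t slid_addr, unslide_fn unslide)
{
    kaddr_t unslid = 0;
    unslide(slid_addr, &unslid);
    return slid_addr - unslid;
}
```

In a kext, the slid address would be the address of any kernel function (the post uses vm_kernel_unslide_or_perm_external itself), and the real function would take the place of mock_unslide.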
On macOS versions earlier than 11.0, this is already enough to locate the __DATA_CONST (or __CONST) segment: its runtime address is the segment VM address + kernel slide. The memory layout is as follows:
1 | 0xffffff8000200000(Kernel header base address) |
But in macOS versions greater than or equal to 11.0, there is another slide called the ‘cache slide’ that needs to be resolved.
On macOS Big Sur and up, ‘slide’ is set to the “kernel cache slide” – an offset to the location of the “kernel cache”, which contains the kernel and a bunch of kernel extensions. The kernel itself is inside this “cache”. Find it to determine the “kernel slide”. (Quoted from hookcase.)
The memory layout then looks like this:
1 | 0xffffff8000200000(Kernel header base address) |
The fix is easy: resolve the address of an exported symbol such as _printf from the symbol table, add the slide to it, and compare the result with the symbol’s real address. The difference between the two is the cache slide.
Pseudocode:
1 | _printf = get_symbol(kinfo, "_printf"); |
Brute-force the sysent table
Once we have found the __CONST,__constdata or __DATA_CONST,__const section, we can brute-force our way to the sysent table.
The following pseudocode retrieves the function addresses of the first three system calls: NOSYS, EXIT, and FORK. It then walks the data in the const section, interprets it as sysent structures, and compares the sy_call field of each candidate with those function addresses. If all three match, the sysent table is found.
1 | nosys = (sy_call_t *)get_symbol(kinfo, "_nosys"); |
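A userspace sketch of that scan is shown below, under a few assumptions: the struct is a simplified model of sysent, the scan advances one pointer width at a time, and find_sysent, call_at, and the mock_* handlers are hypothetical names. In a kext, the buffer would be the slid const section and the three addresses would come from the symbol table.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef int32_t sy_call_t(void *proc, void *uap, int *retval);

struct sysent {                 /* simplified model of the kernel struct */
    sy_call_t *sy_call;
    int32_t    sy_return_type;
    int16_t    sy_narg;
    uint16_t   sy_arg_bytes;
};

/* Mock handlers standing in for nosys, exit and fork. */
static int32_t mock_nosys(void *p, void *u, int *r) { (void)p; (void)u; (void)r; return 0; }
static int32_t mock_exit(void *p, void *u, int *r)  { (void)p; (void)u; (void)r; return 1; }
static int32_t mock_fork(void *p, void *u, int *r)  { (void)p; (void)u; (void)r; return 2; }

/* Read the sy_call pointer of the idx-th candidate entry starting at base. */
static sy_call_t *call_at(const uint8_t *base, size_t idx)
{
    sy_call_t *fn;
    memcpy(&fn, base + idx * sizeof(struct sysent) + offsetof(struct sysent, sy_call),
           sizeof(fn));
    return fn;
}

/* Scan [sect, sect + size) for three consecutive entries whose sy_call
 * fields match nosys, exit and fork, in that order. */
struct sysent *find_sysent(void *sect, size_t size,
                           sy_call_t *nosys, sy_call_t *exit_fn, sy_call_t *fork_fn)
{
    uint8_t *p = sect;
    if (size < 3 * sizeof(struct sysent))
        return NULL;

    for (size_t off = 0; off <= size - 3 * sizeof(struct sysent); off += sizeof(void *)) {
        if (call_at(p + off, 0) == nosys &&
            call_at(p + off, 1) == exit_fn &&
            call_at(p + off, 2) == fork_fn)
            return (struct sysent *)(p + off);
    }
    return NULL;
}
```

Checking three consecutive entries instead of one keeps false positives unlikely, since a lone pointer to nosys can appear elsewhere in the section.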
0x03 Hook the syscall
Hooking the syscall is the last and easiest step. Disable interrupts, disable preemption, and disable CPU write protection. Then replace the sy_call field in the sysent entry of the target syscall, and re-enable everything afterwards.
Pseudocode:
1 | // disable interrupt |
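The pointer swap can be illustrated in userspace against a mock table. This is only a sketch: the privileged steps (interrupts, preemption, write protection) appear as comments because they have no userspace equivalent, and swap_sy_call, hooked_exit, orig_exit, and hook_hits are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

typedef int32_t sy_call_t(void *proc, void *uap, int *retval);

struct sysent {                 /* simplified model of the kernel struct */
    sy_call_t *sy_call;
    int32_t    sy_return_type;
    int16_t    sy_narg;
    uint16_t   sy_arg_bytes;
};

static sy_call_t *orig_exit;    /* original handler, saved when hooking */
static int hook_hits;           /* how many times the hook has run */

/* Mock of the original handler. */
static int32_t mock_exit(void *proc, void *uap, int *retval)
{
    (void)proc; (void)uap;
    *retval = 7;
    return 0;
}

/* Hook: record the call, then forward to the original handler. */
static int32_t hooked_exit(void *proc, void *uap, int *retval)
{
    hook_hits++;
    return orig_exit(proc, uap, retval);
}

/* Replace table[num].sy_call, optionally returning the old pointer. */
int swap_sy_call(struct sysent *table, size_t nsysent, unsigned int num,
                 sy_call_t *new_call, sy_call_t **old_call)
{
    if (table == NULL || new_call == NULL || num >= nsysent)
        return -1;

    /* kext only: disable interrupts, preemption and write protection here */
    if (old_call != NULL)
        *old_call = table[num].sy_call;
    table[num].sy_call = new_call;
    /* kext only: restore write protection, preemption and interrupts here */

    return 0;
}
```

Saving the old pointer serves two purposes: the hook can forward to the original implementation, and the kext can put the original back before it is unloaded.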