macOS kernel syscall hook
lyq1996

0x01 Before we start

Last year, I spent most of my time developing on macOS. However, I am no longer working on it, so I decided to write a series of blog posts to document some interesting things. This is the first post, about how to hook syscalls in the macOS kernel.

By the way, it has been a long time since I wrote any documents in English. This post may not read smoothly, so please be patient.

0x02 Why hook in the kernel

As you may know, macOS’s dynamic loader (dyld) provides a simple way for code injection, which Apple calls DYLD_INTERPOSE. All we need to do is declare and implement a replacement function, then declare a structure that includes the original function and the replacement function, and use attribute syntax to store the structure in the __DATA,__interpose section. Then, once the program is loaded by dyld, the hook will automatically take effect.

For example, to hook the open syscall:

/*
 * Example:
 *
 *  static
 *  int
 *  my_open(const char* path, int flags, mode_t mode)
 *  {
 *      int value;
 *      // do stuff before open (including changing the arguments)
 *      value = open(path, flags, mode);
 *      // do stuff after open (including changing the return value(s))
 *      return value;
 *  }
 *  DYLD_INTERPOSE(my_open, open)
 */

#define DYLD_INTERPOSE(_replacement,_replacee) \
    __attribute__((used)) static struct{ const void* replacement; const void* replacee; } _interpose_##_replacee \
    __attribute__ ((section ("__DATA,__interpose"))) = { (const void*)(unsigned long)&_replacement, (const void*)(unsigned long)&_replacee };

In addition, to hook processes that are not developed by us, we can build a shared library containing a hook and a DYLD_INTERPOSE structure. Then, inject the library into the process using the DYLD_INSERT_LIBRARIES environment variable. This allows us to replace the specific function with the hook from the library.
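A minimal sketch of such a library, assuming the target reaches open through libc. The file name hook.c, the function name my_open, and the log format are illustrative and not taken from any real project. Because open is declared variadic, the sketch reads mode with va_arg instead of assuming a fixed third parameter. The interpose tuple is only emitted on Apple platforms.

```c
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>
#include <sys/types.h>

#define DYLD_INTERPOSE(_replacement, _replacee) \
    __attribute__((used)) static struct { \
        const void *replacement; \
        const void *replacee; \
    } _interpose_##_replacee \
    __attribute__((section("__DATA,__interpose"))) = { \
        (const void *)(unsigned long)&_replacement, \
        (const void *)(unsigned long)&_replacee \
    };

/* Replacement for open(): log the call, then forward to the real open(). */
int my_open(const char *path, int flags, ...)
{
    mode_t mode = 0;

    /* The mode argument is only passed when O_CREAT is set. */
    if (flags & O_CREAT) {
        va_list ap;
        va_start(ap, flags);
        mode = (mode_t)va_arg(ap, int);
        va_end(ap);
    }

    fprintf(stderr, "[hook] open(\"%s\", 0x%x)\n", path, flags);
    return open(path, flags, mode);
}

#ifdef __APPLE__
/* dyld reads this tuple at load time and rebinds open to my_open. */
DYLD_INTERPOSE(my_open, open)
#endif
```

Built with `clang -dynamiclib hook.c -o libhook.dylib`, the library can then be injected with `DYLD_INSERT_LIBRARIES=./libhook.dylib ./target`, where ./target stands for whatever program is being hooked.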

Compared to kernel hooking, this method can hook any C function in userspace and is not limited to syscalls. However, it cannot hook system processes: with System Integrity Protection enabled, dyld ignores DYLD_INSERT_LIBRARIES for protected binaries. This is why kernel hooking is sometimes necessary.

0x01 sysent

System calls are defined by a sysent structure in the kernel, which can be found in the <sys/sysent.h> header as follows:

struct sysent {
    /* implementing function */
    sy_call_t *sy_call;

    /* system call arguments munger for 32-bit process */
#if CONFIG_REQUIRES_U32_MUNGING || (__arm__ && (__BIGGEST_ALIGNMENT__ > 4))
    sy_munge_t *sy_arg_munge32;
#endif
    /* system call return types */
    int32_t sy_return_type;
    /* number of args */
    int16_t sy_narg;

    /* Total size of arguments in bytes for
     * 32-bit system calls
     */
    uint16_t sy_arg_bytes;
};

In macOS, the sysent table is simply an array of sysent structures. It is declared in the <sys/sysent.h> header as follows:

extern struct sysent sysent[];

When a userspace process makes a syscall, the syscall number is used as an index to find the corresponding syscall implementation.

For example, the number of the exit syscall is 1.

#define SYS_exit 1

The corresponding entry for the exit syscall can be found in sysent[1].
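The lookup can be modelled in userspace as a plain array index. This is only a sketch: the model_ names, the two-entry table, and the fallback to entry 0 for unknown numbers are assumptions made for the illustration, not the kernel's actual dispatch code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same shape as the kernel's sy_call_t: (process, arguments, return value). */
typedef int32_t sy_call_t(void *proc, void *uap, int *retval);

/* Simplified stand-in for struct sysent (no 32-bit munger field). */
struct model_sysent {
    sy_call_t *sy_call;
    int32_t    sy_return_type;
    int16_t    sy_narg;
    uint16_t   sy_arg_bytes;
};

#define MODEL_SYS_exit 1

static int32_t model_nosys(void *proc, void *uap, int *retval)
{
    (void)proc;
    (void)uap;
    *retval = -1;
    return 78; /* ENOSYS is 78 on macOS */
}

static int32_t model_exit(void *proc, void *uap, int *retval)
{
    (void)proc;
    *retval = *(int *)uap; /* echo the requested exit status */
    return 0;
}

static struct model_sysent model_table[] = {
    { model_nosys, 0, 0, 0 }, /* 0: placeholder entry */
    { model_exit,  0, 1, 4 }, /* 1: exit(int status)  */
};

/* Index the table with the syscall number and call the implementation. */
int32_t model_dispatch(unsigned int number, void *uap, int *retval)
{
    size_t count = sizeof(model_table) / sizeof(model_table[0]);

    assert(retval != NULL);
    /* In this model, unknown numbers fall back to the placeholder entry. */
    const struct model_sysent *entry = &model_table[number < count ? number : 0];
    return entry->sy_call(NULL, uap, retval);
}
```

Calling model_dispatch(MODEL_SYS_exit, &status, &rv) ends up in model_exit, in the same way that syscall number 1 ends up in sysent[1].sy_call.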

Another important point is that kernel extensions (kexts) share the kernel's address space. So we can build a kext, load it into the system, locate the sysent table from inside the kext, and replace the sy_call pointer of the target syscall with our hook.

0x02 Find the sysent table

The first step is to find the sysent table.

On macOS 10.13 - 10.14, the sysent table is stored in the __CONST,__constdata section.
On macOS 10.15 - 14, the sysent table is stored in the __DATA_CONST,__const section.
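The version ranges above can be turned into a small lookup. This is a sketch: the enum and function names are made up for the illustration, and any version outside the two ranges is reported as unknown rather than guessed.

```c
#include <assert.h>
#include <stddef.h>

struct section_name {
    const char *segment;
    const char *section;
};

enum sysent_home {
    SYSENT_HOME_UNKNOWN = 0,
    SYSENT_HOME_CONST_CONSTDATA,  /* __CONST,__constdata  (10.13 - 10.14) */
    SYSENT_HOME_DATA_CONST_CONST  /* __DATA_CONST,__const (10.15 - 14)    */
};

/* Map a macOS version (for example 10.14 or 12.0) to the section holding sysent. */
enum sysent_home sysent_home_for(int major, int minor)
{
    assert(major > 0 && minor >= 0);
    if (major == 10 && (minor == 13 || minor == 14))
        return SYSENT_HOME_CONST_CONSTDATA;
    if ((major == 10 && minor == 15) || (major >= 11 && major <= 14))
        return SYSENT_HOME_DATA_CONST_CONST;
    return SYSENT_HOME_UNKNOWN; /* outside the ranges listed above */
}

/* Segment and section names for a location, or NULL when it is unknown. */
const struct section_name *sysent_section_name(enum sysent_home home)
{
    static const struct section_name names[] = {
        [SYSENT_HOME_CONST_CONSTDATA]  = { "__CONST", "__constdata" },
        [SYSENT_HOME_DATA_CONST_CONST] = { "__DATA_CONST", "__const" },
    };
    return home == SYSENT_HOME_UNKNOWN ? NULL : &names[home];
}
```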

Find the section address

To find the __CONST,__constdata or __DATA_CONST,__const section that contains the sysent table, we need to load the kernel Mach-O file and parse its Mach-O header in our kext.

Pseudocode:

// convert the kernel path into a vnode (vfs is the vfs context)
vnode_lookup("/System/Library/Kernels/kernel", 0, &kernel_vnode, vfs);

// read the kernel's Mach-O header from the vnode
read_kernel_header(buffer, kernel_vnode);

// parse the necessary information out of the header buffer
parser_kernel_header(buffer);

In parser_kernel_header, we can get the following important information:

The VM address, VM size, file offset, and file size of the __TEXT segment and its sections.

The VM address, VM size, file offset, and file size of the __DATA segment and its sections.

The VM address, VM size, file offset, and file size of the __DATA_CONST (or __CONST) segment and its sections.

The VM address, VM size, file offset, and file size of the __LINKEDIT segment and its sections.

The symbol table and string table.
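As a rough illustration of the segment and section part of this parsing, the sketch below walks the load commands of a 64-bit Mach-O header held in a memory buffer. The structs mirror the layouts in <mach-o/loader.h> but are redeclared with a model_ prefix so the block stands alone. The function name find_section is hypothetical and is not the author's parser_kernel_header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_MH_MAGIC_64   0xfeedfacfu
#define MODEL_LC_SEGMENT_64 0x19u

struct model_mach_header_64 {
    uint32_t magic, cputype, cpusubtype, filetype;
    uint32_t ncmds, sizeofcmds, flags, reserved;
};

struct model_segment_command_64 {
    uint32_t cmd, cmdsize;
    char     segname[16];
    uint64_t vmaddr, vmsize, fileoff, filesize;
    uint32_t maxprot, initprot, nsects, flags;
};

struct model_section_64 {
    char     sectname[16], segname[16];
    uint64_t addr, size;
    uint32_t offset, align, reloff, nreloc, flags;
    uint32_t reserved1, reserved2, reserved3;
};

/* Copy the section seg,sect into *out; return false if it is not present. */
bool find_section(const uint8_t *buf, size_t len, const char *seg,
                  const char *sect, struct model_section_64 *out)
{
    struct model_mach_header_64 mh;

    assert(buf != NULL && seg != NULL && sect != NULL && out != NULL);
    if (len < sizeof(mh))
        return false;
    memcpy(&mh, buf, sizeof(mh));
    if (mh.magic != MODEL_MH_MAGIC_64)
        return false;

    size_t off = sizeof(mh);
    for (uint32_t i = 0; i < mh.ncmds; i++) {
        uint32_t cmd, cmdsize;

        /* Every load command starts with its type and its total size. */
        if (len - off < 2 * sizeof(uint32_t))
            return false;
        memcpy(&cmd, buf + off, sizeof(cmd));
        memcpy(&cmdsize, buf + off + sizeof(cmd), sizeof(cmdsize));
        if (cmdsize < 2 * sizeof(uint32_t) || cmdsize > len - off)
            return false;

        if (cmd == MODEL_LC_SEGMENT_64 &&
            cmdsize >= sizeof(struct model_segment_command_64)) {
            struct model_segment_command_64 sc;
            memcpy(&sc, buf + off, sizeof(sc));
            size_t room = (cmdsize - sizeof(sc)) / sizeof(*out);

            /* The section headers follow the segment command directly. */
            if (strncmp(sc.segname, seg, 16) == 0 && sc.nsects <= room) {
                for (uint32_t j = 0; j < sc.nsects; j++) {
                    memcpy(out, buf + off + sizeof(sc) + j * sizeof(*out),
                           sizeof(*out));
                    if (strncmp(out->sectname, sect, 16) == 0)
                        return true;
                }
            }
        }
        off += cmdsize;
    }
    return false;
}
```

On success, out->addr and out->size hold the unslid VM address and size of the section, and out->offset holds its file offset.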

One additional point is that the VM addresses in the file do not include the KASLR slide (which is one of the kernel's security mitigations). So we need to calculate the slide and add it to the VM address to get the real runtime address.

Apple provides a convenient function, vm_kernel_unslide_or_perm_external, which takes a slid address and outputs the corresponding unslid address. We can use it to compute the KASLR slide. Pseudocode:

vm_offset_t func_address = (vm_offset_t)vm_kernel_unslide_or_perm_external;
vm_offset_t func_address_unslid = 0;
vm_kernel_unslide_or_perm_external(func_address, &func_address_unslid);
kernel_slide = func_address - func_address_unslid;
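The arithmetic behind this is a subtraction to recover the slide and an addition to apply it. The sketch below shows both steps in plain C. The function names are illustrative, and the addresses in the usage note are made-up sample values, not addresses from a real kernel.

```c
#include <assert.h>
#include <stdint.h>

/* KASLR slide = where a symbol is at runtime - where the Mach-O file says it is. */
uint64_t compute_slide(uint64_t runtime_addr, uint64_t unslid_addr)
{
    assert(runtime_addr >= unslid_addr);
    return runtime_addr - unslid_addr;
}

/* Turn a VM address read from the on-disk header into a runtime address. */
uint64_t apply_slide(uint64_t file_vmaddr, uint64_t slide)
{
    return file_vmaddr + slide;
}
```

For instance, if a function sits at 0xffffff8012345678 at runtime and its unslid address is 0xffffff8000345678, the slide is 0x12000000, and a section whose on-disk VM address is 0xffffff8000a00000 is found at 0xffffff8012a00000.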

On macOS versions before 11.0, this is all we need: the runtime address of the __DATA_CONST (or __CONST) segment is its VM address plus the kernel slide. The memory layout is as follows:

0xffffff8000200000(Kernel header base address)
...
0xffffff80002xxxxx(base address + slide) ---> kernel header
...
...
0xffffff80002xxxxx(__DATA_CONST segment VM address + slide) ---> __DATA_CONST segment
...
0xffffff80002xxxxx(__const section VM address + slide) ---> __const section
...

On macOS 11.0 and later, however, there is a second slide, the ‘cache slide’, that also needs to be resolved. HookCase describes it as follows:

On macOS Big Sur and up, ‘slide’ is set to the “kernel cache slide” – an offset to the location of the “kernel cache”, which contains the kernel and a bunch of kernel extensions. The kernel itself is inside this “cache”. Find it to determine the “kernel slide”. (HookCase)

The memory layout then looks like this:

0xffffff8000200000(Kernel header base address)
...
0xffffff80002xxxxx(kernel cache address = base address + slide + cache_slide) ---> kernel header
...
...
0xffffff80002xxxxx(__DATA_CONST segment VM address + slide + cache_slide) ---> __DATA_CONST segment
...
0xffffff80002xxxxx(__const section VM address + slide + cache_slide) ---> __const section
...

The fix is easy: resolve the address of an exported symbol such as _printf from the symbol table, add the kernel slide to it, and compare the result with the symbol's real address. The difference is the cache slide.

Pseudocode:

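// get_symbol() is assumed here to return the symbol table address with the kernel slide already added
_printf = get_symbol(kinfo, "_printf");
cache_slide = (mach_vm_address_t)&printf - _printf;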
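For readers who want to see the symbol lookup spelled out, here is a userspace sketch of a linear search over an nlist_64-style symbol table, followed by the cache slide subtraction. The struct mirrors the layout of struct nlist_64 from <mach-o/nlist.h>. The function names and the sample values are hypothetical, and the author's get_symbol may work differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as struct nlist_64, redeclared so the block stands alone. */
struct model_nlist_64 {
    uint32_t n_strx;  /* offset of the name inside the string table */
    uint8_t  n_type;
    uint8_t  n_sect;
    uint16_t n_desc;
    uint64_t n_value; /* unslid address of the symbol */
};

/* Linear search of the symbol table; returns 0 when the name is absent. */
uint64_t model_get_symbol(const struct model_nlist_64 *syms, uint32_t nsyms,
                          const char *strtab, uint32_t strsize,
                          const char *name)
{
    assert(syms != NULL && strtab != NULL && name != NULL);
    /* The string table must be NUL-terminated for strcmp to be safe. */
    assert(strsize > 0 && strtab[strsize - 1] == '\0');

    for (uint32_t i = 0; i < nsyms; i++) {
        if (syms[i].n_strx >= strsize)
            continue; /* corrupt entry: name offset is out of range */
        if (strcmp(strtab + syms[i].n_strx, name) == 0)
            return syms[i].n_value;
    }
    return 0;
}

/* cache slide = real address - (on-disk address + kernel slide) */
uint64_t compute_cache_slide(uint64_t real_addr, uint64_t unslid_addr,
                             uint64_t kernel_slide)
{
    return real_addr - (unslid_addr + kernel_slide);
}
```

With an on-disk address of 0xffffff8000600000 for _printf, a kernel slide of 0x12000000, and a real address of 0xffffff8013600000, the cache slide comes out as 0x1000000.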

Brute-force the sysent table

Once we have found the __CONST,__constdata or __DATA_CONST,__const section, we can find the sysent table by brute force.

The following pseudocode retrieves the function addresses of the first three system calls: NOSYS, EXIT, and FORK. It then interprets the data in the const section as sysent structures and compares the sy_call field of each candidate with those function addresses. If all three comparisons succeed, the sysent table is found.

nosys = (sy_call_t *)get_symbol(kinfo, "_nosys");
exit = (sy_call_t *)get_symbol(kinfo, "_exit");
fork = (sy_call_t *)get_symbol(kinfo, "_fork");

sysent_table = NULL;
// stop early enough that all three candidate entries lie inside the section
for (offset = 0; offset + 3 * sizeof(struct sysent) <= section_size; offset += 16) {
    struct sysent *table = (struct sysent *)(section_addr + offset);
    if (table->sy_call != nosys) {
        continue;
    }

    vm_offset_t next_entry_offset = sizeof(struct sysent);
    struct sysent *next_entry = (struct sysent *)((vm_offset_t)table + next_entry_offset);
    if (next_entry->sy_call != exit) {
        continue;
    }
    next_entry = (struct sysent *)((vm_offset_t)next_entry + next_entry_offset);
    if (next_entry->sy_call != fork) {
        continue;
    }

    // all three entries match: this is the sysent table
    sysent_table = table;
    break;
}
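The same scan can be tried out in userspace against an ordinary byte buffer. The sketch below is a model under stated assumptions: struct model_sysent omits the 32-bit munger field, the three stub_ functions stand in for the kernel's nosys, exit, and fork, and the scan advances by pointer size rather than by 16 bytes so that it does not depend on where the table happens to be aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef int32_t sy_call_t(void *proc, void *uap, int *retval);

struct model_sysent {
    sy_call_t *sy_call;
    int32_t    sy_return_type;
    int16_t    sy_narg;
    uint16_t   sy_arg_bytes;
};

/* Stand-ins for the kernel's nosys, exit and fork implementations. */
int32_t stub_nosys(void *p, void *u, int *r) { (void)p; (void)u; (void)r; return 78; }
int32_t stub_exit(void *p, void *u, int *r)  { (void)p; (void)u; (void)r; return 0; }
int32_t stub_fork(void *p, void *u, int *r)  { (void)p; (void)u; (void)r; return 0; }

/* Return the byte offset of the table inside the section, or -1 if absent. */
long find_sysent_offset(const uint8_t *section, size_t size, sy_call_t *nosys,
                        sy_call_t *exit_fn, sy_call_t *fork_fn)
{
    const size_t need = 3 * sizeof(struct model_sysent);

    assert(section != NULL);
    for (size_t off = 0; off + need <= size; off += sizeof(void *)) {
        struct model_sysent e[3];

        /* Copy out three consecutive candidates and compare their handlers. */
        memcpy(e, section + off, sizeof(e));
        if (e[0].sy_call == nosys && e[1].sy_call == exit_fn &&
            e[2].sy_call == fork_fn)
            return (long)off;
    }
    return -1;
}
```

Matching three consecutive handlers rather than one keeps the chance of a false positive low, since a lone pointer to nosys can appear elsewhere in the section.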

0x03 Hook the syscall

Hooking the syscall is the last and easiest step. Disable interrupts, disable preemption, and disable CPU write protection. Then replace the sy_call pointer in the sysent entry of the target syscall, and restore everything in reverse order.

Pseudocode:

// disable interrupts
boolean_t org_int_level = ml_set_interrupts_enabled(false);
// disable preemption
_disable_preemption();
// disable write protection
_disable_wp();

// replace the original sy_call
sysent_table[SYS_exit].sy_call = exit_hook;

// restore everything in reverse order
_enable_wp();
_enable_preemption();
ml_set_interrupts_enabled(org_int_level);
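In practice the hook usually needs the original handler, both to forward calls and to undo the patch when the kext unloads. The userspace model below sketches that bookkeeping. All model_ names are hypothetical. In a real kext, the two pointer writes would sit inside the interrupt, preemption, and write-protection guards from the pseudocode above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t sy_call_t(void *proc, void *uap, int *retval);

struct model_sysent {
    sy_call_t *sy_call;
    int32_t    sy_return_type;
    int16_t    sy_narg;
    uint16_t   sy_arg_bytes;
};

sy_call_t *model_orig_call; /* saved handler: used to forward and to unhook */
int model_hook_hits;        /* number of calls the hook has observed */

/* Stand-in for the original syscall implementation. */
int32_t model_real_exit(void *proc, void *uap, int *retval)
{
    (void)proc;
    (void)uap;
    *retval = 0;
    return 0;
}

/* The hook: record the call, then forward to the saved original. */
int32_t model_exit_hook(void *proc, void *uap, int *retval)
{
    model_hook_hits++;
    return model_orig_call(proc, uap, retval);
}

/* Save the current handler and point the entry at the hook. */
void model_install_hook(struct model_sysent *entry, sy_call_t *hook)
{
    assert(entry != NULL && hook != NULL && model_orig_call == NULL);
    model_orig_call = entry->sy_call;
    entry->sy_call = hook;
}

/* Put the saved handler back. */
void model_remove_hook(struct model_sysent *entry)
{
    assert(entry != NULL && model_orig_call != NULL);
    entry->sy_call = model_orig_call;
    model_orig_call = NULL;
}
```

Restoring the saved pointer before the kext unloads matters, because otherwise the entry would keep pointing at code that is no longer mapped.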

0x04 Thanks

HookCase
