This is an overview of the systrace design. For an introduction to
systrace, see Niels Provos' systrace web page, read the onlamp.com
articles, or buy a book about OpenBSD that covers systrace.

                 kernel                  .          userland
---------------------------------------| .
|           /--> [ in kernel policy ]  | .
|           |                          | .  --------------------------
|           |                          | .  |    systraced binary    |
|    systrace_enter() <------------------.--| (p_flag & P_SYSTRACE)  |
|           |                  syscall2().  --------------------------
|           v                          | .              ^
|    systrace_msg_ask()                | .              |
---------------------------------------- .              | execvp()
            |                            .              |
            |                            .  ------------------------
            |         ioctl              .  |    /bin/systrace     |
            \-----<--------------->------.->| [ userland policy ]  |
                      poll               .  ------------------------
                                         .              | requestor_start()
                                         .              v ( execvp() )
                                         .  ---------------------
                                         .  |     xsystrace     |
                                         .  ---------------------

The /bin/systrace binary uses the execvp() system call to launch the
traced binary. It also sets a specific flag (P_SYSTRACE) on this binary
to indicate that it is to be systraced.

When the traced binary wants to make a system call, it does so via a
special assembler instruction that switches the processor into kernel
mode (int 0x80 on the i386 architecture, syscall on MIPS) and invokes
an exception handler. (A syscall can be thought of as a special kind of
exception.)

In the kernel, the syscall handler is called. For the i386 architecture
this is syscall2(), defined in /sys/i386/i386/trap.c. This function
contains hooks that call systrace functions. These functions evaluate
the policy for the syscall, decide whether it will be executed, and
clean up after the syscall has (or has not) been called. The hooks are
systrace_enter() and systrace_exit(). More systrace hooks exist in the
kernel code, but there are few of them (fewer than 5).

So, the most interesting part of the syscall handler looks like this:

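The kernel listing itself is not reproduced in this text. As a rough
stand-in, here is a small userland model in C of how a handler such as
syscall2() might wrap a system call with the two hooks. Everything
prefixed demo_ or DEMO_ is hypothetical, and testing the P_SYSTRACE bit
before calling the hooks is an assumption based on the diagram; the real
code in trap.c uses different types and signatures.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real kernel types and signatures differ. */
#define DEMO_P_SYSTRACE 0x1     /* models the P_SYSTRACE bit in p_flag */

struct demo_proc {
    int p_flag;                 /* process flags */
    int deny_code;              /* syscall number the demo policy denies */
    int exit_hook_calls;        /* how often the exit hook has run */
};

typedef int (*demo_syscall_fn)(int arg, int *retval);

/* Entry hook: return 0 to permit the call, or an errno value to deny it. */
int demo_systrace_enter(struct demo_proc *p, int code)
{
    return (code == p->deny_code) ? EPERM : 0;
}

/* Exit hook: runs whether or not the syscall body was executed. */
void demo_systrace_exit(struct demo_proc *p, int code, int error)
{
    (void)code;
    (void)error;
    p->exit_hook_calls++;
}

/* Dispatch one syscall; only traced processes go through the hooks. */
int demo_syscall(struct demo_proc *p, int code, demo_syscall_fn fn,
                 int arg, int *retval)
{
    int error;

    assert(p != NULL && fn != NULL && retval != NULL);

    if (p->p_flag & DEMO_P_SYSTRACE) {
        error = demo_systrace_enter(p, code);
        if (error == 0)
            error = fn(arg, retval);    /* permitted: run the syscall */
        demo_systrace_exit(p, code, error);
    } else {
        error = fn(arg, retval);        /* untraced: no hooks at all */
    }
    return error;
}

/* A trivial "syscall" body used for demonstration. */
int demo_sys_double(int arg, int *retval)
{
    *retval = arg * 2;
    return 0;
}
```

In this model, a traced process whose deny_code is 7 gets EPERM back for
syscall 7 without the syscall body ever running, yet the exit hook is
still invoked so that cleanup can happen.
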
The systrace kernel portion then needs to decide whether to permit or
deny this particular syscall. This presumes that systrace policies are
loaded. Systrace policies for a given binary (and optionally its
children) consist of userland policies and kernel policies. The kernel
policy set contains syscalls whose arguments do not need to be
evaluated. This is done for performance reasons.

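The split between the two policy sets can be pictured as a per-syscall
table kept in the kernel. The C sketch below is only a model: the names
(demo_kpolicy, DEMO_ASK and so on) and the fixed table size are
hypothetical, as is the choice to deny out-of-range syscall numbers.
Any entry that is not an explicit permit or deny falls back to asking
the userland policy.

```c
#include <stddef.h>

#define DEMO_NSYSCALLS 8        /* hypothetical table size */

enum demo_action {
    DEMO_ASK = 0,               /* defer to the userland policy */
    DEMO_PERMIT,                /* allow without looking at arguments */
    DEMO_DENY                   /* refuse without looking at arguments */
};

struct demo_kpolicy {
    enum demo_action action[DEMO_NSYSCALLS];
};

/* Start with every syscall deferred to userland. */
void demo_kpolicy_init(struct demo_kpolicy *kp)
{
    size_t i;

    for (i = 0; i < DEMO_NSYSCALLS; i++)
        kp->action[i] = DEMO_ASK;
}

/* Record a fast-path decision; returns 0, or -1 for a bad syscall number. */
int demo_kpolicy_set(struct demo_kpolicy *kp, int code, enum demo_action a)
{
    if (code < 0 || code >= DEMO_NSYSCALLS)
        return -1;
    kp->action[code] = a;
    return 0;
}

/* Look up the in-kernel decision; unknown syscall numbers are denied. */
enum demo_action demo_kpolicy_lookup(const struct demo_kpolicy *kp, int code)
{
    if (code < 0 || code >= DEMO_NSYSCALLS)
        return DEMO_DENY;
    return kp->action[code];
}
```

A lookup that returns DEMO_PERMIT or DEMO_DENY is resolved entirely in
the kernel; only DEMO_ASK results would need a round trip to userland.
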
This decision may require a lookup in the userland policy. The kernel
portion 'sends a message' to the userland portion via
systrace_msg_ask(). This function returns only after it has received an
answer from userspace or some error has occurred (e.g. the traced
process has ended). In the meantime, the traced process is put to
sleep. If the syscall is permitted, the kernel portion in
systrace_enter() may then rewrite its arguments via systrace_replace()
and, after that, elevate privileges via systrace_seteuid() and
systrace_setegid().
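
The order of operations described above (ask, then rewrite arguments,
then elevate privileges) can be sketched in C as follows. This is a
simplified model, not the kernel code: the structures, the demo_ names
and the callback standing in for systrace_msg_ask() are all
hypothetical, and only the effective uid is modelled.

```c
#include <errno.h>
#include <stddef.h>

#define DEMO_MAXARGS 4

/* What the userland side is assumed to send back (hypothetical layout). */
struct demo_answer {
    int          error;                  /* 0 permits, errno value denies */
    int          replace_mask;           /* bit i set: overwrite args[i] */
    long         new_args[DEMO_MAXARGS];
    int          set_euid;               /* nonzero: switch to euid below */
    unsigned int euid;
};

struct demo_call {
    long         args[DEMO_MAXARGS];     /* syscall arguments */
    unsigned int euid;                   /* effective uid for the call */
};

/* Stands in for systrace_msg_ask(): returns once an answer is available. */
typedef int (*demo_ask_fn)(const struct demo_call *c, struct demo_answer *a);

int demo_enter(struct demo_call *c, demo_ask_fn ask)
{
    struct demo_answer ans = {0};
    int i, error;

    error = ask(c, &ans);
    if (error != 0)
        return error;                    /* asking itself failed */
    if (ans.error != 0)
        return ans.error;                /* userland denied the call */

    /* Step 1: rewrite the arguments userland asked to replace. */
    for (i = 0; i < DEMO_MAXARGS; i++)
        if (ans.replace_mask & (1 << i))
            c->args[i] = ans.new_args[i];

    /* Step 2: only then change the effective uid. */
    if (ans.set_euid)
        c->euid = ans.euid;
    return 0;
}

/* Sample answer: permit, replace argument 1, run with euid 0. */
int demo_ask_rewrite(const struct demo_call *c, struct demo_answer *a)
{
    (void)c;
    a->error = 0;
    a->replace_mask = 1 << 1;
    a->new_args[1] = 99;
    a->set_euid = 1;
    a->euid = 0;
    return 0;
}

/* Sample answer: deny the call. */
int demo_ask_deny(const struct demo_call *c, struct demo_answer *a)
{
    (void)c;
    a->error = EACCES;
    return 0;
}
```

A denied call leaves both the arguments and the effective uid
untouched, because the model returns before either step is reached.
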