seccomp-nurse is a generic sandbox environment for Linux that doesn’t require any recompilation. Its purpose is to run legitimate applications in a hostile environment. I repeat: it is not designed to run malicious binaries.
How does it work? The following figure describes the architecture of seccomp-nurse. You can see two processes: one running the untrusted code, and the trusted one. The trusted process is in charge of intercepting syscalls and checking whether the action is allowed.
How do we intercept syscalls? By using an x86_32 hack. If you remember my previous post, I described how the GNU Libc executes syscalls: by making an indirect call into the VDSO. seccomp-nurse overrides this page so that our own function is called instead of the syscall being performed. Our handler retrieves the CPU registers and sends them directly to the trusted process through a socket. The trusted process then consults its policy engine with questions like: “can this process open this file?”
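The exchange described above can be sketched in a few lines of Python. This is only an illustration, not the actual seccomp-nurse code: the wire format (seven little-endian 32-bit registers), the function names and the toy policy are all assumptions. In particular, the policy receives the file path directly, whereas a real sandbox would have to read it from the untrusted process's memory through the pointer held in ebx.

```python
import socket
import struct

# Hypothetical wire format: seven 32-bit registers, little-endian.
# On x86_32 Linux, eax holds the syscall number and ebx..ebp the arguments.
REG_NAMES = ("eax", "ebx", "ecx", "edx", "esi", "edi", "ebp")
REG_FORMAT = "<7I"
REG_SIZE = struct.calcsize(REG_FORMAT)

SYS_OPEN = 5  # open(2) on x86_32


def send_registers(sock, regs):
    """Untrusted side: serialize the registers and send them."""
    sock.sendall(struct.pack(REG_FORMAT, *(regs[name] for name in REG_NAMES)))


def recv_registers(sock):
    """Trusted side: read exactly one register frame from the socket."""
    data = b""
    while len(data) < REG_SIZE:
        chunk = sock.recv(REG_SIZE - len(data))
        if not chunk:
            raise ConnectionError("peer closed the socket")
        data += chunk
    return dict(zip(REG_NAMES, struct.unpack(REG_FORMAT, data)))


def is_allowed(regs, path, allowed_paths):
    """Toy policy (assumption): only open() on whitelisted paths passes."""
    if regs["eax"] == SYS_OPEN:
        return path in allowed_paths
    return False


# Small demonstration over a socket pair within a single process.
untrusted_end, trusted_end = socket.socketpair()
request = {"eax": SYS_OPEN, "ebx": 0x08049000, "ecx": 0, "edx": 0,
           "esi": 0, "edi": 0, "ebp": 0}
send_registers(untrusted_end, request)
received = recv_registers(trusted_end)
untrusted_end.close()
trusted_end.close()
```

A fixed-size frame keeps the sending side trivial, which matters because in the real sandbox that side has to stay as small as possible.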
If the action is allowed, how do we execute it? SECCOMP only permits four syscalls (read, write, exit and sigreturn), so what can we do? Well, the SECCOMP flag is limited to thread scope: if a process has two threads, one can be sandboxed (it will be called the untrustee) while the other (called the trustee) is free to do whatever it wants. Furthermore, since threads share everything, any action done in one thread has an impact on the other. This is pretty cool! But so dangerous!
Indeed, everything is shared; only the CPU registers are private to each thread, and that’s all! The trustee must consider its environment as hostile: its code must not do any memory access, only registers can be used. That’s why this part is written in assembly, in order to control every instruction. It has been designed to be as simple as possible because it is the keystone of the sandbox: the security of the whole system relies on it.
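To make the sharing concrete, here is a minimal Python sketch. No SECCOMP is involved and the names are hypothetical; it only shows that a file descriptor created by one thread is immediately usable by another, because the descriptor table belongs to the process and not to the thread.

```python
import os
import threading

shared = {}


def trustee():
    # Unrestricted thread: creates a descriptor on behalf of the other thread.
    read_fd, write_fd = os.pipe()
    os.write(write_fd, b"opened by the trustee")
    os.close(write_fd)
    shared["fd"] = read_fd


worker = threading.Thread(target=trustee)
worker.start()
worker.join()

# "Untrustee" side: it never created the descriptor, yet a plain read()
# works on it, since both threads live in the same process.
data = os.read(shared["fd"], 64)
os.close(shared["fd"])
```

Note that the only syscall the second thread needs here is a read, which is one of the four that SECCOMP still permits.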
The trustee routine is completely dumb and has no intelligence at all; everything is decided in the trusted process. The trustee understands only these commands:
- Execute this syscall
- Raise a SIGTRAP (for debugging purposes)
- Native exit
- Poke/Peek memory
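The command set listed above could be modelled roughly like this. It is a simulation under stated assumptions, not the real trustee (which is written in assembly): the opcode values and function names are made up, memory is a plain bytearray, and syscalls, traps and exits are only reported instead of being performed.

```python
from enum import IntEnum


class Command(IntEnum):
    # Hypothetical opcodes; the real values are not given in this post.
    SYSCALL = 1
    SIGTRAP = 2
    EXIT = 3
    PEEK = 4
    POKE = 5


def trustee_step(command, args, memory):
    """Handle one command against a simulated address space (a bytearray)."""
    if command == Command.SYSCALL:
        number, *syscall_args = args
        # A real trustee would load the registers and enter the kernel here.
        return ("would-execute", number, tuple(syscall_args))
    if command == Command.SIGTRAP:
        return ("would-raise", "SIGTRAP")
    if command == Command.EXIT:
        return ("would-exit", args[0])
    if command == Command.PEEK:
        address, length = args
        return bytes(memory[address:address + length])
    if command == Command.POKE:
        address, payload = args
        memory[address:address + len(payload)] = payload
        return len(payload)
    raise ValueError(f"unknown command: {command!r}")


# Poke five bytes into the simulated memory, then peek them back.
memory = bytearray(16)
written = trustee_step(Command.POKE, (4, b"nurse"), memory)
peeked = trustee_step(Command.PEEK, (4, 5), memory)
```

The dispatcher takes no decision of its own: anything outside the five known commands is rejected, and every policy choice stays on the trusted side.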
Limitations: Because of the way we intercept syscalls, we can only run dynamically linked 32-bit binaries that use the GNU Libc. It is hoped that the situation will improve greatly in the following weeks… Stay tuned!
Performance: Ahem. I don’t know. Each time the untrustee makes a syscall, our sandbox makes a lot of round trips between both processes (one round trip = at least one read and one write).
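One way to get a rough idea of that cost would be to time round trips over a socket pair. The sketch below is a hypothetical micro-benchmark, not a measurement of seccomp-nurse itself: it uses two threads of one Python process instead of two real processes, and the function names are made up, so the number it returns is only an order-of-magnitude hint.

```python
import socket
import threading
import time


def echo_server(sock, count):
    # Stand-in for the trusted process: answer each one-byte request.
    for _ in range(count):
        sock.sendall(sock.recv(1))


def measure_round_trips(count):
    """Return the average duration, in seconds, of one write + one read."""
    client, server = socket.socketpair()
    worker = threading.Thread(target=echo_server, args=(server, count))
    worker.start()
    start = time.perf_counter()
    for _ in range(count):
        client.sendall(b"x")  # one write...
        client.recv(1)        # ...and one read per round trip
    elapsed = time.perf_counter() - start
    worker.join()
    client.close()
    server.close()
    return elapsed / count


per_trip = measure_round_trips(1000)
```

Multiplying the result by the number of round trips per intercepted syscall would give a lower bound on the overhead the sandbox adds.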