author | Mike Frysinger <vapier@google.com> | 2020-01-30 00:08:48 -0500 |
---|---|---|
committer | Mike Frysinger <vapier@google.com> | 2020-01-30 15:58:31 -0500 |
commit | b02c6a831d484a095a1175c2d37a6f5aaba98e94 (patch) | |
tree | d0b7effb70830bf91a234ff125ea560b70f374f3 | |
download | minijail-b02c6a831d484a095a1175c2d37a6f5aaba98e94.tar.gz |
initial site for hosting on GitHub
-rw-r--r-- | _config.yml | 1 | ||||
-rw-r--r-- | index.md | 101 | ||||
-rw-r--r-- | minijail0.1.md | 418 | ||||
-rw-r--r-- | minijail0.5.md | 180 |
4 files changed, 700 insertions, 0 deletions
diff --git a/_config.yml b/_config.yml new file mode 100644 index 0000000..8bbf794 --- /dev/null +++ b/_config.yml @@ -0,0 +1 @@ +theme: jekyll-theme-hacker diff --git a/index.md b/index.md new file mode 100644 index 0000000..0f97c9c --- /dev/null +++ b/index.md @@ -0,0 +1,101 @@ +## About + +Minijail is a sandboxing and containment tool used in Chrome OS and Android. +It provides an executable that can be used to launch and sandbox other programs, +and a library that can be used by code to sandbox itself. + +## Sites + +The Minijail homepage:<br/> +<https://google.github.io/minijail/> + +The main repo:<br/> +<https://android.googlesource.com/platform/external/minijail/> + +With a read-only mirror for people to fork:<br/> +<https://github.com/google/minijail/> + +There might be other copies floating around, but those are the official ones! + +## Getting the code + +### Releases + +Releases are tagged as `linux-vXX`:<br/> +<https://github.com/google/minijail/releases> + +### Latest Development + +You're one `git clone` away from happiness. + +``` +$ git clone https://android.googlesource.com/platform/external/minijail +$ cd minijail +``` + +## Documentation + +Check out the [minijail0(1)](./minijail0.1) and [minijail0(5)](./minijail0.5) +online man pages for more details about using Minijail. + +See the [tools/README.md](https://github.com/google/minijail/blob/master/tools/README.md) +document for info about extra tools we provide to help with development. + +The following talk serves as a good introduction to Minijail and how it can be used. +[video](https://drive.google.com/file/d/0BwPS_JpKyELWZTFBcTVsa1hhYjA/preview) +[slides](https://docs.google.com/presentation/d/1r6LpvDZtYrsl7ryOV4HtpUR-phfCLRL6PA-chcL1Kno/present) + +The Chromium OS project has a +[comprehensive sandboxing guide](https://chromium.googlesource.com/chromiumos/docs/+/master/sandboxing.md) +that is largely based on Minijail. + +## Building + +Just run `make` and you're good to go! 
+ +If that doesn't work out, please see the +[HACKING.md](https://github.com/google/minijail/blob/master/HACKING.md) +document for more details. + +## Examples + +Here's a few simple examples. +Check out the docs above for way more in-depth use. + +### Change root to any user + +``` +# id +uid=0(root) gid=0(root) groups=0(root),128(pkcs11) +# minijail0 -u jorgelo -g 5000 /usr/bin/id +uid=72178(jorgelo) gid=5000(eng) groups=5000(eng) +``` + +### Drop root while keeping some capabilities + +``` +# minijail0 -u jorgelo -c 3000 -- /bin/cat /proc/self/status +Name: cat +... +CapInh: 0000000000003000 +CapPrm: 0000000000003000 +CapEff: 0000000000003000 +CapBnd: 0000000000003000 +``` + +## Contact + +We've got a couple of contact points. + +* [minijail@chromium.org](https://groups.google.com/a/chromium.org/forum/#!forum/minijail): + Public user & developer mailing list. +* [minijail-users@google.com](https://groups.google.com/a/google.com/forum/#!forum/minijail-users): + Internal Google user mailing list. +* [minijail-dev@google.com](https://groups.google.com/a/google.com/forum/#!forum/minijail-dev): + Internal Google developer mailing list. +* [crbug.com/list](https://crbug.com/?q=component:OS>Systems>Minijail): + Existing bug reports & feature requests. +* [crbug.com/new](https://bugs.chromium.org/p/chromium/issues/entry?components=OS>Systems>Minijail): + File new bug reports & feature requests. +* [AOSP Gerrit](https://android-review.googlesource.com/q/project:platform/external/minijail): + Code reviews. diff --git a/minijail0.1.md b/minijail0.1.md new file mode 100644 index 0000000..a18a70d --- /dev/null +++ b/minijail0.1.md @@ -0,0 +1,418 @@ +# minijail0(1): sandbox a process + + - [Synopsis](#synopsis) + - [Description](#description) + - [Sandboxing Profiles](#sandboxing-profiles) + - [Implementation](#implementation) + - [Author](#author) + - [Copyright](#copyright) + - [See Also](#see-also) + +## Synopsis + +**minijail0** \[*OPTION*\]... \<*PROGRAM*\> \[*args*\]... 
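The `-c 3000` example in index.md passes a hexadecimal capability mask, and the same mask shows up in the `CapEff` line of `/proc/self/status`. As a rough illustration of how such a mask maps to capability names, here is a minimal Python sketch. The `CAP_NAMES` table and the `decode_caps` helper are hypothetical names introduced for this example; they are not part of Minijail, and the table lists only a handful of the capability numbers from `<linux/capability.h>`.

```python
# Hypothetical helper, not part of Minijail: decode a hex capability mask
# such as the one given to "minijail0 -c" or printed in CapEff.

# Partial table of capability bit numbers from <linux/capability.h>.
CAP_NAMES = {
    0: "CAP_CHOWN",
    1: "CAP_DAC_OVERRIDE",
    5: "CAP_KILL",
    6: "CAP_SETGID",
    7: "CAP_SETUID",
    10: "CAP_NET_BIND_SERVICE",
    12: "CAP_NET_ADMIN",
    13: "CAP_NET_RAW",
    21: "CAP_SYS_ADMIN",
}


def decode_caps(hex_mask):
    """Return the names of the capability bits set in a hex mask string."""
    mask = int(hex_mask, 16)
    # Bits missing from the partial table are reported as "cap_<number>".
    return [
        CAP_NAMES.get(bit, f"cap_{bit}")
        for bit in range(64)
        if (mask >> bit) & 1
    ]


if __name__ == "__main__":
    print(decode_caps("3000"))
```

Under these assumptions, the mask `3000` sets bits 12 and 13, so the jailed `cat` in that example keeps only the two network-related capabilities.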
+ +## Description + +Runs PROGRAM inside a sandbox. + + - **\-a \<table\>** + Run using the alternate syscall table named *table*. Only available + on kernels and architectures that support the **PR\_ALT\_SYSCALL** + option of + [**prctl**(2)](http://man7.org/linux/man-pages/man2/prctl.2.html). + + - **\-b \<src\>\[,\<dest\>\[,\<writeable\>\]\]** + Bind-mount *src* into the chroot directory at *dest*, optionally + writable. The *src* path must be an absolute path. If *dest* is not + specified, it will default to *src*. If the destination does not + exist, it will be created as a file or directory based on the *src* + type (including missing parent directories). To create a writable + bind-mount, set *writeable* to **1**. If not specified, it will default + to **0** (read-only). + + - **\-B \<mask\>** + Skip setting securebits in *mask* when restricting capabilities + (**\-c**). *mask* is a hex constant that represents the mask of + securebits that will be preserved. See + [**capabilities**(7)](http://man7.org/linux/man-pages/man7/capabilities.7.html) + for the complete list. By default, **SECURE\_NOROOT**, + **SECURE\_NO\_SETUID\_FIXUP**, and **SECURE\_KEEP\_CAPS** (together + with their respective locks) are set. + **SECBIT\_NO\_CAP\_AMBIENT\_RAISE** (and its respective lock) is + never set because the permitted and inheritable capability sets have + already been set through **\-c**. + + - **\-c \<caps\>** + Restrict capabilities to *caps*, which is either a hex constant or a + string that will be passed to + [**cap\_from\_text**(3)](http://man7.org/linux/man-pages/man3/cap_from_text.3.html) + (only the effective capability mask will be considered). The value + will be used as the permitted, effective, and inheritable sets. When + used in conjunction with **\-u** and **\-g**, this allows a program + to have access to only certain parts of root's default privileges + while running as another user and group ID altogether. 
Note that + these capabilities are not inherited by subprocesses of the process + given capabilities unless those subprocesses have POSIX file + capabilities or the **\-\-ambient** flag is also passed. See + [**capabilities**(7)](http://man7.org/linux/man-pages/man7/capabilities.7.html). + + - **\-C \<dir\>** + Change root (using + [**chroot**(2)](http://man7.org/linux/man-pages/man2/chroot.2.html)) + to *dir*. + + - **\-d**, **\-\-mount\-dev** + Create a new /dev mount with a minimal set of nodes. Implies + **\-v**. Additional nodes can be bound with the **\-b** or **\-k** + options. The initial set of nodes are: full null tty urandom zero. + Symlinks are also created for: fd ptmx stderr stdin stdout. + + - **\-e[file]** + Enter a new network namespace, or if *file* is specified, enter an + existing network namespace specified by *file* which is typically of + the form /proc/\<pid\>/ns/net. + + - **\-f \<file\>** + Write the pid of the jailed process to *file*. + + - **\-g \<group\>** + Change groups to *group*, which may be either a group name or a + numeric group ID. + + - **\-G** + Inherit all the supplementary groups of the user specified with + **\-u**. It is an error to use this option without having specified + a **user name** to **\-u**. + + - **\-h** + Print a help message. + + - **\-H** + Print a help message detailing supported system call names for + seccomp\_filter. (Other direct numbers may be specified if minijail0 + is not in sync with the host kernel or something like 32/64-bit + compatibility issues exist.) + + - **\-i** + Exit immediately after + [**fork**(2)](http://man7.org/linux/man-pages/man2/fork.2.html). The + jailed process will keep running in the background. + +Normally minijail will fork+exec the specified *program* so that it can +set up the right security settings in the new child process. The initial +minijail process will stay resident and wait for the *program* to exit +so the script that ran minijail will correctly block (e.g. 
standalone +scripts). Specifying **\-i** makes that initial process exit immediately +and free up the resources. + +This option is recommended for daemons and init services when you want +to background the long running *program*. + + - **\-I** + Run *program* as init (pid 1) inside a new pid namespace (implies + **\-p**). + +Most programs don't expect to run as an init which is why minijail will +do it for you by default. Basically, the *program* needs to reap any +processes it forks to avoid leaving zombies behind. Signal handling +needs care since the kernel will mask all signals that don't have +handlers registered (all default handlers are ignored and cannot be +changed). + +This means a minijail process (acting as init) will remain resident by +default. While using **\-I** is recommended when possible, strict review +is required to make sure the *program* continues to work as expected. + +**\-i** and **\-I** may be safely used together. The **\-i** option +controls the first minijail process outside of the pid namespace while +the **\-I** option controls the minijail process inside of the pid +namespace. + + - **\-k \<src\>,\<dest\>,\<type\>\[,\<flags\>\[,\<data\>\]\]** + Mount *src*, a *type* filesystem, at *dest*. If a chroot or pivot + root is active, *dest* will automatically be placed below that path. + +The *flags* field is optional and may be a mix of *MS\_XXX* or hex +constants separated by *|* characters. See +[**mount**(2)](http://man7.org/linux/man-pages/man2/mount.2.html) for +details. *MS\_NODEV|MS\_NOSUID|MS\_NOEXEC* is the default value (a +writable mount with nodev/nosuid/noexec bits set), and it is strongly +recommended that all mounts have these three bits set whenever possible. +If you need to disable all three, then specify something like +*MS\_SILENT*. + +The *data* field is optional and is a comma delimited string (see +[**mount**(2)](http://man7.org/linux/man-pages/man2/mount.2.html) for +details). 
It is passed directly to the kernel, so all fields here are +filesystem specific. For *tmpfs*, if no data is specified, we will +default to *mode=0755,size=10M*. If you want other settings, you will +need to specify them explicitly yourself. + +If the mount is not a pseudo filesystem (e.g. proc or sysfs), *src* path +must be an absolute path (e.g. */dev/sda1* and not *sda1*). + +If the destination does not exist, it will be created as a directory +(including missing parent directories). + + - **\-K[mode]** + Don't mark all existing mounts as MS\_PRIVATE. This option is + **dangerous** as it negates most of the functionality of **\-v**. + You very likely don't need this. + +You may specify a mount propagation mode in which case, that will be +used instead of the default MS\_PRIVATE. See the +[**mount**(2)](http://man7.org/linux/man-pages/man2/mount.2.html) man +page and the kernel docs *Documentation/filesystems/sharedsubtree.txt* +for more technical details, but a brief guide: + +> · **slave** Changes in the parent mount namespace will propagate in, +> but changes in this mount namespace will not propagate back out. This +> is usually what people want to use. +> +> · **private** No changes in either mount namespace will propagate. +> This is the default behavior if you don't specify **\-K**. +> +> · **shared** Changes in the parent and this mount namespace will +> freely propagate back and forth. This is not recommended. +> +> · **unbindable** Mark all mounts as unbindable. + + - **\-l** + Run inside a new IPC namespace. This option makes the program's + System V IPC namespace independent. + + - **\-L** + Report blocked syscalls when using a seccomp filter. On kernels with + support for SECCOMP\_RET\_LOG, every blocked syscall will be + reported through the audit subsystem (see + [**seccomp**(2)](http://man7.org/linux/man-pages/man2/seccomp.2.html) + for more details on SECCOMP\_RET\_LOG availability.) 
On all other + kernels, the first failing syscall will be logged to syslog. This + latter case will also force certain syscalls to be allowed in order + to write to syslog. Note: this option is disabled and ignored for + release builds. + + - **\-m[<uid> \<loweruid\> \<count\>\[,\<uid\> \<loweruid\> + \<count\>\]\]** + Set the uid mapping of a user namespace (implies **\-pU**). Same + arguments as + [**newuidmap**(1)](http://man7.org/linux/man-pages/man1/newuidmap.1.html). + Multiple mappings should be separated by ','. With no mapping, map + the current uid to root inside the user namespace. + + - **\-M[<uid> \<loweruid\> \<count\>\[,\<uid\> \<loweruid\> + \<count\>\]\]** + Set the gid mapping of a user namespace (implies **\-pU**). Same + arguments as + [**newgidmap**(1)](http://man7.org/linux/man-pages/man1/newgidmap.1.html). + Multiple mappings should be separated by ','. With no mapping, map + the current gid to root inside the user namespace. + + - **\-n** + Set the process's *no\_new\_privs* bit. See + [**prctl**(2)](http://man7.org/linux/man-pages/man2/prctl.2.html) + and the kernel source file *Documentation/prctl/no\_new\_privs.txt* + for more info. + + - **\-N** + Run inside a new cgroup namespace. This option runs the program with + a cgroup view showing the program's cgroup as the root. This is only + available on v4.6+ of the Linux kernel. + + - **\-p** + Run inside a new PID namespace. This option will make it impossible + for the program to see or affect processes that are not its + descendants. This implies **\-v** and **\-r**, since otherwise the + process can see outside its namespace by inspecting /proc. + +If the *program* exits, all of its children will be killed immediately +by the kernel. If you need to daemonize or background things, use the +**\-i** option. + +See +[**pid\_namespaces**(7)](http://man7.org/linux/man-pages/man7/pid_namespaces.7.html) +for more info. + + - **\-P \<dir\>** + Set *dir* as the root fs using **pivot\_root**. 
Implies **\-v**, not + compatible with **\-C**. + + - **\-r** + Remount /proc readonly. This implies **\-v**. Remounting /proc + readonly means that even if the process has write access to a system + config knob in /proc (e.g., in /proc/sys/kernel), it cannot change the + value. + + - **\-R \<rlim\_type\>,\<rlim\_cur\>,\<rlim\_max\>** + Set an rlimit value, see + [**getrlimit**(2)](http://man7.org/linux/man-pages/man2/getrlimit.2.html) + for more details. + +*rlim\_type* may be specified using symbolic constants like +*RLIMIT\_AS*. + +*rlim\_cur* and *rlim\_max* are specified either with a number (decimal +or hex starting with *0x*), or with the string *unlimited* (which will +translate to *RLIM\_INFINITY*). + + - **\-s** + Enable + [**seccomp**(2)](http://man7.org/linux/man-pages/man2/seccomp.2.html) + in mode 1, which restricts the child process to a very small set of + system calls. You most likely do not want to use this with the + seccomp filter mode (**\-S**) as they are completely different (even + though they have similar names). + + - **\-S \<arch-specific seccomp\_filter policy file\>** + Enable + [**seccomp**(2)](http://man7.org/linux/man-pages/man2/seccomp.2.html) + in mode 13 which restricts the child process to a set of system + calls defined in the policy file. Note that system call names may be + different based on the runtime environment; see + [**minijail0**(5)](./minijail0.5) + for more details. + + - **\-t[size]** + Mounts a tmpfs filesystem on /tmp. /tmp must exist already (e.g. in + the chroot). The filesystem has a default size of "64M", overridden + with an optional argument. It has standard /tmp permissions (1777), + and is mounted nodev/noexec/nosuid. Implies **\-v**. + + - **\-T \<type\>** + Assume binary's ELF linkage type is *type*, which must be either + 'static' or 'dynamic'. Either setting will prevent minijail0 from + manually parsing the ELF header to determine the type. 
Type 'static' + can be used to avoid preload hooking, and will force minijail0 to + instead set everything up before the program is executed. Type + 'dynamic' will force minijail0 to preload *libminijailpreload.so* to + setup hooks, but will fail on actually statically-linked binaries. + + - **\-u \<user\>** + Change users to *user*, which may be either a user name or a numeric + user ID. + + - **\-U** + Enter a new user namespace (implies **\-p**). + + - **\-v** + Run inside a new VFS namespace. This option makes the program's + mountpoints independent of the rest of the system's. + + - **\-V \<file\>** + Enter the VFS namespace specified by *file*. + + - **\-w** + Create and join a new anonymous session keyring. See + [**keyrings**(7)](http://man7.org/linux/man-pages/man7/keyrings.7.html) + for more details. + + - **\-y** + Keep the current user's supplementary groups. + + - **\-Y** + Synchronize seccomp filters across thread group. + + - **\-z** + Don't forward any signals to the jailed process. For example, when + not using **\-i**, sending **SIGINT** (e.g., CTRL-C on the + terminal), will kill the minijail0 process, not the jailed process. + + - **\-\-ambient** + Raise ambient capabilities to match the mask specified by **\-c**. + Since ambient capabilities are preserved across + [**execve**(2)](http://man7.org/linux/man-pages/man2/execve.2.html), + this allows for process trees to have a restricted set of + capabilities, even if they are capability-dumb binaries. See + [**capabilities**(7)](http://man7.org/linux/man-pages/man7/capabilities.7.html). + + - **\-\-uts[=hostname]** + Create a new UTS/hostname namespace, and optionally set the hostname + in the new namespace to *hostname*. + + - **\-\-logging=<system>** + Use *system* as the logging system. *system* must be one of **auto** + (the default), **syslog**, or **stderr**. + +**auto** will use **stderr** if connected to a tty (e.g. run directly by +a user), otherwise it will use **syslog**. 
+ + - **\-\-profile \<profile\>** + Choose from one of the available sandboxing profiles, which are a + simple way to get a standardized environment. See the **SANDBOXING + PROFILES** section below for the full list of supported values for + *profile*. + + - **\-\-preload\-library \<file path\>** + Allows overriding the default path of */lib/libminijailpreload.so*. + This is only really useful for testing. + + - **\-\-seccomp\-bpf\-binary \<arch-specific BPF binary\>** + This is similar to **\-S**, but + instead of using a policy file, **\-\-seccomp\-bpf\-binary** expects + an arch-and-kernel-version-specific pre-compiled BPF binary (such as + the ones produced by **parse\_seccomp\_policy**). Note that the + filter might be different based on the runtime environment; see + [**minijail0**(5)](./minijail0.5) + for more details. + +## Sandboxing Profiles + +The following sandboxing profiles are supported: + + - **minimalistic-mountns** + Set up a minimalistic mount namespace. Equivalent to **\-v \-P + /var/empty \-b / \-b /proc \-b /dev/log \-t \-r \-\-mount\-dev**. + +## Implementation + +This program is broken up into two parts: **minijail0** (the frontend) +and a helper library called **libminijailpreload**. Some jailings can +only be achieved from the process to which they will actually apply: + +> · capability use (without using ambient capabilities): non-ambient +> capabilities are not inherited across +> [**execve**(2)](http://man7.org/linux/man-pages/man2/execve.2.html) +> unless the file being executed has POSIX file capabilities. Ambient +> capabilities (the **\-\-ambient** flag) fix capability inheritance +> across +> [**execve**(2)](http://man7.org/linux/man-pages/man2/execve.2.html) to +> avoid the need for file capabilities. + +· seccomp: a meaningful seccomp filter policy should disallow +[**execve**(2)](http://man7.org/linux/man-pages/man2/execve.2.html), to +prevent a compromised process from executing a different binary. 
+However, this would prevent the seccomp policy from being applied before +[**execve**(2)](http://man7.org/linux/man-pages/man2/execve.2.html). + +To this end, **libminijailpreload** is forcibly loaded into all +dynamically-linked target programs by default; we pass the specific +restrictions in an environment variable which the preloaded library +looks for. The forcibly-loaded library then applies the restrictions to +the newly-loaded program. + +This behavior can be disabled by the use of the **\-T static** flag. +There are other cases in which the use of this flag might be useful: + +> · When *program* is linked against a different version of **libc.so** +> than **libminijailpreload.so**. + +· When +[**execve**(2)](http://man7.org/linux/man-pages/man2/execve.2.html) has +side-effects that interact badly with the jailing process. If the system +uses SELinux, +[**execve**(2)](http://man7.org/linux/man-pages/man2/execve.2.html) can +cause an automatic domain transition, which would then require that the +target domain allows the operations to jail *program*. + +## Author + +The Chromium OS Authors \<chromiumos-dev@chromium.org\> + +## Copyright + +Copyright © 2011 The Chromium OS Authors License BSD-like. + +## See Also + +**libminijail.h**, +[**minijail0**(5)](./minijail0.5), +[**seccomp**(2)](http://man7.org/linux/man-pages/man2/seccomp.2.html) diff --git a/minijail0.5.md b/minijail0.5.md new file mode 100644 index 0000000..2a1eb0d --- /dev/null +++ b/minijail0.5.md @@ -0,0 +1,180 @@ +# minijail0(5): sandbox a process + + - [Description](#description) + - [Examples](#examples) + - [Seccomp\_Filter Policy](#seccomp_filter-policy) + - [Seccomp\_Filter Syntax](#seccomp_filter-syntax) + - [Atom Syntax](#atom-syntax) + - [Return Values](#return-values) + - [Seccomp\_Filter Policy Writing](#seccomp_filter-policy-writing) + - [Author](#author) + - [Copyright](#copyright) + - [See Also](#see-also) + +## Description + +Runs PROGRAM inside a sandbox. 
See +[**minijail0**(1)](./minijail0.1) +for details. + +## Examples + +Safely switch from root to nobody while dropping all capabilities and +inheriting any groups from nobody: + +\# minijail0 -c 0 -G -u nobody /usr/bin/whoami nobody + +Run in a PID and VFS namespace without superuser capabilities (but still +as root) and with a private view of /proc: + +\# minijail0 -p -v -r -c 0 /bin/ps PID TTY TIME CMD 1 pts/0 00:00:00 +minijail0 2 pts/0 00:00:00 ps + +Running a process with a seccomp filter policy at reduced privileges: + +\# minijail0 -S /usr/share/minijail0/$(uname -m)/cat.policy -- \\ +/bin/cat /proc/self/seccomp\_filter ... + +## Seccomp\_Filter Policy + +The policy file supplied to the **\-S** argument supports the following +syntax: + +**\<syscall\_name\>**:**\<ftrace filter policy\>** +**\<syscall\_number\>**:**\<ftrace filter policy\>** **\<empty line\>** +**\# any single line comment** + +Long lines may be broken up using \\ at the end. + +A policy that emulates +[**seccomp**(2)](http://man7.org/linux/man-pages/man2/seccomp.2.html) in +mode 1 may look like: read: 1 write: 1 sig\_return: 1 exit: 1 + +The "1" acts as a wildcard and allows any use of the mentioned system +call. More advanced filtering is possible if your kernel supports +CONFIG\_FTRACE\_SYSCALLS. For example, we can allow a process to open +any file read only and mmap PROT\_READ only: + +\# open with O\_LARGEFILE|O\_RDONLY|O\_NONBLOCK or some combination +open: arg1 == 32768 || arg1 == 0 || arg1 == 34816 || arg1 == 2048 mmap2: +arg2 == 0x0 munmap: 1 close: 1 + +The supported arguments may be found by reviewing the system call +prototypes in the Linux kernel source code. Be aware that any +non-numeric comparison may be subject to time-of-check-time-of-use +attacks and cannot be considered safe. + +**execve** may only be used when invoking with CAP\_SYS\_ADMIN +privileges. 
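To make the policy file format above concrete, here is a minimal Python sketch that splits a policy into per-syscall expressions. It is only an illustration of the documented line forms (`name: expression`, blank lines, `#` comments, and `\` line continuations); `parse_policy` and `MODE1_POLICY` are hypothetical names, this is not how Minijail itself parses policies, and the filter expressions are kept as opaque strings rather than evaluated.

```python
# Hypothetical sketch, not Minijail's parser: split a policy file into
# {syscall name or number: filter expression} pairs.

def parse_policy(text):
    """Parse policy text into a dict of syscall -> expression strings."""
    # First join physical lines that end in a backslash.
    logical_lines = []
    pending = ""
    for raw in text.splitlines():
        if raw.endswith("\\"):
            pending += raw[:-1]
            continue
        logical_lines.append(pending + raw)
        pending = ""
    if pending:
        logical_lines.append(pending)

    rules = {}
    for line in logical_lines:
        line = line.strip()
        # Empty lines and single-line comments carry no rule.
        if not line or line.startswith("#"):
            continue
        name, sep, expr = line.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in policy line: {line!r}")
        rules[name.strip()] = expr.strip()
    return rules


# The policy from the text that emulates seccomp mode 1.
MODE1_POLICY = """\
# emulate seccomp mode 1
read: 1
write: 1
sig_return: 1
exit: 1
"""

if __name__ == "__main__":
    for syscall, expr in parse_policy(MODE1_POLICY).items():
        print(syscall, "->", expr)
```

Keeping the expression as a string is a deliberate simplification; a real consumer would still have to parse and compile it into a filter.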
+ +In order to promote reusability, policy files can include other policy +files using the following syntax: + +**@include /absolute/path/to/file.policy** **@include +./path/relative/to/CWD/file.policy** + +Inclusion is limited to a single level (i.e. files that are +**@include**d cannot themselves **@include** more files), since deeper +nesting makes the policies harder to understand. + +## Seccomp\_Filter Syntax + +More formally, the expression after the colon can be an expression in +Disjunctive Normal Form (DNF): a disjunction ("or", *||*) of +conjunctions ("and", *&&*) of atoms. + +### Atom Syntax + +Atoms are of the form *arg{DNUM} {OP} {VAL}* where: + +> · *DNUM* is a decimal number +> +> · *OP* is an unsigned comparison operator: *==*, *\!=*, *\<*, *\<=*, +> *\>*, *\>=*, *&* (flags set), or *in* (inclusion) +> +> · *VAL* is a constant expression. It can be a named constant (like +> **O\_RDONLY**), a number (octal, decimal, or hexadecimal), a mask of +> constants separated by *|*, or a parenthesized constant expression. +> Constant expressions can also be prefixed with the bitwise complement +> operator *\~* to produce their complement. + +*==*, *\!=*, *\<*, *\<=*, *\>*, and *\>=* should be pretty +self-explanatory. + +*&* will test for a flag being set, for example, O\_RDONLY for +[**open**(2)](http://man7.org/linux/man-pages/man2/open.2.html): + +open: arg1 & O\_RDONLY + +Minijail supports most common named constants, like O\_RDONLY. It's +preferable to use named constants rather than numeric values as not all +architectures use the same numeric value. + +When the possible combinations of allowed flags grow, specifying them +all can be cumbersome. This is where the *in* operator comes in handy. 
The +system call will be allowed iff the flags set in the argument are +included (as a set) in the flags in the policy: + +mmap: arg3 in MAP\_PRIVATE|MAP\_ANONYMOUS + +This will allow +[**mmap**(2)](http://man7.org/linux/man-pages/man2/mmap.2.html) as long +as *arg3* (flags) has any combination of MAP\_PRIVATE and +MAP\_ANONYMOUS, but nothing else. One common use of this is to restrict +[**mmap**(2)](http://man7.org/linux/man-pages/man2/mmap.2.html) / +[**mprotect**(2)](http://man7.org/linux/man-pages/man2/mprotect.2.html) +to only allow write^exec mappings (writable or executable, but not +both): + +mmap: arg2 in \~PROT\_EXEC || arg2 in \~PROT\_WRITE mprotect: arg2 in +\~PROT\_EXEC || arg2 in \~PROT\_WRITE + +### Return Values + +By default, blocked syscalls cause the process to be killed. The *return +{NUM}* syntax can be used to force a specific errno to be returned +instead. + +read: return EBADF + +This expression will block the +[**read**(2)](http://man7.org/linux/man-pages/man2/read.2.html) syscall, +make it return -1, and set **errno** to EBADF (9 on x86 platforms). + +An expression can also include an optional *return \<errno\>* clause, +separated by a semicolon: + +read: arg0 == 0; return EBADF + +That is, if the first argument to read is 0, then allow the syscall; +else, block the syscall, return -1, and set **errno** to EBADF. + +## Seccomp\_Filter Policy Writing + +Determining policy for seccomp\_filter can be time consuming. System +calls are often named in arch-specific, or legacy tainted, ways, e.g. +geteuid versus geteuid32. On process death due to a seccomp filter rule, +the offending system call number will be supplied with a best guess of +the ABI defined name. This information may be used to produce working +baseline policies. However, if the process being contained has a fairly +tight working domain, using **tools/generate\_seccomp\_policy.py** with +the output of **strace \-f \-e raw=all \<program\>** can generate the +list of system calls that are needed. 
Note that when using libminijail +or minijail with preloading, supporting initial process setup calls will +not be required. Be conservative. + +It's also possible to analyze the binary checking for all non-dead +functions and determining if any of them issue system calls. There is no +active implementation for this, but something like +code.google.com/p/seccompsandbox is one possible runtime variant. + +## Author + +The Chromium OS Authors \<chromiumos-dev@chromium.org\> + +## Copyright + +Copyright © 2011 The Chromium OS Authors License BSD-like. + +## See Also + +[**minijail0**(1)](./minijail0.1) |
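As an illustration of the *in* operator described under Atom Syntax, the following Python sketch models the documented set-inclusion rule: a call is allowed only if every flag bit set in the argument is also set in the policy value. This models the documented behavior only and is not Minijail's BPF implementation. The helper names (`op_in`, `complement`, `wx_allowed`) are hypothetical, the flag values are the Linux x86 ones hard-coded for the example, and treating arguments as unsigned 64-bit values is an assumption.

```python
# Hypothetical model of the documented "in" operator; not Minijail code.

MASK64 = 0xFFFFFFFFFFFFFFFF  # assumption: arguments compared as unsigned 64-bit

# Linux x86 flag values, hard-coded for illustration only.
PROT_READ, PROT_WRITE, PROT_EXEC = 0x1, 0x2, 0x4
MAP_SHARED, MAP_PRIVATE, MAP_ANONYMOUS = 0x01, 0x02, 0x20


def op_in(arg, allowed):
    """True iff every bit set in arg is also set in allowed."""
    return (arg & ~allowed & MASK64) == 0


def complement(value):
    """Model the '~' prefix: bitwise complement within 64 bits."""
    return ~value & MASK64


def wx_allowed(prot):
    """Model of: arg2 in ~PROT_EXEC || arg2 in ~PROT_WRITE."""
    return op_in(prot, complement(PROT_EXEC)) or op_in(prot, complement(PROT_WRITE))


if __name__ == "__main__":
    # mmap: arg3 in MAP_PRIVATE|MAP_ANONYMOUS
    allowed = MAP_PRIVATE | MAP_ANONYMOUS
    print(op_in(MAP_PRIVATE, allowed), op_in(MAP_SHARED, allowed))
    print(wx_allowed(PROT_READ | PROT_WRITE), wx_allowed(PROT_WRITE | PROT_EXEC))
```

In this model a mapping may be writable or executable, but a request for both fails both disjuncts, which is the write^exec restriction the text describes.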