author Mike Frysinger <vapier@google.com> 2020-01-30 00:08:48 -0500
committer Mike Frysinger <vapier@google.com> 2020-01-30 15:58:31 -0500
commit b02c6a831d484a095a1175c2d37a6f5aaba98e94
tree d0b7effb70830bf91a234ff125ea560b70f374f3
initial site for hosting on GitHub
-rw-r--r-- _config.yml 1
-rw-r--r-- index.md 101
-rw-r--r-- minijail0.1.md 418
-rw-r--r-- minijail0.5.md 180
4 files changed, 700 insertions, 0 deletions
diff --git a/_config.yml b/_config.yml
new file mode 100644
index 0000000..8bbf794
--- /dev/null
+++ b/_config.yml
@@ -0,0 +1 @@
+theme: jekyll-theme-hacker
diff --git a/index.md b/index.md
new file mode 100644
index 0000000..0f97c9c
--- /dev/null
+++ b/index.md
@@ -0,0 +1,101 @@
+## About
+
+Minijail is a sandboxing and containment tool used in Chrome OS and Android.
+It provides an executable that can be used to launch and sandbox other programs,
+and a library that can be used by code to sandbox itself.
+
+## Sites
+
+The Minijail homepage:<br/>
+<https://google.github.io/minijail/>
+
+The main repo:<br/>
+<https://android.googlesource.com/platform/external/minijail/>
+
+With a read-only mirror for people to fork:<br/>
+<https://github.com/google/minijail/>
+
+There might be other copies floating around, but those are the official ones!
+
+## Getting the code
+
+### Releases
+
+Releases are tagged as `linux-vXX`:<br/>
+<https://github.com/google/minijail/releases>
+
+### Latest Development
+
+You're one `git clone` away from happiness.
+
+```
+$ git clone https://android.googlesource.com/platform/external/minijail
+$ cd minijail
+```
+
+## Documentation
+
+Check out the [minijail0(1)](./minijail0.1) and [minijail0(5)](./minijail0.5)
+online man pages for more details about using Minijail.
+
+See the [tools/README.md](https://github.com/google/minijail/blob/master/tools/README.md)
+document for info about extra tools we provide to help with development.
+
+The following talk serves as a good introduction to Minijail and how it can be used.
+[video](https://drive.google.com/file/d/0BwPS_JpKyELWZTFBcTVsa1hhYjA/preview)
+[slides](https://docs.google.com/presentation/d/1r6LpvDZtYrsl7ryOV4HtpUR-phfCLRL6PA-chcL1Kno/present)
+
+The Chromium OS project has a
+[comprehensive sandboxing guide](https://chromium.googlesource.com/chromiumos/docs/+/master/sandboxing.md)
+that is largely based on Minijail.
+
+## Building
+
+Just run `make` and you're good to go!
+
+If that doesn't work out, please see the
+[HACKING.md](https://github.com/google/minijail/blob/master/HACKING.md)
+document for more details.
+
+## Examples
+
+Here are a few simple examples.
+Check out the docs above for way more in-depth use.
+
+### Change root to any user
+
+```
+# id
+uid=0(root) gid=0(root) groups=0(root),128(pkcs11)
+# minijail0 -u jorgelo -g 5000 /usr/bin/id
+uid=72178(jorgelo) gid=5000(eng) groups=5000(eng)
+```
+
+### Drop root while keeping some capabilities
+
+```
+# minijail0 -u jorgelo -c 3000 -- /bin/cat /proc/self/status
+Name: cat
+...
+CapInh: 0000000000003000
+CapPrm: 0000000000003000
+CapEff: 0000000000003000
+CapBnd: 0000000000003000
+```
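The hex value passed to `-c` is a bitmask over capability numbers. As a rough sketch (not part of Minijail), the mask from the example above can be decoded in Python, assuming the bit positions defined in `<linux/capability.h>`. The `decode_caps` helper and the `CAP_NAMES` table are names made up for this illustration.

```python
# Capability names in the order defined by <linux/capability.h>;
# the list index is the bit position in the mask.
CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID", "CAP_SETPCAP",
    "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
    "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ",
]


def decode_caps(mask_hex):
    """Return the capability names whose bits are set in a hex mask string."""
    mask = int(mask_hex, 16)
    names = []
    bit = 0
    while mask >> bit:
        if mask & (1 << bit):
            # Bits beyond the table are reported by position only.
            names.append(CAP_NAMES[bit] if bit < len(CAP_NAMES) else f"bit {bit}")
        bit += 1
    return names


print(decode_caps("0000000000003000"))  # ['CAP_NET_ADMIN', 'CAP_NET_RAW']
```

Here `3000` selects bits 12 and 13, so the jailed `cat` keeps only `CAP_NET_ADMIN` and `CAP_NET_RAW`.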
+
+## Contact
+
+We've got a couple of contact points.
+
+* [minijail@chromium.org](https://groups.google.com/a/chromium.org/forum/#!forum/minijail):
+ Public user & developer mailing list.
+* [minijail-users@google.com](https://groups.google.com/a/google.com/forum/#!forum/minijail-users):
+ Internal Google user mailing list.
+* [minijail-dev@google.com](https://groups.google.com/a/google.com/forum/#!forum/minijail-dev):
+ Internal Google developer mailing list.
+* [crbug.com/list](https://crbug.com/?q=component:OS>Systems>Minijail):
+ Existing bug reports & feature requests.
+* [crbug.com/new](https://bugs.chromium.org/p/chromium/issues/entry?components=OS>Systems>Minijail):
+ File new bug reports & feature requests.
+* [AOSP Gerrit](https://android-review.googlesource.com/q/project:platform/external/minijail):
+ Code reviews.
diff --git a/minijail0.1.md b/minijail0.1.md
new file mode 100644
index 0000000..a18a70d
--- /dev/null
+++ b/minijail0.1.md
@@ -0,0 +1,418 @@
+# minijail0(1): sandbox a process
+
+ - [Synopsis](#synopsis)
+ - [Description](#description)
+ - [Sandboxing Profiles](#sandboxing-profiles)
+ - [Implementation](#implementation)
+ - [Author](#author)
+ - [Copyright](#copyright)
+ - [See Also](#see-also)
+
+## Synopsis
+
+**minijail0** \[*OPTION*\]... \<*PROGRAM*\> \[*args*\]...
+
+## Description
+
+Runs PROGRAM inside a sandbox.
+
+ - **\-a \<table\>**
+ Run using the alternate syscall table named *table*. Only available
+ on kernels and architectures that support the **PR\_ALT\_SYSCALL**
+ option of
+ [**prctl**(2)](http://man7.org/linux/man-pages/man2/prctl.2.html).
+
+ - **\-b \<src\>\[,\<dest\>\[,\<writable\>\]\]**
+ Bind-mount *src* into the chroot directory at *dest*, optionally
+ writable. The *src* path must be an absolute path. If *dest* is not
+ specified, it will default to *src*. If the destination does not
+ exist, it will be created as a file or directory based on the *src*
+ type (including missing parent directories). To create a writable
+ bind-mount, set *writable* to **1**. If not specified, it will default
+ to **0** (read-only).
+
+ - **\-B \<mask\>**
+ Skip setting securebits in *mask* when restricting capabilities
+ (**\-c**). *mask* is a hex constant that represents the mask of
+ securebits that will be preserved. See
+ [**capabilities**(7)](http://man7.org/linux/man-pages/man7/capabilities.7.html)
+ for the complete list. By default, **SECURE\_NOROOT**,
+ **SECURE\_NO\_SETUID\_FIXUP**, and **SECURE\_KEEP\_CAPS** (together
+ with their respective locks) are set.
+ **SECBIT\_NO\_CAP\_AMBIENT\_RAISE** (and its respective lock) is
+ never set because the permitted and inheritable capability sets have
+ already been set through **\-c**.
+
+ - **\-c \<caps\>**
+ Restrict capabilities to *caps*, which is either a hex constant or a
+ string that will be passed to
+ [**cap\_from\_text**(3)](http://man7.org/linux/man-pages/man3/cap_from_text.3.html)
+ (only the effective capability mask will be considered). The value
+ will be used as the permitted, effective, and inheritable sets. When
+ used in conjunction with **\-u** and **\-g**, this allows a program
+ to have access to only certain parts of root's default privileges
+ while running as another user and group ID altogether. Note that
+ these capabilities are not inherited by subprocesses of the process
+ given capabilities unless those subprocesses have POSIX file
+ capabilities or the **\-\-ambient** flag is also passed. See
+ [**capabilities**(7)](http://man7.org/linux/man-pages/man7/capabilities.7.html).
+
+ - **\-C \<dir\>**
+ Change root (using
+ [**chroot**(2)](http://man7.org/linux/man-pages/man2/chroot.2.html))
+ to *dir*.
+
+ - **\-d**, **\-\-mount\-dev**
+ Create a new /dev mount with a minimal set of nodes. Implies
+ **\-v**. Additional nodes can be bound with the **\-b** or **\-k**
+ options. The initial set of nodes are: full null tty urandom zero.
+ Symlinks are also created for: fd ptmx stderr stdin stdout.
+
+ - **\-e[file]**
+ Enter a new network namespace, or if *file* is specified, enter an
+ existing network namespace specified by *file* which is typically of
+ the form /proc/\<pid\>/ns/net.
+
+ - **\-f \<file\>**
+ Write the pid of the jailed process to *file*.
+
+ - **\-g \<group\>**
+ Change groups to *group*, which may be either a group name or a
+ numeric group ID.
+
+ - **\-G**
+ Inherit all the supplementary groups of the user specified with
+ **\-u**. It is an error to use this option without having specified
+ a **user name** to **\-u**.
+
+ - **\-h**
+ Print a help message.
+
+ - **\-H**
+ Print a help message detailing supported system call names for
+ seccomp\_filter. (Other direct numbers may be specified if minijail0
+ is not in sync with the host kernel or something like 32/64-bit
+ compatibility issues exist.)
+
+ - **\-i**
+ Exit immediately after
+ [**fork**(2)](http://man7.org/linux/man-pages/man2/fork.2.html). The
+ jailed process will keep running in the background.
+
+Normally minijail will fork+exec the specified *program* so that it can
+set up the right security settings in the new child process. The initial
+minijail process will stay resident and wait for the *program* to exit
+so the script that ran minijail will correctly block (e.g. standalone
+scripts). Specifying **\-i** makes that initial process exit immediately
+and free up the resources.
+
+This option is recommended for daemons and init services when you want
+to background the long running *program*.
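The process model described above can be sketched in Python. This only illustrates the fork, exec, and wait pattern and is not Minijail's code. The `launch` helper and its `exit_immediately` parameter are hypothetical stand-ins for the behavior of `-i`.

```python
import os
import sys


def launch(argv, exit_immediately=False):
    """Fork, exec argv in the child, and optionally wait for it to exit.

    Returns (child_pid, exit_code); exit_code is None when not waiting
    or when the child was terminated by a signal.
    """
    pid = os.fork()
    if pid == 0:
        # Child: a real launcher would apply its restrictions here,
        # before replacing the process image.
        try:
            os.execvp(argv[0], argv)
        except OSError:
            pass
        os._exit(127)  # only reached if exec failed
    if exit_immediately:
        # Comparable to -i: the parent does not stay resident.
        return pid, None
    _, status = os.waitpid(pid, 0)
    code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else None
    return pid, code


if __name__ == "__main__":
    _, code = launch([sys.executable, "-c", "raise SystemExit(3)"])
    print(code)
```

Without `exit_immediately`, the caller blocks until the child is done, which is what a script invoking the launcher usually expects.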
+
+ - **\-I**
+ Run *program* as init (pid 1) inside a new pid namespace (implies
+ **\-p**).
+
+Most programs don't expect to run as an init which is why minijail will
+do it for you by default. Basically, the *program* needs to reap any
+processes it forks to avoid leaving zombies behind. Signal handling
+needs care since the kernel will mask all signals that don't have
+handlers registered (all default handlers are ignored and cannot be
+changed).
+
+This means a minijail process (acting as init) will remain resident by
+default. While using **\-I** is recommended when possible, strict review
+is required to make sure the *program* continues to work as expected.
+
+**\-i** and **\-I** may be safely used together. The **\-i** option
+controls the first minijail process outside of the pid namespace while
+the **\-I** option controls the minijail process inside of the pid
+namespace.
+
+ - **\-k \<src\>,\<dest\>,\<type\>\[,\<flags\>\[,\<data\>\]\]**
+ Mount *src*, a *type* filesystem, at *dest*. If a chroot or pivot
+ root is active, *dest* will automatically be placed below that path.
+
+The *flags* field is optional and may be a mix of *MS\_XXX* or hex
+constants separated by *|* characters. See
+[**mount**(2)](http://man7.org/linux/man-pages/man2/mount.2.html) for
+details. *MS\_NODEV|MS\_NOSUID|MS\_NOEXEC* is the default value (a
+writable mount with nodev/nosuid/noexec bits set), and it is strongly
+recommended that all mounts have these three bits set whenever possible.
+If you need to disable all three, then specify something like
+*MS\_SILENT*.
+
+The *data* field is optional and is a comma delimited string (see
+[**mount**(2)](http://man7.org/linux/man-pages/man2/mount.2.html) for
+details). It is passed directly to the kernel, so all fields here are
+filesystem specific. For *tmpfs*, if no data is specified, we will
+default to *mode=0755,size=10M*. If you want other settings, you will
+need to specify them explicitly yourself.
+
+If the mount is not a pseudo filesystem (e.g. proc or sysfs), *src* path
+must be an absolute path (e.g. */dev/sda1* and not *sda1*).
+
+If the destination does not exist, it will be created as a directory
+(including missing parent directories).
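To make the *flags* and *data* defaults concrete, here is a small Python sketch of how such a field could be interpreted. It is an illustration under stated assumptions, not Minijail's parser: the `MS_*` values are the ones from `<sys/mount.h>` on Linux, and `parse_mount_flags` and `default_data` are hypothetical helper names.

```python
# Selected MS_* values from <sys/mount.h> on Linux.
MS_FLAGS = {
    "MS_RDONLY": 0x1, "MS_NOSUID": 0x2, "MS_NODEV": 0x4, "MS_NOEXEC": 0x8,
    "MS_SYNCHRONOUS": 0x10, "MS_REMOUNT": 0x20, "MS_MANDLOCK": 0x40,
    "MS_DIRSYNC": 0x80, "MS_NOATIME": 0x400, "MS_NODIRATIME": 0x800,
    "MS_BIND": 0x1000, "MS_MOVE": 0x2000, "MS_REC": 0x4000,
    "MS_SILENT": 0x8000,
}

# The documented default: nodev, nosuid, noexec.
DEFAULT_FLAGS = MS_FLAGS["MS_NODEV"] | MS_FLAGS["MS_NOSUID"] | MS_FLAGS["MS_NOEXEC"]


def parse_mount_flags(text=None):
    """Combine '|'-separated MS_XXX names or hex constants into one integer."""
    if not text:
        return DEFAULT_FLAGS
    flags = 0
    for token in text.split("|"):
        token = token.strip()
        # Anything that is not a known name is treated as a hex constant.
        flags |= MS_FLAGS[token] if token in MS_FLAGS else int(token, 16)
    return flags


def default_data(fstype, data=None):
    """Apply the documented tmpfs default when no data string is given."""
    if data is None and fstype == "tmpfs":
        return "mode=0755,size=10M"
    return data


print(hex(parse_mount_flags()))             # 0xe
print(hex(parse_mount_flags("MS_SILENT")))  # 0x8000
```

Passing `MS_SILENT` replaces the default entirely, which is why it clears the nodev, nosuid, and noexec bits.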
+
+ - **\-K[mode]**
+ Don't mark all existing mounts as MS\_PRIVATE. This option is
+ **dangerous** as it negates most of the functionality of **\-v**.
+ You very likely don't need this.
+
+You may specify a mount propagation mode in which case, that will be
+used instead of the default MS\_PRIVATE. See the
+[**mount**(2)](http://man7.org/linux/man-pages/man2/mount.2.html) man
+page and the kernel docs *Documentation/filesystems/sharedsubtree.txt*
+for more technical details, but a brief guide:
+
+> · **slave** Changes in the parent mount namespace will propagate in,
+> but changes in this mount namespace will not propagate back out. This
+> is usually what people want to use.
+>
+> · **private** No changes in either mount namespace will propagate.
+> This is the default behavior if you don't specify **\-K**.
+>
+> · **shared** Changes in the parent and this mount namespace will
+> freely propagate back and forth. This is not recommended.
+>
+> · **unbindable** Mark all mounts as unbindable.
+
+ - **\-l**
+ Run inside a new IPC namespace. This option makes the program's
+ System V IPC namespace independent.
+
+ - **\-L**
+ Report blocked syscalls when using a seccomp filter. On kernels with
+ support for SECCOMP\_RET\_LOG, every blocked syscall will be
+ reported through the audit subsystem (see
+ [**seccomp**(2)](http://man7.org/linux/man-pages/man2/seccomp.2.html)
+ for more details on SECCOMP\_RET\_LOG availability.) On all other
+ kernels, the first failing syscall will be logged to syslog. This
+ latter case will also force certain syscalls to be allowed in order
+ to write to syslog. Note: this option is disabled and ignored for
+ release builds.
+
+ - **\-m\[\<uid\> \<loweruid\> \<count\>\[,\<uid\> \<loweruid\> \<count\>\]\]**
+ Set the uid mapping of a user namespace (implies **\-pU**). Same
+ arguments as
+ [**newuidmap**(1)](http://man7.org/linux/man-pages/man1/newuidmap.1.html).
+ Multiple mappings should be separated by ','. With no mapping, map
+ the current uid to root inside the user namespace.
+
+ - **\-M\[\<uid\> \<loweruid\> \<count\>\[,\<uid\> \<loweruid\> \<count\>\]\]**
+ Set the gid mapping of a user namespace (implies **\-pU**). Same
+ arguments as
+ [**newgidmap**(1)](http://man7.org/linux/man-pages/man1/newgidmap.1.html).
+ Multiple mappings should be separated by ','. With no mapping, map
+ the current gid to root inside the user namespace.
+
+ - **\-n**
+ Set the process's *no\_new\_privs* bit. See
+ [**prctl**(2)](http://man7.org/linux/man-pages/man2/prctl.2.html)
+ and the kernel source file *Documentation/prctl/no\_new\_privs.txt*
+ for more info.
+
+ - **\-N**
+ Run inside a new cgroup namespace. This option runs the program with
+ a cgroup view showing the program's cgroup as the root. This is only
+ available on v4.6+ of the Linux kernel.
+
+ - **\-p**
+ Run inside a new PID namespace. This option will make it impossible
+ for the program to see or affect processes that are not its
+ descendants. This implies **\-v** and **\-r**, since otherwise the
+ process can see outside its namespace by inspecting /proc.
+
+If the *program* exits, all of its children will be killed immediately
+by the kernel. If you need to daemonize or background things, use the
+**\-i** option.
+
+See
+[**pid\_namespaces**(7)](http://man7.org/linux/man-pages/man7/pid_namespaces.7.html)
+for more info.
+
+ - **\-P \<dir\>**
+ Set *dir* as the root fs using **pivot\_root**. Implies **\-v**, not
+ compatible with **\-C**.
+
+ - **\-r**
+ Remount /proc readonly. This implies **\-v**. Remounting /proc
+ readonly means that even if the process has write access to a system
+ config knob in /proc (e.g., in /proc/sys/kernel), it cannot change the
+ value.
+
+ - **\-R \<rlim\_type\>,\<rlim\_cur\>,\<rlim\_max\>**
+ Set an rlimit value, see
+ [**getrlimit**(2)](http://man7.org/linux/man-pages/man2/getrlimit.2.html)
+ for more details.
+
+*rlim\_type* may be specified using symbolic constants like
+*RLIMIT\_AS*.
+
+*rlim\_cur* and *rlim\_max* are specified either with a number (decimal
+or hex starting with *0x*), or with the string *unlimited* (which will
+translate to *RLIM\_INFINITY*).
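As an illustration of the argument format, the following Python sketch parses an `-R` style triple using the standard `resource` module. It is not how minijail0 itself parses the option, and `parse_rlimit` is a hypothetical helper name.

```python
import resource


def parse_rlimit(spec):
    """Parse '<rlim_type>,<rlim_cur>,<rlim_max>' into (type, cur, max)."""
    rlim_type, rlim_cur, rlim_max = spec.split(",")
    if not rlim_type.startswith("RLIMIT_"):
        raise ValueError(f"not an rlimit constant: {rlim_type!r}")

    def value(text):
        # int(text, 0) accepts decimal as well as hex prefixed with 0x.
        return resource.RLIM_INFINITY if text == "unlimited" else int(text, 0)

    return getattr(resource, rlim_type), value(rlim_cur), value(rlim_max)


rtype, cur, maximum = parse_rlimit("RLIMIT_NOFILE,1024,0x1000")
print(cur, maximum)  # 1024 4096
```

The parsed triple has the shape that `resource.setrlimit(rtype, (cur, maximum))` expects.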
+
+ - **\-s**
+ Enable
+ [**seccomp**(2)](http://man7.org/linux/man-pages/man2/seccomp.2.html)
+ in mode 1, which restricts the child process to a very small set of
+ system calls. You most likely do not want to use this with the
+ seccomp filter mode (**\-S**) as they are completely different (even
+ though they have similar names).
+
+ - **\-S \<arch-specific seccomp\_filter policy file\>**
+ Enable
+ [**seccomp**(2)](http://man7.org/linux/man-pages/man2/seccomp.2.html)
+ in filter mode (mode 2), which restricts the child process to a set of system
+ calls defined in the policy file. Note that system call names may be
+ different based on the runtime environment; see
+ [**minijail0**(5)](./minijail0.5)
+ for more details.
+
+ - **\-t[size]**
+ Mounts a tmpfs filesystem on /tmp. /tmp must exist already (e.g. in
+ the chroot). The filesystem has a default size of "64M", overridden
+ with an optional argument. It has standard /tmp permissions (1777),
+ and is mounted nodev/noexec/nosuid. Implies **\-v**.
+
+ - **\-T \<type\>**
+ Assume binary's ELF linkage type is *type*, which must be either
+ 'static' or 'dynamic'. Either setting will prevent minijail0 from
+ manually parsing the ELF header to determine the type. Type 'static'
+ can be used to avoid preload hooking, and will force minijail0 to
+ instead set everything up before the program is executed. Type
+ 'dynamic' will force minijail0 to preload *libminijailpreload.so* to
+ setup hooks, but will fail on actually statically-linked binaries.
+
+ - **\-u \<user\>**
+ Change users to *user*, which may be either a user name or a numeric
+ user ID.
+
+ - **\-U**
+ Enter a new user namespace (implies **\-p**).
+
+ - **\-v**
+ Run inside a new VFS namespace. This option makes the program's
+ mountpoints independent of the rest of the system's.
+
+ - **\-V \<file\>**
+ Enter the VFS namespace specified by *file*.
+
+ - **\-w**
+ Create and join a new anonymous session keyring. See
+ [**keyrings**(7)](http://man7.org/linux/man-pages/man7/keyrings.7.html)
+ for more details.
+
+ - **\-y**
+ Keep the current user's supplementary groups.
+
+ - **\-Y**
+ Synchronize seccomp filters across thread group.
+
+ - **\-z**
+ Don't forward any signals to the jailed process. For example, when
+ not using **\-i**, sending **SIGINT** (e.g., CTRL-C on the
+ terminal), will kill the minijail0 process, not the jailed process.
+
+ - **\-\-ambient**
+ Raise ambient capabilities to match the mask specified by **\-c**.
+ Since ambient capabilities are preserved across
+ [**execve**(2)](http://man7.org/linux/man-pages/man2/execve.2.html),
+ this allows for process trees to have a restricted set of
+ capabilities, even if they are capability-dumb binaries. See
+ [**capabilities**(7)](http://man7.org/linux/man-pages/man7/capabilities.7.html).
+
+ - **\-\-uts[=hostname]**
+ Create a new UTS/hostname namespace, and optionally set the hostname
+ in the new namespace to *hostname*.
+
+ - **\-\-logging=\<system\>**
+ Use *system* as the logging system. *system* must be one of **auto**
+ (the default), **syslog**, or **stderr**.
+
+**auto** will use **stderr** if connected to a tty (e.g. run directly by
+a user), otherwise it will use **syslog**.
+
+ - **\-\-profile \<profile\>**
+ Choose from one of the available sandboxing profiles, which are a
+ simple way to get a standardized environment. See the **SANDBOXING
+ PROFILES** section below for the full list of supported values for
+ *profile*.
+
+ - **\-\-preload\-library \<file path\>**
+ Allows overriding the default path of */lib/libminijailpreload.so*.
+ This is only really useful for testing.
+
+ - **\-\-seccomp\-bpf\-binary \<arch-specific BPF binary\>**
+ This is similar to **\-S**, but instead of using a policy file,
+ **\-\-seccomp\-bpf\-binary** expects an
+ arch-and-kernel-version-specific pre-compiled BPF binary (such as
+ the ones produced by **parse\_seccomp\_policy**). Note that the
+ filter might be different based on the runtime environment; see
+ [**minijail0**(5)](./minijail0.5)
+ for more details.
+
+## Sandboxing Profiles
+
+The following sandboxing profiles are supported:
+
+ - **minimalistic-mountns**
+ Set up a minimalistic mount namespace. Equivalent to **\-v \-P
+ /var/empty \-b / \-b /proc \-b /dev/log \-t \-r \-\-mount\-dev**.
+
+## Implementation
+
+This program is broken up into two parts: **minijail0** (the frontend)
+and a helper library called **libminijailpreload**. Some jailings can
+only be achieved from the process to which they will actually apply:
+
+> · capability use (without using ambient capabilities): non-ambient
+> capabilities are not inherited across
+> [**execve**(2)](http://man7.org/linux/man-pages/man2/execve.2.html)
+> unless the file being executed has POSIX file capabilities. Ambient
+> capabilities (the **\-\-ambient** flag) fix capability inheritance
+> across
+> [**execve**(2)](http://man7.org/linux/man-pages/man2/execve.2.html) to
+> avoid the need for file capabilities.
+>
+> · seccomp: a meaningful seccomp filter policy should disallow
+> [**execve**(2)](http://man7.org/linux/man-pages/man2/execve.2.html), to
+> prevent a compromised process from executing a different binary.
+> However, this would prevent the seccomp policy from being applied before
+> [**execve**(2)](http://man7.org/linux/man-pages/man2/execve.2.html).
+
+To this end, **libminijailpreload** is forcibly loaded into all
+dynamically-linked target programs by default; we pass the specific
+restrictions in an environment variable which the preloaded library
+looks for. The forcibly-loaded library then applies the restrictions to
+the newly-loaded program.
+
+This behavior can be disabled by the use of the **\-T static** flag.
+There are other cases in which the use of this flag might be useful:
+
+> · When *program* is linked against a different version of **libc.so**
+> than **libminijailpreload.so**.
+>
+> · When
+> [**execve**(2)](http://man7.org/linux/man-pages/man2/execve.2.html) has
+> side-effects that interact badly with the jailing process. If the system
+> uses SELinux,
+> [**execve**(2)](http://man7.org/linux/man-pages/man2/execve.2.html) can
+> cause an automatic domain transition, which would then require that the
+> target domain allows the operations to jail *program*.
+
+## Author
+
+The Chromium OS Authors \<chromiumos-dev@chromium.org\>
+
+## Copyright
+
+Copyright © 2011 The Chromium OS Authors. License BSD-like.
+
+## See Also
+
+**libminijail.h**,
+[**minijail0**(5)](./minijail0.5),
+[**seccomp**(2)](http://man7.org/linux/man-pages/man2/seccomp.2.html)
diff --git a/minijail0.5.md b/minijail0.5.md
new file mode 100644
index 0000000..2a1eb0d
--- /dev/null
+++ b/minijail0.5.md
@@ -0,0 +1,180 @@
+# minijail0(5): sandbox a process
+
+ - [Description](#description)
+ - [Examples](#examples)
+ - [Seccomp\_Filter Policy](#seccomp_filter-policy)
+ - [Seccomp\_Filter Syntax](#seccomp_filter-syntax)
+ - [Atom Syntax](#atom-syntax)
+ - [Return Values](#return-values)
+ - [Seccomp\_Filter Policy Writing](#seccomp_filter-policy-writing)
+ - [Author](#author)
+ - [Copyright](#copyright)
+ - [See Also](#see-also)
+
+## Description
+
+Runs PROGRAM inside a sandbox. See
+[**minijail0**(1)](./minijail0.1)
+for details.
+
+## Examples
+
+Safely switch from root to nobody while dropping all capabilities and
+inheriting any groups from nobody:
+
+    # minijail0 -c 0 -G -u nobody /usr/bin/whoami
+    nobody
+
+Run in a PID and VFS namespace without superuser capabilities (but still
+as root) and with a private view of /proc:
+
+    # minijail0 -p -v -r -c 0 /bin/ps
+      PID TTY          TIME CMD
+        1 pts/0    00:00:00 minijail0
+        2 pts/0    00:00:00 ps
+
+Running a process with a seccomp filter policy at reduced privileges:
+
+    # minijail0 -S /usr/share/minijail0/$(uname -m)/cat.policy -- \
+        /bin/cat /proc/self/seccomp_filter
+    ...
+
+## Seccomp\_Filter Policy
+
+The policy file supplied to the **\-S** argument supports the following
+syntax:
+
+**\<syscall\_name\>**:**\<ftrace filter policy\>**<br/>
+**\<syscall\_number\>**:**\<ftrace filter policy\>**<br/>
+**\<empty line\>**<br/>
+**\# any single line comment**
+
+Long lines may be broken up using \\ at the end.
+
+A policy that emulates
+[**seccomp**(2)](http://man7.org/linux/man-pages/man2/seccomp.2.html) in
+mode 1 may look like:
+
+    read: 1
+    write: 1
+    sig_return: 1
+    exit: 1
+
+The "1" acts as a wildcard and allows any use of the mentioned system
+call. More advanced filtering is possible if your kernel supports
+CONFIG\_FTRACE\_SYSCALLS. For example, we can allow a process to open
+any file read only and mmap PROT\_READ only:
+
+    # open with O_LARGEFILE|O_RDONLY|O_NONBLOCK or some combination
+    open: arg1 == 32768 || arg1 == 0 || arg1 == 34816 || arg1 == 2048
+    mmap2: arg2 == 0x0
+    munmap: 1
+    close: 1
+
+The supported arguments may be found by reviewing the system call
+prototypes in the Linux kernel source code. Be aware that any
+non-numeric comparison may be subject to time-of-check-time-of-use
+attacks and cannot be considered safe.
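The line-oriented syntax above is simple enough to read with a few lines of Python. The following is a minimal sketch for illustration only, not the parser Minijail uses: `parse_policy` is a hypothetical helper, it does not evaluate the expressions, and it skips `@include` lines instead of resolving them.

```python
def parse_policy(text):
    """Map each syscall name (or number) to its raw filter expression."""
    rules = {}
    pending = ""
    for raw in text.splitlines():
        line = pending + raw.strip()
        if line.endswith("\\"):
            # A trailing backslash continues the rule on the next line.
            pending = line[:-1].rstrip() + " "
            continue
        pending = ""
        if not line or line.startswith("#"):
            continue  # empty line or single-line comment
        if line.startswith("@include"):
            continue  # includes are not resolved in this sketch
        name, _, expr = line.partition(":")
        rules[name.strip()] = expr.strip()
    return rules


EXAMPLE = """\
# open with O_LARGEFILE|O_RDONLY|O_NONBLOCK or some combination
open: arg1 == 32768 || arg1 == 0 || \\
      arg1 == 34816 || arg1 == 2048
mmap2: arg2 == 0x0
munmap: 1
close: 1
"""

for name, expr in parse_policy(EXAMPLE).items():
    print(f"{name}: {expr}")
```

The continuation of the `open` rule is joined back into a single expression before the name and expression are split at the first colon.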
+
+**execve** may only be used when invoking with CAP\_SYS\_ADMIN
+privileges.
+
+In order to promote reusability, policy files can include other policy
+files using the following syntax:
+
+**@include /absolute/path/to/file.policy**<br/>
+**@include ./path/relative/to/CWD/file.policy**
+
+Inclusion is limited to a single level (i.e. files that are
+**@include**d cannot themselves **@include** more files), since that
+makes the policies harder to understand.
+
+## Seccomp\_Filter Syntax
+
+More formally, the expression after the colon can be an expression in
+Disjunctive Normal Form (DNF): a disjunction ("or", *||*) of
+conjunctions ("and", *&&*) of atoms.
+
+### Atom Syntax
+
+Atoms are of the form *arg{DNUM} {OP} {VAL}* where:
+
+> · *DNUM* is a decimal number
+>
+> · *OP* is an unsigned comparison operator: *==*, *\!=*, *\<*, *\<=*,
+> *\>*, *\>=*, *&* (flags set), or *in* (inclusion)
+>
+> · *VAL* is a constant expression. It can be a named constant (like
+> **O\_RDONLY**), a number (octal, decimal, or hexadecimal), a mask of
+> constants separated by *|*, or a parenthesized constant expression.
+> Constant expressions can also be prefixed with the bitwise complement
+> operator *\~* to produce their complement.
+
+*==*, *\!=*, *\<*, *\<=*, *\>*, and *\>=* should be pretty self
+explanatory.
+
+*&* will test for a flag being set, for example, O\_RDONLY for
+[**open**(2)](http://man7.org/linux/man-pages/man2/open.2.html):
+
+open: arg1 & O\_RDONLY
+
+Minijail supports most common named constants, like O\_RDONLY. It's
+preferable to use named constants rather than numeric values as not all
+architectures use the same numeric value.
+
+When the possible combinations of allowed flags grow, specifying them
+all can be cumbersome. This is where the *in* operator comes in handy. The
+system call will be allowed iff the flags set in the argument are
+included (as a set) in the flags in the policy:
+
+mmap: arg3 in MAP\_PRIVATE|MAP\_ANONYMOUS
+
+This will allow
+[**mmap**(2)](http://man7.org/linux/man-pages/man2/mmap.2.html) as long
+as *arg3* (flags) has any combination of MAP\_PRIVATE and
+MAP\_ANONYMOUS, but nothing else. One common use of this is to restrict
+[**mmap**(2)](http://man7.org/linux/man-pages/man2/mmap.2.html) /
+[**mprotect**(2)](http://man7.org/linux/man-pages/man2/mprotect.2.html)
+to only allow write^exec mappings:
+
+    mmap: arg2 in ~PROT_EXEC || arg2 in ~PROT_WRITE
+    mprotect: arg2 in ~PROT_EXEC || arg2 in ~PROT_WRITE
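To see how the DNF form and the *in* operator combine, here is a small Python evaluator. It is a sketch under assumptions, not Minijail's implementation (which compiles policies to BPF): arguments are treated as unsigned 64-bit values, *&* is taken to mean "any of these bits is set", and parenthesized or C-style octal constants are not handled. The names `allows`, `eval_atom`, and `eval_const` are made up for this example.

```python
import operator
import re

MASK64 = (1 << 64) - 1  # assumption: arguments compared as unsigned 64-bit

# PROT_* values from <sys/mman.h> on Linux, used only for the demo below.
CONSTANTS = {"PROT_READ": 0x1, "PROT_WRITE": 0x2, "PROT_EXEC": 0x4}

OPS = {
    "==": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    "&": lambda arg, val: (arg & val) != 0,             # some flag in val is set
    "in": lambda arg, val: (arg & ~val & MASK64) == 0,  # no bits outside val
}

ATOM_RE = re.compile(r"^arg(\d+)\s*(==|!=|<=|>=|<|>|&|in)\s*(.+)$")


def eval_const(text):
    """Evaluate names or numbers joined by '|', with an optional leading '~'."""
    text = text.strip()
    if text.startswith("~"):
        return ~eval_const(text[1:]) & MASK64
    value = 0
    for part in text.split("|"):
        part = part.strip()
        value |= CONSTANTS[part] if part in CONSTANTS else int(part, 0)
    return value


def eval_atom(atom, args):
    match = ATOM_RE.match(atom.strip())
    if match is None:
        raise ValueError(f"unsupported atom: {atom!r}")
    index, op, const = match.groups()
    return OPS[op](args[int(index)] & MASK64, eval_const(const))


def allows(expr, args):
    """True if any '||' branch has all of its '&&' atoms satisfied."""
    if expr.strip() == "1":
        return True
    return any(
        all(eval_atom(atom, args) for atom in conj.split("&&"))
        for conj in expr.split("||")
    )


W_XOR_X = "arg2 in ~PROT_EXEC || arg2 in ~PROT_WRITE"
print(allows(W_XOR_X, [0, 0, 0x1 | 0x2]))  # True: read+write, not executable
print(allows(W_XOR_X, [0, 0, 0x1 | 0x4]))  # True: read+exec, not writable
print(allows(W_XOR_X, [0, 0, 0x2 | 0x4]))  # False: writable and executable
```

Since `arg2 in ~PROT_EXEC` holds exactly when the `PROT_EXEC` bit is clear, the rule rejects only requests that set both `PROT_WRITE` and `PROT_EXEC`.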
+
+### Return Values
+
+By default, blocked syscalls cause the process to be killed. The *return
+{NUM}* syntax can be used to force a specific errno to be returned
+instead.
+
+read: return EBADF
+
+This expression will block the
+[**read**(2)](http://man7.org/linux/man-pages/man2/read.2.html) syscall,
+make it return -1, and set **errno** to EBADF (9 on x86 platforms).
+
+An expression can also include an optional *return \<errno\>* clause,
+separated by a semicolon:
+
+read: arg0 == 0; return EBADF
+
+That is, if the first argument to read is 0, then allow the syscall;
+else, block the syscall, return -1, and set **errno** to EBADF.
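The two forms of the *return* clause can be told apart mechanically. The Python sketch below is an illustration only: `split_rule` is a hypothetical helper, and errno names are resolved with Python's standard `errno` module rather than with whatever table Minijail uses.

```python
import errno


def split_rule(rule):
    """Split the text after 'syscall:' into (expression, errno number).

    Either element is None when that part is absent from the rule.
    """
    # Without a ';', rpartition leaves the whole rule in `action`.
    expr, _, action = rule.rpartition(";")
    expr, action = expr.strip(), action.strip()
    if action.startswith("return "):
        name = action[len("return "):].strip()
        code = int(name) if name.isdigit() else getattr(errno, name)
        return (expr or None), code
    if expr:
        raise ValueError(f"expected a return clause after ';': {rule!r}")
    return action, None


print(split_rule("return EBADF") == (None, errno.EBADF))                    # True
print(split_rule("arg0 == 0; return EBADF") == ("arg0 == 0", errno.EBADF))  # True
print(split_rule("1"))                                                      # ('1', None)
```

A rule with no expression blocks the syscall unconditionally, while a rule with both parts returns the errno only when the expression does not match.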
+
+## Seccomp\_Filter Policy Writing
+
+Determining policy for seccomp\_filter can be time consuming. System
+calls are often named in arch-specific, or legacy tainted, ways. E.g.,
+geteuid versus geteuid32. On process death due to a seccomp filter rule,
+the offending system call number will be supplied with a best guess of
+the ABI defined name. This information may be used to produce working
+baseline policies. However, if the process being contained has a fairly
+tight working domain, using **tools/generate\_seccomp\_policy.py** with
+the output of **strace \-f \-e raw=all \<program\>** can generate the
+list of system calls that are needed. Note that when using libminijail
+or minijail with preloading, supporting initial process setup calls will
+not be required. Be conservative.
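The idea behind turning a trace into a baseline policy can be sketched in a few lines of Python. This is a simplified illustration and not the actual **tools/generate\_seccomp\_policy.py**: it only collects syscall names and allows each unconditionally, the `baseline_policy` helper is hypothetical, and the sample trace lines are invented to resemble `strace -f` output.

```python
import collections
import re

# Matches an optional "[pid N] " or "N " prefix, then the syscall name.
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?([a-z_][a-z0-9_]*)\(")


def baseline_policy(strace_lines):
    """Emit 'name: 1' for every syscall seen, most frequent first."""
    counts = collections.Counter()
    for line in strace_lines:
        match = SYSCALL_RE.match(line)
        if match:  # signal and exit markers do not match and are skipped
            counts[match.group(1)] += 1
    ordered = sorted(counts, key=lambda name: (-counts[name], name))
    return "".join(f"{name}: 1\n" for name in ordered)


SAMPLE = [
    "execve(0x7ffd0, 0x7ffd8, 0x7ffe0) = 0",
    "read(0x3, 0x7f00, 0x1000) = 0x10",
    "[pid  4242] read(0x3, 0x7f00, 0x1000) = 0",
    "--- SIGCHLD {si_signo=SIGCHLD} ---",
    "close(0x3) = 0",
    "+++ exited with 0 +++",
]

print(baseline_policy(SAMPLE), end="")
```

Ordering by frequency is a choice made for this sketch. The output is only a starting point: each rule should still be reviewed and tightened by hand, and calls that are needed only for initial process setup can often be dropped, as noted above.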
+
+It's also possible to analyze the binary checking for all non-dead
+functions and determining if any of them issue system calls. There is no
+active implementation for this, but something like
+code.google.com/p/seccompsandbox is one possible runtime variant.
+
+## Author
+
+The Chromium OS Authors \<chromiumos-dev@chromium.org\>
+
+## Copyright
+
+Copyright © 2011 The Chromium OS Authors. License BSD-like.
+
+## See Also
+
+[**minijail0**(1)](./minijail0.1)