aboutsummaryrefslogtreecommitdiff
path: root/minijail0.5
blob: c0e18e874ca8c6aac3589a525e5b3129d0446c5b (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
.TH MINIJAIL0 "5" "July 2011" "ChromiumOS" "User Commands"
.SH NAME
minijail0 \- sandbox a process
.SH DESCRIPTION
.PP
Runs PROGRAM inside a sandbox. See \fBminijail0\fR(1) for details.
.SH EXAMPLES

Safely switch from user \fIroot\fR to \fInobody\fR, switch to primary group
\fInobody\fR, drop all capabilities, and inherit any supplementary groups from
user \fInobody\fR:

  # minijail0 -u nobody -g nobody -c 0 -G /usr/bin/whoami
  nobody

Run in a PID and VFS namespace without superuser capabilities (but still
as root) and with a private view of /proc:

  # minijail0 -p -v -r -c 0 /bin/ps
    PID TTY           TIME CMD
      1 pts/0     00:00:00 minijail0
      2 pts/0     00:00:00 ps

Running a process with a seccomp filter policy at reduced privileges:

  # minijail0 -S /usr/share/minijail0/$(uname -m)/cat.policy -- \\
              /bin/cat /proc/self/seccomp_filter
  ...

.SH SECCOMP_FILTER POLICY
The policy file supplied to the \fB-S\fR argument supports the following syntax:

  \fB<syscall_name>\fR:\fB<ftrace filter policy>\fR
  \fB<syscall_number>\fR:\fB<ftrace filter policy>\fR
  \fB<empty line>\fR
  \fB# any single line comment\fR

Long lines may be broken up using \\ at the end.

A policy that emulates \fBseccomp\fR(2) in mode 1 may look like:
  read: 1
  write: 1
  sig_return: 1
  exit: 1

The "1" acts as a wildcard and allows any use of the mentioned system
call.  More advanced filtering is possible if your kernel supports
CONFIG_FTRACE_SYSCALLS.  For example, we can allow a process to open any
file read only and mmap PROT_READ only:

  # open with O_LARGEFILE|O_RDONLY|O_NONBLOCK or some combination
  open: arg1 == 32768 || arg1 == 0 || arg1 == 34816 || arg1 == 2048
  mmap2: arg2 == 0x0
  munmap: 1
  close: 1

The supported arguments may be found by reviewing the system call
prototypes in the Linux kernel source code.  Be aware that any
non-numeric comparison may be subject to time-of-check-time-of-use
attacks and cannot be considered safe.

\fBexecve\fR may only be used when invoking with CAP_SYS_ADMIN privileges.

In order to promote reusability, policy files can include other policy files
using the following syntax:

  \fB@include /absolute/path/to/file.policy\fR
  \fB@include ./path/relative/to/CWD/file.policy\fR

Inclusion is limited to a single level (i.e. files that are \fB@include\fRd
cannot themselves \fB@include\fR more files), since that makes the policies
harder to understand.

.SH SECCOMP_FILTER SYNTAX
More formally, the expression after the colon can be an expression in
Disjunctive Normal Form (DNF): a disjunction ("or", \fI||\fR) of
conjunctions ("and", \fI&&\fR) of atoms.

.SS "Atom Syntax"
Atoms are of the form \fIarg{DNUM} {OP} {VAL}\fR where:
.IP
\[bu] \fIDNUM\fR is a decimal number

\[bu] \fIOP\fR is an unsigned comparison operator:
\fI==\fR, \fI!=\fR, \fI<\fR, \fI<=\fR, \fI>\fR, \fI>=\fR, \fI&\fR (flags set),
or \fIin\fR (inclusion)

\[bu] \fVAL\fR is a constant expression.  It can be a named constant (like
\fBO_RDONLY\fR), a number (octal, decimal, or hexadecimal), a mask of constants
separated by \fI|\fR, or a parenthesized constant expression. Constant
expressions can also be prefixed with the bitwise complement operator \fI~\fR
to produce their complement.
.RE

\fI==\fR, \fI!=\fR, \fI<\fR, \fI<=\fR, \fI>\fR, and \fI>=\fR should be pretty
self explanatory.

\fI&\fR will test for a flag being set, for example, O_RDONLY for
.BR open (2):

  open: arg1 & O_RDONLY

Minijail supports most common named constants, like O_RDONLY.
It's preferable to use named constants rather than numeric values as not all
architectures use the same numeric value.

When the possible combinations of allowed flags grow, specifying them all can
be cumbersome.
This is where the \fIin\fR operator comes handy.
The system call will be allowed iff the flags set in the argument are included
(as a set) in the flags in the policy:

  mmap: arg3 in MAP_PRIVATE|MAP_ANONYMOUS

This will allow \fBmmap\fR(2) as long as \fIarg3\fR (flags) has any combination
of MAP_PRIVATE and MAP_ANONYMOUS, but nothing else.  One common use of this is
to restrict \fBmmap\fR(2) / \fBmprotect\fR(2) to only allow write^exec
mappings:

  mmap: arg2 in ~PROT_EXEC || arg2 in ~PROT_WRITE
  mprotect: arg2 in ~PROT_EXEC || arg2 in ~PROT_WRITE

.SS "Return Values"

By default, blocked syscalls call the process to be killed.
The \fIreturn {NUM}\fR syntax can be used to force a specific errno to be
returned instead.

  read: return EBADF

This expression will block the \fBread\fR(2) syscall, make it return -1, and set
\fBerrno\fR to EBADF (9 on x86 platforms).

An expression can also include an optional \fIreturn <errno>\fR clause,
separated by a semicolon:

  read: arg0 == 0; return EBADF

This is, if the first argument to read is 0, then allow the syscall;
else, block the syscall, return -1, and set \fBerrno\fR to EBADF.

.SH SECCOMP_FILTER POLICY WRITING

Determining policy for seccomp_filter can be time consuming.  System
calls are often named in arch-specific, or legacy tainted, ways.  E.g.,
geteuid versus geteuid32.  On process death due to a seccomp filter
rule, the offending system call number will be supplied with a best
guess of the ABI defined name.  This information may be used to produce
working baseline policies.  However, if the process being contained has
a fairly tight working domain, using \fBtools/generate_seccomp_policy.py\fR
with the output of \fBstrace -f -e raw=all <program>\fR can generate the list
of system calls that are needed.  Note that when using libminijail or minijail
with preloading, supporting initial process setup calls will not be required.
Be conservative.

It's also possible to analyze the binary checking for all non-dead
functions and determining if any of them issue system calls.  There is
no active implementation for this, but something like
code.google.com/p/seccompsandbox is one possible runtime variant.

.SH CONFIGURATION FILE
A configuration file can be used to specify command line options and other
settings.

It supports the following syntax:
  \fB% minijail-config-file v0\fR
  \fB<option>\fR=\fB<argument>\fR
  \fB<no-argument-option>\fR
  \fB<empty line>\fR
  \fB# any single line comment\fR

Long lines may be broken up using \\ at the end.

The special directive "% minijail-config-file v0" must occupy the first line.
"v0" also declares the version of the config file format.

Keys contain only alphabetic characters and '-'. Values can be any non-empty
string. Leading and trailing whitespaces around keys and
values are permitted but will be stripped before processing.

Currently all long options are supported such as
\fBmount\fR, \fBbind-mount\fR. For a option that has no argument, the option
will occupy a single line, without '=' and value. Otherwise, any string that
is given after the '=' is interpreted as the argument.

.SH AUTHOR
The ChromiumOS Authors <chromiumos-dev@chromium.org>
.SH COPYRIGHT
Copyright \(co 2011 The ChromiumOS Authors
License BSD-like.
.SH "SEE ALSO"
.BR minijail0 (1)