# Fuzzing binary-only targets

AFL++, libFuzzer, and other fuzzers are great if you have the source code of the
target, as this allows for very fast and coverage-guided fuzzing.

However, if there is only the binary program and no source code available, then
standard `afl-fuzz -n` (non-instrumented mode) is not effective.

For fast, on-the-fly instrumentation of black-box binaries, AFL++ offers
various options. The following describes how such binaries can be fuzzed with
AFL++.

## TL;DR:

FRIDA mode and QEMU mode in persistent mode are the fastest - if persistent mode
is possible and the stability is high enough.

Otherwise, try ZAFL, RetroWrite, or Dyninst; if these fail too, then try
standard FRIDA/QEMU mode with `AFL_ENTRYPOINT` set to where you need it.

If your target does not run on Linux, then use unicorn_mode.

## Fuzzing binary-only targets with AFL++

### QEMU mode

QEMU mode is AFL++'s "native" solution for binary-only targets. It is available
in the ./qemu_mode/ directory and, once compiled, can be used via the afl-fuzz
`-Q` command line option. It is the easiest alternative to use and even works
for cross-platform binaries.

For Linux programs and their libraries, this is accomplished with a version of
QEMU running in the lesser-known "user space emulation" mode. QEMU is a project
separate from AFL++, but you can conveniently build the feature by doing:

```shell
cd qemu_mode
./build_qemu_support.sh
```

The following setup to use QEMU mode is recommended (a command sketch follows
the list):

* run 1 afl-fuzz -Q instance with CMPLOG (`-c 0` + `AFL_COMPCOV_LEVEL=2`)
* run 1 afl-fuzz -Q instance with QASAN (`AFL_USE_QASAN=1`)
* run 1 afl-fuzz -Q instance with LAF (`AFL_PRELOAD=libcompcov.so` +
  `AFL_COMPCOV_LEVEL=2`); alternatively, you can use FRIDA mode, just swap `-Q`
  for `-O` and remove the LAF instance
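
As a minimal sketch (the corpus directory `in/`, the output directory `out/`,
the instance names, and `./target @@` are placeholders for your own setup),
these three instances could be started like this:

```shell
# instance 1: CMPLOG, reusing the target binary itself as the CMPLOG binary (-c 0)
AFL_COMPCOV_LEVEL=2 afl-fuzz -Q -M main -c 0 -i in -o out -- ./target @@

# instance 2: QASAN to detect memory errors under QEMU
AFL_USE_QASAN=1 afl-fuzz -Q -S qasan -i in -o out -- ./target @@

# instance 3: LAF/COMPCOV to help solve hard comparisons
AFL_PRELOAD=libcompcov.so AFL_COMPCOV_LEVEL=2 \
  afl-fuzz -Q -S laf -i in -o out -- ./target @@
```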

Then run as many instances as you have cores left, either in `-Q` mode or -
even better - using a binary rewriter like Dyninst, RetroWrite, ZAFL, etc. The
binary rewriters all have their own advantages and caveats; ZAFL is the best but
cannot be used in a business/commercial context.

If a binary rewriter works for your target, then you can use afl-fuzz normally
and it will be about twice as fast as QEMU mode (but still slower than QEMU
persistent mode).

QEMU mode results in a speed decrease of about 50%. However, various options
exist to increase the speed (see the sketch after this list):
- using AFL_ENTRYPOINT to move the forkserver entry to a later basic block in
  the binary (+5-10% speed)
- using persistent mode
  ([qemu_mode/README.persistent.md](../qemu_mode/README.persistent.md)), which
  will result in a 150-300% overall speed increase - so 3-8x the original QEMU
  mode speed!
- using AFL_CODE_START/AFL_CODE_END to only instrument specific parts
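
For example, `AFL_ENTRYPOINT` and `AFL_CODE_START`/`AFL_CODE_END` are plain
environment variables that take addresses; a rough sketch (the addresses below
are made-up examples - take the real ones from a disassembly of your target):

```shell
# start the forkserver at a later basic block instead of the default entry point
AFL_ENTRYPOINT=0x4004d0 afl-fuzz -Q -i in -o out -- ./target @@

# only instrument code between these two (example) addresses
AFL_CODE_START=0x401000 AFL_CODE_END=0x405000 \
  afl-fuzz -Q -i in -o out -- ./target @@
```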

For additional instructions and caveats, see
[qemu_mode/README.md](../qemu_mode/README.md). If possible, you should use
persistent mode, see
[qemu_mode/README.persistent.md](../qemu_mode/README.persistent.md). QEMU mode
is approximately 2-5x slower than compile-time instrumentation and is less
conducive to parallelization.
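
A very rough persistent mode sketch, where `0x550000001234` stands in for the
address of the function you want to loop over (see the linked README for how to
determine it and for the other `AFL_QEMU_PERSISTENT_*` options):

```shell
# loop over the function at the given (placeholder) address instead of forking
# for every input, restoring the general purpose registers on each iteration
AFL_QEMU_PERSISTENT_ADDR=0x550000001234 AFL_QEMU_PERSISTENT_GPR=1 \
  afl-fuzz -Q -i in -o out -- ./target @@
```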

Note that there is also honggfuzz:
[https://github.com/google/honggfuzz](https://github.com/google/honggfuzz) which
now has a QEMU mode, but its performance is just 1.5% ...

If you want to code a customized fuzzer without much work, we highly recommend
checking out our sister project LibAFL, which supports QEMU, too:
[https://github.com/AFLplusplus/LibAFL](https://github.com/AFLplusplus/LibAFL)

### WINE+QEMU

Wine mode can run Win32 PE binaries with the QEMU instrumentation. It needs
Wine, python3, and the pefile python package installed.

It is included in AFL++.
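
A minimal invocation sketch (the PE binary and directories are placeholders;
see the README referenced below for the exact requirements):

```shell
# -W selects Wine mode for Win32 PE binaries
afl-fuzz -W -i in -o out -- ./target.exe @@
```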

For more information, see
[qemu_mode/README.wine.md](../qemu_mode/README.wine.md).

### FRIDA mode

In FRIDA mode, you can fuzz binary-only targets as easily as with QEMU mode.
FRIDA mode is usually slightly faster than QEMU mode. It is also newer and has
the advantage that it works on macOS (both Intel and M1).

To build FRIDA mode:

```shell
cd frida_mode
gmake
```
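
Once built, fuzzing works just like in QEMU mode, only with `-O` instead of
`-Q`; a minimal sketch with placeholder paths:

```shell
# FRIDA mode is selected with the -O command line option
afl-fuzz -O -i in -o out -- ./target @@
```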

For additional instructions and caveats, see
[frida_mode/README.md](../frida_mode/README.md).

If possible, you should use persistent mode, see
[instrumentation/README.persistent_mode.md](../instrumentation/README.persistent_mode.md).
FRIDA mode is approximately 2-5x slower than compile-time instrumentation and is
less conducive to parallelization, but for binary-only fuzzing, persistent mode
gives a huge speed improvement if it can be used.

You can also perform remote fuzzing with FRIDA, e.g., if you want to fuzz on
iPhone or Android devices. For this, you can use
[https://github.com/ttdennis/fpicker/](https://github.com/ttdennis/fpicker/) as
an intermediate that uses AFL++ for fuzzing.

If you want to code a customized fuzzer without much work, we highly recommend
checking out our sister project LibAFL, which supports FRIDA, too:
[https://github.com/AFLplusplus/LibAFL](https://github.com/AFLplusplus/LibAFL).
Working examples already exist :-)

### Nyx mode

Nyx is a full system emulation fuzzing environment with snapshot support that is
built upon KVM and QEMU. It is only available on Linux and currently restricted
to x86_64.

For binary-only fuzzing, a special 5.10 kernel is required.

See [nyx_mode/README.md](../nyx_mode/README.md).

### Unicorn

Unicorn is a fork of QEMU. The instrumentation is, therefore, very similar. In
contrast to QEMU, Unicorn does not offer full-system or even userland emulation.
Runtime environments and/or loaders have to be written from scratch, if needed.
On top of that, block chaining has been removed, which means the speed boost
introduced in AFL++'s patched QEMU mode cannot be ported over to Unicorn.

For non-Linux binaries, you can use AFL++'s unicorn_mode, which can emulate
anything you want - at the price of speed and user-written scripts.

To build unicorn_mode:

```shell
cd unicorn_mode
./build_unicorn_support.sh
```

For further information, check out
[unicorn_mode/README.md](../unicorn_mode/README.md).

### Shared libraries

If the goal is to fuzz a dynamic library, then there are two options available.
For both, you need to write a small harness that loads and calls the library.
Then you fuzz this with either FRIDA mode or QEMU mode and use either
`AFL_INST_LIBS=1` or `AFL_QEMU_INST_RANGES`/`AFL_FRIDA_INST_RANGES`.
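
A hedged sketch of the harness approach, where `harness.c` is a hypothetical
small program that links against the library and passes the file given on the
command line to the API under test (all names and paths are placeholders):

```shell
# build the uninstrumented harness against the library to be fuzzed
gcc -o harness harness.c ./libtarget.so

# also instrument dynamically loaded library code (by default only the main
# binary is instrumented), then fuzz in FRIDA mode (use -Q for QEMU mode)
AFL_INST_LIBS=1 LD_LIBRARY_PATH=. afl-fuzz -O -i in -o out -- ./harness @@
```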

Another, less precise option is to fuzz the library with utils/afl_untracer/,
using afl-untracer.c as a template; it is slower than FRIDA mode.

For more information, see
[utils/afl_untracer/README.md](../utils/afl_untracer/README.md).

### Coresight

CoreSight is ARM's answer to Intel's PT. Since AFL++ v3.15, a CoreSight tracer
implementation is available in `coresight_mode/`. It is faster than QEMU mode
but cannot run in parallel: currently, only one process can be traced, and it is
a work in progress.

For more information, see
[coresight_mode/README.md](../coresight_mode/README.md).

## Binary rewriters

An alternative solution is binary rewriting. Binary rewriters are faster than
the solutions native to AFL++ but don't always work.

### ZAFL

ZAFL is a static rewriting platform supporting x86-64 C/C++,
stripped/unstripped, and PIE/non-PIE binaries. Beyond conventional
instrumentation, ZAFL's API enables transformation passes (e.g., laf-Intel,
context sensitivity, InsTrim, etc.).

Its baseline instrumentation speed typically averages 90-95% of
afl-clang-fast's.

[https://git.zephyr-software.com/opensrc/zafl](https://git.zephyr-software.com/opensrc/zafl)

### RetroWrite

RetroWrite is a static binary rewriter that can be combined with AFL++. If you
have an x86_64 or arm64 binary that does not contain C++ exceptions and - if
x86_64 - still has its symbols and was compiled with position-independent code
(PIC/PIE), then the RetroWrite solution might be for you. It disassembles the
binary to ASM files, which can then be instrumented with afl-gcc.

Binaries that are statically instrumented for fuzzing using RetroWrite are close
in performance to compiler-instrumented binaries and outperform the QEMU-based
instrumentation.

[https://github.com/HexHive/retrowrite](https://github.com/HexHive/retrowrite)

### Dyninst

Dyninst is a binary instrumentation framework similar to Pintool and DynamoRIO.
However, whereas Pintool and DynamoRIO work at runtime, Dyninst instruments the
target at load time and then lets it run - or saves the binary with the changes
applied. This is great for some things, e.g., fuzzing, and not so effective for
others, e.g., malware analysis.

So, what you can do with Dyninst is take every basic block, put AFL++'s
instrumentation code in there, and then save the binary. Afterwards, just fuzz
the newly saved target binary with afl-fuzz. Sounds great? It is. The catch,
though: inserting instructions changes addresses in the process space, and
making everything still work afterwards is a non-trivial problem. Hence, more
often than not, binaries crash when they are run.

The speed decrease is about 15-35%, depending on the optimization options used
with afl-dyninst.

[https://github.com/vanhauser-thc/afl-dyninst](https://github.com/vanhauser-thc/afl-dyninst)

### Mcsema

Theoretically, you can also decompile to LLVM IR with mcsema and then use
llvm_mode to instrument the binary. Good luck with that.

[https://github.com/lifting-bits/mcsema](https://github.com/lifting-bits/mcsema)

## Binary tracers

### Pintool & DynamoRIO

Pintool and DynamoRIO are dynamic instrumentation engines. They can be used for
getting basic block information at runtime. Pintool is only available for Intel
x32/x64 on Linux, macOS, and Windows, whereas DynamoRIO is additionally
available for ARM and AARCH64. DynamoRIO is also 10x faster than Pintool.

The big issue with DynamoRIO (and therefore Pintool, too) is speed. DynamoRIO
has a speed decrease of 98-99%, Pintool has a speed decrease of 99.5%.

Hence, DynamoRIO is the option to go for if everything else fails and Pintool
only if DynamoRIO fails, too.

DynamoRIO solutions:
* [https://github.com/vanhauser-thc/afl-dynamorio](https://github.com/vanhauser-thc/afl-dynamorio)
* [https://github.com/mxmssh/drAFL](https://github.com/mxmssh/drAFL)
* [https://github.com/googleprojectzero/winafl/](https://github.com/googleprojectzero/winafl/)
  <= very good but windows only

Pintool solutions:
* [https://github.com/vanhauser-thc/afl-pin](https://github.com/vanhauser-thc/afl-pin)
* [https://github.com/mothran/aflpin](https://github.com/mothran/aflpin)
* [https://github.com/spinpx/afl_pin_mode](https://github.com/spinpx/afl_pin_mode)
  <= only old Pintool version supported

### Intel PT

If you have a newer Intel CPU, you can make use of Intel's Processor Trace (PT).
The big issue with Intel's PT is the small buffer size and the complex encoding of
the debug information collected through PT. This makes the decoding very CPU
intensive and hence slow. As a result, the overall speed decrease is about
70-90% (depending on the implementation and other factors).

There are two AFL intel-pt implementations:

1. [https://github.com/junxzm1990/afl-pt](https://github.com/junxzm1990/afl-pt)
    => This needs Ubuntu 14.04.05 without any updates and the 4.4 kernel.

2. [https://github.com/hunter-ht-2018/ptfuzzer](https://github.com/hunter-ht-2018/ptfuzzer)
    => This needs a 4.14 or 4.15 kernel. The "nopti" kernel boot option must be
    used. This one is faster than the other.

Note that there is also honggfuzz:
[https://github.com/google/honggfuzz](https://github.com/google/honggfuzz). But
its IPT performance is just 6%!

## Non-AFL++ solutions

There are many binary-only fuzzing frameworks. Some are great for CTFs but don't
work with large binaries, others are very slow but have good path discovery,
some are very hard to set up...

* Jackalope:
  [https://github.com/googleprojectzero/Jackalope](https://github.com/googleprojectzero/Jackalope)
* Manticore:
  [https://github.com/trailofbits/manticore](https://github.com/trailofbits/manticore)
* QSYM:
  [https://github.com/sslab-gatech/qsym](https://github.com/sslab-gatech/qsym)
* S2E: [https://github.com/S2E](https://github.com/S2E)
* TinyInst:
  [https://github.com/googleprojectzero/TinyInst](https://github.com/googleprojectzero/TinyInst)
* ... please send me any missing ones that are good

## Closing words

That's it! News, corrections, updates? Send an email to vh@thc.org.