---
layout: default
title: Integrating a Python project
parent: Setting up a new project
grand_parent: Getting started
nav_order: 3
permalink: /getting-started/new-project-guide/python-lang/
---

# Integrating a Python project
{: .no_toc}

- TOC
{:toc}
---


Integrating a Python project with OSS-Fuzz closely follows the general
[Setting up a new project]({{ site.baseurl }}/getting-started/new-project-guide/)
process. The Python-specific steps are outlined below.

## Atheris

Python fuzzing in OSS-Fuzz depends on
[Atheris](https://github.com/google/atheris). Fuzzers depend on the
`atheris` package; the required dependencies are pre-installed on the OSS-Fuzz
base Docker images.
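
As a rough illustration of what a fuzzer built on Atheris looks like, here is a
minimal sketch of a harness. It targets the standard-library `json` module purely
as a stand-in for your project's code, and the guarded `atheris` import is an
assumption made so the file can also be imported where Atheris is not installed.

```python
import json
import sys

try:
    import atheris  # expected to be available on the OSS-Fuzz base images
except ImportError:  # assumption: lets the harness be imported without Atheris
    atheris = None


def TestOneInput(data: bytes) -> None:
    """Feed one fuzzer-generated input to the code under test."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return  # not valid UTF-8, so uninteresting for this target
    try:
        json.loads(text)  # stand-in target; replace with your project's API
    except (ValueError, RecursionError):
        pass  # expected rejection of malformed input, not a bug


def main() -> None:
    """Entry point used when the file runs as a fuzzer."""
    if atheris is None:
        raise RuntimeError("atheris is required to run this harness")
    atheris.instrument_all()  # add coverage instrumentation to loaded Python code
    atheris.Setup(sys.argv, TestOneInput)
    atheris.Fuzz()
```

In a real harness file, call `main()` under an `if __name__ == "__main__":` guard
so that the fuzzer starts when the file is executed.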

## Project files

### Example project

We recommend viewing [ujson](https://github.com/google/oss-fuzz/tree/master/projects/ujson) as an
example of a simple Python fuzzing project, with both plain-Atheris and
Atheris + Hypothesis harnesses.

### project.yaml

The `language` attribute must be specified.

```yaml
language: python
```

The only supported fuzzing engine is libFuzzer (`libfuzzer`). The supported
sanitizers are AddressSanitizer (`address`) and
UndefinedBehaviorSanitizer (`undefined`). These must be explicitly specified.

```yaml
fuzzing_engines:
  - libfuzzer
sanitizers:
  - address
  - undefined
```
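
The constraints above can be summarized as a small check. This is only a sketch:
`check_python_project_config` is a hypothetical helper, not part of OSS-Fuzz, and
it assumes `project.yaml` has already been parsed into a dictionary.

```python
from typing import List

SUPPORTED_ENGINES = {"libfuzzer"}
SUPPORTED_SANITIZERS = {"address", "undefined"}


def check_python_project_config(config: dict) -> List[str]:
    """Return a list of problems found in a parsed project.yaml (hypothetical helper)."""
    problems = []
    if config.get("language") != "python":
        problems.append("language must be set to 'python'")

    engines = config.get("fuzzing_engines")
    if not engines:
        problems.append("fuzzing_engines must be listed explicitly")
    elif set(engines) - SUPPORTED_ENGINES:
        problems.append("only the libfuzzer engine is supported")

    sanitizers = config.get("sanitizers")
    if not sanitizers:
        problems.append("sanitizers must be listed explicitly")
    elif set(sanitizers) - SUPPORTED_SANITIZERS:
        problems.append("only the address and undefined sanitizers are supported")
    return problems
```

An empty list means the three Python-specific settings described in this section
are present and within the supported values.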

### Dockerfile

Because most dependencies are already pre-installed on the images, no
significant changes are needed in the Dockerfile for Python fuzzing projects.
Simply clone the project, set a `WORKDIR`, copy any necessary files, and
install any project-specific dependencies as you normally would.

### build.sh

For Python projects, `build.sh` needs more significant modifications than it
does for other projects. The following annotated example build script explains
why each step is necessary and when it can be omitted.

```sh
# Build and install project (using current CFLAGS, CXXFLAGS). This is required
# for projects with C extensions so that they're built with the proper flags.
pip3 install .

# Build fuzzers into $OUT. These could be detected in other ways.
for fuzzer in $(find $SRC -name '*_fuzzer.py'); do
  fuzzer_basename=$(basename -s .py $fuzzer)
  fuzzer_package=${fuzzer_basename}.pkg

  # To avoid issues with Python version conflicts, or changes in environment
  # over time on the OSS-Fuzz bots, we use pyinstaller to create a standalone
  # package. Though not necessarily required for reproducing issues, this is
  # required to keep fuzzers working properly in OSS-Fuzz.
  pyinstaller --distpath $OUT --onefile --name $fuzzer_package $fuzzer

  # Create execution wrapper. Atheris requires that certain libraries are
  # preloaded, so this is also done here to ensure compatibility and simplify
  # test case reproduction. Since this helper script is what OSS-Fuzz will
  # actually execute, it is also always required.
  # NOTE: If you are fuzzing python-only code and do not have native C/C++
  # extensions, then remove the LD_PRELOAD line below as preloading sanitizer
  # library is not required and can lead to unexpected startup crashes.
  echo "#!/bin/sh
# LLVMFuzzerTestOneInput for fuzzer detection.
this_dir=\$(dirname \"\$0\")
LD_PRELOAD=\$this_dir/sanitizer_with_fuzzer.so \
ASAN_OPTIONS=\$ASAN_OPTIONS:symbolize=1:external_symbolizer_path=\$this_dir/llvm-symbolizer:detect_leaks=0 \
\$this_dir/$fuzzer_package \$@" > $OUT/$fuzzer_basename
  chmod u+x $OUT/$fuzzer_basename
done
```
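
To make the effect of the `LD_PRELOAD` note concrete, the sketch below renders
the text of the wrapper that the `echo` command writes, with and without the
preload. `render_wrapper` is a hypothetical helper used only for illustration;
the build script above remains the actual mechanism. Because a backslash followed
by a newline inside double quotes is a line continuation in the shell, the
environment assignments and the command end up on a single line.

```python
def render_wrapper(fuzzer_basename: str, preload_sanitizer: bool = True) -> str:
    """Return the wrapper script text for one fuzzer (hypothetical helper)."""
    command = []
    if preload_sanitizer:
        # Needed only when the project has native C/C++ extensions.
        command.append("LD_PRELOAD=$this_dir/sanitizer_with_fuzzer.so")
    command.append(
        "ASAN_OPTIONS=$ASAN_OPTIONS:symbolize=1:"
        "external_symbolizer_path=$this_dir/llvm-symbolizer:detect_leaks=0"
    )
    # The pyinstaller package sits next to the wrapper in $OUT.
    command.append(f"$this_dir/{fuzzer_basename}.pkg")
    command.append("$@")
    lines = [
        "#!/bin/sh",
        "# LLVMFuzzerTestOneInput for fuzzer detection.",
        'this_dir=$(dirname "$0")',
        " ".join(command),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Pure-Python project: no sanitizer preload.
    print(render_wrapper("example_fuzzer", preload_sanitizer=False))
```

For a pure-Python project the only difference is the missing `LD_PRELOAD`
assignment; the rest of the wrapper is unchanged.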

## Hypothesis

Using [Hypothesis](https://hypothesis.readthedocs.io/), the Python library for
[property-based testing](https://hypothesis.works/articles/what-is-property-based-testing/),
makes it really easy to generate complex inputs - whether in traditional test suites
or [by using test functions as fuzz harnesses](https://hypothesis.readthedocs.io/en/latest/details.html#use-with-external-fuzzers).

> Property based testing is the construction of tests such that, when these tests are fuzzed,
  failures in the test reveal problems with the system under test that could not have been
  revealed by direct fuzzing of that system.

You also get integrated test-case reduction for free - meaning that it's trivial to
report a canonical minimal example for each distinct failure discovered while fuzzing!

See [the core "strategies"](https://hypothesis.readthedocs.io/en/latest/data.html)
for arbitrary data, [the NumPy + Pandas support](https://hypothesis.readthedocs.io/en/latest/numpy.html),
or [a variety of third-party extensions](https://hypothesis.readthedocs.io/en/latest/strategies.html)
supporting everything from protobufs and JSON schemas to networkx graphs, GeoJSON,
and valid Python source code.

To use Hypothesis in OSS-Fuzz, install it in your Dockerfile with:

```dockerfile
RUN pip3 install hypothesis
```

See [the `ujson` structured fuzzer](https://github.com/google/oss-fuzz/blob/master/projects/ujson/hypothesis_structured_fuzzer.py)
for an example "polyglot" which can either be run with `pytest` as a standard test function,
or run with OSS-Fuzz as a fuzz harness.
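
The polyglot pattern can be sketched as follows. This is an illustration rather
than the `ujson` harness itself: it tests the standard-library `json` module, and
`JSON_VALUES` and `test_json_roundtrip` are names chosen for this example. It
relies on the `fuzz_one_input` hook that Hypothesis attaches to `@given` tests.

```python
import json
import sys

from hypothesis import given, strategies as st

try:
    import atheris  # only needed when running as a fuzz harness
except ImportError:  # assumption: keeps the file usable as a plain pytest test
    atheris = None

# Strategy for arbitrary JSON-compatible values (illustrative).
JSON_VALUES = st.recursive(
    st.none() | st.booleans() | st.integers()
    | st.floats(allow_nan=False, allow_infinity=False) | st.text(),
    lambda children: st.lists(children) | st.dictionaries(st.text(), children),
    max_leaves=20,
)


@given(JSON_VALUES)
def test_json_roundtrip(value):
    """Property: encoding and then decoding returns an equal value."""
    assert json.loads(json.dumps(value)) == value


def main() -> None:
    """Entry point when running under Atheris / OSS-Fuzz."""
    if atheris is None:
        raise RuntimeError("atheris is required to run as a fuzz harness")
    # fuzz_one_input turns raw fuzzer bytes into one example for the test.
    atheris.Setup(sys.argv, test_json_roundtrip.hypothesis.fuzz_one_input)
    atheris.Fuzz()
```

Run under `pytest`, the function behaves as an ordinary property-based test. To
use it as a fuzz harness, call `main()` under an `if __name__ == "__main__":`
guard.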