# Simpleperf

Simpleperf is a native profiling tool for Android. It can be used to profile
both Android applications and native processes running on Android. It can
profile both Java and C++ code on Android. It can be used on Android L
and above.

Simpleperf is part of the Android Open Source Project. The source code is [here](https://android.googlesource.com/platform/system/extras/+/master/simpleperf/).
The latest document is [here](https://android.googlesource.com/platform/system/extras/+/master/simpleperf/README.md).
Bugs and feature requests can be submitted at http://github.com/android-ndk/ndk/issues.


## Table of Contents

- [Simpleperf introduction](#simpleperf-introduction)
    - [Why simpleperf](#why-simpleperf)
    - [Tools in simpleperf](#tools-in-simpleperf)
    - [Simpleperf's profiling principle](#simpleperfs-profiling-principle)
    - [Main simpleperf commands](#main-simpleperf-commands)
        - [Simpleperf list](#simpleperf-list)
        - [Simpleperf stat](#simpleperf-stat)
        - [Simpleperf record](#simpleperf-record)
        - [Simpleperf report](#simpleperf-report)
- [Android application profiling](#android-application-profiling)
    - [Prepare an Android application](#prepare-an-android-application)
    - [Record and report profiling data (using command-lines)](#record-and-report-profiling-data-using-commandlines)
    - [Record and report profiling data (using python scripts)](#record-and-report-profiling-data-using-python-scripts)
    - [Record and report call graph](#record-and-report-call-graph)
    - [Visualize profiling data](#visualize-profiling-data)
    - [Annotate source code](#annotate-source-code)
- [Answers to common issues](#answers-to-common-issues)
    - [The correct way to pull perf.data on host](#the-correct-way-to-pull-perfdata-on-host)

## Simpleperf introduction

### Why simpleperf

Simpleperf works similarly to linux-tools-perf, but it has some specific features for
Android profiling:

1. Aware of Android environment

    a. It can profile embedded shared libraries in apk.

    b. It reads symbols and debug information from .gnu_debugdata section.

    c. It gives suggestions when errors occur.

    d. When recording with the -g option, it unwinds the stack before writing to file, to
    save storage space.

    e. It supports adding additional information (like symbols) in perf.data, to
    support recording on device and reporting on host.

2. Using python scripts for profiling tasks

3. Easy to release

    a. Simpleperf executables on device are built as static binaries. They can be
    pushed on any Android device and run.

    b. Simpleperf executables on host are built as static binaries, and support
    different hosts: mac, linux and windows.


### Tools in simpleperf

Simpleperf is periodically released with the Android NDK, located at `simpleperf/`.
The latest release can be found [here](https://android.googlesource.com/platform/prebuilts/simpleperf/).
Simpleperf tools contain executables, shared libraries and python scripts.

**Simpleperf executables running on Android device**
Simpleperf executables running on Android device are located at `bin/android/`.
Each architecture has one executable, like `bin/android/arm64/simpleperf`. It
can record and report profiling data. It provides a command-line interface
broadly the same as the linux-tools perf, and also supports some additional
features for Android-specific profiling.

**Simpleperf executables running on hosts**
Simpleperf executables running on hosts are located at `bin/darwin`, `bin/linux`
and `bin/windows`. Each host and architecture has one executable, like
`bin/linux/x86_64/simpleperf`. It provides a command-line interface for
reporting profiling data on hosts.

**Simpleperf report shared libraries used on host**
Simpleperf report shared libraries used on host are located at `bin/darwin`,
`bin/linux` and `bin/windows`. Each host and architecture has one library, like
`bin/linux/x86_64/libsimpleperf_report.so`. It is a library for parsing
profiling data.

**Python scripts**
Python scripts are provided to help with different profiling tasks.

`annotate.py` is used to annotate source files based on profiling data.

`app_profiler.py` is used to profile Android applications.

`binary_cache_builder.py` is used to pull libraries from Android devices.

`pprof_proto_generator.py` is used to convert profiling data to the format used by pprof.

`report.py` is used to provide a GUI to report profiling results.

`report_sample.py` is used to generate flamegraphs.

`simpleperf_report_lib.py` provides a python interface for parsing profiling data.


### Simpleperf's profiling principle

Modern CPUs have a hardware component called the performance monitoring unit
(PMU). The PMU has several hardware counters, counting events like how many cpu
cycles have happened, how many instructions have executed, or how many cache
misses have happened.

The Linux kernel wraps these hardware counters into hardware perf events. In
addition, the Linux kernel also provides hardware independent software events
and tracepoint events. The Linux kernel exposes all this to userspace via the
perf_event_open system call, which simpleperf uses.

Simpleperf has three main functions: stat, record and report.

The stat command gives a summary of how many events have happened in the
profiled processes in a time period. Here’s how it works:
1. Given user options, simpleperf enables profiling by making a system call to
linux kernel.
2. Linux kernel enables counters while scheduling on the profiled processes.
3. After profiling, simpleperf reads counters from linux kernel, and reports a
counter summary.

The record command records samples of the profiled process in a time period.
Here’s how it works:
1. Given user options, simpleperf enables profiling by making a system call to
linux kernel.
2. Simpleperf creates mapped buffers between simpleperf and linux kernel.
3. Linux kernel enables counters while scheduling on the profiled processes.
4. Each time a given number of events happen, linux kernel dumps a sample to a
mapped buffer.
5. Simpleperf reads samples from the mapped buffers and generates perf.data.

The report command reads a "perf.data" file and any shared libraries used by
the profiled processes, and outputs a report showing where the time was spent.


### Main simpleperf commands

Simpleperf supports several subcommands, including list, stat, record and report.
Each subcommand supports different options. This section only covers the most
important subcommands and options. To see all subcommands and options,
use --help.

    # List all subcommands.
    $ simpleperf --help

    # Print help message for record subcommand.
    $ simpleperf record --help


#### Simpleperf list

simpleperf list is used to list all events available on the device. Different
devices may support different events because of differences in hardware and
kernel.

    $ simpleperf list
    List of hw-cache events:
      branch-loads
      ...
    List of hardware events:
      cpu-cycles
      instructions
      ...
    List of software events:
      cpu-clock
      task-clock
      ...


#### Simpleperf stat

simpleperf stat is used to get raw event counter information of the profiled program
or system-wide. By passing options, we can select which events to use, which
processes/threads to monitor, how long to monitor and the print interval.
Below is an example.

    # Stat using default events (cpu-cycles,instructions,...), and monitor
    # process 7394 for 10 seconds.
    $ simpleperf stat -p 7394 --duration 10
    Performance counter statistics:

     1,320,496,145  cpu-cycles         # 0.131736 GHz                     (100%)
       510,426,028  instructions       # 2.587047 cycles per instruction  (100%)
         4,692,338  branch-misses      # 468.118 K/sec                    (100%)
    886.008130(ms)  task-clock         # 0.088390 cpus used               (100%)
               753  context-switches   # 75.121 /sec                      (100%)
               870  page-faults        # 86.793 /sec                      (100%)

    Total test time: 10.023829 seconds.
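
The comment column in the output above can be reproduced from the raw counts. The sketch below is not simpleperf code; it assumes (based only on the numbers shown) that rates are computed against the total test time, and the variable names are illustrative.

```python
# Raw values copied from the example output above.
cpu_cycles = 1_320_496_145
instructions = 510_426_028
branch_misses = 4_692_338
task_clock_ms = 886.008130
total_time_s = 10.023829

# Average clock rate over the whole test, in GHz.
ghz = cpu_cycles / total_time_s / 1e9
# Cycles per instruction.
cpi = cpu_cycles / instructions
# Event rate in thousands per second.
branch_misses_k_per_sec = branch_misses / total_time_s / 1e3
# Fraction of one CPU kept busy by the profiled process.
cpus_used = (task_clock_ms / 1e3) / total_time_s

print(f"{ghz:.6f} GHz")
print(f"{cpi:.6f} cycles per instruction")
print(f"{branch_misses_k_per_sec:.3f} K/sec")
print(f"{cpus_used:.6f} cpus used")
```

A low "cpus used" value like this one suggests the process was idle for most of the measured interval.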

**Select events**
We can select which events to use via -e option. Below are examples:

    # Stat event cpu-cycles.
    $ simpleperf stat -e cpu-cycles -p 11904 --duration 10

    # Stat event cache-references and cache-misses.
    $ simpleperf stat -e cache-references,cache-misses -p 11904 --duration 10

When running the stat command, if the number of hardware events is larger than
the number of hardware counters available in the PMU, the kernel shares hardware
counters between events, so each event is only monitored for part of the total
time. In the example below, there is a percentage at the end of each row,
showing the percentage of the total time that each event was actually monitored.

    # Stat using event cache-references, cache-references:u,....
    $ simpleperf stat -p 7394 -e cache-references,cache-references:u,cache-references:k,cache-misses,cache-misses:u,cache-misses:k,instructions --duration 1
    Performance counter statistics:

    4,331,018  cache-references     # 4.861 M/sec    (87%)
    3,064,089  cache-references:u   # 3.439 M/sec    (87%)
    1,364,959  cache-references:k   # 1.532 M/sec    (87%)
       91,721  cache-misses         # 102.918 K/sec  (87%)
       45,735  cache-misses:u       # 51.327 K/sec   (87%)
       38,447  cache-misses:k       # 43.131 K/sec   (87%)
    9,688,515  instructions         # 10.561 M/sec   (89%)

    Total test time: 1.026802 seconds.

In the example above, each event is monitored about 87% of the total time. But
there is no guarantee that any pair of events are always monitored at the same
time. If we want to have some events monitored at the same time, we can use
--group option. Below is an example.

    # Stat using event cache-references, cache-references:u,....
    $ simpleperf stat -p 7394 --group cache-references,cache-misses --group cache-references:u,cache-misses:u --group cache-references:k,cache-misses:k -e instructions --duration 1
    Performance counter statistics:

    3,638,900  cache-references     # 4.786 M/sec          (74%)
       65,171  cache-misses         # 1.790953% miss rate  (74%)
    2,390,433  cache-references:u   # 3.153 M/sec          (74%)
       32,280  cache-misses:u       # 1.350383% miss rate  (74%)
      879,035  cache-references:k   # 1.251 M/sec          (68%)
       30,303  cache-misses:k       # 3.447303% miss rate  (68%)
    8,921,161  instructions         # 10.070 M/sec         (86%)

    Total test time: 1.029843 seconds.
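
Because grouped events are counted over the same time slices, their ratio is meaningful. As a rough sketch, the miss rate in the output above can be recomputed by parsing the rows. The regular expression and the `parse_counts` helper are hypothetical and assume the row layout shown in this document; rows such as `task-clock`, whose count carries a unit, are skipped.

```python
import re

# One data row of `simpleperf stat` output as shown above:
# "<count>  <event>  # <comment>  (<percent>%)"
ROW = re.compile(r"^\s*([\d,]+)\s+(\S+)\s+#.*\((\d+)%\)\s*$")

def parse_counts(text):
    """Return {event_name: (count, monitored_percent)} for matching rows."""
    counts = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            count = int(m.group(1).replace(",", ""))
            counts[m.group(2)] = (count, int(m.group(3)))
    return counts

sample = """
    3,638,900  cache-references     # 4.786 M/sec          (74%)
       65,171  cache-misses         # 1.790953% miss rate  (74%)
"""
counts = parse_counts(sample)
refs, _ = counts["cache-references"]
misses, _ = counts["cache-misses"]
miss_rate = 100.0 * misses / refs
print(f"{miss_rate:.6f}% miss rate")
```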

**Select target to monitor**
We can select which processes or threads to monitor via -p option or -t option.
Monitoring a process is the same as monitoring all threads in the process.
Simpleperf can also fork a child process to run the new command and then monitor
the child process. Below are examples.

    # Stat process 11904 and 11905.
    $ simpleperf stat -p 11904,11905 --duration 10

    # Stat thread 11904 and 11905.
    $ simpleperf stat -t 11904,11905 --duration 10

    # Start a child process running `ls`, and stat it.
    $ simpleperf stat ls

**Decide how long to monitor**
When monitoring existing threads, we can use --duration option to decide how long
to monitor. When monitoring a child process running a new command, simpleperf
monitors until the child process ends. In this case, we can use Ctrl-C to stop monitoring
at any time. Below are examples.

    # Stat process 11904 for 10 seconds.
    $ simpleperf stat -p 11904 --duration 10

    # Stat until the child process running `ls` finishes.
    $ simpleperf stat ls

    # Stop monitoring using Ctrl-C.
    $ simpleperf stat -p 11904 --duration 10
    ^C

**Decide the print interval**
When monitoring perf counters, we can also use --interval option to decide the print
interval. Below are examples.

    # Print stat for process 11904 every 300ms.
    $ simpleperf stat -p 11904 --duration 10 --interval 300

    # Print system wide stat at interval of 300ms for 10 seconds (rooted device only).
    # system wide profiling needs root privilege
    $ su 0 simpleperf stat -a --duration 10 --interval 300

**Display counters in systrace**
simpleperf can also work with systrace to dump counters in the collected trace.
Below is an example of a system-wide stat.

    # capture instructions (kernel only) and cache misses with interval of 300 milliseconds for 15 seconds
    $ su 0 simpleperf stat -e instructions:k,cache-misses -a --interval 300 --duration 15
    # on host launch systrace to collect trace for 10 seconds
    (HOST)$ external/chromium-trace/systrace.py --time=10 -o new.html sched gfx view
    # open the collected new.html in a browser and the perf counters will show up


#### Simpleperf record

simpleperf record is used to dump records of the profiled program. By passing
options, we can select which events to use, which processes/threads to monitor,
what frequency to dump records, how long to monitor, and where to store records.

    # Record on process 7394 for 10 seconds, using default event (cpu-cycles),
    # using default sample frequency (4000 samples per second), writing records
    # to perf.data.
    $ simpleperf record -p 7394 --duration 10
    simpleperf I 07-11 21:44:11 17522 17522 cmd_record.cpp:316] Samples recorded: 21430. Samples lost: 0.

**Select events**
In most cases, the cpu-cycles event is used to evaluate consumed cpu time.
As a hardware event, it is both accurate and efficient. We can also use other
events via -e option. Below is an example.

    # Record using event instructions.
    $ simpleperf record -e instructions -p 11904 --duration 10

**Select target to monitor**
The way to select target in record command is similar to that in stat command.
Below are examples.

    # Record process 11904 and 11905.
    $ simpleperf record -p 11904,11905 --duration 10

    # Record thread 11904 and 11905.
    $ simpleperf record -t 11904,11905 --duration 10

    # Record a child process running `ls`.
    $ simpleperf record ls

**Set the frequency to record**
We can set the frequency to dump records via the -f or -c options. For example,
-f 4000 means dumping approximately 4000 records every second when the monitored
thread runs. If a monitored thread runs 0.2s in one second (it can be preempted
or blocked in other times), simpleperf dumps about 4000 * 0.2 / 1.0 = 800
records every second. Another way is using -c option. For example, -c 10000
means dumping one record whenever 10000 events happen. Below are examples.

    # Record with sample frequency 1000: sample 1000 times every second running.
    $ simpleperf record -f 1000 -p 11904,11905 --duration 10

    # Record with sample period 100000: sample 1 time every 100000 events.
    $ simpleperf record -c 100000 -t 11904,11905 --duration 10
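
The arithmetic behind the two options can be sketched as follows. The function names are illustrative, and the event total used for the -c case is a made-up figure, not a measurement.

```python
def samples_with_frequency(freq_hz, running_seconds):
    """-f: about freq_hz samples for each second the thread actually runs."""
    return freq_hz * running_seconds

def samples_with_period(total_events, period):
    """-c: one sample each time `period` events have happened."""
    return total_events // period

# A thread that runs 0.2s out of every second, sampled with -f 4000.
print(samples_with_frequency(4000, 0.2))

# Hypothetical total of 50,000,000 events, sampled with -c 100000.
print(samples_with_period(50_000_000, 100_000))
```

With -f the number of records depends on run time, while with -c it depends on how many events the workload generates.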

**Decide how long to monitor**
The way to decide how long to monitor in record command is similar to that in
stat command. Below are examples.

    # Record process 11904 for 10 seconds.
    $ simpleperf record -p 11904 --duration 10

    # Record until the child process running `ls` finishes.
    $ simpleperf record ls

    # Stop monitoring using Ctrl-C.
    $ simpleperf record -p 11904 --duration 10
    ^C

**Set the path to store records**
By default, simpleperf stores records in perf.data in current directory. We can
use -o option to set the path to store records. Below is an example.

    # Write records to data/perf2.data.
    $ simpleperf record -p 11904 -o data/perf2.data --duration 10


#### Simpleperf report

simpleperf report is used to report based on perf.data generated by simpleperf
record command. Report command groups records into different sample entries,
sorts sample entries based on how many events each sample entry contains, and
prints out each sample entry. By passing options, we can select where to find
perf.data and executable binaries used by the monitored program, filter out
uninteresting records, and decide how to group records.

Below is an example. Records are grouped into 4 sample entries, and each entry is
a row. There are several columns, and each column shows a piece of information
belonging to a sample entry. The first column is Overhead, which shows the
percentage of events inside current sample entry in total events. As the
perf event is cpu-cycles, the overhead can be seen as the percentage of cpu
time used in each function.

    # Reports perf.data, using only records sampled in libsudo-game-jni.so,
    # grouping records using thread name(comm), process id(pid), thread id(tid),
    # function name(symbol), and showing sample count for each row.
    $ simpleperf report --dsos /data/app/com.example.sudogame-2/lib/arm64/libsudo-game-jni.so --sort comm,pid,tid,symbol -n
    Cmdline: /data/data/com.example.sudogame/simpleperf record -p 7394 --duration 10
    Arch: arm64
    Event: cpu-cycles (type 0, config 0)
    Samples: 28235
    Event count: 546356211

    Overhead  Sample  Command    Pid   Tid   Symbol
    59.25%    16680   sudogame  7394  7394  checkValid(Board const&, int, int)
    20.42%    5620    sudogame  7394  7394  canFindSolution_r(Board&, int, int)
    13.82%    4088    sudogame  7394  7394  randomBlock_r(Board&, int, int, int, int, int)
    6.24%     1756    sudogame  7394  7394  @plt

**Set the path to read records**
By default, simpleperf reads perf.data in current directory. We can use -i
option to select another file to read records.

    $ simpleperf report -i data/perf2.data

**Set the path to find executable binaries**
If reporting function symbols, simpleperf needs to read executable binaries
used by the monitored processes to get symbol table and debug information. By
default, the paths are the executable binaries used by monitored processes while
recording. However, these binaries may not exist when reporting, or may not contain
symbol table and debug information. So we can use --symfs to redirect the paths.
Below is an example.

    $ simpleperf report
    # In this case, when simpleperf wants to read executable binary /A/b,
    # it reads file in /A/b.

    $ simpleperf report --symfs /debug_dir
    # In this case, when simpleperf wants to read executable binary /A/b,
    # it prefers file in /debug_dir/A/b to file in /A/b.
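
The lookup order described above can be sketched in a few lines. This is an illustration of the rule as stated here, not simpleperf's actual implementation, and `resolve_binary` is a hypothetical helper.

```python
import os

def resolve_binary(recorded_path, symfs_dir=None):
    """Return the file to read symbols from for a binary recorded at recorded_path.

    With symfs_dir set, the copy under symfs_dir is preferred when it exists;
    otherwise the recorded path itself is used.
    """
    if symfs_dir:
        candidate = os.path.join(symfs_dir, recorded_path.lstrip("/"))
        if os.path.isfile(candidate):
            return candidate
    return recorded_path

# Without --symfs, the recorded path is used as-is.
print(resolve_binary("/A/b"))
```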

**Filter records**
When reporting, not all records may be of interest. Simpleperf
supports five filters to select records of interest. Below are examples.

    # Report records in threads having name sudogame.
    $ simpleperf report --comms sudogame

    # Report records in process 7394 or 7395
    $ simpleperf report --pids 7394,7395

    # Report records in thread 7394 or 7395.
    $ simpleperf report --tids 7394,7395

    # Report records in libsudo-game-jni.so.
    $ simpleperf report --dsos /data/app/com.example.sudogame-2/lib/arm64/libsudo-game-jni.so

    # Report records in function checkValid or canFindSolution_r.
    $ simpleperf report --symbols "checkValid(Board const&, int, int);canFindSolution_r(Board&, int, int)"

**Decide how to group records into sample entries**
Simpleperf uses --sort option to decide how to group sample entries. Below are
examples.

    # Group records based on their process id: records having the same process
    # id are in the same sample entry.
    $ simpleperf report --sort pid

    # Group records based on their thread id and thread comm: records having
    # the same thread id and thread name are in the same sample entry.
    $ simpleperf report --sort tid,comm

    # Group records based on their binary and function: records in the same
    # binary and function are in the same sample entry.
    $ simpleperf report --sort dso,symbol

    # Default option: --sort comm,pid,tid,dso,symbol. Group records in the same
    # thread, and belong to the same function in the same binary.
    $ simpleperf report
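
To make the grouping concrete, here is a small sketch of what --sort does conceptually: records sharing the same values for the sort keys are merged into one sample entry, and entries are ordered by event count. The record dictionaries and `group_records` are hypothetical; real records come from perf.data.

```python
from collections import defaultdict

def group_records(records, sort_keys):
    """Group records by sort_keys; return (overhead %, key, event count) rows."""
    totals = defaultdict(int)
    for rec in records:
        key = tuple(rec[k] for k in sort_keys)
        totals[key] += rec["event_count"]
    total_events = sum(totals.values())
    rows = [(100.0 * count / total_events, key, count)
            for key, count in totals.items()]
    # Largest entries first, as in the report output.
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows

# Hypothetical records for illustration only.
records = [
    {"pid": 7394, "tid": 7394, "symbol": "checkValid", "event_count": 600},
    {"pid": 7394, "tid": 7394, "symbol": "checkValid", "event_count": 200},
    {"pid": 7394, "tid": 7400, "symbol": "randomBlock_r", "event_count": 200},
]
for overhead, key, count in group_records(records, ["pid", "symbol"]):
    print(f"{overhead:6.2f}%  {count:6d}  {key}")
```

Using fewer sort keys merges more records into each entry; using more keys splits them further.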


## Android application profiling

This section shows how to profile an Android application.
[Here](https://android.googlesource.com/platform/system/extras/+/master/simpleperf/demo/README.md) are examples. And we use
[SimpleperfExamplePureJava](https://android.googlesource.com/platform/system/extras/+/master/simpleperf/demo/SimpleperfExamplePureJava) project to show the profiling results.

Simpleperf only supports profiling native instructions in binaries in ELF
format. If the Java code is executed by interpreter, or with jit cache, it
can’t be profiled by simpleperf. As Android supports Ahead-of-time compilation,
it can compile Java bytecode into native instructions with debug information.
On devices with Android version <= M, we need root privilege to compile Java
bytecode with debug information. However, on devices with Android version >= N,
we don't need root privilege to do so.

Profiling an Android application involves three steps:
1. Prepare the application.
2. Record profiling data.
3. Report profiling data.

To profile, we can use either command lines or python scripts. Below shows both.


### Prepare an Android application

Before profiling, we need to install the application to be profiled on an Android device.
To get valid profiling results, please check following points:

**1. The application should be debuggable.**
It means [android:debuggable](https://developer.android.com/guide/topics/manifest/application-element.html#debug)
should be true. So we need to use debug [build type](https://developer.android.com/studio/build/build-variants.html#build-types)
instead of release build type. It is understandable because we can't profile others' apps.
However, on a rooted Android device, the application doesn't need to be debuggable.

**2. Run on an Android device >= L.**
Profiling on emulators is not yet supported. And to profile Java code, we need
the jvm running in oat mode, which is only available >= L.

**3. On Android O, add `wrap.sh` in the apk.**
To profile Java code, we need the jvm running in oat mode. But on Android O,
debuggable applications are forced to run in jit mode. To work around this,
we need to add a `wrap.sh` in the apk. So if you are running on an Android O device,
check [here](https://android.googlesource.com/platform/system/extras/+/master/simpleperf/demo/SimpleperfExamplePureJava/app/profiling.gradle)
for how to add `wrap.sh` in the apk.

**4. Make sure C++ code is compiled with optimizing flags.**
If the application contains C++ code, it can be compiled with -O0 flag in debug build type.
This makes C++ code slow. Check [here](https://android.googlesource.com/platform/system/extras/+/master/simpleperf/demo/SimpleperfExamplePureJava/app/profiling.gradle)
for how to avoid that.

**5. Use native libraries with debug info in the apk when possible.**
If the application contains C++ code or pre-compiled native libraries, try to use
unstripped libraries in the apk. This helps simpleperf generate better profiling
results. Check [here](https://android.googlesource.com/platform/system/extras/+/master/simpleperf/demo/SimpleperfExamplePureJava/app/profiling.gradle)
for how to use unstripped libraries.

Here we use [SimpleperfExamplePureJava](https://android.googlesource.com/platform/system/extras/+/master/simpleperf/demo/SimpleperfExamplePureJava) as an example.
It builds an app-profiling.apk for profiling.

    $ git clone https://android.googlesource.com/platform/system/extras
    $ cd extras/simpleperf/demo
    # Open SimpleperfExamplePureJava project with Android Studio,
    # and build this project successfully, otherwise the `./gradlew` command below will fail.
    $ cd SimpleperfExamplePureJava

    # On windows, use "gradlew" instead.
    $ ./gradlew clean assemble
    $ adb install -r app/build/outputs/apk/app-profiling.apk


### Record and report profiling data (using command-lines)

We recommend using python scripts for profiling because they are more convenient.
But using command lines gives us a better understanding of the profiling process
step by step. So we first show how to use command lines.

**1. Enable profiling**

    $ adb shell setprop security.perf_harden 0

**2. Fully compile the app**

We need to compile Java bytecode into native instructions to profile Java code
in the application. This needs different commands on different Android versions.

On Android >= N:

    $ adb shell setprop debug.generate-debug-info true
    $ adb shell cmd package compile -f -m speed com.example.simpleperf.simpleperfexamplepurejava
    # Restart the app to take effect
    $ adb shell am force-stop com.example.simpleperf.simpleperfexamplepurejava

On Android M devices, we need root privilege to force Android to fully compile
Java code into native instructions in ELF binaries with debug information. We
also need root privilege to read compiled native binaries (because installd
writes them to a directory whose uid/gid is system:install). So profiling Java
code can only be done on rooted devices.

    $ adb root
    $ adb shell setprop dalvik.vm.dex2oat-flags -g

    # Reinstall the app.
    $ adb install -r app/build/outputs/apk/app-profiling.apk

On Android L devices, we also need root privilege to compile the app with debug info
and access the native binaries.

    $ adb root
    $ adb shell setprop dalvik.vm.dex2oat-flags --include-debug-symbols

    # Reinstall the app.
    $ adb install -r app/build/outputs/apk/app-profiling.apk
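
The version-specific commands above can be collected in one helper. This is only a sketch: `compile_commands` is a hypothetical function, and it assumes the usual API level mapping (L is 21 and 22, M is 23, N starts at 24). It only builds the command lists shown in this section and does not run them.

```python
def compile_commands(api_level, package, apk_path):
    """Return the adb commands from this section for a given Android API level."""
    if api_level >= 24:  # Android N and above: no root needed
        return [
            ["adb", "shell", "setprop", "debug.generate-debug-info", "true"],
            ["adb", "shell", "cmd", "package", "compile", "-f", "-m", "speed", package],
            ["adb", "shell", "am", "force-stop", package],
        ]
    if api_level == 23:  # Android M
        flags = "-g"
    elif api_level in (21, 22):  # Android L
        flags = "--include-debug-symbols"
    else:
        raise ValueError("simpleperf needs Android L (API 21) or above")
    # L and M need root, then a reinstall so the app is recompiled.
    return [
        ["adb", "root"],
        ["adb", "shell", "setprop", "dalvik.vm.dex2oat-flags", flags],
        ["adb", "install", "-r", apk_path],
    ]

for cmd in compile_commands(
        26,
        "com.example.simpleperf.simpleperfexamplepurejava",
        "app/build/outputs/apk/app-profiling.apk"):
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.check_call` on a host where `adb` is on the PATH.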


**3. Find the app process**

    # Start the app if needed
    $ adb shell am start -n com.example.simpleperf.simpleperfexamplepurejava/.MainActivity

    # Run `ps` in the app's context. On Android >= O devices, run `ps -e` instead.
    $ adb shell run-as com.example.simpleperf.simpleperfexamplepurejava ps | grep simpleperf
    u0_a151   6885  3346  1590504 53980 SyS_epoll_ 6fc2024b6c S com.example.simpleperf.simpleperfexamplepurejava

So the id of the app process is `6885`. We will use this number in the command lines below,
please replace this number with what you get by running `ps` command.
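
When scripting this step, the process id can be pulled out of the `ps` output. The sketch below assumes the column layout of the sample row above (PID in the second column, process name in the last); `ps` output differs between Android versions, so treat `find_pids` as a hypothetical starting point.

```python
def find_pids(ps_output, process_name):
    """Return PIDs of rows whose last column equals process_name.

    Assumes the PID is the second column, as in the sample row above.
    """
    pids = []
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[-1] == process_name and fields[1].isdigit():
            pids.append(int(fields[1]))
    return pids

sample = ("u0_a151   6885  3346  1590504 53980 SyS_epoll_ 6fc2024b6c S "
          "com.example.simpleperf.simpleperfexamplepurejava")
print(find_pids(sample, "com.example.simpleperf.simpleperfexamplepurejava"))
```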

**4. Download simpleperf to the app's data directory**

    # Find which architecture the app is using.
    $ adb shell run-as com.example.simpleperf.simpleperfexamplepurejava cat /proc/6885/maps | grep boot.oat
    708e6000-70e33000 r--p 00000000 103:09 1214                              /system/framework/arm64/boot.oat

    # The app uses /arm64/boot.oat, so push simpleperf in bin/android/arm64/ to device.
    $ cd ../../scripts/
    $ adb push bin/android/arm64/simpleperf /data/local/tmp
    $ adb shell chmod a+x /data/local/tmp/simpleperf
    $ adb shell run-as com.example.simpleperf.simpleperfexamplepurejava cp /data/local/tmp/simpleperf .
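
Picking the matching simpleperf binary can also be automated. As a sketch, the architecture directory can be read from the `boot.oat` path in the maps output; `arch_from_maps` is a hypothetical helper, and it assumes the path has the `/system/framework/<arch>/boot.oat` form shown above.

```python
import re

def arch_from_maps(maps_text):
    """Return the architecture directory of boot.oat found in maps text, or None."""
    m = re.search(r"/system/framework/([^/\s]+)/boot\.oat", maps_text)
    return m.group(1) if m else None

line = ("708e6000-70e33000 r--p 00000000 103:09 1214"
        "                              /system/framework/arm64/boot.oat")
arch = arch_from_maps(line)
print(f"bin/android/{arch}/simpleperf")
```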


**5. Record perf.data**

    $ adb shell run-as com.example.simpleperf.simpleperfexamplepurejava ./simpleperf record -p 6885 --duration 10
    simpleperf I 04-27 20:41:11  6940  6940 cmd_record.cpp:357] Samples recorded: 40008. Samples lost: 0.

    $ adb shell run-as com.example.simpleperf.simpleperfexamplepurejava ls -lh perf.data

The profiling data is recorded at perf.data.

Normally we need to use the app when profiling, otherwise we may record no samples.
But in this case, the MainActivity starts a busy thread. So we don't need to use
the app while profiling.

There are many options to record profiling data, check [record command](#simpleperf-record) for details.

**6. Report perf.data**

    # Pull perf.data on host.
    $ adb shell "run-as com.example.simpleperf.simpleperfexamplepurejava cat perf.data | tee /data/local/tmp/perf.data >/dev/null"
    $ adb pull /data/local/tmp/perf.data

    # Report samples using corresponding simpleperf executable on host.
    # On windows, use "bin\windows\x86_64\simpleperf" instead.
    $ bin/linux/x86_64/simpleperf report
    ...
    Overhead  Command   Pid   Tid   Shared Object                                                                     Symbol
    83.54%    Thread-2  6885  6900  /data/app/com.example.simpleperf.simpleperfexamplepurejava-2/oat/arm64/base.odex  void com.example.simpleperf.simpleperfexamplepurejava.MainActivity$1.run()
    16.11%    Thread-2  6885  6900  /data/app/com.example.simpleperf.simpleperfexamplepurejava-2/oat/arm64/base.odex  int com.example.simpleperf.simpleperfexamplepurejava.MainActivity$1.callFunction(int)

See [here](#the-correct-way-to-pull-perfdata-on-host) for why we use `tee` rather than just `>`.
There are many ways to show reports. See [report command](#simpleperf-report) for details.
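
If the report needs further processing on host, the text rows can be parsed by a script. Below is a minimal sketch, assuming the default column layout shown above (columns separated by two or more spaces). The function name `parse_report_row` is made up for illustration, and the layout may differ when other report options are used.

```python
import re

def parse_report_row(line):
    """Split one row of the report table shown above into named fields."""
    # Columns are separated by two or more spaces; the symbol itself
    # may contain single spaces, so split at most five times.
    fields = re.split(r"\s{2,}", line.strip(), maxsplit=5)
    overhead, command, pid, tid, shared_object, symbol = fields
    return {
        "overhead": float(overhead.rstrip("%")),
        "command": command,
        "pid": int(pid),
        "tid": int(tid),
        "shared_object": shared_object,
        "symbol": symbol,
    }

row = parse_report_row(
    "83.54%    Thread-2  6885  6900  "
    "/data/app/com.example.simpleperf.simpleperfexamplepurejava-2/oat/arm64/base.odex  "
    "void com.example.simpleperf.simpleperfexamplepurejava.MainActivity$1.run()"
)
print(row["overhead"], row["symbol"])
```

Rows that don't have six columns (for example the header lines printed before the table) raise `ValueError` in this sketch, so a real script would need to skip them.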


### Record and report profiling data (using python scripts)

Besides command lines, we can use `app_profiler.py` to profile Android applications.
It pushes simpleperf to the device, records perf.data, and collects profiling
results and native binaries on host. It is configured by `app_profiler.config`.

**1. Fill `app_profiler.config`**

    Change `app_package_name` line to app_package_name = "com.example.simpleperf.simpleperfexamplepurejava"
    Change `apk_file_path` line to apk_file_path = "../SimpleperfExamplePureJava/app/build/outputs/apk/app-profiling.apk"
    Change `android_studio_project_dir` line to android_studio_project_dir = "../SimpleperfExamplePureJava/"
    Change `record_options` line to record_options = "--duration 10"

`apk_file_path` is needed to fully compile the application on Android L/M. It is
not necessary on Android >= N.

`android_studio_project_dir` is used to search for native libraries in the
application. It is not necessary for profiling.

`record_options` can be set to any option accepted by simpleperf record command.

**2. Run `app_profiler.py`**

    $ python app_profiler.py


If it runs successfully, it collects profiling data in perf.data in the current
directory, and related native binaries in binary_cache/.

**3. Report perf.data**

We can use `report.py` to report perf.data.

    $ python report.py

We can pass any option accepted by the `simpleperf report` command to `report.py`.


### Record and report call graph

A call graph is a tree showing function call relations. Below is an example.

    main() {
        FunctionOne();
        FunctionTwo();
    }
    FunctionOne() {
        FunctionTwo();
        FunctionThree();
    }
    callgraph:
        main-> FunctionOne
           |    |
           |    |-> FunctionTwo
           |    |-> FunctionThree
           |
           |-> FunctionTwo
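
The same tree can be produced by a few lines of Python. This is only an illustrative sketch: the `CALLS` table is written by hand from the example above, and `render_call_graph` is a made-up helper that assumes there is no recursion between functions.

```python
# Call relations from the example above: caller -> callees, in call order.
CALLS = {
    "main": ["FunctionOne", "FunctionTwo"],
    "FunctionOne": ["FunctionTwo", "FunctionThree"],
}

def render_call_graph(func, calls, depth=0):
    """Return the call tree rooted at func as a list of indented lines."""
    lines = ["    " * depth + func]
    for callee in calls.get(func, []):
        lines.extend(render_call_graph(callee, calls, depth + 1))
    return lines

tree = render_call_graph("main", CALLS)
print("\n".join(tree))
```

Note that `FunctionTwo` appears twice in the tree, once under `FunctionOne` and once directly under `main`, because a call graph keeps each caller's context separate.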


#### Record dwarf based call graph

When using command lines, add `-g` option like below:

    $ adb shell run-as com.example.simpleperf.simpleperfexamplepurejava ./simpleperf record -g -p 6885 --duration 10

When using python scripts, change `app_profiler.config` as below:

    Change `record_options` line to record_options = "--duration 10 -g"

Recording a dwarf based call graph needs debug information
in native binaries. So if the application uses native libraries,
it is better to include non-stripped native libraries in the apk.


#### Record stack frame based call graph

When using command lines, add `--call-graph fp` option like below:

    $ adb shell run-as com.example.simpleperf.simpleperfexamplepurejava ./simpleperf record --call-graph fp -p 6885 --duration 10

When using python scripts, change `app_profiler.config` as below:

    Change `record_options` line to record_options = "--duration 10 --call-graph fp"

Recording a stack frame based call graph relies on the frame pointer
register. Notice that on the arm architecture, frame pointer based unwinding
is not well supported, even if the code is compiled with `-O0 -g -fno-omit-frame-pointer`.
This is because the kernel can't unwind a user stack containing both
arm and thumb code. **So please consider using dwarf based call graph on arm
architecture, or profiling in arm64 environment.**
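
The choice between the two call graph methods can be scripted. Below is a minimal sketch that builds the `simpleperf record` arguments shown above. The helper names `build_record_args` and `choose_call_graph` are made up for illustration, and picking the method from the ABI name is an assumption based on the advice in this section, not something the scripts do themselves.

```python
def build_record_args(pid, duration_s, call_graph=None):
    """Build the argument list for `simpleperf record`.

    call_graph: None (no call graph), "dwarf" (-g) or "fp" (--call-graph fp).
    """
    args = ["./simpleperf", "record"]
    if call_graph == "dwarf":
        args.append("-g")
    elif call_graph == "fp":
        args.extend(["--call-graph", "fp"])
    elif call_graph is not None:
        raise ValueError("unknown call graph method: %r" % call_graph)
    args.extend(["-p", str(pid), "--duration", str(duration_s)])
    return args

def choose_call_graph(abi):
    """Follow the advice above: avoid frame pointer unwinding on 32-bit arm."""
    return "dwarf" if abi.startswith("armeabi") else "fp"

print(" ".join(build_record_args(6885, 10, choose_call_graph("arm64-v8a"))))
```

The resulting list can be appended to `adb shell run-as <package>` when launching the command from a host script.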


#### Report call graph

To report call graph using command lines, add `-g` option.

    $ bin/linux/x86_64/simpleperf report -g
    ...
    Children  Self    Command          Pid    Tid    Shared Object                                                                     Symbol
    99.97%    0.00%   Thread-2         10859  10876  /system/framework/arm64/boot.oat                                                  java.lang.Thread.run
       |
       -- java.lang.Thread.run
          |
           -- void com.example.simpleperf.simpleperfexamplepurejava.MainActivity$1.run()
               |--83.66%-- [hit in function]
               |
               |--16.22%-- int com.example.simpleperf.simpleperfexamplepurejava.MainActivity$1.callFunction(int)
               |    |--99.97%-- [hit in function]

To report call graph using python scripts, add `-g` option.

    $ python report.py -g
    # Double-click an item starting with '+' to show its callgraph.
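
To make the `Children` and `Self` columns concrete, here is a small sketch that computes both from a list of sampled call stacks. It reflects our reading of the columns (`Self` counts samples hit in the function itself, `Children` also counts samples hit in its callees). The sample data and the function `children_and_self` are made up, and this is not how simpleperf implements it.

```python
from collections import Counter

def children_and_self(stacks):
    """stacks: call stacks, each ordered from root to the sampled function.

    Returns {function: (children_percent, self_percent)}.
    """
    total = len(stacks)
    self_hits = Counter(stack[-1] for stack in stacks)
    # Count a function once per sample even if it recurses within that stack.
    children_hits = Counter(func for stack in stacks for func in set(stack))
    return {
        func: (100.0 * children_hits[func] / total, 100.0 * self_hits[func] / total)
        for func in children_hits
    }

# Made-up samples: 5 hit run() itself, 1 hit callFunction() called from run().
samples = [["Thread.run", "MainActivity$1.run"]] * 5 + [
    ["Thread.run", "MainActivity$1.run", "MainActivity$1.callFunction"]
]
for func, (children, self_) in sorted(children_and_self(samples).items()):
    print("%-28s children=%6.2f%% self=%6.2f%%" % (func, children, self_))
```

As in the report above, the root function gets a `Children` value close to 100% and a `Self` value close to 0%, because nearly all samples pass through it but few are hit in it.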

### Visualize profiling data

`simpleperf_report_lib.py` provides an interface for reading samples from perf.data.
Using it, you can write python scripts to read perf.data or convert perf.data
to other formats. Below are two examples.


#### Show flamegraph

    $ python report_sample.py >out.perf
    $ stackcollapse-perf.pl out.perf >out.folded
    $ ./flamegraph.pl out.folded >a.svg
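
The intermediate `out.folded` file uses the folded stack format: one line per unique call stack, with frames joined by `;` and followed by the number of samples. The sketch below shows the idea on made-up stacks. `fold_stacks` is a hypothetical helper, not part of the scripts used above.

```python
from collections import Counter

def fold_stacks(stacks):
    """Collapse call stacks (root first) into 'frame;frame;... count' lines."""
    counts = Counter(";".join(stack) for stack in stacks)
    return ["%s %d" % (folded, n) for folded, n in sorted(counts.items())]

# Made-up samples based on the call graph example earlier in this document.
stacks = [
    ["main", "FunctionOne", "FunctionTwo"],
    ["main", "FunctionOne", "FunctionTwo"],
    ["main", "FunctionTwo"],
]
folded = fold_stacks(stacks)
print("\n".join(folded))
```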


#### Visualize using pprof

pprof is a tool for visualization and analysis of profiling data. It is
available at https://github.com/google/pprof. `pprof_proto_generator.py` can
generate profiling data in a format accepted by pprof.

    $ python pprof_proto_generator.py
    $ pprof -pdf pprof.profile


### Annotate source code

`annotate.py` reads perf.data, binaries in `binary_cache` (collected by `app_profiler.py`)
and source code, and generates annotated source code in `annotated_files/`.

**1. Run annotate.py**

    $ python annotate.py -s ../SimpleperfExamplePureJava

`addr2line` is needed to annotate source code. It can be found in the Android NDK
release, in paths like toolchains/aarch64-linux-android-4.9/prebuilt/linux-x86_64/bin/aarch64-linux-android-addr2line.
Please use the `--addr2line` option to set the path of `addr2line` if annotate.py
can't find it.

**2. Read annotated code**

The annotated source code is located at `annotated_files/`.
`annotated_files/summary` shows how each source file is annotated.

One annotated source file is `annotated_files/java/com/example/simpleperf/simpleperfexamplepurejava/MainActivity.java`.
Its content is similar to the following:

    // [file] shows how much time is spent in current file.
    /* [file] acc_p: 99.966552%, p: 99.837438% */package com.example.simpleperf.simpleperfexamplepurejava;
    ...
    // [func] shows how much time is spent in current function.
    /* [func] acc_p: 16.213395%, p: 16.209250% */            private int callFunction(int a) {
    ...
    // This shows how much time is spent in current line.
    // acc_p field means how much time is spent in current line and functions called by current line.
    // p field means how much time is spent just in current line.
    /* acc_p: 99.966552%, p: 83.628188%        */                    i = callFunction(i);
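
The annotation comments are regular enough to be read back by a script, for example to list the hottest lines. Below is a minimal sketch, assuming the comment layout shown above. The regular expression and `parse_annotation` are our own illustration and are not part of `annotate.py`.

```python
import re

ANNOTATION = re.compile(
    r"/\*\s*(?:\[(?P<kind>file|func)\]\s*)?"
    r"acc_p:\s*(?P<acc_p>[\d.]+)%,\s*p:\s*(?P<p>[\d.]+)%\s*\*/"
)

def parse_annotation(line):
    """Return (kind, acc_p, p) for an annotated line, or None if not annotated."""
    match = ANNOTATION.search(line)
    if match is None:
        return None
    kind = match.group("kind") or "line"
    return (kind, float(match.group("acc_p")), float(match.group("p")))

line = "/* acc_p: 99.966552%, p: 83.628188%        */                    i = callFunction(i);"
kind, acc_p, p = parse_annotation(line)
# Time spent in functions called by this line is the accumulated
# share minus the share spent in the line itself.
print(kind, round(acc_p - p, 6))
```

For the line above, the difference between `acc_p` and `p` is the share of time spent inside `callFunction`.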


## Answers to common issues

### The correct way to pull perf.data on host
As perf.data is generated in the app's context, it can't be pulled directly to host.
One way is to run `adb shell run-as xxx cat perf.data >perf.data`. However, this
doesn't work well on Windows, because the content can be modified when it goes
through the pipe. So we first copy it from the app's context to the shell's context,
then pull it to host. The commands are as below:

    $ adb shell "run-as xxx cat perf.data | tee /data/local/tmp/perf.data >/dev/null"
    $ adb pull /data/local/tmp/perf.data
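
To see why a modified pipe is fatal for a binary file, consider the sketch below. It assumes the modification is a text-mode newline translation (every LF byte rewritten as CR LF), which is a common cause of this kind of corruption. The text above only says the content can be modified, so treat the exact mechanism as an assumption. The payload bytes are made up.

```python
def text_mode_translate(data):
    """Mimic a text-mode pipe that rewrites every LF byte as CR LF."""
    return data.replace(b"\n", b"\r\n")

# Made-up binary payload that happens to contain the byte 0x0A (LF) twice.
original = bytes([0x50, 0x45, 0x0A, 0x00, 0x0A, 0xFF])
received = text_mode_translate(original)

print(len(original), len(received))
print(received == original)
```

Every inserted byte shifts the data that follows it, so offsets stored inside the file no longer point at the right place.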

## Inferno

![logo](./inferno/inferno_small.png)

### Description

Inferno is a flamegraph generator for native (C/C++) Android apps. It was
originally written to profile and improve surfaceflinger performance
(Android compositor) but it can be used for any native Android application.
You can see a sample report generated with Inferno
[here](./inferno/report.html). Reports are self-contained in HTML so they can be
exchanged easily.

Notice there is no concept of time in a flame graph since all callstacks are
merged together. As a result, the width of a flamegraph represents 100% of
the number of samples and the height is related to the number of functions on
the stack when sampling occurred.
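
The merging step can be sketched as follows: callstacks are merged into a tree, each box gets a width proportional to its share of the samples, and its depth is its position in the stack. This is an illustrative sketch on made-up stacks. `build_flame_tree` and `frame_boxes` are hypothetical names and do not describe Inferno's actual implementation.

```python
def build_flame_tree(stacks):
    """Merge call stacks (root first) into a nested dict of sample counts."""
    root = {"name": "root", "samples": 0, "children": {}}
    for stack in stacks:
        node = root
        node["samples"] += 1
        for func in stack:
            node = node["children"].setdefault(
                func, {"name": func, "samples": 0, "children": {}}
            )
            node["samples"] += 1
    return root

def frame_boxes(node, total, depth=0):
    """Yield (depth, name, width_percent) for every box in the flame graph."""
    for child in node["children"].values():
        yield depth, child["name"], 100.0 * child["samples"] / total
        yield from frame_boxes(child, total, depth + 1)

# Made-up samples; the order in which they were taken does not matter.
stacks = [["main", "a", "b"], ["main", "a"], ["main", "c"], ["main", "a", "b"]]
tree = build_flame_tree(stacks)
for depth, name, width in frame_boxes(tree, tree["samples"]):
    print("%s%s %.0f%%" % ("  " * depth, name, width))
```

Shuffling `stacks` changes nothing in the widths, which is why a flame graph carries no notion of time.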


![flamegraph sample](./inferno/main_thread_flamegraph.png)

In the flamegraph featured above you can see the main thread of SurfaceFlinger.
It is immediately apparent that most of the CPU time is spent processing messages
in `android::SurfaceFlinger::onMessageReceived`. The most expensive task is asking
the screen to be refreshed, as `android::DisplayDevice::prepare` shows in orange.
This graphic division helps to see what part of the program is costly and
where a developer's effort to improve performance should go.

### Example of bottleneck

A flamegraph gives you an instant view of where CPU cycles are spent, but
it can also be used to find specific offenders. To find them, look for
plateaus. It is easier to see with an example:

![flamegraph sample](./inferno/bottleneck.png)

In the previous flamegraph, two plateaus
(due to `android::BufferQueueCore::validateConsistencyLocked`)
are immediately apparent.

### How it works
Inferno relies on simpleperf to record the callstack of a native application
thousands of times per second. Simpleperf takes care of unwinding the stack
either using frame pointer (recommended) or dwarf. At the end of the recording
`simpleperf` also symbolizes all instruction pointers (IPs) automatically. The
records are aggregated and dumped to a file `perf.data`. This file is pulled from
the Android device and processed on the host by Inferno. The callstacks are merged
together to visualize in which part of an app the CPU cycles are spent.

### How to use it

Open a terminal and, from the `simpleperf` directory, type:
```
./inferno.sh  (on Linux/Mac)
./inferno.bat (on Windows)
```

Inferno will collect data, process it and automatically open your web browser
to display the HTML report.

### Parameters

You can select how long to sample for, the color of the node and many other
things. Use `-h` to get a list of all supported parameters.

```
./inferno.sh -h
```

### Troubleshooting

#### Messy flame graph
A healthy flame graph features a single call site at its base
(see `inferno/report.html`).
If you don't see a unique call site like `_start` or `_start_thread` at the base
from which all flames originate, something went wrong: stack unwinding may
have failed to reach the root callsite. These incomplete
callstacks are impossible to merge properly. By default Inferno asks
`simpleperf` to unwind the stack via the kernel and frame pointers. Try
unwinding with dwarf instead (`-du`); you can further tune this setting.


#### No flames
If you see no flames at all or a mess of 1 level flames without a common base,
this may be because you compiled without frame pointers. Make sure there is no
`-fomit-frame-pointer` in your build config. Alternatively, ask simpleperf to
collect data with dwarf unwinding `-du`.



#### High percentage of lost samples

If simpleperf reports a lot of lost samples, it is probably because you are
unwinding with `dwarf`. Dwarf unwinding involves copying the stack before it is
processed. Try to use frame pointer unwinding, which can be done by the kernel
and is much faster.

The cost of frame pointers is negligible on arm64 but considerable
on 32-bit arm (due to register pressure). Use a 64-bit build for better
profiling.
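
To decide whether the number of lost samples is a problem, the summary line printed by `simpleperf record` can be turned into a percentage. The sketch below parses the log format shown earlier in this document. `lost_sample_percent` is a made-up helper, and it assumes that the recorded count does not already include the lost samples.

```python
import re

SUMMARY = re.compile(r"Samples recorded: (\d+)\. Samples lost: (\d+)\.")

def lost_sample_percent(log_line):
    """Return the percentage of samples lost, based on the record summary line."""
    match = SUMMARY.search(log_line)
    if match is None:
        raise ValueError("no sample summary found in: %r" % log_line)
    recorded, lost = int(match.group(1)), int(match.group(2))
    # Assumption: 'recorded' excludes lost samples, so the total is their sum.
    total = recorded + lost
    return 100.0 * lost / total if total else 0.0

line = ("simpleperf I 04-27 20:41:11  6940  6940 cmd_record.cpp:357] "
        "Samples recorded: 40008. Samples lost: 0.")
print(lost_sample_percent(line))
```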

#### run-as: package not debuggable
If you cannot run as root, make sure the app is debuggable; otherwise simpleperf
will not be able to profile it.