summaryrefslogtreecommitdiff
path: root/doc/collect_etm_data_for_autofdo.md
diff options
context:
space:
mode:
Diffstat (limited to 'doc/collect_etm_data_for_autofdo.md')
-rw-r--r--doc/collect_etm_data_for_autofdo.md121
1 files changed, 113 insertions, 8 deletions
diff --git a/doc/collect_etm_data_for_autofdo.md b/doc/collect_etm_data_for_autofdo.md
index f9f5a15..5313c4e 100644
--- a/doc/collect_etm_data_for_autofdo.md
+++ b/doc/collect_etm_data_for_autofdo.md
@@ -81,9 +81,18 @@ branch with count info for binary2
We need to split perf_inject.data, and make sure one file only contains info for one binary.
-Then we can use [AutoFDO](https://github.com/google/autofdo) to create profile like below:
+Then we can use [AutoFDO](https://github.com/google/autofdo) to create profile. AutoFDO only works
+for binaries having an executable segment as its first loadable segment. But binaries built in
+Android may not follow this rule. Simpleperf inject command knows how to work around this problem.
+But there is a check in AutoFDO forcing binaries to start with an executable segment. We need to
+disable the check in AutoFDO, by commenting out L127-L136 in
+https://github.com/google/autofdo/commit/188db2834ce74762ed17108ca344916994640708#diff-2d132ecbb5e4f13e0da65419f6d1759dd27d6b696786dd7096c0c34d499b1710R127-R136.
+Then we can use `create_llvm_prof` in AutoFDO to create profiles used by clang.
```sh
+# perf_inject_binary1.data is split from perf_inject.data, and only contains branch info for binary1.
+host $ autofdo/create_llvm_prof -profile perf_inject_binary1.data -profiler text -binary path_of_binary1 -out a.prof -format binary
+
# perf_inject_kernel.data is split from perf_inject.data, and only contains branch info for [kernel.kallsyms].
host $ autofdo/create_llvm_prof -profile perf_inject_kernel.data -profiler text -binary vmlinux -out a.prof -format binary
```
@@ -91,10 +100,81 @@ host $ autofdo/create_llvm_prof -profile perf_inject_kernel.data -profiler text
Then we can use a.prof for PGO during compilation, via `-fprofile-sample-use=a.prof`.
[Here](https://clang.llvm.org/docs/UsersManual.html#using-sampling-profilers) are more details.
+### A complete example: etm_test_loop.cpp
+
+`etm_test_loop.cpp` is an example to show the complete process.
+The source code is in [etm_test_loop.cpp](https://android.googlesource.com/platform/system/extras/+/master/simpleperf/runtest/etm_test_loop.cpp).
+The build script is in [Android.bp](https://android.googlesource.com/platform/system/extras/+/master/simpleperf/runtest/Android.bp).
+It builds an executable called `etm_test_loop`, which runs on device.
+
+Step 1: Build `etm_test_loop` binary.
+
+```sh
+(host) <AOSP>$ . build/envsetup.sh
+(host) <AOSP>$ lunch aosp_arm64-userdebug
+(host) <AOSP>$ make etm_test_loop
+```
+
+Step 2: Run `etm_test_loop` on device, and collect ETM data for its running.
+
+```sh
+(host) <AOSP>$ adb push out/target/product/generic_arm64/system/bin/etm_test_loop /data/local/tmp
+(host) <AOSP>$ adb root
+(host) <AOSP>$ adb shell
+(device) / # cd /data/local/tmp
+(device) /data/local/tmp # chmod a+x etm_test_loop
+(device) /data/local/tmp # simpleperf record -e cs-etm:u ./etm_test_loop
+simpleperf I cmd_record.cpp:729] Recorded for 0.0370068 seconds. Start post processing.
+simpleperf I cmd_record.cpp:799] Aux data traced: 1689136
+(device) /data/local/tmp # simpleperf inject -i perf.data --output branch-list -o branch_list.data
+simpleperf W dso.cpp:557] failed to read min virtual address of [vdso]: File not found
+(device) /data/local/tmp # exit
+(host) <AOSP>$ adb pull /data/local/tmp/branch_list.data
+```
+
+Step 3: Convert ETM data to AutoFDO data.
+
+```sh
+# Build simpleperf tool on host.
+(host) <AOSP>$ make simpleperf_ndk
+(host) <AOSP>$ simpleperf_ndk64 inject -i branch_list.data -o perf_inject_etm_test_loop.data --symdir out/target/product/generic_arm64/symbols/system/bin
+simpleperf W cmd_inject.cpp:505] failed to build instr ranges for binary [vdso]: File not found
+(host) <AOSP>$ cat perf_inject_etm_test_loop.data
+13
+1000-1010:1
+1014-1050:1
+...
+112c->0:1
+// /data/local/tmp/etm_test_loop
+
+(host) <AOSP>$ create_llvm_prof -profile perf_inject_etm_test_loop.data -profiler text -binary out/target/product/generic_arm64/symbols/system/bin/etm_test_loop -out etm_test_loop.afdo -format binary
+(host) <AOSP>$ ls -lh etm_test_loop.afdo
+rw-r--r-- 1 user group 241 Aug 29 16:04 etm_test_loop.afdo
+```
+
+Step 4: Use AutoFDO data to build optimized binary.
+
+```sh
+(host) <AOSP>$ mkdir toolchain/pgo-profiles/sampling/
+(host) <AOSP>$ cp etm_test_loop.afdo toolchain/pgo-profiles/sampling/
+(host) <AOSP>$ vi system/extras/simpleperf/runtest/Android.bp
+# edit Android.bp to enable afdo for etm_test_loop.
+# cc_binary {
+# name: "etm_test_loop",
+# srcs: ["etm_test_loop.cpp"],
+# afdo: true,
+# }
+(host) <AOSP>$ make etm_test_loop
+```
+
+If comparing the disassembly of `out/target/product/generic_arm64/symbols/system/bin/etm_test_loop`
+before and after optimizing with AutoFDO data, we can see different preferences when branching.
+
+
## Collect ETM data with a daemon
Android also has a daemon collecting ETM data periodically. It only runs on userdebug and eng
-devices. The source code is in `<aosp-top>/system/extras/profcollectd`.
+devices. The source code is in https://android.googlesource.com/platform/system/extras/+/master/profcollectd/.
## Support ETM in the kernel
@@ -107,21 +187,46 @@ The Coresight driver can be enabled by below kernel configs:
CONFIG_CORESIGHT=y
CONFIG_CORESIGHT_LINK_AND_SINK_TMC=y
CONFIG_CORESIGHT_SOURCE_ETM4X=y
- CONFIG_CORESIGHT_DYNAMIC_REPLICATOR=y
```
-On Kernel 5.10+, we can build Coresight driver as kernel modules instead.
+On Kernel 5.10+, we recommend building Coresight driver as kernel modules. Because it works with
+GKI kernel.
+
+```config
+ CONFIG_CORESIGHT=m
+ CONFIG_CORESIGHT_LINK_AND_SINK_TMC=m
+ CONFIG_CORESIGHT_SOURCE_ETM4X=m
+```
-Android common kernel 5.10+ should have all the Coresight patches needed. And we have backported
-necessary Coresight patches to Android common kernel 4.14 and 4.19. Android common kernel 5.4
-misses a few patches. Please create an [ndk issue](https://github.com/android/ndk/issues) if you
-need ETM function on 5.4 kernel.
+Android common kernel 5.10+ should have all the Coresight patches needed to collect ETM data.
+Android common kernel 5.4 misses two patches. But by adding patches in
+https://android-review.googlesource.com/q/topic:test_etm_on_hikey960_5.4, we can collect ETM data
+on hikey960 with 5.4 kernel.
+For Android common kernel 4.14 and 4.19, we have backported all necessary Coresight patches.
Besides Coresight driver, we also need to add Coresight devices in device tree. An example is in
https://github.com/torvalds/linux/blob/master/arch/arm64/boot/dts/arm/juno-base.dtsi. There should
be a path flowing ETM data from ETM device through funnels, ETF and replicators, all the way to
ETR, which writes ETM data to system memory.
+One optional flag in ETM device tree is "arm,coresight-loses-context-with-cpu". It saves ETM
+registers when a CPU enters low power state. It may be needed to avoid
+"coresight_disclaim_device_unlocked" warning when doing system wide collection.
+
+One optional flag in ETR device tree is "arm,scatter-gather". Simpleperf requests 4M system memory
+for ETR to store ETM data. Without IOMMU, the memory needs to be contiguous. If the kernel can't
+fulfill the request, simpleperf will report out of memory error. Fortunately, we can use
+"arm,scatter-gather" flag to let ETR run in scatter gather mode, which uses non-contiguous memory.
+
+
+### A possible problem: trace_id mismatch
+
+Each CPU has an ETM device, which has a unique trace_id assigned from the kernel.
+The formula is: `trace_id = 0x10 + cpu * 2`, as in https://github.com/torvalds/linux/blob/master/include/linux/coresight-pmu.h#L37.
+If the formula is modified by local patches, then simpleperf inject command can't parse ETM data
+properly and is likely to give empty output.
+
+
## Enable ETM in the bootloader
Unless ARMv8.4 Self-hosted Trace extension is implemented, ETM is considered as an external debug