path: root/simpleperf/record.cpp
Age  Commit message  Author
2019-08-08  simpleperf: add AuxRecord and AUXTRACE feature section.  Yabin Cui
Each time the kernel generates a block of etm data, it also generates a PERF_RECORD_AUX record. An Aux record contains a timestamp showing when the block of etm data was generated, so it can be used to synchronize etm data with other records (like mmap and comm records). So we want to parse etm data each time we see an Aux record (as in dump cmd). That requires knowing the etm data locations in perf.data without reading the whole file. To support that, this CL also adds an AUXTRACE feature section, in the same format as in linux perf. Also dump AUX records and their corresponding etm data in dump cmd. Bug: 135204414 Test: run simpleperf_unit_test. Test: run `simpleperf record -e cs-etm xxx` && `perf report -D --stdio`. Change-Id: Ifae716a10fefe0f3d4822a0214384b40ada9da45
2019-07-29  simpleperf: support recording etm data in perf.data.  Yabin Cui
1. In event_fd.cpp, add functions to create aux buffer and read etm data. 2. In record.h, add AuxTraceRecord. 3. In RecordReadThread.cpp, wrap etm data into AuxTraceRecords. 4. Add logic to read and write AuxTraceRecords in perf.data. 5. Show recorded etm data size after recording. Bug: 135204414 Test: run simpleperf_unit_test. Change-Id: I3b20fe8f3c786f130f38e34962ca9f86a31fc584
2019-07-19  simpleperf: add PERF_RECORD_AUXTRACE_INFO record.  Yabin Cui
PERF_RECORD_AUXTRACE_INFO is used to record etm configurations on device. Make its content the same as in linux perf. Bug: 135204414 Test: run simpleperf_unit_test. Change-Id: I5d32cbe22acbc690d2ba47473ff344241982a0c3
2018-10-05  Add noexcept to move constructors and assignment operators.  Chih-Hung Hsieh
Bug: 116614593 Test: build with WITH_TIDY=1 Change-Id: I92ae7348e51e0610cef54b80087189bb1fe5b12c
2018-07-17  simpleperf: record complete process name.  Yabin Cui
The kernel limits length of process name to 15. But many app processes have name with length longer than 15. This patch records complete process name: When receiving a comm record for a process, try reading complete name from /proc/pid/cmdline and storing the name in the comm record. Bug: none Test: run simpleperf manually, it shows complete app name in reports. Test: run simpleperf_unit_test. Change-Id: Id29f2a2522ef5d2949828450be2d9d2f508a328d
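The lookup described above can be sketched as follows. This is only an illustration: the helper names `NameFromCmdline` and `ReadFullProcessName` are hypothetical and not taken from simpleperf, and the sketch assumes the process name is the first NUL-terminated string in /proc/pid/cmdline, falling back to the short comm name when the file is empty or unreadable.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: /proc/<pid>/cmdline holds NUL-separated arguments,
// so the text before the first NUL is treated as the process name.
inline std::string NameFromCmdline(const std::string& cmdline,
                                   const std::string& fallback) {
  std::string name = cmdline.substr(0, cmdline.find('\0'));
  return name.empty() ? fallback : name;
}

// Hypothetical helper: read the cmdline file for `pid`. If the file cannot be
// opened, the content stays empty and the short comm name is returned.
inline std::string ReadFullProcessName(int pid, const std::string& comm) {
  std::ifstream in("/proc/" + std::to_string(pid) + "/cmdline",
                   std::ios::binary);
  std::string content((std::istreambuf_iterator<char>(in)),
                      std::istreambuf_iterator<char>());
  return NameFromCmdline(content, comm);
}
```

Keeping the parsing separate from the file read makes the name extraction testable without a live process.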
2018-07-16  simpleperf: prefer to use mmap2 records instead of mmap records.  Yabin Cui
When doing system wide recording, simpleperf spends a lot of time in calling ApkInspector::FindElfInApkByOffset() in UpdateMmapRecordForEmbeddedPath() in cmd_record.cpp (about 35% main thread time). This is because app processes map many files in the apk into the memory. Actually we only need to change the maps of executable files in the apk. However, mmap record doesn't show whether the mapped file is executable. So switch to mmap2 record when possible. mmap2 record is supported starting from kernel 3.16, and dumps more info than mmap record, like the protect flag of the map. After switching to mmap2 records, the cost of calling ApkInspector::FindElfInApkByOffset() is decreased to about 1% main thread time. Also switch to dump mmap and comm records only for the first event type. Also avoid using IsRegularFile() in IsMappingOnlyExistInMemory, which saves about 3% main thread time. Also add a test to request mmap2 record support in Q. Bug: none Test: run simpleperf for system wide recording manually. Test: run simpleperf_unit_test. Change-Id: Ib0f42f509cb10b3242503d54d048f9c90885affa
2018-07-11  simpleperf: fix an abort caused by ip zero in kernel callchain.  Yabin Cui
When the kernel fails to unwind a kernel callchain, it may generate zero ip address. RecordFileReader::ReadRecord() removes zero ips in kernel callchain by adjusting r->callchain_data.ip_nr. However, it will make a check in SampleRecord::BuildBinaryWithNewCallChain() abort. This patch fixes it by moving the logic of erasing zero ip addresses to SampleRecord::AdjustCallChainGeneratedByKernel(), which replaces zero ip address with a context value, which will not be shown to user. Also change SampleRecord::ExcludeKernelCallChain() to support consecutive context values in callchain, which may be generated by SampleRecord::AdjustCallChainGeneratedByKernel(). Bug: none Test: run simpleperf_unit_test Change-Id: I85e5bfc4bf2bfddfbd2925748fa89d6e28d69ffc
2018-06-29  simpleperf: remove some deprecated code.  Yabin Cui
Remove perf_clock.cpp. perf_clock.cpp was used to drop samples generated while dumping thread maps in system wide recording. But that strategy isn't useful and has been removed. Remove SampleRecord::RemoveInvalidStackData(). Simplify SampleRecord::GetValidStackSize(). Bug: 110174247 Test: run simpleperf_unit_test. Change-Id: Ie934c8ecf5d57147b163e490368e71716da45258
2018-06-27  simpleperf: add record read thread.  Yabin Cui
This change reduces the sample loss rate when recording dwarf-based callgraph. It includes the changes below: 1. Add RecordBuffer class to store record data. 2. Add RecordReadThread to create a separate high priority thread reading records from kernel buffer to a RecordBuffer. 3. Cut stack data in sample records when free space in record buffer is below low level. 4. Drop sample records when free space in record buffer is below critical level. 5. Use different record buffer sizes for system wide profiling and non system wide profiling. 6. Refactor code replacing regs and stack data to callchains in SampleRecord. On walleye, set cpu percentage for profiling to 50:
$ ./old_simpleperf record -a -g --duration 30 --log debug
simpleperf I cmd_record.cpp:545] Samples recorded: 80524. Samples lost: 22993.
$ ./new_simpleperf record -a -g --duration 30 --log debug
simpleperf I cmd_record.cpp:555] Samples recorded: 99776. Samples lost: 0.
Bug: 110174247 Test: run simpleperf_unit_test. Test: run simpleperf manually. Change-Id: I10c8a090abc36e9feb712357cbb20a20b205af14
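The two thresholds in items 3 and 4 amount to a small decision rule. The sketch below is an assumption-laden illustration: the enum, the function name `DecideSampleAction`, and the parameter names are hypothetical, and the actual threshold values used by simpleperf are not stated in the commit message.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical outcome of checking the record buffer before storing a sample.
enum class BufferAction { kKeepWhole, kCutStackData, kDropSample };

// Decide how to handle an incoming sample record given the free space left in
// the buffer. Below the critical level the sample is dropped; below the low
// level its stack data is cut; otherwise the sample is stored unchanged.
inline BufferAction DecideSampleAction(size_t free_space, size_t low_level,
                                       size_t critical_level) {
  assert(critical_level <= low_level);
  if (free_space < critical_level) return BufferAction::kDropSample;
  if (free_space < low_level) return BufferAction::kCutStackData;
  return BufferAction::kKeepWhole;
}
```

Cutting stack data first degrades callchain depth before any sample is lost entirely, which matches the order of items 3 and 4.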
2018-06-18  simpleperf: remove RecordCache.  Yabin Cui
When using debug-unwind cmd on a system wide profiling result, I found it can take a lot of memory because RecordCache caches too many samples. Since simpleperf no longer relies on RecordCache to sort records, I think it is fine to remove RecordCache instead of fixing it. Bug: none. Test: run simpleperf_unit_test. Change-Id: Ie28ce17b4158add455004a56bbdac745f9d05f19
2018-04-13  simpleperf: remove Java interpreter frames by default.  Yabin Cui
In the callchains of interpreted Java code, two Java frames are separated by several interpreter frames, which make it harder for users to find Java frames. So this patch removes Java interpreter frames by default in report-sample command output and report_lib_interface. But it also provides the ability to show Java interpreter frames via --show-art-frames option. Bug: http://b/73126888 Test: run simpleperf_unit_test. Test: run test.py. Change-Id: I9a89e2f6679dc1455df8c669628fce198ae7d576
2018-03-23  Merge "simpleperf: support profiling jited java code."  Yabin Cui
2018-03-21  simpleperf: add SIMPLE_PERF_RECORD_TRACE_DATA record type.  Yabin Cui
PERF_RECORD_TRACE_DATA can't have record size >= 64K. This causes some aborts when recording tracepoint events, as in https://github.com/android-ndk/ndk/issues/493. So fix this by adding a custom type SIMPLE_PERF_RECORD_TRACING_DATA, which isn't limited by 64K. Also fix an error parsing formats of tracepoint events. Bug: http://b/75278602 Test: run simpleperf_unit_test. Change-Id: Ib5ebd0b6f981b00c2a256d17cdfd0e725d75a272
2018-03-21  simpleperf: support profiling jited java code.  Yabin Cui
1. For each jit symfile, generate a Mmap2Record with a special flag PROT_JIT_SYMFILE_MAP. 2. Call ReadMmapEventData() before dumping jit Mmap2Records, to keep the order of samples and mmap records. 3. Handle finding symbols from maps with PROT_JIT_SYMFILE_MAP flags. 4. Pass PROT_JIT_SYMFILE_MAP flag to libunwindstack, to unwind through jited methods. Bug: http://b/73127105 Test: run simpleperf manually. Test: run simpleperf_unit_test. Change-Id: I2b2f77ff457f7eb2f10193e987a181e4791a29ee
2018-01-31  simpleperf: Switch to use post-unwind by default in record cmd.  Yabin Cui
When recording google.sample.tunnel app for 30s: It took 3s to unwind samples and write unwound samples to file. It took 0.3s to write samples containing stack/reg data to file. The result shows recording with post unwinding consumes much less time than unwinding samples immediately. This means we can record with a higher freq and get a lower loss rate when using post unwinding. So make the changes below: 1. Make post unwinding the default. 2. Replace --post-unwind with --no-post-unwind option. 3. Make --trace-offcpu and callchain joiner work with post unwinding. 4. Remove special operations in --log debug mode. Those will be supported in a new command. Bug: http://b/72556486 Test: run simpleperf_unit_test. Test: run python test.py. Change-Id: I9a5a5defda9d040985e674c43db19ee68e7aa305
2018-01-05  simpleperf: improve unwinding result report.  Yabin Cui
1. Add stack range in unwinding result. 2. Add option to omit callchains fixed by callchain joiner. Bug: http://b/69383534 Test: manually. Change-Id: I9672061a8972ac79c321fc5d5e63950369c63e9c
2017-12-18  simpleperf: improve recording unwinding results.  Yabin Cui
1. Add MAP_MISSING result type. 2. Pass --log option to the simpleperf process running in app's context, in order to dump unwinding results when profiling android apps. Bug: http://b/69383534 Test: run simpleperf_unit_test. Test: run simpleperf manually. Change-Id: I72173060a5808e5ffb7318640509cabe53395063
2017-12-15  simpleperf: report unwinding failures.  Yabin Cui
1. When --log debug is used, store unwinding results in UnwindingResultRecords in perf.data. 2. Use unwinding_result_reporter.py to report unwinding results. This is to help finding different unwinding failures. Bug: http://b/69383534 Test: run simpleperf_unit_test. Test: run unwinding_result_reporter.py manually. Change-Id: I6d7f107e9758b1ec55ed35b49657bb41d47e2178
2017-12-12  simpleperf: Use CallChainJoiner.  Yabin Cui
1. In record cmd, split most code in Run() into three functions to make it easier to maintain. 2. In record cmd, use CallChainJoiner by default when -g option is used. And allow using --no-callchain-joiner option to disable the joiner, and --callchain-joiner-min-matching-nodes to adjust the joiner. 3. Adjust the interface of UnwindCallChain() to return sps used by the joiner. 4. Add functions in SampleRecord to use callchains returned by the joiner. Add CallChainRecord to keep callchains returned by the joiner for debugging. 5. In dump cmd, show callchains of SampleRecord and CallChainRecord for debugging. Bug: http://b/69383534 Test: run simpleperf_unit_test. Test: run python test.py. Change-Id: I951b169dfba0f7c50b6d4d741df83f02f8010626
2017-11-02  simpleperf: fix callchains generated by the kernel.  Yabin Cui
The kernel stores return addrs in the callchain, but we want the addrs of call instructions along the callchain. So adjust callchains generated by the kernel. Also avoid using const_cast<> in record.cpp by constructing Record classes with non const buffers. Bug: None. Test: `python report_html.py --add_disassembly`. Test: run simpleperf_unit_test. Change-Id: I8c5f369e333ec9bc96cf5b5166ac670c3e3b5c62
2017-07-24  simpleperf: exclude kernel callchains when needed.  Yabin Cui
Exclude kernel callchains when users only monitor events in user space. After this change, when users use `record -e cpu-cycles:u --trace-offcpu`, the samples of the implicitly added sched:sched_switch event won't contain any kernel callchain. Bug: http://b/37572306 Test: run simpleperf_unit_test. Change-Id: Iffcb61bac796e734825e68f847f24b4006b44360
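As a rough illustration of what excluding the kernel part of a callchain involves: in the perf sample format, context marker values in the ip array separate kernel frames from user frames. The sketch below assumes the marker values from the Linux UAPI header (PERF_CONTEXT_KERNEL is -128 and PERF_CONTEXT_USER is -512, stored as u64), redefined locally so the block is self-contained. The free function is hypothetical and is not the actual SampleRecord::ExcludeKernelCallChain() implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Local copies of the perf context markers (values as in linux/perf_event.h).
constexpr uint64_t kContextKernel = static_cast<uint64_t>(-128);
constexpr uint64_t kContextUser = static_cast<uint64_t>(-512);

// Hypothetical sketch: keep only the part of the callchain starting at the
// user context marker. If there is no user marker, the result is empty.
inline std::vector<uint64_t> ExcludeKernelCallChain(
    const std::vector<uint64_t>& ips) {
  auto user_begin = std::find(ips.begin(), ips.end(), kContextUser);
  return std::vector<uint64_t>(user_begin, ips.end());
}
```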
2017-02-23  simpleperf: generate one report for each event attr.  Yabin Cui
Bug: http://b/35475170 Test: run simpleperf_unit_test. Test: run report.py. Change-Id: Ie9329a64c701bce38f7b440c16cb47e99e83db45
2017-02-03  simpleperf: add inplace-sampler event type.  Yabin Cui
Add inplace-sampler event type, so it can be used in record/list command. This CL doesn't add code for communicating with the profiled process; it fakes records in InplaceSamplerClient.cpp for testing purposes. Refactor runtest.py to test inplace-sampler profiling. Bug: http://b/30974760 Test: run runtest.py --inplace-sampler. Change-Id: I92d8b03583c58b3589207f5c655e03853899be3a
2016-10-26  simpleperf: merge records from different buffers in memory.  Yabin Cui
By reading records from all buffers at once, we can merge records in memory instead of sorting them in perf.data. To make it clear, this patch only contains the code to merge records in memory, and I will remove old method later. Bug: http://b/32343227 Test: run simpleperf_unit_test. Test: run simpleperf_runtest.py. Change-Id: Iea2da06c072243c2014f43c8aa6d96a23cfb9123
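Merging records from several buffers in memory is essentially a k-way merge by timestamp. The sketch below is not the simpleperf code: `TimedRecord` and `MergeByTime` are hypothetical names, and it assumes each buffer is already ordered by time on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <vector>

// Hypothetical stand-in for a record: only the fields needed for merging.
struct TimedRecord {
  uint64_t time;
  int buffer_id;
};

// Merge per-buffer record lists (each sorted by time) into one list sorted by
// time, using a min-heap holding one cursor per non-empty buffer.
inline std::vector<TimedRecord> MergeByTime(
    const std::vector<std::vector<TimedRecord>>& buffers) {
  struct Cursor {
    size_t buffer;
    size_t index;
    uint64_t time;
  };
  auto later = [](const Cursor& a, const Cursor& b) { return a.time > b.time; };
  std::priority_queue<Cursor, std::vector<Cursor>, decltype(later)> heap(later);
  for (size_t i = 0; i < buffers.size(); ++i) {
    if (!buffers[i].empty()) heap.push({i, 0, buffers[i][0].time});
  }
  std::vector<TimedRecord> merged;
  while (!heap.empty()) {
    Cursor c = heap.top();
    heap.pop();
    merged.push_back(buffers[c.buffer][c.index]);
    size_t next = c.index + 1;
    if (next < buffers[c.buffer].size()) {
      heap.push({c.buffer, next, buffers[c.buffer][next].time});
    }
  }
  return merged;
}
```

Because only one cursor per buffer sits in the heap, the merge costs O(n log k) for n records across k buffers and needs no later sort pass over perf.data.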
2016-10-20  simpleperf: support building sample record manually.  Yabin Cui
And other small changes: add time when building comm record. Move some Move*BinaryFormat to utils.h. Handle wrong symbol whose address can be ULLONG_MAX. Bug: http://b/30974760 Test: simpleperf_unit_test. Change-Id: I2956d3c4b781c580fe93a6e5b77e0469f7f4f43f
2016-08-30  simpleperf: fix two errors.  Yabin Cui
Fix two errors when reporting perf.data generated by linux perf. And add corresponding tests. Test: run simpleperf_unit_test. Change-Id: I04dd88461fdd6a85763847570bac16db1ccb81fa
2016-08-26  simpleperf: support hotplug events in record cmd.  Yabin Cui
1. When a cpu is down, read records from event files on that cpu, then close those event files. 2. When a cpu is up, open event files on that cpu, and create mapped buffer for those event files to dump records. 3. Instead of creating a mapped buffer for each event type on each cpu, we can just create a mapped buffer for all event types on each cpu. 4. When new event files are created, store an EventIdRecord record in perf.data to notify record_file_reader.cpp. Bug: http://b/29245608 Test: run simpleperf record cmd and make cpu offline and online. Test: run simpleperf_unit_test. Change-Id: Ib97a24b6292fa143e9b35cb105bdddf1e826d60a
2016-08-05  simpleperf: reduce Record construction overhead while recording.  Yabin Cui
Avoid binary allocation and memory copy in ReadRecordsFromBuffer(), thus reducing Record construction overhead in EventSelectionSet::ReadMmapEventDataForFd(). Remove RecordCache used while recording. Replace it with RecordFileWriter::SortDataSection(). For unwinding while recording, use low watermark to make records almost sorted when dumped from the kernel. Bug: 30649868 Test: run simpleperf_unit_test. Change-Id: Ie5fb942046900a5960b3c990cf4177c026eaadfb
2016-08-04  simpleperf: keep binary in class Record.  Yabin Cui
It removes memory copy and heap allocation/deallocation in Record::BinaryFormat(), and is a preparation to remove memory copy and heap allocation in Record constructor. Bug: 30649868 Test: run simpleperf_unit_test. Change-Id: Ic8dd80e43f7b547a9beaf896d726b56aeb5d55a2
2016-07-11  simpleperf: add min_vaddr in DsoRecord.  Yabin Cui
Min virtual address of a shared library is needed when mapping ip addresses to function symbols. So we should dump it in DsoRecord. Bug: 28114205 Test: run simpleperf_unit_test. Change-Id: Ib986ee598281cf60caa3a2c5408100b9e7678143
2016-07-06  simpleperf: fix RecordCache.  Yabin Cui
RecordCache::Push(vector<..>) doesn't update last_time_, so RecordCache doesn't pop any record before PopAll(). Bug: 29581559 Change-Id: Icea806346b7ad812e606eaf05747797b766ebd71 Test: run simpleperf_unit_test.
2016-06-23  Simpleperf: Add SPLIT and SPLIT_END records to handle big records.  Yabin Cui
Previously we split KernelSymbolRecord because it is > 65535. Then I found TracingDataRecord can also be > 65535. So it is better to handle big records when reading and writing perf.data. record_file_writer.cpp splits a big record into multiple SPLIT records followed by a SPLIT_END record, and record_file_reader.cpp restores the big record when reading SPLIT and SPLIT_END records. Also add RecordHeader to represent records with size > 65535. Bug: 29581559 Change-Id: I0b4556988f77b3431c7f1a28fce65cf225d6a067 Test: run simpleperf_unit_test.
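The split-and-restore scheme can be sketched as below. This is a simplified illustration under stated assumptions: `Chunk`, `ChunkType`, `SplitBigRecord`, and `RestoreBigRecord` are hypothetical names, the chunks carry only a type tag and payload rather than real perf record headers, and the end marker is assumed to carry no payload.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical chunk representation; real record ids and headers differ.
enum class ChunkType { kSplit, kSplitEnd };

struct Chunk {
  ChunkType type;
  std::vector<uint8_t> payload;
};

// Writer side: cut the big record into kSplit chunks of at most max_payload
// bytes, then append an empty kSplitEnd chunk marking the end of the record.
inline std::vector<Chunk> SplitBigRecord(const std::vector<uint8_t>& data,
                                         size_t max_payload) {
  assert(max_payload > 0);
  std::vector<Chunk> chunks;
  for (size_t pos = 0; pos < data.size(); pos += max_payload) {
    size_t len = std::min(max_payload, data.size() - pos);
    chunks.push_back({ChunkType::kSplit,
                      std::vector<uint8_t>(data.begin() + pos,
                                           data.begin() + pos + len)});
  }
  chunks.push_back({ChunkType::kSplitEnd, {}});
  return chunks;
}

// Reader side: concatenate kSplit payloads until the kSplitEnd marker.
inline std::vector<uint8_t> RestoreBigRecord(const std::vector<Chunk>& chunks) {
  std::vector<uint8_t> data;
  for (const Chunk& chunk : chunks) {
    if (chunk.type == ChunkType::kSplitEnd) break;
    data.insert(data.end(), chunk.payload.begin(), chunk.payload.end());
  }
  return data;
}
```

The 65535 limit comes from the 16-bit size field in the perf record header, so each chunk plus its header has to stay within that bound.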
2016-06-22  Simpleperf: report lost record count and warn if 10% records are lost.  Yabin Cui
Bug: 29126335 Change-Id: Id4a5b51120389387ec3ab45ea9ad9a276aa6ce2a Test: run simpleperf with high -f option and check the lost record warning.
2016-06-15  simpleperf: replace SIMPLEPERF_ALIGN macro with Align inline function.  Yabin Cui
Change-Id: Id9e9e67174ab3f857eb2baa9609351b60586b8dd
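For illustration, an inline alignment function replacing a macro might look like the sketch below. The signature is an assumption; the commit message does not show the actual definition of Align or of the SIMPLEPERF_ALIGN macro it replaces.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: round `value` up to the next multiple of `alignment`.
// `alignment` must be a non-zero power of two.
inline uint64_t Align(uint64_t value, uint64_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (value + alignment - 1) & ~(alignment - 1);
}
```

Compared with a macro, an inline function type-checks its arguments and evaluates each of them exactly once.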
2016-06-13  simpleperf: fix mac build.  Yabin Cui
Bug: 28114205 Change-Id: I84ad011b10c19e07576b718ba4b6b6c52a823366
2016-06-03  simpleperf: dump tracing data when needed.  Yabin Cui
When monitoring tracepoint events, dumping tracing data to perf.data enables reporting on a different machine. Bug: 27403614 Change-Id: Ie1af624717a245cacbeb44b4c1bcd499fc9ad8db
2016-06-02  simpleperf: add report-sample command.  Yabin Cui
1. Add report-sample command to report each sample with symbol information. 2. Add --dump-symbols option to record command to collect dso and symbol information in perf.data. Bug: 28114205 Change-Id: I37424ee6abd74a21ad41cd3b6c4249cf0625c201
2016-05-31  simpleperf: record kernel symbols in perf.data.  Yabin Cui
To better support kernel profiling, record kernel symbols in perf.data when necessary. An option --no-dump-kernel-symbols is added in record command to always avoid recording kernel symbols. The way to handle all zero /proc/modules and /proc/kallsyms is improved. Add better support in finding symbols for kernel modules. Bug: 27403614 Change-Id: I470151c54f8a45ad1c101c1b94490e33d7fd7485
2016-04-05  simpleperf: support reporting more than one event type.  Yabin Cui
When sampling kernel trace points, it is likely to sample more than one event type, like `simpleperf record -e kmem:mm_page_alloc,kmem:mm_page_free`. 1. change record command to dump event_id for all records. 2. change report command and record reader to support multiple event attrs. 3. hide record_cache inside EventSelectionSet. 4. add test to report multiple event types. Bug: 27403614 Change-Id: Ic22a5527d68e7a843e3cf95e85381f8ad6bcb196
2016-03-18  Simpleperf: remove dependency on global current_arch.  Yabin Cui
When running unit tests on arm64 devices: [OK] ReportCommandTest.dwarf_callgraph [OK] record_cmd.dwarf_callchain_sampling. ERROR: can't unwind data recorded on a different architecture. It is because ReportCommandTest.dwarf_callgraph opens a perf.data recorded on x86_64, and changes current_arch. It causes a problem when the test record_cmd.dwarf_callchain_sampling calls libbacktrace built on aarch64. Although it doesn't make the test fail, we should fix this. Change-Id: I2cd70369a769ef2199cab2302b8b824369be0907
2016-03-01  simpleperf: fix analyzer warning.  Yabin Cui
Bug: 27432175 Change-Id: If0e8bc724cf659508726215d515d3df30cbebe6b
2016-02-25  simpleperf: port cmd_report_test to nonlinux.  Yabin Cui
And fix one build_id bug introduced by previous patch. Bug: 26962895 Change-Id: Ibb8bd6ec77ee862bb01c26342d3b3024468e75b2
2016-02-17  simpleperf: report symbols of native libraries in apk file.  Yabin Cui
Changes included: 1. provide interface in read_apk.h to read build id and symbols. 2. report symbols of native libraries in apk file. 3. refactor code in read_elf.cpp and read_apk.cpp. 4. add verbose log. 5. add -o report_file_name option for report command. 6. add corresponding unit tests. Bug: 26962895 Change-Id: I0d5398996e0c29dba4a6f5226692b758ca096bbd
2016-02-02  Support profiling of shared libs embedded in APKs.  Than McIntosh
Some APKs contain shared libraries that the linker handles by mmap'ing directly from their APKs (if the library is uncompressed and the proper manifest flag is set). With this patch simpleperf now breaks out samples on a per-lib basis and reports the name of the lib within the APK. Example output:
Cmdline: /system/xbin/simpleperf record -a sleep 30
Samples: 140672 of event 'cpu-cycles'
Event count: 84111474884
Overhead  Command  Pid  Tid  Shared Object
90.22%  b_open_from_apk  19066  19066  /data/app/com.android.frameworks.coretests.install_jni_lib_open_from_apk-2/base.apk!lib/armeabi-v7a/libgcdstuff.so
4.85%  b_open_from_apk  19066  19066  /data/app/com.android.frameworks.coretests.install_jni_lib_open_from_apk-2/base.apk!lib/armeabi-v7a/libframeworks_coretests_jni.so
1.19%  simpleperf  19085  19085  /system/lib/libc.so
...
Bug: 22560619 Change-Id: I1e0f2e155e03b33935eac24e104c3fd7b9a7e33c
2016-01-11  Simpleperf: adjust sort strategy in RecordCache.  Yabin Cui
In order to report correctly, we should keep the order of self-created records when reading perf.data. So adjust sort strategy in RecordCache to avoid reordering them. Bug: 26214604 Change-Id: I40812ee5f4f6051103d40459edf4b4a2d7a80313
2015-12-04  Track rename from base/ to android-base/.  Elliott Hughes
Change-Id: Ic15d4778c7accd1382de0b440a437aba2cf67016
2015-10-23  Simpleperf: Don't load whole perf.data into memory.  Yabin Cui
perf.data can be too large to be loaded into memory. To avoid this, use fread() instead of mmap() to read perf.data, and always use RecordCache to sort records. Fix unit tests failure caused by previous change. Bug: 25194400 Change-Id: If29dc0bb0ed992ba34202c2cb1a204a1d9123b7a
2015-10-12  Simpleperf: do stack unwinding while recording.  Yabin Cui
Dumping user's stack consumes lots of disk space, which makes long period recording impossible. This patch does stack unwinding before writing to perf.data, so it doesn't need to save user's stack. Previous behavior is still supported with --post-unwind option. A record cache is used for online record processing. Bug: 22229391 Change-Id: Idcc6ec46924fff3fcc8c165d62f8af875b173cd4
2015-10-02  Simpleperf: do dwarf unwinding in record command.  Yabin Cui
As libbacktrace only supports unwinding for the same architecture it is running on, simpleperf report command running on host can't unwind perf.data collected on device. So we'd better do unwinding work in record command on device. Bug: 22229391 Change-Id: I085ca074ea83dab79f08563523bdbc7a36650a64
2015-08-19  Simpleperf: add raw data in sample records for tracepoint events.  Yabin Cui
tracepoint events store tracing info in raw data in sample records. And we need to enable it in sample_type. Change-Id: Icd866059f4703b56724845d7526ae58099e83113