|
Each time the kernel generates a block of etm data, it also
generates a PERF_RECORD_AUX record. An Aux record contains a timestamp
showing when the block of etm data was generated, so it can be used to
synchronize etm data with other records (like mmap and comm records).
So we want to parse etm data each time we see an Aux record (as in the dump
cmd). That requires knowing etm data locations in perf.data without reading
the whole file. To support that, this CL also adds an AUXTRACE feature section,
in the same format as in linux perf.
Also dump AUX records and their corresponding etm data in the dump cmd.
Bug: 135204414
Test: run simpleperf_unit_test.
Test: run `simpleperf record -e cs-etm xxx` && `perf report -D --stdio`.
Change-Id: Ifae716a10fefe0f3d4822a0214384b40ada9da45
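A minimal sketch of how such an index lets a reader locate etm data without scanning the file. The layout used here (a 64-bit entry count followed by file offset and size pairs) is an assumption for illustration, not taken from this commit, and the function names are hypothetical.

```python
import struct


def pack_auxtrace_index(entries):
    """Serialize (file_offset, size) pairs: a u64 count, then two u64s per entry."""
    data = struct.pack("<Q", len(entries))
    for file_offset, size in entries:
        data += struct.pack("<QQ", file_offset, size)
    return data


def unpack_auxtrace_index(data):
    """Read the pairs back; each entry says where one block of aux data lives."""
    (count,) = struct.unpack_from("<Q", data, 0)
    return [struct.unpack_from("<QQ", data, 8 + 16 * i) for i in range(count)]
```

With an index like this, a reader that sees an Aux record can seek directly to the matching block instead of reading the whole data section.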
|
|
1. In event_fd.cpp, add functions to create aux buffer and read etm
data.
2. In record.h, add AuxTraceRecord.
3. In RecordReadThread.cpp, wrap etm data into AuxTraceRecords.
4. Add logic to read and write AuxTraceRecords in perf.data.
5. Show recorded etm data size after recording.
Bug: 135204414
Test: run simpleperf_unit_test.
Change-Id: I3b20fe8f3c786f130f38e34962ca9f86a31fc584
|
|
PERF_RECORD_AUXTRACE_INFO is used to record etm configurations on device.
Make its content the same as in linux perf.
Bug: 135204414
Test: run simpleperf_unit_test.
Change-Id: I5d32cbe22acbc690d2ba47473ff344241982a0c3
|
|
Bug: 116614593
Test: build with WITH_TIDY=1
Change-Id: I92ae7348e51e0610cef54b80087189bb1fe5b12c
|
|
The kernel limits the length of a process name to 15 characters. But many
app processes have names longer than that. This patch records the complete
process name: when receiving a comm record for a process, try reading the
complete name from /proc/pid/cmdline and storing it in the comm
record.
Bug: none
Test: run simpleperf manually, it shows complete app name in reports.
Test: run simpleperf_unit_test.
Change-Id: Id29f2a2522ef5d2949828450be2d9d2f508a328d
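A minimal sketch of the lookup described above, assuming the complete name is the first NUL-terminated string in /proc/pid/cmdline. The function names and the fallback policy are hypothetical and are not simpleperf's actual code.

```python
from pathlib import Path


def parse_cmdline(raw: bytes) -> str:
    """Return the first NUL-separated field of a /proc/<pid>/cmdline blob."""
    first = raw.split(b"\0", 1)[0]
    return first.decode("utf-8", errors="replace")


def full_process_name(pid: int, comm: str) -> str:
    """Prefer the complete name from cmdline; fall back to the comm name."""
    try:
        raw = Path(f"/proc/{pid}/cmdline").read_bytes()
    except OSError:
        # The process may have exited, or /proc may be unavailable.
        return comm
    name = parse_cmdline(raw)
    return name if name else comm
```

Falling back to the comm name keeps the record usable when the process exits before its cmdline can be read.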
|
|
When doing system wide recording, simpleperf spends a lot of time
in calling ApkInspector::FindElfInApkByOffset() in
UpdateMmapRecordForEmbeddedPath() in cmd_record.cpp (about 35%
main thread time). This is because app processes map many files
in the apk into the memory.
Actually we only need to change the maps of executable files in the
apk. However, mmap record doesn't show whether the mapped file is
executable. So switch to mmap2 record when possible. mmap2 record is
supported starting from kernel 3.16, and dumps more info than mmap
record, like the protection flags of the map. After switching to mmap2
records, the cost of calling ApkInspector::FindElfInApkByOffset()
is decreased to about 1% main thread time.
Also switch to dump mmap and comm records only for the first event
type.
Also avoid using IsRegularFile() in IsMappingOnlyExistInMemory, which
saves about 3% main thread time.
Also add a test to request mmap2 record support in Q.
Bug: none
Test: run simpleperf for system wide recording manually.
Test: run simpleperf_unit_test.
Change-Id: Ib0f42f509cb10b3242503d54d048f9c90885affa
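A minimal sketch of the filtering idea: with the protection flags from an mmap2 record, only executable maps of an apk need the expensive lookup. The tuple layout, the `.apk` suffix check, and the function names are assumptions for illustration.

```python
PROT_EXEC = 0x4  # value of PROT_EXEC for mmap() on Linux


def is_executable_apk_map(filename: str, prot: int) -> bool:
    """True only for apk maps that are mapped with execute permission."""
    return filename.endswith(".apk") and bool(prot & PROT_EXEC)


def maps_needing_elf_lookup(maps):
    """Keep the (filename, prot) pairs that still need the apk lookup."""
    return [m for m in maps if is_executable_apk_map(m[0], m[1])]
```

Every map that fails the check can be skipped without inspecting the apk at all, which is where the saving comes from.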
|
|
When the kernel fails to unwind a kernel callchain, it may generate
zero ip addresses. RecordFileReader::ReadRecord() removes zero ips in
the kernel callchain by adjusting r->callchain_data.ip_nr. However, this
makes a check in SampleRecord::BuildBinaryWithNewCallChain() abort.
This patch fixes it by moving the logic of erasing zero ip addresses
to SampleRecord::AdjustCallChainGeneratedByKernel(), which replaces
zero ip address with a context value, which will not be shown to user.
Also change SampleRecord::ExcludeKernelCallChain() to support
consecutive context values in callchain, which may be generated by
SampleRecord::AdjustCallChainGeneratedByKernel().
Bug: none
Test: run simpleperf_unit_test
Change-Id: I85e5bfc4bf2bfddfbd2925748fa89d6e28d69ffc
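A minimal sketch of the two steps above, using the perf context marker values from the Linux perf_event ABI. Which context value replaces a zero ip is not stated in this message, so using the kernel marker here is an assumption, and the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1
PERF_CONTEXT_KERNEL = -128 & MASK64
PERF_CONTEXT_USER = -512 & MASK64
PERF_CONTEXT_MAX = -4095 & MASK64


def is_context_marker(ip: int) -> bool:
    """Values at or above PERF_CONTEXT_MAX are markers, not real addresses."""
    return ip >= PERF_CONTEXT_MAX


def replace_zero_ips(ips):
    """Swap zero ips in the kernel part for a marker, keeping the length."""
    out, context = [], None
    for ip in ips:
        if is_context_marker(ip):
            context = ip
        elif ip == 0 and context == PERF_CONTEXT_KERNEL:
            ip = PERF_CONTEXT_KERNEL  # assumption: reuse the kernel marker
        out.append(ip)
    return out


def exclude_kernel_callchain(ips):
    """Keep only user-space ips; consecutive markers are handled naturally."""
    user_ips, context = [], None
    for ip in ips:
        if is_context_marker(ip):
            context = ip
        elif context == PERF_CONTEXT_USER:
            user_ips.append(ip)
    return [PERF_CONTEXT_USER] + user_ips if user_ips else []
```

Because the zero ip is replaced rather than erased, the number of entries stays the same, which is what lets a size check on the rebuilt record pass.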
|
|
Remove perf_clock.cpp. perf_clock.cpp was used to drop samples generated
while dumping thread maps in system wide recording. But that strategy isn't
useful and has been removed.
Remove SampleRecord::RemoveInvalidStackData().
Simplify SampleRecord::GetValidStackSize().
Bug: 110174247
Test: run simpleperf_unit_test.
Change-Id: Ie934c8ecf5d57147b163e490368e71716da45258
|
|
This change reduces the sample loss rate when recording
dwarf-based callgraphs.
It includes the changes below:
1. Add RecordBuffer class to store record data.
2. Add RecordReadThread to create a separate high priority
thread reading records from kernel buffer to a RecordBuffer.
3. Cut stack data in sample records when free space in
record buffer is below low level.
4. Drop sample records when free space in record buffer is
below critical level.
5. Use different record buffer sizes for system wide profiling
and non system wide profiling.
6. Refactor the code that replaces regs and stack data with
callchains in SampleRecord.
On walleye, set cpu percentage for profiling to 50:
$ ./old_simpleperf record -a -g --duration 30 --log debug
simpleperf I cmd_record.cpp:545] Samples recorded: 80524. Samples lost: 22993.
$ ./new_simpleperf record -a -g --duration 30 --log debug
simpleperf I cmd_record.cpp:555] Samples recorded: 99776. Samples lost: 0.
Bug: 110174247
Test: run simpleperf_unit_test.
Test: run simpleperf manually.
Change-Id: I10c8a090abc36e9feb712357cbb20a20b205af14
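A minimal sketch of the policy in items 3 and 4 above: as free space in the record buffer shrinks, sample records first lose their stack data and are then dropped. The names and thresholds are hypothetical, and this is not the actual RecordReadThread logic.

```python
from enum import Enum


class Action(Enum):
    KEEP = "keep"
    CUT_STACK = "cut_stack"
    DROP = "drop"


def choose_action(free_bytes: int, low_level: int, critical_level: int,
                  is_sample: bool = True) -> Action:
    """Decide what to do with an incoming record given the free buffer space."""
    if not is_sample:
        # Assumption: only sample records are cut or dropped.
        return Action.KEEP
    if free_bytes < critical_level:
        return Action.DROP
    if free_bytes < low_level:
        return Action.CUT_STACK
    return Action.KEEP
```

Cutting stack data first degrades callchain depth before any sample is lost outright.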
|
|
When using the debug-unwind cmd on a system wide profiling result, I
found it might take a lot of memory because RecordCache cached
too many samples. Since simpleperf no longer relies on RecordCache
to sort records, I think it is fine to remove RecordCache instead
of fixing it.
Bug: none.
Test: run simpleperf_unit_test.
Change-Id: Ie28ce17b4158add455004a56bbdac745f9d05f19
|
|
In the callchains of interpreted Java code, two Java frames are
separated by several interpreter frames, which makes it harder for
users to find Java frames. So this patch removes Java interpreter
frames by default in the report-sample command output and
report_lib_interface. It also provides the ability to show
Java interpreter frames via the --show-art-frames option.
Bug: http://b/73126888
Test: run simpleperf_unit_test.
Test: run test.py.
Change-Id: I9a89e2f6679dc1455df8c669628fce198ae7d576
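A minimal sketch of the filtering, assuming interpreter frames can be recognized by the shared library they belong to. The suffix list and the (symbol, dso) tuple layout are assumptions for illustration; the real criteria may differ.

```python
# Assumption: interpreter frames live in the ART runtime library.
ART_DSO_SUFFIXES = ("/libart.so", "/libartd.so")


def strip_art_frames(frames, show_art_frames=False):
    """Drop interpreter frames from a callchain of (symbol, dso) pairs."""
    if show_art_frames:
        return list(frames)
    return [f for f in frames if not f[1].endswith(ART_DSO_SUFFIXES)]
```

The `show_art_frames` parameter mirrors the --show-art-frames option: when set, the callchain is returned unchanged.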
|
|
|
|
PERF_RECORD_TRACE_DATA can't have record size >= 64K. This causes
some aborts when recording tracepoint events, as in
https://github.com/android-ndk/ndk/issues/493.
So fix this by adding a custom type SIMPLE_PERF_RECORD_TRACING_DATA,
which isn't limited by 64K.
Also fix an error in parsing formats of tracepoint events.
Bug: http://b/75278602
Test: run simpleperf_unit_test.
Change-Id: Ib5ebd0b6f981b00c2a256d17cdfd0e725d75a272
|
|
1. For each jit symfile, generate a Mmap2Record with a special flag
PROT_JIT_SYMFILE_MAP.
2. Call ReadMmapEventData() before dumping jit Mmap2Records, to keep
the order of samples and mmap records.
3. Handle finding symbols from maps with PROT_JIT_SYMFILE_MAP flags.
4. Pass PROT_JIT_SYMFILE_MAP flag to libunwindstack, to unwind
through jited methods.
Bug: http://b/73127105
Test: run simpleperf manually.
Test: run simpleperf_unit_test.
Change-Id: I2b2f77ff457f7eb2f10193e987a181e4791a29ee
|
|
When recording google.sample.tunnel app for 30s:
It took 3s to unwind samples and write unwound samples to file.
It took 0.3s to write samples containing stack/reg data to file.
The result shows recording with post unwinding consumes much
less time than unwinding samples immediately. This means we can
record at a higher frequency and get a lower loss rate when using
post unwinding. So make the changes below:
1. Make post unwinding the default.
2. Replace --post-unwind with --no-post-unwind option.
3. Make --trace-offcpu and callchain joiner work with post unwinding.
4. Remove special operations in --log debug mode. Those will be
supported in a new command.
Bug: http://b/72556486
Test: run simpleperf_unit_test.
Test: run python test.py.
Change-Id: I9a5a5defda9d040985e674c43db19ee68e7aa305
|
|
1. Add stack range in unwinding result.
2. Add option to omit callchains fixed by callchain joiner.
Bug: http://b/69383534
Test: manually.
Change-Id: I9672061a8972ac79c321fc5d5e63950369c63e9c
|
|
1. Add MAP_MISSING result type.
2. Pass --log option to the simpleperf process running in app's context,
in order to dump unwinding results when profiling android apps.
Bug: http://b/69383534
Test: run simpleperf_unit_test.
Test: run simpleperf manually.
Change-Id: I72173060a5808e5ffb7318640509cabe53395063
|
|
1. When --log debug is used, store unwinding results in
UnwindingResultRecords in perf.data.
2. Use unwinding_result_reporter.py to report unwinding results.
This helps find different unwinding failures.
Bug: http://b/69383534
Test: run simpleperf_unit_test.
Test: run unwinding_result_reporter.py manually.
Change-Id: I6d7f107e9758b1ec55ed35b49657bb41d47e2178
|
|
1. In record cmd, split most code in Run() into three functions to make it easier to maintain.
2. In record cmd, use CallChainJoiner by default when -g option is used. And allow using
--no-callchain-joiner option to disable the joiner, and --callchain-joiner-min-matching-nodes
to adjust the joiner.
3. Adjust the interface of UnwindCallChain() to return sps used by the joiner.
4. Add functions in SampleRecord to use callchains returned by the joiner.
Add CallChainRecord to keep callchains returned by the joiner for debugging.
5. In dump cmd, show callchains of SampleRecord and CallChainRecord for debugging.
Bug: http://b/69383534
Test: run simpleperf_unit_test.
Test: run python test.py.
Change-Id: I951b169dfba0f7c50b6d4d741df83f02f8010626
|
|
The kernel stores return addrs in the callchain, but we
want the addrs of call instructions along the callchain.
So adjust callchains generated by the kernel.
Also avoid using const_cast<> in record.cpp by constructing
Record classes with non const buffers.
Bug: None.
Test: `python report_html.py --add_disassembly`.
Test: run simpleperf_unit_test.
Change-Id: I8c5f369e333ec9bc96cf5b5166ac670c3e3b5c62
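A minimal sketch of the adjustment, assuming a callchain without context markers whose first entry is the sampled pc and whose remaining entries are return addresses. The step size is a hypothetical parameter; the amount simpleperf subtracts is not stated here and likely depends on the architecture.

```python
def call_addr_from_return_addr(ret_addr: int, step: int = 1) -> int:
    """Move an address back so it falls inside the preceding call instruction."""
    return ret_addr - step if ret_addr >= step else ret_addr


def adjust_callchain(ips, step: int = 1):
    """Leave the sampled pc alone and adjust every return address after it."""
    return list(ips[:1]) + [call_addr_from_return_addr(ip, step) for ip in ips[1:]]
```

After the adjustment, symbolization and disassembly attribute each frame to the call site rather than to the instruction following it.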
|
|
Exclude kernel callchains when users only monitor events in user space.
After this change, when users use `record -e cpu-cycles:u --trace-offcpu`,
the samples of the implicitly added sched:sched_switch event won't contain
any kernel callchain.
Bug: http://b/37572306
Test: run simpleperf_unit_test.
Change-Id: Iffcb61bac796e734825e68f847f24b4006b44360
|
|
Bug: http://b/35475170
Test: run simpleperf_unit_test.
Test: run report.py.
Change-Id: Ie9329a64c701bce38f7b440c16cb47e99e83db45
|
|
Add inplace-sampler event type, so it can be used in
record/list command. This cl doesn't add code for communicating
with the profiled process; it fakes records in InplaceSamplerClient.cpp
for testing purposes.
Refactor runtest.py to test inplace-sampler profiling.
Bug: http://b/30974760
Test: run runtest.py --inplace-sampler.
Change-Id: I92d8b03583c58b3589207f5c655e03853899be3a
|
|
By reading records from all buffers at once, we can merge records
in memory instead of sorting them in perf.data. To keep the change clear,
this patch only contains the code to merge records in memory, and
I will remove the old method later.
Bug: http://b/32343227
Test: run simpleperf_unit_test.
Test: run simpleperf_runtest.py.
Change-Id: Iea2da06c072243c2014f43c8aa6d96a23cfb9123
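A minimal sketch of merging in memory, assuming the records read from each buffer are already in time order and are modeled as (timestamp, payload) pairs. The function name is hypothetical.

```python
import heapq


def merge_buffers(buffers):
    """Merge per-buffer record lists into one list ordered by timestamp."""
    return list(heapq.merge(*buffers, key=lambda record: record[0]))
```

A k-way merge like this only compares the head of each buffer, so it avoids a full sort of the combined data.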
|
|
And other small changes:
add time when building comm record.
move some Move*BinaryFormat to utils.h.
Handle wrong symbols whose address can be ULLONG_MAX.
Bug: http://b/30974760
Test: simpleperf_unit_test.
Change-Id: I2956d3c4b781c580fe93a6e5b77e0469f7f4f43f
|
|
Fix two errors when reporting perf.data generated by linux perf.
And add corresponding tests.
Test: run simpleperf_unit_test.
Change-Id: I04dd88461fdd6a85763847570bac16db1ccb81fa
|
|
1. When a cpu is down, read records from event files on that cpu,
then close those event files.
2. When a cpu is up, open event files on that cpu, and create
mapped buffer for those event files to dump records.
3. Instead of creating a mapped buffer for each event type on each
cpu, we can just create a mapped buffer for all event types on
each cpu.
4. When new event files are created, store an EventIdRecord record in
perf.data to notify record_file_reader.cpp.
Bug: http://b/29245608
Test: run simpleperf record cmd and make cpu offline and online.
Test: run simpleperf_unit_test.
Change-Id: Ib97a24b6292fa143e9b35cb105bdddf1e826d60a
|
|
Avoid binary allocation and memory copy in ReadRecordsFromBuffer(),
thus reducing Record construction overhead in
EventSelectionSet::ReadMmapEventDataForFd().
Remove RecordCache used while recording. Replace it with
RecordFileWriter::SortDataSection(). For unwinding while
recording, use low watermark to make records almost sorted
when dumped from the kernel.
Bug: 30649868
Test: run simpleperf_unit_test.
Change-Id: Ie5fb942046900a5960b3c990cf4177c026eaadfb
|
|
It removes memory copy and heap allocation/deallocation in
Record::BinaryFormat(), and is a preparation to remove memory
copy and heap allocation in Record constructor.
Bug: 30649868
Test: run simpleperf_unit_test.
Change-Id: Ic8dd80e43f7b547a9beaf896d726b56aeb5d55a2
|
|
Min virtual address of a shared library is needed when mapping ip
addresses to function symbols. So we should dump it in DsoRecord.
Bug: 28114205
Test: run simpleperf_unit_test.
Change-Id: Ib986ee598281cf60caa3a2c5408100b9e7678143
|
|
RecordCache::Push(vector<..>) doesn't update last_time_, so
RecordCache doesn't pop any record before PopAll().
Bug: 29581559
Change-Id: Icea806346b7ad812e606eaf05747797b766ebd71
Test: run simpleperf_unit_test.
|
|
Previously we split KernelSymbolRecord because its size is > 65535. Then
I found TracingDataRecord can also be > 65535. So it is better to
handle big records when reading and writing perf.data.
record_file_writer.cpp splits a big record into multiple SPLIT
records followed by a SPLIT_END record, and record_file_reader.cpp
restores the big record when reading SPLIT and SPLIT_END records.
Also add RecordHeader to represent records with size > 65535.
Bug: 29581559
Change-Id: I0b4556988f77b3431c7f1a28fce65cf225d6a067
Test: run simpleperf_unit_test.
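A minimal sketch of the split and restore steps, assuming each SPLIT record carries a slice of the big record behind an 8-byte header and SPLIT_END carries no payload. The record model (kind, payload) and the function names are hypothetical.

```python
MAX_RECORD_SIZE = 65535   # largest size a 16-bit size field can hold
HEADER_SIZE = 8           # assumption: size of the per-record header
CHUNK_SIZE = MAX_RECORD_SIZE - HEADER_SIZE


def split_record(data: bytes):
    """Split one big record into SPLIT pieces followed by SPLIT_END."""
    pieces = [("SPLIT", data[i:i + CHUNK_SIZE])
              for i in range(0, len(data), CHUNK_SIZE)]
    pieces.append(("SPLIT_END", b""))
    return pieces


def join_records(records):
    """Restore big records from SPLIT/SPLIT_END; pass other records through."""
    restored, pending = [], bytearray()
    for kind, payload in records:
        if kind == "SPLIT":
            pending += payload
        elif kind == "SPLIT_END":
            restored.append(bytes(pending))
            pending = bytearray()
        else:
            restored.append(payload)
    return restored
```

The reader only needs to buffer SPLIT payloads until it sees SPLIT_END, so ordinary records are unaffected.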
|
|
Bug: 29126335
Change-Id: Id4a5b51120389387ec3ab45ea9ad9a276aa6ce2a
Test: run simpleperf with high -f option and check the lost record warning.
|
|
Change-Id: Id9e9e67174ab3f857eb2baa9609351b60586b8dd
|
|
Bug: 28114205
Change-Id: I84ad011b10c19e07576b718ba4b6b6c52a823366
|
|
When monitoring tracepoint events, dumping tracing data to perf.data
enables reporting on a different machine.
Bug: 27403614
Change-Id: Ie1af624717a245cacbeb44b4c1bcd499fc9ad8db
|
|
1. Add report-sample command to report each sample with symbol information.
2. Add --dump-symbols option to record command to collect dso and symbol
information in perf.data.
Bug: 28114205
Change-Id: I37424ee6abd74a21ad41cd3b6c4249cf0625c201
|
|
To better support kernel profiling, record kernel symbols in perf.data
when necessary. An option --no-dump-kernel-symbols is added in
record command to always avoid recording kernel symbols.
The handling of all-zero /proc/modules and /proc/kallsyms is
improved. Add better support for finding symbols for kernel modules.
Bug: 27403614
Change-Id: I470151c54f8a45ad1c101c1b94490e33d7fd7485
|
|
When sampling kernel trace points, it is likely that more than
one event type is sampled, as in `simpleperf record -e kmem:mm_page_alloc,kmem:mm_page_free`.
1. change record command to dump event_id for all records.
2. change report command and record reader to support multiple
event attrs.
3. hide record_cache inside EventSelectionSet.
4. add test to report multiple event types.
Bug: 27403614
Change-Id: Ic22a5527d68e7a843e3cf95e85381f8ad6bcb196
|
|
When running unit tests on arm64 devices:
[OK] ReportCommandTest.dwarf_callgraph
[OK] record_cmd.dwarf_callchain_sampling.
ERROR: can't unwind data recorded on a different architecture.
This is because ReportCommandTest.dwarf_callgraph opens a perf.data
recorded on x86_64, and changes current_arch. It causes a problem when
the test record_cmd.dwarf_callchain_sampling calls libbacktrace built
on aarch64. Although it doesn't make the test fail, we should fix this.
Change-Id: I2cd70369a769ef2199cab2302b8b824369be0907
|
|
Bug: 27432175
Change-Id: If0e8bc724cf659508726215d515d3df30cbebe6b
|
|
And fix one build_id bug introduced by previous patch.
Bug: 26962895
Change-Id: Ibb8bd6ec77ee862bb01c26342d3b3024468e75b2
|
|
Changes included:
1. provide interface in read_apk.h to read build id and symbols.
2. report symbols of native libraries in apk file.
3. refactor code in read_elf.cpp and read_apk.cpp.
4. add verbose log.
5. add -o report_file_name option for report command.
6. add corresponding unit tests.
Bug: 26962895
Change-Id: I0d5398996e0c29dba4a6f5226692b758ca096bbd
|
|
Some APKs contain shared libraries that the linker handles
by mmap'ing directly from their APKs (if the library is
uncompressed and the proper manifest flag is set). With
this patch simpleperf now breaks out samples on a per-lib
basis and reports the name of the lib within the APK.
Example output:
Cmdline: /system/xbin/simpleperf record -a sleep 30
Samples: 140672 of event 'cpu-cycles'
Event count: 84111474884
Overhead Command Pid Tid Shared Object
90.22% b_open_from_apk 19066 19066
/data/app/com.android.frameworks.coretests.install_jni_lib_open_from_apk-2/base.apk!lib/armeabi-v7a/libgcdstuff.so
4.85% b_open_from_apk 19066 19066
/data/app/com.android.frameworks.coretests.install_jni_lib_open_from_apk-2/base.apk!lib/armeabi-v7a/libframeworks_coretests_jni.so
1.19% simpleperf 19085 19085 /system/lib/libc.so
...
Bug: 22560619
Change-Id: I1e0f2e155e03b33935eac24e104c3fd7b9a7e33c
|
|
In order to report correctly, we should keep the order of self-created
records when reading perf.data. So adjust the sort strategy in RecordCache
to avoid reordering them.
Bug: 26214604
Change-Id: I40812ee5f4f6051103d40459edf4b4a2d7a80313
|
|
Change-Id: Ic15d4778c7accd1382de0b440a437aba2cf67016
|
|
perf.data can be too large to be loaded into memory.
To avoid this, use fread() instead of mmap() to read perf.data,
and always use RecordCache to sort records.
Fix unit test failures caused by the previous change.
Bug: 25194400
Change-Id: If29dc0bb0ed992ba34202c2cb1a204a1d9123b7a
|
|
Dumping the user's stack consumes lots of disk space, which makes long recordings
impossible. This patch does stack unwinding before writing to perf.data, so it doesn't
need to save the user's stack. The previous behavior is still supported with the --post-unwind option.
A record cache is used for online record processing.
Bug: 22229391
Change-Id: Idcc6ec46924fff3fcc8c165d62f8af875b173cd4
|
|
As libbacktrace only supports unwinding for the same architecture it is running on, simpleperf
report command running on host can't unwind perf.data collected on device. So we'd better do
unwinding work in record command on device.
Bug: 22229391
Change-Id: I085ca074ea83dab79f08563523bdbc7a36650a64
|
|
Tracepoint events store tracing info in raw data in sample records,
so we need to enable raw data in sample_type.
Change-Id: Icd866059f4703b56724845d7526ae58099e83113
|