Age | Commit message | Author |
|
There are additional files that were not being properly
decompressed when running the benchmarks. Modify the code so
these files are decompressed too.
Test: Run benchmarks from a clean environment and verify that everything
Test: runs properly.
Change-Id: I7818ec8fbda49bdef938011a9cc70cc3cc32e050
|
|
Extend the existing benchmark collections that use ELF files
(ElfBenchmark, SymbolBenchmark, OfflineUnwindBenchmarks) with
benchmarks that use an ELF file containing a large amount of
unwind/debug information.
Adding these large ELF files will enable more representative benchmark
results for typical unwinds.
See b/192012600 for additional information regarding these benchmarks.
Bug: 192012600
Test: Benchmarking CL. Benchmarks still run and unit tests still pass.
Change-Id: I2197928e4a79c83b65d87ec2bfaa64665616b919
|
|
The XZ file format supports random access if the file was
compressed with the right flags (e.g. xz --block-size=4096).
Add support for it so that we only decompress the parts
of mini-debug-info that we need.
This reduces mini-debug-info memory usage by 20% (measured with
perfetto system-wide profiling running for a minute), even without
any specific optimizations that take advantage of the laziness.
BM_symbol_find_single_from_sorted (best case lookup scenario)
is 4x faster and uses 5x less memory.
BM_symbol_not_present_from_sorted (worst case lookup scenario)
is 2x faster and uses 3x less memory.
Bug: 110133331
Test: libunwindstack_test
Change-Id: Id067b2d07807463f86ed5fd908b83d079c5a057e
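
A minimal sketch of the lazy, block-wise access pattern described above,
assuming fixed-size uncompressed blocks. LazyBlockReader and its callback
are hypothetical names used only for illustration; the real XZ decoding
is stubbed out behind the callback and no liblzma API is shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical sketch: decompresses a block only when a read touches it.
class LazyBlockReader {
 public:
  // Callback that decompresses block `index` and returns its uncompressed bytes.
  using DecompressBlock = std::function<std::string(size_t index)>;

  LazyBlockReader(size_t block_size, size_t total_size, DecompressBlock decompress)
      : block_size_(block_size), total_size_(total_size), decompress_(std::move(decompress)) {
    assert(block_size_ > 0);
  }

  // Copies up to `size` bytes starting at uncompressed `offset` into `dst`.
  // Only the blocks overlapping the requested range are decompressed.
  size_t Read(size_t offset, void* dst, size_t size) {
    char* out = static_cast<char*>(dst);
    size_t copied = 0;
    while (copied < size && offset < total_size_) {
      const std::string& block = GetBlock(offset / block_size_);
      const size_t in_block = offset % block_size_;
      if (in_block >= block.size()) break;  // Short or corrupt block.
      const size_t n = std::min(size - copied, block.size() - in_block);
      std::memcpy(out + copied, block.data() + in_block, n);
      copied += n;
      offset += n;
    }
    return copied;
  }

  size_t blocks_decompressed() const { return cache_.size(); }

 private:
  // Returns the cached block, decompressing it on first use.
  const std::string& GetBlock(size_t index) {
    auto it = cache_.find(index);
    if (it == cache_.end()) {
      it = cache_.emplace(index, decompress_(index)).first;
    }
    return it->second;
  }

  size_t block_size_;
  size_t total_size_;
  DecompressBlock decompress_;
  std::unordered_map<size_t, std::string> cache_;
};
```

A lookup that touches a single symbol then pays for one block instead of
the whole section, which is the effect the benchmark numbers above
describe.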
|
|
The symbol-name-related reads and memory operations
now take about half of the symbol name reading cost.
This CL adds a ref-counted, read-only shared string cache,
which essentially eliminates those costs.
BM_symbol_find_single_many_times is >10% faster on ARM
(20% if we exclude the fixed ELF loading cost).
Real-world profiles seem even more encouraging.
The extra memory cost is negligible (by definition,
a small fraction of the decompressed mini-debug-info:
specifically, the subset of strtab for hit functions).
Furthermore, this effectively dedups strings when
consecutive unwinds hit a similar set of functions
(perfetto might keep several unwind results live).
Test: m libunwindstack_unit_test
Change-Id: I5cf600bb972fdb9d0f3a57ed0997bead2efa38f4
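
A minimal sketch of the ref-counted, read-only shared string cache idea
described above. The names SharedName and SymbolNameCache, the callback
signature, and keying by string-table offset are assumptions made for
illustration; they are not taken from the CL itself.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical names: a read-only string shared through reference counting.
using SharedName = std::shared_ptr<const std::string>;

class SymbolNameCache {
 public:
  // Callback that performs the expensive read of a name from the string table.
  using ReadName = std::function<std::string(uint64_t strtab_offset)>;

  explicit SymbolNameCache(ReadName read_name) : read_name_(std::move(read_name)) {
    assert(read_name_ != nullptr);
  }

  // Returns the cached name for this offset, reading it only on the first hit.
  // Every caller shares the same immutable string object.
  SharedName Get(uint64_t strtab_offset) {
    auto it = names_.find(strtab_offset);
    if (it != names_.end()) {
      return it->second;
    }
    auto name = std::make_shared<const std::string>(read_name_(strtab_offset));
    names_.emplace(strtab_offset, name);
    return name;
  }

 private:
  ReadName read_name_;
  std::unordered_map<uint64_t, SharedName> names_;
};
```

Because callers only copy a pointer, repeated hits on the same function
cost neither a read nor a string allocation, and several live unwind
results can refer to one copy of each name.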
|
|
In general we want to measure two scenarios:
- Time to do first unwind (including loading overheads).
- Time per unwind in long-term steady state.
Test: unwind_benchmarks
Change-Id: I918373e174dd7c887b065de6765158bba689cab4
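
The two scenarios above can be sketched with plain std::chrono timing.
This is only an illustration of the measurement split, not the benchmark
code from the CL; UnwindTimings and MeasureFirstAndSteady are
hypothetical names, and the unwind itself is passed in as a callback.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>

// Hypothetical result type: cost of the first call and the average of the rest.
struct UnwindTimings {
  std::chrono::nanoseconds first{0};
  std::chrono::nanoseconds steady_per_call{0};
};

// Times the first call (which includes any loading overhead) separately from
// the average over `steady_iterations` further calls.
inline UnwindTimings MeasureFirstAndSteady(const std::function<void()>& unwind,
                                           size_t steady_iterations) {
  assert(unwind != nullptr);
  using Clock = std::chrono::steady_clock;
  using std::chrono::duration_cast;
  using std::chrono::nanoseconds;

  UnwindTimings timings;
  const auto start = Clock::now();
  unwind();
  const auto after_first = Clock::now();
  for (size_t i = 0; i < steady_iterations; ++i) {
    unwind();
  }
  const auto end = Clock::now();

  timings.first = duration_cast<nanoseconds>(after_first - start);
  if (steady_iterations > 0) {
    timings.steady_per_call = duration_cast<nanoseconds>(end - after_first) /
                              static_cast<nanoseconds::rep>(steady_iterations);
  }
  return timings;
}
```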
|
|
This function is responsible for the majority of CPU time in perfetto.
Reduce the number of memory reads (don't read strings byte-by-byte).
Update all calls to ReadString to pass the third parameter, which sets
a maximum read size.
Add an Elf creation benchmark since this function is on the ELF
creation path.
Test: libunwindstack_unit_test
Change-Id: Ia36e1f1a5ba76c9e9f13c43fb9e3691dde7897f2
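
A minimal sketch of reading a NUL-terminated string in chunks with an
upper bound, instead of one byte per memory read. ReadFn and
ReadStringChunked are hypothetical names, and the 256-byte chunk size is
an arbitrary assumption; this is not the ReadString implementation from
the CL.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <string>

// Hypothetical memory reader: copies up to `size` bytes at `addr` into `dst`
// and returns the number of bytes actually read (0 on failure).
using ReadFn = std::function<size_t(uint64_t addr, void* dst, size_t size)>;

// Reads a NUL-terminated string starting at `addr`, examining at most
// `max_read` bytes. Returns false if no terminator is found within the limit
// or if the underlying read fails; `out` is left empty in that case.
inline bool ReadStringChunked(const ReadFn& read, uint64_t addr, std::string* out,
                              size_t max_read) {
  assert(out != nullptr);
  char buffer[256];
  out->clear();
  size_t total = 0;
  while (total < max_read) {
    const size_t want = std::min(sizeof(buffer), max_read - total);
    const size_t got = read(addr + total, buffer, want);
    if (got == 0) break;  // Read failure or end of readable memory.
    const void* nul = std::memchr(buffer, '\0', got);
    if (nul != nullptr) {
      // Append only the bytes before the terminator.
      out->append(buffer, static_cast<size_t>(static_cast<const char*>(nul) - buffer));
      return true;
    }
    out->append(buffer, got);
    total += got;
  }
  out->clear();
  return false;
}
```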
|
|
Add a number of benchmarks to time how long it takes to look
up symbols.
Test: Ran benchmarks on device.
Change-Id: Iab7aab3f60c2c7056395beca3d36263420bcb5dc
|