author | Benoit Jacob <benoitjacob@google.com> | 2020-01-25 11:01:23 -0500 |
---|---|---|
committer | Benoit Jacob <benoitjacob@google.com> | 2020-03-10 16:43:24 -0400 |
commit | d4abb8650256a76cec0f51dffe5f9976640bef19 (patch) | |
tree | 47683fb1eb1b414d0bf972ac9a5500f1e43869f5 /BUILD | |
parent | c3bb0b7d618a15932e37aae626d3337d03725b35 (diff) | |
download | ruy-d4abb8650256a76cec0f51dffe5f9976640bef19.tar.gz |
Changes to BlockMap, in particular add Hilbert-curve fractal traversal above a certain size threshold.
Rename cache_friendly_traversal_threshold to local_data_cache_size so that it is more explicit about what it is in practice. Introduce shared_data_cache_size, which is needed to decide whether to use the Hilbert curve. The Hilbert curve is more expensive to decode and is only worth it if it reduces DRAM accesses, which depends on shared_data_cache_size. Centralize defaults in a new :cpu_cache_size library. Centralize the reading of these defaults in Spec so that users can override them consistently by passing their own spec (either to provide more accurate/runtime values or for test coverage purposes).
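To make the trade-off above concrete, here is a minimal sketch, not the code from this change. `HilbertDecode` is the textbook index-to-coordinate decoding for an n x n grid (n a power of two), which shows why decoding costs more than a linear traversal: it loops once per bit of n. `ChooseTraversal`, the `Traversal` enum, and the rule "use Hilbert once the working set exceeds shared_data_cache_size" are assumptions made for illustration; the real threshold logic lives in BlockMap and may differ.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical traversal orders; the names are illustrative, not ruy's.
enum class Traversal { kLinear, kHilbert };

// Assumed rule: pay for Hilbert decoding only when the working set no longer
// fits in the shared (last-level) data cache, since only then can a more
// local traversal reduce DRAM accesses.
inline Traversal ChooseTraversal(std::int64_t working_set_bytes,
                                 std::int64_t shared_data_cache_size) {
  return working_set_bytes > shared_data_cache_size ? Traversal::kHilbert
                                                    : Traversal::kLinear;
}

// Textbook Hilbert-curve decoding: maps a linear index d in [0, n*n) to
// block coordinates (x, y) on an n x n grid, where n is a power of two.
inline std::pair<int, int> HilbertDecode(int n, int d) {
  int x = 0;
  int y = 0;
  int t = d;
  for (int s = 1; s < n; s *= 2) {
    const int rx = 1 & (t / 2);
    const int ry = 1 & (t ^ rx);
    // Rotate/flip the sub-square built so far so that it connects correctly.
    if (ry == 0) {
      if (rx == 1) {
        x = s - 1 - x;
        y = s - 1 - y;
      }
      std::swap(x, y);
    }
    // Move into the quadrant selected by the current two bits of the index.
    x += s * rx;
    y += s * ry;
    t /= 4;
  }
  return {x, y};
}
```

Consecutive indices always decode to adjacent blocks, which is the locality property that makes this traversal attractive for very large matrices.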
On Pixel4, this does not significantly affect latencies, apart from a 1%-2% latency improvement with 4 threads on very large matrix sizes.
The motivation for this is that it reduces DRAM accesses: the PMU typically observes a 10% reduction, up to 20%, in 'L3 data cache refill' events on very large matrix multiplications (1000x1000 and above). DRAM accesses should be an increasing function of that event count, perhaps even roughly proportional to it, so this indicates that this change will significantly reduce DRAM accesses and thus power usage. This was observed consistently on all 2x2=4 combinations of {1, 4} threads on {little, big} cores on Pixel4.
PiperOrigin-RevId: 291531754
Diffstat (limited to 'BUILD')
-rw-r--r-- | BUILD | 18 |
1 files changed, 17 insertions, 1 deletions
@@ -205,6 +205,7 @@ cc_test(
     srcs = ["block_map_test.cc"],
     deps = [
         ":block_map",
+        ":cpu_cache_size",
         ":path",
         "@com_google_googletest//:gtest",
     ],
@@ -281,6 +282,17 @@ cc_library(
 )
 
 cc_library(
+    name = "cpu_cache_size",
+    hdrs = ["cpu_cache_size.h"],
+    copts = ruy_copts_base(),
+    visibility = ruy_visibility(),
+    deps = [
+        ":path",
+        ":platform",
+    ],
+)
+
+cc_library(
     name = "trace",
     srcs = [
         "trace.cc",
@@ -310,7 +322,10 @@ cc_library(
     hdrs = ["spec.h"],
     copts = ruy_copts_base(),
     visibility = ruy_visibility(),
-    deps = [":matrix"],
+    deps = [
+        ":cpu_cache_size",
+        ":matrix",
+    ],
 )
 
 cc_library(
@@ -954,6 +969,7 @@ ruy_benchmark_opt_sets(
         "3ff",
         "7ff",
         "fff",
+        "1fff",
     ],
     deps = [
         ":test_lib",
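The BUILD change above makes the `spec` target depend on `:cpu_cache_size`, matching the commit message's goal of reading cache-size defaults in one place while letting callers override them. The sketch below illustrates that pattern only. `CacheSizeDefaultsSketch`, `SpecSketch`, `ExceedsSharedCache`, and the byte values are all hypothetical and are not the contents of `cpu_cache_size.h` or `spec.h`.

```cpp
#include <cstdint>

// Hypothetical stand-in for the defaults that a cache-size header might
// centralize. The byte counts are placeholders, not values taken from ruy.
struct CacheSizeDefaultsSketch {
  static constexpr std::int64_t kLocalDataCacheSize = 32 * 1024;
  static constexpr std::int64_t kSharedDataCacheSize = 1024 * 1024;
};

// Hypothetical spec: it reads the centralized defaults once, and a caller
// can overwrite either field to supply runtime-detected values or to force
// a particular code path in tests.
struct SpecSketch {
  std::int64_t local_data_cache_size =
      CacheSizeDefaultsSketch::kLocalDataCacheSize;
  std::int64_t shared_data_cache_size =
      CacheSizeDefaultsSketch::kSharedDataCacheSize;
};

// Example consumer: every decision goes through the spec, so an override is
// seen consistently wherever cache sizes matter.
inline bool ExceedsSharedCache(const SpecSketch& spec,
                               std::int64_t working_set_bytes) {
  return working_set_bytes > spec.shared_data_cache_size;
}
```

A test could, for instance, set `shared_data_cache_size` to a tiny value on its own spec instance to exercise the large-matrix traversal on small inputs.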