author: Benoit Jacob <benoitjacob@google.com> 2020-01-25 11:01:23 -0500
committer: Benoit Jacob <benoitjacob@google.com> 2020-03-10 16:43:24 -0400
commit: d4abb8650256a76cec0f51dffe5f9976640bef19
tree: 47683fb1eb1b414d0bf972ac9a5500f1e43869f5 /BUILD
parent: c3bb0b7d618a15932e37aae626d3337d03725b35
Changes to BlockMap, in particular add Hilbert-curve fractal traversal above a certain size threshold.
Renames cache_friendly_traversal_threshold to local_data_cache_size so that the name is more explicit about what it is in practice. Introduces shared_data_cache_size, which is needed to decide whether to use the Hilbert curve: the Hilbert curve is more expensive to decode and is only worth it if it reduces DRAM accesses, which depends on shared_data_cache_size.

Centralizes the defaults in a new :cpu_cache_size library, and centralizes the reading of these defaults in Spec so that users can override them consistently by passing their own spec (either to provide more accurate/runtime values or for test coverage purposes).

On Pixel4, this does not significantly affect latencies, apart from a 1%-2% latency improvement on 4 threads at very large matrix sizes.

The motivation is that it reduces DRAM accesses: the PMU typically observes a 10% reduction, up to 20%, in 'L3 data cache refill' events on very large matrix multiplications (1000x1000 and above). DRAM accesses should be an increasing function of that event count, perhaps even roughly proportional to it, so this indicates that this change will significantly reduce DRAM accesses and thus power usage. This was observed consistently on all 2x2=4 combinations of {1, 4} threads on {little, big} cores on Pixel4.

PiperOrigin-RevId: 291531754
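To illustrate the idea described in the commit message, here is a minimal sketch of a Hilbert-curve block traversal gated on a cache-size threshold. It is not ruy's BlockMap code: every name in it (BlockCoords, ShouldUseHilbertTraversal, DecodeHilbert, VisitsEveryBlockContiguously) is hypothetical, the decode routine is the textbook index-to-coordinates algorithm, and the rule "use the Hilbert curve when the working set exceeds the shared cache size" is an assumption about how shared_data_cache_size might enter the decision.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical type: position of a block in a square grid of blocks.
struct BlockCoords {
  std::uint32_t row;
  std::uint32_t col;
};

// Assumed decision rule (not taken from ruy): the costlier Hilbert decode is
// only worth paying when the working set does not fit in the shared cache,
// since that is when traversal order can affect DRAM traffic.
bool ShouldUseHilbertTraversal(std::size_t working_set_bytes,
                               std::size_t shared_data_cache_bytes) {
  return working_set_bytes > shared_data_cache_bytes;
}

// Textbook Hilbert decode: maps a linear index in [0, side * side) to block
// coordinates. `side` must be a power of two.
BlockCoords DecodeHilbert(std::uint32_t side, std::uint32_t index) {
  std::uint32_t x = 0;
  std::uint32_t y = 0;
  std::uint32_t t = index;
  for (std::uint32_t s = 1; s < side; s *= 2) {
    const std::uint32_t rx = 1u & (t / 2);
    const std::uint32_t ry = 1u & (t ^ rx);
    if (ry == 0) {
      if (rx == 1) {
        // Flip the quadrant built so far.
        x = s - 1 - x;
        y = s - 1 - y;
      }
      // Transpose the quadrant built so far.
      std::swap(x, y);
    }
    x += s * rx;
    y += s * ry;
    t /= 4;
  }
  return BlockCoords{y, x};
}

// Checks the locality property that motivates the curve: every block is
// visited exactly once and consecutive blocks are always grid neighbours.
bool VisitsEveryBlockContiguously(std::uint32_t side) {
  std::vector<bool> seen(static_cast<std::size_t>(side) * side, false);
  BlockCoords prev = DecodeHilbert(side, 0);
  for (std::uint32_t i = 0; i < side * side; ++i) {
    const BlockCoords cur = DecodeHilbert(side, i);
    if (cur.row >= side || cur.col >= side) return false;
    const std::size_t flat = static_cast<std::size_t>(cur.row) * side + cur.col;
    if (seen[flat]) return false;
    seen[flat] = true;
    if (i > 0) {
      const std::uint32_t dr =
          cur.row > prev.row ? cur.row - prev.row : prev.row - cur.row;
      const std::uint32_t dc =
          cur.col > prev.col ? cur.col - prev.col : prev.col - cur.col;
      if (dr + dc != 1) return false;
    }
    prev = cur;
  }
  return true;
}
```

Because consecutive indices always land on neighbouring blocks, each step reuses either the same block row or the same block column as the previous step, which is the kind of data reuse that could plausibly reduce cache refills on large matrices. The decode loop also shows why this ordering costs more per block than a plain linear traversal.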
Diffstat (limited to 'BUILD')
-rw-r--r-- BUILD 18
1 files changed, 17 insertions, 1 deletions
diff --git a/BUILD b/BUILD
index 42c0d43..6f68910 100644
--- a/BUILD
+++ b/BUILD
@@ -205,6 +205,7 @@ cc_test(
     srcs = ["block_map_test.cc"],
     deps = [
         ":block_map",
+        ":cpu_cache_size",
         ":path",
         "@com_google_googletest//:gtest",
     ],
@@ -281,6 +282,17 @@ cc_library(
 )
 
 cc_library(
+    name = "cpu_cache_size",
+    hdrs = ["cpu_cache_size.h"],
+    copts = ruy_copts_base(),
+    visibility = ruy_visibility(),
+    deps = [
+        ":path",
+        ":platform",
+    ],
+)
+
+cc_library(
     name = "trace",
     srcs = [
         "trace.cc",
@@ -310,7 +322,10 @@ cc_library(
     hdrs = ["spec.h"],
     copts = ruy_copts_base(),
     visibility = ruy_visibility(),
-    deps = [":matrix"],
+    deps = [
+        ":cpu_cache_size",
+        ":matrix",
+    ],
 )
 
 cc_library(
@@ -954,6 +969,7 @@ ruy_benchmark_opt_sets(
         "3ff",
         "7ff",
         "fff",
+        "1fff",
     ],
     deps = [
         ":test_lib",