author Miao Wang <miaowang@google.com> 2017-03-08 17:18:33 +0000
committer android-build-merger <android-build-merger@google.com> 2017-03-08 17:18:33 +0000
commit 6688b8b2600a93ffeb369d4eb439f7b212639f39
tree 0488797fc544fe977bec6418c73445759f052482 /bench/tensors/README
parent d2df80e95c49f43bff1133d61f0f5863d003935b
parent 7de1f32623fe9b8d80455905f4f23b944bcb5e48
Merge "Rebase Eigen to 3.3.3."
am: 7de1f32623 Change-Id: I8a2b86ed74cba8cc0d438beab1914751fa45487c
Diffstat (limited to 'bench/tensors/README')
-rw-r--r-- bench/tensors/README 21
1 files changed, 21 insertions, 0 deletions
diff --git a/bench/tensors/README b/bench/tensors/README
new file mode 100644
index 000000000..3a5fdbe17
--- /dev/null
+++ b/bench/tensors/README
@@ -0,0 +1,21 @@
+The tensor benchmark suite is made of several parts.
+
+The first part is a generic suite, in which each benchmark comes in 2 flavors: one that runs on CPU, and one that runs on GPU.
+
+To compile the floating point CPU benchmarks, simply call:
+g++ tensor_benchmarks_cpu.cc benchmark_main.cc -I ../../ -std=c++11 -O3 -DNDEBUG -pthread -mavx -o benchmarks_cpu
+
+To compile the floating point GPU benchmarks, simply call:
+nvcc tensor_benchmarks_gpu.cu benchmark_main.cc -I ../../ -std=c++11 -O2 -DNDEBUG -use_fast_math -ftz=true -arch compute_35 -o benchmarks_gpu
+
+We also provide a version of the generic GPU tensor benchmarks that uses half floats (aka fp16) instead of regular floats. To compile these benchmarks, simply call the command line below. You'll need a recent GPU that supports compute capability 5.3 or higher to run them and nvcc 7.5 or higher to compile the code.
+nvcc tensor_benchmarks_fp16_gpu.cu benchmark_main.cc -I ../../ -std=c++11 -O2 -DNDEBUG -use_fast_math -ftz=true -arch compute_53 -o benchmarks_fp16_gpu
+
+Last but not least, we also provide a suite of benchmarks to measure the scalability of the contraction code on CPU. To compile these benchmarks, call:
+g++ contraction_benchmarks_cpu.cc benchmark_main.cc -I ../../ -std=c++11 -O3 -DNDEBUG -pthread -mavx -o benchmarks_cpu
+
+To compile the benchmark for SYCL using ComputeCpp, you currently need 2 passes (only for translation units containing device code):
+1. The device compilation pass that generates the device code (SYCL kernels and referenced device functions) and glue code needed by the host compiler to reference the device code from host code.
+{ComputeCpp_ROOT}/bin/compute++ -I ../../ -I {ComputeCpp_ROOT}/include/ -std=c++11 -mllvm -inline-threshold=1000 -Wno-ignored-attributes -sycl -intelspirmetadata -emit-llvm -no-serial-memop -sycl-compress-name -DBUILD_PLATFORM_SPIR -DNDEBUG -O3 -c tensor_benchmarks_sycl.cc
+2. The host compilation pass that generates the final host binary.
+clang++-3.7 -include tensor_benchmarks_sycl.sycl benchmark_main.cc tensor_benchmarks_sycl.cc -pthread -I ../../ -I {ComputeCpp_ROOT}/include/ -L {ComputeCpp_ROOT}/lib/ -lComputeCpp -lOpenCL -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11 -o tensor_benchmark_sycl
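
As a convenience, the three generic-suite command lines from the README above could be collected in a small dry-run helper. This is only a sketch: the function name `bench_cmd` and the flavor labels `cpu`, `gpu`, and `fp16` are hypothetical and not part of Eigen. The command strings themselves are copied from the README. The helper only prints a command, so it can be checked without g++ or nvcc installed.

```shell
# Hypothetical dry-run helper (not part of Eigen): prints the README's
# compile command for one flavor of the generic benchmark suite.
bench_cmd() {
  case "$1" in
    cpu)
      echo "g++ tensor_benchmarks_cpu.cc benchmark_main.cc -I ../../ -std=c++11 -O3 -DNDEBUG -pthread -mavx -o benchmarks_cpu"
      ;;
    gpu)
      echo "nvcc tensor_benchmarks_gpu.cu benchmark_main.cc -I ../../ -std=c++11 -O2 -DNDEBUG -use_fast_math -ftz=true -arch compute_35 -o benchmarks_gpu"
      ;;
    fp16)
      # Per the README, running this flavor needs compute capability 5.3 or higher.
      echo "nvcc tensor_benchmarks_fp16_gpu.cu benchmark_main.cc -I ../../ -std=c++11 -O2 -DNDEBUG -use_fast_math -ftz=true -arch compute_53 -o benchmarks_fp16_gpu"
      ;;
    *)
      # Unknown label: report on stderr and signal failure to the caller.
      echo "unknown flavor: $1" >&2
      return 1
      ;;
  esac
}
```

Calling `bench_cmd cpu` prints the CPU command line. From the bench/tensors directory, `eval "$(bench_cmd cpu)"` would then run it, assuming g++ is on the PATH.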