Merge "Rebase Eigen to 3.3.3."

am: 7de1f32623

Change-Id: I8a2b86ed74cba8cc0d438beab1914751fa45487c
author: Miao Wang <miaowang@google.com> 2017-03-08 17:18:33 +0000
committer: android-build-merger <android-build-merger@google.com> 2017-03-08 17:18:33 +0000
commit: 6688b8b2600a93ffeb369d4eb439f7b212639f39 (patch)
tree: 0488797fc544fe977bec6418c73445759f052482 /bench/tensors/README
parent: d2df80e95c49f43bff1133d61f0f5863d003935b (diff)
parent: 7de1f32623fe9b8d80455905f4f23b944bcb5e48 (diff)
download: eigen-6688b8b2600a93ffeb369d4eb439f7b212639f39.tar.gz
1 files changed, 21 insertions, 0 deletions
diff --git a/bench/tensors/README b/bench/tensors/README
new file mode 100644
index 000000000..3a5fdbe17
--- /dev/null
+++ b/bench/tensors/README
@@ -0,0 +1,21 @@
+The tensor benchmark suite is made of several parts.
+
+The first part is a generic suite, in which each benchmark comes in 2 flavors: one that runs on CPU, and one that runs on GPU.
+
+To compile the floating point CPU benchmarks, simply call:
+g++ tensor_benchmarks_cpu.cc benchmark_main.cc -I ../../ -std=c++11 -O3 -DNDEBUG -pthread -mavx -o benchmarks_cpu
+
+To compile the floating point GPU benchmarks, simply call:
+nvcc tensor_benchmarks_gpu.cu benchmark_main.cc -I ../../ -std=c++11 -O2 -DNDEBUG -use_fast_math -ftz=true -arch compute_35 -o benchmarks_gpu
+
+We also provide a version of the generic GPU tensor benchmarks that uses half floats (aka fp16) instead of regular floats. To compile these benchmarks, simply call the command line below. You'll need a recent GPU that supports compute capability 5.3 or higher to run them and nvcc 7.5 or higher to compile the code.
+nvcc tensor_benchmarks_fp16_gpu.cu benchmark_main.cc -I ../../ -std=c++11 -O2 -DNDEBUG -use_fast_math -ftz=true -arch compute_53 -o benchmarks_fp16_gpu
+
+Last but not least, we also provide a suite of benchmarks to measure the scalability of the contraction code on CPU. To compile these benchmarks, call:
+g++ contraction_benchmarks_cpu.cc benchmark_main.cc -I ../../ -std=c++11 -O3 -DNDEBUG -pthread -mavx -o benchmarks_cpu
+
+To compile the benchmark for SYCL using ComputeCpp, you currently need 2 passes (only for translation units containing device code):
+1. The device compilation pass that generates the device code (SYCL kernels and referenced device functions) and glue code needed by the host compiler to reference the device code from host code.
+{ComputeCpp_ROOT}/bin/compute++ -I ../../ -I {ComputeCpp_ROOT}/include/ -std=c++11 -mllvm -inline-threshold=1000 -Wno-ignored-attributes -sycl -intelspirmetadata -emit-llvm -no-serial-memop -sycl-compress-name -DBUILD_PLATFORM_SPIR -DNDEBUG -O3 -c tensor_benchmarks_sycl.cc
+2. The host compilation pass that generates the final host binary.
+clang++-3.7 -include tensor_benchmarks_sycl.sycl benchmark_main.cc tensor_benchmarks_sycl.cc -pthread -I ../../ -I {ComputeCpp_ROOT}/include/ -L {ComputeCpp_ROOT}/lib/ -lComputeCpp -lOpenCL -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11 -o tensor_benchmark_sycl
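
Note on the README added above: the generic CPU command and the contraction CPU command differ only in their source file, and both write to the same output name (benchmarks_cpu), so the second build overwrites the first. A minimal sketch of one way to factor out the shared flags is given below. It only prints the commands (a dry run). The CXX and EIGEN_ROOT variables, the build_cmd helper, and the contraction_benchmarks_cpu output name are assumptions made for illustration and are not part of this commit.

```shell
#!/bin/sh
# Sketch only: prints the two CPU build commands from the README with their
# shared flags factored out. CXX, EIGEN_ROOT, build_cmd and the second output
# name are hypothetical, introduced for illustration.
CXX="${CXX:-g++}"
EIGEN_ROOT="${EIGEN_ROOT:-../../}"
COMMON_FLAGS="-I ${EIGEN_ROOT} -std=c++11 -O3 -DNDEBUG -pthread -mavx"

# build_cmd SOURCE OUTPUT: echo the compile command for one benchmark binary.
build_cmd() {
    echo "${CXX} $1 benchmark_main.cc ${COMMON_FLAGS} -o $2"
}

# Generic CPU tensor benchmarks (same output name as in the README).
build_cmd tensor_benchmarks_cpu.cc benchmarks_cpu
# Contraction benchmarks, given a distinct (hypothetical) output name so the
# two binaries can coexist in one directory.
build_cmd contraction_benchmarks_cpu.cc contraction_benchmarks_cpu
```

To actually compile rather than print, the output of the script can be piped to sh from within bench/tensors, assuming the toolchain and -mavx support described in the README are available.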