Diffstat (limited to 'README.txt')
-rw-r--r-- | README.txt | 260 |
1 files changed, 0 insertions, 260 deletions
diff --git a/README.txt b/README.txt
deleted file mode 100644
index e29f0e4..0000000
--- a/README.txt
+++ /dev/null
@@ -1,260 +0,0 @@
-gemmlowp: a small self-contained low-precision GEMM library
-===========================================================
-
-This is not a full linear algebra library, only a GEMM library: it only does
-general matrix multiplication ("GEMM").
-
-The meaning of "low precision" is detailed in this document:
-  doc/low-precision.txt
-
-Some of the general design is explained in
-  doc/design.txt
-
-
-Disclaimer
-==========
-
-This is not an official Google product (experimental or otherwise), it is just
-code that happens to be owned by Google.
-
-
-Mailing list
-============
-
-gemmlowp-related discussion, about either development or usage, is welcome
-on this Google Group (mailing list / forum):
-
-  https://groups.google.com/forum/#!forum/gemmlowp
-
-
-Portability, target platforms/architectures
-===========================================
-
-Should be portable to any platform with some C++11 and POSIX support,
-while we have optional optimized code paths for specific architectures.
-
-Required:
-  C++11 (a small conservative subset of it)
-
-Required for some features:
-  * Some POSIX interfaces:
-    * pthreads (for multi-threaded operation and for profiling).
-    * sysconf (for multi-threaded operation to detect number of cores;
-      may be bypassed).
-
-Optional:
-  Architecture-specific code paths use intrinsics or inline assembly.
-  See "Architecture-specific optimized code paths" below.
-
-Architecture-specific optimized code paths
-==========================================
-
-We have some optimized code paths for specific instruction sets.
-Some are written in inline assembly, some are written in C++ using
-intrinsics. Both GCC and Clang are supported.
-
-At the moment, we have a full set of optimized code paths (kernels,
-packing and unpacking paths) only for ARM NEON, supporting both
-ARMv7 (32bit) and ARMv8 (64bit).
-
-We also have a partial set of optimized code paths (only kernels
-at the moment) for Intel SSE. It supports both x86 and x86-64 but
-only targets SSE4. The lack of packing/unpacking code paths means
-that performance isn't optimal yet.
-
-Details of what it takes to make an efficient port of gemmlowp, namely
-writing a suitable GEMM kernel and accompanying packing code, are
-explained in this file:
-  doc/kernels.txt
-
-
-Public interfaces
-=================
-
-1. gemmlowp public interface
-----------------------------
-
-  gemmlowp's main public interface is in the public/ subdirectory. The
-  header to include is
-    public/gemmlowp.h.
-  This is a headers-only library, so there is nothing to link to.
-
-2. EightBitIntGemm standard interface
--------------------------------------
-
-  Additionally, the eight_bit_int_gemm/ subdirectory provides an
-  implementation of the standard EightBitIntGemm interface. The header
-  to include is
-    eight_bit_int_gemm/eight_bit_int_gemm.h
-  This is *NOT* a headers-only library, users need to link to
-    eight_bit_int_gemm/eight_bit_int_gemm.cc.
-  The API is similar to the standard BLAS GEMM interface, and implements
-  C = A * B. If the transpose flags for a matrix argument are false, its memory
-  order is treated as column major, and row major if its true.
-
-
-Building
-========
-
-Building by manually invoking your compiler
--------------------------------------------
-
-Because gemmlowp is so simple, working with it involves only
-single-command-line compiler invokations. Therefore we expect that
-most people working with gemmlowp will either manually invoke their
-compiler, or write their own rules for their own preferred build
-system.
-
-Keep in mind (previous section) that gemmlowp itself is a pure-headers-only
-library so there is nothing to build, and the eight_bit_int_gemm library
-consists of a single eight_bit_int_gemm.cc file to build.
-
-For a Android gemmlowp development workflow, the scripts/ directory
-contains a script to build and run a program on an Android device:
-  scripts/test-android.sh
-
-Building using Bazel
---------------------
-
-That being said, we also maintain a Bazel BUILD system as part of
-gemmlowp. Its usage is not mandatory at all and is only one
-possible way that gemmlowp libraries and tests may be built. If
-you are interested, Bazel's home page is
-  http://bazel.io/
-And you can get started with using Bazel to build gemmlowp targets
-by first creating an empty WORKSPACE file in a parent directory,
-for instance:
-
-$ cd gemmlowp/.. # change to parent directory containing gemmlowp/
-$ touch WORKSPACE # declare that to be our workspace root
-$ bazel build gemmlowp:all
-
-
-Testing
-=======
-
-Testing by manually building and running tests
-----------------------------------------------
-
-The test/ directory contains unit tests. The primary unit test is
-  test/test.cc
-Since it covers also the EightBitIntGemm interface, it needs to be
-linked against
-  eight_bit_int_gemm/eight_bit_int_gemm.cc
-It also uses realistic data captured from a neural network run in
-  test/test_data.cc
-
-Thus you'll want to pass the following list of source files to your
-compiler/linker:
-  test/test.cc
-  eight_bit_int_gemm/eight_bit_int_gemm.cc
-  test/test_data.cc
-
-The scripts/ directory contains a script to build and run a program
-on an Android device:
-  scripts/test-android.sh
-
-It expects the CXX environment variable to point to an Android toolchain's
-C++ compiler, and expects source files (and optionally, cflags) as
-command-line parameters. To build and run the above-mentioned main unit test,
-first set CXX e.g.:
-
-$ export CXX=/some/toolchains/arm-linux-androideabi-4.8/bin/arm-linux-androideabi-g++
-
-Then run:
-
-$ ./scripts/test-android.sh \
-test/test.cc \
-eight_bit_int_gemm/eight_bit_int_gemm.cc \
-test/test_data.cc
-
-
-Testing using Bazel
--------------------
-
-Alternatively, you can use Bazel to build and run tests. See the Bazel
-instruction in the above section on building. Once your Bazel workspace
-is set up, you can for instance do:
-
-$ bazel test gemmlowp:all
-
-
-Troubleshooting Compilation
-===========================
-
-If you're having trouble finding the compiler, follow these instructions to
-build a standalone toolchain:
-https://developer.android.com/ndk/guides/standalone_toolchain.html
-
-Here's an example of setting up Clang 3.5:
-
-$ export INSTALL_DIR=~/toolchains/clang-21-stl-gnu
-$ $NDK/build/tools/make-standalone-toolchain.sh \
---toolchain=arm-linux-androideabi-clang3.5 --platform=android-21 \
---install-dir=$INSTALL_DIR
-$ export CXX="$INSTALL_DIR/bin/arm-linux-androideabi-g++ \
---sysroot=$INSTALL_DIR/sysroot"
-
-Some compilers (e.g. the default clang++ in the same bin directory) don't
-support NEON assembly. The benchmark build process will issue a warning if
-support isn't detected, and you should make sure you're using a compiler like
-arm-linux-androideabi-g++ that does include NEON.
-
-
-Benchmarking
-============
-
-The main benchmark is
-  benchmark.cc
-It doesn't need to be linked to any
-other source file. We recommend building with assertions disabled (-DNDEBUG).
-
-For example, the benchmark can be built and run on an Android device by doing:
-
-$ ./scripts/test-android.sh test/benchmark.cc -DNDEBUG
-
-If GEMMLOWP_TEST_PROFILE is defined then the benchmark will be built with
-profiling instrumentation (which makes it slower) and will dump profiles.
-See next section on profiling.
-
-
-Profiling
-=========
-
-The profiling/ subdirectory offers a very simple non-interrupting sampling
-profiler that only requires pthreads (no signals).
-
-It relies on source code being instrumented with pseudo-stack labels.
-See profiling/instrumentation.h.
-A full example of using this profiler is given in profiling/profiler.h.
-
-
-Contributing
-============
-
-Contribution-related discussion is always welcome on the gemmlowp
-mailing list (see above).
-
-We try to keep a current list of TODO items in the todo/ directory.
-Prospective contributors are welcome to pick one to work on, and
-communicate about it on the gemmlowp mailing list.
-
-Details of the contributing process, including legalese, are in CONTRIBUTING.
-
-Performance goals
-=================
-
-Our performance goals differ from typical GEMM performance goals in the
-following ways:
-
-1. We care not only about speed, but also about minimizing power usage.
-   We specifically care about charge usage in mobile/embedded devices.
-   This implies that we care doubly about minimizing memory bandwidth usage:
-   we care about it, like any GEMM, because of the impact on speed, and we
-   also care about it because it is a key factor of power usage.
-
-2. Most GEMMs are optimized primarily for large dense matrix sizes (>= 1000).
-   We do care about large sizes, but we also care specifically about the
-   typically smaller matrix sizes encountered in various mobile applications.
-   This means that we have to optimize for all sizes, not just for large enough
-   sizes.