aboutsummaryrefslogtreecommitdiff
path: root/README.txt
diff options
context:
space:
mode:
Diffstat (limited to 'README.txt')
-rw-r--r--README.txt260
1 files changed, 0 insertions, 260 deletions
diff --git a/README.txt b/README.txt
deleted file mode 100644
index e29f0e4..0000000
--- a/README.txt
+++ /dev/null
@@ -1,260 +0,0 @@
-gemmlowp: a small self-contained low-precision GEMM library
-===========================================================
-
-This is not a full linear algebra library, only a GEMM library: it only does
-general matrix multiplication ("GEMM").
-
-The meaning of "low precision" is detailed in this document:
- doc/low-precision.txt
-
-Some of the general design is explained in
- doc/design.txt
-
-
-Disclaimer
-==========
-
-This is not an official Google product (experimental or otherwise), it is just
-code that happens to be owned by Google.
-
-
-Mailing list
-============
-
-gemmlowp-related discussion, about either development or usage, is welcome
-on this Google Group (mailing list / forum):
-
- https://groups.google.com/forum/#!forum/gemmlowp
-
-
-Portability, target platforms/architectures
-===========================================
-
-Should be portable to any platform with some C++11 and POSIX support,
-while we have optional optimized code paths for specific architectures.
-
-Required:
- C++11 (a small conservative subset of it)
-
-Required for some features:
- * Some POSIX interfaces:
- * pthreads (for multi-threaded operation and for profiling).
- * sysconf (for multi-threaded operation to detect number of cores;
- may be bypassed).
-
-Optional:
- Architecture-specific code paths use intrinsics or inline assembly.
- See "Architecture-specific optimized code paths" below.
-
-Architecture-specific optimized code paths
-==========================================
-
-We have some optimized code paths for specific instruction sets.
-Some are written in inline assembly, some are written in C++ using
-intrinsics. Both GCC and Clang are supported.
-
-At the moment, we have a full set of optimized code paths (kernels,
-packing and unpacking paths) only for ARM NEON, supporting both
-ARMv7 (32bit) and ARMv8 (64bit).
-
-We also have a partial set of optimized code paths (only kernels
-at the moment) for Intel SSE. It supports both x86 and x86-64 but
-only targets SSE4. The lack of packing/unpacking code paths means
-that performance isn't optimal yet.
-
-Details of what it takes to make an efficient port of gemmlowp, namely
-writing a suitable GEMM kernel and accompanying packing code, are
-explained in this file:
- doc/kernels.txt
-
-
-Public interfaces
-=================
-
-1. gemmlowp public interface
-----------------------------
-
- gemmlowp's main public interface is in the public/ subdirectory. The
- header to include is
- public/gemmlowp.h.
- This is a headers-only library, so there is nothing to link to.
-
-2. EightBitIntGemm standard interface
--------------------------------------
-
- Additionally, the eight_bit_int_gemm/ subdirectory provides an
- implementation of the standard EightBitIntGemm interface. The header
- to include is
- eight_bit_int_gemm/eight_bit_int_gemm.h
- This is *NOT* a headers-only library, users need to link to
- eight_bit_int_gemm/eight_bit_int_gemm.cc.
- The API is similar to the standard BLAS GEMM interface, and implements
- C = A * B. If the transpose flags for a matrix argument are false, its memory
- order is treated as column major, and row major if its true.
-
-
-Building
-========
-
-Building by manually invoking your compiler
--------------------------------------------
-
-Because gemmlowp is so simple, working with it involves only
-single-command-line compiler invokations. Therefore we expect that
-most people working with gemmlowp will either manually invoke their
-compiler, or write their own rules for their own preferred build
-system.
-
-Keep in mind (previous section) that gemmlowp itself is a pure-headers-only
-library so there is nothing to build, and the eight_bit_int_gemm library
-consists of a single eight_bit_int_gemm.cc file to build.
-
-For a Android gemmlowp development workflow, the scripts/ directory
-contains a script to build and run a program on an Android device:
- scripts/test-android.sh
-
-Building using Bazel
---------------------
-
-That being said, we also maintain a Bazel BUILD system as part of
-gemmlowp. Its usage is not mandatory at all and is only one
-possible way that gemmlowp libraries and tests may be built. If
-you are interested, Bazel's home page is
- http://bazel.io/
-And you can get started with using Bazel to build gemmlowp targets
-by first creating an empty WORKSPACE file in a parent directory,
-for instance:
-
-$ cd gemmlowp/.. # change to parent directory containing gemmlowp/
-$ touch WORKSPACE # declare that to be our workspace root
-$ bazel build gemmlowp:all
-
-
-Testing
-=======
-
-Testing by manually building and running tests
-----------------------------------------------
-
-The test/ directory contains unit tests. The primary unit test is
- test/test.cc
-Since it covers also the EightBitIntGemm interface, it needs to be
-linked against
- eight_bit_int_gemm/eight_bit_int_gemm.cc
-It also uses realistic data captured from a neural network run in
- test/test_data.cc
-
-Thus you'll want to pass the following list of source files to your
-compiler/linker:
- test/test.cc
- eight_bit_int_gemm/eight_bit_int_gemm.cc
- test/test_data.cc
-
-The scripts/ directory contains a script to build and run a program
-on an Android device:
- scripts/test-android.sh
-
-It expects the CXX environment variable to point to an Android toolchain's
-C++ compiler, and expects source files (and optionally, cflags) as
-command-line parameters. To build and run the above-mentioned main unit test,
-first set CXX e.g.:
-
-$ export CXX=/some/toolchains/arm-linux-androideabi-4.8/bin/arm-linux-androideabi-g++
-
-Then run:
-
-$ ./scripts/test-android.sh \
-test/test.cc \
-eight_bit_int_gemm/eight_bit_int_gemm.cc \
-test/test_data.cc
-
-
-Testing using Bazel
--------------------
-
-Alternatively, you can use Bazel to build and run tests. See the Bazel
-instruction in the above section on building. Once your Bazel workspace
-is set up, you can for instance do:
-
-$ bazel test gemmlowp:all
-
-
-Troubleshooting Compilation
-===========================
-
-If you're having trouble finding the compiler, follow these instructions to
-build a standalone toolchain:
-https://developer.android.com/ndk/guides/standalone_toolchain.html
-
-Here's an example of setting up Clang 3.5:
-
-$ export INSTALL_DIR=~/toolchains/clang-21-stl-gnu
-$ $NDK/build/tools/make-standalone-toolchain.sh \
---toolchain=arm-linux-androideabi-clang3.5 --platform=android-21 \
---install-dir=$INSTALL_DIR
-$ export CXX="$INSTALL_DIR/bin/arm-linux-androideabi-g++ \
---sysroot=$INSTALL_DIR/sysroot"
-
-Some compilers (e.g. the default clang++ in the same bin directory) don't
-support NEON assembly. The benchmark build process will issue a warning if
-support isn't detected, and you should make sure you're using a compiler like
-arm-linux-androideabi-g++ that does include NEON.
-
-
-Benchmarking
-============
-
-The main benchmark is
- benchmark.cc
-It doesn't need to be linked to any
-other source file. We recommend building with assertions disabled (-DNDEBUG).
-
-For example, the benchmark can be built and run on an Android device by doing:
-
-$ ./scripts/test-android.sh test/benchmark.cc -DNDEBUG
-
-If GEMMLOWP_TEST_PROFILE is defined then the benchmark will be built with
-profiling instrumentation (which makes it slower) and will dump profiles.
-See next section on profiling.
-
-
-Profiling
-=========
-
-The profiling/ subdirectory offers a very simple non-interrupting sampling
-profiler that only requires pthreads (no signals).
-
-It relies on source code being instrumented with pseudo-stack labels.
-See profiling/instrumentation.h.
-A full example of using this profiler is given in profiling/profiler.h.
-
-
-Contributing
-============
-
-Contribution-related discussion is always welcome on the gemmlowp
-mailing list (see above).
-
-We try to keep a current list of TODO items in the todo/ directory.
-Prospective contributors are welcome to pick one to work on, and
-communicate about it on the gemmlowp mailing list.
-
-Details of the contributing process, including legalese, are in CONTRIBUTING.
-
-Performance goals
-=================
-
-Our performance goals differ from typical GEMM performance goals in the
-following ways:
-
-1. We care not only about speed, but also about minimizing power usage.
- We specifically care about charge usage in mobile/embedded devices.
- This implies that we care doubly about minimizing memory bandwidth usage:
- we care about it, like any GEMM, because of the impact on speed, and we
- also care about it because it is a key factor of power usage.
-
-2. Most GEMMs are optimized primarily for large dense matrix sizes (>= 1000).
- We do care about large sizes, but we also care specifically about the
- typically smaller matrix sizes encountered in various mobile applications.
- This means that we have to optimize for all sizes, not just for large enough
- sizes.