# Output pipelines in gemmlowp

In gemmlowp, the "output pipeline" is the process that takes a final `int32` accumulator value (the output of the compute/kernel stage), processes it to obtain the final value (typically a `uint8` value), and writes it to the destination matrix.

gemmlowp allows some genericity in which arithmetic transformations take place in the output pipeline, so that different users can implement different quantization paradigms. See [low-precision.md](low-precision.md) and [quantization.md](quantization.md).

Besides implementing a quantization paradigm, output pipelines are also useful for implementing fused operations, where a matrix multiplication feeds into other operations applied to its result, without additional array traversals. For instance, when implementing neural network inference, one might have a Convolutional layer with a bias-addition and an activation. One then wants to feed the result of the matrix multiplication implementing the Convolutional operator itself directly into the bias-addition and activation function. gemmlowp's output pipelines allow implementing that: the bias-addition and activation function are just additional stages in the output pipeline.

## Usage

The gemmlowp entry point that allows using an arbitrary output pipeline is `GemmWithOutputPipeline` in [public/gemmlowp.h](../public/gemmlowp.h).

The output pipeline is specified as a `std::tuple` of "output stages", each of which defines an elementary arithmetic transformation.

All available output stages are defined in [public/output_stages.h](../public/output_stages.h).

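
To make the idea of a `std::tuple` of stages concrete, here is a minimal, standalone sketch that does not use gemmlowp at all. Every name in it (`AddBiasStage`, `ReluStage`, `ScaleDownStage`, `SaturateToUint8Stage`, `PipelineRunner`, `RunOutputPipeline`, `ExampleQuantize`) is hypothetical and chosen for illustration; these are not the types defined in `public/output_stages.h`, and the real library applies its stages to blocks of the result matrix rather than to single scalars. The sketch only shows the general pattern: each stage is a small arithmetic transformation, stages are applied in tuple order, and a stage may change the value type (here, from `int32` to `uint8`).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <tuple>

// Hypothetical stage: adds a constant bias to the int32 accumulator.
struct AddBiasStage {
  std::int32_t bias;
  std::int32_t operator()(std::int32_t acc) const { return acc + bias; }
};

// Hypothetical stage: ReLU-style activation, replaces negative values by 0.
struct ReluStage {
  std::int32_t operator()(std::int32_t acc) const {
    return std::max<std::int32_t>(acc, 0);
  }
};

// Hypothetical stage: multiplies by `multiplier`, then divides by 2^shift
// with rounding to nearest. Right-shifting a negative value is arithmetic on
// mainstream compilers (and guaranteed to be from C++20 on).
struct ScaleDownStage {
  std::int32_t multiplier;
  int shift;
  std::int32_t operator()(std::int32_t acc) const {
    assert(shift > 0 && shift < 31);
    const std::int64_t product = static_cast<std::int64_t>(acc) * multiplier;
    const std::int64_t rounding = std::int64_t{1} << (shift - 1);
    return static_cast<std::int32_t>((product + rounding) >> shift);
  }
};

// Hypothetical stage: saturating cast to uint8. Note the change of type.
struct SaturateToUint8Stage {
  std::uint8_t operator()(std::int32_t acc) const {
    const std::int32_t clamped =
        std::min<std::int32_t>(255, std::max<std::int32_t>(0, acc));
    return static_cast<std::uint8_t>(clamped);
  }
};

// Applies stages I, I+1, ..., N-1 of the tuple, in that order.
template <std::size_t I, std::size_t N>
struct PipelineRunner {
  template <typename Pipeline, typename T>
  static auto Run(const Pipeline& pipeline, T value)
      -> decltype(PipelineRunner<I + 1, N>::Run(pipeline,
                                                std::get<I>(pipeline)(value))) {
    return PipelineRunner<I + 1, N>::Run(pipeline,
                                         std::get<I>(pipeline)(value));
  }
};

// End of recursion: no stages left, return the value unchanged.
template <std::size_t N>
struct PipelineRunner<N, N> {
  template <typename Pipeline, typename T>
  static T Run(const Pipeline&, T value) {
    return value;
  }
};

// Runs a whole pipeline on one int32 accumulator value.
template <typename... Stages>
auto RunOutputPipeline(const std::tuple<Stages...>& pipeline, std::int32_t acc)
    -> decltype(PipelineRunner<0, sizeof...(Stages)>::Run(pipeline, acc)) {
  return PipelineRunner<0, sizeof...(Stages)>::Run(pipeline, acc);
}

// Example: bias-addition, activation, scale-down, saturating cast.
// The constants are arbitrary and carry no meaning beyond illustration.
inline std::uint8_t ExampleQuantize(std::int32_t acc) {
  const auto pipeline = std::make_tuple(AddBiasStage{100}, ReluStage{},
                                        ScaleDownStage{3, 4},
                                        SaturateToUint8Stage{});
  return RunOutputPipeline(pipeline, acc);
}
```

With these arbitrary constants, `ExampleQuantize(500)` computes `((500 + 100) * 3 + 8) >> 4 = 113`, a large accumulator such as `5000` saturates to `255`, and a sufficiently negative accumulator such as `-500` is clamped to `0` by the activation stage. Fusing the bias-addition and activation this way is what avoids the extra array traversals mentioned above.
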
## Example usage

The best place to see examples of using various output pipelines is the unit test,

```
test/test.cc
```

specifically in this function:

```
TestOutputStages
```

Separately, a self-contained example showing how to use gemmlowp to compute a quantized matrix multiplication with a sound quantization paradigm is here: [doc/quantization_example.cc](quantization_example.cc)