author     Yang Ni <yangni@google.com>  2017-09-29 15:44:14 -0700
committer  Yang Ni <yangni@google.com>  2017-10-24 15:33:32 -0700
commit     c308618ad6e6ae365e063bf431db7fa1036eee07 (patch)
tree       f2f86d00d1bd6a43b987046f0c36f638f452419e /nn/runtime/include/NeuralNetworks.h
parent     f315e6afa7cff2b271a4c4c8f0643d43508f92db (diff)
download   ml-c308618ad6e6ae365e063bf431db7fa1036eee07.tar.gz
Update LSTM documentation
Bug: 67184259
Bug: 67844091

Added the exact equations that are used in the current implementation.
Minor clean-ups to activation function, scratch buffer size, and wording
about building a model.

Local rendering here:
http://yangni-z840.mtv.corp.google.com/nnapi/group__NeuralNetworks.html#ggaabbe492c60331b13038e39d4207940e0ad0377e8c305e596fb7f64ff896671fc5

Test: None. Doc change only. Ran doxygen locally and did a visual inspection.

Change-Id: I75548afb126a1c1b30121125fdbc3af9b356b300
Diffstat (limited to 'nn/runtime/include/NeuralNetworks.h')
-rw-r--r--  nn/runtime/include/NeuralNetworks.h  172
1 files changed, 116 insertions, 56 deletions
diff --git a/nn/runtime/include/NeuralNetworks.h b/nn/runtime/include/NeuralNetworks.h
index beaf6befc..0be52f2ec 100644
--- a/nn/runtime/include/NeuralNetworks.h
+++ b/nn/runtime/include/NeuralNetworks.h
@@ -686,104 +686,165 @@ typedef enum {
ANEURALNETWORKS_LSH_PROJECTION = 15,
/**
- * Long short-term memory unit (LSTM) recurrent network layer.
- *
- * The default non-peephole implementation is based on:
- * http://deeplearning.cs.cmu.edu/pdfs/Hochreiter97_lstm.pdf
+ * Performs a single time step in a Long Short-Term Memory (LSTM) layer.
+ *
+ * The LSTM operation is described by the following equations.
+ *
+ * \f{eqnarray*}{
+ * i_t =& \sigma(W_{xi}x_t+W_{hi}h_{t-1}+W_{ci}C_{t-1}+b_i) & \\
+ * f_t =& \sigma(W_{xf}x_t+W_{hf}h_{t-1}+W_{cf}C_{t-1}+b_f) & \\
+ * C_t =& clip(f_t \odot C_{t-1} + i_t \odot g(W_{xc}x_t+W_{hc}h_{t-1}+b_c),\ t_{cell})& \\
+ * o_t =& \sigma(W_{xo}x_t+W_{ho}h_{t-1}+W_{co}C_t+b_o)& \\
+ * & clip(W_{proj}(o_t \odot g(C_t))+b_{proj},\ t_{proj}) & if\ there\ is\ a\ projection; \\
+ * h_t =& & \\
+ * & o_t \odot g(C_t) & otherwise. \\
+ * \f}
+ * Where:
+ * * \f$x_t\f$ is the input,
+ * * \f$i_t\f$ is the input gate,
+ * * \f$f_t\f$ is the forget gate,
+ * * \f$C_t\f$ is the cell state,
+ * * \f$o_t\f$ is the output,
+ * * \f$h_t\f$ is the output state,
+ * * \f$\sigma\f$ is the logistic sigmoid function,
+ * * \f$g\f$ is the cell input and cell output activation function, usually \f$tanh\f$,
+ * * \f$W_{xi}\f$ is the input-to-input weight matrix,
+ * * \f$W_{hi}\f$ is the recurrent-to-input weight matrix,
+ * * \f$W_{ci}\f$ is the cell-to-input weight matrix,
+ * * \f$b_i\f$ is the input gate bias,
+ * * \f$W_{xf}\f$ is the input-to-forget weight matrix,
+ * * \f$W_{hf}\f$ is the recurrent-to-forget weight matrix,
+ * * \f$W_{cf}\f$ is the cell-to-forget weight matrix,
+ * * \f$b_f\f$ is the forget gate bias,
+ * * \f$W_{xc}\f$ is the input-to-cell weight matrix,
+ * * \f$W_{hc}\f$ is the recurrent-to-cell weight matrix,
+ * * \f$b_c\f$ is the cell bias,
+ * * \f$W_{xo}\f$ is the input-to-output weight matrix,
+ * * \f$W_{ho}\f$ is the recurrent-to-output weight matrix,
+ * * \f$W_{co}\f$ is the cell-to-output weight matrix,
+ * * \f$b_o\f$ is the output gate bias,
+ * * \f$W_{proj}\f$ is the projection weight matrix,
+ * * \f$b_{proj}\f$ is the projection bias,
+ * * \f$t_{cell}\f$ is the threshold for clipping the cell state,
+ * * \f$t_{proj}\f$ is the threshold for clipping the projected output, and
+ * * \f$\odot\f$ is the <a href="https://en.wikipedia.org/wiki/Hadamard_product_(matrices)">
+ * Hadamard product</a> that takes two matrices and produces another
+ * matrix, each element of which is the product of the corresponding
+ * elements of the input matrices.
+ *
+ * The operation has the following independently optional inputs:
+ * * The input-to-input weights (\f$W_{xi}\f$), recurrent-to-input weights (\f$W_{hi}\f$),
+ * cell-to-input weights (\f$W_{ci}\f$), and input gate bias (\f$b_i\f$) either all have values,
+ * or none of them have values (i.e., all set to null). If they have no
+ * values, coupling of input and forget gates (CIFG) is used, in which case
+ * the input gate (\f$i_t\f$) is calculated using the following equation instead.
+ * \f{eqnarray*}{
+ * i_t = 1 - f_t
+ * \f}
+ * * The cell-to-input weights (\f$W_{ci}\f$), cell-to-forget weights (\f$W_{cf}\f$), and cell-to-output
+ * weights (\f$W_{co}\f$) either all have values or none of them have values.
+ * If they have values, the peephole optimization is used.
+ * * The projection weights (\f$W_{proj}\f$) are required only when the recurrent projection
+ * layer is used, and should otherwise have no value.
+ * * The projection bias (\f$b_{proj}\f$) may (but need not) have a value if the
+ * recurrent projection layer exists, and should otherwise have no value.
+ *
+ * References:
+ *
+ * The default non-peephole non-CIFG implementation is based on:
+ * http://www.bioinf.jku.at/publications/older/2604.pdf
* S. Hochreiter and J. Schmidhuber. "Long Short-Term Memory". Neural
* Computation, 9(8):1735-1780, 1997.
*
- * The peephole implementation is based on:
+ * The peephole implementation and projection layer are based on:
* https://research.google.com/pubs/archive/43905.pdf
* Hasim Sak, Andrew Senior, and Francoise Beaufays. "Long short-term memory
* recurrent neural network architectures for large scale acoustic modeling."
* INTERSPEECH, 2014.
+ * (However, the concept of peephole optimization was introduced in work
+ * prior to this paper.)
*
* The coupling of input and forget gate (CIFG) is based on:
* http://arxiv.org/pdf/1503.04069.pdf
* Greff et al. "LSTM: A Search Space Odyssey"
*
- * The class has the following independently optional inputs:
- * * If input gate (if CIFG): “input_to_forget_weights”,
- * “recurrent_to_input_weights”, “cell_to_input_weights”, “input_gate_bias”.
- * * If no peephole connections: “cell_to_input_weights”,
- * “cell_to_forget_weights”, “cell_to_output_weights”.
- * * If no projection layer: “projection_weights” and “projection_bias”.
- * * If no projection bias: “projection_bias”.
- *
* Supported tensor types (type T):
* * {@link ANEURALNETWORKS_TENSOR_FLOAT32}
*
* Inputs:
- * * 0: Input.
+ * * 0: The input (\f$x_t\f$).
* A 2-D tensor of type T, of shape [batch_size, input_size], where
* “batch_size” corresponds to the batching dimension, and “input_size”
* is the size of the input.
- * * 1: input_to_input_weights.
+ * * 1: The input-to-input weights (\f$W_{xi}\f$). Optional.
* A 2-D tensor of type T, of shape [num_units, input_size], where
* “num_units” corresponds to the number of cell units.
- * * 2: input_to_forget_weights.
+ * * 2: The input-to-forget weights (\f$W_{xf}\f$).
* A 2-D tensor of type T, of shape [num_units, input_size].
- * * 3: input_to_cell_weights.
+ * * 3: The input-to-cell weights (\f$W_{xc}\f$).
* A 2-D tensor of type T, of shape [num_units, input_size].
- * * 4: input_to_output_weights.
+ * * 4: The input-to-output weights (\f$W_{xo}\f$).
* A 2-D tensor of type T, of shape [num_units, input_size].
- * * 5: recurrent_to_input_weights.
+ * * 5: The recurrent-to-input weights (\f$W_{hi}\f$). Optional.
* A 2-D tensor of type T, of shape [num_units, output_size], where
* “output_size” corresponds to either the number of cell units (i.e.,
* “num_units”), or the second dimension of the “projection_weights”, if
* defined.
- * * 6: recurrent_to_forget_weights.
+ * * 6: The recurrent-to-forget weights (\f$W_{hf}\f$).
* A 2-D tensor of type T, of shape [num_units, output_size].
- * * 7: recurrent_to_cell_weights.
+ * * 7: The recurrent-to-cell weights (\f$W_{hc}\f$).
* A 2-D tensor of type T, of shape [num_units, output_size].
- * * 8: recurrent_to_output_weights.
+ * * 8: The recurrent-to-output weights (\f$W_{ho}\f$).
* A 2-D tensor of type T, of shape [num_units, output_size].
- * * 9: cell_to_input_weights.
+ * * 9: The cell-to-input weights (\f$W_{ci}\f$). Optional.
* A 1-D tensor of type T, of shape [num_units].
- * * 10:cell_to_forget_weights.
+ * * 10:The cell-to-forget weights (\f$W_{cf}\f$). Optional.
* A 1-D tensor of type T, of shape [num_units].
- * * 11:cell_to_output_weights.
+ * * 11:The cell-to-output weights (\f$W_{co}\f$). Optional.
* A 1-D tensor of type T, of shape [num_units].
- * * 12:input_gate_bias.
+ * * 12:The input gate bias (\f$b_i\f$). Optional.
* A 1-D tensor of type T, of shape [num_units].
- * * 13:forget_gate_bias.
+ * * 13:The forget gate bias (\f$b_f\f$).
* A 1-D tensor of type T, of shape [num_units].
- * * 14:cell_bias.
+ * * 14:The cell bias (\f$b_c\f$).
* A 1-D tensor of type T, of shape [num_units].
- * * 15:output_gate_bias.
+ * * 15:The output gate bias (\f$b_o\f$).
* A 1-D tensor of type T, of shape [num_units].
- * * 16:projection_weights.
+ * * 16:The projection weights (\f$W_{proj}\f$). Optional.
* A 2-D tensor of type T, of shape [output_size, num_units].
- * * 17:projection_bias.
+ * * 17:The projection bias (\f$b_{proj}\f$). Optional.
* A 1-D tensor of type T, of shape [output_size].
- * * 18: output_state (in).
+ * * 18:The output state (in) (\f$h_{t-1}\f$).
* A 2-D tensor of type T, of shape [batch_size, output_size].
- * * 19: cell_state (in).
+ * * 19:The cell state (in) (\f$C_{t-1}\f$).
* A 2-D tensor of type T, of shape [batch_size, num_units].
- * * 20:fused_activation_function.
- * An optional {@link FuseCode} value indicating the activation
- * function.
- * If “NONE” is specified then it results in a linear activation.
- * * 21:cell_clip.
- * A clipping threshold for the cell state, such that values are bound
+ * * 20:The activation function (\f$g\f$).
+ * A value indicating the activation function:
+ * <ul>
+ * <li>0: None;
+ * <li>1: Relu;
+ * <li>3: Relu6;
+ * <li>4: Tanh;
+ * <li>6: Sigmoid.
+ * </ul>
+ * * 21:The clipping threshold (\f$t_{cell}\f$) for the cell state, such that values are bound
* within [-cell_clip, cell_clip]. If set to 0.0 then clipping is
* disabled.
- * * 22:proj_clip.
- * A clipping threshold for the output from the projection layer, such
+ * * 22:The clipping threshold (\f$t_{proj}\f$) for the output from the projection layer, such
* that values are bound within [-proj_clip, proj_clip]. If set to 0.0
* then clipping is disabled.
*
* Outputs:
- * * 0: scratch_buffer.
- * A 3-D tensor of type T, of shape [batch_size, num_cell, 4].
- * * 1: output_state (out).
+ * * 0: The scratch buffer.
+ * A 2-D tensor of type T, of shape [batch_size, num_units * 3] with
+ * CIFG, or [batch_size, num_units * 4] without CIFG.
+ * * 1: The output state (out) (\f$h_t\f$).
* A 2-D tensor of type T, of shape [batch_size, output_size].
- * * 2: cell_state (out).
+ * * 2: The cell state (out) (\f$C_t\f$).
* A 2-D tensor of type T, of shape [batch_size, num_units].
- * * 3: output.
+ * * 3: The output (\f$o_t\f$).
* A 2-D tensor of type T, of shape [batch_size, output_size]. This is
- * effectively the same as the current “output_state” value.
+ * effectively the same as the current “output state (out)” value.
*/
ANEURALNETWORKS_LSTM = 16,
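
[Editor's note] To make the equations above concrete, here is a minimal C sketch of one
non-CIFG LSTM time step for a single batch element, with the peephole and projection paths
treated as optional (NULL) pointers. It assumes row-major [rows, cols] weight storage and
tanh as the activation g; the names lstm_step, matvec_acc, clipf and sigmoidf are
illustrative helpers, not part of the NNAPI or its implementation.

    #include <math.h>
    #include <stddef.h>

    /* Illustrative helpers; not part of the NNAPI. */
    static float sigmoidf(float x) { return 1.0f / (1.0f + expf(-x)); }

    static float clipf(float v, float t) {          /* a threshold of 0 disables clipping */
        if (t <= 0.0f) return v;
        return v > t ? t : (v < -t ? -t : v);
    }

    /* y += W * x, with W stored row-major as [rows, cols]. */
    static void matvec_acc(const float* W, const float* x, size_t rows, size_t cols, float* y) {
        for (size_t r = 0; r < rows; ++r)
            for (size_t c = 0; c < cols; ++c)
                y[r] += W[r * cols + c] * x[c];
    }

    /* One non-CIFG time step: x has input_size elements, C has n (num_units) elements,
     * h has out (output_size) elements; scratch holds 4 * n floats. */
    void lstm_step(const float* x, size_t input_size, size_t n, size_t out,
                   const float* Wxi, const float* Whi, const float* Wci, const float* bi,
                   const float* Wxf, const float* Whf, const float* Wcf, const float* bf,
                   const float* Wxc, const float* Whc, const float* bc,
                   const float* Wxo, const float* Who, const float* Wco, const float* bo,
                   const float* Wproj, const float* bproj, /* NULL if no projection */
                   float t_cell, float t_proj,
                   float* C, float* h, float* scratch) {
        float *i = scratch, *f = scratch + n, *g = scratch + 2 * n, *o = scratch + 3 * n;
        for (size_t u = 0; u < n; ++u) { i[u] = bi[u]; f[u] = bf[u]; g[u] = bc[u]; o[u] = bo[u]; }
        matvec_acc(Wxi, x, n, input_size, i);  matvec_acc(Whi, h, n, out, i);
        matvec_acc(Wxf, x, n, input_size, f);  matvec_acc(Whf, h, n, out, f);
        matvec_acc(Wxc, x, n, input_size, g);  matvec_acc(Whc, h, n, out, g);
        matvec_acc(Wxo, x, n, input_size, o);  matvec_acc(Who, h, n, out, o);
        for (size_t u = 0; u < n; ++u) {
            if (Wci) i[u] += Wci[u] * C[u];            /* peephole terms use C_{t-1} ... */
            if (Wcf) f[u] += Wcf[u] * C[u];
            i[u] = sigmoidf(i[u]);                     /* with CIFG this would be i = 1 - f */
            f[u] = sigmoidf(f[u]);
            C[u] = clipf(f[u] * C[u] + i[u] * tanhf(g[u]), t_cell);
            if (Wco) o[u] += Wco[u] * C[u];            /* ... except the output gate, which uses C_t */
            o[u] = sigmoidf(o[u]);
            g[u] = o[u] * tanhf(C[u]);                 /* o_t (.) g(C_t), reusing g as scratch */
        }
        if (Wproj) {                                   /* h_t = clip(Wproj (o_t (.) g(C_t)) + bproj, t_proj) */
            for (size_t r = 0; r < out; ++r) h[r] = bproj ? bproj[r] : 0.0f;
            matvec_acc(Wproj, g, out, n, h);
            for (size_t r = 0; r < out; ++r) h[r] = clipf(h[r], t_proj);
        } else {
            for (size_t u = 0; u < n; ++u) h[u] = g[u];  /* no projection: output_size == num_units */
        }
    }
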
@@ -1093,9 +1154,8 @@ typedef enum {
*
* Specifically, for rank 1, this layer implements the operation:
*
- * memory = push(conv1d(inputs, weights_feature, feature_dim,
- * "ANEURALNETWORKS_PADDING_VALID"));
- * outputs = activation(memory * weights_time + bias);
+ * memory = push(conv1d(inputs, weights_feature, feature_dim, "ANEURALNETWORKS_PADDING_VALID"));
+ * outputs = activation(memory * weights_time + bias);
*
* Where:
* * “weights_feature” is a weights matrix that processes the inputs (by
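
[Editor's note] A rough C sketch of the rank-1 SVDF step described by the pseudocode above,
for a single batch element, assuming that the 1-D convolution with VALID padding over the
current frame reduces to a per-unit dot product and that each unit keeps a FIFO of its past
feature values. The name svdf_step and the memory layout are illustrative assumptions, not
the NNAPI implementation; the fused activation is left to the caller.

    #include <stddef.h>

    void svdf_step(const float* input, size_t input_size, size_t num_units, size_t memory_size,
                   const float* weights_feature, /* [num_units, input_size]  */
                   const float* weights_time,    /* [num_units, memory_size] */
                   const float* bias,            /* [num_units], may be NULL */
                   float* memory,                /* [num_units, memory_size], oldest entry first */
                   float* output)                /* [num_units] */
    {
        for (size_t u = 0; u < num_units; ++u) {
            /* "conv1d" over the current frame with VALID padding: a dot product. */
            float feature = 0.0f;
            for (size_t i = 0; i < input_size; ++i)
                feature += weights_feature[u * input_size + i] * input[i];

            /* push(): shift this unit's memory and append the new feature value. */
            float* m = memory + u * memory_size;
            for (size_t t = 0; t + 1 < memory_size; ++t) m[t] = m[t + 1];
            m[memory_size - 1] = feature;

            /* outputs = activation(memory * weights_time + bias); activation applied by caller. */
            float acc = bias ? bias[u] : 0.0f;
            for (size_t t = 0; t < memory_size; ++t)
                acc += m[t] * weights_time[u * memory_size + t];
            output[u] = acc;
        }
    }
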
@@ -1279,10 +1339,10 @@ typedef struct ANeuralNetworksMemory ANeuralNetworksMemory;
* ANeuralNetworksModel is an opaque type that contains a description of the
* mathematical operations that constitute the model.
*
- * <p>The model will be built by calling<ul>
- * <li>{@link ANeuralNetworksModel_create},</li>
- * <li>{@link ANeuralNetworksModel_addOperation},</li>
- * <li>{@link ANeuralNetworksModel_addOperand},</li>
+ * <p>Build the model by calling<ul>
+ * <li>{@link ANeuralNetworksModel_create}</li>
+ * <li>{@link ANeuralNetworksModel_addOperation}</li>
+ * <li>{@link ANeuralNetworksModel_addOperand}</li>
* </ul>
*
* A model is completed by calling {@link ANeuralNetworksModel_finish}.
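
[Editor's note] A minimal C sketch of the build sequence listed above, creating a model with
a single ANEURALNETWORKS_ADD operation. Error checking is omitted for brevity; a real caller
should compare every return code against ANEURALNETWORKS_NO_ERROR. The tensor shape and the
helper name build_add_model are illustrative choices, not taken from this header.

    #include <stdint.h>
    #include "NeuralNetworks.h"   /* <android/NeuralNetworks.h> when building against the NDK */

    int build_add_model(ANeuralNetworksModel** out_model) {
        ANeuralNetworksModel* model = NULL;
        ANeuralNetworksModel_create(&model);

        uint32_t dims[] = {2, 2};
        ANeuralNetworksOperandType tensor = {
            .type = ANEURALNETWORKS_TENSOR_FLOAT32, .dimensionCount = 2,
            .dimensions = dims, .scale = 0.0f, .zeroPoint = 0};
        ANeuralNetworksOperandType scalar = {
            .type = ANEURALNETWORKS_INT32, .dimensionCount = 0,
            .dimensions = NULL, .scale = 0.0f, .zeroPoint = 0};

        ANeuralNetworksModel_addOperand(model, &tensor);  /* operand 0: first addend      */
        ANeuralNetworksModel_addOperand(model, &tensor);  /* operand 1: second addend     */
        ANeuralNetworksModel_addOperand(model, &scalar);  /* operand 2: fused activation  */
        ANeuralNetworksModel_addOperand(model, &tensor);  /* operand 3: sum (output)      */

        int32_t none = ANEURALNETWORKS_FUSED_NONE;
        ANeuralNetworksModel_setOperandValue(model, 2, &none, sizeof(none));

        uint32_t op_inputs[] = {0, 1, 2}, op_outputs[] = {3};
        ANeuralNetworksModel_addOperation(model, ANEURALNETWORKS_ADD, 3, op_inputs, 1, op_outputs);

        uint32_t model_inputs[] = {0, 1}, model_outputs[] = {3};
        ANeuralNetworksModel_identifyInputsAndOutputs(model, 2, model_inputs, 1, model_outputs);

        ANeuralNetworksModel_finish(model);
        *out_model = model;
        return ANEURALNETWORKS_NO_ERROR;
    }
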
@@ -1779,7 +1839,7 @@ int ANeuralNetworksExecution_setInput(ANeuralNetworksExecution* execution, int32
* <p>The provided memory must outlive the execution.</p>
*
* If the input is optional, you can indicate that it is omitted by
- * using @{Link ANeuralNetworks_setInput} instead, passing nullptr for buffer
+ * using {@link ANeuralNetworksExecution_setInput} instead, passing nullptr for buffer
* and 0 for length.
*
* See {@link ANeuralNetworksExecution} for information on multithreaded usage.
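
[Editor's note] A short hedged sketch of how an optional operand is omitted with a null
buffer and zero length, for both inputs and outputs. The index refers to the position in the
list passed to ANeuralNetworksModel_identifyInputsAndOutputs, and the parameter names below
are hypothetical values chosen for illustration.

    #include "NeuralNetworks.h"

    void omit_optional_io(ANeuralNetworksExecution* execution,
                          int32_t optional_input_index,
                          int32_t optional_output_index) {
        /* A null buffer and zero length mark the operand as omitted. */
        ANeuralNetworksExecution_setInput(execution, optional_input_index, NULL, NULL, 0);
        /* The same pattern applies to an optional output. */
        ANeuralNetworksExecution_setOutput(execution, optional_output_index, NULL, NULL, 0);
    }
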
@@ -1843,7 +1903,7 @@ int ANeuralNetworksExecution_setOutput(ANeuralNetworksExecution* execution, int3
* {@link ANeuralNetworksExecution}.
*
* If the output is optional, you can indicate that it is omitted by
- * using @{Link ANeuralNetworks_setOutput} instead, passing nullptr for buffer
+ * using {@link ANeuralNetworksExecution_setOutput} instead, passing nullptr for buffer
* and 0 for length.
*
* <p>The provided memory must outlive the execution.</p>