tf.keras.layers.Lambda

TensorFlow 1 version

View source on GitHub

Layer normalization layer (Ba et al., 2016).

Inherits From: Layer, Module

View aliases

Compat aliases for migration

See Migration guide for more details.

tf.compat.v1.keras.layers.LayerNormalization

tf.keras.layers.LayerNormalization(
    axis=-1, epsilon=0.001, center=True, scale=True,
    beta_initializer='zeros', gamma_initializer='ones',
    beta_regularizer=None, gamma_regularizer=None, beta_constraint=None,
    gamma_constraint=None, trainable=True, name=None, **kwargs
)

Used in the notebooks

Used in the tutorials
Transformer model for language understanding Normalizations

Normalize the activations of the previous layer for each given example in a batch independently, rather than across a batch like Batch Normalization. i.e. applies a transformation that maintains the mean activation within each example close to 0 and the activation standard deviation close to 1.

Given a tensor inputs, moments are calculated and normalization is performed across the axes specified in axis.

Example:

data = tf.constant(np.arange(10).reshape(5, 2) * 10, dtype=tf.float32)
print(data)
tf.Tensor(
[[ 0. 10.]
 [20. 30.]
 [40. 50.]
 [60. 70.]
 [80. 90.]], shape=(5, 2), dtype=float32)

layer = tf.keras.layers.LayerNormalization(axis=1)
output = layer(data)
print(output)
tf.Tensor(
[[-1. 1.]
 [-1. 1.]
 [-1. 1.]
 [-1. 1.]
 [-1. 1.]], shape=(5, 2), dtype=float32)

Notice that with Layer Normalization the normalization happens across the axes within each example, rather than across different examples in the batch.

If scale or center are enabled, the layer will scale the normalized outputs by broadcasting them with a trainable variable gamma, and center the outputs by broadcasting with a trainable variable beta. gamma will default to a ones tensor and beta will default to a zeros tensor, so that centering and scaling are no-ops before training has begun.

So, with scaling and centering enabled the normalization equations are as follows: Let the intermediate activations for a mini-batch to be the inputs.

For each sample x_i in inputs with k features, we compute the mean and variance of the sample:

  mean_i = sum(x_i[j] for j in range(k)) / k
  var_i = sum((x_i[j] - mean_i) ** 2 for j in range(k)) / k

and then compute a normalized x_i_normalized, including a small factor epsilon for numerical stability.

  x_i_normalized = (x_i - mean_i) / sqrt(var_i + epsilon)

And finally x_i_normalized is linearly transformed by gamma and beta, which are learned parameters:

  output_i = x_i_normalized * gamma + beta

gamma and beta will span the axes of inputs specified in axis, and this part of the inputs' shape must be fully defined.

For example:

layer = tf.keras.layers.LayerNormalization(axis=[1, 2, 3])
layer.build([5, 20, 30, 40])
print(layer.beta.shape)
(20, 30, 40)
print(layer.gamma.shape)
(20, 30, 40)

Note that other implementations of layer normalization may choose to define gamma and beta over a separate set of axes from the axes being normalized across. For example, Group Normalization (Wu et al. 2018) with group size of 1 corresponds to a Layer Normalization that normalizes across height, width, and channel and has gamma and beta span only the channel dimension. So, this Layer Normalization implementation will not match a Group Normalization layer with group size set to 1.

Arguments
`axis`	Integer or List/Tuple. The axis or axes to normalize across. Typically this is the features axis/axes. The left-out axes are typically the batch axis/axes. This argument defaults to `-1`, the last dimension in the input.
`epsilon`	Small float added to variance to avoid dividing by zero. Defaults to 1e-3
`center`	If True, add offset of `beta` to normalized tensor. If False, `beta` is ignored. Defaults to True.
`scale`	If True, multiply by `gamma`. If False, `gamma` is not used. Defaults to True. When the next layer is linear (also e.g. `nn.relu`), this can be disabled since the scaling will be done by the next layer.
`beta_initializer`	Initializer for the beta weight. Defaults to zeros.
`gamma_initializer`	Initializer for the gamma weight. Defaults to ones.
`beta_regularizer`	Optional regularizer for the beta weight. None by default.
`gamma_regularizer`	Optional regularizer for the gamma weight. None by default.
`beta_constraint`	Optional constraint for the beta weight. None by default.
`gamma_constraint`	Optional constraint for the gamma weight. None by default.
`trainable`	Boolean, if `True` the variables will be marked as trainable. Defaults to True.

Input shape: Arbitrary. Use the keyword argument input_shape (tuple of integers, does not include the samples axis) when using this layer as the first layer in a model. Output shape: Same shape as input. Reference:

Lei Ba et al., 2016.

TensorFlow

tf

tf.audio

tf.autograph

tf.bitwise

tf.compat

tf.config

tf.data

tf.debugging

tf.distribute

tf.dtypes

tf.errors

tf.estimator

tf.experimental

tf.feature_column

tf.graph_util

tf.image

tf.initializers

tf.io

tf.keras

tf.linalg

tf.lite

tf.lookup

tf.losses

tf.math

tf.metrics

tf.nest

tf.nn

tf.optimizers

tf.quantization

tf.queue

tf.ragged

tf.random

tf.raw_ops

tf.saved_model

tf.sets

tf.signal

tf.sparse

tf.strings

tf.summary

tf.sysconfig

tf.test

tf.tpu

tf.train

tf.version

tf.xla

tf.keras / layers / layers.Lambda

View aliases

Used in the notebooks

Example:

For example:

Arguments