A deprecated optimizer that applies loss scaling.
Inherits From: LossScaleOptimizer, Optimizer
tf.keras.mixed_precision.experimental.LossScaleOptimizer(
    optimizer, loss_scale
)
This class is identical to the non-experimental
keras.mixed_precision.LossScaleOptimizer except its constructor takes different
arguments. For this class (the experimental version), the constructor takes a
loss_scale argument. For the non-experimental class, the constructor encodes
the loss scaling information in multiple arguments. Note that unlike this
class, the non-experimental class does not accept a
tf.compat.v1.mixed_precision.LossScale, which is deprecated.
If you currently use this class, you should switch to the non-experimental
tf.keras.mixed_precision.LossScaleOptimizer instead. We show several examples
of converting the use of the experimental class to the equivalent
non-experimental class.
import tensorflow as tf

# In all of the examples below, `opt1` and `opt2` are identical.
opt1 = tf.keras.mixed_precision.experimental.LossScaleOptimizer(
    tf.keras.optimizers.SGD(), loss_scale='dynamic')
opt2 = tf.keras.mixed_precision.LossScaleOptimizer(
    tf.keras.optimizers.SGD())
assert opt1.get_config() == opt2.get_config()

opt1 = tf.keras.mixed_precision.experimental.LossScaleOptimizer(
    tf.keras.optimizers.SGD(), loss_scale=123)
# dynamic=False indicates to use fixed loss scaling. initial_scale=123
# refers to the initial loss scale, which is the single fixed loss scale
# when dynamic=False.
opt2 = tf.keras.mixed_precision.LossScaleOptimizer(
    tf.keras.optimizers.SGD(), dynamic=False, initial_scale=123)
assert opt1.get_config() == opt2.get_config()

loss_scale = tf.compat.v1.mixed_precision.experimental.DynamicLossScale(
    initial_loss_scale=2048, increment_period=500)
opt1 = tf.keras.mixed_precision.experimental.LossScaleOptimizer(
    tf.keras.optimizers.SGD(), loss_scale=loss_scale)
opt2 = tf.keras.mixed_precision.LossScaleOptimizer(
    tf.keras.optimizers.SGD(), initial_scale=2048,
    dynamic_growth_steps=500)
assert opt1.get_config() == opt2.get_config()
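The three conversions above follow one pattern, which can be sketched as a small pure-Python mapping from the experimental loss_scale argument to keyword arguments for the non-experimental constructor. The helper name `loss_scale_to_kwargs` is hypothetical and not part of TensorFlow, and the attribute names read from a DynamicLossScale-like object (`initial_loss_scale`, `increment_period`) are assumed from the constructor arguments shown in the third example.

```python
from numbers import Real
from types import SimpleNamespace


def loss_scale_to_kwargs(loss_scale):
    """Hypothetical helper (not a TensorFlow API).

    Maps an experimental `loss_scale` argument to keyword arguments for the
    non-experimental LossScaleOptimizer, following the three examples above.
    """
    if isinstance(loss_scale, str) and loss_scale == "dynamic":
        # Dynamic loss scaling with default settings needs no extra arguments.
        return {}
    if isinstance(loss_scale, Real) and not isinstance(loss_scale, bool):
        # A plain number means a fixed loss scale.
        return {"dynamic": False, "initial_scale": loss_scale}
    if hasattr(loss_scale, "initial_loss_scale") and hasattr(
            loss_scale, "increment_period"):
        # Assumption: a DynamicLossScale-like object exposes these attributes.
        return {
            "initial_scale": loss_scale.initial_loss_scale,
            "dynamic_growth_steps": loss_scale.increment_period,
        }
    raise ValueError(f"Unsupported loss_scale: {loss_scale!r}")


# Stand-in for DynamicLossScale(initial_loss_scale=2048, increment_period=500).
fake_dynamic = SimpleNamespace(initial_loss_scale=2048, increment_period=500)

dynamic_kwargs = loss_scale_to_kwargs("dynamic")
fixed_kwargs = loss_scale_to_kwargs(123)
custom_kwargs = loss_scale_to_kwargs(fake_dynamic)
```

Under these assumptions, the resulting dictionary could be unpacked into the non-experimental constructor alongside the wrapped optimizer.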
Make sure to also switch from this class to the non-experimental class in
isinstance checks, if you have any. If you do not do this, your model may run
into hard-to-debug issues, as the experimental LossScaleOptimizer subclasses
the non-experimental LossScaleOptimizer, but not vice versa. It is safe to
switch isinstance checks to the non-experimental LossScaleOptimizer even
before using the non-experimental LossScaleOptimizer.
opt1 = tf.keras.mixed_precision.experimental.LossScaleOptimizer(
    tf.keras.optimizers.SGD(), loss_scale='dynamic')
# The experimental class subclasses the non-experimental class.
isinstance(opt1, tf.keras.mixed_precision.LossScaleOptimizer)  # True
opt2 = tf.keras.mixed_precision.LossScaleOptimizer(
    tf.keras.optimizers.SGD())
# The non-experimental class does NOT subclass the experimental class.
isinstance(opt2, tf.keras.mixed_precision.experimental.LossScaleOptimizer)  # False
Args | |
---|---|
optimizer | The Optimizer instance to wrap. |
loss_scale | The loss scale used to scale the loss and gradients. This can either be an int/float to use a fixed loss scale, the string "dynamic" to use dynamic loss scaling, or an instance of a LossScale. The string "dynamic" is equivalent to passing DynamicLossScale(), and passing an int/float is equivalent to passing a FixedLossScale with the given loss scale. If a DynamicLossScale is passed, DynamicLossScale.multiplier must be 2 (the default). |
Raises | |
---|---|
ValueError | In case of any invalid argument. |
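Attributes | |
---|---|
dynamic | Bool indicating whether dynamic loss scaling is used. |
dynamic_counter | The number of steps since the loss scale was last increased or decreased. This is None if dynamic is False. The counter is incremented every step. Once it reaches dynamic_growth_steps, the loss scale is increased and the counter is reset to zero. |
dynamic_growth_steps | The number of steps it takes to increase the loss scale. This is None if dynamic is False. Every dynamic_growth_steps consecutive steps with finite gradients, the loss scale is increased. |
initial_scale | The initial loss scale. If dynamic is False, this is the same as loss_scale, since a fixed loss scale never changes. |
inner_optimizer | The optimizer that this LossScaleOptimizer is wrapping. |
loss_scale | The current loss scale as a float32 scalar tensor. |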
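How dynamic_counter, dynamic_growth_steps, and loss_scale interact can be illustrated with a toy pure-Python model. This is a sketch and not TensorFlow's implementation: the class name `DynamicLossScaleSketch` is hypothetical, the factor of 2 follows the multiplier requirement stated in the Args table, and halving the scale on non-finite gradients (with a floor of 1) is an assumption about how the scale is decreased.

```python
import math


class DynamicLossScaleSketch:
    """Toy model of dynamic loss-scale bookkeeping (not TensorFlow code)."""

    def __init__(self, initial_scale, dynamic_growth_steps):
        self.loss_scale = float(initial_scale)
        self.dynamic_growth_steps = dynamic_growth_steps
        self.dynamic_counter = 0

    def update(self, grads):
        """Update the state after one step; return True if grads are finite."""
        finite = all(math.isfinite(g) for g in grads if g is not None)
        if not finite:
            # Assumption: a non-finite gradient halves the scale (never below
            # 1) and resets the counter; the step itself would be skipped.
            self.loss_scale = max(self.loss_scale / 2.0, 1.0)
            self.dynamic_counter = 0
            return False
        self.dynamic_counter += 1
        if self.dynamic_counter >= self.dynamic_growth_steps:
            # Enough consecutive finite steps: double the scale and restart.
            self.loss_scale *= 2.0
            self.dynamic_counter = 0
        return True


scaler = DynamicLossScaleSketch(initial_scale=2048, dynamic_growth_steps=3)
for _ in range(3):
    scaler.update([0.1, None, -0.2])
scale_after_growth = scaler.loss_scale        # doubled after 3 finite steps

applied = scaler.update([float("inf")])       # overflow step
scale_after_overflow = scaler.loss_scale      # halved again

scaler.update([0.5])
counter_after_restart = scaler.dynamic_counter
```

The small dynamic_growth_steps value here only keeps the trace short; real training runs typically use a much larger period.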
Methods
get_scaled_loss
get_scaled_loss(
    loss
)
Scales the loss by the loss scale.
This method is only needed if you compute gradients manually, e.g. with
tf.GradientTape. In that case, call this method to scale the loss before
passing the loss to tf.GradientTape. If you use LossScaleOptimizer.minimize
or LossScaleOptimizer.get_gradients, loss scaling is automatically applied and
this method is unneeded.
If this method is called, get_unscaled_gradients should also be called.
See the tf.keras.mixed_precision.LossScaleOptimizer doc for an example.
Args | |
---|---|
loss | The loss, which will be multiplied by the loss scale. Can either be a tensor or a callable returning a tensor. |
Returns | |
---|---|
loss multiplied by LossScaleOptimizer.loss_scale. |
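The behaviour described above, including the two accepted forms of loss, can be sketched in plain Python with floats standing in for tensors. The function name `get_scaled_loss_sketch` is hypothetical, and returning a new callable when a callable is passed is an assumption about how deferred losses are handled, not something this page states.

```python
def get_scaled_loss_sketch(loss, loss_scale):
    """Toy stand-in for get_scaled_loss using Python floats (not TensorFlow).

    `loss` is either a number or a zero-argument callable returning a number.
    """
    if callable(loss):
        # Assumption: a callable loss stays lazy, so the result is a callable
        # that evaluates the loss and then multiplies by the loss scale.
        return lambda: loss() * loss_scale
    return loss * loss_scale


scaled_value = get_scaled_loss_sketch(0.5, 128.0)
scaled_callable = get_scaled_loss_sketch(lambda: 0.25, 128.0)
scaled_callable_value = scaled_callable()
```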
get_unscaled_gradients
get_unscaled_gradients(
    grads
)
Unscales the gradients by the loss scale.
This method is only needed if you compute gradients manually, e.g. with
tf.GradientTape. In that case, call this method to unscale the gradients
after computing them with tf.GradientTape. If you use
LossScaleOptimizer.minimize or LossScaleOptimizer.get_gradients, loss scaling
is automatically applied and this method is unneeded.
If this method is called, get_scaled_loss should also be called. See the
tf.keras.mixed_precision.LossScaleOptimizer doc for an example.
Args | |
---|---|
grads | A list of tensors, each of which will be divided by the loss scale. Can have None values, which are ignored. |
Returns | |
---|---|
A new list the same size as grads, where every non-None value in grads is divided by LossScaleOptimizer.loss_scale. |
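Why scaling and unscaling come as a pair can be shown with a small pure-Python sketch that rounds values to half precision using the standard struct module. The names `to_float16` and `get_unscaled_gradients_sketch` are hypothetical, and the gradient value and loss scale are made-up numbers chosen only so that the unscaled gradient underflows in float16.

```python
import struct


def to_float16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def get_unscaled_gradients_sketch(grads, loss_scale):
    """Toy stand-in for get_unscaled_gradients (not TensorFlow code).

    Divides every non-None entry by the loss scale and leaves None entries
    in place, so the result has the same length as `grads`.
    """
    return [None if g is None else g / loss_scale for g in grads]


loss_scale = 1024.0
true_grad = 1e-8  # illustrative value below float16's smallest subnormal

# Without loss scaling, the gradient is lost when stored in float16.
unscaled_fp16 = to_float16(true_grad)

# With loss scaling, the scaled gradient survives the float16 round trip and
# can be divided back down in higher precision afterwards.
scaled_fp16 = to_float16(true_grad * loss_scale)
recovered = get_unscaled_gradients_sketch([scaled_fp16, None], loss_scale)
```

In this sketch the recovered value is close to, but not exactly, the original gradient, because the scaled value was still rounded to half precision.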