tf.strings.unicode_decode

Decodes each string in input into a sequence of Unicode code points.

result[i1...iN, j] is the Unicode code point for the jth character in input[i1...iN], when decoded using input_encoding.

Args:

input: An N-dimensional, potentially ragged string tensor with shape [D1...DN]. N must be statically known.
input_encoding: String name of the Unicode encoding that should be used to decode each string.
errors: Specifies the response when an input string can't be converted using the indicated encoding. One of:
  • 'strict': Raise an exception for any illegal substrings.
  • 'replace': Replace illegal substrings with replacement_char.
  • 'ignore': Skip illegal substrings.
replacement_char: The replacement code point to be used in place of invalid substrings in input when errors='replace', and in place of C0 control characters in input when replace_control_characters=True.
replace_control_characters: Whether to replace the C0 control characters (U+0000 - U+001F) with replacement_char.
name: A name for the operation (optional).
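
The error-handling options can be illustrated with a small pure-Python model. This is only a sketch under stated assumptions: decode_model is a hypothetical helper built on bytes.decode, not the TensorFlow op; it handles UTF-8 only; and it assumes defaults of errors='replace' and replacement_char=65533 (U+FFFD). It substitutes one replacement code point per undecodable byte, which may not match the real op for multi-byte invalid sequences.

```python
from typing import List


def decode_model(data: bytes,
                 errors: str = "replace",
                 replacement_char: int = 0xFFFD,
                 replace_control_characters: bool = False) -> List[int]:
    """Hypothetical pure-Python model of the UTF-8 case; not the TensorFlow op."""
    if errors == "strict":
        # Illegal byte sequences raise UnicodeDecodeError.
        text = data.decode("utf-8", errors="strict")
        code_points = [ord(ch) for ch in text]
    elif errors == "ignore":
        # Illegal byte sequences are dropped from the output.
        text = data.decode("utf-8", errors="ignore")
        code_points = [ord(ch) for ch in text]
    elif errors == "replace":
        # surrogateescape maps each undecodable byte to U+DC80..U+DCFF,
        # which is then swapped for replacement_char.
        text = data.decode("utf-8", errors="surrogateescape")
        code_points = [
            replacement_char if 0xDC80 <= ord(ch) <= 0xDCFF else ord(ch)
            for ch in text
        ]
    else:
        raise ValueError("errors must be 'strict', 'replace' or 'ignore'")

    if replace_control_characters:
        # C0 control characters are U+0000 through U+001F.
        code_points = [
            replacement_char if cp <= 0x1F else cp for cp in code_points
        ]
    return code_points


print(decode_model(b"a\xffb", errors="replace"))  # [97, 65533, 98]
print(decode_model(b"a\xffb", errors="ignore"))   # [97, 98]
print(decode_model(b"a\tb", replace_control_characters=True))  # [97, 65533, 98]
```

With errors='strict', the same invalid input b"a\xffb" raises UnicodeDecodeError in this model; the TensorFlow op reports its own exception type.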

Returns:

An (N+1)-dimensional int32 tensor with shape [D1...DN, (num_chars)]. The returned tensor is a tf.Tensor if input is a scalar, or a tf.RaggedTensor otherwise.
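
To make the output shape concrete, here is a pure-Python model that uses nested lists in place of tensors. It is a sketch, not the TensorFlow implementation: decode_nested is a hypothetical helper, it assumes valid UTF-8 input, and the ragged innermost dimension is simply a list whose length depends on each string.

```python
def decode_nested(value):
    """Hypothetical model: decode a scalar or nested list of UTF-8 byte strings."""
    if isinstance(value, bytes):
        # Innermost (num_chars) dimension: one code point per character.
        return [ord(ch) for ch in value.decode("utf-8")]
    # Outer dimensions D1...DN are preserved as-is.
    return [decode_nested(item) for item in value]


# Scalar input (N = 0): the result has a single dimension of length num_chars.
scalar = decode_nested("hi".encode("utf-8"))
print(scalar)  # [104, 105]

# N = 2 input: rows and strings may have different lengths.
batch = decode_nested([
    ["a".encode("utf-8"), "bcd".encode("utf-8")],
    ["\u00e9\U0001f60a".encode("utf-8")],
])
print(batch)           # [[[97], [98, 99, 100]], [[233, 128522]]]
print(batch[0][1][2])  # 100, the code point of the third character of "bcd"
```

Here batch[i1][i2][j] plays the role of result[i1, i2, j]: the code point of the jth character of input[i1, i2].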

Example:

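import tensorflow as tf

# Encode two Python strings as UTF-8 bytes, then decode them to code points.
input = [s.encode('utf8') for s in (u'G\xf6\xf6dnight', u'\U0001f60a')]
tf.strings.unicode_decode(input, 'UTF-8').to_list()
# [[71, 246, 246, 100, 110, 105, 103, 104, 116], [128522]]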
