tf.keras.preprocessing.sequence.make_sampling

TensorFlow 1 version

View source on GitHub

Generates skipgram word pairs.

View aliases

Compat aliases for migration

See Migration guide for more details.

tf.compat.v1.keras.preprocessing.sequence.skipgrams

tf.keras.preprocessing.sequence.skipgrams(
    sequence, vocabulary_size, window_size=4, negative_samples=1.0, shuffle=True,
    categorical=False, sampling_table=None, seed=None
)

Used in the notebooks

Used in the tutorials
Word2Vec

This function transforms a sequence of word indexes (list of integers) into tuples of words of the form:

(word, word in the same window), with label 1 (positive samples).
(word, random word from the vocabulary), with label 0 (negative samples).

Read more about Skipgram in this gnomic paper by Mikolov et al.: Efficient Estimation of Word Representations in Vector Space

Arguments
`sequence`	A word sequence (sentence), encoded as a list of word indices (integers). If using a `sampling_table`, word indices are expected to match the rank of the words in a reference dataset (e.g. 10 would encode the 10-th most frequently occurring token). Note that index 0 is expected to be a non-word and will be skipped.
`vocabulary_size`	Int, maximum possible word index + 1
`window_size`	Int, size of sampling windows (technically half-window). The window of a word `w_i` will be `[i - window_size, i + window_size+1]`.
`negative_samples`	Float >= 0. 0 for no negative (i.e. random) samples. 1 for same number as positive samples.
`shuffle`	Whether to shuffle the word couples before returning them.
`categorical`	bool. if False, labels will be integers (eg. `[0, 1, 1 .. ]`), if `True`, labels will be categorical, e.g. `[[1,0],[0,1],[0,1] .. ]`.
`sampling_table`	1D array of size `vocabulary_size` where the entry i encodes the probability to sample a word of rank i.
`seed`	Random seed.

Returns
couples, labels: where `couples` are int pairs and `labels` are either 0 or 1.

Note:

By convention, index 0 in the vocabulary is a non-word and will be skipped.

TensorFlow

tf

tf.audio

tf.autograph

tf.bitwise

tf.compat

tf.config

tf.data

tf.debugging

tf.distribute

tf.dtypes

tf.errors

tf.estimator

tf.experimental

tf.feature_column

tf.graph_util

tf.image

tf.initializers

tf.io

tf.keras

tf.linalg

tf.lite

tf.lookup

tf.losses

tf.math

tf.metrics

tf.nest

tf.nn

tf.optimizers

tf.quantization

tf.queue

tf.ragged

tf.random

tf.raw_ops

tf.saved_model

tf.sets

tf.signal

tf.sparse

tf.strings

tf.summary

tf.sysconfig

tf.test

tf.tpu

tf.train

tf.version

tf.xla

tf.keras / preprocessing / preprocessing.sequence.make_sampling_table

View aliases

Used in the notebooks

Arguments

Returns

Note: