textformer.models¶
Each neural network architecture is defined in this package. From Seq2Seq to Transformers, you can use whichever suits your needs.
A package containing all models (networks) for all common textformer modules.
-
class
textformer.models.
AttSeq2Seq
(n_input=128, n_output=128, n_hidden_enc=128, n_hidden_dec=128, n_embedding=128, dropout=0.5, ignore_token=None, init_weights=None, device='cpu')¶ Bases:
textformer.core.model.Model
An AttSeq2Seq class implements an attention-based Sequence-To-Sequence learning architecture.
References
D. Bahdanau, K. Cho, Y. Bengio. Neural machine translation by jointly learning to align and translate. Preprint arXiv:1409.0473 (2014).
-
__init__
(self, n_input=128, n_output=128, n_hidden_enc=128, n_hidden_dec=128, n_embedding=128, dropout=0.5, ignore_token=None, init_weights=None, device='cpu')¶ Initialization method.
- Parameters
n_input (int) – Number of input units.
n_output (int) – Number of output units.
n_hidden_enc (int) – Number of hidden units in the Encoder.
n_hidden_dec (int) – Number of hidden units in the Decoder.
n_embedding (int) – Number of embedding units.
dropout (float) – Amount of dropout to be applied.
ignore_token (int) – The index of a token to be ignored by the loss function.
init_weights (tuple) – Tuple holding the minimum and maximum values for weights initialization.
device (str) – Device that the model should be trained on, e.g., cpu or cuda.
-
bleu
(self, dataset, src_field, trg_field, max_length=50, n_grams=4)¶ Calculates the BLEU score over a dataset by comparing its targets against the model's predictions.
Note that you will need to implement this method directly in a child class. Essentially, each neural network has its own BLEU implementation, due to having different translation methods.
- Parameters
dataset (torchtext.data.Dataset) – Dataset to have its BLEU score calculated.
src_field (torchtext.data.Field) – Source vocabulary datatype instructions for tensor conversion.
trg_field (torchtext.data.Field) – Target vocabulary datatype instructions for tensor conversion.
max_length (int) – Maximum length of translated text.
n_grams (int) – Maximum n-grams to be used.
- Returns
BLEU score from input dataset.
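At its core, BLEU is the geometric mean of modified n-gram precisions multiplied by a brevity penalty. A minimal, framework-free sketch of the sentence-level computation (illustrative only; textformer's actual implementation works over a whole dataset and translates each example first):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Returns all n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, n_grams=4):
    """Sentence-level BLEU: geometric mean of modified n-gram
    precisions times a brevity penalty."""
    precisions = []
    for n in range(1, n_grams + 1):
        cand_counts = Counter(ngrams(candidate, n))
        ref_counts = Counter(ngrams(reference, n))
        # Clip candidate counts by reference counts (modified precision).
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / n_grams
    # Brevity penalty punishes candidates shorter than the reference.
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(log_avg)
```

A candidate identical to its reference scores 1.0; any empty n-gram overlap drives the score to 0.0.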
-
forward
(self, x, y, teacher_forcing_ratio=0.5)¶ Performs a forward pass over the architecture.
- Parameters
x (torch.Tensor) – Tensor containing the data.
y (torch.Tensor) – Tensor containing the true labels.
teacher_forcing_ratio (float) – Probability of feeding the ground-truth token, rather than the model's own prediction, as the next decoder input.
- Returns
The predictions over the input tensor.
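The effect of teacher_forcing_ratio on the decoding loop can be sketched with a toy stand-in for the decoder (step_fn below is a hypothetical one-step function, not part of textformer's API):

```python
import random

def decode(y_true, step_fn, teacher_forcing_ratio=0.5, seed=0):
    """Toy decoding loop: with probability `teacher_forcing_ratio`,
    the next input is the ground-truth token; otherwise it is the
    model's own previous prediction."""
    rng = random.Random(seed)
    inputs = [y_true[0]]  # decoding starts from the first true token
    preds = []
    for t in range(1, len(y_true)):
        pred = step_fn(inputs[-1])  # stand-in for one decoder step
        preds.append(pred)
        use_teacher = rng.random() < teacher_forcing_ratio
        inputs.append(y_true[t] if use_teacher else pred)
    return preds
```

With a ratio of 1.0 the decoder always sees the ground truth (fast, stable training); with 0.0 it always consumes its own output, as it must at inference time.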
-
generate_text
(self, start, field, length=10, temperature=1.0)¶ Generates text by feeding the network the current token (t) and predicting the next token (t+1).
- Parameters
start (str) – The start string to generate the text from.
field (torchtext.data.Field) – Datatype instructions for tensor conversion.
length (int) – Length of generated text.
temperature (float) – Temperature value used to sample the token.
- Returns
A list of generated text.
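Temperature sampling can be illustrated without any framework: logits are divided by the temperature before the softmax, so low values sharpen the distribution toward the argmax while high values flatten it. A sketch of the sampling step only, not textformer's implementation:

```python
import math
import random

def sample_with_temperature(logits, temperature=1.0, rng=random):
    """Softmax over temperature-scaled logits, then one categorical draw."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # random.choices draws an index proportionally to the weights.
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]
```

As temperature approaches 0 the draw becomes deterministic argmax; large temperatures approach uniform sampling.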
-
translate_text
(self, start, src_field, trg_field, max_length=10)¶ Translates text from the source vocabulary to the target vocabulary.
Note that you will need to implement this method directly on its child. Essentially, each neural network has its own translation implementation.
- Parameters
start (str) – The string to be translated.
src_field (torchtext.data.Field) – Source vocabulary datatype instructions for tensor conversion.
trg_field (torchtext.data.Field) – Target vocabulary datatype instructions for tensor conversion.
max_length (int) – Maximum length of translated text.
- Returns
A list of translated text.
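A typical child-class implementation follows a greedy decoding loop: encode the source once, then repeatedly append the best next token until the end-of-sentence token or max_length is reached. A framework-free sketch with hypothetical encode_fn / decode_step stand-ins:

```python
def translate_greedy(src_tokens, encode_fn, decode_step, sos, eos, max_length=10):
    """Greedy decoding: encode the source once, then repeatedly feed the
    best token so far until <eos> or max_length is reached."""
    context = encode_fn(src_tokens)  # stand-in for the encoder
    output = [sos]
    for _ in range(max_length):
        nxt = decode_step(context, output)  # stand-in: best next token
        output.append(nxt)
        if nxt == eos:
            break
    return output[1:]  # drop the start token
```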
-
-
class
textformer.models.
ConvSeq2Seq
(n_input=128, n_output=128, n_hidden=128, n_embedding=128, n_layers=1, kernel_size=3, dropout=0.5, scale=0.5, max_length=100, ignore_token=None, init_weights=None, device='cpu')¶ Bases:
textformer.core.model.Model
A ConvSeq2Seq class implements a Convolutional Sequence-To-Sequence learning architecture.
References
J. Gehring, et al. Convolutional sequence to sequence learning. Proceedings of the 34th International Conference on Machine Learning (2017).
-
__init__
(self, n_input=128, n_output=128, n_hidden=128, n_embedding=128, n_layers=1, kernel_size=3, dropout=0.5, scale=0.5, max_length=100, ignore_token=None, init_weights=None, device='cpu')¶ Initialization method.
- Parameters
n_input (int) – Number of input units.
n_output (int) – Number of output units.
n_hidden (int) – Number of hidden units.
n_embedding (int) – Number of embedding units.
n_layers (int) – Number of convolutional layers.
kernel_size (int) – Size of the convolutional kernels.
dropout (float) – Amount of dropout to be applied.
scale (float) – Value for the residual learning.
max_length (int) – Maximum length of positional embeddings.
ignore_token (int) – The index of a token to be ignored by the loss function.
init_weights (tuple) – Tuple holding the minimum and maximum values for weights initialization.
device (str) – Device that the model should be trained on, e.g., cpu or cuda.
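The scale parameter weights the residual connections inside each convolutional block: the block's output is summed with its input and the result is multiplied by the scaling factor, keeping activation variance roughly constant across layers (as in Gehring et al.). A toy sketch, where conv_fn is a hypothetical stand-in for the actual convolution:

```python
def conv_block(x, conv_fn, scale=0.5):
    """Residual convolutional block: (conv_fn(x) + x) * scale,
    applied element-wise over a sequence of activations."""
    out = conv_fn(x)
    return [(o + r) * scale for o, r in zip(out, x)]
```

Note that with an identity conv_fn and scale=0.5, the block reduces to the identity, which is what makes deep stacks of such blocks easy to optimize.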
-
bleu
(self, dataset, src_field, trg_field, max_length=50, n_grams=4)¶ Calculates the BLEU score over a dataset by comparing its targets against the model's predictions.
Note that you will need to implement this method directly in a child class. Essentially, each neural network has its own BLEU implementation, due to having different translation methods.
- Parameters
dataset (torchtext.data.Dataset) – Dataset to have its BLEU score calculated.
src_field (torchtext.data.Field) – Source vocabulary datatype instructions for tensor conversion.
trg_field (torchtext.data.Field) – Target vocabulary datatype instructions for tensor conversion.
max_length (int) – Maximum length of translated text.
n_grams (int) – Maximum n-grams to be used.
- Returns
BLEU score from input dataset.
-
forward
(self, x, y, teacher_forcing_ratio=0.0)¶ Performs a forward pass over the architecture.
- Parameters
x (torch.Tensor) – Tensor containing the data.
y (torch.Tensor) – Tensor containing the true labels.
teacher_forcing_ratio (float) – Probability of feeding the ground-truth token, rather than the model's own prediction, as the next decoder input.
- Returns
The predictions over the input tensor.
-
generate_text
(self, start, field, length=10, temperature=1.0)¶ Generates text by feeding the network the current token (t) and predicting the next token (t+1).
- Parameters
start (str) – The start string to generate the text from.
field (torchtext.data.Field) – Datatype instructions for tensor conversion.
length (int) – Length of generated text.
temperature (float) – Temperature value used to sample the token.
- Returns
A list of generated text.
-
translate_text
(self, start, src_field, trg_field, max_length=10)¶ Translates text from the source vocabulary to the target vocabulary.
Note that you will need to implement this method directly on its child. Essentially, each neural network has its own translation implementation.
- Parameters
start (str) – The string to be translated.
src_field (torchtext.data.Field) – Source vocabulary datatype instructions for tensor conversion.
trg_field (torchtext.data.Field) – Target vocabulary datatype instructions for tensor conversion.
max_length (int) – Maximum length of translated text.
- Returns
A list of translated text.
-
-
class
textformer.models.
JointSeq2Seq
(n_input=128, n_output=128, n_hidden=128, n_embedding=128, dropout=0.5, ignore_token=None, init_weights=None, device='cpu')¶ Bases:
textformer.core.model.Model
A JointSeq2Seq class implements an enhanced version (joint learning) of the Sequence-To-Sequence learning architecture.
References
K. Cho, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation. Preprint arXiv:1406.1078 (2014).
-
__init__
(self, n_input=128, n_output=128, n_hidden=128, n_embedding=128, dropout=0.5, ignore_token=None, init_weights=None, device='cpu')¶ Initialization method.
- Parameters
n_input (int) – Number of input units.
n_output (int) – Number of output units.
n_hidden (int) – Number of hidden units.
n_embedding (int) – Number of embedding units.
dropout (float) – Amount of dropout to be applied.
ignore_token (int) – The index of a token to be ignored by the loss function.
init_weights (tuple) – Tuple holding the minimum and maximum values for weights initialization.
device (str) – Device that the model should be trained on, e.g., cpu or cuda.
-
bleu
(self, dataset, src_field, trg_field, max_length=50, n_grams=4)¶ Calculates the BLEU score over a dataset by comparing its targets against the model's predictions.
Note that you will need to implement this method directly in a child class. Essentially, each neural network has its own BLEU implementation, due to having different translation methods.
- Parameters
dataset (torchtext.data.Dataset) – Dataset to have its BLEU score calculated.
src_field (torchtext.data.Field) – Source vocabulary datatype instructions for tensor conversion.
trg_field (torchtext.data.Field) – Target vocabulary datatype instructions for tensor conversion.
max_length (int) – Maximum length of translated text.
n_grams (int) – Maximum n-grams to be used.
- Returns
BLEU score from input dataset.
-
forward
(self, x, y, teacher_forcing_ratio=0.5)¶ Performs a forward pass over the architecture.
- Parameters
x (torch.Tensor) – Tensor containing the data.
y (torch.Tensor) – Tensor containing the true labels.
teacher_forcing_ratio (float) – Probability of feeding the ground-truth token, rather than the model's own prediction, as the next decoder input.
- Returns
The predictions over the input tensor.
-
generate_text
(self, start, field, length=10, temperature=1.0)¶ Generates text by feeding the network the current token (t) and predicting the next token (t+1).
- Parameters
start (str) – The start string to generate the text from.
field (torchtext.data.Field) – Datatype instructions for tensor conversion.
length (int) – Length of generated text.
temperature (float) – Temperature value used to sample the token.
- Returns
A list of generated text.
-
translate_text
(self, start, src_field, trg_field, max_length=10)¶ Translates text from the source vocabulary to the target vocabulary.
Note that you will need to implement this method directly on its child. Essentially, each neural network has its own translation implementation.
- Parameters
start (str) – The string to be translated.
src_field (torchtext.data.Field) – Source vocabulary datatype instructions for tensor conversion.
trg_field (torchtext.data.Field) – Target vocabulary datatype instructions for tensor conversion.
max_length (int) – Maximum length of translated text.
- Returns
A list of translated text.
-
-
class
textformer.models.
Seq2Seq
(n_input=128, n_output=128, n_hidden=128, n_embedding=128, n_layers=1, dropout=0.5, ignore_token=None, init_weights=None, device='cpu')¶ Bases:
textformer.core.model.Model
A Seq2Seq class implements a Sequence-To-Sequence learning architecture.
References
I. Sutskever, O. Vinyals, Q. Le. Sequence to sequence learning with neural networks. Advances in neural information processing systems (2014).
-
__init__
(self, n_input=128, n_output=128, n_hidden=128, n_embedding=128, n_layers=1, dropout=0.5, ignore_token=None, init_weights=None, device='cpu')¶ Initialization method.
- Parameters
n_input (int) – Number of input units.
n_output (int) – Number of output units.
n_hidden (int) – Number of hidden units.
n_embedding (int) – Number of embedding units.
n_layers (int) – Number of RNN layers.
dropout (float) – Amount of dropout to be applied.
ignore_token (int) – The index of a token to be ignored by the loss function.
init_weights (tuple) – Tuple holding the minimum and maximum values for weights initialization.
device (str) – Device that the model should be trained on, e.g., cpu or cuda.
-
bleu
(self, dataset, src_field, trg_field, max_length=50, n_grams=4)¶ Calculates the BLEU score over a dataset by comparing its targets against the model's predictions.
Note that you will need to implement this method directly in a child class. Essentially, each neural network has its own BLEU implementation, due to having different translation methods.
- Parameters
dataset (torchtext.data.Dataset) – Dataset to have its BLEU score calculated.
src_field (torchtext.data.Field) – Source vocabulary datatype instructions for tensor conversion.
trg_field (torchtext.data.Field) – Target vocabulary datatype instructions for tensor conversion.
max_length (int) – Maximum length of translated text.
n_grams (int) – Maximum n-grams to be used.
- Returns
BLEU score from input dataset.
-
forward
(self, x, y, teacher_forcing_ratio=0.5)¶ Performs a forward pass over the architecture.
- Parameters
x (torch.Tensor) – Tensor containing the data.
y (torch.Tensor) – Tensor containing the true labels.
teacher_forcing_ratio (float) – Probability of feeding the ground-truth token, rather than the model's own prediction, as the next decoder input.
- Returns
The predictions over the input tensor.
-
generate_text
(self, start, field, length=10, temperature=1.0)¶ Generates text by feeding the network the current token (t) and predicting the next token (t+1).
- Parameters
start (str) – The start string to generate the text from.
field (torchtext.data.Field) – Datatype instructions for tensor conversion.
length (int) – Length of generated text.
temperature (float) – Temperature value used to sample the token.
- Returns
A list of generated text.
-
translate_text
(self, start, src_field, trg_field, max_length=10)¶ Translates text from the source vocabulary to the target vocabulary.
Note that you will need to implement this method directly on its child. Essentially, each neural network has its own translation implementation.
- Parameters
start (str) – The string to be translated.
src_field (torchtext.data.Field) – Source vocabulary datatype instructions for tensor conversion.
trg_field (torchtext.data.Field) – Target vocabulary datatype instructions for tensor conversion.
max_length (int) – Maximum length of translated text.
- Returns
A list of translated text.
-
-
class
textformer.models.
Transformer
(n_input=128, n_output=128, n_hidden=128, n_forward=256, n_layers=1, n_heads=3, dropout=0.1, max_length=100, source_pad_index=None, target_pad_index=None, init_weights=None, device='cpu')¶ Bases:
textformer.core.model.Model
A Transformer class implements a Transformer-based learning architecture.
References
Vaswani, et al. Attention is all you need. Advances in neural information processing systems (2017).
-
__init__
(self, n_input=128, n_output=128, n_hidden=128, n_forward=256, n_layers=1, n_heads=3, dropout=0.1, max_length=100, source_pad_index=None, target_pad_index=None, init_weights=None, device='cpu')¶ Initialization method.
- Parameters
n_input (int) – Number of input units.
n_output (int) – Number of output units.
n_hidden (int) – Number of hidden units.
n_forward (int) – Number of feed forward units.
n_layers (int) – Number of attention layers.
n_heads (int) – Number of attention heads.
dropout (float) – Amount of dropout to be applied.
max_length (int) – Maximum length of positional embeddings.
source_pad_index (int) – The index of source vocabulary padding token.
target_pad_index (int) – The index of target vocabulary padding token.
init_weights (tuple) – Tuple holding the minimum and maximum values for weights initialization.
device (str) – Device that the model should be trained on, e.g., cpu or cuda.
-
bleu
(self, dataset, src_field, trg_field, max_length=50, n_grams=4)¶ Calculates the BLEU score over a dataset by comparing its targets against the model's predictions.
Note that you will need to implement this method directly in a child class. Essentially, each neural network has its own BLEU implementation, due to having different translation methods.
- Parameters
dataset (torchtext.data.Dataset) – Dataset to have its BLEU score calculated.
src_field (torchtext.data.Field) – Source vocabulary datatype instructions for tensor conversion.
trg_field (torchtext.data.Field) – Target vocabulary datatype instructions for tensor conversion.
max_length (int) – Maximum length of translated text.
n_grams (int) – Maximum n-grams to be used.
- Returns
BLEU score from input dataset.
-
create_source_mask
(self, x)¶ Creates the source mask used in the encoding process.
- Parameters
x (torch.Tensor) – Tensor holding the inputs.
- Returns
Mask over inputs tensor.
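Conceptually, the source mask just flags real tokens versus padding. A list-based sketch, assuming the convention of True for positions the encoder may attend to (textformer's implementation builds the equivalent boolean tensor with torch):

```python
def create_source_mask(x, pad_index):
    """True where the token is real, False where it is padding."""
    return [[token != pad_index for token in sequence] for sequence in x]
```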
-
create_target_mask
(self, y)¶ Creates the target mask used in the decoding process.
- Parameters
y (torch.Tensor) – Tensor holding the targets.
- Returns
Mask over targets tensor.
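The target mask additionally hides future positions: it combines the padding mask with a lower-triangular (causal) mask so position t only attends to positions s ≤ t. A list-based sketch of that combination (illustrative; the real method produces a boolean torch tensor):

```python
def create_target_mask(y, pad_index):
    """Element [i][t][s] is True iff token s is visible to token t:
    s must not be padding and must not lie in the future (s <= t)."""
    masks = []
    for sequence in y:
        n = len(sequence)
        masks.append([
            [sequence[s] != pad_index and s <= t for s in range(n)]
            for t in range(n)
        ])
    return masks
```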
-
forward
(self, x, y, teacher_forcing_ratio=0.0)¶ Performs a forward pass over the architecture.
- Parameters
x (torch.Tensor) – Tensor containing the data.
y (torch.Tensor) – Tensor containing the true labels.
teacher_forcing_ratio (float) – Probability of feeding the ground-truth token, rather than the model's own prediction, as the next decoder input.
- Returns
The predictions over the input tensor.
-
generate_text
(self, start, field, length=10, temperature=1.0)¶ Generates text by feeding the network the current token (t) and predicting the next token (t+1).
- Parameters
start (str) – The start string to generate the text from.
field (torchtext.data.Field) – Datatype instructions for tensor conversion.
length (int) – Length of generated text.
temperature (float) – Temperature value used to sample the token.
- Returns
A list of generated text.
-
translate_text
(self, start, src_field, trg_field, max_length=10)¶ Translates text from the source vocabulary to the target vocabulary.
Note that you will need to implement this method directly on its child. Essentially, each neural network has its own translation implementation.
- Parameters
start (str) – The string to be translated.
src_field (torchtext.data.Field) – Source vocabulary datatype instructions for tensor conversion.
trg_field (torchtext.data.Field) – Target vocabulary datatype instructions for tensor conversion.
max_length (int) – Maximum length of translated text.
- Returns
A list of translated text.