I'm trying to build a classifier for my study using BERT and Keras.
I got a BERT encoding of shape (1, X, 768), where X is the number of tokens in the sentence.
How can I build a Keras model if X is not consistent?
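One common workaround (my own sketch, not from the original thread) is to pad or truncate every encoded sentence to a fixed length and mask the padding, so the classifier always sees the same input shape; max_len and num_classes below are assumed example values:
import tensorflow as tf

# Sketch: pad each (1, X, 768) BERT encoding with zero vectors up to max_len,
# then average-pool the unmasked token vectors into one fixed-size vector.
max_len, num_classes = 128, 2  # assumed example values

model = tf.keras.Sequential([
    tf.keras.Input(shape=(max_len, 768)),        # zero-padded BERT encodings
    tf.keras.layers.Masking(mask_value=0.0),     # ignore the zero-padded positions
    tf.keras.layers.GlobalAveragePooling1D(),    # (None, max_len, 768) -> (None, 768)
    tf.keras.layers.Dense(num_classes, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')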
Related
Prior to passing my tokens through the encoder in the BERT model, I would like to perform some processing on their embeddings. I extracted the embedding weights using:
from transformers import TFBertModel
# Load a pre-trained BERT model
model = TFBertModel.from_pretrained('bert-base-uncased')
# Get the embedding layer of the model
embedding_layer = model.get_layer('bert').get_input_embeddings()
# Extract the embedding weights
embedding_weights = embedding_layer.get_weights()
I found that it contains 5 elements.
In my understanding, the first three elements are the word embedding weights, the token type embedding weights, and the positional embedding weights. My question is: what do the last two elements stand for?
I dove deep into the source code of the BERT model, but I cannot figure out the meaning of the last two elements.
In the BERT model, there is a post-processing step on the embedding tensor that uses layer normalization followed by dropout:
https://github.com/google-research/bert/blob/eedf5716ce1268e56f0a50264a88cafad334ac61/modeling.py#L362
I think those two arrays are the gamma and beta of that layer normalization: https://www.tensorflow.org/api_docs/python/tf/keras/layers/LayerNormalization
They are learned parameters and span the axes of the inputs specified by the "axis" argument, which defaults to -1 (corresponding to the 768-dimensional embedding axis).
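A quick way to check this (a sketch of mine, not from the original answer; the exact variable names can differ between transformers versions) is to print the name and shape of each variable in the embedding layer:
from transformers import TFBertModel

# Load the model and grab its embedding layer, as in the question above
model = TFBertModel.from_pretrained('bert-base-uncased')
embedding_layer = model.get_layer('bert').get_input_embeddings()

# Print each weight tensor's name and shape
for variable in embedding_layer.weights:
    print(variable.name, variable.shape)

# For bert-base-uncased you should see something like:
#   word_embeddings       (30522, 768)
#   token_type_embeddings (2, 768)
#   position_embeddings   (512, 768)
#   LayerNorm/gamma       (768,)
#   LayerNorm/beta        (768,)
The two (768,)-shaped vectors at the end are the learned scale (gamma) and offset (beta) of that LayerNorm.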
I created several custom CNN models in TensorFlow 1.14 and exported them as frozen graphs. Then I imported the frozen .pb files into netron.app to check their structures. Oddly, there are two elements at the input: x and Identity (see the screenshot of the custom model input).
But when I froze a pre-trained model, such as MobileNetV2, the input only has x (see the pre-trained MobileNetV2 input screenshot).
Does anyone have a clue why?
Do we have an option to save a trained Gensim Word2Vec model as a SavedModel using TF 2.0's tf.saved_model.save? In other words, how can I save a trained embedding vector as a SavedModel signature to work with TensorFlow 2.0? The following steps do not work as-is:
model = gensim.models.Word2Vec(...)
model.init_sims(...)
model.train(...)
model.save(...)

module = gensim.models.KeyedVectors.load_word2vec_format(...)
tf.saved_model.save(module, export_dir)
EDIT:
This example helped me figure out how to do it: https://keras.io/examples/nlp/pretrained_word_embeddings/
Gensim does not use TensorFlow and it has its own methods for loading and saving models.
You would need to convert the Gensim embeddings into a TensorFlow model, which only makes sense if you plan to further use your embeddings within TensorFlow and possibly fine-tune them for your task.
A Gensim Word2Vec model corresponds to two steps in TensorFlow:
Vocabulary lookup: a table that assigns indices to tokens.
Embedding lookup: a layer that picks up the actual embeddings for those indices.
Then, you can save it as any other TensorFlow model.
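For concreteness, here is a minimal sketch of those two steps. The KeyedVectors file name, the export_dir path, and the choice of StringLookup plus Embedding are my own illustrative assumptions (gensim 4.x and a TensorFlow version that has tf.keras.layers.StringLookup), not from the original answer:
import numpy as np
import tensorflow as tf
from gensim.models import KeyedVectors

kv = KeyedVectors.load('vectors.kv')  # hypothetical path to saved Gensim vectors

# 1) Vocabulary lookup: token -> index (index 0 is reserved for out-of-vocabulary)
vocab_lookup = tf.keras.layers.StringLookup(vocabulary=list(kv.index_to_key))

# 2) Embedding lookup: index -> vector, initialized from the Gensim weights
weights = np.vstack([np.zeros((1, kv.vector_size)), kv.vectors])  # row 0 = OOV
embedding = tf.keras.layers.Embedding(
    input_dim=weights.shape[0],
    output_dim=kv.vector_size,
    embeddings_initializer=tf.keras.initializers.Constant(weights),
    trainable=False,  # set True to fine-tune the embeddings for your task
)

# Wrap both steps in a model and save it like any other TensorFlow model
tokens = tf.keras.Input(shape=(None,), dtype=tf.string)
vectors = embedding(vocab_lookup(tokens))
module = tf.keras.Model(tokens, vectors)
tf.saved_model.save(module, 'export_dir')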
We can build a model with TensorFlow layers. Is there any way to display the model summary, like in Keras?
No, there is no such option. TensorFlow is a lot more generic than Keras and allows arbitrary graph architectures, so showing such a structured summary does not make sense for arbitrary TensorFlow graphs. The closest is probably TensorBoard, which has a very handy interactive graph visualization tool.
Keras has been part of TensorFlow for some time now, so you can always get nice things like:
model.output_shape   # model output shape
model.summary()      # model summary representation
model.get_config()   # model configuration
model.get_weights()  # list all weight tensors in the model (weights and biases)
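For example, on a toy model (illustrative only, not from the original answer):
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation='relu', input_shape=(10,)),
    tf.keras.layers.Dense(2, activation='softmax'),
])

print(model.output_shape)      # (None, 2)
model.summary()                # layer-by-layer table with parameter counts
config = model.get_config()    # dict describing the architecture
weights = model.get_weights()  # list of numpy arrays (kernels and biases)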
This exercise is from an intro to deep learning assignment. It uses bag-of-words to represent a tweet. How can I use word embeddings to achieve the same? While playing around with the word2vec tool, I came across the following questions:
(i) How do I obtain pre-trained embeddings to represent these tweets (i.e. use word2vec directly instead of training embedding vectors on these tweets)? How do I use word2vec with such a pre-trained model?
(ii) How do I train a TensorFlow two-hidden-layer architecture once we obtain embeddings from word2vec (the input dimensions will change due to embedding_size)? Or, continuing from the previous bag-of-words model, what additional changes are needed due to the embeddings?
Previously it was (a code sketch of this architecture follows the questions below):
input dimension: (None, vocab_size)
layer-1: (input_data * weights_1) + biases_1
layer-2: (layer_1 * weights_2) + biases_2
output layer: (layer_2 * weights_out) + biases_out
output dimension: (None, n_classes)
(iii) Is it necessary to obtain embeddings for the given tweet data by training word2vec from scratch? How do I train around 14k tweets using word2vec (not Gensim or GloVe)? Will word2vec treat # as a stop word during preprocessing?
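For reference, here is a sketch of the bag-of-words architecture described in (ii); vocab_size, the hidden sizes, and n_classes are placeholder values of mine:
import tensorflow as tf

vocab_size, hidden_1, hidden_2, n_classes = 10000, 256, 128, 3  # placeholders

model = tf.keras.Sequential([
    tf.keras.Input(shape=(vocab_size,)),                      # (None, vocab_size)
    tf.keras.layers.Dense(hidden_1, activation='relu'),       # layer-1: x*W1 + b1
    tf.keras.layers.Dense(hidden_2, activation='relu'),       # layer-2: h1*W2 + b2
    tf.keras.layers.Dense(n_classes, activation='softmax'),   # output: (None, n_classes)
])
As the question itself notes, switching from bag-of-words to word2vec mainly changes the input dimension (embedding_size, or a sequence of embeddings followed by pooling, instead of vocab_size).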