TensorFlow Keras: LSTM input dimension not as expected

I'm trying to complete the LSTM music-composition example using TensorFlow from
https://www.datacamp.com/tutorial/using-tensorflow-to-compose-music
I've gotten as far as the LSTM model, but a dimension error is raised for the inputs. I've used the code provided in the tutorial. A new training set is created, but it is not converted the way it was for the autoencoder models in the earlier examples.
This piece of code is not included in the preparation step for the LSTM model:
# Convert to one-hot encoding and swap chord and sequence dimensions
trainChords = tf.keras.utils.to_categorical(trainChords).transpose(0,2,1)
# Convert data to a NumPy array of floats (np.float was removed in recent NumPy versions)
trainChords = np.array(trainChords, dtype=float)
# Flatten sequence of chords into single dimension
trainChordsFlat = trainChords.reshape(nSamples, nChordsSequence)
What do these steps do? Are they also required for the LSTM model?
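For reference, here is a toy shape-tracing sketch (mine, not from the tutorial) of what those three steps appear to do, assuming 2 samples, 3 chords per sequence, and 4 possible chord values:
import numpy as np
import tensorflow as tf

toy = np.array([[0, 2, 1],
                [3, 1, 0]])                       # (nSamples=2, sequenceLength=3)
one_hot = tf.keras.utils.to_categorical(toy)      # (2, 3, 4): one one-hot vector per chord
swapped = one_hot.transpose(0, 2, 1)              # (2, 4, 3): chord and sequence axes swapped
flat = swapped.reshape(2, 4 * 3)                  # (2, 12): each sample flattened to one row
print(one_hot.shape, swapped.shape, flat.shape)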

Related

What do the embedding elements stand for in the HuggingFace BERT model?

Prior to passing my tokens through the encoder in the BERT model, I would like to perform some processing on their embeddings. I extracted the embedding weights using:
from transformers import TFBertModel
# Load a pre-trained BERT model
model = TFBertModel.from_pretrained('bert-base-uncased')
# Get the embedding layer of the model
embedding_layer = model.get_layer('bert').get_input_embeddings()
# Extract the embedding weights
embedding_weights = embedding_layer.get_weights()
I found that it contains 5 elements.
In my understanding, the first three elements are the word embedding weights, token type embedding weights, and positional embedding weights. My question is: what do the last two elements stand for?
I dug into the source code of the BERT model, but I cannot figure out the meaning of the last two elements.
In the BERT model there is a post-processing step on the embedding tensor that applies layer normalization followed by dropout:
https://github.com/google-research/bert/blob/eedf5716ce1268e56f0a50264a88cafad334ac61/modeling.py#L362
I think those two arrays are the gamma and beta of the layer normalization, https://www.tensorflow.org/api_docs/python/tf/keras/layers/LayerNormalization
They are learned parameters and span the axes of the inputs specified in the axis parameter, which defaults to -1 (corresponding to the hidden size of 768 in the embedding tensor).
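To check this, here is a minimal sketch built on the snippet above (the exact variable names and their order may differ by transformers version) that prints the name and shape of each embedding weight:
from transformers import TFBertModel

model = TFBertModel.from_pretrained('bert-base-uncased')
embedding_layer = model.get_layer('bert').get_input_embeddings()

# .weights returns the same variables whose values get_weights() returns, in the same order
for variable in embedding_layer.weights:
    print(variable.name, variable.shape)

# Expected (roughly): word embeddings (30522, 768), token type embeddings (2, 768),
# position embeddings (512, 768), then LayerNorm gamma (768,) and beta (768,)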

How to get intermediate layers' output of pre-trained BERT model in HuggingFace Transformers library?

(I'm following this PyTorch tutorial about BERT word embeddings, and in the tutorial the author accesses the intermediate layers of the BERT model.)
What I want is to access the last, let's say, 4 layers of a single input token of the BERT model in TensorFlow 2 using HuggingFace's Transformers library. Because each layer outputs a vector of length 768, the last 4 layers will have a shape of 4*768=3072 (for each token).
How can I implement this in TF/Keras/TF2 to get the intermediate layers of the pretrained model for an input token? (Later I will try to get the vectors for each token in a sentence, but for now one token is enough.)
I'm using the HuggingFace's BERT model:
!pip install transformers
from transformers import (TFBertModel, BertTokenizer)
bert_model = TFBertModel.from_pretrained("bert-base-uncased") # Automatically loads the config
bert_tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
sentence_marked = "hello"
tokenized_text = bert_tokenizer.tokenize(sentence_marked)
indexed_tokens = bert_tokenizer.convert_tokens_to_ids(tokenized_text)
print (indexed_tokens)
>> prints [7592]
The output is a token ID ([7592]), which should be the input for the BERT model.
The third element of the BERT model's output is a tuple that consists of the output of the embedding layer as well as the intermediate layers' hidden states. From the documentation:
hidden_states (tuple(tf.Tensor), optional, returned when config.output_hidden_states=True):
tuple of tf.Tensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
For the bert-base-uncased model, config.output_hidden_states is True by default. Therefore, to access the hidden states of the 12 intermediate layers, you can do the following:
outputs = bert_model(input_ids, attention_mask)
hidden_states = outputs[2][1:]
There are 12 elements in the hidden_states tuple, corresponding to all the layers from the first to the last, and each of them is a tensor of shape (batch_size, sequence_length, hidden_size). So, for example, to access the hidden state of the third layer for the fifth token of all the samples in the batch, you can do: hidden_states[2][:, 4].
Note that if the model you are loading does not return the hidden states by default, you can load the config using the BertConfig class and pass the output_hidden_states=True argument, like this:
config = BertConfig.from_pretrained("name_or_path_of_model",
                                    output_hidden_states=True)
bert_model = TFBertModel.from_pretrained("name_or_path_of_model",
                                         config=config)
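Putting the pieces together, here is a minimal end-to-end sketch (in newer transformers versions the hidden states may need to be read as outputs.hidden_states rather than outputs[2]) of concatenating the last four layers into the 4*768=3072-dimensional vector the question asks for:
import tensorflow as tf
from transformers import BertConfig, BertTokenizer, TFBertModel

config = BertConfig.from_pretrained("bert-base-uncased", output_hidden_states=True)
bert_tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert_model = TFBertModel.from_pretrained("bert-base-uncased", config=config)

token_ids = bert_tokenizer.convert_tokens_to_ids(bert_tokenizer.tokenize("hello"))
input_ids = tf.constant([token_ids])              # add a batch dimension: shape (1, 1)

outputs = bert_model(input_ids)
all_states = outputs[2]                           # tuple of 13 tensors: embeddings + 12 layers
last_four = tf.concat(all_states[-4:], axis=-1)   # (batch_size, sequence_length, 4 * 768)
print(last_four.shape)                            # (1, 1, 3072)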

How to correct shape of Keras input into a 3D array

I have a Keras model that fails with this error when I fit it:
> kerasInput = Input(shape=(None, 47))
> LSTM(..)(kerasInput)
...
> model.fit(realInput, ...)
ValueError: Error when checking input: expected input_1 to have 3 dimensions, but got array with shape (10842, 1)
When looking at my input I found it has a shape of (10842, 1), but each row is actually a list of lists. I can verify this with:
> pd.DataFrame(realInput[0]).shape
(260, 47)
How could I correct my input shape?
When trying with the Keras Reshape layer, the creation of the model fails with:
Model inputs must come from `keras.layers.Input` (thus holding past layer metadata), they cannot be the output of a previous non-Input layer. Here, a tensor specified as input to your model was not an Input tensor, it was generated by layer reshape_8.
Note that input tensors are instantiated via `tensor = keras.layers.Input(shape)`.
The tensor that caused the issue was: reshape_8/Reshape:0
You can use the numpy.expand_dims method to convert the shape to 3-D.
import numpy as np
# Adds a new leading axis: (10842, 1) becomes (1, 10842, 1)
realInput = np.expand_dims(realInput, axis=0)
Use the Keras Reshape layer: https://keras.io/layers/core/#reshape
or reshape the array so the third dimension is 1:
# Something similar to this, with numpy imported as np
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))
Edit: Added np.reshape method
Refer this repository: https://github.com/NilanshBansal/Stock_Price_Prediction/blob/master/Stock_Price_Prediction_20_days_later_4_LSTM.ipynb
As I said before in the comments, you will need to make sure to reshape your data to match what the LSTM expects to receive, and also make sure the input_shape is set correctly.
I found this post quite helpful when I struggled with inputting to an LSTM layer. I hope it helps you too: Reshape input for LSTM
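For completeness, a minimal sketch of my own for this specific case, under the assumption (suggested by the pd.DataFrame check above) that each realInput[i] is a nested (260, 47) list and all sequences have the same length:
import numpy as np

sequences = [np.asarray(realInput[i], dtype=np.float32) for i in range(len(realInput))]
realInput3d = np.stack(sequences)      # (10842, 260, 47): (samples, timesteps, features)
print(realInput3d.shape)

# The model input would then be declared as Input(shape=(260, 47)) to match.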

MultiClass Keras Classifier prediction output meaning

I have a Keras classifier built using the Keras wrapper of the Scikit-Learn API. The neural network has 10 output nodes, and the training data is all represented using one-hot encoding.
According to Tensorflow documentation, the predict function outputs a shape of (n_samples,). When I fitted 514541 samples, the function returned an array with shape (514541, ), and each entry of the array ranged from 0 to 9.
Since I have ten different outputs, does the numerical value of each entry correspond exactly to the result that I encoded in my training matrix?
i.e. if index 5 of my one-hot encoding of y_train represents "orange", does a prediction value of 5 mean that the neural network predicted "orange"?
Here is a sample of my model:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

model = Sequential()
model.add(Dropout(0.2, input_shape=(32,)))
model.add(Dense(21, activation='selu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))
There are some issues with your question.
The neural network has 10 output nodes, and the training data is all represented using one-hot encoding.
Since your network has 10 output nodes and your labels are one-hot encoded, your model's output should also be 10-dimensional, i.e. of shape (n_samples, 10). Moreover, since you use a softmax activation in your final layer, each element of the 10-dimensional output will be in [0, 1] and can be interpreted as the probability of the sample belonging to the respective (one-hot encoded) class.
According to Tensorflow documentation, the predict function outputs a shape of (n_samples,).
It's puzzling why you refer to TensorFlow, while your model is clearly a Keras one; you should refer to the predict method of the Keras Sequential API.
When I fitted 514541 samples, the function returned an array with shape (514541, ), and each entry of the array ranged from 0 to 9.
If something like that happens, it must be due to a later part of your code that you do not show here; in any case, the idea would be to find the element with the highest value in each 10-dimensional network output (since the outputs are interpreted as probabilities, the element with the highest value is the most probable class). In other words, somewhere in your code there must be something like this:
pred = model.predict(x_test)
y = np.argmax(pred, axis=1) # numpy must have been imported as np
which will give an array of shape (n_samples,), with each element of y being an integer between 0 and 9, as you report.
i.e. if index 5 of my one-hot encoding of y_train represents "orange", does a prediction value of 5 mean that the neural network predicted "orange"?
Provided that the above hold, yes.
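As an illustration, here is a small sketch with made-up class names (the actual names depend on how the one-hot columns of y_train were assigned) showing the mapping from argmax indices back to labels:
import numpy as np

# Hypothetical class names; index 5 corresponds to "orange" as in the question
class_names = ["apple", "banana", "cherry", "date", "fig",
               "orange", "plum", "kiwi", "mango", "pear"]

pred = model.predict(x_test)                     # shape (n_samples, 10), softmax probabilities
y = np.argmax(pred, axis=1)                      # shape (n_samples,), integers 0..9
predicted_labels = [class_names[i] for i in y]   # a predicted value of 5 decodes to "orange"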

Why does the Keras to_categorical method not return a 3-D tensor when given a 2-D tensor?

I was trying to build an LSTM neural net with Keras to predict tags for words in a set of sentences.
The implementation is all pretty straightforward, but the surprising thing was that, given exactly the same and otherwise correctly implemented code, using TensorFlow 1.4.0 with Keras running on the TensorFlow backend, on some people's computers it returned tensors with the wrong dimensions, while for others it worked perfectly.
The problem occurred in the following context:
First, we turned the list of training sentences (each sentence being a list of word indices) into a 2-D matrix using the pad_sequences method from Keras (https://keras.io/preprocessing/sequence/):
def do_padding(sequences, length, padding_value):
    return pad_sequences(sequences, maxlen=length, padding='post',
                         truncating='post', value=padding_value)

train_sents_padded = do_padding(train_sents, MAX_LENGTH,
                                word_to_id[PAD_TOKEN])
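As an aside, a tiny sketch of my own (values made up) of what do_padding produces, namely a 2-D matrix of shape (number of sentences, MAX_LENGTH):
from tensorflow.keras.preprocessing.sequence import pad_sequences  # keras.preprocessing.sequence in standalone Keras

sents = [[4, 7, 2], [9, 1]]            # two sentences as word indices
padded = pad_sequences(sents, maxlen=4, padding='post', truncating='post', value=0)
print(padded)                          # [[4 7 2 0]
                                       #  [9 1 0 0]]
print(padded.shape)                    # (2, 4)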
Next, we used our do_padding method on the corresponding training labels to turn them into a padded matrix. At the same time, we used the Keras to_categorical method (https://keras.io/utils/#to_categorical) to add a one-hot encoded vector to the created label matrix (one one-hot vector for each cell in the matrix, i.e. for each word in each training sentence):
train_labels_padded = to_categorical(do_padding(train_labels, MAX_LENGTH,
                                                label_to_id["O"]), NUM_LABELS)
We expected the resulting shape to be 3-D: (len(train_labels), MAX_LENGTH, NUM_LABELS). Yet we found that the resulting shape was 2-D and basically looked like this: ((len(train_labels) * MAX_LENGTH), NUM_LABELS), meaning the two expected dimensions len(train_labels) and MAX_LENGTH were multiplied together and flattened into one dimension.
Interestingly, as said before, this problem only occurred for about 50% of the people, all using TensorFlow 1.4.0 and Keras running on the TensorFlow backend.
We managed to solve the problem by reshaping the label matrix manually:
train_labels_padded = np.reshape(train_labels_padded, (len(train_labels),
                                                       MAX_LENGTH, NUM_LABELS))
I was just wondering if any of you have experienced a similar problem and have figured out the reason why this happens.
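For what it's worth, here is a minimal sketch of what we expected; with a recent TensorFlow/Keras, to_categorical on a 2-D integer matrix does return a 3-D tensor, and the manual reshape below is only needed in setups where it comes back flattened:
import numpy as np
from tensorflow.keras.utils import to_categorical

MAX_LENGTH, NUM_LABELS = 4, 3
labels_padded = np.array([[0, 1, 2, 0],
                          [2, 2, 1, 0]])                 # shape (2, MAX_LENGTH)
one_hot = to_categorical(labels_padded, NUM_LABELS)      # expected shape (2, MAX_LENGTH, NUM_LABELS)
if one_hot.ndim == 2:                                    # the flattened case: (2 * MAX_LENGTH, NUM_LABELS)
    one_hot = one_hot.reshape(len(labels_padded), MAX_LENGTH, NUM_LABELS)
print(one_hot.shape)                                     # (2, 4, 3)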