I am fairly new to TF, Keras and ML in general.
I am trying to implement a very simple MLP with an input shape of (batch_size,3,2) and an output shape of (batch_size,3), that is (if I got it right): for every 3x2 feature, there is a corresponding 3 value array label.
Here is how I create the model:
model = tf.keras.Sequential([
tf.keras.layers.Dense(50, tf.keras.activations.relu, input_shape=(3, 2)),
tf.keras.layers.Dense(3)
])
and these are the X and y shapes:
X_train.shape,y_train.shape
TensorShape([64,3,2]),TensorShape([64,3])
On model.fit I am facing a weird error I cannot understand:
ValueError: Dimensions must be equal, but are 3 and 32 for ... with input shapes: [32,3,3] and [32,3]
I have no clue what's going on. I understand the batch size is 32, but where does that [32,3,3] come from?
Moreover, if from the original 64, I lower the number (shapes) of X_train and y_train, say, to: (19,3,2) and (19,3), I get the following error instead:
InvalidArgumentError: required broadcastable shapes at loc(unknown)
What's even more weird for me is that if I specify a single unit for the output (last) layer, instead of 3 like this:
model = tf.keras.Sequential([
tf.keras.layers.Dense(50, tf.keras.activations.relu, input_shape=(3, 2)),
tf.keras.layers.Dense(1)
])
model.fit works, but the predictions have shape (1,3,1) instead of my expected (3,)
I am very confused.
Whenever you are unsure how data flows through your model, call model.summary() to see the output shape of each layer.
In this case each input sample is a 2D array and each label is a 1D array, but the model contains only Dense layers. A Dense layer acts only on the last axis of its input and leaves the other axes untouched, so a (3, 2) sample comes out as (3, units) rather than (units,). This is also why you cannot feed an image directly to a Dense layer when you expect one output vector per image. Either use layers designed for the extra axes, such as Conv2D, or Flatten the input (make it 1D) before the Dense layer. Otherwise the extra dimension reappears in the output.
In general: if the input and output ranks differ, the shape has to change somewhere in the model. The most common ways to do that are a Flatten layer or a pooling layer such as GlobalAveragePooling.
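To see where the [32,3,3] in the error comes from, here is a small sketch, assuming TensorFlow 2.x with tf.keras (the variable names three_units and one_unit are mine, not from the question). It builds the two models from the question without any Flatten layer and prints their output shapes:

```python
import tensorflow as tf

# Same layers as in the question, applied directly to (3, 2) samples.
three_units = tf.keras.Sequential([
    tf.keras.Input(shape=(3, 2)),
    tf.keras.layers.Dense(50, activation="relu"),
    tf.keras.layers.Dense(3),
])

# Variant with a single output unit, as in the second attempt.
one_unit = tf.keras.Sequential([
    tf.keras.Input(shape=(3, 2)),
    tf.keras.layers.Dense(50, activation="relu"),
    tf.keras.layers.Dense(1),
])

# Dense only transforms the last axis, so the middle axis of size 3 survives.
print(three_units.output_shape)  # (None, 3, 3)
print(one_unit.output_shape)     # (None, 3, 1)
```

With a batch of 32, the first model predicts a [32,3,3] tensor that cannot be compared element-wise with [32,3] labels, while the second one predicts [32,3,1], which matches the (1,3,1) prediction shape seen for a single sample.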
If you want one output vector per sample, the input should be flattened before it reaches the Dense layers. There are two ways to do this:
Way 1: Add a Flatten layer as the first layer of your model:
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Flatten
model = Sequential()
model.add(Flatten(input_shape=(3,2)))
model.add(Dense(50, 'relu'))
model.add(Dense(3))
Way 2: Flatten each 3x2 sample to a vector of 6 values before passing the inputs to your model, keeping the batch axis:
X_train = tf.reshape(X_train, shape=(-1, 6))
or, equivalently,
X_train = tf.reshape(X_train, shape=[-1, 6])
Then change the input shape of the first layer to:
model.add(Dense(50, 'relu', input_shape=(6,)))
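One detail worth spelling out for Way 2: the reshape has to keep the batch axis, because a target shape of just (6,) would try to squeeze the whole batch into six values. A minimal sketch of the idea, using NumPy as a stand-in for tf.reshape and a made-up X_train of the same shape as in the question:

```python
import numpy as np

# Hypothetical stand-in for X_train from the question: 64 samples of shape (3, 2).
X_train = np.arange(64 * 3 * 2, dtype=np.float32).reshape(64, 3, 2)

# -1 lets NumPy infer the batch size, so each sample becomes one row of 6 features.
X_flat = X_train.reshape(-1, 6)
print(X_flat.shape)  # (64, 6)

# The first flattened sample is the first 3x2 sample read row by row.
print(X_flat[0])  # [0. 1. 2. 3. 4. 5.]
```

The same (-1, 6) target shape should work with tf.reshape on a tensor.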
I'm learning TensorFlow 2 by working through the "text classification with TF Hub" tutorial, which uses an embedding module from TF Hub. I was wondering if I could modify the model to include an LSTM layer. Here's what I've tried:
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_datasets as tfds
train_data, validation_data, test_data = tfds.load(
name="imdb_reviews",
split=('train[:60%]', 'train[60%:]', 'test'),
as_supervised=True)
embedding = "https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1"
hub_layer = hub.KerasLayer(embedding, input_shape=[],
dtype=tf.string, trainable=True)
model = tf.keras.Sequential()
model.add(hub_layer)
model.add(tf.keras.layers.Embedding(10000, 50))
model.add(tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)))
model.add(tf.keras.layers.Dense(64, activation='relu'))
model.add(tf.keras.layers.Dense(1))
model.summary()
model.compile(optimizer='adam',
loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
metrics=['accuracy'])
history = model.fit(train_data.shuffle(10000).batch(512),
epochs=10,
validation_data=validation_data.batch(512),
verbose=1)
results = model.evaluate(test_data.batch(512), verbose=2)
for name, value in zip(model.metrics_names, results):
    print("%s: %.3f" % (name, value))
I don't know how to get the vocabulary size from the hub_layer, so I just put 10000 there. When I run it, it throws this exception:
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[480,1] = -6 is not in [0, 10000)
[[node sequential/embedding/embedding_lookup (defined at .../learning/tensorflow/text_classify.py:36) ]] [Op:__inference_train_function_36284]
Errors may have originated from an input operation.
Input Source operations connected to node sequential/embedding/embedding_lookup:
sequential/embedding/embedding_lookup/34017 (defined at Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/contextlib.py:112)
Function call stack:
train_function
I am stuck here. My questions are:
How should I use the embedding module from TF Hub to feed an LSTM layer? It looks like the embedding lookup has some issues with this setup.
How do I get the vocabulary size from the hub layer?
Thanks
I finally figured out how to link pre-trained embeddings to an LSTM or other layers. I am posting the steps here in case anyone finds them helpful.
The embedding layer has to be the first layer in the model (hub_layer plays the role of the Embedding layer). The unintuitive part is that any text input to the hub layer is converted to a single vector of shape [embedding_dim]. You need to do sentence splitting and tokenization so that the input to the model is a sequence in the form of an array of arrays, e.g. "Let us prepare the data." should be converted to [["let"],["us"],["prepare"],["the"],["data"]]. You will also need to pad the sequences if you are using batch mode.
In addition, you will need to convert your target tokens to int if your training labels are strings. The input to the model is an array of strings with shape [batch, seq_length]; the hub embedding layer converts it to [batch, seq_length, embed_dim]. (If you add an LSTM or other RNN layer with return_sequences=True, the output from that layer is [batch, seq_length, rnn_units].) The output dense layer will output the index of the text instead of the actual text. The index of each token is stored in the downloaded tfhub directory as "tokens.txt". You can load the file and convert text to the corresponding index; otherwise you cannot compute the loss.
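The tokenize-and-pad step described above could look roughly like this in plain Python. This is only a sketch: the helper names tokenize and pad_batch and the "<pad>" token are hypothetical choices of mine, and whether a given hub module treats "<pad>" sensibly is an assumption you would need to check.

```python
def tokenize(sentence):
    """Lowercase, drop basic punctuation, and split on whitespace."""
    cleaned = "".join(ch for ch in sentence.lower() if ch.isalnum() or ch.isspace())
    return cleaned.split()


def pad_batch(token_lists, pad_token="<pad>"):
    """Right-pad every token list to the length of the longest one."""
    max_len = max(len(tokens) for tokens in token_lists)
    return [tokens + [pad_token] * (max_len - len(tokens)) for tokens in token_lists]


sentences = ["Let us prepare the data.", "Pad me."]
batch = pad_batch([tokenize(s) for s in sentences])
for row in batch:
    print(row)
# ['let', 'us', 'prepare', 'the', 'data']
# ['pad', 'me', '<pad>', '<pad>', '<pad>']
```

The result is a rectangular [batch, seq_length] array of strings, which is the layout the paragraph above says the model input should have.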
I am training on a dataset in which (some of) the neighboring features exhibit very strong correlations. In order to help the neural network, I am thinking of adding some 1D convolutions as the first layers. Even though 1D convolutions are mostly used for time series/NLP data, I see no theoretical reason why they cannot be used vector-wise on any type of data.
But I am not able to make keras.layers.Conv1D work, since it is apparently designed for time-series data. A minimal reproducible example is the following:
model = keras.Sequential([
keras.layers.Input(10,),
keras.layers.Conv1D(filters=64, kernel_size=3, activation='relu', name="conv_1"),
keras.layers.Dense(128, activation='relu'),
keras.layers.Dense(128, activation='relu'),
keras.layers.Dense(2, activation='softmax')
])
model.compile(optimizer='adam', loss=losses.categorical_crossentropy, metrics=['accuracy'])
ValueError: Input 0 of layer conv_1 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: [None, 10]
In that, I believe the "found ndim=2" corresponds to a tensor of [batch_size, 10] while it expects a tensor of shape [series_length, batch_size, 10] (or some other way around).
My question is: Is there a way to make 1D convolutions work in this situation in keras?
Note 1: this SO question has the same problem, though without elaborating and the accepted answer does not solve the problem.
Note 2: I suppose I can convert each datapoint of my dataset to a 2D tensor of two rows where the second would be just 0's and use Conv2D's, but I would like to avoid that.
Every Conv layer in Keras reserves one dimension for the number of channels. For example, an image has 2 spatial dimensions, but Conv2D needs 3 dimensions (not counting the batch), simply because the image can have one channel (grayscale) or, for example, 3 (color). The same is true for a 1D signal, which can have any number of channels. You can simply add one dimension to your data. If you have a NumPy array:
data = data[:, np.newaxis, :]
and set channels_first:
keras.layers.Conv1D(filters=64, kernel_size=3, activation='relu', name="conv_1", data_format="channels_first")
You can do the same by adding the extra dimension at the end, data = data[:, :, np.newaxis], and setting data_format="channels_last" (the default). In both cases the Input shape has to change accordingly, to (1, 10) or (10, 1).
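A quick way to check which axis ends up where is to look at the shapes in NumPy. This sketch assumes a made-up array of 5 samples with 10 features each; the variable names are mine:

```python
import numpy as np

data = np.zeros((5, 10), dtype=np.float32)  # hypothetical: 5 samples, 10 features

channels_last = data[:, :, np.newaxis]   # (batch, steps, channels)
channels_first = data[:, np.newaxis, :]  # (batch, channels, steps)
print(channels_last.shape)   # (5, 10, 1)
print(channels_first.shape)  # (5, 1, 10)

# With "valid" padding and stride 1, a kernel of size 3 sliding over
# 10 steps fits in 10 - 3 + 1 positions.
kernel_size = 3
print(data.shape[1] - kernel_size + 1)  # 8
```

So with kernel_size=3 and the default padding, the convolution output should have 8 steps along the feature axis.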
Let us say that I build an extremely simple CNN with Keras to classify vectors.
My input (X_train) is a matrix in which each row is a vector and each column is a feature. My labels (y_train) are a matrix in which each row is a one-hot encoded vector. This is a binary classifier.
My CNN is built as follows:
model = Sequential()
model.add(Conv1D(64,3))
model.add(Activation('relu'))
model.add(Flatten())
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dense(2))
model.add(Activation('sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train,y_train,batch_size = 32)
But when I try to run this code, I get back this error message:
Input 0 is incompatible with layer conv1d_23: expected ndim=3, found
ndim=2
Why would Keras expect 3 dims? I have one dim for samples and one for features. And more importantly, how can I fix this?
X_train is supposed to have the shape (batch_size, steps, input_dim); see the documentation. It seems like you are missing one of the dimensions.
I would guess input_dim in your case is 1 and that is why it is missing. If so, change the model.fit line to
model.fit(tf.expand_dims(X_train,-1), y_train,batch_size = 32)
Your code is not a minimal working example, so I am not able to verify if that is the only problem, but this should hopefully fix your current error message.
A Conv1D layer expects an input with shape (samples, width, channels), so this does not match your input data, producing an error.
The convolution operation is done on the width dimension, so assuming that you want to do convolution on what you call features, then you should reshape your data to add a dummy channels dimension with a value of one:
X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))
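Here is a minimal sketch of that fix end to end, assuming TensorFlow 2.x with tf.keras and a made-up X_train of 100 samples with 20 features. It also checks that the reshape and expand_dims variants produce the same array:

```python
import numpy as np
import tensorflow as tf

# Hypothetical data: 100 samples, 20 features each.
X_train = np.random.rand(100, 20).astype("float32")

# Add a dummy channels axis of size 1: (samples, width) -> (samples, width, 1).
X_3d = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))
same_as_expand_dims = np.array_equal(X_3d, np.expand_dims(X_train, -1))

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20, 1)),
    tf.keras.layers.Conv1D(64, 3, activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(2, activation="sigmoid"),
])

# Forward pass on four samples to confirm the shapes line up.
preds = model(X_3d[:4])
print(preds.shape)  # (4, 2)
```

Declaring the input shape up front with tf.keras.Input is my addition; it makes a shape mismatch show up when the model is built rather than when fit is called.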
I want to feed a sparse tensor into a dense layer
inputs1 = tf.sparse_placeholder(tf.float32, shape=[None, 500], name='input1')
model1 = tf.layers.dense(inputs=inputs1, units=128, name='dense1')
When I execute this I get the following error
ValueError: The last dimension of the inputs to `Dense` should be defined. Found `None`
If I change sparse_placeholder to a regular placeholder I don't get this error.
I recommend you use FeatureColumn when you try to do this. First create a column representing your sparse tensor, then build an input layer. Finally, feed this input layer to your dense layer. This makes your intention explicit in the code: do you want this to be a one-hot tensor? Do you want embeddings? And so on.
I would like to feed a neural net inputs of the following shape:
Each training entry is a 2D array with dimensions 700x10. There are 204 training entries in total.
The labels are just a 1-dimensional array of size 204 (binary output).
I tried to just use Dense layers:
model = Sequential()
model.add(Dense(300, activation='relu', input_shape=(700, 10)))
model.add(Dense(1, activation='sigmoid'))
But then I get the following error (not related to input_shape on the first layer, but raised during validation of the output):
ValueError: Error when checking target: expected dense_2 to have 3 dimensions, but got array with shape (204, 1)
204 - amount of training data.
Stacktrace:
model.fit(xTrain, yTrain, epochs=4, batch_size=6)
File "keras\models.py", line 867, in fit
initial_epoch=initial_epoch)
File "keras\engine\training.py", line 1522, in fit
batch_size=batch_size)
File "keras\engine\training.py", line 1382, in _standardize_user_data
exception_prefix='target')
File "keras\engine\training.py", line 132, in _standardize_input_data
What I found out while debugging Keras code:
It fails during validation before training. It validates output array.
According to the neural network structure, the first Dense layer somehow produces a 700x1-dimensional output, and it fails afterwards, since my output is just a 1-D array with 204 entries.
How do I overcome this issue? I tried to add Flatten() after the Dense() layer, but it probably affects accuracy in a bad way: I would like to keep the information specific to each of the 700 points grouped together.
A Dense layer works on only one dimension, the last.
If you're inputting (700,10) to it, it will output (700,units). Check your model.summary() to see this.
A simple solution is to flatten your data before applying dense:
model.add(Flatten(input_shape=(700,10)))
model.add(Dense(300,...))
model.add(Dense(1,...))
This way, the Dense layer will see a simple (7000,) input.
Now if you do want your model to understand those 2 dimensions separately, you should perhaps try more elaborate structures. What to do will depend a lot on what your data is, what you want to do, how you want your model to understand it, etc.
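To compare the two layouts, here is a small sketch, assuming TensorFlow 2.x with tf.keras. The build helper and its flatten_first flag are hypothetical names of mine. It prints the output shape and parameter count with and without the Flatten layer:

```python
import tensorflow as tf


def build(flatten_first):
    """Build the Dense model from the question, optionally flattening first."""
    layers = [tf.keras.Input(shape=(700, 10))]
    if flatten_first:
        layers.append(tf.keras.layers.Flatten())  # (700, 10) -> (7000,)
    layers += [
        tf.keras.layers.Dense(300, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ]
    return tf.keras.Sequential(layers)


per_row = build(flatten_first=False)
flat = build(flatten_first=True)

# Without Flatten, Dense is applied to each of the 700 rows separately.
print(per_row.output_shape, per_row.count_params())  # (None, 700, 1) 3601
# With Flatten, every one of the 7000 inputs gets its own weight per unit.
print(flat.output_shape, flat.count_params())        # (None, 1) 2100601
```

The unflattened model shares the same 10-to-300 weights across all 700 rows, which is why it is so much smaller, but it then emits one prediction per row and so cannot match a label array of shape (204, 1).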