TensorFlow with Keras: ValueError - expected dense_84 to have 2 dimensions, but got array with shape (100, 9, 1)

I am trying to use Tensorflow through Keras to build a network that uses time-series data to predict the next value, but I'm getting this error:
ValueError: Error when checking target: expected dense_84 to have 2 dimensions, but got array with shape (100, 9, 1)
What is causing this? I've tried reshaping the data as other posts have suggested, but to no avail so far. Here is the code:
import keras
import numpy as np
import os
from keras import losses
from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten, Dropout
from keras.layers.convolutional import Conv1D, Conv2D
# note: os.path.join only builds a string and has no side effect;
# the absolute path passed to np.genfromtxt below is what locates the data
os.path.join("C:\\Users\\user\\Desktop")
# import data
data = np.genfromtxt("C:\\Users\\user\\Desktop\\aapl_blocks_10.csv",
                     delimiter=',')
# separate into inputs and outputs
X = data[:, :9]
X = np.expand_dims(X, axis=2) # reshape (409, 9) to (409, 9, 1) for network
Y = data[:, 9]
# separate into test and train data
X_train = X[:100]
X_test = X[100:]
Y_train = Y[:100]
Y_test = Y[100:]
# set parameters
batch_size = 20
# define model
model = Sequential()
model.add(Conv1D(filters=20,
                 kernel_size=5,
                 input_shape=(9, 1),
                 padding='causal'))
model.add(Flatten())
model.add(Dropout(rate=0.3))
model.add(Dense(units=10))
model.add(Activation('relu'))
model.add(Dense(units=1))
model.compile(loss=losses.mean_squared_error,
              optimizer='sgd',
              metrics=['accuracy'])
# train model
model.fit(X_train, Y_train, epochs=10, batch_size=batch_size)
# evaluate model
model.evaluate(X_test, Y_test, batch_size=batch_size)
And here is the model summary:
Layer (type) Output Shape Param #
=================================================================
conv1d_43 (Conv1D) (None, 9, 20) 120
_________________________________________________________________
flatten_31 (Flatten) (None, 180) 0
_________________________________________________________________
dropout_14 (Dropout) (None, 180) 0
_________________________________________________________________
dense_83 (Dense) (None, 10) 1810
_________________________________________________________________
activation_29 (Activation) (None, 10) 0
_________________________________________________________________
dense_84 (Dense) (None, 1) 11
=================================================================
Total params: 1,941
Trainable params: 1,941
Non-trainable params: 0
If there's a proper way to format the data, or perhaps a proper way to stack these layers, I would love to know.

I suspect you need to squeeze the channel dimension from the output, i.e. the labels have shape (batch_size, 9) and you're comparing that against the output of a dense layer with 1 channel, which has shape (batch_size, 9, 1). Solution: squeeze/flatten before calculating the loss.
...
model.add(Activation('relu'))
model.add(Dense(units=1))
model.add(Flatten())
model.compile(loss=losses.mean_squared_error,
              optimizer='sgd',
              metrics=['accuracy'])
A note on squeeze vs Flatten: in this case, the result of squeezing (removing an axis of dimension 1) and flattening (turning something of shape (batch_size, n, m, ...) into shape (batch_size, n*m*...)) will be the same. Squeeze might be slightly more appropriate here, since if you accidentally squeeze an axis whose dimension is not 1 you'll get an error (a good thing), as opposed to having your program run with unexpected behaviour. I don't use Keras much, though, and couldn't find a 'Squeeze' layer - just a backend squeeze function - and I'm not entirely sure how to integrate it.
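One way to wire that squeeze function in (a hedged sketch of my own, not from the original answer): wrap the backend function in a Lambda layer, which removes the trailing axis of size 1.
from keras.layers import Lambda
import keras.backend as K
model.add(Lambda(lambda t: K.squeeze(t, axis=-1)))  # (batch, 9, 1) -> (batch, 9)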

Related

Dimension of output in Dense layer Keras

I have the following sample model:
from tensorflow.keras import models
from tensorflow.keras import layers
sample_model = models.Sequential()
sample_model.add(layers.Dense(32, input_shape=(4,)))
sample_model.add(layers.Dense(16, input_shape=(44,)))
sample_model.compile(loss="binary_crossentropy",
                     optimizer="adam", metrics=["accuracy"])
Input for the model:
sam_x = np.random.rand(10,4)
sam_y = np.array([0,1,1,0,1,0,0,1,0,1,])
sample_model.fit(sam_x,sam_y)
The confusion is that fit should have thrown a shape-mismatch error, since the expected input shape for the 2nd Dense layer is given as (None, 44) while the output of the 1st Dense layer (which is the input of the 2nd Dense layer) has shape (None, 32). Yet it ran successfully.
I don't understand why there was no error. Any clarification will be helpful.
The input_shape keyword argument has an effect only on the first layer of a Sequential model. The input shape of every other layer is derived from its previous layer.
That behaviour is hinted at in the docs of tf.keras.layers.InputLayer:
When using InputLayer with Keras Sequential model, it can be skipped by moving the input_shape parameter to the first layer after the InputLayer.
And in the Sequential Model guide.
The behaviour can be confirmed by looking at the source of the Sequential.add method:
if not self._layers:
    if isinstance(layer, input_layer.InputLayer):
        # Case where the user passes an Input or InputLayer layer via `add`.
        set_inputs = True
    else:
        batch_shape, dtype = training_utils.get_input_shape_and_dtype(layer)
        if batch_shape:
            # Instantiate an input layer.
            x = input_layer.Input(
                batch_shape=batch_shape, dtype=dtype, name=layer.name + '_input')
            # This will build the current layer
            # and create the node connecting the current layer
            # to the input layer we just created.
            layer(x)
            set_inputs = True
If there are no layers in the model yet, an Input layer is added to the model with its shape derived from the first layer added. This is done only if no layer is present yet in the model.
That shape is either fully known (if input_shape has been passed to the first layer of the model) or will be fully known once the model is built (for example, with a call to model.build(input_shape)).
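For instance, here is a minimal sketch (the variable names are mine) of the deferred case: no input_shape is given anywhere, so the weights are only created once build() supplies the shape.
from tensorflow.keras import layers, models
deferred = models.Sequential([layers.Dense(32), layers.Dense(16)])
deferred.build(input_shape=(None, 4))  # the input shape becomes fully known here
deferred.summary()  # kernel shapes are now (4, 32) and (32, 16)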
The thing is that, after taking the input shape from the first layer, Keras won't check or use any other input shape declared inside that same model. For example, if you write your model the following way:
sample_model.add(layers.Dense(32, input_shape=(4,)))
sample_model.add(layers.Dense(16, input_shape=(44,)))
sample_model.add(layers.Dense(8, input_shape=(32,)))
The program will always use the first declared input shape and discard the rest. So, if you start your first layer with input_shape=(44,), you need to pass input with exactly that number of features, such as:
sam_x = np.random.rand(10,44)
sam_y = np.array([0,1,1,0,1,0,0,1,0,1,])
sample_model.fit(sam_x,sam_y)
Additionally, if you look at the Functional API: unlike with the Sequential model, you must create and define a standalone Input layer that specifies the shape of the input data. It's not learnable, but simply a spec layer - a kind of gateway for the input data into the model. That means that even if we define input_shape inside the other layers, they will all be discarded. For example:
inputs = keras.Input(shape=(4,))
dense = layers.Dense(64, input_shape=(8,))  # discarded input_shape
x = dense(inputs)
x = layers.Dense(64, input_shape=(16,))(x)  # discarded input_shape
outputs = layers.Dense(10)(x)
model = keras.Model(inputs=inputs, outputs=outputs, name="mnist_model")
Here is a more complex example with Conv2D and MNIST.
encoder_input = keras.Input(shape=(28, 28, 1),)
x = layers.Conv2D(16, 3, activation="relu", input_shape=[32,32,3])(encoder_input)
x = layers.Conv2D(32, 3, activation="relu", input_shape=[64,64,3])(x)
x = layers.MaxPooling2D(3)(x)
x = layers.Conv2D(32, 3, activation="relu", input_shape=[224,321,3])(x)
x = layers.Conv2D(16, 3, activation="relu", input_shape=[420,32,3])(x)
x = layers.GlobalMaxPooling2D()(x)
out = layers.Dense(10, activation='softmax')(x)
encoder = keras.Model(encoder_input, out, name="encoder")
encoder.summary()
Model: "encoder"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_15 (InputLayer) [(None, 28, 28, 1)] 0
_________________________________________________________________
conv2d_8 (Conv2D) (None, 26, 26, 16) 160
_________________________________________________________________
conv2d_9 (Conv2D) (None, 24, 24, 32) 4640
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 8, 8, 32) 0
_________________________________________________________________
conv2d_10 (Conv2D) (None, 6, 6, 32) 9248
_________________________________________________________________
conv2d_11 (Conv2D) (None, 4, 4, 16) 4624
_________________________________________________________________
global_max_pooling2d_2 (Glob (None, 16) 0
_________________________________________________________________
dense_56 (Dense) (None, 10) 170
=================================================================
Total params: 18,842
Trainable params: 18,842
Non-trainable params: 0
def pre_process(image, label):
    return ((image / 256)[..., None].astype('float32'),
            tf.keras.utils.to_categorical(label, num_classes=10))
(x, y), (_, _) = tf.keras.datasets.mnist.load_data('mnist')
x, y = pre_process(x, y)  # scale, add a channel axis, one-hot the labels
encoder.compile(
    loss=tf.keras.losses.CategoricalCrossentropy(),
    metrics=tf.keras.metrics.CategoricalAccuracy(),
    optimizer=tf.keras.optimizers.Adam())
encoder.fit(x, y, batch_size=256)
4s 14ms/step - loss: 1.4303 - categorical_accuracy: 0.5279
I think Keras creates (or prepares to create) an additional input layer - but since the second Dense layer is added using model.add(), it is automatically connected to the layer before it, and the extra input layer stays unconnected and is not part of the model.
(I agree that it would be nice of Keras to hint at unconnected layers. I have sometimes created unconnected layers when using the functional API and then changed the inputs; Keras didn't remind me that I had skipped several layers, I just wondered why the summary() was so short...)

CRF layer implementation with BiLSTM-CRF in TensorFlow 1.15

I implemented a bidirectional Long Short-Term Memory neural network with a Conditional Random Field layer (BiLSTM-CRF) using keras & keras_contrib (the latter for implementing the CRF, which is not part of native keras functionality). The task was Named Entity Recognition classification into one of 6 classes. The input to the network is a sequence of 300-dimensional pretrained GloVe word embeddings. This is my model summary:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 648) 0
_________________________________________________________________
embedding_1 (Embedding) (None, 648, 300) 1500000
_________________________________________________________________
bidirectional_1 (Bidirection (None, 648, 1000) 3204000
_________________________________________________________________
crf_1 (CRF) (None, 648, 6) 6054
=================================================================
Now I want to implement the same model in TensorFlow 1.15. Since the keras_contrib CRF module only works in keras but not TensorFlow, I used the CRF implementation built for TensorFlow 1.X from this repo. The repo includes two nice example implementations of the CRF here, but each produces a different error when trained on my data.
Implementation 1
from tensorflow.keras.layers import Bidirectional, Embedding, LSTM, TimeDistributed
from tensorflow.keras.models import Sequential
from tf_crf_layer.layer import CRF
from tf_crf_layer.loss import crf_loss
from tf_crf_layer.metrics import crf_accuracy
MAX_WORDS = 50000
EMBEDDING_LENGTH = 300
MAX_SEQUENCE_LENGTH = 648
HIDDEN_SIZE = 512
model = Sequential()
model.add(Embedding(MAX_WORDS, EMBEDDING_LENGTH, input_length=MAX_SEQUENCE_LENGTH, mask_zero=True, weights=[embedding_matrix], trainable=False))
model.add(Bidirectional(LSTM(HIDDEN_SIZE, return_sequences=True)))
model.add(CRF(len(labels)))
model.compile('adam', loss=crf_loss, metrics=[crf_accuracy])
This is the error I get when I try to compile the model:
File "/.../tf_crf_layer/metrics/crf_accuracy.py", line 48, in crf_accuracy
crf, idx = y_pred._keras_history[:2]
AttributeError: 'Tensor' object has no attribute '_keras_history'
The error arises when computing crf_accuracy from the repo mentioned above.
def crf_accuracy(y_true, y_pred):
    """
    Get default accuracy based on CRF `test_mode`.
    """
    import pdb; pdb.set_trace()
    crf, idx = y_pred._keras_history[:2]
    if crf.test_mode == 'viterbi':
        return crf_viterbi_accuracy(y_true, y_pred)
    else:
        return crf_marginal_accuracy(y_true, y_pred)
Apparently this kind of error happens when a tensor object is not the output of a keras layer, as per this thread. Why does this error surface here?
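For illustration, a minimal sketch (my own, assuming TF 1.x-style Keras tensors) of how that bookkeeping disappears once a tensor passes through a raw TF op instead of a Keras layer:
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
inp = Input(shape=(4,))
out = Dense(2)(inp)
print(hasattr(out, '_keras_history'))  # True: produced by a Keras layer
raw = tf.identity(out)                 # a raw TF op, not a Keras layer
print(hasattr(raw, '_keras_history'))  # False: the Keras bookkeeping is gone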
Implementation 2
from tf_crf_layer.layer import CRF
from tf_crf_layer.loss import crf_loss, ConditionalRandomFieldLoss
from tf_crf_layer.metrics import crf_accuracy
from tf_crf_layer.metrics.sequence_span_accuracy import SequenceSpanAccuracy
model = Sequential()
model.add(Embedding(MAX_WORDS, EMBEDDING_LENGTH, input_length=MAX_SEQUENCE_LENGTH, mask_zero=True, weights=[embedding_matrix], trainable=False))
model.add(Bidirectional(LSTM(HIDDEN_SIZE, return_sequences=True)))
model.add(CRF(len(labels), name="crf_layer"))
model.summary()
crf_loss_instance = ConditionalRandomFieldLoss()
model.compile(loss={"crf_layer": crf_loss_instance}, optimizer='adam', metrics=[SequenceSpanAccuracy()])
Here the model compiles, but as soon as the first epoch of training begins, this error surfaces:
InvalidArgumentError: Expected begin and size arguments to be 1-D tensors of size 3, but got shapes [2] and [2] instead.
[[{{node loss_4/crf_layer_loss/Slice_1}}]]
I'm training the model using mini-batches; could that explain the error? I also noticed that my model summary for the CRF layer lacks a dimension (compare the CRF layer specification in the summary above and in the summary below), although the number of parameters for that layer is the same as above. What is causing this mismatch, and how can it be fixed?
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_5 (Embedding) (None, 648, 300) 1500000
_________________________________________________________________
bidirectional_5 (Bidirection (None, 648, 1000) 3204000
_________________________________________________________________
crf_layer (CRF) (None, 648) 6054
=================================================================

Transfer learning for video classification

How can I use pre-trained models to train a video classification model? My dataset's shape is (4000, 10, 150, 150, 1), and I'm trying to classify human actions using TimeDistributed Conv2D layers.
I can train without transfer learning, but I get poor accuracy.
What I have tried:
from keras.applications import VGG16
conv_base = VGG16(weights='imagenet',
                  include_top=False,
                  input_shape=(150, 150, 3))
model = models.Sequential()
model.add(conv_base)
model.add(TimeDistributed(Conv2D(96, (3, 3), padding='same',
                                 input_shape=x_train.shape[1:])))
model.add(TimeDistributed(Activation('relu')))
model.add(TimeDistributed(Conv2D(128, (3, 3))))
model.add(TimeDistributed(Activation('relu')))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Dropout(0.35)))
...
But I got ValueError: strides should be of length 1, 1 or 3 but was 2
Does anyone have any idea?
I'm assuming you have 10 frames for each video. This is a simple model which uses VGG16 features (GlobalAveragePooling) for each frame and an LSTM to classify the frame sequences.
You can experiment by adding a few more layers or changing hyperparameters.
N.B.: There are many inconsistencies in your model, including passing 5-d data directly to VGG16, which expects 4-d data.
from tensorflow.keras.layers import *
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.optimizers import Adam
import tensorflow as tf
import numpy as np
from tensorflow.keras.applications import VGG16

IMG_SIZE = (150, 150, 3)
num_class = 3

def create_base():
    # VGG16 backbone + global average pooling as a per-frame feature extractor
    conv_base = VGG16(weights='imagenet',
                      include_top=False,
                      input_shape=(150, 150, 3))
    x = GlobalAveragePooling2D()(conv_base.output)
    base_model = Model(conv_base.input, x)
    return base_model

conv_base = create_base()
ip = Input(shape=(10, 150, 150, 3))
t_conv = TimeDistributed(conv_base)(ip)  # vgg16 feature extractor per frame
t_lstm = LSTM(10, return_sequences=False)(t_conv)
f_softmax = Dense(num_class, activation='softmax')(t_lstm)
model = Model(ip, f_softmax)
model.summary()
Model: "model_5"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_32 (InputLayer) [(None, 10, 150, 150, 3)] 0
_________________________________________________________________
time_distributed_4 (TimeDist (None, 10, 512) 14714688
_________________________________________________________________
lstm_1 (LSTM) (None, 10) 20920
_________________________________________________________________
dense (Dense) (None, 3) 33
=================================================================
Total params: 14,735,641
Trainable params: 14,735,641
Non-trainable params: 0
_________________________________________________________________
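A hedged usage sketch to go with it (x_train and y_train are assumptions of mine, with y_train one-hot over the 3 classes; grayscale frames of shape (..., 1) would first need to be tiled to 3 channels for VGG16):
x_rgb = np.repeat(x_train, 3, axis=-1)  # (4000, 10, 150, 150, 1) -> (..., 3)
model.compile(optimizer=Adam(1e-4),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_rgb, y_train, epochs=5, batch_size=16, validation_split=0.1)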

Adapt Text Classifier Neural Net To Accept Multiple Categories

I'm trying to adjust the text classifier neural net in this Keras/TensorFlow tutorial to output multiple categories (more than 2). I think I can change the output layer to use a 'softmax' activation, but I'm not sure how to adjust the input layer.
Tutorial Link:
https://www.tensorflow.org/tutorials/keras/basic_text_classification
The tutorial is using movie review data and the only two categories are positive or negative so the model only uses an output layer with activation set to 'sigmoid'.
I have 16 categories represented using one-hot encoding.
Tutorial Example:
vocab_size = 10000
model = keras.Sequential()
model.add(keras.layers.Embedding(vocab_size, 16))
model.add(keras.layers.GlobalAveragePooling1D())
model.add(keras.layers.Dense(16, activation=tf.nn.relu))
model.add(keras.layers.Dense(1, activation=tf.nn.sigmoid))
My Attempt:
model.add(keras.layers.Embedding(10000, 16))
model.add(keras.layers.GlobalAveragePooling1D())
model.add(keras.layers.Dense(16, activation=tf.nn.relu))
model.add(keras.layers.Dense(16, activation='softmax'))
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['acc'])
history = model.fit(x_train[:5000],
                    y_train[:5000],
                    epochs=1,
                    batch_size=256,
                    validation_data=(x_train[5000:], y_train[5000:]),
                    verbose=1)
Error:
ValueError: A target array with shape (5000, 1) was passed for an output of shape (None, 16) while using as loss binary_crossentropy. This loss expects targets to have the same shape as the output.
Data Shapes:
x_train[:5000] (5000, 2000)
y_train[:5000] (5000,16)
x_train[5000:] (1934, 2000)
y_train[5000:] (1934,16)
Model Summary:
Model: "sequential_16"
Layer (type) Output Shape Param #
embedding_15 (Embedding) (None, None, 16) 160000
global_average_pooling1d_15 (None, 16) 0
dense_30 (Dense) (None, 16) 272
dense_31 (Dense) (None, 16) 272
Total params: 160,544
Trainable params: 160,544
Non-trainable params: 0
binary_crossentropy is for binary classification, but what you're looking for is categorical_crossentropy. binary_crossentropy expects your y matrix to be (n_samples x 1), with values of 0 or 1. categorical_crossentropy expects your y matrix to be (n_samples x n_categories), with the correct category labeled 1 and the other categories labeled 0. It sounds like your one-hot encoding is correct, so you probably just need to change your loss function:
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])
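For completeness, if your labels had instead been integer class indices of shape (n_samples,) rather than one-hot vectors, sparse_categorical_crossentropy would be the matching loss:
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['acc'])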

Keras, calculating gradients of the loss wrt the input on an LSTM

I am quite new to machine learning, and I was messing around with adversarial examples.
I am trying to fool a binary character-level LSTM text classifier, so I need the gradient of the loss w.r.t. the input.
The gradients function, however, returns None.
I have already tried to get the gradients as in this post or this post, but the gradients function still returns None.
EDIT: I wanted to do something similar to what's done in this git repo.
I was thinking the problem might be that it is an LSTM classifier, but I'm not sure at this point. I think it should be possible to get these gradients even from an LSTM classifier, right?
Here is my code:
import numpy as np
from keras.preprocessing import sequence
from keras.models import load_model
import data
import pickle
import keras.backend as K
def adversary():
    model, valid_chars = loadModel()
    model.summary()
    # load data
    X, y, maxlen, _, max_features, indata = prepare_data(valid_chars)
    target = y[0]
    # Get the loss and gradient of the loss wrt the inputs
    target = np.asarray(target).astype('float32').reshape((-1, 1))
    loss = K.binary_crossentropy(target, model.output)
    print(target)
    print(model.output)
    print(model.input)
    print(loss)
    grads = K.gradients(loss, model.input)
    # f = K.function([model.input], [loss, grads])
    # print(f(X[1:2]))
    print(model.predict(X[0:1]))
    print(grads)
The output looks like this:
Layer (type) Output Shape Param #
=================================================================
embedding_1 (Embedding) (None, 74, 128) 5120
_________________________________________________________________
lstm_1 (LSTM) (None, 128) 131584
_________________________________________________________________
dropout_1 (Dropout) (None, 128) 0
_________________________________________________________________
dense_1 (Dense) (None, 1) 129
_________________________________________________________________
activation_1 (Activation) (None, 1) 0
=================================================================
Total params: 136,833
Trainable params: 136,833
Non-trainable params: 0
_________________________________________________________________
Maxlen: 74
Data preparing finished
[[0.]]
Tensor("activation_1/Sigmoid:0", shape=(?, 1), dtype=float32)
Tensor("embedding_1_input:0", shape=(?, 74), dtype=float32)
Tensor("logistic_loss_1:0", shape=(?, 1), dtype=float32)
[[1.1397913e-13]]
[None]
I was hoping to get the gradients of the loss w.r.t. the input data to see which of the characters has the most impact on the output.
Thus I could fool the classifier by modifying the respective characters.
Is this possible? If yes, what is wrong with my approach?
Thank you for your time.
Gradients can only be computed for "trainable" tensors, so you might want to wrap your input into tf.Variable().
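A minimal sketch of that suggestion in TF 2.x eager mode (some_model and the shapes here are placeholders of mine, not the asker's code):
import tensorflow as tf
x = tf.Variable(tf.random.normal([1, 74, 128]))  # hypothetical embedded input
with tf.GradientTape() as tape:
    y_pred = some_model(x)  # some_model stands in for your classifier
    loss = tf.keras.losses.binary_crossentropy(tf.zeros([1, 1]), y_pred)
grad = tape.gradient(loss, x)  # gradient of the loss w.r.t. the input Variable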
If you want to work with gradients, I would suggest doing it with TensorFlow directly, which integrates nicely with Keras. Below is my example of doing so; note that it works in eager execution mode (the default in TensorFlow 2.0).
def train_actor(self, sars):
    obs1, actions, rewards, obs2 = sars
    with tf.GradientTape() as tape:
        would_do_actions = self.compute_actions(obs1)
        score = tf.reduce_mean(self.critic(observations=obs1, actions=would_do_actions))
        loss = -score
    grads = tape.gradient(loss, self.actor.trainable_weights)
    self.optimizer.apply_gradients(zip(grads, self.actor.trainable_weights))
I just found this thread.
The gradients function returns None because the embedding layer is not differentiable.
The embedding lookup is implemented with K.gather, which is not differentiable with respect to its integer indices (your model's input), so there is no gradient.
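A common workaround (a hedged sketch reusing the question's model, loss and X): take the gradient with respect to the embedding layer's output instead of the raw integer input, and derive a per-character saliency from it.
emb_out = model.layers[0].output       # (None, 74, 128): the embedded characters
grads = K.gradients(loss, emb_out)     # differentiable from the embedding onward
grad_fn = K.function([model.input], grads)
g = grad_fn([X[0:1]])[0]               # shape (1, 74, 128)
saliency = np.linalg.norm(g, axis=-1)  # impact of each character position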