Tensorflow input shape incompatible with layer - tensorflow

I'm trying to build a Sequential model with tensorflow.
import tensorflow as tf
import keras
from tensorflow.keras import layers
from keras import optimizers
import numpy as np
model = keras.Sequential(name="model")
model.add(keras.Input(shape=(786,)))
model.add(layers.Dense(2048, activation="relu", name="layer1"))
model.add(layers.Dense(786, activation="relu", name="layer2"))
model.add(layers.Dense(786, activation="relu", name="layer3"))
output = model.add(layers.Dense(786, activation="relu", name="output"))
model.summary()
model.compile(
    optimizer=tf.optimizers.Adam(),  # Optimizer
    loss=keras.losses.CategoricalCrossentropy(),
    metrics=[keras.metrics.SparseCategoricalAccuracy()],
)
history = model.fit(
    x_train,
    y_train,
    batch_size=1,
    epochs=5,
)
The input is a vector of length 768 (so the input shape is (768,), right?), representing a chess board:
def get_dataset():
    container = np.load('/content/drive/MyDrive/test_data_vector.npz')
    b, v = container['arr_0'], container['arr_1']
    v = np.asarray(v / abs(v).max() / 2 + 0.5, dtype=np.float32)  # normalization (0 - 1)
    return b, v
xtrain, ytrain = get_dataset()
print(xtrain.shape)
print(ytrain.shape)
>> (37, 786) #there are 37 samples
>> (37, 786)
But I always get the error:
ValueError: Input 0 of layer model is incompatible with the layer: expected axis -1 of input shape to have value 786 but received input with shape (1, 1, 768)
I tried with np.expand_dims(), which ended in the same Error.

The error is just a typo: as the user mentioned, the issue is resolved by changing the shape from 786 to 768.
One suggestion based on the model structure: the number of units in a Dense layer is not related to your input shape, so you don't have to match that number. Unit counts as large as 2048 and 786 may not help the model learn better. Try smaller numbers like 32 or 64; you can refer to some of the examples in the TensorFlow documentation.
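For reference, a minimal sketch of the corrected model, assuming the boards really are 768-long vectors and the 768-wide target is a regression target (the original pairing of CategoricalCrossentropy with a relu output and a SparseCategoricalAccuracy metric would raise further errors):
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential(name="model")
model.add(keras.Input(shape=(768,)))           # 768 to match the data, not 786
model.add(layers.Dense(64, activation="relu", name="layer1"))
model.add(layers.Dense(64, activation="relu", name="layer2"))
model.add(layers.Dense(768, name="output"))    # output width matches the target vectors
model.compile(
    optimizer=tf.optimizers.Adam(),
    loss=keras.losses.MeanSquaredError(),      # assumption: regression on board vectors
)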

Related

Keras incompatible shapes NN

So I have this neural network and I am feeding examples "X" and labels "Y" whose shapes are:
X.shape = (10,10,2)
Y.shape = (10,10,2)
The code for the model looks like:
import tensorflow as tf
from convert import process
import numpy as np
X, Y, rate = process('songs/song1.wav')
X = np.array(X[:10])
Y = np.array(Y[:10])
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(128))
model.add(tf.keras.layers.Dense(128))
model.add(tf.keras.layers.Dense(20))
model.compile(optimizer='adam', loss='categorical_crossentropy')
model.fit(X, Y, epochs=2)
Now, for some reason, once I run this I get the error:
ValueError: Shapes (None, 10, 2) and (None, 20) are incompatible
I am confused because I fed it data where each example of both "X" and "Y" has shape (10, 2). So why is it saying that I passed it (None, 10, 2) and (None, 20)?
Your last layer uses a linear activation, whereas you chose the categorical_crossentropy loss. Set either
model.add(tf.keras.layers.Dense(20, activation='softmax'))
....loss='categorical_crossentropy')
or,
model.add(tf.keras.layers.Dense(20))
....loss='mse')
Also check your data shape, especially the labels (y); see the sketch below.
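In this example the shapes are also genuinely mismatched: after Flatten(), the model's output is (None, 20), while Y is (10, 10, 2). A minimal sketch, assuming the intent is to predict each sample's 10x2 label grid as one flat 20-vector:
import numpy as np

# flatten each label grid so it lines up with the Dense(20) output
Y = np.asarray(Y, dtype=np.float32)
Y = Y.reshape(len(Y), -1)   # (10, 10, 2) -> (10, 20)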

Implementing Variational Auto Encoder using Tensorflow_Probability NOT on MNIST data set

I know there are many questions related to Variational Auto Encoders. However, this question differs from the existing ones in two aspects: 1) it is implemented using Tensorflow V2 and Tensorflow_probability; 2) it does not use MNIST or any other image data set.
As for the problem itself:
I am trying to implement a VAE using Tensorflow_probability and Keras, and I want to train and evaluate it on some synthetic data sets, as part of my research. I provide the code below.
Although the implementation is done and the loss value decreases during training, once I want to evaluate the trained model on my test set I face different errors.
I am somewhat confident that the issue is related to input/output shapes, but unfortunately I did not manage to solve it.
Here is the code:
import numpy as np
import tensorflow as tf
import tensorflow.keras as tfk
import tensorflow_probability as tfp
from tensorflow.keras import layers as tfkl
from sklearn.datasets import make_classification
from tensorflow_probability import layers as tfpl
from sklearn.model_selection import train_test_split
tfd = tfp.distributions
n_epochs = 5
n_features = 2
latent_dim = 1
n_units = 4
learning_rate = 1e-3
n_samples = 400
batch_size = 32
# Generate synthetic data / load data sets
x_in, y_in = make_classification(n_samples=n_samples, n_features=n_features, n_informative=2, n_redundant=0,
                                 n_repeated=0, n_classes=2, n_clusters_per_class=2, weights=[0.5, 0.5],
                                 flip_y=0.01, class_sep=1.0, hypercube=True,
                                 shift=0.0, scale=1.0, shuffle=False, random_state=42)
x_in = x_in.astype('float32')
y_in = y_in.astype('float32') # .reshape(-1, 1)
x_train, x_test, y_train, y_test = train_test_split(x_in, y_in, test_size=0.4, random_state=42, shuffle=True)
x_test, x_val, y_test, y_val = train_test_split(x_test, y_test, test_size=0.5, random_state=42, shuffle=True)
print("shapes:", x_train.shape, y_train.shape, x_test.shape, y_test.shape, x_val.shape, y_val.shape)
prior = tfd.Independent(tfd.Normal(loc=[tf.zeros(latent_dim)], scale=1.), reinterpreted_batch_ndims=1)
train_dataset = tf.data.Dataset.from_tensor_slices(x_train).batch(batch_size)
valid_dataset = tf.data.Dataset.from_tensor_slices(x_val).batch(batch_size)
test_dataset = tf.data.Dataset.from_tensor_slices(x_test).batch(batch_size)
encoder = tf.keras.Sequential([
    tfkl.InputLayer(input_shape=[n_features, ], name='enc_input'),
    tfkl.Lambda(lambda x: tf.cast(x, tf.float32)),  # - 0.5
    tfkl.Dense(n_units, activation='relu', name='enc_dense1'),
    tfkl.Dense(int(n_units / 2), activation='relu', name='enc_dense2'),
    tfkl.Dense(tfpl.MultivariateNormalTriL.params_size(latent_dim),
               activation=None, name='mvn_triL1'),
    tfpl.MultivariateNormalTriL(
        # weight >> num_train_samples, or something other than 1, to convert the VAE to a beta-VAE
        latent_dim, activity_regularizer=tfpl.KLDivergenceRegularizer(prior, weight=1.), name='bottleneck'),
])
decoder = tf.keras.Sequential([
    tfkl.InputLayer(input_shape=latent_dim, name='dec_input'),
    # tfkl.Dense(n_units, activation='relu', name='dec_dense1'),
    # tfkl.Dense(int(n_units * 2), activation='relu', name='dec_dense2'),
    tfpl.IndependentBernoulli([n_features], tfd.Bernoulli.logits, name='dec_output'),
])
vae = tfk.Model(inputs=encoder.inputs, outputs=decoder(encoder.outputs), name='VAE')
print("enoder:", encoder)
print(" ")
print("encoder.inputs:", encoder.inputs)
print(" ")
print(" encoder.outputs:", encoder.outputs)
print(" ")
print("decoder:", decoder)
print(" ")
print("decoder:", decoder.inputs)
print(" ")
print("decoder.outputs:", decoder.outputs)
print(" ")
# negative log likelihood, i.e. the E_{S(eps)} [p(x|z)];
# the KL term was already added in the last layer of the encoder, via activity_regularizer.
# this loss function takes two arguments, namely the original data points x and the output of the model,
# which we call rv_x (because it is a random variable)
negloglik = lambda x, rv_x: -rv_x.log_prob(x)
vae.compile(optimizer=tf.optimizers.Adam(learning_rate=learning_rate),
            loss=negloglik,)
vae.summary()
history = vae.fit(train_dataset, epochs=n_epochs, validation_data=valid_dataset,)
print("x.shape:", x_test.shape)
x_hat = vae(x_test)
print("original:")
print(x_test)
print(" ")
print("Decoded Random Samples:")
print(x_hat.sample())
print(" ")
print("Decoded Means:")
print(x_hat.mean())
The Questions:
With the above code I receive the following error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 80 values, but the requested shape has 160 [Op:Reshape]
As far as I know, we can add as many layers as we want in the decoder model before its output layer, as is done in convolutional VAEs. Am I right?
If I uncomment the following two lines of code in decoder:
# tfkl.Dense(n_units, activation='relu', name='dec_dense1'),
# tfkl.Dense(int(n_units * 2), activation='relu', name='dec_dense2'),
I see the following warnings, followed by an error:
WARNING:tensorflow:Gradients do not exist for variables ['dec_dense1/kernel:0', 'dec_dense1/bias:0', 'dec_dense2/kernel:0', 'dec_dense2/bias:0'] when minimizing the loss.
(the same warning is printed four times)
And the error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 640 values, but the requested shape has 160 [Op:Reshape]
Now the question is why the decoder layers are not used during training, as the warning indicates.
PS: I also tried to pass x_train, x_valid, and x_test directly during the training and evaluation process, but it did not help.
Any help would be much appreciated.

Get gradients with respect to inputs in Keras ANN model

bce = tf.keras.losses.BinaryCrossentropy()
ll = bce(y_test[0], model.predict(X_test[0].reshape(1, -1)))
print(ll)
<tf.Tensor: shape=(), dtype=float32, numpy=0.04165391>
print(model.input)
<tf.Tensor 'dense_1_input:0' shape=(None, 195) dtype=float32>
model.output
<tf.Tensor 'dense_3/Sigmoid:0' shape=(None, 1) dtype=float32>
grads = K.gradients(ll, model.input)[0]
print(grads)
None
So here I have trained a neural network with 2 hidden layers; the input has 195 features and the output has size 1. I wanted to feed the network the validation instances in X_test one by one, with their correct labels in y_test, and for each instance calculate the gradient of the output with respect to the input. But grads, when printed, gives me None. Your help is appreciated.
One can do this using tf.GradientTape. I wrote the following code to learn a sine wave and get its derivative, in the spirit of this question. I think it should be possible to extend the following code to compute partial derivatives.
Importing the needed libraries:
import numpy as np
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras import losses
import tensorflow as tf
Create the data:
x = np.linspace(0, 6*np.pi, 2000)
y = np.sin(x)
Defining a Keras NN:
def model_gen(Input_shape):
    X_input = Input(shape=Input_shape)
    X = Dense(units=64, activation='sigmoid')(X_input)
    X = Dense(units=64, activation='sigmoid')(X)
    X = Dense(units=1)(X)
    model = Model(inputs=X_input, outputs=X)
    return model
Training the model:
model = model_gen(Input_shape=(1,))
opt = Adam(lr=0.01, beta_1=0.9, beta_2=0.999, decay=0.001)
model.compile(loss=losses.mean_squared_error, optimizer=opt)
model.fit(x,y, epochs=200)
To obtain the gradient of the network w.r.t. the input:
x = list(x)
x = tf.constant(x)
with tf.GradientTape() as t:
    t.watch(x)
    y = model(x)
dy_dx = t.gradient(y, x)
dy_dx.numpy()
One can further visualise dy_dx to see how smooth the derivative is. Finally, note that one gets a smoother derivative when using a smooth activation (e.g. sigmoid) instead of ReLU, as noted here.
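Applied back to the original question, the same idea gives the gradient of the loss with respect to a single input instance. A sketch reusing the question's names (model, X_test, y_test are assumed to exist):
import tensorflow as tf

x_inst = tf.constant(X_test[0].reshape(1, -1), dtype=tf.float32)
y_inst = tf.reshape(tf.constant(y_test[0], dtype=tf.float32), (1, 1))

bce = tf.keras.losses.BinaryCrossentropy()
with tf.GradientTape() as tape:
    tape.watch(x_inst)                  # inputs are not trainable variables, so watch them
    loss = bce(y_inst, model(x_inst))   # call the model directly; model.predict() returns
                                        # a NumPy array and breaks the tape
grads = tape.gradient(loss, x_inst)     # shape (1, 195): d(loss)/d(input)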

Keras : Error when checking target: expected dense_1 to have shape (10,) but got array with shape (1,) - MNIST

I'm trying to classify MNIST's handwritten digits, but I keep getting the same error from Keras.
import tensorflow as tf
import numpy as np
import tensorflow.keras as keras
from tensorflow.keras.datasets import mnist
model = keras.Sequential()
model.add(keras.layers.Dense(15, input_shape=(784,), activation='relu'))
model.add(keras.layers.Dense(10, activation='softmax'))
(data, label), (val_data, val_label) = mnist.load_data()
data = data.reshape(data.shape[0],data.shape[1]*data.shape[2])
val_data = val_data.reshape(val_data.shape[0],val_data.shape[1]*val_data.shape[2])
model.compile(optimizer=tf.train.GradientDescentOptimizer(0.01),
              loss='mse',
              metrics=['acc'])
model.fit(data,label,batch_size=30,epochs=10,validation_data=(val_data,val_label))
The softmax layer expects a tensor of shape (None, 10), so you have to one-hot encode your label data. It can be done in the following way:
label = keras.utils.to_categorical(label, num_classes = 10)
val_label = keras.utils.to_categorical(val_label, num_classes = 10)
If you are not familiar with one-hot encoding, you can read about it here: One hot encoding in python
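As a quick sanity check, a small sketch of what to_categorical does (hypothetical integer labels, not the actual MNIST arrays):
from tensorflow.keras.utils import to_categorical
import numpy as np

label = np.array([5, 0, 4])            # integer class ids, like mnist.load_data() returns
onehot = to_categorical(label, num_classes=10)
print(onehot.shape)                    # (3, 10)
print(onehot[0])                       # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]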

ValueError: Error when checking target: expected activation_6 to have shape(None,2) but got array with shape (5760,1)

I am trying to adapt Python code for a Convolutional Neural Network (in Keras) with 8 classes to work on 2 classes. My problem is that I get the following error message:
ValueError: Error when checking target: expected activation_6 to have
shape(None,2) but got array with shape (5760,1).
My model is as follows:
class MiniVGGNet:
    @staticmethod
    def build(width, height, depth, classes):
        # initialize the model along with the input shape to be
        # "channels last" and the channels dimension itself
        model = Sequential()
        inputShape = (height, width, depth)
        chanDim = -1
        # if we are using "channels first", update the input shape
        # and channels dimension
        if K.image_data_format() == "channels_first":
            inputShape = (depth, height, width)
            chanDim = 1
        # first CONV => RELU => CONV => RELU => POOL layer set
        model.add(Conv2D(32, (3, 3), padding="same",
                         input_shape=inputShape))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(Conv2D(32, (3, 3), padding="same"))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.25))
        # second CONV => RELU => CONV => RELU => POOL layer set
        model.add(Conv2D(64, (3, 3), padding="same"))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(Conv2D(64, (3, 3), padding="same"))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.25))
        # first (and only) set of FC => RELU layers
        model.add(Flatten())
        model.add(Dense(512))
        model.add(Activation("relu"))
        model.add(BatchNormalization())
        model.add(Dropout(0.5))
        # softmax classifier
        model.add(Dense(classes))
        model.add(Activation("softmax"))
        # return the constructed network architecture
        return model
Where classes = 2, and inputShape=(32,32,3).
I know that my error has something to do with my classes / use of binary_crossentropy and occurs in the model.fit line below, but I haven't been able to figure out why it is problematic, or how to fix it.
By changing model.add(Dense(classes)) above to model.add(Dense(classes-1)) I can get the model to train, but then my label sizes and target_names are mismatched, and there is only one category that everything is categorized as.
# import the necessary packages
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from pyimagesearch.nn.conv import MiniVGGNet
from pyimagesearch.preprocessing import ImageToArrayPreprocessor
from pyimagesearch.preprocessing import SimplePreprocessor
from pyimagesearch.datasets import SimpleDatasetLoader
from keras.optimizers import SGD
#from keras.datasets import cifar10
from imutils import paths
import matplotlib.pyplot as plt
import numpy as np
import argparse
# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", required=True,
                help="path to input dataset")
ap.add_argument("-o", "--output", required=True,
                help="path to the output loss/accuracy plot")
args = vars(ap.parse_args())
# grab the list of images that we'll be describing
print("[INFO] loading images...")
imagePaths = list(paths.list_images(args["dataset"]))
# initialize the image preprocessors
sp = SimplePreprocessor(32, 32)
iap = ImageToArrayPreprocessor()
# load the dataset from disk then scale the raw pixel intensities
# to the range [0, 1]
sdl = SimpleDatasetLoader(preprocessors=[sp, iap])
(data, labels) = sdl.load(imagePaths, verbose=500)
data = data.astype("float") / 255.0
# partition the data into training and testing splits using 75% of
# the data for training and the remaining 25% for testing
(trainX, testX, trainY, testY) = train_test_split(data, labels,
                                                  test_size=0.25, random_state=42)
# convert the labels from integers to vectors
trainY = LabelBinarizer().fit_transform(trainY)
testY = LabelBinarizer().fit_transform(testY)
# initialize the label names for the items dataset
labelNames = ["mint", "used"]
# initialize the optimizer and model
print("[INFO] compiling model...")
opt = SGD(lr=0.01, decay=0.01 / 10, momentum=0.9, nesterov=True)
model = MiniVGGNet.build(width=32, height=32, depth=3, classes=2)
model.compile(loss="binary_crossentropy", optimizer=opt,
metrics=["accuracy"])
# train the network
print("[INFO] training network...")
H = model.fit(trainX, trainY, validation_data=(testX, testY),
              batch_size=64, epochs=10, verbose=1)
print ("Made it past training")
# evaluate the network
print("[INFO] evaluating network...")
predictions = model.predict(testX, batch_size=64)
print(classification_report(testY.argmax(axis=1),
                            predictions.argmax(axis=1), target_names=labelNames))
# plot the training loss and accuracy
plt.style.use("ggplot")
plt.figure()
plt.plot(np.arange(0, 10), H.history["loss"], label="train_loss")
plt.plot(np.arange(0, 10), H.history["val_loss"], label="val_loss")
plt.plot(np.arange(0, 10), H.history["acc"], label="train_acc")
plt.plot(np.arange(0, 10), H.history["val_acc"], label="val_acc")
plt.title("Training Loss and Accuracy on items dataset")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend()
plt.savefig(args["output"])
I have looked at these questions already, but cannot work out how to get around this problem based on the responses.
Stackoverflow Question 1
Stackoverflow Question 2
Stackoverflow Question 3
Any advice or help would be much appreciated, as I've spent the last couple of days on this.
Matt's comment was absolutely correct in that the problem lay with using LabelBinarizer, and this hint led me to a solution that did not require me to give up softmax or change the last layer to classes = 1. For posterity and for others, here's the section of code that I altered and how I was able to avoid LabelBinarizer:
from keras.utils import np_utils
from sklearn.preprocessing import LabelEncoder
# load the dataset from disk then scale the raw pixel intensities
# to the range [0,1]
sp = SimplePreprocessor(32, 32)
iap = ImageToArrayPreprocessor()
# encode the labels, converting them from strings to integers
le=LabelEncoder()
labels = le.fit_transform(labels)
data = data.astype("float") / 255.0
labels = np_utils.to_categorical(labels,2)
# partition the data into training and testing splits using 75% of
# the data for training and the remaining 25% for testing
....
I believe the problem lies in the use of LabelBinarizer.
From this example:
>>> lb = preprocessing.LabelBinarizer()
>>> lb.fit_transform(['yes', 'no', 'no', 'yes'])
array([[1],
       [0],
       [0],
       [1]])
I gather that the output of your transformation has the same format, i.e. a single 1 or 0 encoding "is new" or "is used".
If your problem only calls for classification among these two classes, that format is preferable because it contains all the information and uses less space than the alternative, i.e. [1,0], [0,1], [0,1], [1,0].
Therefore, using classes = 1 would be correct, and the output should be a float indicating the network's confidence in a sample being in the first class. Since the two class probabilities have to sum to one, the probability of the second class can easily be inferred by subtracting from 1.
You would need to replace softmax with another activation, because softmax on a single value always returns 1. I'm not completely sure about the behaviour of binary_crossentropy with a single-valued result, and you may want to try mean_squared_error as the loss.
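For what it's worth, the usual pairing for this single-output setup is a sigmoid activation with binary_crossentropy. A minimal sketch of the classifier head under that assumption, keeping the single-column 0/1 labels that LabelBinarizer produces:
# single-unit head: the output is the predicted probability of the positive class
model.add(Dense(1))
model.add(Activation("sigmoid"))
model.compile(loss="binary_crossentropy", optimizer=opt,
              metrics=["accuracy"])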
If you are looking to expand your model to cover more than two classes, you would want to convert your target vector to a one-hot encoding. I believe inverse_transform from LabelBinarizer would do this, although that seems quite a roundabout way to get there. I see that sklearn also has OneHotEncoder, which may be the more appropriate replacement; see the sketch below.
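If you do go the one-hot route, a small sketch of OneHotEncoder on string labels (.toarray() densifies the sparse output, so it works across sklearn versions):
from sklearn.preprocessing import OneHotEncoder
import numpy as np

labels = np.array(["mint", "used", "used", "mint"]).reshape(-1, 1)
onehot = OneHotEncoder().fit_transform(labels).toarray()
# array([[1., 0.],
#        [0., 1.],
#        [0., 1.],
#        [1., 0.]])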
NB: You can specify the activation function for any layer more easily with, for example:
Dense(36, activation='relu')
This may be helpful in keeping your code to a manageable size.